I reached the midterms mostly on schedule, and NFG is now pretty much feature complete. There's still some work to do here and there, but the 'big ticket' items are done. So I have been looking at what needs to happen before the gsoc_nfg branch can be merged back into trunk. That means taking a hard look at all of the places where I've cut corners and seeing whether they can be improved. It's mostly minor stuff, like a missing cast or a few const mismatches I didn't pay attention to. Most of it is just code cleanup. That is, until you get to the extra pointer I added to string headers.
The reason for that pointer is that I needed somewhere to hang the grapheme table from. The pointer is completely unused outside of the NFG Unicode encoding, and it has caused a few problems in the code. A few string pointers aren't const anymore, because we have to adjust the 'extra' pointer if the grapheme table is reallocated. I had to add some hooks into the GC to properly dispose of the table when it collects a string header. The biggest problem, however, is memory usage. Every string header is now one pointer larger, and in most cases that pointer goes unused. Parrot's trunk is already too memory-hungry as it stands, and I don't feel comfortable merging the NFG functionality back until I can solve that problem. Fortunately, I see a way out of this, and if I'm right, I can make Parrot's string headers smaller than they are on trunk. How's that for an improvement?
The one caveat of this strategy is that it requires disabling Parrot's COW (copy-on-write), which sounds worse than it is. For starters, ever since we moved to immutable strings we've been doing much less with COW than we used to, and I've had conversations with bacek about some pathological cases where COW actually hurts us. The upside of dropping COW is that we can do away with the bufstart/strstart distinction: once headers no longer share buffers, a string's data always begins at the start of its own buffer, so one of the two pointers can go. The header becomes smaller, which saves memory and should improve performance. We also save the space where we keep the buffer refcount (which sounds like a nice place to stash our grapheme table, doesn't it?), and maybe we can simplify the GC's handling of strings a bit.
Of course, as I mentioned before, I'm not comfortable merging until I can convince myself that the branch is a clear improvement over trunk, so there's quite a bit of benchmarking ahead.