One of the features of NFG I've mentioned before is that it solves our problems with variable-width characters without taking any more storage space than UCS-4, on top of which it's defined. The artifact that lets us pull off that trick is called the 'grapheme table', and today I'll try to explain how it works.
I've mentioned before that when converting a string into NFG we dynamically create new code points for sequences of composing characters that do not map to a single Unicode code point. The information needed to turn such a new code point back into a stream of valid Unicode code points is stored in the grapheme table.
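To make that a bit more concrete, here is a minimal Python sketch of the idea, not the actual NFG code. The names `GraphemeTable`, `to_nfg` and `from_nfg` are made up for illustration, the dynamically created code points are assumed to be negative numbers, and a 'grapheme' is simplified to a base character followed by its combining marks.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical table mapping synthetic (negative) code points to sequences."""

    def __init__(self):
        self.sequences = []  # sequences[i] belongs to synthetic code point -(i + 1)
        self.index = {}      # reverse map: sequence -> synthetic code point

    def intern(self, seq):
        """Return the synthetic code point for seq, creating an entry if needed."""
        seq = tuple(seq)
        if seq not in self.index:
            self.sequences.append(seq)
            self.index[seq] = -len(self.sequences)
        return self.index[seq]

    def lookup(self, synthetic):
        """Return the Unicode code points a synthetic code point stands for."""
        return self.sequences[-synthetic - 1]


def to_nfg(text, table):
    """Convert text to a list of fixed-width code points, one per grapheme."""
    out, cluster = [], []

    def flush():
        if len(cluster) == 1:
            out.append(cluster[0])            # already a single code point
        elif cluster:
            out.append(table.intern(cluster))  # needs a synthetic code point
        cluster.clear()

    # Compose first, so only sequences with no precomposed form reach the table.
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.combining(ch) == 0:    # a new base character starts a cluster
            flush()
        cluster.append(ord(ch))
    flush()
    return out


def from_nfg(codes, table):
    """Expand synthetic code points back into plain Unicode text."""
    parts = []
    for c in codes:
        parts.append(chr(c) if c >= 0 else "".join(map(chr, table.lookup(c))))
    return "".join(parts)
```

With this sketch, 'e' followed by a combining acute accent never touches the table, since NFC composes it into a single code point, while 'q' followed by a combining dot below has no precomposed form and gets a table entry.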
With one entry per created code point, and the relative rarity of graphemes that lack a Unicode code point, this table is expected to be quite small most of the time. You would have to be using some rather odd inputs for it to grow beyond a hundred entries. All of this assumes, of course, that you are not trying to break stuff intentionally.
A problematic assumption, isn't it? All it takes is a malicious input string and all of our NFG goodness goes poof! Fortunately there are a few measures that we can take to mitigate this problem. The easy solution to avoid global resource exhaustion by malicious parties is, in our case, to de-globalize the resource. If each string has its own grapheme table, then the one malicious string can't impact the others, and we can still have all of our NFG goodness.
Sure, there's a cost to this approach. The string header gets one pointer larger, concatenation gets a bit more expensive since we need to merge the two grapheme tables, and we generally have to pay more attention to our string operations when they involve negative code points. But it's not that big of a problem, and we should have enough room to be clever if the need to optimize arises. A small price to pay for not getting your services denied, if you ask me.
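As a rough illustration of what that table merge might look like, here is a small Python sketch. It is only a guess at the shape of the operation, not the real implementation: `NFGString` and `concat` are hypothetical names, the negative code points are assumed to be indexes into the per-string table, and the sketch ignores the case where a combining character at the start of the second string should attach to the last grapheme of the first.

```python
class NFGString:
    """Hypothetical string that carries its own grapheme table."""

    def __init__(self, codes, table=()):
        self.codes = list(codes)
        # table[i] is the tuple of code points behind synthetic code point -(i + 1)
        self.table = list(table)


def concat(a, b):
    """Concatenate two strings, merging b's table entries into a copy of a's."""
    table = list(a.table)
    index = {seq: -(i + 1) for i, seq in enumerate(table)}

    def remap(code):
        if code >= 0:
            return code              # ordinary code points are shared by everyone
        seq = b.table[-code - 1]     # what this code point means in b
        if seq not in index:         # first time the merged table sees it
            table.append(seq)
            index[seq] = -len(table)
        return index[seq]            # the same grapheme, renumbered for the result

    return NFGString(a.codes + [remap(c) for c in b.codes], table)
```

The point of the renumbering is that -1 in one string and -1 in another can mean different graphemes, so the right-hand side has to be translated into the result's numbering before the code points can sit next to each other. The inputs are left untouched, which keeps one string's table from ever growing because of another.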
I guess we'll find out soon enough, as today is the official GSoC 'start coding' date. Which means I'll start working on NFG proper, and we'll all have some nice grapheme tables here by next week. That's the plan anyway.