Camm Maguire writes:

> Greetings! I've recently been considering supporting unicode in gcl by
> representing strings internally in utf8. It appears that emacs does the
> same or similar. Apart from the obvious memory footprint benefits,
If you need to *edit* large strings at arbitrary positions with high
performance, the memory footprint benefits are reduced by the need to
cache char position vs. memory position. If you're on a 64-bit
architecture, those cache entries chew up memory 16 bytes at a time.

I think Emacs does a much better job of handling the position cache
than XEmacs does, so you're asking in the right place. Just be aware
that it's possible to do it poorly. :-)

> Yet setting string elements can trigger reallocations/memmove
> operations. While these can be aggregated over the setting of
> multiple elements, operations like nreverse look ridiculous if left
> in terms of calls to aref and aset.

How many of those operations are there, though? At worst, nreverse
requires a few bytes of temporary storage to be implemented
efficiently. If there are only a few of them, just implement them as
primitives.

Note that Python has chosen to use a "just big enough for the data"
fixed-width representation, and AFAIK the Python-licensed code is
GPL-compatible.

http://legacy.python.org/dev/peps/pep-0393/

This strategy has the advantage that manipulating strings internally
is always an array operation, so Python code can be efficient
(enough); you don't need to reimplement such operations as primitives,
and there are no gotchas for user code where the user code looks like
it's operating on an array (efficient) but is actually moving large
chunks of memory around all the time.

_______________________________________________
Gcl-devel mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gcl-devel
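To make the nreverse point above concrete, here is a rough sketch of how such a primitive could work directly on a UTF-8 buffer with only constant extra storage. It is written in Python purely for illustration. The names `utf8_nreverse` and `_reverse_span` are hypothetical, the bytearray merely stands in for whatever storage gcl would actually use, and the input is assumed to be valid UTF-8. It reverses code points, not grapheme clusters.

```python
def _reverse_span(buf, lo, hi):
    """Reverse buf[lo:hi] in place, swapping one pair of bytes at a time."""
    hi -= 1
    while lo < hi:
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1


def utf8_nreverse(buf):
    """Reverse the code points of a bytearray holding valid UTF-8, in place.

    Hypothetical helper for illustration only; assumes well-formed input.
    """
    n = len(buf)

    # Pass 1: reverse the bytes *within* each encoded character.
    i = 0
    while i < n:
        j = i + 1
        # Continuation bytes have the bit pattern 10xxxxxx.
        while j < n and (buf[j] & 0xC0) == 0x80:
            j += 1
        _reverse_span(buf, i, j)
        i = j

    # Pass 2: reverse the whole buffer. Characters now appear in reverse
    # order, and each character's bytes are flipped back to normal.
    _reverse_span(buf, 0, n)
    return buf


if __name__ == "__main__":
    text = "héllo, wörld"
    data = bytearray(text.encode("utf-8"))
    utf8_nreverse(data)
    print(data.decode("utf-8"))
```

Neither pass needs an index from character position to byte position, which is why an operation like this is better written once as a primitive than expressed through repeated aref/aset calls.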
