Greetings! I've recently been considering supporting Unicode in GCL by representing strings internally in UTF-8. It appears that Emacs does the same or similar. Apart from the obvious memory footprint benefits, I'd like to ask what other advantages/disadvantages have been discovered.

Much of the UTF-8 literature emphasizes that most algorithms can proceed conventionally in byte-wise fashion, including lexicographic ordering comparisons, because byte-wise order on UTF-8 agrees with code point order. Given that almost all jobs are sequential, at least initially, a cached internal pointer storing the byte offset of the last referenced code point makes access essentially O(1); only a jump far from the cached position pays for a linear walk.

Yet setting string elements can trigger reallocation and memmove operations whenever the new character's encoding differs in length from the one it replaces. While these can be aggregated over the setting of multiple elements, operations like nreverse look ridiculous if left in terms of calls to aref and aset.
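To make the cached-offset idea and the nreverse concern concrete, here is a minimal C sketch. The u8str layout and the names u8_offset, u8_aref and u8_nreverse are hypothetical, not GCL's actual internals, and the input is assumed to be well-formed UTF-8. The reverse works directly on bytes (reverse each code point's bytes, then the whole buffer) rather than through aref and aset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical string layout for illustration; not GCL's representation. */
typedef struct {
    unsigned char *data;  /* UTF-8 bytes, assumed well formed */
    size_t nbytes;        /* length in bytes */
    size_t cache_cp;      /* code point index of the cached position */
    size_t cache_byte;    /* byte offset of that code point */
} u8str;

/* A byte starts a code point unless it is a continuation byte (10xxxxxx). */
static int is_lead(unsigned char b) { return (b & 0xC0) != 0x80; }

/* Byte offset of code point i, walking from the cached position.
   Stepping to a neighbouring index is O(1); a jump costs O(distance). */
size_t u8_offset(u8str *s, size_t i)
{
    size_t cp = s->cache_cp, off = s->cache_byte;
    if (i < cp / 2) { cp = 0; off = 0; }   /* closer to the front: restart */
    while (cp < i && off < s->nbytes) {    /* walk forward */
        do off++; while (off < s->nbytes && !is_lead(s->data[off]));
        cp++;
    }
    while (cp > i) {                       /* walk backward */
        do off--; while (!is_lead(s->data[off]));
        cp--;
    }
    s->cache_cp = cp;
    s->cache_byte = off;
    return off;
}

/* Read the code point at index i (the analogue of aref). */
unsigned long u8_aref(u8str *s, size_t i)
{
    size_t off = u8_offset(s, i);
    assert(off < s->nbytes);               /* index must be in range */
    unsigned char b = s->data[off];
    if (b < 0x80) return b;
    int n = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;  /* continuation bytes */
    unsigned long c = b & (0x3F >> n);     /* payload bits of the lead byte */
    while (n--) c = (c << 6) | (s->data[++off] & 0x3F);
    return c;
}

/* Reverse the bytes in the half-open range [lo, hi). */
static void reverse_bytes(unsigned char *lo, unsigned char *hi)
{
    while (lo < hi && lo < --hi) {
        unsigned char t = *lo; *lo++ = *hi; *hi = t;
    }
}

/* In-place reverse by code point: two O(n) passes, no reallocation. */
void u8_nreverse(u8str *s)
{
    size_t start = 0;
    for (size_t off = 1; off <= s->nbytes; off++) {
        if (off == s->nbytes || is_lead(s->data[off])) {
            reverse_bytes(s->data + start, s->data + off);  /* one code point */
            start = off;
        }
    }
    reverse_bytes(s->data, s->data + s->nbytes);            /* whole buffer */
    s->cache_cp = s->cache_byte = 0;       /* cached position is now stale */
}
```

Since the byte length is unchanged by a reverse, nothing needs to move beyond the swaps themselves. A general aset that replaces a code point with one of a different encoded length would still need the memmove (and possibly the reallocation) mentioned above; the sketch does not attempt that.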
Thoughts, advice and experiences most appreciated.

Take care,
--
Camm Maguire
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah

_______________________________________________
Gcl-devel mailing list
https://lists.gnu.org/mailman/listinfo/gcl-devel
