Greetings!

Eli Zaretskii <[email protected]> writes:

>> From: Camm Maguire <[email protected]>
>> Cc: [email protected], [email protected]
>> Date: Wed, 29 Oct 2014 11:55:13 -0400
>>
>> Does every string access in emacs proceed through the utf8 decoder?
>
> If you need to look at the character, yes.  E.g., if you need some
> property of the character, you need to index the appropriate table by
> that character's codepoint.  But in most operations that is not
> needed.  You just need to recognize several specific characters, like
> the null character, the slash, etc., most of which are ASCII.
>

Do you allocate a fresh boxed character on each aref, or output an
integer referring to a fixed ~2^22 sized table?  Do you maintain such a
table in core?

>> >> A cached internal pointer storing the last referenced codepoint
>> >> offset makes access essentially O(1).
>> >
>> > We indeed maintain a cache for byte-to-character and character-to-byte
>> > conversions.
>>
>> How big is this cache?
>
> Its size is dynamic, and depends on how frequently the conversion is
> needed in places that are far away.  The cache stores byte-to-char
> correspondence in places that are far away, and Emacs uses binary
> search in between them.
>

How far is 'far away'?

If you had this to do all over again, would you still opt for the
multibyte?  While you have buffers to consider too, which probably
relate to strings, it seems to me that the dominant costs are always
memory allocation/gc related, making the memory footprint important but
not at the expense of allocating characters, and that the most frequent
operations are removals/pattern substitutions, which can proceed
bytewise with the same gc overhead.

GCL also supports regular expressions -- how is this modified for utf-8?

Take care,
--
Camm Maguire                                        [email protected]
==========================================================================
"The earth is but one country, and mankind its citizens."
-- Baha'u'llah
