Re: [Gcl-devel] utf8 and emacs text/string multibyte representation

Camm Maguire Wed, 29 Oct 2014 08:56:27 -0700

Greetings, and thanks so much for the feedback!

Eli Zaretskii <[email protected]> writes:

>> From: Camm Maguire <[email protected]>
>> Date: Wed, 29 Oct 2014 10:04:58 -0400
>> 
> You have basically said it yourself: memory footprint vs
> addressability.  If you want to discuss this in more detail, I suggest
> to ask more specific questions about specific aspects that bother you.
>

I thought there would be a bit more on the upside: say, some benefit
from having the internal representation match the one used by many
external representations (at least on Linux), and perhaps some
algorithms collapsing into straightforward byte-wise operations.  Does
every string access in Emacs go through the UTF-8 decoder?
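To make the question concrete, here is a minimal Python sketch of what character indexing into a UTF-8 byte string costs with no cache at all. It is not Emacs's or GCL's code, and the names `char_to_byte` and `aref` are hypothetical. Finding the Nth code point means scanning lead bytes from the start, so each access is O(n).

```python
def is_lead_byte(b: int) -> bool:
    # Continuation bytes have the bit pattern 10xxxxxx; every other byte starts a character.
    return (b & 0xC0) != 0x80


def char_to_byte(data: bytes, char_index: int) -> int:
    """Return the byte offset of the char_index-th code point (linear scan)."""
    count = -1
    for offset, b in enumerate(data):
        if is_lead_byte(b):
            count += 1
            if count == char_index:
                return offset
    raise IndexError("character index out of range")


def aref(data: bytes, char_index: int) -> str:
    """Decode the single code point at char_index."""
    start = char_to_byte(data, char_index)
    end = start + 1
    while end < len(data) and not is_lead_byte(data[end]):
        end += 1  # include the character's continuation bytes
    return data[start:end].decode("utf-8")


text = "naïve €uro".encode("utf-8")
print(aref(text, 2))  # ï
print(aref(text, 6))  # €
```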

>> A cached internal pointer storing the last referenced codepoint
>> offset makes access essentially O(1).
>
> We indeed maintain a cache for byte-to-character and character-to-byte
> conversions.
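One simple way to picture such a cache is a single remembered (character, byte) pair. Lookups walk forward or backward from it, so a left-to-right loop costs O(1) amortized per access, while a random jump still costs time proportional to the distance. This is a hedged Python sketch, not Emacs's implementation; `Utf8String` and the `steps` counter are hypothetical names used only to show the cost.

```python
class Utf8String:
    """UTF-8 bytes plus a one-entry cache of the last (char, byte) position used."""

    def __init__(self, text: str) -> None:
        self.data = text.encode("utf-8")
        self.length = len(text)   # number of code points
        self.cached_char = 0      # character index of the cached position
        self.cached_byte = 0      # byte offset of that character
        self.steps = 0            # byte moves performed, to show lookup cost

    def _is_lead(self, offset: int) -> bool:
        # Continuation bytes look like 10xxxxxx; anything else starts a character.
        return (self.data[offset] & 0xC0) != 0x80

    def char_to_byte(self, char_index: int) -> int:
        if not 0 <= char_index <= self.length:
            raise IndexError("character index out of range")
        c, b = self.cached_char, self.cached_byte
        while c < char_index:            # walk forward one character at a time
            b += 1
            self.steps += 1
            while b < len(self.data) and not self._is_lead(b):
                b += 1
                self.steps += 1
            c += 1
        while c > char_index:            # or walk backward
            b -= 1
            self.steps += 1
            while not self._is_lead(b):
                b -= 1
                self.steps += 1
            c -= 1
        self.cached_char, self.cached_byte = c, b   # remember for next time
        return b


s = Utf8String("héllo wörld")
offsets = [s.char_to_byte(i) for i in range(11)]
print(offsets)  # [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 12]
print(s.steps)  # 12: each byte is visited once over the whole sequential pass
```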

How big is this cache?

>
>> Yet setting string elements can trigger reallocations/memmove
>> operations.
>
> Emacs, as every editor, needs to handle this efficiently anyway,
> because editing operations rarely leave the buffer size unchanged.  So
> Emacs uses a gap to minimize reallocations.
>
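The gap idea can be sketched in a few lines of Python. This is a textbook gap buffer, not Emacs's C implementation; `GapBuffer`, its growth policy, and the use of byte positions are assumptions made for illustration. An edit at the gap costs only the bytes inserted, moving the gap costs the distance moved, and reallocation happens only when the gap runs out.

```python
class GapBuffer:
    """Text stored as UTF-8 bytes with a movable gap of unused space."""

    def __init__(self, text: str = "", gap: int = 8) -> None:
        data = text.encode("utf-8")
        self.buf = bytearray(data) + bytearray(gap)
        self.gap_start = len(data)      # first byte of the gap
        self.gap_end = len(self.buf)    # one past the last byte of the gap

    def move_gap(self, pos: int) -> None:
        """Move the gap so that it starts at byte position pos of the text."""
        if pos < self.gap_start:
            n = self.gap_start - pos
            # Shift the n bytes just before the gap to just before gap_end.
            self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
            self.gap_start, self.gap_end = pos, self.gap_end - n
        elif pos > self.gap_start:
            n = pos - self.gap_start
            # Shift the n bytes just after the gap down to gap_start.
            self.buf[self.gap_start:self.gap_start + n] = self.buf[self.gap_end:self.gap_end + n]
            self.gap_start, self.gap_end = pos, self.gap_end + n

    def insert(self, pos: int, text: str) -> None:
        data = text.encode("utf-8")
        self.move_gap(pos)
        if len(data) > self.gap_end - self.gap_start:
            # Only here does the buffer grow (hypothetical doubling-style policy).
            grow = len(data) + len(self.buf)
            self.buf[self.gap_end:self.gap_end] = bytearray(grow)
            self.gap_end += grow
        self.buf[self.gap_start:self.gap_start + len(data)] = data
        self.gap_start += len(data)

    def text(self) -> str:
        return (self.buf[:self.gap_start] + self.buf[self.gap_end:]).decode("utf-8")


gb = GapBuffer("hello world")
gb.insert(5, ",")       # the gap moves from the end to byte 5
gb.insert(6, " dear")   # the gap is already at byte 6, so no bytes move
print(gb.text())        # hello, dear world
```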

But there is no gap in strings, right (i.e., only in buffers)?
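Without a gap, a write that changes a character's encoded width changes the string's byte length, so every byte after it has to move. A minimal Python sketch: the hypothetical `aset` below takes an already-resolved byte offset (an assumption, to keep the character-to-byte lookup out of the picture).

```python
def aset(data: bytearray, byte_start: int, new_char: str) -> None:
    """Replace the code point starting at byte_start with new_char, in place."""
    end = byte_start + 1
    while end < len(data) and (data[end] & 0xC0) == 0x80:
        end += 1  # skip the continuation bytes of the old character
    # Slice assignment with a different length shifts every following byte
    # (in effect a memmove) and may reallocate the underlying storage.
    data[byte_start:end] = new_char.encode("utf-8")


buf = bytearray("abc".encode("utf-8"))
aset(buf, 1, "€")                     # 1-byte 'b' replaced by the 3-byte '€'
print(buf.decode("utf-8"), len(buf))  # a€c 5
```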

>> While these can be aggregated over the setting of multiple elements,
>> operations like nreverse look ridiculous if left in terms of calls
>> to aref and aset.
>
> nreverse applied to a string is a rarity, IME.
>
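For what it's worth, a UTF-8 string can be reversed in place without per-character aref/aset calls: reverse all the bytes, then restore the byte order within each multibyte character. A minimal Python sketch, assuming valid UTF-8 input; the name `utf8_nreverse` is hypothetical.

```python
def utf8_nreverse(data: bytearray) -> bytearray:
    """Reverse the characters of a UTF-8 byte array in place in O(n) byte operations."""
    data.reverse()  # now each multibyte character has its bytes backwards
    i = 0
    while i < len(data):
        j = i
        # After the reversal a character is zero or more continuation bytes
        # (10xxxxxx) followed by its lead byte.
        while (data[j] & 0xC0) == 0x80:
            j += 1
        data[i:j + 1] = data[i:j + 1][::-1]  # restore this character's byte order
        i = j + 1
    return data


s = bytearray("añ€b".encode("utf-8"))
print(utf8_nreverse(s).decode("utf-8"))  # b€ña
```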

This is what I really need to get a handle on: which string operations
dominate in practice.

Take care,
-- 
Camm Maguire                                        [email protected]
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

_______________________________________________
Gcl-devel mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/gcl-devel