On 2012-10-01 12:38, Michael Meeks wrote:
We could do some magic there; of course - space is a bit of an issue - we already pointlessly bloat bazillions of ascii strings into UCS-2 (nominally UTF-16) representations and nail a ref-count and length on the beginning. If you turn on the lifecycle diagnostics in sal/rtl/source/strimp.hxx with the #ifdef and re-build sal, you can start to see the scale of the problem when you launch libreoffice ;-)
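To put rough numbers on the overhead described in the quote above, here is a minimal sketch that estimates the footprint of one string stored as a header followed by 16-bit code units. It assumes a 4-byte reference count, a 4-byte length and a trailing NUL code unit. It ignores allocator overhead and padding. The names are hypothetical and are not the real rtl definitions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical header model (assumption): 32-bit ref count + 32-bit length.
constexpr std::size_t kHeaderBytes = sizeof(std::int32_t) + sizeof(std::int32_t);

// Bytes for an n-character string held as UTF-16 code units,
// including an assumed trailing NUL code unit.
constexpr std::size_t utf16Footprint(std::size_t n) {
    return kHeaderBytes + (n + 1) * sizeof(char16_t);
}

// Bytes for the same n ASCII characters kept at one byte each,
// with the same header and an assumed trailing NUL byte.
constexpr std::size_t asciiFootprint(std::size_t n) {
    return kHeaderBytes + (n + 1) * sizeof(char);
}
```

For example, a 7-character ASCII name such as "Default" comes to 24 bytes in the UTF-16 layout and 16 bytes at one byte per character, assuming a 2-byte `char16_t`. That assumption holds on all mainstream platforms.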

Changing subject because I'm changing the topic.

That was something I was thinking about the other day: given that the bulk of our strings are pure 7-bit ASCII, it might be a worthwhile optimisation to store a bit that says "this string is 7-bit ASCII", and then store the string as a sequence of bytes.
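The idea above can be sketched as follows. This is a rough illustration under stated assumptions, not a proposal for the actual rtl API. `CompactString` is a hypothetical class. A flag records whether every character is 7-bit ASCII. If so, the characters are kept one per byte. Otherwise they are kept as UTF-16 code units. Either way, callers only ever see 16-bit code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical flagged representation: one byte per character when the
// string is pure 7-bit ASCII, UTF-16 code units otherwise.
class CompactString {
public:
    explicit CompactString(const std::u16string& s) : ascii_(isAscii(s)) {
        if (ascii_) {
            bytes_.reserve(s.size());
            for (char16_t c : s) bytes_.push_back(static_cast<char>(c));
        } else {
            wide_ = s;
        }
    }

    bool isAsciiOnly() const { return ascii_; }

    std::size_t length() const { return ascii_ ? bytes_.size() : wide_.size(); }

    // Returns the i-th UTF-16 code unit, whichever storage is in use.
    char16_t at(std::size_t i) const {
        assert(i < length());
        return ascii_ ? static_cast<char16_t>(static_cast<unsigned char>(bytes_[i]))
                      : wide_[i];
    }

    // Bytes spent on character data (excludes object and header overhead).
    std::size_t payloadBytes() const {
        return ascii_ ? bytes_.size() : wide_.size() * sizeof(char16_t);
    }

private:
    static bool isAscii(const std::u16string& s) {
        for (char16_t c : s)
            if (c > 0x7F) return false;
        return true;
    }

    bool ascii_;          // declared first so it is initialised first
    std::string bytes_;   // used when ascii_ is true
    std::u16string wide_; // used when ascii_ is false
};
```

Keeping two members keeps the sketch short. A real implementation would more likely put the flag in the existing header and share a single buffer, so the ASCII case saves one byte per character without growing the object.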

The latest Java VM does this trick internally: it pretends that String is stored as an array of 16-bit values, but when a string is pure ASCII it actually stores it as an array of 8-bit bytes.

Even in an app running in a language other than US English, strings are used for so many internal things that I'd expect >90% of them to be 7-bit ASCII.




_______________________________________________
LibreOffice mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/libreoffice
