STINNER Victor added the comment:
> Will not this cause performance regression? When we hardly work with
> wchar_t-based API, it looks good to cache encoded value.
Yes, it will be slower. But I prefer slower code with a lower memory footprint.
On UNIX, I don't think that anyone will notice the difference.
My concern is that the cache is never released. If the conversion is only
needed once at startup, the memory will stay until Python exits. It's not
really efficient.
On Windows, conversion to wchar_t* is common because Python uses the Windows
wide character API (the "W" API, as opposed to the "A" ANSI code page API). For
example, most filesystem accesses use the wchar_t* type.
Before Python 3.3, Python was compiled in narrow or wide mode and stored Unicode
as Py_UNICODE arrays, so strings could already be used as wchar_t* internally
when the sizes matched. Since Python 3.3 (PEP 393), Python uses a more compact
representation: a wchar_t* can share the Unicode data only if
sizeof(wchar_t) == KIND, where KIND is 1, 2 or 4 bytes per character. Examples:
"\u20ac" on Windows (16-bit wchar_t) or "\U0010ffff" on Linux (32-bit wchar_t).
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue22324>
_______________________________________