On Tue, 23 Oct 2018 at 00:50, Steve Dower <steve.do...@python.org> wrote:
>
> On 22Oct2018 1007, Serhiy Storchaka wrote:
> > 22.10.18 16:24, Steve Dower wrote:
> >> Yes, that's true. But "should reduce ... footprint" is also an
> >> optimisation that deserves a benchmark by that standard. Also, I'm
> >> proposing keeping the 'kind' as UCS-2 when the string is created from
> >> UCS-2 data that is likely to be used as UCS-2. We would not create the
> >> UCS-1 version in this case, so it's not the same as prefilling the
> >> cache, but it would cost a bit of memory in exchange for CPU. If
> >> slicing and concatenation between matching kinds also preserved the
> >> kind, a lot of path handling code could avoid back-and-forth conversions.
> >
> > Oh, I'm afraid this will complicate the whole code of unicodeobject.c
> > (and several other files) a lot and can introduce a lot of subtle bugs.
> >
> > For example, when you search a UCS2 string in a UCS1 string, the current
> > code returns the result fast, because a UCS1 string can't contain
> > codes > 0xff, and a UCS2 string should contain codes > 0xff. And there
> > are many such assumptions.
>
> That doesn't change though, as we're only ever expanding the range. So
> searching a UCS2 string in a UCS2 string that doesn't contain any actual
> UCS2 characters is the only case that would be affected, and whether
> that case occurs more than the UCS2->UCS1->UCS2 conversion case is
> something we can measure (but I'd be surprised if substring searches
> occur more frequently than OS conversions).
>
> Currently, unicode_compare_eq exits early when the kinds do not match,
> and that would be a problem (but is also easily fixable). But other
> string operations already handle mismatched kinds.
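
To make the assumption under discussion concrete, here is a rough Python
model of the kind-based early exit. It is only a sketch, not the C code in
unicodeobject.c: the helper names (min_kind, find_trusting_stored_kind,
find_using_real_kind) are made up for illustration, and the "stored kind"
arguments are a hypothetical stand-in for a string kept wider than its
contents require.

```python
# Illustrative model only. CPython does this in C; every name below is
# hypothetical and exists just to show the invariant being discussed.

def min_kind(s: str) -> int:
    """Smallest bytes-per-code-point (1, 2 or 4) able to store s."""
    top = max(map(ord, s), default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def find_trusting_stored_kind(haystack: str, needle: str,
                              haystack_kind: int, needle_kind: int) -> int:
    """Search with an early exit that trusts the *stored* kinds.

    The shortcut is sound only if a stored kind of 2 guarantees that the
    string really contains a code point above 0xFF.
    """
    if needle_kind > haystack_kind:
        return -1  # assumed impossible: a wider needle cannot fit
    return haystack.find(needle)


def find_using_real_kind(haystack: str, needle: str) -> int:
    """Same search, but the early exit uses the real (minimal) kinds."""
    if min_kind(needle) > min_kind(haystack):
        return -1
    return haystack.find(needle)


# Canonical case: a needle that really needs 2 bytes cannot be in a
# 1-byte haystack, so the shortcut gives the right answer.
print(find_trusting_stored_kind("abc", "\u0431", 1, 2))

# Denormalised case: the needle is pure ASCII but modelled as stored with
# kind 2. Trusting the stored kind reports "not found" ...
print(find_trusting_stored_kind("C:\\temp", "temp", 1, 2))

# ... while consulting the real kind finds it.
print(find_using_real_kind("C:\\temp", "temp"))
```

In this model the second call misses a match that the third one finds,
which is the sort of case where a record of the real kind would matter.
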

If you did allow for denormalised UCS-2 strings, you'd probably want
some kind of flag on the instance to indicate that the real kind was
8-bit.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev