[issue14923] Even faster UTF-8 decoding

2012-06-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Serhiy, does this patch also fix #8271? No, this patch does not change behavior. But the updated patch for issue 8271 now contains this patch (I hope this will help the merge). > If so, can you also include the tests I wrote for it? Your tests included in patch for is

[issue14923] Even faster UTF-8 decoding

2012-06-23 Thread Roundup Robot
Roundup Robot added the comment: New changeset 3214c9ebcf5e by Mark Dickinson in branch 'default': Issue #14923: Optimize continuation-byte check in UTF-8 decoding. Patch by Serhiy Storchaka. http://hg.python.org/cpython/rev/3214c9ebcf5e

[issue14923] Even faster UTF-8 decoding

2012-06-23 Thread Mark Dickinson
Mark Dickinson added the comment: Patch applied. Closing. Ezio: the patch is pure optimization, with no change in semantics; I don't see how it could fix #8271. -- resolution: -> fixed status: open -> closed

[issue14923] Even faster UTF-8 decoding

2012-06-23 Thread Ezio Melotti
Ezio Melotti added the comment: Serhiy, does this patch also fix #8271? If so, can you also include the tests I wrote for it?

[issue14923] Even faster UTF-8 decoding

2012-06-23 Thread Mark Dickinson
Mark Dickinson added the comment: I'm happy to apply the 'decode_utf8_range_check.patch'; I'll do that unless there are objections. The code is clearer than the original, and if we get a speedup into the bargain then I don't see a reason not to apply this. I'm less comfortable with either t

[issue14923] Even faster UTF-8 decoding

2012-06-23 Thread Mark Dickinson
Mark Dickinson added the comment: Okay, will look at this this afternoon. -- assignee: -> mark.dickinson

[issue14923] Even faster UTF-8 decoding

2012-06-23 Thread Antoine Pitrou
Antoine Pitrou added the comment: > Any chance to commit the patch before final feature freeze? I'll defer to Mark :-)

[issue14923] Even faster UTF-8 decoding

2012-06-23 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Any chance to commit the patch before final feature freeze?

[issue14923] Even faster UTF-8 decoding

2012-06-22 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Here is a patch that uses some sort of autodetection. -- Added file: http://bugs.python.org/file26098/decode_utf8_signed_byte-2.patch

[issue14923] Even faster UTF-8 decoding

2012-05-27 Thread Mark Dickinson
Mark Dickinson added the comment: > It seems the patch relies on a two's complement representation of > integers. Mark, do you think that's ok? (1) Relying on two's complement integers seems fine to me: we're already relying on it in other places in Python (e.g., bitwise operations for ints i

[issue14923] Even faster UTF-8 decoding

2012-05-27 Thread Antoine Pitrou
Antoine Pitrou added the comment: > However, if the continuation-byte check is done the simplest way, ((ch) >= > 0x80 && (ch) < 0xC0), this has the same effect (speed up to +45%) on > AMD Athlon. Doesn't produce any significant speedup on Intel Core i5-2500.

[issue14923] Even faster UTF-8 decoding

2012-05-27 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Yes, this is implementation-dependent behavior (and on the supported platforms it is consistently implemented one way). However, if the continuation-byte check is done the simplest way, ((ch) >= 0x80 && (ch) < 0xC0), this has the same effect (speed up to +
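For readers following the thread, here is a minimal sketch of the "simplest way" range check quoted above, compared against a bit-mask form of the same test. The function names are hypothetical and chosen for illustration only; whether the mask form matches the pre-patch CPython source is an assumption, not something this thread shows.

```c
#include <assert.h>

/* Range form, as quoted in the comment above:
 * a UTF-8 continuation byte lies in 0x80..0xBF. */
static int is_continuation_range(unsigned char ch)
{
    return ch >= 0x80 && ch < 0xC0;
}

/* Mask form (hypothetical helper for comparison):
 * a continuation byte has its top two bits equal to 10. */
static int is_continuation_mask(unsigned char ch)
{
    return (ch & 0xC0) == 0x80;
}

/* Confirm the two forms agree on every possible byte value. */
static void check_forms_agree(void)
{
    for (int b = 0; b < 256; b++) {
        unsigned char ch = (unsigned char)b;
        assert(is_continuation_range(ch) == is_continuation_mask(ch));
    }
}
```

Both forms are fully portable, since they operate only on unsigned values; any speed difference between them depends on the compiler and CPU, as the benchmark comments in this thread suggest.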

[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Martin v . Löwis
Martin v. Löwis added the comment: The C standard says, in 6.3.1.3/3: "Otherwise [*], the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised." [*]: the value cannot be exactly converted, and the t
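To illustrate the clause quoted above: converting an out-of-range value to a signed type is implementation-defined, but the same numeric result can be computed with ordinary int arithmetic, which the standard defines fully. The sketch below is an illustration only; `byte_as_signed` is a hypothetical helper, not something taken from the patch.

```c
#include <assert.h>

/* Map a byte value 0..255 to the value it would have as an 8-bit
 * two's complement integer, using only well-defined int arithmetic
 * (no narrowing conversion to a signed type is involved). */
static int byte_as_signed(unsigned char ch)
{
    return ch < 0x80 ? (int)ch : (int)ch - 0x100;
}
```

A direct cast such as `(signed char)ch` yields the same values on two's complement platforms with 8-bit bytes, but for inputs above 127 it relies on the implementation-defined behavior described in 6.3.1.3/3.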

[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > It seems the patch relies on a two's complement representation of integers. > Mark, do you think that's ok? Yes, the patch depends on two facts -- 8-bit bytes and a two's complement representation of integers. That's why I call it a trick. However, today C
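The patch itself is not shown in this thread, so the following is only a guess at the kind of trick under discussion, suggested by the patch file name decode_utf8_signed_byte. It assumes 8-bit bytes and a two's complement conversion to signed char, the two facts named above. Under those assumptions the bytes 0x80..0xBF become -128..-65, so a single signed comparison replaces the two-sided range check. All function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Signed-byte form (assumed shape of the trick): relies on the
 * implementation-defined conversion of values above 127 to signed char. */
static int is_continuation_signed(unsigned char ch)
{
    return (signed char)ch < -0x40;
}

/* Portable reference: the plain range check on the unsigned value. */
static int is_continuation_portable(unsigned char ch)
{
    return ch >= 0x80 && ch < 0xC0;
}

/* On a platform with 8-bit bytes and two's complement conversion,
 * the two checks agree for every byte value. */
static void check_signed_trick(void)
{
    assert(CHAR_BIT == 8);
    for (int b = 0; b < 256; b++) {
        unsigned char ch = (unsigned char)b;
        assert(is_continuation_signed(ch) == is_continuation_portable(ch));
    }
}
```

Running `check_signed_trick()` on a given platform is one way to confirm that the assumptions hold there; on a platform where they do not, the portable form remains correct.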

[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Antoine Pitrou
Changes by Antoine Pitrou: -- stage: commit review -> patch review

[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Antoine Pitrou
Antoine Pitrou added the comment: It seems the patch relies on a two's complement representation of integers. Mark, do you think that's ok? -- stage: -> commit review

[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Antoine Pitrou
Antoine Pitrou added the comment: I see a slight increase under 64-bit Linux with gcc 4.5.2, too:

                        vanilla         patched
utf-8 'A'*1             7857 (+4%)      8210
utf-8 'A'*+'\x80'       5392 (+8%)      5843
utf-8

[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: Added file: http://bugs.python.org/file25719/bench-diff.py

[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: Added file: http://bugs.python.org/file25718/decodebench.py

[issue14923] Even faster UTF-8 decoding

2012-05-26 Thread Serhiy Storchaka
New submission from Serhiy Storchaka: Strange as it may seem, a simple trick speeds up UTF-8 decoding even more. Here are the benchmark results. On 32-bit Linux, AMD Athlon 64 X2: vanilla patched utf-8 'A'*1