[issue3617] Add MS EULA to the list of third-party licenses in the Windows installer

2008-09-13 Thread Neil Hodgson

Neil Hodgson <[EMAIL PROTECTED]> added the comment:

The recommended addition includes the 'excluded license' section, which 
appears unnecessary as Python does not distribute any source code 
redistributables, only the .DLL file, which is a binary executable. 
Including this section is likely to confuse those who wish to use the GPL when 
distributing projects that include Python, since the license is trying 
to restrict their redistribution of something they will not be able to find 
in Python and so cannot remove.

--
nosy: +nyamatongwe

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3617>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue6664] readlines should understand Line Separator and Paragraph Separator characters

2009-08-07 Thread Neil Hodgson

New submission from Neil Hodgson :

Unicode includes Line Separator U+2028 and Paragraph Separator U+2029
line ending characters. The readlines method of the file object returned
by the built-in open does not treat these characters as line ends
although the object returned by codecs.open(..., encoding='utf-8') does.

The attached program creates a UTF-8 file containing three lines, with
the second line ended by a Paragraph Separator. The program then reads
the file back in as a text file, and only two lines are seen.

The desired behaviour is for the file to be read in as three lines.
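For illustration only, a minimal C sketch of the requested rule applied to a raw UTF-8 buffer (count_lines_utf8 is a hypothetical helper, not CPython's io code). In UTF-8, U+2028 and U+2029 are encoded as the byte sequences E2 80 A8 and E2 80 A9; for brevity the sketch ignores \r and the other Unicode line breaks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count lines in a UTF-8 buffer, treating '\n',
 * U+2028 LINE SEPARATOR (E2 80 A8) and U+2029 PARAGRAPH SEPARATOR
 * (E2 80 A9) as line terminators. A final line without a terminator
 * still counts as a line. */
static size_t count_lines_utf8(const unsigned char *s, size_t n)
{
    size_t lines = 0;
    size_t i = 0;
    int in_line = 0; /* nonzero if the current line has any content */

    while (i < n) {
        if (s[i] == '\n') {
            lines++;
            in_line = 0;
            i += 1;
        } else if (i + 2 < n && s[i] == 0xE2 && s[i + 1] == 0x80 &&
                   (s[i + 2] == 0xA8 || s[i + 2] == 0xA9)) {
            /* LS or PS: a three-byte line terminator */
            lines++;
            in_line = 0;
            i += 3;
        } else {
            in_line = 1;
            i += 1;
        }
    }
    return lines + (in_line ? 1 : 0);
}
```

With this rule, the file described in the report ("one", "two" followed by a Paragraph Separator, "three") counts as three lines.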

--
components: IO
files: lineends.py
messages: 91397
nosy: nyamatongwe
severity: normal
status: open
title: readlines should understand Line Separator and Paragraph Separator characters
versions: Python 3.1
Added file: http://bugs.python.org/file14671/lineends.py

___
Python tracker 
<http://bugs.python.org/issue6664>
___



[issue17615] String comparison performance regression

2013-04-02 Thread Neil Hodgson

New submission from Neil Hodgson:

On Windows, ordering comparisons (<, <=, >, >=) between strings with common 
prefixes are slower in Python 3.3 than in 3.2, for both 32-bit and 64-bit 
builds. Performance on Linux has not decreased for the same code. The attached 
program tests comparisons of strings that have common prefixes.

On a 64-bit build, a 25-character string comparison is around 30% slower and a 
100-character string comparison averages 85% slower. A user of 32-bit Python 
builds reported the 25-character case to average 70% slower. 

Here are two runs of the program using 3.2/3.3 on Windows 7 on an i7 870:

>c:\python32\python -u "charwidth.py"
3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']176
[0.7116295577956576, 0.7055591343157613, 0.7203483026429418]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']176
[0.7664397841378787, 0.7199902325464409, 0.713719289812504]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']176
[0.7341851791817691, 0.6994205901833599, 0.7106807593741005]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']180
[0.7346812372666784, 0.699543377914, 0.7064768417728411]

>c:\python33\python -u "charwidth.py"
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']108
[0.9913326076446045, 0.9455845241056282, 0.9459076605341776]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']192
[1.0472289217234318, 1.0362342484091207, 1.0197109728048384]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']192
[1.0439643704533834, 0.9878581050301687, 0.9949265834034335]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']312
[1.0987483965446412, 1.0130257167690004, 1.024832248526499]

--
components: Unicode
files: charwidth.py
messages: 185824
nosy: Neil.Hodgson, ezio.melotti
priority: normal
severity: normal
status: open
title: String comparison performance regression
versions: Python 3.3
Added file: http://bugs.python.org/file29652/charwidth.py

___
Python tracker 
<http://bugs.python.org/issue17615>
___



[issue17615] String comparison performance regression

2013-04-02 Thread Neil Hodgson

Neil Hodgson added the comment:

The common cases are likely to be 1:1, 2:2, and 1:2 (pairs of character kinds, 
i.e. bytes per character). There is already a specialisation for 1:1. wmemcmp 
is widely available, but it operates on wchar_t, which has different widths on 
Windows and Unix. On Windows it would handle the 2:2 case.
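A hedged sketch of how a 2:2 comparison might use wmemcmp only where wchar_t is 16 bits wide, falling back to a plain loop elsewhere. The names ucs2_t and compare_ucs2 are hypothetical stand-ins, not CPython's unicode_compare; the WCHAR_MAX test also excludes a signed 16-bit wchar_t, whose ordering would differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Stand-in for CPython's Py_UCS2 (assumption for this sketch). */
typedef uint16_t ucs2_t;

/* Compare n 16-bit code units, returning <0, 0 or >0 like memcmp.
 * wmemcmp only applies when wchar_t is an unsigned 16-bit type (as on
 * Windows); the cast assumes wchar_t and uint16_t then share a
 * representation. Where wchar_t is 32 bits (typical on Unix), use a
 * plain loop instead. */
static int compare_ucs2(const ucs2_t *a, const ucs2_t *b, size_t n)
{
#if WCHAR_MAX == 0xFFFF
    return wmemcmp((const wchar_t *)a, (const wchar_t *)b, n);
#else
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
#endif
}
```

Only the common prefix is compared here; a full string comparison would still need to compare the lengths afterwards.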

--




[issue17615] String comparison performance regression

2013-04-03 Thread Neil Hodgson

Neil Hodgson added the comment:

For a 32-bit wchar_t, signedness shouldn't matter: Unicode code points need 
only 21 bits, so no character will be seen as negative. On Windows, wchar_t is 
unsigned.

C11 has char16_t and char32_t, which are both unsigned, but it doesn't provide 
comparison functions for them.
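To illustrate the range argument, a small C11 sketch (compare_signed32 and compare_unsigned32 are hypothetical names): because no code point exceeds U+10FFFF, comparing 32-bit units as signed or as unsigned gives the same order. C11 defines char32_t as the same type as uint_least32_t, so the unsigned version is what a hand-written char32_t comparison loop would look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The largest Unicode code point, U+10FFFF, needs only 21 bits. */
#define MAX_CODE_POINT 0x10FFFF

/* So every code point is non-negative even in a signed 32-bit type. */
_Static_assert(MAX_CODE_POINT <= INT32_MAX, "code points fit in int32_t");

/* Comparison as a signed 32-bit wchar_t would order the units. */
static int compare_signed32(const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

/* Comparison as an unsigned type such as char32_t (uint_least32_t)
 * would order the units. */
static int compare_unsigned32(const uint_least32_t *a,
                              const uint_least32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}
```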

--




[issue17615] String comparison performance regression

2013-04-03 Thread Neil Hodgson

Neil Hodgson added the comment:

For 32-bit Windows, the code generated for unicode_compare is quite slow.

There are either 1 or 2 kind checks in each call to PyUnicode_READ, and 2 
calls to PyUnicode_READ inside the loop. A compiler may decide to move the kind 
checks out of the loop and specialize the loop, but MSVC 2010 appears not to do 
so. The assembler (32-bit build) for each PyUnicode_READ looks like this:

mov     ecx, DWORD PTR _kind1$[ebp]
cmp     ecx, 1
jne     SHORT $LN17@unicode_co@2
lea     ecx, DWORD PTR [ebx+eax]
movzx   edx, BYTE PTR [ecx+edx]
jmp     SHORT $LN16@unicode_co@2
$LN17@unicode_co@2:
cmp     ecx, 2
jne     SHORT $LN15@unicode_co@2
movzx   edx, WORD PTR [ebx+edi]
jmp     SHORT $LN16@unicode_co@2
$LN15@unicode_co@2:
mov     edx, DWORD PTR [ebx+esi]
$LN16@unicode_co@2:

The kind1/kind2 variables aren't even kept in registers, and at least one 
test+branch and a jump are executed for every character; the 2- and 4-byte 
kinds need two tests. len1 and len2 aren't kept in registers either.
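A simplified C model of the loop being described, as a sketch only: read_char, compare_generic and the KIND_* values are hypothetical stand-ins, not CPython's actual PyUnicode_READ or unicode_compare. Every character fetch re-tests the kind, which is the per-character branching visible in the listing when the compiler does not hoist the tests out of the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PEP 393 storage kinds (bytes per char). */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Roughly what a READ(kind, data, i) macro does: branch on the kind
 * every time a character is fetched (one or two tests per call). */
static uint32_t read_char(int kind, const void *data, size_t i)
{
    if (kind == KIND_1BYTE)
        return ((const uint8_t *)data)[i];
    if (kind == KIND_2BYTE)
        return ((const uint16_t *)data)[i];
    return ((const uint32_t *)data)[i];
}

/* Generic comparison loop: two reads per iteration, so two to four
 * kind tests per character unless the compiler specialises the loop. */
static int compare_generic(int kind1, const void *data1, size_t len1,
                           int kind2, const void *data2, size_t len2)
{
    size_t len = len1 < len2 ? len1 : len2;
    for (size_t i = 0; i < len; i++) {
        uint32_t c1 = read_char(kind1, data1, i);
        uint32_t c2 = read_char(kind2, data2, i);
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
    }
    /* Equal up to the shorter length: the shorter string sorts first. */
    return (len1 < len2) ? -1 : (len1 > len2);
}
```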

My system isn't set up for 64-bit MSVC 2010, but the code from 64-bit MSVC 
2012 shows that all the variables have been moved into registers while the kind 
checking is still inside the loop. This accounts for the better results with 
64-bit Python 3.3 on Windows, which are still not as good as on Unix or with 
Python 3.2.

; 10431: c1 = PyUnicode_READ(kind1, data1, i);

cmp rsi, 1
jne SHORT $LN17@unicode_co
lea rax, QWORD PTR [r9+rcx]
movzx   r8d, BYTE PTR [rax+rbx]
jmp SHORT $LN16@unicode_co
$LN17@unicode_co:
cmp rsi, 2
jne SHORT $LN15@unicode_co
movzx   r8d, WORD PTR [r9+r11]
jmp SHORT $LN16@unicode_co
$LN15@unicode_co:
mov r8d, DWORD PTR [r9+r10]
$LN16@unicode_co:

I've attached the 32-bit assembler listing.

--
Added file: http://bugs.python.org/file29673/unicode_compare.asm




[issue17615] String comparison performance regression

2013-04-04 Thread Neil Hodgson

Neil Hodgson added the comment:

The assembler output from gcc 4.7 on Linux shows that it specialises the loop 
9 times, once for each pair of kinds. This is why there was far less slowdown 
on Linux.

Explicitly writing out the 9 loops is inelegant and would make the code harder 
to maintain correctly. There may be some way to use the preprocessor to do 
this cleanly.
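One way to get the nine loops without writing each out by hand is a macro that expands to a loop for a given pair of character types, invoked from nested switches on the two kinds. This is a hedged sketch: UCS1/UCS2/UCS4 stand in for Py_UCS1/Py_UCS2/Py_UCS4, and this COMPARE macro and compare_specialised are illustrations, not the code of the eventual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for CPython's Py_UCS1/Py_UCS2/Py_UCS4 (assumed names). */
typedef uint8_t  UCS1;
typedef uint16_t UCS2;
typedef uint32_t UCS4;

/* Expands to a loop specialised for one (TYPE1, TYPE2) pair. It uses
 * data1, data2 and len from the enclosing function and returns from it
 * as soon as a difference is found. */
#define COMPARE(TYPE1, TYPE2)                          \
    do {                                               \
        const TYPE1 *p1 = (const TYPE1 *)data1;        \
        const TYPE2 *p2 = (const TYPE2 *)data2;        \
        for (size_t i = 0; i < len; i++) {             \
            uint32_t c1 = p1[i], c2 = p2[i];           \
            if (c1 != c2)                              \
                return c1 < c2 ? -1 : 1;               \
        }                                              \
    } while (0)

/* kind is the number of bytes per character: 1, 2 or 4. The kinds are
 * tested once, in the switches, instead of once per character. */
static int compare_specialised(int kind1, const void *data1, size_t len1,
                               int kind2, const void *data2, size_t len2)
{
    size_t len = len1 < len2 ? len1 : len2;

    switch (kind1) {
    case 1:
        switch (kind2) {
        case 1: COMPARE(UCS1, UCS1); break;
        case 2: COMPARE(UCS1, UCS2); break;
        default: COMPARE(UCS1, UCS4); break;
        }
        break;
    case 2:
        switch (kind2) {
        case 1: COMPARE(UCS2, UCS1); break;
        case 2: COMPARE(UCS2, UCS2); break;
        default: COMPARE(UCS2, UCS4); break;
        }
        break;
    default:
        switch (kind2) {
        case 1: COMPARE(UCS4, UCS1); break;
        case 2: COMPARE(UCS4, UCS2); break;
        default: COMPARE(UCS4, UCS4); break;
        }
        break;
    }
    /* Equal up to the shorter length: the shorter string sorts first. */
    return (len1 < len2) ? -1 : (len1 > len2);
}
```

The loop body is written once, in the macro, so a fix to it applies to all nine expansions.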

--




[issue17615] String comparison performance regression

2013-04-07 Thread Neil Hodgson

Neil Hodgson added the comment:

The patch fixes the performance regression on Windows. The 1:1 case is faster 
than both the 3.2.4 and 3.3.1 downloads from python.org. Other cases are close 
to 3.2.4, losing at most around 2%. Measurements from 32-bit builds:

## Download 3.2.4
3.2.4 (default, Apr  6 2013, 20:07:44) [MSC v.1500 32 bit (Intel)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']148
[0.9251519691803254, 0.9228673224604178, 0.9270485054253375]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']148
[0.9088621585998959, 0.916762355170341, 0.9102371386441703]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']148
[0.9071172334674893, 0.9079409638903551, 0.9188950414432817]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']152
[0.9154984634528134, 0.9211241439998155, 0.9235272150680487]

## Download 3.3.1
3.3.1 (v3.3.1:d9893d13c628, Apr  6 2013, 20:25:12) [MSC v.1600 32 bit (Intel)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']84
[1.107935584141198, 1.080932736716823, 1.079060304542709]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156
[1.2201494661996297, 1.238101814896, 1.217881936863404]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']156
[1.1195841384034795, 1.1172607155695182, 1.1198056163882537]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']276
[1.2389038306958007, 1.2207520679720822, 1.2370782093260395]

## Local build of 3.3.0 before patch
3.3.0 (default, Apr  8 2013, 14:06:26) [MSC v.1600 32 bit (Intel)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']84
[1.0824058797164942, 1.0680695468818941, 1.0685949457606005]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156
[1.2159871472901957, 1.2169558514728118, 1.209515728255596]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']156
[1.012521191492, 1.1091369450081352, 1.1049337539784823]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']276
[1.2080548119585544, 1.2094420187054578, 1.2138603997013906]

## Local build of 3.3.0 after patch
3.3.0 (default, Apr  8 2013, 14:23:45) [MSC v.1600 32 bit (Intel)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']84
[0.8673423724763649, 0.8545937643117921, 0.8289229288053079]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156
[0.9235338524209049, 0.9305998385376584, 0.9229137839304098]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']156
[0.891971842253179, 0.8971224280694345, 0.9036679059885344]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']276
[0.9310441918446486, 0.9431070566588904, 0.9355432690779342]

--




[issue17615] String comparison performance regression

2013-04-08 Thread Neil Hodgson

Neil Hodgson added the comment:

A quick rewrite showed the single-level case slightly faster (1%) on average, 
but it's less readable/maintainable. Perhaps taking a systematic approach to 
naming would allow Py_UCS1 to be deduced from PyUnicode_1BYTE_KIND and so avoid 
repeating the information in the case selector and macro invocation.
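One way the naming idea could work, as a hedged sketch: if each character type is named after its kind value, token pasting with ## can derive both types from the kind values written once in each case label, here in a single-level switch over all nine kind pairs. The UCS_1/UCS_2/UCS_4 typedefs and the PAIR, CASE and COMPARE macros are hypothetical, not CPython names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical naming scheme: each character type is named after its
 * kind value (bytes per character), so ## can derive it. */
typedef uint8_t  UCS_1;
typedef uint16_t UCS_2;
typedef uint32_t UCS_4;

/* Loop specialised for one pair of character types; uses data1, data2
 * and len from the enclosing function and returns on a difference. */
#define COMPARE(TYPE1, TYPE2)                          \
    do {                                               \
        const TYPE1 *p1 = (const TYPE1 *)data1;        \
        const TYPE2 *p2 = (const TYPE2 *)data2;        \
        for (size_t i = 0; i < len; i++) {             \
            uint32_t c1 = p1[i], c2 = p2[i];           \
            if (c1 != c2)                              \
                return c1 < c2 ? -1 : 1;               \
        }                                              \
    } while (0)

/* One key per (kind1, kind2) pair; kinds are 1, 2 or 4, so *8 keeps
 * the keys distinct. */
#define PAIR(K1, K2) ((K1) * 8 + (K2))

/* The kind values appear once; ## pastes them into the type names, so
 * the case selector and the macro invocation cannot disagree. */
#define CASE(K1, K2)                   \
    case PAIR(K1, K2):                 \
        COMPARE(UCS_##K1, UCS_##K2);   \
        break;

static int compare_single_level(int kind1, const void *data1, size_t len1,
                                int kind2, const void *data2, size_t len2)
{
    size_t len = len1 < len2 ? len1 : len2;
    switch (PAIR(kind1, kind2)) {
    CASE(1, 1) CASE(1, 2) CASE(1, 4)
    CASE(2, 1) CASE(2, 2) CASE(2, 4)
    CASE(4, 1) CASE(4, 2) CASE(4, 4)
    default:
        assert(!"invalid kind");
        return 0;
    }
    /* Equal up to the shorter length: the shorter string sorts first. */
    return (len1 < len2) ? -1 : (len1 > len2);
}
```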

--




[issue17615] String comparison performance regression

2013-04-08 Thread Neil Hodgson

Neil Hodgson added the comment:

Including the wmemcmp patch did not improve the times on MSC v.1600 32 bit; if 
anything, the performance was a little slower for the test I used:

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156
specialised:
[0.9125948707773204, 0.8990815272107868, 0.9055365478250721]
wmemcmp:
[0.9287715478844594, 0.926606017373151, 0.9155132192031097]

The assembler shows a real call to wmemcmp, which adds some time, and wmemcmp 
itself does not seem to be any better optimized than a simple loop.

However, the use of memcmp for 1:1 is a big win. Replacing the memcmp with 
COMPARE(Py_UCS1, Py_UCS1) shows memcmp is 45% faster on 100-character strings. 
memcmp doesn't generate a real call; instead there is an inline loop, unrolled 
to 4 bytes per iteration.
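A sketch of a 1:1 fast path along these lines (compare_ucs1 is a hypothetical name, not CPython's code): memcmp compares bytes as unsigned char, so for one-byte-per-character strings its sign already gives the code point order, and the lengths only matter when one string is a prefix of the other. Whether memcmp is inlined, as MSVC did here, depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1:1 case: both strings store one byte per character. memcmp treats
 * bytes as unsigned char, which matches code point order for Latin-1. */
static int compare_ucs1(const uint8_t *s1, size_t len1,
                        const uint8_t *s2, size_t len2)
{
    size_t len = len1 < len2 ? len1 : len2;
    int cmp = memcmp(s1, s2, len);
    if (cmp != 0)
        return cmp < 0 ? -1 : 1;
    /* Equal up to the shorter length: the shorter string sorts first. */
    return (len1 < len2) ? -1 : (len1 > len2);
}
```

For the 25-character paths used in the benchmark, the first 24 bytes match and the result is decided by the final byte.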

--




[issue17615] String comparison performance regression

2013-04-09 Thread Neil Hodgson

Neil Hodgson added the comment:

Windows is the only widely used OS that has a 16-bit wchar_t. I can't recall 
what OS/2 did, but Python doesn't support OS/2 any more.

--
