Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Serhiy Storchaka

On 09.03.15 08:12, Ethan Furman wrote:

On 03/08/2015 11:07 PM, Serhiy Storchaka wrote:


If you don't call isinstance(x, int) (PyLong_Check* in C).

Most conversions from Python to C implicitly call __index__ or __int__, but 
unfortunately not all.


[snip examples]

Thanks, Serhiy, that's what I was looking for.


Maybe most, if not all, of these examples can be considered bugs and 
slowly fixed, but we can't control third-party code.



___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Maciej Fijalkowski
Not all your examples are good.

* float(x) calls __float__ (not __int__)

* re.group requires __eq__ (and __hash__)

* I'm unsure about OSError

* the % thing at the very least works on pypy

On Mon, Mar 9, 2015 at 8:07 AM, Serhiy Storchaka  wrote:
> On 09.03.15 06:33, Ethan Furman wrote:
>>
>> I guess it could boil down to:  if IntEnum was not based on 'int', but
>> instead had the __int__ and __index__ methods
>> (plus all the other __xxx__ methods that int has), would it still be a
>> drop-in replacement for actual ints?  Even when
>> being used to talk to non-Python libs?
>
>
> If you don't call isinstance(x, int) (PyLong_Check* in C).
>
> Most conversions from Python to C implicitly call __index__ or __int__, but
> unfortunately not all.
>
> >>> float(Thin(42))
> 42.0
> >>> float(Wrap(42))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: float() argument must be a string or a number, not 'Wrap'
>
> >>> '%*s' % (Thin(5), 'x')
> '    x'
> >>> '%*s' % (Wrap(5), 'x')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: * wants int
>
> >>> OSError(Thin(2), 'No such file or directory')
> FileNotFoundError(2, 'No such file or directory')
> >>> OSError(Wrap(2), 'No such file or directory')
> OSError(<__main__.Wrap object at 0xb6fe81ac>, 'No such file or directory')
>
> >>> re.match('(x)', 'x').group(Thin(1))
> 'x'
> >>> re.match('(x)', 'x').group(Wrap(1))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IndexError: no such group
>
> And to be ideal drop-in replacement IntEnum should override such methods as
> __eq__ and __hash__ (so it could be used as mapping key). If all methods
> should be overridden to quack as int, why not take an int?


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Antoine Pitrou
On Mon, 9 Mar 2015 15:12:44 +1100
Steven D'Aprano  wrote:
> 
> My summary is as follows:
> 
> __int__ is used as the special method for int(), and it should coerce 
> the object to an integer. This may be lossy e.g. int(2.999) --> 2 or may 
> involve a conversion from a non-numeric type to integer e.g. int("2").

Your example is misleading. Strings don't have an __int__:

>>> s = "3"
>>> s.__int__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__int__'

Only int-compatible or int-coercible types (e.g. float, Decimal) should
have an __int__ method.
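
[Editorial illustration, not part of the original message.] The point above can be checked directly. The sketch below only assumes a standard CPython 3 interpreter: int() accepts a string because the constructor parses text itself, not because str defines __int__.

```python
from decimal import Decimal

# Int-coercible numeric types define __int__; str does not.
has_int = {t.__name__: hasattr(t, "__int__") for t in (float, Decimal, str)}
print(has_int)

# int() still accepts a string: the constructor parses the text itself
# rather than delegating to a special method on str.
parsed = int("2")

# float.__int__ truncates toward zero, which is why that conversion is lossy.
truncated = (2.999).__int__()
print(parsed, truncated)
```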

Regards

Antoine.




Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Serhiy Storchaka

On 09.03.15 10:19, Maciej Fijalkowski wrote:

Not all your examples are good.

* float(x) calls __float__ (not __int__)

* re.group requires __eq__ (and __hash__)

* I'm unsure about OSError

* the % thing at the very least works on pypy


Yes, all these examples are implementation-defined and can differ 
between CPython and PyPy. There are about a dozen similar examples 
in the C part of CPython alone. What most of them have in common is that 
the behavior of the function depends on the argument type. For example, 
in the case of re.group the argument is either an integer index or a 
string group name. The OSError constructor can produce an OSError subtype 
if the first argument is a known integer errno. float either converts a 
number to float or parses a string (or bytes).


Python functions can be more lenient (if they allow duck typing) or more 
strict (if they explicitly check the type). They rarely call __index__ 
or __int__.
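
[Editorial illustration, not part of the original message.] The definitions of Thin and Wrap were snipped from the thread, so the classes below are hypothetical reconstructions: Thin as a trivial int subclass, Wrap as a composition-based wrapper exposing __int__ and __index__. The group() helper is likewise invented here to mimic a function whose behavior depends on the argument type; it is not the re module's implementation.

```python
import operator

class Thin(int):
    """Hypothetical: a trivial int subclass, so isinstance(x, int) holds."""

class Wrap:
    """Hypothetical: wraps an int by composition and only quacks like one."""
    def __init__(self, value):
        self._value = value
    def __int__(self):
        return self._value
    def __index__(self):
        return self._value

# Code that asks for an index accepts both objects.
assert operator.index(Thin(5)) == 5
assert operator.index(Wrap(5)) == 5
assert [10, 20, 30][Wrap(1)] == 20

# Code that checks the type first only recognizes the subclass.
assert isinstance(Thin(5), int)
assert not isinstance(Wrap(5), int)

def group(groups, names, key):
    """Hypothetical type-dispatching lookup: int means index, else a name."""
    if isinstance(key, int):
        return groups[key]
    return groups[names[key]]  # Wrap(1) lands here and fails the name lookup
```

With this helper, group(["whole", "x"], {"first": 1}, Thin(1)) returns "x", while passing Wrap(1) raises KeyError because the wrapper is treated as a group name.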




Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5

2015-03-09 Thread Larry Hastings

On 03/07/2015 06:13 PM, Victor Stinner wrote:

Hi,

FYI I committed the implementation of os.scandir() written by Ben Hoyt.
I hope that it will be part of Python 3.5 alpha 2


It is.


//arry/


[Python-Dev] [RELEASED] Python 3.5.0a2 is now available

2015-03-09 Thread Larry Hastings



On behalf of the Python development community and the Python 3.5 release 
team, I'm thrilled to announce the availability of Python 3.5.0a2.   
Python 3.5.0a2 is the second alpha release of Python 3.5, which will be 
the next major release of Python.  Python 3.5 is still under heavy 
development, and is far from complete.


This is a preview release, and its use is not recommended for production 
settings.


Two important notes for Windows users about Python 3.5.0a2:

 * If you have previously installed Python 3.5.0a1, you must manually
   uninstall it before installing Python 3.5.0a2 (issue23612).
 * If installing Python 3.5.0a2 as a non-privileged user, you may need
   to escalate to administrator privileges to install an update to your
   C runtime libraries.


You can find Python 3.5.0a2 here:

   https://www.python.org/downloads/release/python-350a2/


Happy hacking,


//arry/


Re: [Python-Dev] [RELEASED] Python 3.5.0a2 is now available

2015-03-09 Thread Paul Moore
On 9 March 2015 at 09:34, Larry Hastings  wrote:
> On behalf of the Python development community and the Python 3.5 release
> team, I'm thrilled to announce the availability of Python 3.5.0a2.   Python
> 3.5.0a2 is the second alpha release of Python 3.5, which will be the next
> major release of Python.  Python 3.5 is still under heavy development, and
> is far from complete.

Hmm, I just tried to install the 64-bit "full installer" version, for
all users with the default options. This is on a PC that hasn't had
3.5 installed before, and doesn't have Visual Studio 2015 installed.
When it got to the step "precompiling standard library" I got an error
window pop up saying "python.exe - system error. The program can't
start because api-ms-win-crt-runtime-l1-1-0.dll is missing from your
computer. Try reinstalling the program to fix this problem." All there
was was an "OK" button. Pressing that told me "Setup was successful"
but then "py -3.5 -V" gives me nothing (no error, no version, just
returns me to the command prompt). Same result if I do "& 'C:\Program
Files\Python 3.5\python.exe' -V".

Python 3.5.0a2 (64-bit) is showing in my "Add/Remove Programs".

This is Windows 7, 64-bit.

Paul


Re: [Python-Dev] [RELEASED] Python 3.5.0a2 is now available

2015-03-09 Thread Paul Moore
Submitted as http://bugs.python.org/issue23619

On 9 March 2015 at 10:20, Paul Moore  wrote:
> On 9 March 2015 at 09:34, Larry Hastings  wrote:
>> On behalf of the Python development community and the Python 3.5 release
>> team, I'm thrilled to announce the availability of Python 3.5.0a2.   Python
>> 3.5.0a2 is the second alpha release of Python 3.5, which will be the next
>> major release of Python.  Python 3.5 is still under heavy development, and
>> is far from complete.
>
> Hmm, I just tried to install the 64-bit "full installer" version, for
> all users with the default options. This is on a PC that hasn't had
> 3.5 installed before, and doesn't have Visual Studio 2015 installed.
> When it got to the step "precompiling standard library" I got an error
> window pop up saying "python.exe - system error. The program can't
> start because api-ms-win-crt-runtime-l1-1-0.dll is missing from your
> computer. Try reinstalling the program to fix this problem." All there
> was was an "OK" button. Pressing that told me "Setup was successful"
> but then "py -3.5 -V" gives me nothing (no error, no version, just
> returns me to the command prompt). Same result if I do "& 'C:\Program
> Files\Python 3.5\python.exe' -V".
>
> Python 3.5.0a2 (64-bit) is showing in my "Add/Remove Programs".
>
> This is Windows 7, 64-bit.
>
> Paul


[Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Paul Moore
I just thought I'd give installing Python 3.5 a go on my PC, now that
3.5a2 has come out. I didn't get very far (see earlier message) but it
prompted me to think about how I'd use it, and what habits I'd need to
change.

I'd suggest that the "what's new in 3.5" document probably needs a
section on the new installer that explains this stuff...

First of all, I always use "all users" installs, so I have Python in
"Program Files" now. As a result, doing "pip install foo" requires
elevation. As that's a PITA, I probably need to switch to using "pip
install --user". All that's fine, and from there "py -3.5" works fine,
as does "py -3.5 -m foo". But even if it is, not every entry point has
a corresponding "-m" invocation (pygments' pygmentize command doesn't
seem to, for example)

But suppose I want to put Python 3.5 on my PATH. The installer has an
"add Python to PATH" checkbox, but (if I'm reading the WiX source
right, I didn't select that on install) that doesn't add the user
directory. So I need to add that to my PATH. Is that right? And of
course, that means I need to *know* the user site directory
($env:LOCALAPPDATA\Python\Python35\Scripts), correct?
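
[Editorial illustration, not part of the original message.] Rather than memorizing the per-user scripts directory, it can be asked of the interpreter. This is a minimal sketch that assumes the "nt_user" / "posix_user" install scheme names in sysconfig; the directory it prints depends on the Python version and installer and may not match the path quoted above.

```python
import os
import sysconfig

# "nt_user" and "posix_user" are the per-user install schemes known to
# sysconfig; os.name selects the one for the current platform.
user_scripts = sysconfig.get_path("scripts", os.name + "_user")
print(user_scripts)

# Report whether that directory is already on PATH.
on_path = user_scripts in os.environ.get("PATH", "").split(os.pathsep)
print("on PATH:", on_path)
```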

It feels to me like this might be a frustrating step backwards for
Windows users who have recently (with the arrival of ensurepip) got to
the point where they can just run Python with "Add to path" and then
simply do

pip install pygments
pygmentize --help

Maybe the answer is that we simply start recommending that everyone on
Windows uses per-user installs. It makes little difference to me
(beyond the fact that when I want to look at the source of something
in the stdlib, the location of the file is a lot harder to remember
than C:\Apps\Python34\Lib\whatever.py) but I doubt it's what most
people will expect.

I'm completely OK with the idea that we move to "Program Files" as a
default location. And I have no real issue with Steve's position that
the "Add to Path" option has issues that can't really be solved
because of the way Windows constructs the PATH. But I know there have
been a lot of people frustrated by the complicated instructions needed
to get something like pygmentize working, for whom getting pip in 2.7
and 3.4 was a major improvement. So I think we need *some*
documentation helping them deal with what could well seem like a step
backwards in Python 3.5...

If I knew what the best (recommended) answer was, I'd be happy to
write it up. But I'm not sure I do (after all, the target audience is
people for whom "Add C:\PythonXY\Scripts to your PATH" was a problem
in the pre-3.4 days).

Should I raise this as a bug against the 3.5 documentation? If so,
should it be a release blocker for the final release?

Paul


Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5

2015-03-09 Thread Ben Hoyt
Hi Ryan,

> ./configure --with-pydebug && make -j7
>
> I then ran ./python.exe ~/Workspace/python/scandir/benchmark.py and I got:
>
> Creating tree at /Users/rstuart/Workspace/python/scandir/benchtree: depth=4, num_dirs=5, num_files=50
> Using slower ctypes version of scandir
> Comparing against builtin version of os.walk()
> Priming the system's cache...
> Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 1/3...
> Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 2/3...
> Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 3/3...
> os.walk took 0.184s, scandir.walk took 0.158s -- 1.2x as fast

Note that this benchmark is invalid for a couple of reasons. First,
you're compiling Python in debug mode (--with-pydebug), which produces
significantly slower code in my tests -- for example, on Windows
benchmark.py is about twice as slow when Python is compiled in debug
mode.

Second, as the output above shows, benchmark.py is "Using slower
ctypes version of scandir" and not a C version at all. If os.scandir()
is available, benchmark.py should use that, so there's something wrong
here -- maybe the patch didn't apply correctly or maybe you're testing
with a different version of Python than the one you built?
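
[Editorial illustration, not part of the original message.] Both conditions can be checked from inside the interpreter that runs the benchmark. The sketch relies on two facts: sys.gettotalrefcount exists only in --with-pydebug builds, and os.scandir is present from Python 3.5 onward.

```python
import os
import sys

# sys.gettotalrefcount only exists in interpreters built --with-pydebug.
is_debug_build = hasattr(sys, "gettotalrefcount")

# os.scandir is available from Python 3.5 onward.
has_builtin_scandir = hasattr(os, "scandir")

print("interpreter:", sys.executable)
print("debug build:", is_debug_build)
print("os.scandir available:", has_builtin_scandir)
```

If the printed interpreter path is not the freshly built binary, the benchmark is measuring a different Python than intended.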

In any case, the easiest way to test it now is to download Python 3.5
alpha 2 which just came out:
https://www.python.org/downloads/release/python-350a2/

I just tried this on my Mac Mini (i5 2.3GHz, 2 GB RAM, HFS+ on
rotational drive) and got the following results:

Using Python 3.5's builtin os.scandir()
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on benchtree, repeat 1/3...
Benchmarking walks on benchtree, repeat 2/3...
Benchmarking walks on benchtree, repeat 3/3...
os.walk took 0.074s, scandir.walk took 0.016s -- 4.7x as fast

> I then did ./python.exe ~/Workspace/python/scandir/benchmark.py -s and got:

Also note that "benchmark.py -s" tests the system os.walk() against a
get_tree_size() function using scandir's DirEntry.stat().st_size,
which provides huge gains on Windows (because stat().st_size doesn't
require an OS call) but only modest gains on POSIX systems, which
still require an OS stat call to get the size (though not the file
type, so at least it's only one stat call). I get "2.2x as fast" on my
Mac for "benchmark.py -s".
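
[Editorial illustration, not part of the original message.] A get_tree_size() function along these lines can be sketched with os.scandir() as below. It follows the pattern described above but is not copied from benchmark.py, whose actual implementation may differ; the temporary directory and file names exist only for the self-check.

```python
import os
import tempfile

def get_tree_size(path):
    """Return the total size in bytes of the files under path."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # The file type usually comes from the directory listing itself.
            total += get_tree_size(entry.path)
        else:
            # On POSIX this costs one stat call; on Windows the size is
            # already cached in the DirEntry.
            total += entry.stat(follow_symlinks=False).st_size
    return total

# Small self-check on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "a.bin"), "wb") as f:
        f.write(b"x" * 10)
    with open(os.path.join(root, "sub", "b.bin"), "wb") as f:
        f.write(b"x" * 32)
    tree_size = get_tree_size(root)
print(tree_size)
```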

-Ben


Re: [Python-Dev] [RELEASED] Python 3.5.0a2 is now available

2015-03-09 Thread Ben Hoyt
I'm seeing the same issue (though I also get the missing-DLL error dialog
when I run python.exe). -Ben

On Mon, Mar 9, 2015 at 6:20 AM, Paul Moore  wrote:

> On 9 March 2015 at 09:34, Larry Hastings  wrote:
> > On behalf of the Python development community and the Python 3.5 release
> > team, I'm thrilled to announce the availability of Python 3.5.0a2.   Python
> > 3.5.0a2 is the second alpha release of Python 3.5, which will be the next
> > major release of Python.  Python 3.5 is still under heavy development, and
> > is far from complete.
>
> Hmm, I just tried to install the 64-bit "full installer" version, for
> all users with the default options. This is on a PC that hasn't had
> 3.5 installed before, and doesn't have Visual Studio 2015 installed.
> When it got to the step "precompiling standard library" I got an error
> window pop up saying "python.exe - system error. The program can't
> start because api-ms-win-crt-runtime-l1-1-0.dll is missing from your
> computer. Try reinstalling the program to fix this problem." All there
> was was an "OK" button. Pressing that told me "Setup was successful"
> but then "py -3.5 -V" gives me nothing (no error, no version, just
> returns me to the command prompt). Same result if I do "& 'C:\Program
> Files\Python 3.5\python.exe' -V".
>
> Python 3.5.0a2 (64-bit) is showing in my "Add/Remove Programs".
>
> This is Windows 7, 64-bit.
>
> Paul


Re: [Python-Dev] [RELEASED] Python 3.5.0a2 is now available

2015-03-09 Thread Steve Dower
Thanks for finding this. I'm following up on the issue in case anyone else is 
having the same problem.

As an aside, it'd be great to hear if it's worked for anyone at all :)

Cheers,
Steve

Top-posted from my Windows Phone

From: Ben Hoyt
Sent: 3/9/2015 5:29
To: Paul Moore
Cc: Python Dev
Subject: Re: [Python-Dev] [RELEASED] Python 3.5.0a2 is now available

I'm seeing the same issue (though I also get the missing-DLL error dialog when 
I run python.exe). -Ben

On Mon, Mar 9, 2015 at 6:20 AM, Paul Moore wrote:
On 9 March 2015 at 09:34, Larry Hastings wrote:
> On behalf of the Python development community and the Python 3.5 release
> team, I'm thrilled to announce the availability of Python 3.5.0a2.   Python
> 3.5.0a2 is the second alpha release of Python 3.5, which will be the next
> major release of Python.  Python 3.5 is still under heavy development, and
> is far from complete.

Hmm, I just tried to install the 64-bit "full installer" version, for
all users with the default options. This is on a PC that hasn't had
3.5 installed before, and doesn't have Visual Studio 2015 installed.
When it got to the step "precompiling standard library" I got an error
window pop up saying "python.exe - system error. The program can't
start because api-ms-win-crt-runtime-l1-1-0.dll is missing from your
computer. Try reinstalling the program to fix this problem." All there
was was an "OK" button. Pressing that told me "Setup was successful"
but then "py -3.5 -V" gives me nothing (no error, no version, just
returns me to the command prompt). Same result if I do "& 'C:\Program
Files\Python 3.5\python.exe' -V".

Python 3.5.0a2 (64-bit) is showing in my "Add/Remove Programs".

This is Windows 7, 64-bit.

Paul


Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Nick Coghlan
On 9 March 2015 at 21:19, Paul Moore  wrote:
> Should I raise this as a bug against the 3.5 documentation? If so,
> should it be a release blocker for the final release?

I'm happy to let the folks that use Windows for development regularly
decide on the best answer from a user experience perspective, but I
think a release blocker docs issue would be a good way to go for
ensuring it gets resolved before the release goes out.

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Neil Girdhar
On Mon, Mar 9, 2015 at 2:07 AM, Serhiy Storchaka 
wrote:

> On 09.03.15 06:33, Ethan Furman wrote:
>
>> I guess it could boil down to:  if IntEnum was not based on 'int', but
>> instead had the __int__ and __index__ methods
>> (plus all the other __xxx__ methods that int has), would it still be a
>> drop-in replacement for actual ints?  Even when
>> being used to talk to non-Python libs?
>>
>
> If you don't call isinstance(x, int) (PyLong_Check* in C).
>
> Most conversions from Python to C implicitly call __index__ or __int__,
> but unfortunately not all.
>
> >>> float(Thin(42))
> 42.0
> >>> float(Wrap(42))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: float() argument must be a string or a number, not 'Wrap'
>
> >>> '%*s' % (Thin(5), 'x')
> '    x'
> >>> '%*s' % (Wrap(5), 'x')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: * wants int
>
> >>> OSError(Thin(2), 'No such file or directory')
> FileNotFoundError(2, 'No such file or directory')
> >>> OSError(Wrap(2), 'No such file or directory')
> OSError(<__main__.Wrap object at 0xb6fe81ac>, 'No such file or directory')
>
> >>> re.match('(x)', 'x').group(Thin(1))
> 'x'
> >>> re.match('(x)', 'x').group(Wrap(1))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IndexError: no such group
>
> And to be ideal drop-in replacement IntEnum should override such methods
> as __eq__ and __hash__ (so it could be used as mapping key). If all methods
> should be overridden to quack as int, why not take an int?
>
>
You're absolutely right that if *all* the methods should be overridden to
quack as int, then you should subclass int (the Liskov substitution
principle).  But all methods should not be overridden — mainly the methods
you overrode in your patch should be exposed.  Here is a list of methods on
int that should not be on IntFlags in my opinion (give or take a couple):

__abs__, __add__, __delattr__, __divmod__, __float__, __floor__,
__floordiv__, __index__, __lshift__, __mod__, __mul__, __pos__, __pow__,
__radd__, __rdivmod__, __rfloordiv__, __rlshift__, __rmod__, __rmul__,
__round__, __rpow__, __rrshift__, __rshift__, __rsub__, __rtruediv__,
__sub__, __truediv__, __trunc__, conjugate, denominator, imag, numerator,
real.

I don't think __index__ should be exposed either since are you really going
to slice a list using IntFlags?  Really?
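
[Editorial illustration, not part of the original message.] A composition-based flag type of the kind argued for above might look like the sketch below. The class name Flags, its attribute _value, and the choice of exposed methods are assumptions made for illustration; this is not the patch under discussion.

```python
class Flags:
    """Hypothetical flag set stored by composition, not by subclassing int."""
    __slots__ = ("_value",)

    def __init__(self, value=0):
        self._value = int(value)

    def __or__(self, other):
        if not isinstance(other, Flags):
            return NotImplemented
        return Flags(self._value | other._value)

    def __and__(self, other):
        if not isinstance(other, Flags):
            return NotImplemented
        return Flags(self._value & other._value)

    def __eq__(self, other):
        if not isinstance(other, Flags):
            return NotImplemented
        return self._value == other._value

    def __hash__(self):
        return hash(self._value)

    def __bool__(self):
        return self._value != 0

    def __int__(self):  # explicit unboxing only
        return self._value

    def __repr__(self):
        return "Flags({:#x})".format(self._value)

READ, WRITE = Flags(1), Flags(2)
both = READ | WRITE
print(both, int(both), bool(both & WRITE))

# Arithmetic is not defined, so an accidental "+" fails loudly.
try:
    both + READ
    add_failed = False
except TypeError:
    add_failed = True
print("addition rejected:", add_failed)
```

The trade-off raised elsewhere in the thread still applies: such an object fails isinstance(x, int) checks and is not accepted by every C-level API that expects an int.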




Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread Ryan Smith-Roberts
I suspect that you will find the Python community extremely conservative
about any changes to its sorting algorithm, given that it took thirteen
years and some really impressive automated verification software to find
this bug:

http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

On Sun, Mar 8, 2015 at 5:17 PM, nha pham  wrote:

> We can optimize the TimSort algorithm by optimizing its binary insertion
> sort.
>
> The current version of binary insertion sort uses this idea:
>
> Use binary search to find the final position in the sorted list for a new
> element X. Then insert X at that location.
>
> I suggest another idea:
>
> Use binary search to find the final position in the sorted list for a new
> element X. Before inserting X at that location, compare X with its next
> element.
>
> For the next element, we already know whether it is lower or bigger than X,
> so we can reduce the search area to the left side or the right side of X in
> the sorted list.
>
> I have applied my idea to java.util.ComparableTimSort.sort() and tested it.
> The execution time is reduced by 2%-6% with arrays of random integers.
>
> Here are details about the algorithm and testing:
> https://github.com/nhapq/Optimize_binary_insertion_sort
>
>
>
> Sincerely.
>
> phqnha
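
[Editorial illustration, not part of the original messages.] One reading of the proposal quoted above is sketched below in Python using the bisect module. The function names are invented here, and the code is neither the linked Java change nor CPython's listsort; it only shows how remembering where the previous element landed can narrow the next binary search.

```python
from bisect import bisect_right

def binary_insertion_sort(items):
    """Plain binary insertion sort: search the whole sorted prefix."""
    a = list(items)
    for i in range(1, len(a)):
        x = a[i]
        pos = bisect_right(a, x, 0, i)
        a[pos + 1:i + 1] = a[pos:i]  # shift the tail right by one slot
        a[pos] = x
    return a

def binary_insertion_sort_narrowed(items):
    """Variant: compare with the previously inserted element first."""
    a = list(items)
    prev_pos = 0  # where the previous element ended up
    for i in range(1, len(a)):
        x = a[i]
        if x < a[prev_pos]:
            lo, hi = 0, prev_pos      # x belongs left of the previous one
        else:
            lo, hi = prev_pos + 1, i  # x belongs right of the previous one
        pos = bisect_right(a, x, lo, hi)
        a[pos + 1:i + 1] = a[pos:i]
        a[pos] = x
        prev_pos = pos
    return a

sample = [5, 2, 9, 1, 5, 6, 0, 3]
print(binary_insertion_sort(sample))
print(binary_insertion_sort_narrowed(sample))
```

The variant spends one extra comparison per element to shrink the search range, so whether it wins depends on the input distribution, which is consistent with the modest gains reported above for random integers.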


Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Paul Moore
On 9 March 2015 at 13:45, Nick Coghlan  wrote:
> I'm happy to let the folks that use Windows for development regularly
> decide on the best answer from a user experience perspective, but I
> think a release blocker docs issue would be a good way to go for
> ensuring it gets resolved before the release goes out.

Done. http://bugs.python.org/issue23623


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Serhiy Storchaka
On Monday, 09-Mar-2015 10:18:50 you wrote:
> On Mon, Mar 9, 2015 at 10:10 AM, Serhiy Storchaka  wrote:
> > On Monday, 09-Mar-2015 09:52:01 you wrote:
> > > On Mon, Mar 9, 2015 at 2:07 AM, Serhiy Storchaka 
> > > > And to be ideal drop-in replacement IntEnum should override such methods
> > > > as __eq__ and __hash__ (so it could be used as mapping key). If all methods
> > > > should be overridden to quack as int, why not take an int?
> > > 
> > > You're absolutely right that if *all* the methods should be overridden to
> > > quack as int, then you should subclass int (the Liskov substitution
> > > principle).  But all methods should not be overridden — mainly the methods
> > > you overrode in your patch should be exposed.  Here is a list of methods on
> > > int that should not be on IntFlags in my opinion (give or take a couple):
> > > 
> > > __abs__, __add__, __delattr__, __divmod__, __float__, __floor__,
> > > __floordiv__, __index__, __lshift__, __mod__, __mul__, __pos__, __pow__,
> > > __radd__, __rdivmod__, __rfloordiv__, __rlshift__, __rmod__, __rmul__,
> > > __round__, __rpow__, __rrshift__, __rshift__, __rsub__, __rtruediv__,
> > > __sub__, __truediv__, __trunc__, conjugate, denominator, imag, numerator,
> > > real.
> > > 
> > > I don't think __index__ should be exposed either since are you really going
> > > to slice a list using IntFlags?  Really?
> > 
> > Definitely __index__ should be exposed. __int__ is for lossy conversion to int
> > (as in float). __index__ is for lossless conversion.
> 
> Is it?  __index__ promises lossless conversion, but __index__ is *for*
> indexing.

In spite of its name, it is for any lossless conversion.
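
[Editorial illustration, not part of the original message.] The lossy versus lossless distinction shows up directly in the standard library. The Flag class below is a hypothetical stand-in that defines only __index__.

```python
import operator

lossy = int(2.999)  # float defines __int__; the fractional part is dropped

# operator.index() calls __index__, which float deliberately does not define.
index_error = None
try:
    operator.index(2.999)
except TypeError as exc:
    index_error = exc

class Flag:
    """Hypothetical wrapper exposing only the lossless conversion."""
    def __init__(self, value):
        self._value = value
    def __index__(self):
        return self._value

# __index__ is consulted by more than slicing: hex() and bin() use it too.
as_hex = hex(Flag(255))
picked = "abc"[Flag(1)]
print(lossy, index_error, as_hex, picked)
```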

> > __add__ should be exposed because some code can use + instead of | for
> > combining flags. But it shouldn't preserve the type, because this is not
> > recommended way.
> 
> I think it should be blocked because it can lead to all kinds of weird
> bugs.  If the flag is already set and you add it a copy, it silently spills
> over into other flags.  This is a mistake that a good interface prevents.

I think this is a case where backward compatibility carries more weight.
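
[Editorial illustration, not part of the original message.] The spill-over described in the quoted paragraph is easy to reproduce with plain integers. The flag names are hypothetical.

```python
READ, WRITE, EXECUTE = 1, 2, 4  # hypothetical flag values

flags = READ

# OR is idempotent: setting READ again leaves the value unchanged.
with_or = flags | READ

# Addition carries into the next bit: READ + READ becomes WRITE.
with_add = flags + READ

print(with_or == READ, with_add == WRITE)
```

Existing code that combines distinct flags with "+" gives the same result as "|", which is the compatibility concern raised in the reply above.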

> > For the same reason I think __lshift__, __rshift__, __sub__,
> > __mul__, __divmod__, __floordiv__, __mod__, etc should be exposed too. So the
> > majority of the methods should be exposed, and there is a risk that we lose
> > something.
> 
> I totally disagree with all of those.
> 
> > For good compatibility with Python code IntFlags should expose also
> > __subclasscheck__ or __subclasshook__. And when we are at this point, why not
> > use int subclass?
> 
> Here's another reason.  What if someone wants to use an IntFlags object,
> but wants to use a fixed width type for storage, say numpy.int32?   Why
> shouldn't they be able to do that?  By using composition, you can easily
> provide such an option.

You can design an abstract interface Flags that can be combined with int or 
another type. But why do you want to use numpy.int32 as storage? This doesn't 
save much memory, because with composition the IntFlags class weighs more than 
an int subclass.



Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Steve Dower
Paul Moore wrote:
> I just thought I'd give installing Python 3.5 a go on my PC, now that
> 3.5a2 has come out. I didn't get very far (see earlier message) but it prompted
> me to think about how I'd use it, and what habits I'd need to change.
> 
> I'd suggest that the "what's new in 3.5" document probably needs a section on
> the new installer that explains this stuff...

This is true. Right now I'm in experimentation mode, and being more aggressive 
about changing things than is probably a good idea (because it solicits 
feedback like this :) ). When things settle down I expect to end up closer to 
where we started, so there's not a huge amount of value in writing it all up 
right now. I'll get there.

> First of all, I always use "all users" installs, so I have Python in "Program
> Files" now. As a result, doing "pip install foo" requires elevation. As that's a
> PITA, I probably need to switch to using "pip install --user". All that's fine,
> and from there "py -3.5" works fine, as does "py -3.5 -m foo". But even if it
> is, not every entry point has a corresponding "-m" invocation (pygments'
> pygmentize command doesn't seem to, for example)

I know you're already involved in this Paul, but for everyone else there's a 
big discussion going on at https://github.com/pypa/pip/issues/1668 about 
changing pip's default behaviour to handle falling back to --user automatically.

> But suppose I want to put Python 3.5 on my PATH. The installer has an "add
> Python to PATH" checkbox, but (if I'm reading the WiX source right, I didn't
> select that on install) that doesn't add the user directory. So I need to add
> that to my PATH. Is that right? And of course, that means I need to *know* the
> user site directory ($env:LOCALAPPDATA\Python\Python35\Scripts), correct?

Correct. There's no way to add a per-user directory to PATH from an all-users 
installation (except for a few approaches that I expect/hope would trigger 
malware detectors...)

> Maybe the answer is that we simply start recommending that everyone on Windows
> uses per-user installs. It makes little difference to me (beyond the fact that
> when I want to look at the source of something in the stdlib, the location of
> the file is a lot harder to remember than C:\Apps\Python34\Lib\whatever.py) but
> I doubt it's what most people will expect.

I'm okay with this. Installing for all users is really something that could be 
considered an advanced option rather than the default, especially since the aim 
(AIUI) of the all-users install is to pretend that Python was shipped with the 
OS. (I'd kind of like to take that further by splitting things more sensibly 
between Program Files, Common Files and System32, but there's very little gain 
from that and much MUCH pain as long as people are still expecting C:\PythonXY 
installs...)

Cheers,
Steve
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Steven D'Aprano
On Mon, Mar 09, 2015 at 09:52:01AM -0400, Neil Girdhar wrote:

> Here is a list of methods on
> int that should not be on IntFlags in my opinion (give or take a couple):
> 
> __abs__, __add__, __delattr__, __divmod__, __float__, __floor__,
> __floordiv__, __index__, __lshift__, __mod__, __mul__, __pos__, __pow__,
> __radd__, __rdivmod__, __rfloordiv__, __rlshift__, __rmod__, __rmul__,
> __round__, __rpow__, __rrshift__, __rshift__, __rsub__, __rtruediv__,
> __sub__, __truediv__, __trunc__, conjugate, denominator, imag, numerator,
> real.
> 
> I don't think __index__ should be exposed either since are you really going
> to slice a list using IntFlags?  Really?

In what way is this an *Int*Flags object if it is nothing like an int? 
It sounds like what you want is a bunch of Enum inside a set with a custom 
__str__, not IntFlags.
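
A rough sketch of what "a bunch of Enum inside a set with a custom __str__" could look like. Perm and PermSet are hypothetical names chosen for illustration; nothing here is an existing stdlib class or anything proposed in the thread.

```python
from enum import Enum


class Perm(Enum):
    READ = "r"
    WRITE = "w"
    EXEC = "x"


class PermSet(frozenset):
    """A set of Perm members that prints like a flags value."""

    def _wrap(self, result):
        # frozenset operations return plain frozensets, so re-wrap them.
        return result if result is NotImplemented else type(self)(result)

    def __or__(self, other):
        return self._wrap(frozenset.__or__(self, other))

    def __and__(self, other):
        return self._wrap(frozenset.__and__(self, other))

    def __str__(self):
        return "|".join(sorted(member.name for member in self)) or "<none>"


rw = PermSet([Perm.READ]) | PermSet([Perm.WRITE])
print(rw)  # names joined with "|"
```

There is no integer anywhere in this design, which is the point of the objection: such an object has set operations and pretty-printing but nothing int-like.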


-- 
Steve


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Neil Girdhar
On Mon, Mar 9, 2015 at 11:46 AM, Steven D'Aprano wrote:

> On Mon, Mar 09, 2015 at 09:52:01AM -0400, Neil Girdhar wrote:
>
> > Here is a list of methods on
> > int that should not be on IntFlags in my opinion (give or take a couple):
> >
> > __abs__, __add__, __delattr__, __divmod__, __float__, __floor__,
> > __floordiv__, __index__, __lshift__, __mod__, __mul__, __pos__, __pow__,
> > __radd__, __rdivmod__, __rfloordiv__, __rlshift__, __rmod__, __rmul__,
> > __round__, __rpow__, __rrshift__, __rshift__, __rsub__, __rtruediv__,
> > __sub__, __truediv__, __trunc__, conjugate, denominator, imag, numerator,
> > real.
> >
> > I don't think __index__ should be exposed either since are you really
> going
> > to slice a list using IntFlags?  Really?
>
> In what way is this an *Int*Flags object if it is nothing like an int?
> It sounds like what you want is a bunch of Enum inside a set with a custom
> __str__, not IntFlags.
>
>
It doesn't matter what you call it.  I believe the goal of this is to have
a flags object with flags operations and pretty-printing.  It makes more
sense to me to decide the interface and then the implementation.

>
> --
> Steve


Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread Steven D'Aprano
On Sun, Mar 08, 2015 at 10:57:30PM -0700, Ryan Smith-Roberts wrote:
> I suspect that you will find the Python community extremely conservative
> about any changes to its sorting algorithm, given that it took thirteen
> years and some really impressive automated verification software to find
> this bug:

On the other hand, the only person who really needs to be convinced is 
Tim Peters. It's really not up to the Python community.

The bug tracker is the right place for discussing this.

-- 
Steve


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Neil Girdhar
On Mon, Mar 9, 2015 at 11:11 AM, Serhiy Storchaka wrote:

> Monday, 09-Mar-2015 10:18:50, you wrote:
> > On Mon, Mar 9, 2015 at 10:10 AM, Serhiy Storchaka 
> wrote:
> > > Monday, 09-Mar-2015 09:52:01, you wrote:
> > > > On Mon, Mar 9, 2015 at 2:07 AM, Serhiy Storchaka <
> storch...@gmail.com>
> > > > > And to be ideal drop-in replacement IntEnum should override such
> methods
> > > > > as __eq__ and __hash__ (so it could be used as mapping key). If
> all methods
> > > > > should be overridden to quack as int, why not take an int?
> > > >
> > > > You're absolutely right that if *all the methods should be
> overrriden to
> > > > quack as int, then you should subclass int (the Liskov substitution
> > > > principle).  But all methods should not be overridden — mainly the
> methods
> > > > you overrode in your patch should be exposed.  Here is a list of
> methods on
> > > > int that should not be on IntFlags in my opinion (give or take a
> couple):
> > > >
> > > > __abs__, __add__, __delattr__, __divmod__, __float__, __floor__,
> > > > __floordiv__, __index__, __lshift__, __mod__, __mul__, __pos__,
> __pow__,
> > > > __radd__, __rdivmod__, __rfloordiv__, __rlshift__, __rmod__,
> __rmul__,
> > > > __round__, __rpow__, __rrshift__, __rshift__, __rsub__, __rtruediv__,
> > > > __sub__, __truediv__, __trunc__, conjugate, denominator, imag,
> numerator,
> > > > real.
> > > >
> > > > I don't think __index__ should be exposed either since are you
> really going
> > > > to slice a list using IntFlags?  Really?
> > >
> > > Definitely __index__ should be exposed. __int__ is for lossy
> conversion to int
> > > (as in float). __index__ is for lossless conversion.
> >
> > Is it?  __index__ promises lossless conversion, but __index__ is *for*
> > indexing.
>
> In spite of its name, it is for any lossless conversion.
>

You're right.
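
The difference between the two hooks can be seen with a small wrapper. This is only an illustrative sketch: Mask is a hypothetical class, not anything from the patch under discussion.

```python
import operator


class Mask:
    """Hypothetical int-like wrapper that exposes only __index__."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


m = Mask(5)
exact = operator.index(m)   # lossless conversion to a real int
as_binary = bin(m)          # bin() accepts any object with __index__
picked = "abcdefgh"[m]      # and so does sequence indexing

truncated = int(2.7)        # float supports the lossy __int__ conversion ...
try:
    operator.index(2.7)     # ... but provides no __index__
    float_has_index = True
except TypeError:
    float_has_index = False
```

So an object can be accepted wherever an exact integer is required without ever being sliced with, which is the sense in which __index__ is broader than its name.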

>
> > > __add__ should be exposed because some code can use + instead of | for
> > > combining flags. But it shouldn't preserve the type, because this is
> not
> > > recommended way.
> >
> > I think it should be blocked because it can lead to all kinds of weird
> > bugs.  If the flag is already set and you add it a copy, it silently
> spills
> > over into other flags.  This is a mistake that a good interface prevents.
>
> I think this is a case when backward compatibility has larger weight.
>
>
So you agree that the ideal solution is composition, but you prefer
inheritance in order to not break code?  Then, I think the big question is
how much code would actually break if you presented the ideal interface.  I
imagine that 99% of the code using flags only uses __or__ to compose and
__and__, __invert__ to erase flags.
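
The "spills over" failure mentioned above is easy to reproduce with plain integer flags. The flag names and helper functions below are invented for the example.

```python
READ, WRITE, EXEC = 0b001, 0b010, 0b100


def set_with_or(flags, flag):
    """Setting a flag with | is idempotent."""
    return flags | flag


def set_with_add(flags, flag):
    """Setting a flag with + carries into the next bit if it is already set."""
    return flags + flag


flags = READ
same = set_with_or(flags, READ)      # still READ
spilled = set_with_add(flags, READ)  # 0b001 + 0b001 == 0b010, i.e. WRITE
```

With |, setting READ twice changes nothing; with +, the second READ silently turns the value into WRITE.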


> > > For the same reason I think __lshift__, __rshift__, __sub__,
> > > __mul__, __divmod__, __floordiv__, __mod__, etc should be exposed too.
> So the
> > > majority of the methods should be exposed, and there is a risk that we
> loss
> > > something.
> >
> > I totally disagree with all of those.
> >
> > > For good compatibility with Python code IntFlags should expose also
> > > __subclasscheck__ or __subclasshook__. And when we are at this point,
> why not
> > > use int subclass?
> >
> > Here's another reason.  What if someone wants to use an IntFlags object,
> > but wants to use a fixed width type for storage, say numpy.int32?   Why
> > shouldn't they be able to do that?  By using composition, you can easily
> > provide such an option.
>
> You can design abstract interface Flags that can be combined with int or
> other type. But why you want to use numpy.int32 as storage? This doesn't
> save much memory, because with composition the IntFlags class weighs more
> than int subclass.
>
>
Maybe you're storing a bunch of flags in a numpy array having dtype
np.int32?  It's contrived, I agree.


Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread Skip Montanaro
On Mon, Mar 9, 2015 at 10:53 AM, Steven D'Aprano  wrote:
> On Sun, Mar 08, 2015 at 10:57:30PM -0700, Ryan Smith-Roberts wrote:
>> I suspect that you will find the Python community extremely conservative
>> about any changes to its sorting algorithm, given that it took thirteen
>> years and some really impressive automated verification software to find
>> this bug:
>
> On the other hand, the only person who really needs to be convinced is
> Tim Peters. It's really not up to the Python community.

Also, there's no sense discouraging people who have ideas to
contribute. So what if nha pham's contribution isn't accepted? You
never know when the next Tim Peters, Georg Brandl, Mark Dickinson, or
Brett Cannon will turn up. (That list is practically endless. Don't
feel slighted if I failed to mention you.) With seven billion people
out there, a few of them are bound to be pretty smart.

Skip


Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Donald Stufft

> On Mar 9, 2015, at 11:37 AM, Steve Dower  wrote:
> 
> Paul Moore wrote:
>> I just thought I'd give installing Python 3.5 a go on my PC, now that
>> 3.5a2 has come out. I didn't get very far (see earlier message) but it
>> prompted me to think about how I'd use it, and what habits I'd need to change.
>> 
>> I'd suggest that the "what's new in 3.5" document probably needs a section on
>> the new installer that explains this stuff...
> 
> This is true. Right now I'm in experimentation mode, and being more 
> aggressive about changing things than is probably a good idea (because it 
> solicits feedback like this :) ). When things settle down I expect to end up 
> closer to where we started, so there's not a huge amount of value in writing 
> it all up right now. I'll get there.
> 
>> First of all, I always use "all users" installs, so I have Python in "Program
>> Files" now. As a result, doing "pip install foo" requires elevation. As that's
>> a PITA, I probably need to switch to using "pip install --user". All that's
>> fine, and from there "py -3.5" works fine, as does "py -3.5 -m foo". But even
>> if it is, not every entry point has a corresponding "-m" invocation (pygments'
>> pygmentize command doesn't seem to, for example)
> 
> I know you're already involved in this Paul, but for everyone else there's a 
> big discussion going on at https://github.com/pypa/pip/issues/1668 about 
> changing pip's default behaviour to handle falling back to --user 
> automatically.
> 
>> But suppose I want to put Python 3.5 on my PATH. The installer has an "add
>> Python to PATH" checkbox, but (if I'm reading the WiX source right, I didn't
>> select that on install) that doesn't add the user directory. So I need to add
>> that to my PATH. Is that right? And of course, that means I need to *know*
>> the user site directory ($env:LOCALAPPDATA\Python\Python35\Scripts), correct?
> 
> Correct. There's no way to add a per-user directory to PATH from an all-users 
> installation (except for a few approaches that I expect/hope would trigger 
> malware detectors...)
> 
>> Maybe the answer is that we simply start recommending that everyone on
>> Windows uses per-user installs. It makes little difference to me (beyond the
>> fact that when I want to look at the source of something in the stdlib, the
>> location of the file is a lot harder to remember than
>> C:\Apps\Python34\Lib\whatever.py) but I doubt it's what most people will
>> expect.
> 
> I'm okay with this. Installing for all users is really something that could 
> be considered an advanced option rather than the default, especially since 
> the aim (AIUI) of the all-users install is to pretend that Python was shipped 
> with the OS. (I'd kind of like to take that further by splitting things more 
> sensibly between Program Files, Common Files and System32, but there's very 
> little gain from that and much MUCH pain as long as people are still 
> expecting C:\PythonXY installs…)

Maybe the answer is to write up a PEP and standardize the idea of entry points, 
specifically the console_scripts and ui_scripts (or whatever it’s called) 
entrypoints and then give Python something like -m, but which executes a 
specific entry point name instead of a module name (or maybe -m can fall back 
to looking at entry points? I don’t know).

I’ve given this like… 30s worth of thought, but maybe:

pip install pygments  # Implicit --user
py -e pygmentize

Is an OK UX for people to have without needing to add the user site bin 
directory to their PATH. Maybe it’s a horrible idea and we should all forget I 
mentioned it :)

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Paul Moore
On 9 March 2015 at 16:35, Donald Stufft  wrote:
>> I'm okay with this. Installing for all users is really something that could 
>> be considered an advanced option rather than the default, especially since 
>> the aim (AIUI) of the all-users install is to pretend that Python was 
>> shipped with the OS. (I'd kind of like to take that further by splitting 
>> things more sensibly between Program Files, Common Files and System32, but 
>> there's very little gain from that and much MUCH pain as long as people are 
>> still expecting C:\PythonXY installs…)
>
> Maybe the answer is to write up a PEP and standardize the idea of entry 
> points, specifically the console_scripts and ui_scripts (or whatever it’s 
> called) entrypoints and then give Python something like -m, but which 
> executes a specific entry point name instead of a module name (or maybe -m 
> can fall back to looking at entry points? I don’t know).
>
> I’ve given this like… 30s worth of thought, but maybe:
>
> pip install pygments  # Implicit --user
> py -e pygmentize
>
> Is an OK UX for people to have without needing to add the user site bin 
> directory to their PATH. Maybe it’s a horrible idea and we should all forget 
> I mentioned it :)

That would be good. You can do it now using setuptools' entry point
API, so making it part of the core is certainly not impossible. But is
it practical on the 3.5 release timescales? It's worth taking into
account the fact that pygmentize is exposed by pygments, so there's no
way to deduce the package from the entry point name. Also, even if a
core feature appears, nobody will be using it for ages.
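
As a rough illustration of looking up a console script by its entry point name: the sketch below uses importlib.metadata, which is a later standard-library module and not the setuptools API referred to above, so treat it as an assumption about how this might be done today. The function names are hypothetical.

```python
from importlib import metadata


def find_console_script(name):
    """Return the console_scripts entry point called `name`, or None."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions: selectable entry point collection.
        candidates = eps.select(group="console_scripts", name=name)
    else:
        # Older versions return a plain dict keyed by group name.
        candidates = [
            ep for ep in eps.get("console_scripts", []) if ep.name == name
        ]
    for ep in candidates:
        return ep
    return None


def run_console_script(name):
    """Load and call the function behind a console script."""
    ep = find_console_script(name)
    if ep is None:
        raise LookupError("no console script named %r" % name)
    return ep.load()()
```

Note that the lookup goes from entry point name to function; as pointed out above, nothing here maps the name "pygmentize" back to the package that has to be installed first.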

On 9 March 2015 at 15:37, Steve Dower  wrote:
>> Maybe the answer is that we simply start recommending that everyone on
>> Windows uses per-user installs. It makes little difference to me (beyond the
>> fact that when I want to look at the source of something in the stdlib, the
>> location of the file is a lot harder to remember than
>> C:\Apps\Python34\Lib\whatever.py) but I doubt it's what most people will
>> expect.
>
> I'm okay with this. Installing for all users is really something that could 
> be considered an advanced option rather than the default, especially since 
> the aim (AIUI) of the all-users install is to pretend that Python was shipped 
> with the OS. (I'd kind of like to take that further by splitting things more 
> sensibly between Program Files, Common Files and System32, but there's very 
> little gain from that and much MUCH pain as long as people are still 
> expecting C:\PythonXY installs...)

Just to be clear, I'm *not* okay with this in the form I proposed it.
I think we're a long way yet from a clean and understandable proposal
for having user installs be as usable as system installs. It's worth
noting that (as far as I know) users don't typically use user installs
even on Unix, where the issue of the system install being read-only
has always been the norm. To me, that implies that there are still
some pieces of the puzzle to be addressed.

And again, I don't know if this is going to be solved in Python 3.5
timescales. Beta 1 is May 24th - would a major change like a version
of pip defaulting to user installs be acceptable after that? More
specifically, would core changes to install processes and recommended
practices to support that be allowed after the beta?

Maybe I'm being pessimistic. A good solution would be fantastic. But
at the moment, I feel like 3.4 (and 2.7.9) with pip and the scripts
directory on PATH was a huge usability step forward for naive users on
Windows, and we're in danger here of reverting it. And getting "naive
user" feedback until after the release is notoriously hard... A
"practicality beats purity" solution of sticking to the old install
scheme for one more version seems like it should be left as an option,
at least as a fallback if we don't solve the issues in time.

Paul


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Serhiy Storchaka

On 09.03.15 17:48, Neil Girdhar wrote:

> So you agree that the ideal solution is composition, but you prefer
> inheritance in order to not break code?


Yes, I agree. There are two advantages to inheritance: better
backward compatibility and a simpler implementation.



> Then, I think the big question
> is how much code would actually break if you presented the ideal
> interface.  I imagine that 99% of the code using flags only uses __or__
> to compose and __and__, __invert__ to erase flags.


I don't know and don't want to guess. Let's just follow the way of bool
and IntEnum. Once users have been encouraged to use IntEnum and IntFlags
instead of plain ints, we could consider the idea of dropping inheritance
of bool, IntEnum and IntFlags from int. This is not in the near future.



>>> Here's another reason.  What if someone wants to use an IntFlags object,
>>> but wants to use a fixed width type for storage, say numpy.int32?   Why
>>> shouldn't they be able to do that?  By using composition, you can easily
>>> provide such an option.
>>
>> You can design abstract interface Flags that can be combined with
>> int or other type. But why you want to use numpy.int32 as storage?
>> This doesn't save much memory, because with composition the IntFlags
>> class weighs more than int subclass.
>
> Maybe you're storing a bunch of flags in a numpy array having dtype
> np.int32?  It's contrived, I agree.


I'm afraid that composition will not help you with this. Can a numpy array
pack int-like objects into a fixed-width integer array and then restore
the original type on unboxing?
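
The boxing problem can be shown without numpy by using the standard-library array module as a stand-in for a fixed-width integer container. That a numpy integer array behaves analogously is an assumption here, not something this sketch checks, and Perm is a made-up example class.

```python
from array import array
from enum import IntEnum


class Perm(IntEnum):
    READ = 1
    WRITE = 2
    EXEC = 4


# Packing works because the members are ints ...
packed = array("i", [Perm.READ, Perm.EXEC])

# ... but what comes back out is a plain int; the container only stored bits.
unboxed = packed[1]

# Restoring the original type has to be done explicitly by the caller.
restored = Perm(unboxed)
```

The container has no record of which class the values came from, so the re-boxing step cannot happen automatically.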





Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread Isaac Schwabacher
On 15-03-08, nha pham wrote:
> 
> We can optimize the TimSort algorithm by optimizing its binary insertion sort.
> 
> The current version of binary insertion sort use this idea:
> 
> Use binary search to find a final position in sorted list for a new element 
> X. Then insert X to that location.
> 
> I suggest another idea:
> 
> Use binary search to find a final position in sorted list for a new element X. 
> Before insert X to that location, compare X with its next element.
> 
> For the next element, we already know if it is lower or bigger than X, so we 
> can reduce the search area to the left side or on the right side of X in the 
> sorted list.

I don't understand how this is an improvement, since with binary search the 
idea is that each comparison cuts the remaining list to search in half; i.e., 
each comparison yields one bit of information. Here, you're spending a 
comparison to cut the list to search at the element you just inserted, which is 
probably not right in the middle. If you miss the middle, you're getting on 
average less than a full bit of information from your comparison, so you're not 
reducing the remaining search space by as much as you would be if you just 
compared to the element in the middle of the list.
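
The "bits of information" argument can be made concrete with the binary entropy of a split. This sketch assumes every insertion slot is equally likely, which is the implicit assumption in the paragraph above; the function name is made up.

```python
import math


def comparison_bits(left, total):
    """Expected information (in bits) from one comparison that splits
    `total` equally likely slots into `left` and `total - left`."""
    p = left / total
    if p in (0.0, 1.0):
        return 0.0  # the outcome is already known, nothing is learned
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


middle = comparison_bits(32, 64)  # comparing against the middle element
edge = comparison_bits(1, 64)     # comparing against an element near one end
```

A middle split yields exactly one bit, while a split near the edge yields much less. If the slots are not equally likely, for example with partly ordered input, this calculation no longer applies as written.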

> I have applied my idea to java.util.ComparableTimSort.sort() and tested it.
> The execution time is reduced by 2%-6% with arrays of random integers.

For all that, though, experiment trumps theory...

> Here is detail about algorithm and testing: 
> https://github.com/nhapq/Optimize_binary_insertion_sort
> 
> Sincerely.
> 
> phqnha


Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread nha pham
I do not know exactly. One thing I can imagine is that it turns the worst
case of binary insertion sort into the best case.
Take a sorted array of 32 or 64 items, built up from zero elements. The
new element you put into the sorted list has a high chance of being the
smallest or the highest in the sorted list (or nearly the highest or nearly
the smallest).

If that happens, the old binary insertion sort has to search through the
whole list, while with my idea it only has to make 1-2 more comparisons.
I will run more tests and think about it more to make sure, though.
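
One possible reading of the proposal, written out so that comparisons can be counted. This is a sketch under assumptions: it is neither the Java patch linked in the thread nor CPython's list sort, and the function name and the use_hint parameter are invented for the example.

```python
def binary_insertion_sort(items, use_hint=False):
    """Sort a copy of `items`; return (sorted_list, comparison_count).

    With use_hint=True, each new element is first compared with the
    previously inserted element, and the binary search is then limited
    to the part of the sorted prefix on the matching side of it.
    """
    a = list(items)
    comparisons = 0
    prev = 0  # index where the previously inserted element ended up
    for i in range(1, len(a)):
        x = a[i]
        lo, hi = 0, i
        if use_hint:
            comparisons += 1
            if x < a[prev]:
                hi = prev      # x belongs at or before the previous element
            else:
                lo = prev + 1  # x belongs after it (keeps the sort stable)
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if x < a[mid]:
                hi = mid
            else:
                lo = mid + 1
        a[lo + 1:i + 1] = a[lo:i]  # shift the tail right by one slot
        a[lo] = x
        prev = lo
    return a, comparisons
```

On already sorted input the hinted variant needs one comparison per element, because the hint alone pins down the position. On random input the extra comparison may or may not pay for itself, which is the question being debated, so measuring both variants on the data of interest is the only reliable check.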

On Mon, Mar 9, 2015 at 11:48 AM, nha pham  wrote:

> I do not know exactly, one thing I can imagine is: it turns the worst case
> of binary insertion sort to best case.
> With sorted array in range of 32 or 64 items, built from zero element. The
> new element you put into the sorted list has a high chance of being the
> smallest or the the highest of the sorted list (or nearly highest or nearly
> smallest)
>
> If that case happen, the old binary insertion sort will have the
> investigate all the list, while with my idea, it just have to compare more
> 1-2 times.
> I will try to run more test an more thinking to make sure though.
>
>
>
> On Mon, Mar 9, 2015 at 10:39 AM, Isaac Schwabacher 
> wrote:
>
>> On 15-03-08, nha pham
>>  wrote:
>> >
>> > We can optimize the TimSort algorithm by optimizing its binary
>> insertion sort.
>> >
>> > The current version of binary insertion sort use this idea:
>> >
>> > Use binary search to find a final position in sorted list for a new
>> element X. Then insert X to that location.
>> >
>> > I suggest another idea:
>> >
>> > Use binary search to find a final postion in sorted list for a new
>> element X. Before insert X to that location, compare X with its next
>> element.
>> >
>> > For the next element, we already know if it is lower or bigger than X,
>> so we can reduce the search area to the left side or on the right side of X
>> in the sorted list.
>>
>> I don't understand how this is an improvement, since with binary search
>> the idea is that each comparison cuts the remaining list to search in half;
>> i.e., each comparison yields one bit of information. Here, you're spending
>> a comparison to cut the list to search at the element you just inserted,
>> which is probably not right in the middle. If you miss the middle, you're
>> getting on average less than a full bit of information from your
>> comparison, so you're not reducing the remaining search space by as much as
>> you would be if you just compared to the element in the middle of the list.
>>
>> > I have applied my idea on java.util. ComparableTimSort.sort() and
>> testing. The execute time is reduced by 2%-6% with array of random integer.
>>
>> For all that, though, experiment trumps theory...
>>
>> > Here is detail about algorithm and testing:
>> https://github.com/nhapq/Optimize_binary_insertion_sort
>> >
>> > Sincerely.
>> >
>> > phqnha
>>
>
>


Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread Neil Girdhar
It may be that the comparison that you do is between two elements that are
almost always in the same cache line whereas the binary search might often
incur a cache miss.

On Mon, Mar 9, 2015 at 2:49 PM, nha pham  wrote:

> I do not know exactly, one thing I can imagine is: it turns the worst case
> of binary insertion sort to best case.
> With sorted array in range of 32 or 64 items, built from zero element. The
> new element you put into the sorted list has a high chance of being the
> smallest or the the highest of the sorted list (or nearly highest or nearly
> smallest)
>
> If that case happen, the old binary insertion sort will have the
> investigate all the list, while with my idea, it just have to compare more
> 1-2 times.
> I will try to run more test an more thinking to make sure though.
>
> On Mon, Mar 9, 2015 at 11:48 AM, nha pham  wrote:
>
>> I do not know exactly, one thing I can imagine is: it turns the worst
>> case of binary insertion sort to best case.
>> With sorted array in range of 32 or 64 items, built from zero element.
>> The new element you put into the sorted list has a high chance of being the
>> smallest or the the highest of the sorted list (or nearly highest or nearly
>> smallest)
>>
>> If that case happen, the old binary insertion sort will have the
>> investigate all the list, while with my idea, it just have to compare more
>> 1-2 times.
>> I will try to run more test an more thinking to make sure though.
>>
>>
>>
>> On Mon, Mar 9, 2015 at 10:39 AM, Isaac Schwabacher wrote:
>>
>>> On 15-03-08, nha pham
>>>  wrote:
>>> >
>>> > We can optimize the TimSort algorithm by optimizing its binary
>>> insertion sort.
>>> >
>>> > The current version of binary insertion sort use this idea:
>>> >
>>> > Use binary search to find a final position in sorted list for a new
>>> element X. Then insert X to that location.
>>> >
>>> > I suggest another idea:
>>> >
>>> > Use binary search to find a final postion in sorted list for a new
>>> element X. Before insert X to that location, compare X with its next
>>> element.
>>> >
>>> > For the next element, we already know if it is lower or bigger than X,
>>> so we can reduce the search area to the left side or on the right side of X
>>> in the sorted list.
>>>
>>> I don't understand how this is an improvement, since with binary search
>>> the idea is that each comparison cuts the remaining list to search in half;
>>> i.e., each comparison yields one bit of information. Here, you're spending
>>> a comparison to cut the list to search at the element you just inserted,
>>> which is probably not right in the middle. If you miss the middle, you're
>>> getting on average less than a full bit of information from your
>>> comparison, so you're not reducing the remaining search space by as much as
>>> you would be if you just compared to the element in the middle of the list.
>>>
>>> > I have applied my idea on java.util. ComparableTimSort.sort() and
>>> testing. The execute time is reduced by 2%-6% with array of random integer.
>>>
>>> For all that, though, experiment trumps theory...
>>>
>>> > Here is detail about algorithm and testing:
>>> https://github.com/nhapq/Optimize_binary_insertion_sort
>>> >
>>> > Sincerely.
>>> >
>>> > phqnha
>>>
>>
>>
>


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Neil Girdhar
On Mon, Mar 9, 2015 at 12:54 PM, Serhiy Storchaka wrote:

> On 09.03.15 17:48, Neil Girdhar wrote:
>
>> So you agree that the ideal solution is composition, but you prefer
>> inheritance in order to not break code?
>>
>
> Yes, I agree. There is two advantages in the inheritance: larger backward
> compatibility and simpler implementation.
>
>
Inheritance might be more backwards compatible, but I believe that you
should check how much code is genuinely not restricted to the idealized flags
interface.   It's not worth talking about "simpler implementation" since
the two solutions differ by only a couple dozen lines.

On the other hand, composition is better design.  It prevents you from
making mistakes like adding to flags and having carries, or using flags in
an unintended way.
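
A minimal sketch of the composition approach, assuming a three-bit flag space. The Flags class below is hypothetical and much smaller than any real proposal; it wraps an int instead of subclassing it, so only the operations it defines are available.

```python
import operator


class Flags:
    """Hypothetical flags value that holds an int rather than being one."""

    __slots__ = ("_bits",)
    ALL = 0b111  # assumption: exactly three defined flag bits

    def __init__(self, bits=0):
        bits = operator.index(bits)
        if bits & ~self.ALL:
            raise ValueError("undefined flag bits: %r" % bits)
        self._bits = bits

    def __or__(self, other):
        if not isinstance(other, Flags):
            return NotImplemented
        return Flags(self._bits | other._bits)

    def __and__(self, other):
        if not isinstance(other, Flags):
            return NotImplemented
        return Flags(self._bits & other._bits)

    def __invert__(self):
        return Flags(~self._bits & self.ALL)

    def __eq__(self, other):
        if not isinstance(other, Flags):
            return NotImplemented
        return self._bits == other._bits

    def __hash__(self):
        return hash(self._bits)

    def __index__(self):
        return self._bits  # lossless conversion for code that needs an int

    def __repr__(self):
        return "Flags({:#05b})".format(self._bits)


READ, WRITE, EXEC = Flags(1), Flags(2), Flags(4)
```

Because no __add__ is defined, READ + READ raises TypeError instead of silently producing WRITE, while |, & and ~ behave as expected. The cost is the one raised on the other side of the argument: isinstance(READ, int) is False, so code that checks for int will reject it.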


>  Then,I think the big question
>> is how much code would actually break if you presented the ideal
>> interface.  I imagine that 99% of the code using flags only uses __or__
>> to compose and __and__, __invert__ to erase flags.
>>
>
> I don't know and don't want to guess. Let just follow the way of bool and
> IntEnum. When users will be encouraged to use IntEnum and IntFlags instead
> of plain ints we could consider the idea of dropping inheritance of bool,
> IntEnum and IntFlags from int. This is not near future.


I think it's the other way around.  You should typically start with the
modest interface and add methods as you need.  If you start with full blown
inheritance, you will find it increasingly difficult to remove
methods in changing your solution.  Using inheritance instead of
composition is one of the most common errors in object-oriented
programming, and I get the impression from your other paragraph that you're
seduced by the slightly shorter code.  I don't think it's worth giving in
to that without proof that composition will actually break a significant
amount of code.

Regarding IntEnum — that should inherit from int since they are truly just
integer constants.  It's too late for bool; that ship has sailed
unfortunately.


>
>
>  > Here's another reason.  What if someone wants to use an IntFlags
>> object,
>> > but wants to use a fixed width type for storage, say numpy.int32?
>>  Why
>> > shouldn't they be able to do that?  By using composition, you can
>> easily
>> > provide such an option.
>> You can design abstract interface Flags that can be combined with
>> int or other type. But why you want to use numpy.int32 as storage?
>> This doesn't save much memory, because with composition the IntFlags
>> class weighs more than int subclass.
>> Maybe you're storing a bunch of flags in a numpy array having dtype
>> np.int32?  It's contrived, I agree.
>>
>
> I afraid that composition will not help you with this. Can numpy array
> pack int-like objects into fixed-width integer array and then restore
> original type on unboxing?


You're right.



Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Nick Coghlan
On 10 Mar 2015 02:37, "Donald Stufft"  wrote:
> >
> > I'm okay with this. Installing for all users is really something that
> > could be considered an advanced option rather than the default, especially
> > since the aim (AIUI) of the all-users install is to pretend that Python was
> > shipped with the OS. (I'd kind of like to take that further by splitting
> > things more sensibly between Program Files, Common Files and System32, but
> > there's very little gain from that and much MUCH pain as long as people are
> > still expecting C:\PythonXY installs…)
>
> Maybe the answer is to write up a PEP and standardize the idea of entry
points, specifically the console_scripts and ui_scripts (or whatever it’s
called) entrypoints and then give Python something like -m, but which
executes a specific entry point name instead of a module name (or maybe -m
can fall back to looking at entry points? I don’t know).

While I like the idea of offering something more "built in" in this space,
my initial inclination is to prefer extending "-m" to accept the
"module.name:function.name" format to let you invoke entry points by the
name of the target function (Possible API name: runpy.run_cli_function),
and then add a "runpy.call" that can be used to call an arbitrary function
with positional and keyword string arguments based on sys.argv and
(optionally?) print the repr of the result.

It wouldn't be a universal panacea (and would need a PEP to work out the
exact UX details), but would likely make quite a few libraries more command
line accessible without needing to modify them.
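
A minimal sketch of how a "module.name:function.name" string could be
resolved to a callable, assuming nothing beyond importlib. The helper name
resolve_entry_point is hypothetical; it is not part of runpy, and the
proposed "-m" extension and runpy.run_cli_function are only ideas from this
thread.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve 'package.module:attr.path' to the object it names.

    Hypothetical helper for illustration only; not an existing runpy API.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    # Import the module part, then walk the dotted attribute path.
    target = importlib.import_module(module_name)
    for name in attr_path.split("."):
        target = getattr(target, name)
    return target


# Usage: look the function up by name, then call it like any other callable.
dumps = resolve_entry_point("json:dumps")
encoded = dumps([1, 2])
```

A command-line front end would still have to decide how sys.argv strings map
onto positional and keyword arguments, which is the UX question left to a PEP.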

Cheers,
Nick.

>
> I’ve given this like… 30s worth of thought, but maybe:
>
> pip install pygmentize  # Implicit --user
> py -e pygmentize
>
> Is an OK UX for people to have without needing to add the user site bin
> directory to their PATH. Maybe it’s a horrible idea and we should all
> forget I mentioned it :)
>
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
>
>


Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Donald Stufft

> On Mar 9, 2015, at 7:11 PM, Nick Coghlan  wrote:
> 
> 
> On 10 Mar 2015 02:37, "Donald Stufft" wrote:
> > >
> > > I'm okay with this. Installing for all users is really something that 
> > > could be considered an advanced option rather than the default, 
> > > especially since the aim (AIUI) of the all-users install is to pretend 
> > > that Python was shipped with the OS. (I'd kind of like to take that 
> > > further by splitting things more sensibly between Program Files, Common 
> > > Files and System32, but there's very little gain from that and much MUCH 
> > > pain as long as people are still expecting C:\PythonXY installs…)
> >
> > Maybe the answer is to write up a PEP and standardize the idea of entry 
> > points, specifically the console_scripts and ui_scripts (or whatever it’s 
> > called) entrypoints and then give Python something like -m, but which 
> > executes a specific entry point name instead of a module name (or maybe -m 
> > can fall back to looking at entry points? I don’t know).
> 
> While I like the idea of offering something more "built in" in this space, my 
> initial inclination is to prefer extending "-m" to accept the 
> "module.name:function.name" format to let you invoke 
> entry points by the name of the target function (Possible API name: 
> runpy.run_cli_function), and then add a "runpy.call" that can be used to call 
> an arbitrary function with positional and keyword string arguments based on 
> sys.argv and (optionally?) print the repr of the result.
> 
> It wouldn't be a universal panacea (and would need a PEP to work out the 
> exact UX details), but would likely make quite a few libraries more command 
> line accessible without needing to modify them.
> 
> 

If I understand this correctly, you’re suggesting that to run ``pygmentize`` 
without using the script wrapper, you’d need to do ``py -m 
pygments.cmdline:main`` instead of ``pygmentize``? I don’t think that actually 
solves the problem (except by making it so that the script wrappers can maybe 
just be exactly #!/usr/bin/python -m pygments.cmdline:main but that’s a 
different thing..). I’m not against it in general though, I just don’t know 
that it solves the problem Paul was mentioning.


---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)

2015-03-09 Thread Paul Moore
On 9 March 2015 at 23:11, Nick Coghlan  wrote:
> While I like the idea of offering something more "built in" in this space,
> my initial inclination is to prefer extending "-m" to accept the
> "module.name:function.name" format to let you invoke entry points by the
> name of the target function (Possible API name: runpy.run_cli_function), and
> then add a "runpy.call" that can be used to call an arbitrary function with
> positional and keyword string arguments based on sys.argv and (optionally?)
> print the repr of the result.
>
> It wouldn't be a universal panacea (and would need a PEP to work out the
> exact UX details), but would likely make quite a few libraries more command
> line accessible without needing to modify them.

Personally I doubt it would make much difference. If the docs say
"pygmentize" I'm unlikely to dig around to find that the incantation
"python -m pygments.somemodule:main" does the same thing using 3 times
as many characters. I'd just add Python to my PATH and say stuff it.

Paul


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Nick Coghlan
On 10 Mar 2015 06:51, "Neil Girdhar"  wrote:
>
>
>
> On Mon, Mar 9, 2015 at 12:54 PM, Serhiy Storchaka wrote:
>>
>> On 09.03.15 17:48, Neil Girdhar wrote:
>>>
>>> So you agree that the ideal solution is composition, but you prefer
>>> inheritance in order to not break code?
>>
>>
>> Yes, I agree. There are two advantages to inheritance: greater
>> backward compatibility and a simpler implementation.
>>
>
> Inheritance might be more backwards compatible, but I believe that you
> should check how much code is genuinely not restricted to the idealized
> flags interface.  It's not worth talking about "simpler implementation"
> since the two solutions differ by only a couple dozen lines.

We literally can't do this, as the vast majority of Python code in the
world is locked up behind institutional firewalls or has otherwise never
been published. The open source stuff is merely the tip of a truly enormous
iceberg.

If we want to *use* IntFlags in the standard library (and that's the only
pay-off significant enough to justify having it in the standard library),
then it needs to inherit from int.

However, cloning the full enum module architecture to create
flags.FlagsMeta, flags.Flags and flags.IntFlags would make sense to me.

It would also make sense to try that idea out on PyPI for a while before
incorporating it into the stdlib.
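
The trade-off debated in this thread can be sketched as follows. Both class
names (InheritedFlags, ComposedFlags) are hypothetical and exist only for
this illustration; neither is the proposed flags.IntFlags. The int subclass
passes isinstance(x, int) checks but also allows arithmetic with carries,
while the composed version exposes only __or__, __and__, __invert__ and
__index__.

```python
import operator


class InheritedFlags(int):
    """Flags as an int subclass: every int operation still works."""

    def __or__(self, other):
        return type(self)(int(self) | int(other))


class ComposedFlags:
    """Flags wrapping an int: only the flag operations are exposed."""

    def __init__(self, value):
        self._value = operator.index(value)

    def __or__(self, other):
        return ComposedFlags(self._value | operator.index(other))

    def __and__(self, other):
        return ComposedFlags(self._value & operator.index(other))

    def __invert__(self):
        return ComposedFlags(~self._value)

    def __index__(self):
        # Lets APIs that call __index__ accept the wrapper.
        return self._value

    def __eq__(self, other):
        return isinstance(other, ComposedFlags) and self._value == other._value

    def __hash__(self):
        return hash(self._value)


READ, WRITE = InheritedFlags(1), InheritedFlags(2)
carried = READ + READ  # a plain int equal to WRITE: the "carry" mistake

C_READ, C_WRITE = ComposedFlags(1), ComposedFlags(2)
both = C_READ | C_WRITE  # C_READ + C_READ would raise TypeError instead
```

Code that checks isinstance(x, int), or C code using PyLong_Check, accepts
READ but rejects C_READ, which is the backward compatibility concern raised
above.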

Regards,
Nick.

>
> On the other hand, composition is better design.  It prevents you from
> making mistakes like adding to flags and having carries, or using flags in
> an unintended way.
>
>>>
>>> Then, I think the big question
>>> is how much code would actually break if you presented the ideal
>>> interface.  I imagine that 99% of the code using flags only uses __or__
>>> to compose and __and__, __invert__ to erase flags.
>>
>>
>> I don't know and don't want to guess. Let's just follow the way of bool
>> and IntEnum. Once users are encouraged to use IntEnum and IntFlags
>> instead of plain ints, we could consider dropping the inheritance of
>> bool, IntEnum and IntFlags from int. That is not in the near future.
>
>
> I think it's the other way around.  You should typically start with the
> modest interface and add methods as you need.  If you start with full-blown
> inheritance, you will find it increasingly difficult to remove
> methods when changing your solution.  Using inheritance instead of
> composition is one of the most common errors in object-oriented
> programming, and I get the impression from your other paragraph that you're
> seduced by the slightly shorter code.  I don't think it's worth giving in
> to that without proof that composition will actually break a significant
> amount of code.
>
> Regarding IntEnum, that should inherit from int since they are truly
> just integer constants.  It's too late for bool; that ship has sailed
> unfortunately.
>
>>
>>
>>
>>> > Here's another reason.  What if someone wants to use an IntFlags
>>> > object, but wants to use a fixed width type for storage, say
>>> > numpy.int32?  Why shouldn't they be able to do that?  By using
>>> > composition, you can easily provide such an option.
>>> You can design abstract interface Flags that can be combined with
>>> int or other type. But why you want to use numpy.int32 as storage?
>>> This doesn't save much memory, because with composition the IntFlags
>>> class weighs more than int subclass.
>>> Maybe you're storing a bunch of flags in a numpy array having dtype
>>> np.int32?  It's contrived, I agree.
>>
>>
>> I'm afraid that composition will not help you with this. Can a numpy array
>> pack int-like objects into a fixed-width integer array and then restore
>> the original type on unboxing?
>
>
> You're right.
>>
>>
>>
>>


Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5

2015-03-09 Thread Ryan Stuart
Hi Ben,

On Mon, 9 Mar 2015 at 21:58 Ben Hoyt  wrote:

> Note that this benchmark is invalid for a couple of reasons. (...)
>

Thanks a lot for the guidance Ben, greatly appreciated. I'm just starting to
take an interest in the development of CPython, so running a benchmark
seemed like as good a place as any to start.

Since I want to get comfortable with compiling from source I tried this
again. Instead of applying the patch, since the issue is now closed, I just
compiled from the tip of the default branch which at the time
was 94920:0469af231d22. I also didn't configure with --with-pydebug. Here
are the new results:

Ryans-MacBook-Pro:cpython rstuart$ ./python.exe ~/Workspace/python/scandir/benchmark.py
Using Python 3.5's builtin os.scandir()
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 1/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 2/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 3/3...
os.walk took 0.061s, scandir.walk took 0.012s -- 5.2x as fast

Ryans-MacBook-Pro:cpython rstuart$ ./python.exe ~/Workspace/python/scandir/benchmark.py -s
Using Python 3.5's builtin os.scandir()
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 1/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 2/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.109s, scandir.walk took 0.049s -- 2.2x as fast

This is on a Retina Mid 2012 MacBook Pro with an SSD.

Cheers


> you're compiling Python in debug mode (--with-pydebug), which produces
> significantly slower code in my tests -- for example, on Windows
> benchmark.py is about twice as slow when Python is compiled in debug
> mode.
>
> Second, as the output above shows, benchmark.py is "Using slower
> ctypes version of scandir" and not a C version at all. If os.scandir()
> is available, benchmark.py should use that, so there's something wrong
> here -- maybe the patch didn't apply correctly or maybe you're testing
> with a different version of Python than the one you built?
>
> In any case, the easiest way to test it now is to download Python 3.5
> alpha 2 which just came out:
> https://www.python.org/downloads/release/python-350a2/
>
> I just tried this on my Mac Mini (i5 2.3GHz, 2 GB RAM, HFS+ on
> rotational drive) and got the following results:
>
> Using Python 3.5's builtin os.scandir()
> Comparing against builtin version of os.walk()
> Priming the system's cache...
> Benchmarking walks on benchtree, repeat 1/3...
> Benchmarking walks on benchtree, repeat 2/3...
> Benchmarking walks on benchtree, repeat 3/3...
> os.walk took 0.074s, scandir.walk took 0.016s -- 4.7x as fast
>
> > I then did ./python.exe ~/Workspace/python/scandir/benchmark.py -s and
> got:
>
> Also note that "benchmark.py -s" tests the system os.walk() against a
> get_tree_size() function using scandir's DirEntry.stat().st_size,
> which provides huge gains on Windows (because stat().st_size doesn't
> require an OS call) but only modest gains on POSIX systems, which
> still require an OS stat call to get the size (though not the file
> type, so at least it's only one stat call). I get "2.2x as fast" on my
> Mac for "benchmark.py -s".
>
> -Ben
>
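
For reference, a tree-size function of the kind Ben describes for
"benchmark.py -s" might look roughly like this. It is a sketch along the
lines of the PEP 471 example, not the code from benchmark.py itself, and the
"with" form of os.scandir() assumes Python 3.6 or later.

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # is_dir() can usually answer from the directory listing alone.
                total += get_tree_size(entry.path)
            else:
                # On POSIX this costs one stat call per file; on Windows the
                # size already came back with the directory listing.
                total += entry.stat(follow_symlinks=False).st_size
    return total
```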


Re: [Python-Dev] boxing and unboxing data types

2015-03-09 Thread Neil Girdhar
Totally agree
On 9 Mar 2015 19:22, "Nick Coghlan"  wrote:

>
> On 10 Mar 2015 06:51, "Neil Girdhar"  wrote:
> >
> >
> >
> > On Mon, Mar 9, 2015 at 12:54 PM, Serhiy Storchaka 
> wrote:
> >>
> >> On 09.03.15 17:48, Neil Girdhar wrote:
> >>>
> >>> So you agree that the ideal solution is composition, but you prefer
> >>> inheritance in order to not break code?
> >>
> >>
> >> Yes, I agree. There is two advantages in the inheritance: larger
> backward compatibility and simpler implementation.
> >>
> >
> > Inheritance might be more backwards compatible, but I believe that you
> should check how much code is genuine not restricted to the idealized flags
> interface.   It's not worth talking about "simpler implementation" since
> the two solutions differ by only a couple dozen lines.
>
> We literally can't do this, as the vast majority of Python code in the
> world is locked up behind institutional firewalls or has otherwise never
> been published. The open source stuff is merely the tip of a truly enormous
> iceberg.
>
> If we want to *use* IntFlags in the standard library (and that's the only
> pay-off significant enough to justify having it in the standard library),
> then it needs to inherit from int.
>
> However, cloning the full enum module architecture to create
> flags.FlagsMeta, flags.Flags and flags.IntFlags would make sense to me.
>
> It would also make sense to try that idea out on PyPI for a while before
> incorporating it into the stdlib.
>
> Regards,
> Nick.
>
> >
> > On the other hand, composition is better design.  It prevents you from
> making mistakes like adding to flags and having carries, or using flags in
> an unintended way.
> >
> >>>
> >>> Then,I think the big question
> >>> is how much code would actually break if you presented the ideal
> >>> interface.  I imagine that 99% of the code using flags only uses __or__
> >>> to compose and __and__, __invert__ to erase flags.
> >>
> >>
> >> I don't know and don't want to guess. Let just follow the way of bool
> and IntEnum. When users will be encouraged to use IntEnum and IntFlags
> instead of plain ints we could consider the idea of dropping inheritance of
> bool, IntEnum and IntFlags from int. This is not near future.
> >
> >
> > I think it's the other way around.  You should typically start with the
> modest interface and add methods as you need.  If you start with full blown
> inheritance, you will find it only increasingly more difficult to remove
> methods in changing your solution.  Using inheritance instead of
> composition is one of the most common errors in objected oriented
> programming, and I get the impression from your other paragraph that you're
> seduced by the slightly shorter code.  I don't think it's worth giving in
> to that without proof that composition will actually break a significant
> amount of code.
> >
> > Regarding IntEnum — that should inherit from int since they are truly
> just integer constants.  It's too late for bool; that ship has sailed
> unfortunately.
> >
> >>
> >>
> >>
> >>> > Here's another reason.  What if someone wants to use an IntFlags
> object,
> >>> > but wants to use a fixed width type for storage, say
> numpy.int32?   Why
> >>> > shouldn't they be able to do that?  By using composition, you
> can easily
> >>> > provide such an option.
> >>> You can design abstract interface Flags that can be combined with
> >>> int or other type. But why you want to use numpy.int32 as storage?
> >>> This doesn't save much memory, because with composition the
> IntFlags
> >>> class weighs more than int subclass.
> >>> Maybe you're storing a bunch of flags in a numpy array having dtype
> >>> np.int32?  It's contrived, I agree.
> >>
> >>
> >> I afraid that composition will not help you with this. Can numpy array
> pack int-like objects into fixed-width integer array and then restore
> original type on unboxing?
> >
> >
> > You're right.
> >>
> >>
> >>
> >>


Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5

2015-03-09 Thread Ben Hoyt
>
> os.walk took 0.061s, scandir.walk took 0.012s -- 5.2x as fast
>

Great, looks much better. :-) Even a bit better than what I'm seeing --
possibly due to your SSD.

-Ben


Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread nha pham
Thank you for your comment. I admit that I did not think about the proof
before.
Here is my naive proof. I hope you can give me some comments:

=
# This proof is empirical reasoning and is not complete; it just gives
us a closer look.
I hope someone can make it more mathematical.

In the proof, we assume we are dealing with an array of unique values (no
two of them are equal).
If values are equal, the "lucky search" can happen, and that is
obviously not fair.

Statement_1: With an array of size N or less than N, we need at most
log2(N) comparisons to find a value
(or a position, in case the search misses), using the binary search algorithm.

proof: This statement is trivial, and I believe someone out there has already
proved it. We can check again
by counting manually.

Let's assume we have an array of 32 items:
32 => 16 => 8 => 4 => 2 => 1  (5 comparisons)

How about 24 items (24 < 32):
24 => 12 => 6 => 3 => 2 => 1  (5 comparisons)

OK, good enough. Let's just believe it and move on.


Statement_2: If we divide an array into two parts, the more unbalanced
the division, the more
benefit we get from the binary search algorithm.

proof: Let's assume we have an array of 256 items.

case1:
If we divide in the middle: 128 - 128
Now, if we search on the left, it costs log2(128) = 7
If we search on the right, it costs log2(128) = 7

case2:
If we divide unbalanced: 32 - 224
Now, if we search on the left, it costs log2(32) = 5
If we search on the right, it costs at most 8 comparisons (based on
statement_1).
You may not believe me, so let's count it by hand:
224 => 112 => 56 => 28 => 14 => 7 => 4 => 2 => 1
So, if we search on the left, we win 2 comparisons compared to case1.
If we search on the right, we lose 1 comparison compared to case1.
I call this a "benefit".

case3:
What if we divide even more unbalanced: 8 - 248
Search on the left: log2(8) = 3 comparisons.
Search on the right: it costs at most 8 comparisons.
So if we search on the left, we win 4 comparisons.
If we search on the right, we lose 1 comparison.
It is "more benefit", isn't it?


Statement3: Because we are using a random array, there is a 50-50 chance that
 next_X will be bigger or smaller than X.


Statement4: Let N be the size of the sorted list, and "index" the
position of X in the sorted list.
Because the array is random, index has an equal chance of being at any
position in the sorted list.

Statement5: Now we build a model based on the previous statements:

My idea costs 1 comparison (between X and next_X) to divide the array
into two unbalanced parts.
The old idea costs 1 comparison to divide the array into two balanced
parts.
Now let's see which one can find the position for next_X faster:

If index belongs to [N/4 to 3N/4]: we may lose 1 comparison, or we may
not lose any.
If index belongs to [N/8 to N/4] or [3N/4 to 7N/8]: we may lose 1
comparison, or we win 1 comparison.
If index belongs to [N/16 to N/8] or [7N/8 to 15N/16]: we may lose 1
comparison, or we win 2 comparisons.
If index belongs to [N/32 to N/16] or [15N/16 to 31N/32]: we may lose 1
comparison, or we win 3 comparisons.
If index belongs to [N/64 to N/32] or [31N/32 to 63N/64]: we may lose 1
comparison, or we win 4 comparisons.
...
and so on.

Statement6: Now we apply the model to a real example.

Assume that we already have a sorted list with 16 items, and we already
know the "index" of X.
We can think of it as a gambling game with 16 slots. In every slot, we
can only bet 1 dollar (statement4).

From the 5th slot to the 12th slot, we may lose 1, or we may not lose,
with a 50-50 chance.
So after a huge number of plays, probability tells us that we will lose (8 x
1)/2 = 4 dollars.

For slots 3, 4, 13 and 14, we may lose 1, or we win 1. So
after a huge number of plays,
we won't lose or win anything.

For slots 2 and 15, we may lose 1, or we win 2. So after a huge number of
plays, we can win
(2-1)x2 = 2 dollars.

For slots 1 and 16, we may lose 1, or we win 3. So after a huge number of
plays, we can win 4 dollars.

In total, after a huge number of plays, we win 4 + 2 + 0 - 4 = 2 dollars!

You can test with a sorted list of 32 or 64 items or any number you want; I
believe the benefit is even greater.

Conclusion:
The unbalanced model gives us more benefit than the balanced model. That
means that with a big enough array,
my idea gives more benefit than the old idea.


I think the lottery companies already know about this. It is a
shame that I do not know
the mathematical principle behind this problem.


If I have something more, I will update my proof at:
https://github.com/nhapq/Optimize_binary_insertion_sort/blob/master/proof.txt

==
Thank you.
Nha Pham.

On Mon, Mar 9, 2015 at 10:39 AM, Isaac Schwabacher 
wrote:

> On 15-03-08, nha pham
>  wrote:
> >
> > We can optimize the TimSort algorithm by optimizing its binary insertion
> sort.

Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread Tim Peters
[nha pham ]
> Statement_1: With an array of size N or less than N, we need at most log2(N)
> comparisons to find a value
> (or a position, incase the search miss), using the binary search algorithm.
>
> proof: This statement is trivia, and I believe, someone outthere already
> proved it.

Sorry for the quick message here.  It's just a simple point where it
will pay not to get off on a wrong foot ;-)

Correct:  for an array of size N, binary search can require as many as
ceiling(log2(N+1)) comparisons.

That's because there are N+1 possible results for an array of size N.
For example, for an array of size 3, [A, B, C], "the answer" may be
"before A", "between A and B", "between B and C", or "after C".  3
elements, 3+1 = 4 possible results.  log2(3) comparisons are not
enough to distinguish among 4 results.

Make it trivial, an array of length 1.  Then 1 comparison is obviously
necessary and sufficient in all cases.  And, indeed,
ceiling(log2(1+1)) = 1.  log2(1) equals 0, too small.
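
The bound can be checked by brute force. The sketch below is an
illustration, not code from the thread: the function name
worst_case_comparisons is made up, and the loop is an ordinary
bisect-right style search. It probes every one of the N+1 possible insertion
points and records the largest number of comparisons needed.

```python
import math


def worst_case_comparisons(n):
    """Largest number of comparisons binary search makes on n sorted items."""
    data = list(range(n))
    worst = 0
    # One probe per possible result: before data[0], between each
    # neighbouring pair, and after data[n-1], so n + 1 probes in total.
    for target in [x - 0.5 for x in range(n + 1)]:
        lo, hi, count = 0, n, 0
        while lo < hi:
            mid = (lo + hi) // 2
            count += 1  # one comparison of target against data[mid]
            if target < data[mid]:
                hi = mid
            else:
                lo = mid + 1
        worst = max(worst, count)
    return worst


# Compare the measured worst case against ceiling(log2(N+1)).
mismatches = [n for n in range(1, 65)
              if worst_case_comparisons(n) != math.ceil(math.log2(n + 1))]
```

For this search loop the mismatches list comes out empty for every N from 1
to 64, and the size-3 case needs 2 comparisons, more than log2(3).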

For the rest, I haven't been able to understand your description or
your pseudo-code.  I'll try harder.  Some things clearly aren't doing
what you _intend_ them to do.  For example, in your Python code, each
time through the outer loop you're apparently trying to sort the next
CHUNK elements, but you end up appending CHUNK+1 values to data2 (or
data3).

Or in this part:

    for i in range(low,high):
        x = data[i]
        if x >= data[i-1]:

the first time that loop is executed low == 0, and so i == 0 on the
first iteration, and so the conditional is

   if x >= data[0-1]

That's referencing data[-1], which is the very last element in data -
which has nothing to do with the CHUNK you're trying to sort at the
time.
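
A tiny illustration of that wraparound, using made-up values rather than the
data from the original benchmark:

```python
data = [50, 10, 40, 30, 20]
low, high = 0, 3  # the chunk being sorted is data[0:3]

# With i == low == 0, data[i - 1] is data[-1]: the last element of the
# whole list, which is not part of the chunk at all.
wrapped = data[low - 1]

# Starting at low + 1 keeps every data[i - 1] inside the chunk.
pairs = [(data[i - 1], data[i]) for i in range(low + 1, high)]
```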

So there are a number of errors here, which makes it that much harder
to sort out (pun intended) what you're trying to do.  It would
help you to add some asserts to verify your code is doing what you
_hope_ it's doing.  For example, add

assert data2[low: high] == sorted(data[low: high])
assert len(data2) == high

to the end of your `sample` loop, and similarly for data3 in your
`new` loop.  Until those asserts all pass, you're not testing code
that's actually sorting correctly.  Repair the errors and you almost
certainly won't find `new` running over 10 times faster than `sample`
anymore.  I don't know what you _will_ discover, though.  If the code
doesn't have to sort correctly, there are much simpler ways to make it
run _very_ much faster ;-)
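
As a sketch of what a chunk-wise binary insertion sort with those asserts in
place could look like: this is not the code under discussion, which is not
shown in this thread. The names CHUNK, SIZE and sort_chunks are assumptions,
and the search uses the stdlib bisect module restricted to the current chunk.

```python
import bisect
import random

CHUNK = 100
SIZE = 1000


def sort_chunks(data):
    """Sort each CHUNK-sized slice of data independently into a new list."""
    out = []
    for low in range(0, len(data), CHUNK):
        high = min(low + CHUNK, len(data))
        for i in range(low, high):
            x = data[i]
            # Search only out[low:i], the already sorted part of this chunk.
            index = bisect.bisect_right(out, x, low, i)
            out.insert(index, x)
        # The checks suggested above: this chunk is sorted and nothing
        # extra was appended.
        assert out[low:high] == sorted(data[low:high])
        assert len(out) == high
    return out


random.seed(0)
sample = [random.random() for _ in range(SIZE)]
result = sort_chunks(sample)
```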


Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread nha pham
Thank you very much. I am very happy that I got a reply from Tim Peter.

You are correct, my mistake.

The python code should be:
    for i in range(low+1, high):  # because we already added data[low]
        x = data[i]
        if x >= data[i-1]:

After I fix it, here is the result:

random array 10^6:
Old binsort:  1.3322
New binsort: 1.0015
ratio: 0.33

You are right, it is not ten times faster anymore. I will update other
results soon.

I did check the results of the two sorting methods many times to make sure
they are the same. I left the asserts out only because I do not know how to
put assert into the timeit.Timer class. I am pretty sure about this.

I will try to write the proof more clearly; sorry for the inconvenience.

Thank you very much.
Nha Pham.

On Mon, Mar 9, 2015 at 9:27 PM, Tim Peters  wrote:

> [nha pham ]
> > Statement_1: With an array of size N or less than N, we need at most
> log2(N)
> > comparisons to find a value
> > (or a position, incase the search miss), using the binary search
> algorithm.
> >
> > proof: This statement is trivia, and I believe, someone outthere already
> > proved it.
>
> Sorry for the quick message here.  It's just a simple point where it
> will pay not to get off on a wrong foot ;-)
>
> Correct:  for an array of size N, binary search can require as many as
> ceiling(log2(N+1)) comparisons.
>
> That's because there are N+1 possible results for an array of size N.
> For example, for an array of size 3, [A, B, C], "the answer" may be
> "before A", "between A and B", "between B and C", or "after C".  3
> elements, 3+1 = 4 possible results.  log2(3) comparisons are not
> enough to distinguish among 4 results.
>
> Make it trivial, an array of length 1.  Then 1 comparison is obviously
> necessary and sufficient in all cases.  And, indeed,
> ceiling(log2(1+1)) = 1.  log2(1) equals 0, too small.
>
> For the rest, I haven't been able to understand your description or
> your pseudo-code.  I'll try harder.  Some things clearly aren't doing
> what you _intend_ them to do.  For example, in your Python code, each
> time through the outer loop you're apparently trying to sort the next
> CHUNK elements, but you end up appending CHUNK+1 values to data2 (or
> data3).
>
> Or in this part:
>
> for i in range(low,high):
> x = data[i]
> if x >= data[i-1]:
>
> the first time that loop is executed low == 0, and so i == 0 on the
> first iteration, and so the conditional is
>
>if x >= data[0-1]
>
> That's referencing data[-1], which is the very last element in data -
> which has nothing to do with the CHUNK you're trying to sort at the
> time.
>
> So there are a number of errors here, which makes it that much harder
> to sort out (pun intended ) what you're trying to do.  It would
> help you to add some asserts to verify your code is doing what you
> _hope_ it's doing.  For example, add
>
> assert data2[low: high] == sorted(data[low: high])
> assert len(data2) == high
>
> to the end of your `sample` loop, and similarly for data3 in your
> `new` loop.  Until those asserts all pass, you're not testing code
> that's actually sorting correctly.  Repair the errors and you almost
> certainly won't find `new` running over 10 times faster than `sample`
> anymore.  I don't know what you _will_ discover, though.  If the code
> doesn't have to sort correctly, there are much simpler ways to make it
> run _very_ much faster ;-)
>


Re: [Python-Dev] Tunning binary insertion sort algorithm in Timsort.

2015-03-09 Thread Tim Peters
[nha pham ]
> Thank you very much. I am very happy that I got a reply from Tim Peter.

My pleasure to speak with you too :-)


> You are correct, my mistake.
>
> The python code should be:
> for i in range(low+1,high):  //because we already add
> data[low]
> x = data[i]
> if x >= data[i-1]:
>
> After I fix it, here is the result:
>
> random array 10^6:
> Old binsort:  1.3322
> New binsort: 1.0015
> ratio: 0.33
>
> You are right, it is not ten times faster anymore. I will update other
> results soon.
>
> I do check the result of two sorting methods many times to make sure they
> are the same. It is just because I do not know how to put assert into the
> timeit.Timer class.

`assert` is just another Python statement.  You simply add it to the
code - there's nothing tricky about this.  You could, e.g., simply
copy and paste the `assert`s I suggested last time.
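
For instance, a function containing asserts can be handed to timeit.Timer
like any other callable. This is only a sketch: checked_sort is a made-up
name, and sorted() stands in for whichever sort is being measured.

```python
import random
import timeit


def checked_sort(data):
    result = sorted(data)  # stand-in for the sort under test
    # The asserts are ordinary statements; they run on every timed call.
    assert len(result) == len(data)
    assert all(a <= b for a, b in zip(result, result[1:]))
    return result


data = [random.random() for _ in range(1000)]
elapsed = timeit.Timer(lambda: checked_sort(data)).timeit(number=10)
```

The asserts add to the measured time, so one option is to keep them in while
debugging and take them out for the final timing runs.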

Before you do, try adding `print index` to your inner loops, and
make SIZE much smaller (say, 1000) so you're not overwhelmed with
output.  You'll be surprised by what you see on the second (and
following) CHUNKs.  For example, in both `sample` and `new` it will
print 900 ninety nine times in a row when doing the last CHUNK.  The
code still isn't doing what you intend.  Until it does, timing it
makes little sense :-)

> I am pretty sure about this.

Note that I'm talking about the Python code here, the code you run
through timeit.  You cannot have checked the results of running _that_
specific code, because it doesn't work at all.  You may have checked
_other_ code many times.  We may get to that later, but since I speak
Python, I'm not going to understand what you're doing until we have
Python code that works ;-)