[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Ben Hoyt
Ben Hoyt added the comment: Yep, I'm good -- go ahead and close! -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscri

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread STINNER Victor
STINNER Victor added the comment: Thanks for your great work Ben! I close the issue. The PEP 471 has already the Final status. -- resolution: -> fixed status: open -> closed ___ Python tracker ___

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread STINNER Victor
STINNER Victor added the comment: Ben: do you think that we are done with this issue? can I close it? -- ___ Python tracker ___ ___ Py

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Ben Hoyt
Ben Hoyt added the comment: Thanks for the explanation (and the comment fix). > What's your point about complexity? Would you like to drop os.scandir > changes in os.walk(), or do you have a simpler version to propose? No, not at all! I was just noting it and trying to brainstorm any ways to s

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-30 Thread Roundup Robot
Roundup Robot added the comment: New changeset 1ad3d8d82b18 by Victor Stinner in branch 'default': Issue #23605: Fix typo in an os.walk() comment https://hg.python.org/cpython/rev/1ad3d8d82b18 -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-27 Thread STINNER Victor
STINNER Victor added the comment: Ben Hoyt added the comment: > 1) The new implementation is more complex. Of course, most of this is necessary due to the topdown directory issue. Sure, correctness matters more than performances. > However, one thing I'm not sure about is why you create scandi

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-26 Thread Ben Hoyt
Ben Hoyt added the comment: Victor, great work on pushing this out, especially with the "modifying the directories" fix. (And thanks Serhiy for pushing on correctness here.) Couple of comments/questions about your new os.walk() implementation. 1) The new implementation is more complex. Of cour

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-18 Thread STINNER Victor
STINNER Victor added the comment: I commited fast_bottom-up.patch to fix the regression of os.walk(). > Could you please add a test based on my example (i.e. converting symlinks to > a directory during walking) and may be other (creating new directory and > adding it to the dirs list)? Sorry,

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-18 Thread Roundup Robot
Roundup Robot added the comment: New changeset 4fb829f8c04d by Victor Stinner in branch 'default': Issue #23605: Fix os.walk(topdown=True), don't cache entry.is_symlink() because https://hg.python.org/cpython/rev/4fb829f8c04d -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread STINNER Victor
STINNER Victor added the comment: >> Are you testing the top-bottom or bottom-up? > My benchmark.py calls os.walk() with topdown=True, which is the default. Is it worth to mention in the os.walk() doc that topdown=False can be faster? -- ___ Python t

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread Ben Hoyt
Ben Hoyt added the comment: > I don't understand your benchmark. Do you mean that os.walk() is slower > with fast_bottom-up.patch because islink() is called or because I replaced > "for entry in scandir(top):" with "entry = next(scandir_it)"? No, sorry, I was making two separate comments: 1) the

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: You fast_bottom-up looks awesome, but much more correct. This is what I meant when warned that correct implementations with scandir() will be complex. Could you please add a test based on my example (i.e. converting symlinks to a directory during walking) an

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-13 Thread STINNER Victor
STINNER Victor added the comment: I don't understand your benchmark. Do you mean that os.walk() is slower with fast_bottom-up.patch because islink() is called or because I replaced "for entry in scandir(top):" with "entry = next(scandir_it)"? Are you testing the top-bottom or bottom-up? Here

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Ben Hoyt
Ben Hoyt added the comment: Thanks, Victor. I haven't quite grokked all the changes here -- it's gotten somewhat more complicated with the scandir_it and manual next() call -- but I ran some benchmarks (via a hacked version of my scandir project's benchmark.py). The results were surprising, a

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor
STINNER Victor added the comment: And now something completly differenet: fast_bottom-up.patch, fast bottom-up, but "slow" top-down. In bottom-up mode (topdown=False), use entry.path and entry.is_symlink() => optimization compared to Python 3.4 (avoid one call to os.stat()). In top-down mode

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Happy? :-) No, it doesn't fix my example. :-( I see following possibilities: 1. Partially revert the patch and always call path.islink(). I will not kill all the benefit of using os.scandir(), because there is a benefit from avoiding path.isdir() and in

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor
STINNER Victor added the comment: walk_added_symlink_to_dir-2.patch: Ok, here is a new patch, now with tests. Without the fix on os.walk(), WalkTests.test_add_dir_symlink() fails, whereas FwalkTests.test_add_dir_symlink() pass. -- Added file: http://bugs.python.org/file38451/walk_added

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Roundup Robot
Roundup Robot added the comment: New changeset c06ebb57d4ed by Victor Stinner in branch '3.4': Issue #23605: Refactor os.walk() tests to also run them on os.fwalk() https://hg.python.org/cpython/rev/c06ebb57d4ed -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread STINNER Victor
STINNER Victor added the comment: > 1) Serhiy's point about not needing to build the symlinks set when > followlinks is True is a good one, because it'll never get used. Just a "if > not followlinks: ..." around that try/except would do it. Though this is a > small optimization, as I expect 95

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-12 Thread Roundup Robot
Roundup Robot added the comment: New changeset 1a972140ab62 by Victor Stinner in branch 'default': Issue #23605: os.walk() doesn't need to call entry.is_symlink() if followlinks https://hg.python.org/cpython/rev/1a972140ab62 -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt
Ben Hoyt added the comment: Note specifically in the unsymlink() example Serhiy gave, you probably don't *want* os.walk() to recurse into a symlink-to-a-directory-that's-changed-to-a-directory, and you probably haven't thought about it doing so, so maybe the new behaviour is fine? --

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt
Ben Hoyt added the comment: To Victor and Serhiy: 1) Serhiy's point about not needing to build the symlinks set when followlinks is True is a good one, because it'll never get used. Just a "if not followlinks: ..." around that try/except would do it. Though this is a small optimization, as I

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-11 Thread Ben Hoyt
Ben Hoyt added the comment: @Scott Dial: just a response about this benchmark: note that this benchmark isn't really valid, as it says "Using slower ctypes version of scandir", which is the slow all-Python version. You want it to be saying "Using Python 3.5's builtin os.scandir()". --

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: My first note was about efficiency of the implementation. When followlinks is true, you can avoid testing entry.is_link() and creating the symlinks set. The new implementation of os.walk() changes a behavior. The problem is that a symlink to a directory can

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread STINNER Victor
STINNER Victor added the comment: Serhiy Storchaka added the comment: > When followlinks is true, symlinks is not needed. Hum, it's not easy to understand your code. I guess that the problem is that a symlink to a directory can become something else (not a directory or a symlink to a directory).

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: When followlinks is true, symlinks is not needed. But I this commit breaks a code like following: def unsymlink(top): for root, dirs, files in os.walk(top): for name in dirs: path = os.path.join(root, name) if os.path.islin

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Roundup Robot
Roundup Robot added the comment: New changeset 50e32059a404 by Victor Stinner in branch '3.4': Issue #23605: os.walk() doc now mentions shutil.rmtree() in the last example https://hg.python.org/cpython/rev/50e32059a404 -- ___ Python tracker

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread STINNER Victor
STINNER Victor added the comment: My suggestion to add a new walk_dirs list is wrong: os.walk() documentation explicitly says that the dirs list can be modified to delete some directories: https://docs.python.org/dev/library/os.html#os.walk """ When topdown is True, the caller can modify the dir

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-10 Thread Roundup Robot
Roundup Robot added the comment: New changeset f805fdacdfe0 by Victor Stinner in branch 'default': Issue #23605: os.walk() now calls os.scandir() instead of os.listdir(). https://hg.python.org/cpython/rev/f805fdacdfe0 -- nosy: +python-dev ___ Python t

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread Scott Dial
Scott Dial added the comment: I cloned https://github.com/benhoyt/scandir @ 494f34d784 and ran benchmark.py on a couple systems that are Linux backed by a couple different NFS servers of various quality. First, a Linux VM backed by a Mac OS X NFS server backed by a SSD: $ python benchmark.py

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread STINNER Victor
STINNER Victor added the comment: Note: glob.glob() might be faster with os.scandir() on very large directories. But on my benchmarks, listdir() was always faster than scandir() when only the name of directory entries i used. Maybe we need an option glob.glob(pattern, scandir=True) :-p --

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-09 Thread STINNER Victor
STINNER Victor added the comment: I reviewed your patch. -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe: htt

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-07 Thread Ben Hoyt
Ben Hoyt added the comment: Attaching a first cut at this -- basically the implementation I use for walk() in scandir.py on GitHub. One thing that's really weird to me: are the os.walk() unit tests actually being run? In test_os.py, I notice everything's in WalkTest.setUp, which is kinda weir

[issue23605] Use the new os.scandir() function in os.walk()

2015-03-07 Thread STINNER Victor
New submission from STINNER Victor: The PEP 471 announces a huge speed up of the os.walk() function, but os.walk() was not modified yet. I just merged the implementation of os.scandir() (issue #22524), it's now time to work on os.walk(). We need a patch and benchmarks on Linux and Windows. On