[issue32040] Sorting pathlib.Paths does not give the same order as sorting the (string) filenames of those pathlib.Paths
New submission from QbLearningPython :
While testing a module, I found some weird behaviour in the pathlib package. I
had a list of pathlib.Paths and called sorted() on it. I assumed that sorting a
list of Paths would give the same order as sorting the list of their
corresponding (string) filenames, but that is not the case.
I ran the following example:
==
from pathlib import Path
# order string filenames
filenames_for_testing = (
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/another.txt',
    '/spam/binary.bin',
    '/spam/spams/spam.ttt',
    '/spam/spams/spam01.txt',
    '/spam/spams/spam02.txt',
    '/spam/spams/spam03.ppp',
    '/spam/spams/spam04.doc',
)
sorted_filenames = sorted(filenames_for_testing)
# output ordered list of string filenames
print()
print("Ordered list of string filenames:")
print()
for element in sorted_filenames:
    print(f'\t{element}')
print()
# order paths (built from the same string filenames)
paths_for_testing = [
    Path(filename)
    for filename in filenames_for_testing
]
sorted_paths = sorted(paths_for_testing)
# output ordered list of pathlib.Paths
print()
print("Ordered list of pathlib.Paths:")
print()
for element in sorted_paths:
    print(f'\t{element}')
print()
# compare
print()
if sorted_filenames == [str(path) for path in sorted_paths]:
    print('Ordered lists of string filenames and pathlib.Paths are EQUAL.')
else:
    print('Ordered lists of string filenames and pathlib.Paths are DIFFERENT.')
    for element in range(0, len(sorted_filenames)):
        if sorted_filenames[element] != str(sorted_paths[element]):
            print()
            print('First different element:')
            print(f'\tElement #{element}')
            print(f'\t{sorted_filenames[element]} != {sorted_paths[element]}')
            break
print()
==
The output of this script was:
==
Ordered list of string filenames:
/spam/another.txt
/spam/binary.bin
/spam/spam.txt
/spam/spams.txt
/spam/spams/spam.ttt
/spam/spams/spam01.txt
/spam/spams/spam02.txt
/spam/spams/spam03.ppp
/spam/spams/spam04.doc
Ordered list of pathlib.Paths:
/spam/another.txt
/spam/binary.bin
/spam/spam.txt
/spam/spams/spam.ttt
/spam/spams/spam01.txt
/spam/spams/spam02.txt
/spam/spams/spam03.ppp
/spam/spams/spam04.doc
/spam/spams.txt
Ordered lists of string filenames and pathlib.Paths are DIFFERENT.
First different element:
Element #3
/spam/spams.txt != /spam/spams/spam.ttt
==
As you can see, '/spam/spams.txt' ends up in a different position when sorting
pathlib.Paths than when sorting string filenames.
I think it is weird that sorting pathlib.Paths yields a different result than
sorting their string filenames. I think that pathlib.Paths should be sorted in
the alphabetical order of their corresponding filenames.
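The point where the two orders diverge can be reproduced with three of the names above. This is a minimal sketch, assuming that pathlib compares paths component by component (roughly like comparing their parts tuples) rather than as flat strings; that is an assumption about the implementation, not something stated in this report. PurePosixPath is used so that the result does not depend on the platform.

```python
from pathlib import PurePosixPath

names = ['/spam/spams.txt', '/spam/spams/spam.ttt', '/spam/spam.txt']

# Plain string comparison: '.' (0x2E) sorts before '/' (0x2F),
# so '/spam/spams.txt' comes before '/spam/spams/spam.ttt'.
by_string = sorted(names)

# Path comparison: the component 'spams' is a prefix of 'spams.txt',
# so the entry under '/spam/spams/' comes before '/spam/spams.txt'.
by_path = [str(p) for p in sorted(PurePosixPath(n) for n in names)]

# For these inputs, sorting by the tuple of components gives the
# same order as sorting the path objects directly.
by_parts = sorted(names, key=lambda n: PurePosixPath(n).parts)

print(by_string)
print(by_path)
print(by_path == by_parts)
```

The only difference between the two results is where '/spam/spams.txt' lands relative to the contents of '/spam/spams/', which matches the output shown above.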
Thank you.
--
components: Extension Modules
messages: 306304
nosy: QbLearningPython
priority: normal
severity: normal
status: open
title: Sorting pathlib.Paths does not give the same order as sorting the (string)
filenames of those pathlib.Paths
type: behavior
versions: Python 3.6
___
Python tracker
<https://bugs.python.org/issue32040>
___
___
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue32040] Sorting pathlib.Paths does not give the same order as sorting the (string) filenames of those pathlib.Paths
QbLearningPython added the comment:
Thanks, serhiy.storchaka, for your answer.
I am not fully convinced. You have described the current behaviour of the
pathlib package. But let me ask: should this be the desired behaviour?
Since string filenames and pathlib.Paths are different ways to refer to the
same object (a path in a filesystem), should they not behave the same way when
sorted?
You pointed out that the current behaviour is a "more natural order" for
pathlib.Paths. I am not truly sure about that. Can you please provide a
citation or additional information about that?
Thank you.
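If the plain string order is what a caller wants, it can be requested explicitly with a sort key. This is a minimal sketch, assuming Python 3.6 or later (where os.fspath() is available); the sample paths are taken from the original report.

```python
import os
from pathlib import PurePosixPath

paths = [
    PurePosixPath('/spam/spams/spam.ttt'),
    PurePosixPath('/spam/spams.txt'),
    PurePosixPath('/spam/spam.txt'),
]

# Default ordering: compares the path objects themselves.
default_order = [str(p) for p in sorted(paths)]

# Explicit string ordering: compare the string form of each path.
string_order = [str(p) for p in sorted(paths, key=os.fspath)]

print(default_order)
print(string_order)
```

Using a key leaves the choice of ordering to the caller and does not depend on how Path comparison is defined.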
[issue33660] pathlib.Path.resolve() returns path with double slash when resolving a relative path in root directory
New submission from QbLearningPython :
I have recently found some weird behaviour while trying to resolve a relative
path located in the root directory on macOS.
I tried to resolve Path('spam') and the interpreter answered
PosixPath('//spam'), with a double slash for the root, instead of the
PosixPath('/spam') that I expected.
I think that this is a bug.
I ran the interpreter from the root directory (cd /; python). Once the
interpreter was running, this is what I did:
>>> import pathlib
>>> pathlib.Path.cwd()
PosixPath('/')
# since the interpreter has been launched from root
>>> p = pathlib.Path('spam')
>>> p
PosixPath('spam')
# just for checking
>>> p.resolve()
PosixPath('//spam')
# beware of double slash instead of single slash
I also checked the behaviour of Path.resolve() in a non-root directory (in my
case launching the interpreter from /Applications).
>>> import pathlib
>>> pathlib.Path.cwd()
PosixPath('/Applications')
>>> p = pathlib.Path('eggs')
>>> p
PosixPath('eggs')
>>> p.resolve()
PosixPath('/Applications/eggs')
# just one slash as root in this case (as it should be)
So it seems that double slashes appear only when resolving relative paths in
the root directory.
More examples are:
>>> pathlib.Path('spam/egg').resolve()
PosixPath('//spam/egg')
>>> pathlib.Path('./spam').resolve()
PosixPath('//spam')
>>> pathlib.Path('./spam/egg').resolve()
PosixPath('//spam/egg')
but
>>> pathlib.Path('').resolve()
PosixPath('/')
>>> pathlib.Path('.').resolve()
PosixPath('/')
Intriguingly,
>>> pathlib.Path('spam').resolve().resolve()
PosixPath('/spam')
# 'spam'.resolve = '//spam'
# '//spam'.resolve = '/spam'!!!
>>> pathlib.Path('//spam').resolve()
PosixPath('/spam')
I have found the same behaviour in several Python versions:
Python 3.6.5 (default, May 15 2018, 08:20:57)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Python 3.4.8 (default, Mar 29 2018, 16:18:25)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Python 3.5.5 (default, Mar 29 2018, 16:22:58)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Python 3.7.0b4 (default, May 4 2018, 22:01:49)
[Clang 9.1.0 (clang-902.0.39.1)] on darwin
All running on: macOS High Sierra 10.13.4 (17E202)
There is also confirmation of the same issue on Ubuntu 16.04 (Python 3.5.2) and
openSUSE Tumbleweed (Python 3.6.5).
I have searched for information on this issue but did not find anything
useful.
The Python docs (https://docs.python.org/3/library/pathlib.html) talk about "UNC
shares", but this is not the case here (I am using a macOS HFS+ filesystem).
PEP 428 (https://www.python.org/dev/peps/pep-0428/) says:
Multiple leading slashes are treated differently depending on the path
flavour. They are always retained on Windows paths (because of the UNC
notation):
>>> PureWindowsPath('//some/path')
PureWindowsPath('//some/path/')
On POSIX, they are collapsed except if there are exactly two leading
slashes, which is a special case in the POSIX specification on pathname
resolution [8] (this is also necessary for Cygwin compatibility):
>>> PurePosixPath('///some/path')
PurePosixPath('/some/path')
>>> PurePosixPath('//some/path')
PurePosixPath('//some/path')
I do not think that this is related to the aforementioned issue.
However, I also checked the POSIX specification link
(http://pubs.opengroup.org/onlinepubs/009...#tag_04_11) and found:
A pathname that begins with two successive slashes may be interpreted in an
implementation-defined manner, although more than two leading slashes shall be
treated as a single slash.
I do not really think that this can cause double slashes when resolving a
relative path on macOS.
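The special status of exactly two leading slashes can be checked without touching the filesystem. This is a small sketch using posixpath.normpath() and PurePosixPath; it only shows that '//spam' and '/spam' are kept as distinct paths, not why resolve() produces the former.

```python
import posixpath
from pathlib import PurePosixPath

# Exactly two leading slashes are preserved; three or more collapse to one.
two = posixpath.normpath('//spam')
three = posixpath.normpath('///spam')

# pathlib follows the same rule, so the two spellings compare as different paths.
same = PurePosixPath('//spam') == PurePosixPath('/spam')

print(two, three, same)
```

Because the two spellings do not compare equal, a resolve() result of '//spam' is more than a cosmetic difference from '/spam'.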
So, I think that this issue could be a real bug in the pathlib.Path.resolve()
method, specifically in the POSIX flavour.
A user of Python Forum (killerrex) and I have traced the bug to
Lib/pathlib.py:319 in the Python 3.6 repository
https://github.com/python/cpython/blob/3...pathlib.py.
Specifically, in line 319:
newpath = path + sep + name
For pathlib.Path('spam').resolve() in the root directory, newpath is '//spam'
since:
path is '/'
sep is '/'
name is 'spam'
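The concatenation above can be reproduced in isolation. The sketch below is not the code from Lib/pathlib.py and not a proposed patch; join_naive and join_checked are hypothetical helper names, used only to show how the separator gets doubled when path already ends with sep and one way that could be avoided.

```python
def join_naive(path, sep, name):
    # Same shape as the quoted line: always inserts a separator.
    return path + sep + name


def join_checked(path, sep, name):
    # Hypothetical variant: skip the extra separator when path
    # already ends with one (as it does for the root directory).
    if path.endswith(sep):
        return path + name
    return path + sep + name


print(join_naive('/', '/', 'spam'))
print(join_checked('/', '/', 'spam'))
print(join_checked('/Applications', '/', 'eggs'))
```

The non-root case is unaffected by the check, which is consistent with the double slash showing up only when the working directory is '/'.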
killerrex has suggested two solutions:
1) from line 345
base = '' if path.is_absolute() else os.getcwd()
if base == sep
