Bugs item #1560161 was opened at 2006-09-17 14:09
Message generated for change (Comment added) made by einsteinmg
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1560161&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
>Resolution: None
Priority: 5
Submitted By: Michael Gebetsroither (einsteinmg)
Assigned to: Nobody/Anonymous (nobody)
Summary: Better/faster implementation of os.path.split
Initial Comment:
hi,
os.path.split performs poorly on long pathnames:

    def split(p):
        i = p.rfind('/') + 1
        head, tail = p[:i], p[i:]
        if head and head != '/'*len(head):
            head = head.rstrip('/')
        return head, tail

Especially '/'*len(head): it constructs an unnecessary string that can be
thousands of characters long.
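A rough way to see the cost being described is to time only the temporary string that the original check builds, for heads of increasing length. This is a sketch under stated assumptions: the head contents and lengths are arbitrary, time_slash_string is a hypothetical helper name, and absolute numbers depend on the interpreter and machine.

```python
import timeit

def time_slash_string(n, number=1_000, repeat=5):
    """Best-of-repeat time for `number` evaluations of "/" * len(head).

    time_slash_string is a hypothetical helper; head is an arbitrary
    string of length n that is not made up only of slashes.
    """
    head = "/d" * (n // 2)
    # Only the temporary string from the original check is timed here.
    return min(timeit.repeat(lambda: "/" * len(head), number=number, repeat=repeat))

for n in (10, 1_000, 100_000):
    t = time_slash_string(n)
    print(f"len(head) = {n:>7}: {t * 1e3:.3f} ms per 1000 constructions")
```

Comparing the rows shows how the cost of the temporary string grows with len(head); it says nothing yet about how much of a full split call that cost accounts for.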
A better check would be:

    if head and len(head) != head.count('/'):

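The slash-counting check can be substituted into the original function without changing anything else. The sketch below (split_orig and split_count are hypothetical names) checks that the two agree on a few edge cases. Note that str.count still scans all of head, so this avoids the temporary string but not the linear pass over head.

```python
def split_orig(p):
    # The original posixpath.split logic quoted in this report.
    i = p.rfind("/") + 1
    head, tail = p[:i], p[i:]
    if head and head != "/" * len(head):
        head = head.rstrip("/")
    return head, tail

def split_count(p):
    # Same logic, but the all-slash test counts slashes
    # instead of building "/" * len(head).
    i = p.rfind("/") + 1
    head, tail = p[:i], p[i:]
    if head and len(head) != head.count("/"):
        head = head.rstrip("/")
    return head, tail

samples = ["", "foo", "/", "//", "///", "/foo", "//foo",
           "foo/bar", "/foo/bar/", "/foo//bar", "foo//", "////a//b"]
for s in samples:
    assert split_count(s) == split_orig(s), s
print("split_count matches split_orig on", len(samples), "samples")
```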
But what is the check 'if head and head != '/'*len(head):' for at all?
IMHO it is unnecessary: all it does is rstrip '/' from head when head
exists and does not consist entirely of '/'.
A simpler approach would be to rstrip '/' from head and, if head is then
empty, set it to a single '/'.
That would have the same effect, because a single '/' denotes the same
path as '/'*len(head).
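One nuance to the equivalence argument: POSIX treats three or more leading slashes as a single slash, but leaves the meaning of exactly two leading slashes implementation-defined, and posixpath.normpath preserves that case. A quick check:

```python
import posixpath

# Use posixpath explicitly so the result does not depend on the host OS.
for p in ["/", "//", "///", "////", "//usr", "///usr"]:
    print(f"{p!r:>9} -> {posixpath.normpath(p)!r}")
```

Exactly two leading slashes survive normalization ('//' and '//usr' come back unchanged), while three or more collapse to one.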

    def split(p):
        i = p.rfind('/') + 1
        head, tail = p[:i], p[i:]
        head = head.rstrip('/')
        if not head:
            head = '/'
        return head, tail

Such an implementation would be much faster for long pathnames.
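Before adopting the rstrip version, it may be worth comparing it with the original on paths that contain no '/' at all. In the sketch below, split_orig and split_proposed transcribe the two functions from this report; split_guarded is a hypothetical variant, not part of the report, that only falls back to '/' when the path actually had a slash-only head. For a bare name such as 'foo', the proposed version returns ('/', 'foo') while the original returns ('', 'foo').

```python
def split_orig(p):
    # Original posixpath.split logic, as quoted in this report.
    i = p.rfind("/") + 1
    head, tail = p[:i], p[i:]
    if head and head != "/" * len(head):
        head = head.rstrip("/")
    return head, tail

def split_proposed(p):
    # The rstrip-then-fallback version proposed in this report, unchanged.
    i = p.rfind("/") + 1
    head, tail = p[:i], p[i:]
    head = head.rstrip("/")
    if not head:
        head = "/"
    return head, tail

def split_guarded(p):
    # Hypothetical variant: only fall back to "/" when p[:i] was non-empty,
    # i.e. when the path really had a run of slashes before the tail.
    i = p.rfind("/") + 1
    head, tail = p[:i], p[i:]
    if head:
        head = head.rstrip("/") or "/"
    return head, tail

for s in ["foo", "", "/", "//", "///a", "/a/b", "a/b//c", "//a"]:
    print(f"{s!r:9} orig={split_orig(s)!r} "
          f"proposed={split_proposed(s)!r} guarded={split_guarded(s)!r}")
```

split_guarded still differs from the original when the head consists of two or more slashes (for example '//a'), which is exactly the simplification the report argues for.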
greets,
michael
----------------------------------------------------------------------
>Comment By: Michael Gebetsroither (einsteinmg)
Date: 2006-09-18 11:25
Message:
Logged In: YES
user_id=1600082
The patch passes all unit tests for posixpath.
basename(310) means basename called with a path of length 310.

    call                         sum              min
    posixpath.basename(310)      0.0453672409058  4.19616699219e-05
    posixpath_orig.basename(310) 0.15571641922    0.000146865844727
    posixpath.basename(106)      0.0432558059692  4.10079956055e-05
    posixpath_orig.basename(106) 0.128361940384   0.000113964080811
    posixpath.basename(21)       0.0422701835632  4.10079956055e-05
    posixpath_orig.basename(21)  0.118340730667   0.000111818313599

So the optimized basename is about 3 times faster than the old one, and
its advantage grows with path length.
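The report does not show its benchmark harness, so the sketch below is only an assumed reconstruction in the same spirit: it times one plausible optimized basename that slices the tail directly (basename_fast, a hypothetical name) against one built on the original split, and prints a sum and a minimum over repeated runs. The repeat and number values and the generated path are arbitrary. Later CPython releases implement posixpath.basename in essentially this rfind-and-slice form.

```python
import timeit

def split_orig(p):
    # Original posixpath.split logic quoted in this report.
    i = p.rfind("/") + 1
    head, tail = p[:i], p[i:]
    if head and head != "/" * len(head):
        head = head.rstrip("/")
    return head, tail

def basename_orig(p):
    # Original approach: compute both halves, keep only the tail.
    return split_orig(p)[1]

def basename_fast(p):
    # Hypothetical optimized version: slice the tail directly, never build head.
    return p[p.rfind("/") + 1:]

def make_path(n):
    # An arbitrary path of exactly n characters (n >= 5) ending in "/file".
    return ("/dir" * n)[: n - 5] + "/file"

def report(fn, path, repeat=5, number=2_000):
    # Sum and min over repeated timing runs; repeat and number are arbitrary.
    times = timeit.repeat(lambda: fn(path), number=number, repeat=repeat)
    print(f"{fn.__name__}({len(path)}): sum = {sum(times):.4g} min = {min(times):.4g}")

for n in (310, 106, 21):
    path = make_path(n)
    assert basename_fast(path) == basename_orig(path) == "file"
    report(basename_fast, path)
    report(basename_orig, path)
```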

    call                        sum              min
    posixpath.dirname(310)      0.124966621399   0.000120878219604
    posixpath_orig.dirname(310) 0.156893730164   0.000144958496094
    posixpath.dirname(106)      0.0986065864563  9.10758972168e-05
    posixpath_orig.dirname(106) 0.117443084717   0.000113964080811
    posixpath.dirname(21)       0.0905299186707  8.89301300049e-05
    posixpath_orig.dirname(21)  0.118889808655   0.000111103057861

The optimized dirname is also faster, though not by as much, and it saves
an allocation, which could save a few cycles later.
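The allocation being saved is presumably the tail slice (and the result tuple) that split builds even though dirname discards them. A minimal sketch under that assumption, with hypothetical names dirname_fast and dirname_orig; it keeps the original head logic so the results are identical. Later CPython releases structure posixpath.dirname in essentially the same way.

```python
def split_orig(p):
    # Original posixpath.split logic quoted in this report.
    i = p.rfind("/") + 1
    head, tail = p[:i], p[i:]
    if head and head != "/" * len(head):
        head = head.rstrip("/")
    return head, tail

def dirname_orig(p):
    # Original approach: build (head, tail) and discard the tail.
    return split_orig(p)[0]

def dirname_fast(p):
    # Same head logic, but the tail slice p[i:] and the
    # result tuple are never created.
    i = p.rfind("/") + 1
    head = p[:i]
    if head and head != "/" * len(head):
        head = head.rstrip("/")
    return head

for s in ["", "foo", "/", "//", "/foo", "/foo/bar", "foo/bar/", "//a//b"]:
    assert dirname_fast(s) == dirname_orig(s), s
    print(f"{s!r:12} -> {dirname_fast(s)!r}")
```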
----------------------------------------------------------------------
Comment By: Michael Gebetsroither (einsteinmg)
Date: 2006-09-18 11:08
Message:
Logged In: YES
user_id=1600082
Sorry, I haven't benchmarked my solution yet.
----------------------------------------------------------------------
_______________________________________________
Python-bugs-list mailing list