Quoting Josh Triplett (2020-12-31 07:38:58)
> With a large number of path exclusions specified (around 500),
> mmtarfilter starts to become a noticeable performance bottleneck.
>
> It looks like mmtarfilter checks each file linearly against each filter
> using fnmatch.
>
> Python's fnmatch implementation works by translating shell patterns into
> regular expressions. Python also provides a function to do that
> translation separate from fnmatch. One fairly simple optimization would
> be to walk the list of patterns *once*, take each series of consecutive
> exclude or include filters, turn each one into a regex, join all the
> regexes in each group together using (?:...)|(?:...) , and compile the
> resulting regexes once. That should provide a substantial performance
> improvement.
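
The approach described above might look roughly like the following. This is only a sketch, assuming a recent Python 3 where fnmatch.translate() returns a self-contained pattern. The (kind, pattern) tuple layout, the names compile_filters and is_excluded, and the "last matching group wins" semantics are assumptions made for illustration, not taken from mmtarfilter itself.

```python
import fnmatch
import re
from itertools import groupby


def compile_filters(filters):
    """Fuse consecutive filters of the same kind into one compiled regex.

    `filters` is a hypothetical list of (kind, pattern) tuples, where kind
    is "exclude" or "include" and pattern is a shell-style glob.
    """
    compiled = []
    # groupby() only merges *adjacent* items with the same key, so the
    # relative order of exclude and include groups is preserved.
    for kind, group in groupby(filters, key=lambda f: f[0]):
        alternation = "|".join(
            "(?:{})".format(fnmatch.translate(pattern)) for _, pattern in group
        )
        compiled.append((kind, re.compile(alternation)))
    return compiled


def is_excluded(path, compiled):
    """Return True if `path` ends up excluded (assumed: last match wins)."""
    excluded = False
    for kind, regex in compiled:
        if regex.match(path):
            excluded = kind == "exclude"
    return excluded


filters = [
    ("exclude", "/usr/share/doc/*"),
    ("exclude", "/usr/share/man/*"),
    ("include", "/usr/share/doc/*/copyright"),
]
compiled = compile_filters(filters)
```

With this, each tar member is tested against one regex per group rather than one fnmatch call per pattern, so 500 consecutive exclusions cost a single match() call. Note that fnmatch.translate() corresponds to fnmatch.fnmatchcase(), i.e. without the platform-dependent case normalization that fnmatch.fnmatch() applies.
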

Alternatively, if a rewrite in Perl is preferred, there's
libarchive-tar-wrapper-perl, which does not slurp the whole tarball into
memory (I noticed the comment about that in the current script), and
libregexp-assemble-perl to fuse regexes together.

 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private