Quoting Josh Triplett (2020-12-31 07:38:58)
> With a large number of path exclusions specified (around 500),
> mmtarfilter starts to become a noticeable performance bottleneck.
> 
> It looks like mmtarfilter checks each file linearly against each filter
> using fnmatch.
> 
> Python's fnmatch implementation works by translating shell patterns into
> regular expressions. Python also provides a function to do that
> translation separate from fnmatch. One fairly simple optimization would
> be to walk the list of patterns *once*, take each series of consecutive
> exclude or include filters, turn each one into a regex, join all the
> regexes in each group together using (?:...)|(?:...), and compile the
> resulting regexes once. That should provide a substantial performance
> improvement.
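
For reference, the approach quoted above could be sketched in Python
roughly as follows. This is only an illustration, not code taken from
mmtarfilter: the (kind, glob) tuple layout and the helper names
compile_filters and is_excluded are made up here, and it assumes that
the last matching filter decides whether a path is skipped.

```python
import fnmatch
import re
from itertools import groupby


def compile_filters(filters):
    """Fuse each run of consecutive same-kind filters into one regex.

    `filters` is a hypothetical list of (kind, glob) tuples in
    command-line order, where kind is "exclude" or "include".
    """
    compiled = []
    for kind, group in groupby(filters, key=lambda f: f[0]):
        # fnmatch.translate() turns a shell pattern into a regex string;
        # wrap each one in (?:...) and join them with "|".
        alternation = "|".join(
            "(?:%s)" % fnmatch.translate(glob) for _, glob in group
        )
        compiled.append((kind, re.compile(alternation)))
    return compiled


def is_excluded(path, compiled):
    """Return True if `path` should be dropped.

    Assumption: later filter groups override earlier ones.
    """
    skip = False
    for kind, regex in compiled:
        if regex.match(path):
            skip = kind == "exclude"
    return skip


filters = [
    ("exclude", "/usr/share/doc/*"),
    ("exclude", "/usr/share/man/*"),
    ("include", "/usr/share/doc/*/copyright"),
]
compiled = compile_filters(filters)
```

With roughly 500 filters this costs one regex match per group of
consecutive filters rather than one fnmatch call per filter. Note that
matching the translated regex directly behaves like fnmatch.fnmatchcase,
i.e. without the os.path.normcase step that fnmatch.fnmatch applies.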

Alternatively, if a rewrite in Perl is preferred, there is
libarchive-tar-wrapper-perl, which does not slurp the whole tarball into
memory (I noticed the comment about that in the current script), and
libregexp-assemble-perl, which can fuse the regexes together.


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
