Hi,

Quoting Jonas Smedegaard (2020-12-31 07:51:42)
> Quoting Josh Triplett (2020-12-31 07:38:58)
> > With a large number of path exclusions specified (around 500),
> > mmtarfilter starts to become a noticeable performance bottleneck.
> > 
> > It looks like mmtarfilter checks each file linearly against each filter
> > using fnmatch.
> > 
> > Python's fnmatch implementation works by translating shell patterns into
> > regular expressions. Python also provides a function to do that
> > translation separate from fnmatch. One fairly simple optimization would
> > be to walk the list of
> > patterns *once*, take each series of consecutive exclude or include
> > filters, turn each one into a regex, join all the regexes in
> > each group together using (?:...)|(?:...) , and compile the resulting
> > regexes once. That should provide a substantial performance improvement.
> 
> Alternatively, if a rewrite in Perl is preferred, there's 
> libarchive-tar-wrapper-perl which does not slurp the whole tarball into 
> memory (noticing the comment in current script), and libregexp-assemble-perl
> to fuse regexes together.

Yes, a rewrite is preferred, but it should not bloat mmdebstrap's dependencies
any further. Ideally, mmdebstrap should be able to run on a very minimal system
(so that you can run mmdebstrap inside mmdebstrap), so I would also love to
drop Python from its dependencies.

So if a rewrite happens, then it should probably be written in C. Since
libarchive doesn't work, it should probably copypaste the tar handling code
from dpkg.
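
For reference, the regex grouping Josh describes could be sketched in Python
roughly as below. This is only an illustration, not mmtarfilter's actual code:
the (kind, pattern) tuple representation and the names compile_filters and
is_excluded are made up for this example, and it assumes that a later matching
filter overrides an earlier one.

```python
import fnmatch
import re
from itertools import groupby


def compile_filters(filters):
    """Fuse each run of consecutive same-kind filters into one compiled regex.

    `filters` is a list of (kind, pattern) tuples (hypothetical layout),
    where kind is "exclude" or "include" and pattern is a shell glob.
    """
    compiled = []
    for kind, group in groupby(filters, key=lambda f: f[0]):
        # fnmatch.translate() converts a shell pattern into a regex string;
        # wrap each one in a non-capturing group and join with "|".
        alternatives = "|".join(
            "(?:%s)" % fnmatch.translate(pattern) for _, pattern in group
        )
        compiled.append((kind, re.compile(alternatives)))
    return compiled


def is_excluded(path, compiled):
    """Return True if `path` should be dropped.

    Assumption: groups are applied in order and the last match wins.
    """
    skip = False
    for kind, regex in compiled:
        if regex.match(path):
            skip = kind == "exclude"
    return skip


filters = [
    ("exclude", "/usr/share/doc/*"),
    ("exclude", "/usr/share/man/*"),
    ("include", "/usr/share/doc/*/copyright"),
]
compiled = compile_filters(filters)  # two regexes instead of three patterns
```

With around 500 consecutive exclusions this would test each path against one
compiled regex per group rather than against every pattern in turn.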

Thanks!

cheers, josch
