Hi,

Quoting Jonas Smedegaard (2020-12-31 07:51:42)
> Quoting Josh Triplett (2020-12-31 07:38:58)
> > With a large number of path exclusions specified (around 500),
> > mmtarfilter starts to become a noticeable performance bottleneck.
> >
> > It looks like mmtarfilter checks each file linearly against each filter
> > using fnmatch.
> >
> > Python's fnmatch implementation works by translating shell patterns into
> > regular expressions. Python also provides a function to do that translation
> > separate from fnmatch. One fairly simple optimization would be to walk the
> > list of patterns *once*, take each series of consecutive exclude or include
> > filters, turn each one into a regex, join all the regexes in
> > each group together using (?:...)|(?:...), and compile the resulting
> > regexes once. That should provide a substantial performance improvement.
>
> Alternatively, if a rewrite in Perl is preferred, there's
> libarchive-tar-wrapper-perl which does not slurp the whole tarball into
> memory (noticing the comment in current script), and libregexp-assemble-perl
> to fuse regexes together.

Yes, a rewrite is preferred, but it should not bloat mmdebstrap's dependencies any further. Ideally, mmdebstrap should be able to run on a very minimal system (so that you can run mmdebstrap inside mmdebstrap), so I would also love to drop Python from its dependencies.

So if a rewrite happens, it should probably be written in C. Since libarchive doesn't work, it should probably copy the tar handling code from dpkg.

Thanks!

cheers, josch
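
For reference, the regex-fusing optimization described in the quoted text could look roughly like the sketch below. This is not mmtarfilter's actual code. The function names `compile_filters` and `is_excluded`, the `(kind, pattern)` tuple layout, and the "last matching filter wins, default is keep" semantics are all assumptions made for illustration. It also assumes a Python version where `fnmatch.translate()` emits scoped flags such as `(?s:...)`, so that the translated patterns can be safely joined into one alternation.

```python
import fnmatch
import itertools
import re


def compile_filters(filters):
    """Fuse each run of consecutive same-kind patterns into one compiled regex.

    `filters` is a list of (kind, pattern) tuples, where kind is "exclude"
    or "include" (a hypothetical layout, not taken from mmtarfilter).
    """
    groups = []
    for kind, run in itertools.groupby(filters, key=lambda f: f[0]):
        # fnmatch.translate() turns a shell pattern into a regex string;
        # wrap each one in (?:...) and join the run with "|".
        alternation = "|".join(
            "(?:%s)" % fnmatch.translate(pattern) for _, pattern in run
        )
        groups.append((kind, re.compile(alternation)))
    return groups


def is_excluded(name, groups):
    """Assumed semantics: the last matching group decides; default is keep."""
    excluded = False
    for kind, regex in groups:
        if regex.match(name):
            excluded = kind == "exclude"
    return excluded
```

With around 500 exclusions followed by a handful of inclusions, each tar member would then be tested against two compiled regexes instead of roughly 500 separate fnmatch calls. Whether this is fast enough in practice would need to be measured.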