Hi,

Quoting Josh Triplett (2020-12-31 22:04:49)
> On Wed, Dec 30, 2020 at 10:38:58PM -0800, Josh Triplett wrote:
> > With a large number of path exclusions specified (around 500),
> > mmtarfilter starts to become a noticeable performance bottleneck.

while I'm amazed that my software is being used in such exotic setups, I'm
scared to ask in which scenario you ended up with 500 path exclusions. :D

> > It looks like mmtarfilter checks each file linearly against each filter
> > using fnmatch.
> > 
> > Python's fnmatch implementation works by translating shell patterns into
> > regular expressions. Python also provides a function to do that translation
> > separate from fnmatch. One fairly simple optimization would be to walk the
> > list of patterns *once*, take each series of consecutive exclude or include
> > filters, turn each one into a regex, join all the regexes in each group
> > together using (?:...)|(?:...), and compile the resulting regexes once.
> > That should provide a substantial performance improvement.
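
For reference, a minimal sketch of what I understand that suggestion to look
like (the pattern list here is made up for illustration, not taken from any
real filter set):

```python
import fnmatch
import re

# Hypothetical exclude patterns; consecutive filters of the same kind
# would be grouped like this and combined into one regex.
patterns = ["*.log", "tmp/*", "var/cache/*"]

# Translate each shell pattern once, join the results with alternation,
# and compile the combined expression a single time.
combined = re.compile("|".join(f"(?:{fnmatch.translate(p)})" for p in patterns))

# One match call now checks a path against every pattern in the group:
print(bool(combined.match("var/cache/apt/archives")))
```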
> 
> Turns out there's a much simpler explanation with a simpler fix. fnmatch has
> a 256-entry LRU cache for the translated regular expressions. Once there are
> more than 256 path filters, the cache stops working entirely, and every shell
> pattern gets re-translated and re-compiled on every invocation of fnmatch.
> 
> I wrote the attached patch for mmtarfilter to address this. On an invocation
> of mmdebstrap with around 500 path filters, this saves more than a minute.
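
That matches my reading of the stdlib: fnmatch keeps its translated patterns
in an internal LRU cache, so once the number of distinct patterns exceeds the
cache size, every call re-translates and re-compiles. The usual workaround is
to compile each pattern once up front. A rough sketch of that idea (the
filter list is invented for illustration, not the actual mmtarfilter code):

```python
import fnmatch
import re

# Hypothetical filter list larger than fnmatch's internal LRU cache:
patterns = [f"path/{i}/*" for i in range(500)]

# Compile each translated pattern once, up front, instead of relying on
# fnmatch re-translating on every call once the cache starts thrashing.
compiled = [re.compile(fnmatch.translate(p)) for p in patterns]

def matches_any(path):
    """Check a path against all precompiled filters."""
    return any(r.match(path) for r in compiled)
```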

Thanks! It's applied upstream:

https://gitlab.mister-muffin.de/josch/mmdebstrap/commit/5a7dbc10c74167c8f0105f5803b81317cf676f42
