Hi,

Quoting Josh Triplett (2020-12-31 22:04:49)
> On Wed, Dec 30, 2020 at 10:38:58PM -0800, Josh Triplett wrote:
> > With a large number of path exclusions specified (around 500),
> > mmtarfilter starts to become a noticeable performance bottleneck.

While I'm amazed that my software is being used in such exotic setups, I'm
scared to ask in which scenario you ended up using 500 path exclusions. :D

> > It looks like mmtarfilter checks each file linearly against each filter
> > using fnmatch.
> >
> > Python's fnmatch implementation works by translating shell patterns into
> > regular expressions. Python also provides a function to do that translation
> > separate from fnmatch. One fairly simple optimization would be to walk the
> > list of patterns *once*, take each series of consecutive exclude or include
> > filters, turn each one into a regex, join all the regexes in each group
> > together using (?:...)|(?:...), and compile the resulting regexes once.
> > That should provide a substantial performance improvement.
>
> Turns out there's a much simpler explanation with a simpler fix. fnmatch has
> a 256-entry LRU cache for the translated regular expressions. Once there are
> more than 256 path filters, the cache stops working entirely, and every shell
> pattern gets re-translated and re-compiled on every invocation of fnmatch.
>
> I wrote the attached patch for mmtarfilter to address this. On an invocation
> of mmdebstrap with around 500 path filters, this saves more than a minute.

Thanks! It's applied upstream:

https://gitlab.mister-muffin.de/josch/mmdebstrap/commit/5a7dbc10c74167c8f0105f5803b81317cf676f42
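For readers following along: a minimal sketch of the batching optimization
Josh describes (not the actual mmtarfilter patch). It uses fnmatch.translate()
to turn each shell pattern into a regex once, joins the group with
(?:...)|(?:...), and compiles the result once, so each path is checked with a
single regex match instead of one fnmatch call per pattern. The pattern list
below is invented for illustration.

```python
import fnmatch
import re

def compile_patterns(patterns):
    """Translate each shell pattern to a regex once and combine the
    group into a single compiled regex.  Matching a path then costs
    one regex match instead of one fnmatch() call per pattern, and
    sidesteps fnmatch's internal 256-entry LRU cache entirely."""
    joined = "|".join("(?:%s)" % fnmatch.translate(p) for p in patterns)
    return re.compile(joined)

# Hypothetical exclusion patterns, for illustration only.
excludes = compile_patterns(["/usr/share/doc/*", "/usr/share/man/*", "*.pyc"])

print(bool(excludes.match("/usr/share/doc/foo/README")))  # True
print(bool(excludes.match("/etc/passwd")))                # False
```

Note that fnmatch.translate() already anchors each pattern with \Z, so the
alternatives can be joined directly without extra anchoring.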