jpountz commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1438199152
Thanks Greg for sharing more info about how it helped on Amazon Product search. Do your queries early terminate somehow? (In that case I'd expect this change to help the most, since it can skip evaluating the tail of long postings.)

I like the idea of having multiple rewrite methods and possibly an `auto` method that tries to guess a sensible rewrite method given index statistics. It helps keep things simple without having a single rewrite method that needs to be heroic.

Reuse of postings enums looks ok to me. We could improve naming and add more comments to make it more obviously ok, but we only create up to 16 postings enums from scratch, reuse otherwise, and make sure to never reuse a postings enum that is in the priority queue.

The threshold of 16 looks conservative to me, so I wouldn't worry about NIOFSDirectory: if we have a problem with NIOFSDirectory and this threshold of 16, then many simple boolean queries have problems too, which I don't think is the case in practice? The threshold on the minimum document frequency should also help here, e.g. a near-PK field would only accumulate hits into a DocIdSetBuilder and not pull postings enums?
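To make the reuse argument easier to check, here is a rough, self-contained model of the bookkeeping described above. It is not the code from the PR and uses no Lucene APIs: `Cursor` is a hypothetical stand-in for a postings enum, a `BitSet` stands in for the DocIdSetBuilder, and `MIN_DOC_FREQ = 4` is an arbitrary value picked for illustration. Only the bound of 16 comes from the discussion. In this simplified model the number of cursors created from scratch is bounded by `MAX_QUEUED_TERMS + 1`, since one extra cursor circulates as the reusable spare.

```java
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BlendedRewriteSketch {
  // Bound on queued cursors; 16 is the threshold mentioned in the comment.
  static final int MAX_QUEUED_TERMS = 16;
  // Hypothetical minimum document frequency; the value is arbitrary.
  static final int MIN_DOC_FREQ = 4;

  /** Hypothetical stand-in for a postings enum: a reusable holder of sorted doc IDs. */
  static final class Cursor {
    int[] docs;
    Cursor reset(int[] newDocs) { docs = newDocs; return this; }
    int cost() { return docs.length; }
  }

  static final class Result {
    // Hits accumulated eagerly (the DocIdSetBuilder analogue).
    final BitSet eager = new BitSet();
    // Min-heap on cost, so the cheapest queued cursor is evicted first.
    final PriorityQueue<Cursor> queued =
        new PriorityQueue<>(Comparator.comparingInt(Cursor::cost));
    int cursorsCreated;

    /** Union of the eager hits and everything still sitting in the queue. */
    BitSet allMatches() {
      BitSet all = (BitSet) eager.clone();
      for (Cursor c : queued) flush(c, all);
      return all;
    }
  }

  static void flush(Cursor c, BitSet into) {
    for (int doc : c.docs) into.set(doc);
  }

  static Result collect(List<int[]> termPostings) {
    Result r = new Result();
    Cursor spare = null; // never refers to a cursor that is in the queue
    for (int[] docs : termPostings) {
      Cursor c;
      if (spare != null) {
        c = spare.reset(docs); // reuse a cursor that is outside the queue
        spare = null;
      } else {
        c = new Cursor().reset(docs); // create from scratch
        r.cursorsCreated++;
      }
      if (c.cost() < MIN_DOC_FREQ) {
        // Rare term: accumulate its hits right away, then free the cursor.
        flush(c, r.eager);
        spare = c;
      } else {
        r.queued.add(c);
        if (r.queued.size() > MAX_QUEUED_TERMS) {
          // Evict the cheapest cursor; only after eviction may it be reused.
          Cursor evicted = r.queued.poll();
          flush(evicted, r.eager);
          spare = evicted;
        }
      }
    }
    return r;
  }
}
```

The point of the model is the invariant on `spare`: it is only ever assigned a cursor that was just flushed or just evicted, so a cursor that is still in the priority queue can never be reset under it.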