gsmiller commented on PR #12055: URL: https://github.com/apache/lucene/pull/12055#issuecomment-1435477300
Found the issue where the "constant score auto rewrite" implementation was removed: [LUCENE-5938](https://issues.apache.org/jira/browse/LUCENE-5938).

If I'm understanding the history, it seems like the auto rewrite logic was trying to balance two things:

1. Number of terms. If there were "few" terms (< 350), it would favor using a boolean query. Once it passed the term threshold, it would use a filter rewrite.
2. Sparsity of docs (total docFreq over visited terms relative to total docs in the segment). If the "density" of the docs passed a certain point (0.1%), it would favor a filter approach instead of a boolean query. This check seems to have been in place to account for the fact that the filter rewrite required a fixed bitset, which wasn't efficient when very sparse.

It looks like this rewrite method was removed in LUCENE-5938 because that issue introduced the idea of a sparse bitset, which removed the concern behind point 2 above.

In my opinion, point 1 is still a very valid trade-off (many terms are inefficient to manage in a boolean query due to the associated PQ). This, of course, is what the current rewrite method takes into consideration (with a threshold of 16 terms). What I still _don't_ like about the existing implementation is how it completely changes behavior to a full bitset rewrite after passing 16 terms.

I _do_ think it's a nice win overall to rewrite in the way proposed by this PR. As far as I can tell, the former implementation never "incrementally" pre-processed postings into a filter bitset. It was an "all or nothing" approach. I think the key benefit of this PR is to allow for "incremental" processing. But I also recognize it might not be applicable in all cases, for all users, and/or for all file systems. I think it's really good feedback to introduce this idea as a new rewrite option instead of modifying the existing one in-place. I'll look into that as a next step.
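
To make the two-factor trade-off described above concrete, here is a rough sketch of how I read the old heuristic. This is _not_ the code that was removed in LUCENE-5938: the class, enum, and method names are hypothetical, and only the two thresholds (350 terms, 0.1% doc density) come from the description above. The exact comparison operators at the boundaries are my assumption.

```java
// Hypothetical sketch of the old "auto rewrite" decision as described above.
// None of these names exist in Lucene; they are for illustration only.
final class AutoRewriteSketch {

    enum Strategy { BOOLEAN_QUERY, FILTER_BITSET }

    // Thresholds taken from the description above.
    static final int TERM_COUNT_CUTOFF = 350;
    static final double DOC_DENSITY_CUTOFF = 0.001; // 0.1% of docs in the segment

    /**
     * @param termCount  number of terms visited during rewrite
     * @param sumDocFreq total docFreq summed over the visited terms
     * @param maxDoc     total number of docs in the segment
     */
    static Strategy choose(int termCount, long sumDocFreq, int maxDoc) {
        // Factor 1: too many terms makes a boolean query (and its PQ) expensive.
        if (termCount >= TERM_COUNT_CUTOFF) {
            return Strategy.FILTER_BITSET;
        }
        // Factor 2: once the matching docs are dense enough, a fixed bitset pays off.
        // Assumption: an empty segment is treated as density 0.
        double density = maxDoc == 0 ? 0.0 : (double) sumDocFreq / maxDoc;
        return density > DOC_DENSITY_CUTOFF
            ? Strategy.FILTER_BITSET
            : Strategy.BOOLEAN_QUERY;
    }
}
```

With a sparse bitset available, the second check loses its motivation, which leaves only the term-count check. That is roughly the shape of the current rewrite, just with a cutoff of 16 instead of 350.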
@rmuir / @jpountz / @mikemccand - since you all were involved in LUCENE-5938 and the earlier implementation of the "auto rewrite," please let me know if I'm missing anything. I did the best "digital archeology" I could, but it's very possible I'm missing something. Thanks again for the feedback!