gsmiller commented on PR #12055:
URL: https://github.com/apache/lucene/pull/12055#issuecomment-1435477300

   Found the issue where the "constant score auto rewrite" implementation was 
removed: [LUCENE-5938](https://issues.apache.org/jira/browse/LUCENE-5938). If 
I'm understanding the history, it seems like the auto rewrite logic was trying 
to balance two things:
   1. Number of terms. If there are "few" terms (< 350) it would favor using a 
boolean query. When it passed the term threshold, it would use a filter rewrite.
   2. Sparsity of docs (total docFreq over visited terms relative to total docs 
in segment). If the "density" of the docs passed a certain point (0.1%), it 
would favor a filter approach instead of a boolean query. This point seems to 
have been in place to account for the fact that rewriting required using a 
fixed bitset, which wasn't efficient when very sparse.
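
   To make the two cutoffs above concrete, here is a minimal sketch of the decision as I understand it from the history. This is not the removed Lucene code: the class name, method name, and `Strategy` enum are hypothetical, and only the 350-term and 0.1% values come from the description above.

```java
// Hypothetical sketch of the old "auto rewrite" decision; not actual Lucene code.
final class AutoRewriteSketch {
    enum Strategy { BOOLEAN_QUERY, FILTER }

    // Cutoffs taken from the description above.
    static final int TERM_COUNT_CUTOFF = 350;
    static final double DOC_COUNT_PERCENT = 0.1;

    /**
     * docFreqs holds the docFreq of each term in the order it is visited;
     * maxDoc is the total number of docs in the segment.
     */
    static Strategy choose(int[] docFreqs, int maxDoc) {
        // Number of visited docs at which the term set counts as "dense".
        long docCountCutoff = (long) ((DOC_COUNT_PERCENT / 100.0) * maxDoc);
        long visitedDocs = 0;
        int visitedTerms = 0;
        for (int docFreq : docFreqs) {
            visitedTerms++;
            visitedDocs += docFreq;
            // Crossing either cutoff switches to the filter rewrite.
            if (visitedTerms >= TERM_COUNT_CUTOFF || visitedDocs >= docCountCutoff) {
                return Strategy.FILTER;
            }
        }
        // Few terms and sparse docs: keep the boolean query.
        return Strategy.BOOLEAN_QUERY;
    }
}
```

   Note that the choice is made once for the whole term set, which is the "all or nothing" behavior discussed further down.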
   
   It looks like this rewrite method was removed in LUCENE-5938 because that 
change introduced a sparse bitset, which removed the concern in #2 above.
   
   In my opinion, it seems like #1 is still a very valid trade-off (many terms 
are inefficient to manage in a boolean query due to the associated PQ). This, 
of course, is what the current rewrite method takes into consideration (with a 
threshold of 16 terms). What I still _don't_ like about the existing 
implementation is how it completely changes behavior to a full bitset rewrite 
after passing 16 terms. I _do_ think it's a nice win overall to rewrite in the 
way proposed by this PR. As far as I can tell, the former implementation never 
"incrementally" pre-processed postings into a filter bitset. It was an "all or 
nothing" approach. I think the key benefit of this PR is to allow for 
"incremental" processing.
   
   But, I also recognize it might not be applicable in all cases, for all 
users, and/or for all file systems. I think it's really good feedback to 
introduce this idea as a new rewrite option instead of modifying the existing 
one in-place. I'll look into that as a next step.
   
   @rmuir / @jpountz / @mikemccand - since you all were involved in LUCENE-5938 
and the earlier implementation of the "auto rewrite," please let me know if I'm 
missing anything. I did the best "digital archeology" I could, but it's very 
possible I've overlooked something. Thanks again for the feedback!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

