benwtrent opened a new issue, #15324:
URL: https://github.com/apache/lucene/issues/15324

   ### Description
   
   We are hitting a weird EOF in Lucene. It appears that it's possible for an 
essentialQueue scorer to hit `NO_MORE_DOCS`, advance the TermScorer past its 
maxDoc, and then attempt to gather a score. 
   
   I do see we adjusted this path in 10.2: 
https://github.com/apache/lucene/pull/14186
   
   Haven't been able to test this same data in 10.2 yet. 
   
   But, I have been staring at these code paths for days and just cannot see 
how we are progressing the `top` iterator to a doc past the maxDoc in the 
segment when using a filter. 
   
   Note:
   
    - It HAS to be a filter. I tried the exact same query, but with a `must` 
clause boosted by `0` (so not contributing to the score at all), and we don't 
hit the EOF.
    - It MUST be done with disjunctions. I tried with conjunctions and a 
filter, and it worked just fine.
    - It requires a fairly restrictive filter (matching a few percent of docs, 
or less than one percent of docs).
    - One of the clauses must match more docs than the other (the particular 
failure is when one clause matches 3x more docs). Both clauses match less than 
1/3 of the docs, but both are less restrictive than the filter.
   
   As for the TermScorer and such, that code path hasn't changed in a long 
time. Basically, it never verifies whether it's at `NO_MORE_DOCS` when scoring; 
it just always treats `advanceExact` as returning `true`, even when the target 
is past maxDoc. That already seems like a mistake. 
   
   I would expect `advanceExact` to return `false` if it's advancing past maxDoc, 
right?
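
To make that expectation concrete, here is a minimal toy model of the two behaviours. This is not Lucene's code: `AdvanceExactSketch`, `ToyNorms`, `scoreUnchecked`, and `scoreChecked` are hypothetical names, and a plain byte array stands in for the norms file. The only detail borrowed from Lucene is that `NO_MORE_DOCS` is `Integer.MAX_VALUE`.

```java
/** Toy model only; none of these types are Lucene classes. */
final class AdvanceExactSketch {
    // Same sentinel value Lucene uses for an exhausted iterator.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a dense per-document norms reader. */
    static final class ToyNorms {
        private final byte[] values; // one norm per doc, so values.length == maxDoc
        private int doc = -1;

        ToyNorms(byte[] values) {
            this.values = values;
        }

        /** Unguarded: reports success for any target, even one past maxDoc. */
        boolean advanceExactUnchecked(int target) {
            doc = target;
            return true;
        }

        /** Guarded: rejects any target outside [0, maxDoc). */
        boolean advanceExactChecked(int target) {
            if (target < 0 || target >= values.length) {
                return false;
            }
            doc = target;
            return true;
        }

        /** Reads the norm at the current doc; fails if doc is out of range. */
        long longValue() {
            return values[doc];
        }
    }

    /** Scores without any bounds check, so an exhausted doc id reads out of range. */
    static float scoreUnchecked(ToyNorms norms, int docId) {
        norms.advanceExactUnchecked(docId);
        return 1f / (1 + norms.longValue());
    }

    /** Scores only when the doc id is live and the reader accepted the target. */
    static float scoreChecked(ToyNorms norms, int docId) {
        if (docId == NO_MORE_DOCS || !norms.advanceExactChecked(docId)) {
            return 0f; // assumption for this sketch: treat "no doc" as a zero score
        }
        return 1f / (1 + norms.longValue());
    }
}
```

In this model, calling `scoreUnchecked` with `NO_MORE_DOCS` fails on the out-of-range read (an `ArrayIndexOutOfBoundsException` here, playing the role of the EOF), while `scoreChecked` returns without touching the data. Whether the right fix is in `advanceExact` itself or in the caller that let the iterator reach `NO_MORE_DOCS` is exactly the open question.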
   
   
   ```
   
org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl.readByte(MemorySegmentIndexInput.java:762)
     at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3.longValue(Lucene90NormsProducer.java:399)
     at org.apache.lucene.search.TermScorer.score(TermScorer.java:93)
     at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:43)
     at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:176)
     at org.apache.lucene.search.MaxScoreBulkScorer.scoreInnerWindowWithFilter(MaxScoreBulkScorer.java:201)
     at org.apache.lucene.search.MaxScoreBulkScorer.scoreInnerWindow(MaxScoreBulkScorer.java:147)
     at org.apache.lucene.search.MaxScoreBulkScorer.score(MaxScoreBulkScorer.java:128)
     at org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:46)
     at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:461)
     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:810)
     at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:388)
     at org.elasticsearch.search.internal.ContextIndexSearcher.lambda$search$3(ContextIndexSearcher.java:368)
     at java.util.concurrent.FutureTask.run(FutureTask.java:328)
     at org.apache.lucene.search.TaskExecutor$Task.run(TaskExecutor.java:173)
     at org.apache.lucene.search.TaskExecutor.lambda$invokeAll$1(TaskExecutor.java:98)
   ```
   
   ### Version and environment details
   
   Lucene 10.1.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

