dweiss opened a new issue, #13391:
URL: https://github.com/apache/lucene/issues/13391

   ### Description
   
   I stumbled across this in a real-life application where highlighting based on the Matches API for a query like this:
   
   `field:(a OR b OR c OR d OR ...)`
   
   took a very long time to complete, even though executing the query itself is blazing fast. I think the cause is how `MultiTermQuery` handles matches: `AbstractMultiTermQueryConstantScoreWrapper` returns a disjunction of iterators built from a terms enum:
   
   ```
       @Override
       public Matches matches(LeafReaderContext context, int doc) throws IOException {
         final Terms terms = context.reader().terms(q.field);
         if (terms == null) {
           return null; // field not indexed in this segment: no matches
         }
         // Build a disjunction over the iterators of every term the query enumerates.
         return MatchesUtils.forField(
             q.field,
             () ->
                 DisjunctionMatchesIterator.fromTermsEnum(
                     context, doc, q, q.field, q.getTermsEnum(terms)));
       }
   ```
   
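   but for a large set of alternatives, the loop inside `fromTermsEnum` can take a long time before it reaches a term that actually occurs in the target document. Every candidate term costs a `seekExact` and, if the term exists in the segment, a postings lookup plus an `advance(doc)`: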
   ```
     static MatchesIterator fromTermsEnum(
         LeafReaderContext context, int doc, Query query, String field, BytesRefIterator terms)
         throws IOException {
       Objects.requireNonNull(field);
       Terms t = Terms.getTerms(context.reader(), field);
       TermsEnum te = t.iterator();
       PostingsEnum reuse = null;
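       // Linear scan: one seekExact per candidate term, in enumeration order.
       for (BytesRef term = terms.next(); term != null; term = terms.next()) {
         if (te.seekExact(term)) {
           PostingsEnum pe = te.postings(reuse, PostingsEnum.OFFSETS);
           // Stop at the first term whose postings contain doc.
           if (pe.advance(doc) == doc) {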
             return new TermsEnumDisjunctionMatchesIterator(
                 new TermMatchesIterator(query, pe), terms, te, doc, query);
           } else {
             reuse = pe;
           }
         }
       }
       return null;
     }
   ```
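
   The plain-Java model below illustrates how this cost scales. It is a rough sketch and not Lucene code. The class `LinearScanModel` and its method `seeksUntilMatch` are hypothetical. A `TreeMap` stands in for the term dictionary, and sorted `int[]` arrays stand in for postings lists. The method counts one lookup per query term and stops at the first term whose postings contain the target document, which mirrors the early `return` in `fromTermsEnum`.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;

   // Simplified model of the loop in fromTermsEnum; NOT Lucene code.
   // A TreeMap stands in for the term dictionary, sorted int[] arrays for postings.
   class LinearScanModel {

     /**
      * Returns how many term lookups happen before a term containing doc is
      * found, or -1 if no query term occurs in doc.
      */
     static int seeksUntilMatch(Map<String, int[]> index, List<String> queryTerms, int doc) {
       int seeks = 0;
       for (String term : queryTerms) {
         seeks++; // one dictionary lookup per query term (cf. TermsEnum.seekExact)
         int[] postings = index.get(term);
         // cf. PostingsEnum.advance(doc) == doc: does this term occur in doc?
         if (postings != null && Arrays.binarySearch(postings, doc) >= 0) {
           return seeks; // early exit on the first matching term
         }
       }
       return -1;
     }

     public static void main(String[] args) {
       int numTerms = 10_000;
       Map<String, int[]> index = new TreeMap<>();
       List<String> queryTerms = new ArrayList<>();
       for (int i = 0; i < numTerms; i++) {
         String term = String.format("t%05d", i);
         queryTerms.add(term);
         index.put(term, new int[] {i}); // hypothetical data: term i occurs only in doc i
       }
       System.out.println(seeksUntilMatch(index, queryTerms, 0));            // prints 1
       System.out.println(seeksUntilMatch(index, queryTerms, numTerms - 1)); // prints 10000
     }
   }
   ```

   In this model, a document matched only by the last of 10,000 terms costs 10,000 lookups per `matches()` call, while a document matched by the first term costs one. The real loop is more expensive per term. For every term that exists in the segment, it also calls `te.postings(...)` and `pe.advance(doc)`.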
   
   I have no idea what the fix could be here; I'm just reporting the problem before I forget it.
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org