[ 
https://issues.apache.org/jira/browse/LUCENE-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275373#comment-17275373
 ] 

David Smiley commented on LUCENE-9712:
--------------------------------------

I suspect the culprit is in this method: 
{{org.apache.lucene.search.uhighlight.FieldOffsetStrategy#createOffsetsEnumsWeightMatcher}}

-- which will compute the Matches from the Weight – but this will be called 
once per field.  I suspect most of the cost is in re-computing this over and 
over again.  If we can assume there is no "getFieldMatcher" (i.e. assume 
{{hl.requireFieldMatch=true}}), and that offset source is from the actual index 
(no re-analysis), then the leafReader could be the same across fields, and thus 
the Matches would be re-usable.  But how to re-use it across fields?  There's 
no clear place nearby since this part of the code is very field-centric.  
UHComponents is immutable; that could be changed to hold some Map.  Or, I was 
thinking maybe the Query could be wrapped with an impl that has a Weight that 
caches its Matches result for a given (leafReader, docId) pair. Hmmmm.
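
To make the caching idea concrete, here is a rough, self-contained sketch. It does not use Lucene's classes: MatchesSource, CachingMatchesSource and countDelegateCalls are hypothetical stand-ins invented for illustration, with MatchesSource playing the role of the expensive Matches computation done from the Weight. It assumes the same leafReader instance is passed for every field, which per the above only holds for index-based offsets with {{hl.requireFieldMatch=true}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are Lucene classes.
final class MatchesCacheSketch {

    /** Stand-in for the expensive per-document Matches computation. */
    interface MatchesSource<R, M> {
        M matches(R leafReader, int docId);
    }

    /** Wraps a source and remembers its result per (leafReader, docId) pair. */
    static final class CachingMatchesSource<R, M> implements MatchesSource<R, M> {

        /** Cache key comparing the reader by identity and the doc id by value. */
        private static final class Key {
            final Object reader;
            final int docId;

            Key(Object reader, int docId) {
                this.reader = reader;
                this.docId = docId;
            }

            @Override
            public boolean equals(Object o) {
                if (!(o instanceof Key)) {
                    return false;
                }
                Key k = (Key) o;
                return k.reader == reader && k.docId == docId;
            }

            @Override
            public int hashCode() {
                return 31 * System.identityHashCode(reader) + docId;
            }
        }

        private final MatchesSource<R, M> delegate;
        private final Map<Key, M> cache = new HashMap<>();

        CachingMatchesSource(MatchesSource<R, M> delegate) {
            this.delegate = delegate;
        }

        @Override
        public M matches(R leafReader, int docId) {
            Key key = new Key(leafReader, docId);
            // containsKey lets a null result ("no match") be cached as well.
            if (!cache.containsKey(key)) {
                cache.put(key, delegate.matches(leafReader, docId));
            }
            return cache.get(key);
        }
    }

    /** Simulates highlighting fieldCount fields of one doc; returns delegate call count. */
    static int countDelegateCalls(int fieldCount) {
        int[] calls = {0};
        MatchesSource<Object, String> expensive = (reader, docId) -> {
            calls[0]++;
            return "matches for doc " + docId;
        };
        MatchesSource<Object, String> cached = new CachingMatchesSource<>(expensive);
        Object sharedLeafReader = new Object(); // same reader for every field
        for (int field = 0; field < fieldCount; field++) {
            cached.matches(sharedLeafReader, 42);
        }
        return calls[0];
    }
}
```

In this sketch the map is unbounded, so a real version would presumably need to be scoped to a single highlight request or cleared when moving to the next document. It also does nothing for re-analysis, where each field gets its own leafReader and the key never repeats.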

This kind of highlights a structural challenge in the UH in which it is very 
field centric, and thus it's not clear where to share info across fields of the 
same doc.  Above I qualified some ideas that would only work for an index based 
offset source (in postings), but it'd suck not to handle re-analysis, which is 
popular.  Again, if there was a more document centric approach, then the 
underlying MemoryIndex could be built across fields, which would then enable 
re-use of Matches since MI's leafReader would be the same.

> UnifiedHighlighter, optimize WEIGHT_MATCHES when many fields
> ------------------------------------------------------------
>
>                 Key: LUCENE-9712
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9712
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
>
> A user reported that highlighting many fields per document in WEIGHT_MATCHES 
> mode is quite slow:   
> [https://lists.apache.org/thread.html/r152c74a884b5ff72f3d530fc452bb0865cc7f24ca35ccf7d1d1e4952%40%3Csolr-user.lucene.apache.org%3E]
> The query is a DisjunctionMax over many fields – basically the ones being 
> highlighted.



