[ https://issues.apache.org/jira/browse/LUCENE-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275373#comment-17275373 ]
David Smiley commented on LUCENE-9712:
--------------------------------------

I suspect the culprit is this method: {{org.apache.lucene.search.uhighlight.FieldOffsetStrategy#createOffsetsEnumsWeightMatcher}}. It computes the Matches from the Weight, but it is called once per field. I suspect most of the cost is in re-computing this over and over again.

If we can assume there is no "getFieldMatcher" (i.e. assume {{hl.requireFieldMatch=true}}), and that the offset source is the actual index (no re-analysis), then the leafReader could be the same across fields, and thus the Matches would be re-usable. But how to re-use it across fields? There's no clear place nearby since this part of the code is very field-centric. UHComponents is immutable; that could be changed to hold some Map. Or, I was thinking maybe the Query could be wrapped with an impl that has a Weight that caches its Matches result for a given (leafReader, docId) pair. Hmmmm.

This highlights a structural challenge in the UH: it is very field-centric, so it's not clear where to share info across fields of the same doc. Above I qualified some ideas that would only work for an index-based offset source (offsets in postings), but it'd suck not to handle re-analysis, which is popular. Again, if there was a more document-centric approach, then the underlying MemoryIndex could be built across fields, which would then enable re-use of Matches since MI's leafReader would be the same.
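The caching idea above (a Weight that remembers its Matches result for a given leafReader and docId) can be sketched roughly as follows. This is only an illustration of the memoization, not Lucene code: {{MatchesCacheSketch}}, {{LeafDocKey}}, {{CachingMatcher}} and the {{BiFunction}} delegate are hypothetical stand-ins. In Lucene the expensive call being cached would be {{Weight#matches(LeafReaderContext, int)}}, and a real version would have to wrap the Query/Weight rather than a plain function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch, not Lucene API: memoizes an expensive per-document
// computation so that several fields of the same document can share it.
final class MatchesCacheSketch {

    // Cache key: the leaf (compared by identity) plus the doc id within that leaf.
    static final class LeafDocKey {
        private final Object leaf;
        private final int doc;

        LeafDocKey(Object leaf, int doc) {
            this.leaf = leaf;
            this.doc = doc;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof LeafDocKey)) {
                return false;
            }
            LeafDocKey other = (LeafDocKey) o;
            return other.leaf == leaf && other.doc == doc;
        }

        @Override
        public int hashCode() {
            return 31 * System.identityHashCode(leaf) + doc;
        }
    }

    // Wraps a delegate (standing in for the Weight's matches computation) and
    // caches one result per (leaf, doc) pair.
    static final class CachingMatcher<L, M> {
        private final BiFunction<L, Integer, M> delegate;
        private final Map<LeafDocKey, M> cache = new HashMap<>();
        private int computations = 0;

        CachingMatcher(BiFunction<L, Integer, M> delegate) {
            this.delegate = delegate;
        }

        M matches(L leaf, int doc) {
            LeafDocKey key = new LeafDocKey(leaf, doc);
            // containsKey (not get) so that a cached null, meaning "no match",
            // is also served from the cache instead of being recomputed.
            if (cache.containsKey(key)) {
                return cache.get(key);
            }
            computations++;
            M result = delegate.apply(leaf, doc);
            cache.put(key, result);
            return result;
        }

        // Number of times the delegate was actually invoked.
        int computations() {
            return computations;
        }
    }

    public static void main(String[] args) {
        Object leaf = new Object(); // stand-in for a leaf reader context
        CachingMatcher<Object, String> matcher =
                new CachingMatcher<>((l, d) -> "matches for doc " + d);

        // Highlighting three fields of the same document asks for the same
        // (leaf, doc) pair three times; only the first call reaches the delegate.
        for (String field : new String[] {"title", "body", "summary"}) {
            matcher.matches(leaf, 7);
        }
        System.out.println("delegate invocations: " + matcher.computations());
    }
}
```

Such a cache would presumably live for a single highlight request. As noted above, it only helps when every field sees the same leafReader, which is not the case when each field is re-analyzed into its own MemoryIndex.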
> UnifiedHighlighter, optimize WEIGHT_MATCHES when many fields
> ------------------------------------------------------------
>
>                 Key: LUCENE-9712
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9712
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Priority: Major
>
> A user reported that highlighting many fields per document in WEIGHT_MATCHES
> mode is quite slow:
> [https://lists.apache.org/thread.html/r152c74a884b5ff72f3d530fc452bb0865cc7f24ca35ccf7d1d1e4952%40%3Csolr-user.lucene.apache.org%3E]
> The query is a DisjunctionMax over many fields – basically the ones being
> highlighted.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)