[PR] Performance improvements to MatchHighlighter and MatchRegionRetriever [lucene]

via GitHub Wed, 06 Dec 2023 04:30:47 -0800


dweiss opened a new pull request, #12881:
URL: https://github.com/apache/lucene/pull/12881


   This patch makes a number of small changes aimed at improving the 
performance of MatchHighlighter (and MatchRegionRetriever), especially in 
corner cases like:
   
   * queries that result in a large number of hits, especially in long fields. 
Scoring every hit passage is then time-consuming, even though only a few 
"best" passages are kept. A configurable `maxHitsPerField` limit is added to 
cap the number of matches retrieved (and scored) at a reasonable number.
   
   * queries that require highlighting of hundreds of documents. The major 
performance bottleneck here is field value loading. MatchRegionRetriever now 
allows specifying which fields to load unconditionally, as well as filtering 
the fields that contain hits (to skip computing highlights for fields that are 
used for filtering or are never displayed). Another improvement here is that 
highlights are now computed in parallel (using the index searcher's task 
executor).
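
   The two ideas above (capping the number of hits collected per field and 
computing highlights for many documents in parallel) can be sketched in plain 
Java. This is only an illustration and not the code in this patch: the class 
`HighlightSketch`, the methods `collectHits` and `highlightAll`, and the use of 
a plain `ExecutorService` with substring matching are all assumptions made for 
the example, and nothing here is part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from Lucene.
class HighlightSketch {

  /**
   * Collects at most maxHits start offsets of a non-empty term in text.
   * The scan stops as soon as the cap is reached, so a long field with
   * thousands of matches is not scanned to the end.
   */
  static List<Integer> collectHits(String text, String term, int maxHits) {
    List<Integer> offsets = new ArrayList<>();
    int from = 0;
    while (offsets.size() < maxHits) {
      int at = text.indexOf(term, from);
      if (at < 0) {
        break; // no more matches in this field
      }
      offsets.add(at);
      from = at + term.length();
    }
    return offsets;
  }

  /**
   * Computes capped hits for every document on the given executor.
   * invokeAll returns futures in the same order as the submitted tasks,
   * so results line up with the input documents.
   */
  static List<List<Integer>> highlightAll(
      List<String> docs, String term, int maxHits, ExecutorService executor)
      throws Exception {
    List<Callable<List<Integer>>> tasks = new ArrayList<>();
    for (String doc : docs) {
      tasks.add(() -> collectHits(doc, term, maxHits));
    }
    List<List<Integer>> results = new ArrayList<>();
    for (Future<List<Integer>> future : executor.invokeAll(tasks)) {
      results.add(future.get());
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      List<String> docs = List.of("a b a b a", "b b b", "a");
      // The first document has three matches for "a", but the cap keeps two.
      System.out.println(highlightAll(docs, "a", 2, executor));
    } finally {
      executor.shutdown();
    }
  }
}
```

   In this sketch the cap bounds the work done per field regardless of field 
length, and the executor is passed in by the caller so that it can be shared 
with other work, which is roughly the role the index searcher's task executor 
plays in the description above.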
   
   There are a few minor API changes, so I think it should be targeted for 10.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


