[jira] [Commented] (LUCENE-8319) A Time-limiting collector that works with CollectorManagers

Adrien Grand (Jira) Sat, 29 Aug 2020 13:41:37 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17187102#comment-17187102
 ]


Adrien Grand commented on LUCENE-8319:
--------------------------------------

A problem with TimeLimitingCollector and ExitableDirectoryReader is that they 
add layers of abstraction to things that are called in very tight loops. One 
combination that we found to work well for Elasticsearch is to use 
ExitableDirectoryReader only for terms/points and make IndexSearcher wrap the 
top-level bulk scorer so that it splits the doc ID space into exponentially 
growing windows of doc IDs and checks the timeout between windows, which keeps 
the overhead to a minimum. Timeout handling seems to be a frequent need, so 
maybe we should add support for it directly on IndexSearcher, where we could 
more easily do the right thing?
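
The windowing idea in the comment above can be sketched in plain Java. This is 
only an illustration under stated assumptions, not Lucene or Elasticsearch 
code: RangeScorer is a hypothetical stand-in for a bulk scorer that scores a 
range of doc IDs, and the initial window size and the doubling factor are 
arbitrary choices. The point is that the timeout is checked once per window 
rather than once per document.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntConsumer;

final class WindowedTimeoutSketch {
    /** Hypothetical stand-in for a bulk scorer: scores doc IDs in [min, max). */
    interface RangeScorer {
        void score(int min, int max, IntConsumer collector);
    }

    /**
     * Scores doc IDs in [0, maxDoc) window by window. The timeout is checked
     * only between windows, and each window is twice as large as the previous
     * one, so the number of checks grows logarithmically with maxDoc.
     *
     * @return true if the whole doc ID space was scored, false on timeout
     */
    static boolean scoreWithTimeout(RangeScorer scorer, int maxDoc, int initialWindow,
                                    BooleanSupplier timedOut, IntConsumer collector) {
        if (initialWindow <= 0) {
            throw new IllegalArgumentException("initialWindow must be positive");
        }
        int min = 0;
        long window = initialWindow; // long, so doubling cannot overflow
        while (min < maxDoc) {
            if (timedOut.getAsBoolean()) {
                return false; // give up; docs collected so far remain valid
            }
            int max = (int) Math.min((long) min + window, maxDoc);
            scorer.score(min, max, collector);
            min = max;
            window = Math.min(window * 2, Integer.MAX_VALUE);
        }
        return true;
    }
}
```

Starting with a small window keeps the reaction time short for cheap queries, 
while the doubling keeps the number of timeout checks small on large indexes.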

> A Time-limiting collector that works with CollectorManagers
> -----------------------------------------------------------
>
>                 Key: LUCENE-8319
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8319
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Tony Xu
>            Priority: Minor
>
> Currently Lucene has *TimeLimitingCollector* to support time-bound collection, 
> and it throws *TimeExceededException* when a timeout occurs. This only works 
> nicely with the single-threaded low-level API of IndexSearcher. The method 
> signature is --
> *void search(List<LeafReaderContext> leaves, Weight weight, Collector 
> collector)*
> The intended use is to always enclose the searcher.search(query, collector) 
> call in a try ... catch and handle the timeout exception. Unfortunately, 
> when working with a *CollectorManager* in the multi-threaded search context, 
> the *TimeExceededException* thrown while collecting one leaf slice will be 
> re-thrown by *IndexSearcher* without calling *CollectorManager*'s reduce(), 
> even if other slices are successfully collected. The signature 
> of the search API with *CollectorManager* is --
> *<C extends Collector, T> T search(Query query, CollectorManager<C, T> 
> collectorManager)*
>  
> The good news is that IndexSearcher handles *CollectionTerminatedException* 
> gracefully by ignoring it. We can either wrap TimeLimitingCollector and throw 
> *CollectionTerminatedException* when a timeout occurs, or simply replace 
> *TimeExceededException* with *CollectionTerminatedException*. Either way, 
> we also need to maintain a flag that indicates whether a timeout occurred so 
> that the user knows it's a partial collection.
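
The proposal in the description (stop quietly on timeout, but record a flag so 
the caller knows the result is partial) can be sketched without any Lucene 
dependency. Everything below is hypothetical and for illustration only: 
StopCollecting stands in for CollectionTerminatedException, TimeLimitedCounter 
for a wrapped collector, and search() for the part of IndexSearcher that 
swallows the exception and still reduces. Slices are processed sequentially 
here to keep the sketch short; a real multi-threaded search would run them 
concurrently, which is why the flag is an AtomicBoolean.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class PartialCollectionSketch {
    /** Hypothetical stand-in for an exception the searcher ignores. */
    static final class StopCollecting extends RuntimeException {}

    /** Per-slice collector that counts hits and stops once time is up. */
    static final class TimeLimitedCounter {
        private final BooleanSupplier timedOut;
        private final AtomicBoolean timeoutFlag; // shared across all slices
        int count;

        TimeLimitedCounter(BooleanSupplier timedOut, AtomicBoolean timeoutFlag) {
            this.timedOut = timedOut;
            this.timeoutFlag = timeoutFlag;
        }

        void collect(int doc) {
            if (timedOut.getAsBoolean()) {
                timeoutFlag.set(true);      // remember that results are partial
                throw new StopCollecting(); // terminate this slice quietly
            }
            count++;
        }
    }

    /** Reduced result: total hits plus whether any slice timed out. */
    static final class Result {
        final int totalHits;
        final boolean partial;

        Result(int totalHits, boolean partial) {
            this.totalHits = totalHits;
            this.partial = partial;
        }
    }

    static Result search(List<int[]> slices, BooleanSupplier timedOut) {
        AtomicBoolean timeoutFlag = new AtomicBoolean(false);
        int total = 0;
        for (int[] slice : slices) {
            TimeLimitedCounter counter = new TimeLimitedCounter(timedOut, timeoutFlag);
            try {
                for (int doc : slice) {
                    counter.collect(doc);
                }
            } catch (StopCollecting e) {
                // Ignored on purpose: the slice simply ends early.
            }
            total += counter.count; // the reduce step still sees partial counts
        }
        return new Result(total, timeoutFlag.get());
    }
}
```

Because the exception is swallowed per slice, the reduce step always runs, and 
the caller inspects Result.partial instead of catching a timeout exception.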



