[ https://issues.apache.org/jira/browse/LUCENE-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218292#comment-17218292 ]
Bruno Roustant commented on LUCENE-9455:
----------------------------------------

Thanks [~zacharymorn]. I added comments to the review.

FYI watchers, I have a broader question in the PR, which I repeat here:

Overall, I wonder if we can do better with the sampling. The goal is to avoid making numerous repetitive calls to QueryTimeout.shouldExit(). Such repetitive calls arise mainly with multi-term queries. But for multi-term queries, a new TermsEnum is created for each matching term (in TermQuery.getTermsEnum(), to get the doc ids). So we end up sampling only half of the calls to QueryTimeout.shouldExit(), since the other half is made by the ExitableTermsEnum constructor, which is not sampled.
It would be better to also sample the ExitableTermsEnum constructor, but I don't know yet how to do that.

> ExitableTermsEnum (in ExitableDirectoryReader) should sample next()
> -------------------------------------------------------------------
>
>                 Key: LUCENE-9455
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9455
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>            Reporter: David Smiley
>            Priority: Major
>              Labels: newdev
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> ExitableTermsEnum calls "checkAndThrow" on *every* call to next(). This is
> too expensive; it should sample. I observed that Elasticsearch uses the same
> approach; I think Lucene would benefit from this:
> https://github.com/elastic/elasticsearch/blob/4af4eb99e18fdaadac879b1223e986227dd2ee71/server/src/main/java/org/elasticsearch/search/internal/ExitableDirectoryReader.java#L151
> CC [~jimczi]
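
To make the sampling concern in the comment concrete, here is a minimal, self-contained sketch. It is not the Lucene or Elasticsearch code: Timeout and SampledTermIterator are hypothetical stand-ins for QueryTimeout and ExitableTermsEnum, and the sampling interval of 16 is an arbitrary assumption chosen to keep the numbers small. It only counts how often shouldExit() is reached when next() is sampled but the constructor check is not.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SampledTimeoutSketch {

    /** Hypothetical stand-in for Lucene's QueryTimeout; only shouldExit() is modeled. */
    interface Timeout {
        boolean shouldExit();
    }

    /** Hypothetical stand-in for ExitableTermsEnum: wraps a plain iterator of terms. */
    static final class SampledTermIterator implements Iterator<String> {
        // Assumed interval: check on 1 of every 16 next() calls.
        // The value must be (power of two) - 1 for the bit mask to work.
        private static final int SAMPLE_MASK = (1 << 4) - 1;

        private final Iterator<String> in;
        private final Timeout timeout;
        private int calls;

        SampledTermIterator(Iterator<String> in, Timeout timeout) {
            this.in = in;
            this.timeout = timeout;
            checkAndThrow(); // unsampled: runs once for every new instance
        }

        @Override
        public boolean hasNext() {
            return in.hasNext();
        }

        @Override
        public String next() {
            // Sampled: only calls number 0, 16, 32, ... reach the timeout check.
            if ((calls++ & SAMPLE_MASK) == 0) {
                checkAndThrow();
            }
            return in.next();
        }

        private void checkAndThrow() {
            if (timeout.shouldExit()) {
                throw new RuntimeException("query timed out");
            }
        }
    }

    /** Iterates numTerms terms through one wrapper and returns the number of shouldExit() calls. */
    static int countChecks(int numTerms) {
        AtomicInteger checks = new AtomicInteger();
        Timeout timeout = () -> {
            checks.incrementAndGet();
            return false; // never time out; we only count the calls
        };
        List<String> terms = Collections.nCopies(numTerms, "term");
        Iterator<String> it = new SampledTermIterator(terms.iterator(), timeout);
        while (it.hasNext()) {
            it.next();
        }
        return checks.get();
    }

    public static void main(String[] args) {
        System.out.println("64 terms, one wrapper: " + countChecks(64) + " checks");
        System.out.println("1 term, one wrapper: " + countChecks(1) + " checks");
    }
}
```

In this sketch, a single long-lived wrapper iterating 64 terms triggers 5 checks instead of 65. A wrapper that is created per term and advanced once still triggers 2 checks, one of which comes from the constructor, which is the part the sampling of next() cannot reduce.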