[ https://issues.apache.org/jira/browse/LUCENE-9455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218706#comment-17218706 ]
Zach Chen edited comment on LUCENE-9455 at 10/22/20, 2:56 AM:
--------------------------------------------------------------

Thanks Bruno. I thought about the proposal above a bit. I feel that using *System.identityHashCode(this)* is somewhat different from using *counter* in *(counter & TIMEOUT_CHECK_SAMPLING) == 0*: counter is monotonically increasing and is thus "guaranteed" to periodically hit the threshold and check the timeout status. If we assume the computed hash code is uniformly distributed, then with *System.identityHashCode(this)* we are effectively sampling with a chance of 1/2 * 1/2 * 1/2 * 1/2 = 1/16 (the chance that all 4 lowest bits of the hash code are 0, if *TIMEOUT_CHECK_SAMPLING* is 15). Could this probability be too small for the timeout check to be triggered in actual production scenarios?

> ExitableTermsEnum (in ExitableDirectoryReader) should sample next()
> -------------------------------------------------------------------
>
> Key: LUCENE-9455
> URL: https://issues.apache.org/jira/browse/LUCENE-9455
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Reporter: David Smiley
> Priority: Major
> Labels: newdev
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> ExitableTermsEnum calls "checkAndThrow" on *every* call to next().
> This is too expensive; it should sample. I observed ElasticSearch uses the same
> approach; I think Lucene would benefit from this:
> https://github.com/elastic/elasticsearch/blob/4af4eb99e18fdaadac879b1223e986227dd2ee71/server/src/main/java/org/elasticsearch/search/internal/ExitableDirectoryReader.java#L151
> CC [~jimczi]
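
To make the sampling concern in the comment concrete, here is a minimal Java sketch comparing the two conditions. It is not code from Lucene or from any patch on this issue: the class name SamplingSketch and both method names are hypothetical, and TIMEOUT_CHECK_SAMPLING = 15 is taken from the "if TIMEOUT_CHECK_SAMPLING is 15" assumption in the comment. It relies only on the fact that System.identityHashCode returns the same value for a given object every time it is called.

```java
// Hypothetical illustration only; not taken from Lucene or from a patch on LUCENE-9455.
class SamplingSketch {
    // Assumed mask value, following the "if TIMEOUT_CHECK_SAMPLING is 15" remark.
    static final int TIMEOUT_CHECK_SAMPLING = 15;

    /**
     * Counter-based sampling: returns how many of `calls` consecutive calls
     * would trigger a timeout check when the counter grows by one per call.
     */
    static int counterChecks(int calls) {
        int checks = 0;
        int counter = 0;
        for (int i = 0; i < calls; i++) {
            if ((counter++ & TIMEOUT_CHECK_SAMPLING) == 0) {
                checks++;
            }
        }
        return checks;
    }

    /**
     * Identity-hash-based sampling: the identity hash of one object does not
     * change between calls, so for a single instance the condition is either
     * true on every call or false on every call.
     */
    static int identityHashChecks(Object instance, int calls) {
        int checks = 0;
        for (int i = 0; i < calls; i++) {
            if ((System.identityHashCode(instance) & TIMEOUT_CHECK_SAMPLING) == 0) {
                checks++;
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        System.out.println("counter-based checks in 160 calls: " + counterChecks(160));
        System.out.println("identity-hash checks in 160 calls: "
                + identityHashChecks(new Object(), 160));
    }
}
```

With the counter, 160 calls produce exactly 10 checks (counter values 0, 16, ..., 144). With the identity hash, a given instance yields either 0 or 160 checks, so the 1/16 figure describes which instances check at all rather than how often one instance checks.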