[ https://issues.apache.org/jira/browse/LUCENE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958835#comment-16958835 ]
Jason Gerlowski commented on LUCENE-9025:
-----------------------------------------

I think an improvement at the TermsEnum level would be helpful, but at the end of the day that's only covering 1 of 10+ concrete implementations of the SortedSetDocValues class. Granted, it covers the main implementation, but it'd be better to cover them all. It also (depending on the implementation, I'm not sure what specific seekCeil optimization you had in mind) could hurt uses where the looked up term is _less_ than the current one.

Adding lower/upper bound params to a {{lookupTerm}} overload avoids both of these drawbacks since all implementations get the benefit, and callers have control over when the optimization (smaller search range) is triggered.

> Add more efficient lookupTerm() overload to SortedSetDocValues
> --------------------------------------------------------------
>
>                 Key: LUCENE-9025
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9025
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: master (9.0)
>            Reporter: Jason Gerlowski
>            Priority: Minor
>         Attachments: LUCENE-9025.patch
>
>
> {{SortedSetDocValues.lookupTerm(BytesRef)}} performs a binary search of the
> entire docValues range to find the ordinal of the requested BytesRef.
>
> For an individual invocation, this is optimal. Without other context, binary
> search needs to cover the entire space.
>
> But there are some common uses of {{lookupTerm}} where this shouldn't be
> necessary. For example: making multiple {{lookupTerm}} calls to fetch the
> ordinals for each value in a sorted list of terms. {{lookupTerm}} will
> binary-search the whole space on each invocation, even though the caller
> knows that there's no point searching anything before the ordinal that came
> back from the previous {{lookupTerm}} call.
>
> I propose we add a {{SortedSetDocValues.lookupTerm}} overload which takes a
> lower-bound to start the binary search at:
>
> {{public long lookupTerm(BytesRef key, long lowerSearchBound) throws IOException}}
>
> This saves each binary-search a few iterations in usage scenarios like the
> one described above, which can conceivably add up.
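
To make the proposal concrete, here is a minimal sketch of a lower-bounded lookup. It is not the attached patch and does not use Lucene classes: a sorted `String[]` stands in for the term dictionary, the array index stands in for the ordinal, and the class name `BoundedLookupSketch` and helper `lookupSorted` are hypothetical. It assumes the same return convention as `lookupTerm` (the ordinal if found, otherwise `-(insertionPoint) - 1`).

```java
import java.util.Arrays;

// Illustrative stand-in only: a sorted String[] plays the role of the term
// dictionary and the array index plays the role of the ordinal.
class BoundedLookupSketch {

    // Unbounded lookup: binary-searches the whole array.
    public static long lookupTerm(String[] sortedTerms, String key) {
        return lookupTerm(sortedTerms, key, 0);
    }

    // Bounded lookup: only ordinals >= lowerSearchBound are searched.
    // Returns the ordinal if found, otherwise -(insertionPoint) - 1.
    public static long lookupTerm(String[] sortedTerms, String key, long lowerSearchBound) {
        long low = lowerSearchBound;
        long high = sortedTerms.length - 1;
        while (low <= high) {
            long mid = (low + high) >>> 1;
            int cmp = sortedTerms[(int) mid].compareTo(key);
            if (cmp < 0) {
                low = mid + 1;
            } else if (cmp > 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    // Resolves a sorted list of keys, reusing each result as the lower bound
    // of the next search.
    public static long[] lookupSorted(String[] sortedTerms, String[] sortedKeys) {
        long[] ords = new long[sortedKeys.length];
        long lowerBound = 0;
        for (int i = 0; i < sortedKeys.length; i++) {
            long ord = lookupTerm(sortedTerms, sortedKeys[i], lowerBound);
            ords[i] = ord;
            // Found: later keys sort at or after ord.
            // Not found: later keys sort at or after the insertion point.
            lowerBound = ord >= 0 ? ord : -ord - 1;
        }
        return ords;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "grape", "mango", "peach"};
        String[] keys = {"banana", "fig", "mango"};
        System.out.println(Arrays.toString(lookupSorted(terms, keys))); // [1, -4, 4]
    }
}
```

In this sketch a key that sorts before the supplied bound is reported as missing (for example, looking up "apple" with a bound of 2 returns -3), so the bound is only safe when the caller already knows the keys arrive in sorted order. That is the sense in which the caller controls when the smaller search range applies.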