[ https://issues.apache.org/jira/browse/LUCENE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958870#comment-16958870 ]
Adrien Grand commented on LUCENE-9025: -------------------------------------- Adding this overload has a downside as well: it complicates the API by adding another method, and we care about keeping API surface area to a minimum. I think your proposal also has the problem that it doesn't help significantly alone: if the lookup logic keeps doing a regular binary search, this will only save one lookup by ord + comparison on average, which doesn't sound very compelling. Instead I think that a more efficient solution would be to keep state about the last lookup in Lucene80DocValuesProducer's terms dict, there are several interesting things we could do: - move from a regular binary search to exponential search when looking up blocks, - optimize for the case that two consecutive looked up terms belong to the same block, we could save many iterations in the final linear scan across terms that belong to the same block > Add more efficient lookupTerm() overload to SortedSetDocValues > -------------------------------------------------------------- > > Key: LUCENE-9025 > URL: https://issues.apache.org/jira/browse/LUCENE-9025 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search > Affects Versions: master (9.0) > Reporter: Jason Gerlowski > Priority: Minor > Attachments: LUCENE-9025.patch > > > {{SortedSetDocValues.lookupTerm(BytesRef)}} performs a binary search of the > entire docValues range to find the ordinal of the requested BytesRef. > For an individual invocation, this is optimal. Without other context, binary > search needs to cover the entire space. > But there are some common uses of {{lookupTerm}} where this shouldn't be > necessary. For example: making multiple {{lookupTerm}} calls to fetch the > ordinals for each value in a sorted list of terms. {{lookupTerm}} will > binary-search the whole space on each invocation, even though the caller > knows that there's no point searching anything before the ordinal that came > back from the previous {{lookupTerm}} call. > I propose we add a {{SortedSetDocValues.lookupTerm}} overload which takes a > lower-bound to start the binary search at: {{public long lookupTerm(BytesRef > key, long lowerSearchBound) throws IOException}} This saves each > binary-search a few iterations in usage scenarios like the one described > above, which can conceivably add up. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org