[ https://issues.apache.org/jira/browse/LUCENE-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976762#comment-16976762 ]
Adrien Grand commented on LUCENE-9051:
--------------------------------------

Code duplication might look unnecessary, but I also see benefits in having
independent forks so that they can evolve according to their own constraints.
For instance, today's implementation of Lucene80's IndexedDISI might be close
to your needs, but if we find a way to make it better for the access pattern
that is typical of doc values, it would be a shame if that slowed down
nearest-neighbor search, or vice versa.

One could argue that we could delay the decision to fork until it's needed,
but then that becomes an incentive against simple changes: e.g. reordering
some loops or replacing a binary search with an exponential search would make
the diff very large because of the need to duplicate IndexedDISI.

> Implement random access seeks in IndexedDISI (DocValues)
> --------------------------------------------------------
>
>                 Key: LUCENE-9051
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9051
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In LUCENE-9004 we have a use case for random-access seeking in DocValues,
> which currently only support forward-only iteration (with efficient
> skipping). One idea there was to write an entirely new format to cover these
> cases. While looking into that, I noticed that our current DocValues
> addressing implementation, {{IndexedDISI}}, already has a pretty good basis
> for providing random accesses. I worked up a patch that does that; we already
> have the ability to jump to a block, thanks to the jump-tables added last
> year by [~toke]; the patch uses that, and/or rewinds the iteration within
> the current block as needed.
>
> I did a very simple performance test, comparing forward-only iteration with
> random seeks, and in my test I saw no difference, but that can't be right, so
> I wonder if we have a more thorough performance test of DocValues somewhere
> that I could repurpose. Probably I'll go back and dig into the issue where we
> added the jump tables - I seem to recall some testing was done then.
>
> Aside from performance testing the implementation, there is the question of
> whether we should alter our API guarantees in this way. This might be
> controversial; I don't know the history or all the reasoning behind the way
> it is today. We provide {{advanceExact}}, and some implementations support
> docids going backwards while others don't.
> {{AssertingNumericDocValues.advanceExact}} does enforce forward iteration
> (in tests); what would the consequence be of relaxing that? We'd then open
> ourselves up to requiring all DV impls to support random access. Are there
> other impls to worry about, though? I'm not sure. I'd appreciate y'all's
> input on this one.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
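
For readers following the thread, the "jump to a block, or rewind within the
current block" idea from the issue description can be illustrated with a toy
model. The sketch below is not Lucene's IndexedDISI and not the patch attached
to the issue: the class name, the plain int[] storage, the 65536-doc block
size, and the linear scan inside a block are all assumptions made to keep the
example small. It only shows how a per-block jump table lets advanceExact
accept targets in any order.

```java
/** Toy doc-ID set with a per-block jump table. Hypothetical; not Lucene's IndexedDISI. */
class RandomAccessDocIdSet {
    private static final int BLOCK_BITS = 16; // assumed block size: 65536 doc IDs

    private final int maxDoc;
    private final int[] docs;       // sorted, distinct doc IDs that have a value
    private final int[] blockStart; // jump table: index into docs of each block's first entry

    private int block = -1; // block the cursor currently sits in
    private int index = -1; // cursor position within docs

    RandomAccessDocIdSet(int[] sortedDocs, int maxDoc) {
        if (maxDoc <= 0) {
            throw new IllegalArgumentException("maxDoc must be positive");
        }
        this.maxDoc = maxDoc;
        this.docs = sortedDocs.clone();
        int numBlocks = ((maxDoc - 1) >>> BLOCK_BITS) + 1;
        this.blockStart = new int[numBlocks + 1];
        int i = 0;
        for (int b = 0; b < numBlocks; b++) {
            blockStart[b] = i;
            while (i < docs.length && (docs[i] >>> BLOCK_BITS) == b) {
                i++;
            }
        }
        blockStart[numBlocks] = docs.length; // sentinel: end of the last block
    }

    /** Returns true if target has a value. Targets may arrive in any order. */
    boolean advanceExact(int target) {
        if (target < 0 || target >= maxDoc) {
            throw new IllegalArgumentException("target out of range: " + target);
        }
        int targetBlock = target >>> BLOCK_BITS;
        if (targetBlock != block) {
            // Different block (forward or backward): jump through the table.
            block = targetBlock;
            index = blockStart[block];
        } else if (index > blockStart[block] && docs[index - 1] >= target) {
            // Same block, but the cursor has already passed the target: rewind.
            index = blockStart[block];
        }
        int end = blockStart[block + 1];
        while (index < end && docs[index] < target) {
            index++; // forward scan within the block
        }
        return index < end && docs[index] == target;
    }

    /** Ordinal of the current doc among docs with values; valid after advanceExact returned true. */
    int index() {
        return index;
    }

    public static void main(String[] args) {
        RandomAccessDocIdSet set =
            new RandomAccessDocIdSet(new int[] {3, 70000, 70005, 200000}, 300000);
        System.out.println(set.advanceExact(70005)); // forward seek into block 1
        System.out.println(set.advanceExact(70000)); // backward seek, same block: rewind
        System.out.println(set.advanceExact(3));     // backward seek, other block: jump
    }
}
```

In this model a backward seek costs at most one rescan of the current block,
which is the kind of overhead the performance question in the issue is about.
A forward-only iterator would simply reject the second and third calls in
main.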