zihanx commented on issue #15905:
URL: https://github.com/apache/lucene/issues/15905#issuecomment-4464273677
Sorry for the long delay circling back to this. And thanks again @gsmiller
for the detailed feedback!
- On `ScoreDoc[]` vs `int[]`
I agree that the four-method explosion is the thing to avoid, and moving
`partitionByLeaf` to `int[]` input sounds reasonable. Our earlier decision
assumed that no caller needed low-level `int[]` input and that `ScoreDoc[]`
was sufficient. But the retriever now needs `int[]`, the partitioning logic
only ever looks at the doc ID, and `ScoreDoc[]` carries fields that the
operation ignores. Asking `ScoreDoc[]` callers to do a small translation
seems fair.
- On the `int[][] ordinalsByLeaf` alternative
I really like the elegance of the single-method shape, but it still
introduces overhead for users who don't need ordinal tracking, and it adds
an extra indirection at retrieval time.
So I'd propose two methods, both taking `int[]` input, and deprecating the
existing `ScoreDoc[]` overload of `partitionByLeaf`:
```
// No ordinals
public static int[][] partitionByLeaf(
    int[] globalDocIds, List<LeafReaderContext> leaves)

// With ordinals
public record PartitionedHits(int[][] docIdsByLeaf, int[][] ordinalsByLeaf) {}

public static PartitionedHits partitionByLeafWithOrdinals(
    int[] globalDocIds, List<LeafReaderContext> leaves)
```
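For illustration only, here is a minimal sketch of how the two signatures above could share one implementation. It is not the Lucene code: to stay self-contained it takes a hypothetical `int[] leafDocBases` (strictly ascending, starting at 0) in place of `List<LeafReaderContext>`, with each entry standing in for a leaf's `docBase`. The class name `LeafPartitionSketch` is made up, and the choice to keep global doc IDs in input order within each leaf bucket is an assumption, not something settled in this thread.

```java
import java.util.Arrays;

// Hypothetical sketch; names other than PartitionedHits and the two method
// names are illustrative and do not come from Lucene.
final class LeafPartitionSketch {

  public record PartitionedHits(int[][] docIdsByLeaf, int[][] ordinalsByLeaf) {}

  // Index of the leaf whose [docBase, nextDocBase) range contains globalDocId.
  // Assumes leafDocBases is strictly ascending and leafDocBases[0] == 0.
  static int leafIndex(int globalDocId, int[] leafDocBases) {
    int pos = Arrays.binarySearch(leafDocBases, globalDocId);
    // Not found: pos == -(insertionPoint) - 1, and the leaf is insertionPoint - 1.
    return pos >= 0 ? pos : -pos - 2;
  }

  // No ordinals: skips allocating the ordinal arrays entirely.
  public static int[][] partitionByLeaf(int[] globalDocIds, int[] leafDocBases) {
    return partition(globalDocIds, leafDocBases, false).docIdsByLeaf();
  }

  // With ordinals: ordinalsByLeaf[l][k] is the input position of docIdsByLeaf[l][k].
  public static PartitionedHits partitionByLeafWithOrdinals(
      int[] globalDocIds, int[] leafDocBases) {
    return partition(globalDocIds, leafDocBases, true);
  }

  private static PartitionedHits partition(
      int[] globalDocIds, int[] leafDocBases, boolean trackOrdinals) {
    int numLeaves = leafDocBases.length;
    int[] leafOf = new int[globalDocIds.length];
    int[] counts = new int[numLeaves];

    // Pass 1: resolve each hit's leaf and count hits per leaf.
    for (int i = 0; i < globalDocIds.length; i++) {
      leafOf[i] = leafIndex(globalDocIds[i], leafDocBases);
      counts[leafOf[i]]++;
    }

    int[][] docs = new int[numLeaves][];
    int[][] ords = trackOrdinals ? new int[numLeaves][] : null;
    for (int l = 0; l < numLeaves; l++) {
      docs[l] = new int[counts[l]];
      if (trackOrdinals) {
        ords[l] = new int[counts[l]];
      }
    }

    // Pass 2: fill the per-leaf arrays, preserving input order within a leaf.
    int[] next = new int[numLeaves];
    for (int i = 0; i < globalDocIds.length; i++) {
      int l = leafOf[i];
      int slot = next[l]++;
      docs[l][slot] = globalDocIds[i];
      if (trackOrdinals) {
        ords[l][slot] = i;
      }
    }
    return new PartitionedHits(docs, ords);
  }
}
```

Under these assumptions, a `ScoreDoc[]` caller would first copy each hit's `doc` field into an `int[]` and then call either method. The ordinal arrays are what would let such a caller map per-leaf results back to positions in the original hit list.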
If this sounds reasonable, I'm happy to put up a PR first to switch
`partitionByLeaf` from `ScoreDoc[]` to `int[]` input, and then build on this
issue once that's in.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]