rmuir commented on pull request #389: URL: https://github.com/apache/lucene/pull/389#issuecomment-946248949
> Sorted set doc values don't have a docValueCount API, they're just expected to return NO_MORE_ORDS when all ords have been exhausted.

Thanks, sorry, I had completely forgotten that. That inconsistency is the root cause of the trouble here (the padding/alignment that hid the bug didn't help).

SortedSet was added first, and not having a count method may not have been the best decision. I am not sure omitting the count even slightly helps save space if you want to implement it as a vint-list, because you still need to store "some kind of length" to have per-document random access. With SortedNumeric, there is no available sentinel value that can be used (without boxing or something nasty), so we had to add a count method.

Maybe it is worth a second thought whether SortedSet could get a count method, to be more consistent and efficient like the numeric one. It would have costs (e.g. we'd need to hard-break the API in a way that isn't trappy for users), but it would also have benefits: e.g. none of this state-keeping inside the codec, and instead a more natural loop that happens outside of the codec code. Then AssertingCodec would really detect issues, maybe the compiler could do a better job with it, etc.
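
To make the contrast between the two iteration styles concrete, here is a rough Java sketch. `SentinelOrds` and `CountedOrds` are hypothetical stand-ins written for illustration, not Lucene's actual `SortedSetDocValues` or `SortedNumericDocValues` classes; the only names borrowed from the discussion are `NO_MORE_ORDS`, `nextOrd` and `docValueCount`. It assumes a single document whose ords are already in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: these are NOT Lucene's real classes.
final class OrdIterationSketch {
    static final long NO_MORE_ORDS = -1L;

    /** Sentinel style: the producer must remember when it is exhausted. */
    static final class SentinelOrds {
        private final long[] ords;
        private int pos = 0; // exhaustion state lives inside the producer
        SentinelOrds(long... ords) { this.ords = ords; }
        long nextOrd() {
            return pos < ords.length ? ords[pos++] : NO_MORE_ORDS;
        }
    }

    /** Count style: the producer exposes a count and the caller bounds the loop. */
    static final class CountedOrds {
        private final long[] ords;
        private int pos = 0;
        CountedOrds(long... ords) { this.ords = ords; }
        int docValueCount() { return ords.length; }
        // No exhaustion check here: reading past the count is a caller bug
        // and fails loudly with ArrayIndexOutOfBoundsException.
        long nextOrd() { return ords[pos++]; }
    }

    /** Caller loop for the sentinel style: read until NO_MORE_ORDS. */
    static List<Long> readWithSentinel(SentinelOrds dv) {
        List<Long> out = new ArrayList<>();
        for (long ord = dv.nextOrd(); ord != NO_MORE_ORDS; ord = dv.nextOrd()) {
            out.add(ord);
        }
        return out;
    }

    /** Caller loop for the count style: a plain counted for-loop. */
    static List<Long> readWithCount(CountedOrds dv) {
        List<Long> out = new ArrayList<>();
        int count = dv.docValueCount();
        for (int i = 0; i < count; i++) {
            out.add(dv.nextOrd());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readWithSentinel(new SentinelOrds(2, 5, 9)));
        System.out.println(readWithCount(new CountedOrds(2, 5, 9)));
    }
}
```

In the count style the loop bound is visible to the caller, so a wrapper that checks calls against `docValueCount()` can catch over-reads, which is the kind of check the sentinel style makes harder.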