rmuir commented on pull request #389:
URL: https://github.com/apache/lucene/pull/389#issuecomment-946248949


   > Sorted set doc values don't have a docValueCount API, they're just 
expected to return NO_MORE_ORDS when all ords have been exhausted.
   
   Thanks, sorry, I had completely forgotten that, and that inconsistency is 
the root cause of the trouble here (the padding/alignment that hid the bug 
didn't help). SortedSet was added first, and not having a count method may not 
have been the best decision. I am not sure omitting the count even saves any 
space if you want to implement the ords as a vint-list, because you still need 
to store "some kind of length" to have per-document random access.
   
   With SortedNumeric, every long is a legal value, so there is no sentinel 
value available (without boxing or something nasty), and we had to add a count 
method.
   
   Maybe it is worth a second thought whether SortedSet could get a count 
method, to be more consistent and efficient like the numeric one. It would have 
costs (e.g. we'd need to hard-break the API in a way that isn't trappy for 
users), but it would also have benefits: e.g. none of this state-keeping inside 
the codec, and instead a more natural loop that happens outside of the codec 
code. Then AssertingCodec would really detect issues, maybe the compiler could 
do a better job with it, etc.
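
   To make the contrast concrete, here is a rough sketch of the two iteration 
styles. The interfaces and classes below are hypothetical stand-ins written for 
illustration, not the real Lucene classes; only the method names (`nextOrd`, 
`NO_MORE_ORDS`, `docValueCount`, `nextValue`) mirror the existing APIs, and the 
array-backed implementations are an assumption made to keep the example 
self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the sentinel style used by sorted set doc values.
interface SentinelOrds {
    long NO_MORE_ORDS = -1L;
    long nextOrd();
}

// Hypothetical stand-in for the counted style used by sorted numeric doc values.
interface CountedValues {
    int docValueCount();
    long nextValue();
}

final class ArraySentinelOrds implements SentinelOrds {
    private final long[] ords;
    private int upto;

    ArraySentinelOrds(long... ords) {
        this.ords = ords;
    }

    @Override
    public long nextOrd() {
        // The implementation itself must track exhaustion to emit the sentinel.
        return upto < ords.length ? ords[upto++] : NO_MORE_ORDS;
    }
}

final class ArrayCountedValues implements CountedValues {
    private final long[] values;
    private int upto;

    ArrayCountedValues(long... values) {
        this.values = values;
    }

    @Override
    public int docValueCount() {
        return values.length;
    }

    @Override
    public long nextValue() {
        // No end-of-document check here: the caller bounds the loop.
        return values[upto++];
    }
}

final class IterationStyles {
    // Sentinel style: loop until the implementation says it is exhausted.
    static List<Long> drainSentinel(SentinelOrds dv) {
        List<Long> out = new ArrayList<>();
        for (long ord = dv.nextOrd(); ord != SentinelOrds.NO_MORE_ORDS; ord = dv.nextOrd()) {
            out.add(ord);
        }
        return out;
    }

    // Counted style: a plain bounded loop that lives outside the implementation.
    static List<Long> drainCounted(CountedValues dv) {
        List<Long> out = new ArrayList<>();
        int count = dv.docValueCount();
        for (int i = 0; i < count; i++) {
            out.add(dv.nextValue());
        }
        return out;
    }
}
```

   In the counted form, a checking wrapper could in principle flag any call to 
`nextValue()` beyond `docValueCount()`, which is the kind of issue the sentinel 
contract makes harder to detect.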
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


