[GitHub] [lucene] jpountz commented on pull request #12460: Allow reading binary doc values as a DataInput

via GitHub Tue, 12 Sep 2023 00:08:48 -0700


jpountz commented on PR #12460:
URL: https://github.com/apache/lucene/pull/12460#issuecomment-1715126194


   The more I think of this change, the more I like it: most of the time, you 
would need to read data out of binary doc values, e.g. (variable-length) 
integers, strings, etc. and exposing binary doc values as a data input not only 
makes this easier to do, but also more efficient by saving one memory copy.
   
   Thoughts:
    - We are losing the symmetry between the index-time API 
(BinaryDocValuesField), which takes a byte[] while the search-time API produces 
a `DataInput`. Not a deal breaker, but users have to be careful about 
endianness.
    - Today it's possible to consume values multiple times, but the data input 
is stateful, so if one consumer reads data from it, and then another consumer 
tries to read data again it will fail unless it resets the input via a call to 
`advanceExact`. Again not a deal breaker but something that could be a surprise 
to some callers.
   
   In terms of backward compatibility, I'm contemplating not introducing a new 
`DataInputDocValues` class, and instead have a `dataInput()` method on 
`BinaryDocValues`, as well as a default implementation of `binaryValue()` that 
reads all bytes from the data input. What do you think? I'm also curious of the 
opinion of our sophisticated backward compatibility policeman @uschindler.
   
   To @stefanvodita 's point, it would be nice to migrate at least one or two 
call sites that would get simpler with a `DataInput` to further validate this 
change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #12460: Allow reading binary doc values as a DataInput

Reply via email to