kishoreg commented on issue #4317: Support variable length Offline Dictionary Indexes for bytes, strings and maps to save on storage
URL: https://github.com/apache/incubator-pinot/issues/4317#issuecomment-502867589

@mcvsubbu I had not looked into the details of MutableOffheapByteArrayStore. I do see some similarity between the two implementations. I also see some similarity with VarByteChunkWriter and VarByteChunkReader.

Looks like we have 3 classes of use cases:
1. Multiple Buffers, maintain offset index for each entry in the buffer (MutableByteArrayStore)
2. Multiple Buffers, maintain offset index only at chunk level (Scan within the chunk) - VarByteChunkReader/Writer
3. Single Buffer, maintain offset for each entry in the buffer (This is what @buchireddy has implemented)

Ideally, we should have a VarByte Indexed/Unindexed Reader/Writer with no notion of expansion. Multiple buffer implementations should just be wrappers on top of these things.

My suggestion is to implement the VarByteIndexedReaderWriter since we don't have such a primitive (This should look similar to the Buffer class within MutableOffHeapByteStore). In another PR, we can change MutableOffHeapByteStore to use the VarByteIndexedReaderWriter. Similarly, VarByteChunkReaderWriter should use VarByteUnIndexedReaderWriter implementation. This will also allow us to support NoDictionary for BYTES data-type in real-time which is not currently supported.

Thoughts?
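
To make the third use case concrete, here is a minimal sketch of what a single-buffer, per-entry-indexed primitive with no notion of expansion could look like. Only the class name comes from the proposal above. The constructor parameters, the layout (an int offset index at the front of the buffer, value bytes after it) and the convention of returning -1 when full are assumptions for illustration, not the actual Pinot code.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: one fixed-size buffer holding variable-length values,
 * with one offset slot per entry. It never grows; a multi-buffer wrapper
 * would allocate a new instance when add() reports that this one is full.
 */
final class VarByteIndexedReaderWriter {
  private final ByteBuffer buffer;
  private final int maxEntries;
  private int numEntries = 0;
  private int writeOffset; // position where the next value's bytes are written

  VarByteIndexedReaderWriter(int maxEntries, int dataCapacityBytes) {
    this.maxEntries = maxEntries;
    // Assumed layout: (maxEntries + 1) int offsets, then the value bytes.
    // Slot i holds the start of entry i; slot i + 1 holds its end.
    int dataStart = (maxEntries + 1) * Integer.BYTES;
    // Heap buffer for simplicity; an off-heap variant could use allocateDirect.
    this.buffer = ByteBuffer.allocate(dataStart + dataCapacityBytes);
    this.writeOffset = dataStart;
    buffer.putInt(0, dataStart);
  }

  /** Appends a value and returns its index, or -1 if it does not fit. */
  int add(byte[] value) {
    if (numEntries == maxEntries || writeOffset + value.length > buffer.capacity()) {
      return -1; // no expansion at this level
    }
    for (int i = 0; i < value.length; i++) {
      buffer.put(writeOffset + i, value[i]);
    }
    writeOffset += value.length;
    numEntries++;
    // Record the end of this entry, which is also the start of the next one.
    buffer.putInt(numEntries * Integer.BYTES, writeOffset);
    return numEntries - 1;
  }

  /** Reads the value at the given index using two offset lookups, no scan. */
  byte[] get(int index) {
    if (index < 0 || index >= numEntries) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    int start = buffer.getInt(index * Integer.BYTES);
    int end = buffer.getInt((index + 1) * Integer.BYTES);
    byte[] value = new byte[end - start];
    for (int i = 0; i < value.length; i++) {
      value[i] = buffer.get(start + i);
    }
    return value;
  }
}
```

Under these assumptions, a multi-buffer store (use case 1) would keep a list of such instances and start a new one whenever add returns -1, while the unindexed variant would drop the offset slots and scan within the buffer instead.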