siddharthteotia opened a new pull request #5409: URL: https://github.com/apache/incubator-pinot/pull/5409
A couple of improvements have been made to bit-unpacking:

- Use hand-written unpack methods when the number of bits used to encode the dictionaryId is a power of 2 (1, 2, 4, 8, 16, 32). The hand-written methods are faster than the generic one because the bit math is simpler.
- Amortize the overhead of function calls.

Right now, the new code isn't yet wired into the existing bit reader and writer. A couple of follow-ups will be coming soon:

- Evaluate this optimization for non-power-of-2 numbers of bits. It is feasible, but the performance benefit of a special hand-written unpack function seems to get lost as the bit math itself gets complicated with branches for non-power-of-2 bit widths.
- Consider using a new format where, if the number of bits to encode is not a power of 2, we round it up to the next power of 2. This means that if you need more than 16 bits, we use 32 bits (raw value). Packing at such widths gives diminishing returns: the overhead of unpacking itself increases, and all it buys is a saving of 10-12 bits.
- Integrate the new changes with existing code.

**Description of changes:**

A new version of FixedBitIntReaderWriter is written that underneath uses PinotDataBitSetV2, a new version of the bit-unpacking reader with faster unpacking. There are 3 important APIs here:

- `public int readInt(int index)`
  Exists in the current code as well. Used by the scan operator to read through the forward index and get the dictId for each docId.
- `public void readInt(int startDocId, int length, int[] buffer)`
  Exists in the current code as well. Used by the multi-value bit reader to get the dictIds for all MVs in a given cell.
- `public void readValues(int[] docIds, int docIdStartIndex, int docIdLength, int[] values, int valuesStartIndex)`
  Exists at the FixedBitSingleColumnSingleValueReader interface and is used by the dictionary-based group-by executor to get dictIds for a set of docIds (monotonically increasing but not necessarily contiguous). However, that API still issues single read calls underneath.
This PR introduces this API at the FixedBitIntReaderWriterV2 level so that the group-by executor can leverage it with bulk read semantics.

When this code is wired in, the scan operator will start using either the second or the third API.

Please see the [spreadsheet](https://docs.google.com/spreadsheets/d/1mz_TQe0rXadWPtA_Xov6cXwYrSvUpQB1p1b_ZqROTDQ/edit?usp=sharing) for performance numbers. Two kinds of tests were done:

- Compare the performance of sequential consecutive reads using the single read API `getInt(index)` with the faster bit-unpacking code.
- Compare the performance of sequential consecutive reads using the array API `readInt(int startDocId, int length, int[] buffer)` with the faster bit-unpacking code.

Some unit tests will be added. The current PR has a performance test.
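
To illustrate why a hand-written unpack method for a power-of-2 bit width can beat a generic one, and how a bulk read amortizes per-call overhead, here is a minimal sketch for the 4-bit case. It is not the code in this PR: the class `PowerOfTwoUnpackSketch`, its method names, the `byte[]` backing store, and the MSB-first bit order are all assumptions made for the example.

```java
// Hypothetical sketch, not the PinotDataBitSetV2 implementation.
// Assumes values are packed MSB-first into a plain byte[].
final class PowerOfTwoUnpackSketch {
    private PowerOfTwoUnpackSketch() {}

    /** Demo helper: packs values MSB-first using numBits bits per value. */
    static byte[] pack(int[] values, int numBits) {
        byte[] packed = new byte[(values.length * numBits + 7) / 8];
        for (int i = 0; i < values.length; i++) {
            for (int b = 0; b < numBits; b++) {
                int bit = (values[i] >>> (numBits - 1 - b)) & 1;
                int bitPos = i * numBits + b;
                packed[bitPos >>> 3] |= (byte) (bit << (7 - (bitPos & 7)));
            }
        }
        return packed;
    }

    /** Generic reader: walks bit by bit, so it works for any width from 1 to 32. */
    static int readGeneric(byte[] packed, int index, int numBits) {
        int value = 0;
        long bitPos = (long) index * numBits;
        for (int b = 0; b < numBits; b++, bitPos++) {
            int bit = (packed[(int) (bitPos >>> 3)] >>> (7 - (int) (bitPos & 7))) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }

    /** Hand-written 4-bit reader: a value never straddles a byte, so no loop is needed. */
    static int read4Bit(byte[] packed, int index) {
        int b = packed[index >>> 1] & 0xFF;
        return (index & 1) == 0 ? (b >>> 4) : (b & 0x0F);
    }

    /** Bulk 4-bit reader: one call decodes a contiguous range, two values per byte. */
    static void read4Bit(byte[] packed, int startIndex, int length, int[] out) {
        int i = startIndex;
        int end = startIndex + length;
        int o = 0;
        // Leading unaligned value: low nibble of its byte.
        if (i < end && (i & 1) == 1) {
            out[o++] = packed[i >>> 1] & 0x0F;
            i++;
        }
        // Aligned pairs: load each byte once and emit both nibbles.
        for (; i + 2 <= end; i += 2) {
            int b = packed[i >>> 1] & 0xFF;
            out[o++] = b >>> 4;
            out[o++] = b & 0x0F;
        }
        // Trailing value: high nibble of the last byte touched.
        if (i < end) {
            out[o] = (packed[i >>> 1] & 0xFF) >>> 4;
        }
    }
}
```

With values packed by `pack(values, 4)`, `read4Bit(packed, i)` and `readGeneric(packed, i, 4)` return the same result, but the 4-bit version needs only a shift and a mask. The bulk overload loads each byte once for two values, which is the kind of amortization the array API is meant to enable.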