gortiz commented on PR #10372: URL: https://github.com/apache/pinot/pull/10372#issuecomment-1455758280
> IIUC, the intention is to batch more reads from ForwardIndexReader to reduce the total times of accessing the reader? If that is the case, I think we should add the batch size into the SVScanDocIdIterator constructor without changing the applyAnd() interface That would make the change quite more complex. There are two buffers in this algorithm: the one used by `WhateverMatcher` and the one used by `applyAnd`. Let's call `buffer` to the one in `WhateverMatcher` and `docIds` to the one in `applyAnd`. The code assumes that: - `length` <= `docIds.length` <= `buffer.length`. The reality is that `docIds.length` == `buffer.length` - `length` entries are going to be read into `buffer`. I'm not sure on what you suggest, but I'm going to assume that the idea is to increase `buffer.length` and break the second invariant. I think that would make the code quite more complex. `buffer` would need to be an actual cache over the data that is read and must read in some batch size defined at construction time. Then we would need a cursor to indicate which entry was the last that was consumed by the caller. Finally each time we call `readValuesSV` (I'm assuming with MV would be the same) we would need to check whether there are enough elements in `buffer` which may be resolved as: 1. There are more documents to read than `length`, so directly read from `buffer` and move the cursor. 2. `buffer` is exhausted, so read more from the backend, set the cursor to 0 and repeat. 3. `buffer` is not exhausted but there are not enough elements to read. First consume the elements on `buffer`, then read more and then read from 0 to the remaining number of elements. We can avoid the 3rd case if we assume that: 1. `docIds.length = k * buffer.length` where `k` is some natural positive number 2. In all calls but the last, `length = docIds.length` 3. In the last call `length` = remaining docs in the segment and therefore `buffer` will contain exactly `length` elements. This algorithm introduce several not so trivial to analyze costs in Pinot's hotpath and contrary to my proposal, these changes will affect Pinot even when this customization is not used. In case we add something like that, I would be with Richard and I would require to add some benchmarks in order to hint whether the change may affect the performance negatively in some cases. Contrary to this approach, mine is simpler (it doesn't change the current algorithm or invariants at all), it doesn't require to add the new concept of fetch size into the reader, it doesn't affect the performance on OSS Pinot and it doesn't close the door to the possibility of adding the fetch size to the reader in a future in case we find other interesting cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org