itschrispeck commented on PR #13503: URL: https://github.com/apache/pinot/pull/13503#issuecomment-2211467078
> Is it possible to share some numbers on how this improvement has helped with freshness and at the same time the corresponding impact on CPU / mem / IO utilization / GC etc
>
> I think it may be useful to analyze these numbers for a prod like setting for a steady state workload.

Yes. For CPU/memory/GC: we found that the queue poll/offer pattern in such a tight loop caused ~1.2 GB/s of allocations per thread. We use Generational ZGC, and this had a noticeable impact on the percentage of CPU spent on GC, especially when increasing the refresh thread count. I can't find the flame graphs for this, but the simple change to an `ArrayList` solves it, and the reduced allocations should be apparent even when profiling locally.

<img width="785" alt="image" src="https://github.com/apache/pinot/assets/27231838/5d09a033-4aab-4664-994c-3c3b6df2e482">

The disk IO improvement mostly comes from taking advantage of the `LuceneNRTCachingMergePolicy`. We make a best-effort attempt to merge only segments that are entirely in memory, which reduces file descriptors and avoids most of the IO.

Here's an example of the reduced delay with 10 threads per server. For reference, this is a production cluster with hundreds of consuming partitions per node. The narrow spikes are mostly due to server restarts; the wide periods of narrow spikes are due to rollouts (I need to make a change to avoid emitting delay metrics while a server is still catching up). With a single queue, all tables are sensitive to ingestion spikes or data-pattern changes in any one table; partitioning helps reduce the 'noisy neighbor' indexes.

<img width="708" alt="image" src="https://github.com/apache/pinot/assets/27231838/1e2492ff-8ca8-4697-9955-ea7e092941b0">

Here are some host metrics over the same time frame, showing no significant change in heap, a slight disk IO reduction, and increased CPU usage (since we went from 1 to 10 threads).
<img width="709" alt="image" src="https://github.com/apache/pinot/assets/27231838/af3a3892-d830-4876-9f8b-d7f5a608d706">

> IIRC, apart from freshness, there has also been a correctness concern with the way Lucene NRT works and the whole snapshot refresh business. Are we fixing that too?

I think this is mostly a separate effort. As I understand it, the snapshot refresh approach exists because it is inherently expensive to build and query Lucene-like structures in memory (especially since the input is not necessarily ordered). For an entire segment this is prohibitive, and it is part of the reason why the native text index's true real-time indexing is relatively resource intensive. By reducing the indexing delay, I think we can shrink the scope of the problem so that we only need to build and hold such a structure in memory for a very small portion of the data (i.e., the portion that has not been refreshed yet). I opened an [issue](https://github.com/apache/pinot/issues/13504) to track this and will share a doc there with more details, pending further testing. For now, I think this is a standalone feature that is good to have regardless, as it can reduce the amount of incorrect data. If you have any thoughts on this, I would love to continue the discussion there.
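For readers unfamiliar with why a linked queue in a tight loop allocates so heavily: each `offer` on a linked concurrent queue allocates a node object per element, while an `ArrayList` stores elements in a backing array and allocates nothing per add in steady state. The sketch below illustrates the general swap-a-batch pattern; it is a minimal illustration under my own naming, not the actual code from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a batch buffer: producers add under a lock, and the
// consumer swaps the whole list out in O(1) instead of polling one element
// (and touching one queue node allocation) at a time.
public class BatchBuffer<T> {
    private List<T> buffer = new ArrayList<>();

    // Producer side: ArrayList appends into a backing array, so steady-state
    // adds allocate nothing (unlike linked queues, which allocate a node per
    // offer).
    public synchronized void add(T item) {
        buffer.add(item);
    }

    // Consumer side: swap the full batch out, then process it outside the
    // lock with no per-element poll calls.
    public synchronized List<T> drain() {
        List<T> drained = buffer;
        buffer = new ArrayList<>();
        return drained;
    }
}
```

The drained list can then be iterated by the refresh thread without further synchronization, which is what keeps the hot loop allocation-free.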
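On the noisy-neighbor point: the idea of partitioning is that each table's refresh work hashes to a fixed partition, so an ingestion spike in one table only delays the threads sharing its partition rather than a single global queue. A minimal sketch of that routing, with a hypothetical class name (not taken from the PR):

```java
// Hypothetical sketch: map each table to a stable partition index so the
// same table always lands on the same refresh thread/queue, preserving
// per-table ordering while isolating tables from each other.
public class RefreshPartitioner {
    private final int numPartitions;

    public RefreshPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    public int partitionFor(String tableName) {
        return Math.floorMod(tableName.hashCode(), numPartitions);
    }
}
```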