itschrispeck commented on PR #13503:
URL: https://github.com/apache/pinot/pull/13503#issuecomment-2211467078

   > Is it possible to share some numbers on how this improvement has helped 
with freshness and at the same time the corresponding impact on CPU / mem / IO 
utilization / GC etc
   > 
   > I think it may be useful to analyze these numbers for a prod like setting 
for a steady state workload.
   
   
   Yes. For CPU/mem/GC: we found that the queue poll/offer pattern in such a tight loop caused ~1.2 Gbps of allocations per thread. We use Generational ZGC, and this had a noticeable impact on the percentage of CPU spent on GC, especially when increasing the refresh thread count. I can't find the flame graphs for this, but the simple change to an ArrayList solves it, and the reduced allocations should be apparent even when profiling locally.
   <img width="785" alt="image" src="https://github.com/apache/pinot/assets/27231838/5d09a033-4aab-4664-994c-3c3b6df2e482">
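   To illustrate the shape of the change (a simplified sketch with hypothetical names, not the actual PR code): round-robining work through a `ConcurrentLinkedQueue` allocates a queue node on every `offer()`, while iterating a plain `ArrayList` by index allocates nothing per pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class RefreshLoopSketch {
    // Old shape: round-robin via poll()/offer(). Every offer() allocates a
    // fresh queue node, which adds up fast in a tight refresh loop.
    static int refreshViaQueue(Queue<String> segments, int iterations) {
        int refreshed = 0;
        for (int i = 0; i < iterations; i++) {
            String seg = segments.poll();
            if (seg == null) {
                break;
            }
            refreshed++;         // refresh seg's index here...
            segments.offer(seg); // re-enqueue: allocates a node each time
        }
        return refreshed;
    }

    // New shape: iterate an ArrayList by index; no per-iteration allocation.
    static int refreshViaList(List<String> segments, int iterations) {
        if (segments.isEmpty()) {
            return 0;
        }
        int refreshed = 0;
        for (int i = 0; i < iterations; i++) {
            String seg = segments.get(i % segments.size());
            refreshed++; // refresh seg's index here...
        }
        return refreshed;
    }

    public static void main(String[] args) {
        Queue<String> q = new ConcurrentLinkedQueue<>(List.of("seg0", "seg1"));
        System.out.println(refreshViaQueue(q, 5)); // 5
        System.out.println(refreshViaList(new ArrayList<>(List.of("seg0", "seg1")), 5)); // 5
    }
}
```

   The list version does the same round-robin work without the per-`offer()` node churn, which is what the GC improvement above comes from.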
   
   The disk IO improvement mostly comes from taking advantage of the `LuceneNRTCachingMergePolicy`: we make a best-effort attempt to merge only segments that are entirely in memory, which reduces open file descriptors and avoids most of the IO.
   
   Here's an example of the reduced delay with 10 threads per server. For reference, this is a production cluster with hundreds of consuming partitions per node. The narrow spikes are mostly due to server restarts, and the wide bands of narrow spikes are due to rollouts (I need to make a change to avoid emitting the delay metric while a server is still catching up). With a single queue, every table is sensitive to ingestion spikes or data-pattern changes in any one table; partitioning reduces this 'noisy neighbor' effect between indexes.
   <img width="708" alt="image" src="https://github.com/apache/pinot/assets/27231838/1e2492ff-8ca8-4697-9955-ea7e092941b0">
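   The partitioning idea can be sketched like so (hypothetical class and method names, not the PR's actual code): each table's refresh work is pinned to one of N single-threaded workers, so an ingestion spike in one table only delays the tables that hash to the same partition.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PartitionedRefreshSketch {
    private final ExecutorService[] partitions;

    PartitionedRefreshSketch(int numThreads) {
        partitions = new ExecutorService[numThreads];
        for (int i = 0; i < numThreads; i++) {
            // One single-threaded worker per partition preserves per-table
            // ordering while isolating tables from each other.
            partitions[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Stable table -> partition mapping; floorMod keeps the index non-negative.
    static int partitionFor(String tableName, int numPartitions) {
        return Math.floorMod(tableName.hashCode(), numPartitions);
    }

    // A table's refresh tasks always land on the same worker thread.
    void submitRefresh(String tableName, Runnable refreshTask) {
        partitions[partitionFor(tableName, partitions.length)].execute(refreshTask);
    }

    void shutdown() {
        for (ExecutorService p : partitions) {
            p.shutdown();
        }
    }
}
```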
   
   Here are some host metrics from the same time frame, showing no significant change in heap usage, a slight disk IO reduction, and increased CPU usage (since we went from 1 to 10 threads).
   <img width="709" alt="image" src="https://github.com/apache/pinot/assets/27231838/af3a3892-d830-4876-9f8b-d7f5a608d706">
   
   
   > IIRC, apart from freshness, there has also been a correctness concern with the way Lucene NRT works and the whole snapshot refresh business. Are we fixing that too?
   
   I think this is mostly a separate effort. As I understand it, the snapshot refresh approach exists because it is inherently expensive to build and query Lucene-like structures in memory (especially since the input is not necessarily ordered). For an entire segment this is prohibitive, and it is part of the reason the native text index's true real-time indexing is relatively resource intensive. By reducing the indexing delay, I think we can shrink the scope of the problem so that we only need to build and hold such a structure in memory for a very small portion of the data (i.e., the portion that has not been refreshed yet).
   
   I opened an [issue](https://github.com/apache/pinot/issues/13504) to track this and will share a doc there with more details, pending further testing. For now, I think this is a standalone feature that is good to have regardless, as it can reduce the amount of incorrect data. If you have any thoughts on this, I would love to continue the discussion there.

