itschrispeck commented on PR #13503: URL: https://github.com/apache/pinot/pull/13503#issuecomment-2211467078
> Is it possible to share some numbers on how this improvement has helped with freshness and at the same time the corresponding impact on CPU / mem / IO utilization / GC etc
>
> I think it may be useful to analyze these numbers for a prod like setting for a steady state workload.

Yes. For CPU/memory/GC: we found that the queue poll/offer pattern in such a tight loop caused ~1.2 GB/s of allocations per thread. We use Generational ZGC, and this had a noticeable impact on the percentage of CPU spent on GC, especially when increasing the refresh thread count. I can't find the flame graphs for this, but the simple change to an `ArrayList` solves it, and the reduced allocations should be apparent even when profiling locally.

<img width="785" alt="image" src="https://github.com/apache/pinot/assets/27231838/5d09a033-4aab-4664-994c-3c3b6df2e482">

The disk IO improvement mostly comes from taking advantage of the `LuceneNRTCachingMergePolicy`. We make a best-effort attempt to merge only segments that are entirely in memory, which reduces file descriptors and avoids most of the IO.

Here's an example of the reduced delay with 10 threads per server. For reference, this is a production cluster with hundreds of consuming partitions per node. The narrow spikes are mostly due to server restarts; the wide periods of narrow spikes are due to rollouts (I need to make a change to avoid emitting delay metrics while a server is still catching up). With a single queue, all tables are sensitive to ingestion spikes or data-pattern changes in any one table; partitioning helps reduce the 'noisy neighbor' indexes.

<img width="708" alt="image" src="https://github.com/apache/pinot/assets/27231838/1e2492ff-8ca8-4697-9955-ea7e092941b0">

Here are some host metrics over the same time frame, showing no significant change in heap, a slight disk IO reduction, and increased CPU usage (since we went from 1 to 10 threads).
<img width="709" alt="image" src="https://github.com/apache/pinot/assets/27231838/af3a3892-d830-4876-9f8b-d7f5a608d706">

> IIRC, apart from freshness, there has also been a correctness concern with the way Lucene NRT works and the whole snapshot refresh business. Are we fixing that too?

I think this is mostly a separate effort. As I understand it, the snapshot refresh approach exists because it is inherently expensive to build and query Lucene-like structures in memory (especially since the input is not necessarily ordered). For an entire segment this is prohibitive, and it is part of the reason why the native text index's true real-time indexing is relatively resource intensive. By reducing the indexing delay, I think we can shrink the scope of the problem so that we only need to build and hold such a structure in memory for a very small portion of the data (i.e., the portion that has not been refreshed yet). I opened an [issue](https://github.com/apache/pinot/issues/13504) to track this and will share a doc there with more details, pending further testing. For now, I think this is a standalone feature that is good to have regardless, as it can reduce the amount of incorrect data. If you have any thoughts on this, I would love to continue the discussion there.
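For readers unfamiliar with why a linked queue in a tight loop allocates so heavily: each `offer` on a linked concurrent queue allocates a node object per element, while an `ArrayList` stores elements in a backing array and allocates nothing per add in steady state. The sketch below illustrates the general swap-a-batch pattern; it is a minimal illustration under my own naming, not the actual code from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a batch buffer: producers add under a lock, and the
// consumer swaps the whole list out in O(1) instead of polling one element
// (and touching one queue node allocation) at a time.
public class BatchBuffer<T> {
    private List<T> buffer = new ArrayList<>();

    // Producer side: ArrayList appends into a backing array, so steady-state
    // adds allocate nothing (unlike linked queues, which allocate a node per
    // offer).
    public synchronized void add(T item) {
        buffer.add(item);
    }

    // Consumer side: swap the full batch out, then process it outside the
    // lock with no per-element poll calls.
    public synchronized List<T> drain() {
        List<T> drained = buffer;
        buffer = new ArrayList<>();
        return drained;
    }
}
```

The drained list can then be iterated by the refresh thread without further synchronization, which is what keeps the hot loop allocation-free.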
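On the noisy-neighbor point: the idea of partitioning is that each table's refresh work hashes to a fixed partition, so an ingestion spike in one table only delays the threads sharing its partition rather than a single global queue. A minimal sketch of that routing, with a hypothetical class name (not taken from the PR):

```java
// Hypothetical sketch: map each table to a stable partition index so the
// same table always lands on the same refresh thread/queue, preserving
// per-table ordering while isolating tables from each other.
public class RefreshPartitioner {
    private final int numPartitions;

    public RefreshPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    public int partitionFor(String tableName) {
        return Math.floorMod(tableName.hashCode(), numPartitions);
    }
}
```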