ankitsultana commented on PR #13721:
URL: https://github.com/apache/pinot/pull/13721#issuecomment-2260831471

   > For example, consider applying a simple equality filter with inverted
   > index:
   >
   > 1. Search for the value in the sorted dictionary
   > 2. Find the corresponding bitmap locations in the offset map
   > 3. Read the inverted index
   >
   > Since pages needed for this are likely to be sufficiently far apart,
   > MADV_RANDOM makes the most sense to avoid pollution from read ahead.
   > ...
   > Since we have a fairly wide variety of usecases at Linkedin, this
   > indicates to me that using a default value of MADV_RANDOM likely makes the
   > most sense.
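
   For reference, the three lookup steps quoted above can be sketched roughly as follows. This is only an illustration, not Pinot's actual reader code: the class name, the arrays, and the flattened `DOC_IDS` postings (standing in for the bitmaps) are all hypothetical.

```java
import java.util.Arrays;

public class InvertedIndexLookupSketch {
    // Hypothetical in-memory stand-ins for the three on-disk structures.
    static final String[] SORTED_DICTIONARY = {"apple", "banana", "cherry"};
    // OFFSETS[dictId] .. OFFSETS[dictId + 1] delimits that value's doc IDs.
    static final int[] OFFSETS = {0, 2, 3, 6};
    // Flattened postings; a real inverted index would store bitmaps here.
    static final int[] DOC_IDS = {0, 4, 2, 1, 3, 5};

    static int[] matchingDocIds(String value) {
        // Step 1: binary search the sorted dictionary for the dictionary ID.
        int dictId = Arrays.binarySearch(SORTED_DICTIONARY, value);
        if (dictId < 0) {
            return new int[0]; // value is not present in this segment
        }
        // Step 2: find where this value's postings start and end.
        int start = OFFSETS[dictId];
        int end = OFFSETS[dictId + 1];
        // Step 3: read the postings for that range.
        return Arrays.copyOfRange(DOC_IDS, start, end);
    }

    public static void main(String[] args) {
        // prints [2]
        System.out.println(Arrays.toString(matchingDocIds("banana")));
    }
}
```

   Each step reads from a different structure, which is why the pages touched by one lookup can end up far apart.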
   
   Logically, a reasonably high read-ahead should be quite useful in most 
cases. For example, consider a reasonably high-cardinality UUID column that is 
dictionary-encoded and has an inverted index, as in your example. If the 
segment has 100k unique UUIDs, at 36 characters each the UUIDs themselves would 
span 3.6MB. At Uber we have a high read-ahead and page size, so the entire 
dictionary would be loaded into memory in a single I/O stall, leaving a largely 
CPU-intensive binary search on the dictionary.
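
   A back-of-the-envelope sketch of that arithmetic, assuming fixed-width 36-byte UUID strings and a 4 KiB page size (the page size is my assumption for illustration; ours is larger). The class and method names are hypothetical. It also counts how many distinct pages a single binary search would probe, which is the part MADV_RANDOM is meant to help with:

```java
import java.util.HashSet;
import java.util.Set;

public class DictionaryPageEstimate {
    // Total bytes occupied by a fixed-width dictionary.
    static long dictionaryBytes(long numValues, int bytesPerValue) {
        return numValues * bytesPerValue;
    }

    // Number of pages needed to hold totalBytes, rounded up.
    static long pageCount(long totalBytes, int pageBytes) {
        return (totalBytes + pageBytes - 1) / pageBytes;
    }

    // Simulates a binary search for the entry at targetIndex and returns
    // how many distinct pages the probes land on.
    static int pagesTouchedByBinarySearch(int numValues, int bytesPerValue,
                                          int pageBytes, int targetIndex) {
        Set<Long> pages = new HashSet<>();
        int lo = 0;
        int hi = numValues - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            pages.add(((long) mid * bytesPerValue) / pageBytes);
            if (mid == targetIndex) {
                break;
            }
            if (mid < targetIndex) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return pages.size();
    }

    public static void main(String[] args) {
        long bytes = dictionaryBytes(100_000, 36);
        System.out.println(bytes);                  // prints 3600000
        System.out.println(pageCount(bytes, 4096)); // prints 879
        System.out.println(pagesTouchedByBinarySearch(100_000, 36, 4096, 12_345));
    }
}
```

   With 100k entries a binary search makes at most 17 probes, so it touches only a small fraction of the 879 pages. Whether it is cheaper to fault in those few pages individually or to pull in the whole 3.6MB with read-ahead depends on the storage and on how many lookups hit the same dictionary.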
   
   There are use cases, though, where MADV_RANDOM would be helpful. For 
example, we have had issues with high-ingestion-throughput partial-upsert 
tables at Uber (refer: 
[talk](https://youtu.be/z4Chhref1BM?si=GPekPgkVMlyrI7us&t=1462)).
   
   But changing the default would have a wide spectrum of consequences, so 
I'd discourage that. It's obviously good to have this feature and make it 
configurable, though.
   
   ---
   
   Though Lucene might be quite different from Pinot in terms of the access 
pattern, have you folks looked at their journey on this?
   
   They started by adding 
[NativePosixUtil](https://github.com/apache/lucene/tree/releases/lucene-solr/8.8.1/lucene/misc/src/java/org/apache/lucene/store),
which added a way to configure `madvise`.
   
   But they dropped this in Lucene 9 (AFAICT) and are now using 
`MemorySegment`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

