ankitsultana commented on PR #13721: URL: https://github.com/apache/pinot/pull/13721#issuecomment-2260831471
> For example, consider applying a simple equality filter with inverted index:
>
> 1. Search for the value in the sorted dictionary
> 2. Find the corresponding bitmap locations in the offset map
> 3. Read the inverted index
>
> Since pages needed for this are likely to be sufficiently far apart, MADV_RANDOM makes the most sense to avoid pollution from read ahead.
> ...
> Since we have a fairly wide variety of usecases at Linkedin, this indicates to me that using a default value of MADV_RANDOM likely makes the most sense.

Logically, a reasonably high readahead should be quite useful in most cases. For example, consider a reasonably high-cardinality UUID column which is dict-encoded and has an inverted index, as in your example. If the segment has 100k unique UUIDs, the UUIDs themselves would span 3.6MB (100k values at 36 bytes each, assuming the usual string form). At Uber we have a high readahead and page size, and the entire dictionary would be loaded into memory in a single I/O stall, leading to a largely CPU-intensive binary search on the dictionary.

There are use-cases though where `MADV_RANDOM` would be helpful. For example, we have had issues with high-ingestion-throughput partial upsert tables at Uber (refer: [talk](https://youtu.be/z4Chhref1BM?si=GPekPgkVMlyrI7us&t=1462)). But we can't change the default without a wide spectrum of consequences, and I'd discourage that. That said, it's obviously good to have this feature and make it configurable.

---

Though Lucene might be quite different from Pinot in terms of the access pattern, have you folks looked at their journey on this? They started by adding [NativePosixUtil](https://github.com/apache/lucene/tree/releases/lucene-solr/8.8.1/lucene/misc/src/java/org/apache/lucene/store), which added a way to configure madvise. But they dropped this in Lucene 9 (AFAICT) and they are now using `MemorySegment`.
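A rough back-of-the-envelope check of the dictionary arithmetic above, as a sketch only. It assumes fixed-width 36-byte UUID strings and a 4 KiB page size; neither is taken from Pinot's actual dictionary layout, and the helper names are hypothetical. It shows how many pages the dictionary spans versus how many a single binary search actually touches.

```python
import math

UUID_BYTES = 36        # assumed: canonical textual UUID length
NUM_VALUES = 100_000   # cardinality used in the comment
PAGE_BYTES = 4096      # assumed page size; real deployments may differ


def dictionary_bytes(num_values=NUM_VALUES, value_bytes=UUID_BYTES):
    """Total size of a fixed-width dictionary, in bytes."""
    return num_values * value_bytes


def pages_spanned(total_bytes, page_bytes=PAGE_BYTES):
    """Number of pages needed to hold total_bytes."""
    return math.ceil(total_bytes / page_bytes)


def pages_touched_by_binary_search(target_index,
                                   num_values=NUM_VALUES,
                                   value_bytes=UUID_BYTES,
                                   page_bytes=PAGE_BYTES):
    """Pages probed while binary-searching for the entry at target_index."""
    lo, hi = 0, num_values - 1
    touched = set()
    while lo <= hi:
        mid = (lo + hi) // 2
        # An entry may straddle a page boundary, so record both ends.
        first_page = (mid * value_bytes) // page_bytes
        last_page = (mid * value_bytes + value_bytes - 1) // page_bytes
        touched.update(range(first_page, last_page + 1))
        if mid == target_index:
            break
        if mid < target_index:
            lo = mid + 1
        else:
            hi = mid - 1
    return touched


if __name__ == "__main__":
    size = dictionary_bytes()
    print(size)                 # 3600000 bytes, i.e. 3.6 MB
    print(pages_spanned(size))  # 879 pages of 4 KiB
    # At most 17 probes for 100k entries, so only a few dozen pages at most.
    print(len(pages_touched_by_binary_search(12_345)))
```

A single lookup touches only a small fraction of the pages, which is the argument for `MADV_RANDOM`; the whole dictionary is also small enough that a large readahead window can pull it in with few I/O stalls, which is the argument for keeping readahead. The sketch does not model the readahead window or page-cache state, so it cannot say which effect wins for a given workload.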
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org