jpountz commented on PR #13223:
URL: https://github.com/apache/lucene/pull/13223#issuecomment-2025770130

   For reference, this change is based on observations similar to those made at 
https://biriukov.dev/docs/page-cache/3-page-cache-and-basic-file-operations. 
`mmap` comes with a 128kB readahead while `read()` only does 16kB of readahead. I 
can reproduce the exact same numbers with the code below, after dropping caches 
and checking page cache residency with `vmtouch`.
   
   ```java
     public static void main(String[] args) throws Exception {
       // switch to NIOFSDirectory to test with read()
       try (FSDirectory dir = FSDirectory.open(Paths.get("/data/a"));
           IndexInput in = dir.openInput("term-ids__47.tmp", IOContext.READ)) {
         in.readInt();
       }
     }
   ```
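
For anyone who wants to repeat the comparison without Lucene on the classpath, here is a minimal JDK-only sketch of the same two access paths. It is an illustration under assumptions, not the code used for the numbers above: the class name `ReadaheadProbe` and its method names are hypothetical, and the file path is taken from the command line. The program only touches the first 4 bytes of the file. Observing the readahead still requires dropping caches beforehand and inspecting the file with `vmtouch` afterwards.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene: reads the first int of a file
// through either a positional read() or a memory mapping.
public class ReadaheadProbe {

  // Positional read into a heap buffer, i.e. the read() code path.
  static int firstIntViaRead(Path file) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
      while (buf.hasRemaining()) {
        if (ch.read(buf, buf.position()) < 0) {
          throw new EOFException("file is shorter than 4 bytes");
        }
      }
      buf.flip();
      return buf.getInt();
    }
  }

  // Maps the file (up to 2GB, the limit of a single MappedByteBuffer) and
  // reads the first int, which triggers a page fault on the first page.
  static int firstIntViaMmap(Path file) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      long length = Math.min(ch.size(), Integer.MAX_VALUE);
      MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, length);
      return map.getInt(0);
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: ReadaheadProbe <file> <read|mmap>");
      return;
    }
    Path file = Paths.get(args[0]);
    int value = "mmap".equals(args[1]) ? firstIntViaMmap(file) : firstIntViaRead(file);
    System.out.println("first int: " + value);
  }
}
```

Run it once with `read` and once with `mmap`, dropping caches before each run, and compare how much of the file `vmtouch` reports as resident afterwards.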
   
   While 16kB has proved workable in practice, we've seen major performance 
issues with Elasticsearch when a 128kB readahead is combined with indexes that 
exceed the size of the page cache. My first take was that 128kB feels huge for a 
default readahead, almost buggy, and it's not clear to me why it's so much higher 
than with `read()`. Since this is controversial, I'm ok with the alternative 
approach of using `MADV_RANDOM` all the time for `IOContext.READ`. We should 
benchmark the impact of a smaller readahead to confirm it performs well: in my 
testing, it only reads one page at a time in that case, but intuitively it should 
be ok.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

