uschindler commented on PR #13219: URL: https://github.com/apache/lucene/pull/13219#issuecomment-2020886166
> > P.S.: Are we using RANDOM at the moment?
>
> Not yet, we'd need to start using it where it makes sense like we do for (PRE)LOAD.
>
> > I also found [elastic/elasticsearch#27748](https://github.com/elastic/elasticsearch/issues/27748), this person suggests to pass RANDOM for everything.
>
> Yeah, Wikimedia also did testing and they [report](https://phabricator.wikimedia.org/T169498) getting best performance with a mmap readahead of 16kB instead of the default of 128kB (it's shared on the same thread). It feels a bit like a bug to me that mmap has such a higher readahead than regular read operations, I wonder if we should recommend lowering this default readahead in our wiki / javadocs instead of trying to work around it by passing RANDOM everywhere. My preference would be to not index too much on how the various hints perform in practice and try to provide what seems to be the correct read advice based on what we know of the access patterns. E.g. postings and doc values data should probably use NORMAL, stored fields, term vectors and vectors data should probably use RANDOM, etc.

The question I have about this: how do we handle merging then? What happens if we use random access for some files and then want to merge those segments away? As you said before, the problem is with NRT readers that are reused for merging. I think we should not hardcode the RANDOM flag on all files for now?!

It is good that an IOContext with MergeInfo always requires SEQUENTIAL, but is it really used in all cases when we merge? If the advice is hardcoded when the index files are opened, we have a problem. The vargaps reader is an example of exactly that problem: it always uses readOnce.

I think you are more familiar with how merging works; these are just some points to consider.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
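The merging concern can be made concrete with a minimal sketch. Everything in it is hypothetical and for illustration only: `Advice`, `FileKind`, `SEARCH_DEFAULTS`, and `adviceFor` are not Lucene's API, and the per-file defaults are simply the ones suggested in the quoted text. It assumes the advice is chosen each time a file is opened, so a merge can override the search-time default with SEQUENTIAL.

```java
import java.util.Map;

public class ReadAdviceSketch {
    // Hypothetical stand-in for the read-advice hints discussed above; not Lucene's API.
    enum Advice { NORMAL, RANDOM, SEQUENTIAL }

    // Hypothetical file categories, named after the examples in the quoted comment.
    enum FileKind { POSTINGS, DOC_VALUES, STORED_FIELDS, TERM_VECTORS, VECTORS }

    // Search-time defaults as suggested in the quoted comment (an assumption, not a decision).
    static final Map<FileKind, Advice> SEARCH_DEFAULTS = Map.of(
        FileKind.POSTINGS, Advice.NORMAL,
        FileKind.DOC_VALUES, Advice.NORMAL,
        FileKind.STORED_FIELDS, Advice.RANDOM,
        FileKind.TERM_VECTORS, Advice.RANDOM,
        FileKind.VECTORS, Advice.RANDOM);

    // Picks the advice per open call: a merge reads sequentially regardless of file kind.
    static Advice adviceFor(FileKind kind, boolean forMerge) {
        return forMerge ? Advice.SEQUENTIAL : SEARCH_DEFAULTS.get(kind);
    }

    public static void main(String[] args) {
        System.out.println(adviceFor(FileKind.STORED_FIELDS, false)); // prints RANDOM
        System.out.println(adviceFor(FileKind.STORED_FIELDS, true));  // prints SEQUENTIAL
    }
}
```

The override only helps if the merge actually opens the file again with its own context. A reader that was opened for search and is later reused for merging keeps whatever advice it was opened with, which is the case the comment asks about.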
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org