uschindler commented on PR #13219:
URL: https://github.com/apache/lucene/pull/13219#issuecomment-2020886166

   > > P.S.: Are we using RANDOM at the moment?
   > 
   > Not yet, we'd need to start using it where it makes sense like we do for (PRE)LOAD.
   > 
   > > I also found [elastic/elasticsearch#27748](https://github.com/elastic/elasticsearch/issues/27748), this person suggests to pass RANDOM for everything.
   > 
   > Yeah, Wikimedia also did testing and they [report](https://phabricator.wikimedia.org/T169498) getting best performance with a mmap readahead of 16kB instead of the default of 128kB (it's shared on the same thread). It feels a bit like a bug to me that mmap has such a higher readahead than regular read operations, I wonder if we should recommend lowering this default readahead in our wiki / javadocs instead of trying to work around it by passing RANDOM everywhere. My preference would be to not index too much on how the various hints perform in practice and try to provide what seems to be the correct read advice based on what we know of the access patterns. E.g. postings and doc values data should probably use NORMAL, stored fields, term vectors and vectors data should probably use RANDOM, etc.
   
   The question I have about this: how do we handle merging then, if we use 
random access for some files and later want to merge the segments away? As you 
said before, the problem is with NRT readers that are reused for merging. I 
think we should not hardcode the RANDOM flag on all files for now?!
   
   It is good that IOContext with MergeInfo always requires SEQUENTIAL, but is 
this really used in all cases when we merge? When it's hardcoded while opening 
index files, we have a problem. The vargaps reader is an example of exactly 
that problem: it always uses readOnce.
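
   To make the concern concrete, here is a minimal sketch of choosing the read advice from the access pattern while letting a merge context override it. Everything in it is hypothetical and not a Lucene API: the `ReadAdviceSketch` class, the `ReadAdvice` enum, the `adviceFor` helper, and the extension-to-advice table are assumptions for illustration only.

```java
import java.util.Map;

// Hypothetical sketch: none of these names are Lucene APIs.
final class ReadAdviceSketch {

  /** The three hints discussed in this thread. */
  enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL }

  // Assumed mapping from file extension to a default advice, following the
  // suggestion quoted above (postings / doc values: NORMAL; stored fields,
  // term vectors, vectors: RANDOM). The extensions are illustrative.
  private static final Map<String, ReadAdvice> BY_EXTENSION = Map.of(
      "doc", ReadAdvice.NORMAL,   // postings
      "dvd", ReadAdvice.NORMAL,   // doc values data
      "fdt", ReadAdvice.RANDOM,   // stored fields data
      "tvd", ReadAdvice.RANDOM,   // term vectors data
      "vec", ReadAdvice.RANDOM);  // vector data

  /**
   * Picks the advice at open time. A merge always reads sequentially,
   * so the merge flag wins over the per-file default.
   */
  static ReadAdvice adviceFor(String fileName, boolean forMerge) {
    if (forMerge) {
      return ReadAdvice.SEQUENTIAL;
    }
    int dot = fileName.lastIndexOf('.');
    String ext = dot < 0 ? "" : fileName.substring(dot + 1);
    // Unknown files fall back to NORMAL rather than guessing.
    return BY_EXTENSION.getOrDefault(ext, ReadAdvice.NORMAL);
  }

  public static void main(String[] args) {
    System.out.println(adviceFor("_0.fdt", false));
    System.out.println(adviceFor("_0.fdt", true));
  }
}
```

   The sketch only helps if the file is actually opened with the merge context. A reader that was opened for searching and is later reused for a merge would keep whatever advice was chosen when it was opened, which is the problem described above.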
   
   I think you are more familiar with how merging works; these are just some 
points to consider.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

