uschindler commented on PR #11917:
URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313875966

   > @uschindler I think I understand Robert's concerns against mlock, but less 
yours against loading. Do you think that it would be useless to preload if 
these files can then get paged out?
   
   I have not measured this, but i would say it is better to have a very short 
delay on the first access to the index and not preload it. If the preloading is 
so fast and - as you say - would not cause a delay on opening the index, why 
would it then cause a significant delay on first access?
   
   > My thinking was that most users would rather like Lucene to take a bit 
longer to open indices if it helps provide better performance on initial 
queries, noting that it wouldn't make opening much slower given how tiny these 
files tend to be compared to the overall dataset (~0.17% of the total index 
size for the index generated by nightly benchmarks). In general these files 
would stay in the page cache all the time, but if they don't because of unusual 
access patterns or competing processes and the page cache decides to page out 
some data, it wouldn't be much worse than today. I do think some users will see 
appeal in the mlock option and getting stronger guarantees that this data 
wouldn't get paged out, but this feels like something that should be an opt-in 
rather than the default?
   
   That was my plan, allow to enable mlock2 optionally (optionally for 2 
reasons: (1) it needs a command line parameter to enable unsafe memory 
accesses; (2) for the reasons you said before and Robert's issue). I was just 
waiting for the correct IOContext to control this. If you agree that all files 
marked with IOContext.LOAD are also candidates for mlock2, then I am fine.
   
   Instead of enabling preload automatically, I tend to make the setPreload 
setting a tristate or a `BiPredicate<String/*filename*/,IOContext>`. So users 
can easly preload files they like they did on `mmapdir.setPreload(true)` would 
be replaced by `setPreload(ALL_FILES)` or like that, `setPreload(false)` would 
get `setPreload(NO_FILES)` and your current variant something like 
`setPreload((name, context) -> (context == IOContext.LOAD))`. 
   
   What do you think of making preload better controlable (also for 
elasticsearch), if we replace the boolean by a `BiPredicate`?
   
   For mlock2 I would add, to allow the same: 
`setLockPages(BiPredicate<String/*filename*/,IOContext>)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to