uschindler commented on PR #11917: URL: https://github.com/apache/lucene/pull/11917#issuecomment-1313875966
> @uschindler I think I understand Robert's concerns against mlock, but less yours against loading. Do you think that it would be useless to preload if these files can then get paged out? I have not measured this, but i would say it is better to have a very short delay on the first access to the index and not preload it. If the preloading is so fast and - as you say - would not cause a delay on opening the index, why would it then cause a significant delay on first access? > My thinking was that most users would rather like Lucene to take a bit longer to open indices if it helps provide better performance on initial queries, noting that it wouldn't make opening much slower given how tiny these files tend to be compared to the overall dataset (~0.17% of the total index size for the index generated by nightly benchmarks). In general these files would stay in the page cache all the time, but if they don't because of unusual access patterns or competing processes and the page cache decides to page out some data, it wouldn't be much worse than today. I do think some users will see appeal in the mlock option and getting stronger guarantees that this data wouldn't get paged out, but this feels like something that should be an opt-in rather than the default? That was my plan, allow to enable mlock2 optionally (optionally for 2 reasons: (1) it needs a command line parameter to enable unsafe memory accesses; (2) for the reasons you said before and Robert's issue). I was just waiting for the correct IOContext to control this. If you agree that all files marked with IOContext.LOAD are also candidates for mlock2, then I am fine. Instead of enabling preload automatically, I tend to make the setPreload setting a tristate or a `BiPredicate<String/*filename*/,IOContext>`. So users can easly preload files they like they did on `mmapdir.setPreload(true)` would be replaced by `setPreload(ALL_FILES)` or like that, `setPreload(false)` would get `setPreload(NO_FILES)` and your current variant something like `setPreload((name, context) -> (context == IOContext.LOAD))`. What do you think of making preload better controlable (also for elasticsearch), if we replace the boolean by a `BiPredicate`? For mlock2 I would add, to allow the same: `setLockPages(BiPredicate<String/*filename*/,IOContext>)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org