Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-10 Thread via GitHub
jpountz merged PR #13337: URL: https://github.com/apache/lucene/pull/13337 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-09 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2103397669 > also, i'm a little concerned about low-level parallelization of e.g. individual stored documents. seems like a lot of overhead! if you need 10,000 documents ranges, at least make a sin

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-08 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2101570838 In general we may still look at Robert's suggestion. If we plan to send a preload for mayne slices, we should think of adding another random API to `RandomAccessIndexInput`, something

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-08 Thread via GitHub
uschindler commented on code in PR #13337: URL: https://github.com/apache/lucene/pull/13337#discussion_r1594719247 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -50,6 +51,7 @@ abstract class MemorySegmentIndexInput extends IndexInput impl

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-08 Thread via GitHub
uschindler commented on code in PR #13337: URL: https://github.com/apache/lucene/pull/13337#discussion_r1594719247 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -50,6 +51,7 @@ abstract class MemorySegmentIndexInput extends IndexInput impl

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-07 Thread via GitHub
rmuir commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2099251913 > * I'm contemplating changing the signature from `void prefetch()` to `void prefetch(long offset, long length)`. The benefit is that this would allow reading from multiple places with a s

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-07 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2098635698 > Some questions about the API, curious to get your thoughts: > > * Should we remove `ReadAdvice#WILL_NEED` and instead introduce a new API such as `NativeAccess#madviseWillNeed

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-07 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2097602534 I reverted changed so `BufferedIndexInput`, agreed to focus on `MMapDirectory` for now. -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
rmuir commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2097001668 Thanks Uwe, maybe the correct solution is to simply add the api and implement with `madvise()` for MMapDirectory, for now? To me this is just another `madvise` being hooked in. I fe

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096977864 https://bugs.openjdk.org/browse/JDK-8292771 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
rmuir commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096977066 ability to madvise/fadvise without resorting to native code would be awesome too. I don't know how it may translate to windows. but it seems like it does exactly what this PR wants to do:

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096970275 > > We can't get the file handle in Java (still open issue). > > hmm, ok. I felt like we were able to get it somewhere thru the guts of nio/2 filesystem apis, maybe I am wrong?

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
rmuir commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096968924 > We can't get the file handle in Java (still open issue). hmm, ok. I felt like we were able to get it somewhere thru the guts of nio/2 filesystem apis, maybe I am wrong? -- This

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096965609 > If madvise does the trick for mmapdir, why not try POSIX_FADV_WILLNEED for the niofs case? We can't get the file handle in Java (still open issue). -- This is an automated

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
rmuir commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096958396 If madvise does the trick for mmapdir, why not try POSIX_FADV_WILLNEED for the niofs case? -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096948490 > It works on my Linux box and returns 4096. 🎉 We could now also fix my hack regarding smaller chunk sizes and just ensure the chunk size is greater page size to enable madvise

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096944622 It works on my Linux box and returns 4096. :tada: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096921573 Yes, that was my idea. I also quickyl implemented the page size problem. I haven't tested it (on windows at moment). If you like you could quickly check the return value on lin

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096902937 > Can't we use the same filechannel and do a positional read in another thread (not async)? I gave it a try in the last commit, is this what you had in mind? The benchmark suggest

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096458975 Please go ahead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096443560 Hi, > It looks we need to have 2 options: > > * use try/catch around the madvise and do nothing if unaliged on 4k boundaries > * use the obsolete/deprecated `getpagesi

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096314678 > We can also use the Hotspot bean to get page size, but this fails on OpenJ9 or any 3rd party JVM. So we could try to get page size from HotSpt bean in Constants.java and save it in

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096310679 > Thanks for taking a look Uwe, and suggesting approaches for the page size issue! By the way, feel free to push directly to the branch. > > > I also have some questions about t

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096293398 Thanks for taking a look Uwe, and suggesting approaches for the page size issue! By the way, feel free to push directly to the branch. > I also have some questions about the NIOFS

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096286224 We can also use the Hotspot bean to get page size, but this fails on OpenJ9 or any 3rd party JVM. So we could try to get page size from HotSpt bean in Constants.java and save it in Op

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2096280645 Hi, give me some time to review. I got the concept! I also have some questions about the NIOFS one because I don't like to use twice as much file handles just for the prefetching.

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-06 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2095430556 @rmuir asked if we could add support for this on `MMapDirectory` via `madvise` + `POSIX_MADV_WILLNEED`. I pushed a new commit that does this (with several nocommits). This seems to perfo

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-02 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2090882979 I created the following benchmark to simulate lookups in a terms dictionary that cannot fit in the page cache. ```java import java.io.IOException; import java.nio

[PR] Add IndexInput#prefetch. [lucene]

2024-05-02 Thread via GitHub
jpountz opened a new pull request, #13337: URL: https://github.com/apache/lucene/pull/13337 This adds `IndexInput#prefetch`, which is an optional operation that instructs the `IndexInput` to start fetching bytes from storage in the background. These bytes will be picked up by follow-up call