Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-21 Thread via GitHub
jpountz commented on code in PR #13190: URL: https://github.com/apache/lucene/pull/13190#discussion_r1533418454 ## lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java: ## @@ -445,8 +456,15 @@ private static String rateToString(double mbPerSec) { }

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-21 Thread via GitHub
jpountz commented on code in PR #13190: URL: https://github.com/apache/lucene/pull/13190#discussion_r1533418454 ## lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java: ## @@ -445,8 +456,15 @@ private static String rateToString(double mbPerSec) { }

Re: [I] Should we fold DirectIODirectory into FSDirectory? [lucene]

2024-03-21 Thread via GitHub
ChrisHegarty commented on issue #13194: URL: https://github.com/apache/lucene/issues/13194#issuecomment-2011662051 > @ChrisHegarty: For MMapDirectory there's another way for madvise to work (at least for reading index files): When the user has enabled native Panama access, we can also mmap

Re: [I] Should we fold DirectIODirectory into FSDirectory? [lucene]

2024-03-21 Thread via GitHub
uschindler commented on issue #13194: URL: https://github.com/apache/lucene/issues/13194#issuecomment-2011672091 > Not to mention that we map in 16GB chunks, so may require a little more plumbing (but nothing too hard). Is the mapping in chunks still needed? Did you @uschindler encounter is

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-21 Thread via GitHub
mikemccand commented on PR #13190: URL: https://github.com/apache/lucene/pull/13190#issuecomment-2011765808 > Thanks for pushing on this change, I like it. The fact that the extra merge concurrency may not starve merging from threads is good. I'm curious how the nightly benchmarks will reac

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-21 Thread via GitHub
jpountz commented on PR #13190: URL: https://github.com/apache/lucene/pull/13190#issuecomment-2011813580 > Nightly benchy hardwires the maxMergeCount=16, maxThreadCount=12 Do you remember how you came up with these numbers? Is there some reasoning behind these numbers, or do they come

Re: [I] Should we fold DirectIODirectory into FSDirectory? [lucene]

2024-03-21 Thread via GitHub
uschindler commented on issue #13194: URL: https://github.com/apache/lucene/issues/13194#issuecomment-2012002493 I just noticed, for madvise no file descriptor is needed. We can implement that straight away using the memory segments. So fadvise is the bigger problem needed for writing

Re: [I] Should we fold DirectIODirectory into FSDirectory? [lucene]

2024-03-21 Thread via GitHub
ChrisHegarty commented on issue #13194: URL: https://github.com/apache/lucene/issues/13194#issuecomment-2012042036 > I just noticed, for madvise no file descriptor is needed. We can implement that straight away using the memory segments. ++ I think we have all we need enable this for

Re: [I] Should we fold DirectIODirectory into FSDirectory? [lucene]

2024-03-21 Thread via GitHub
uschindler commented on issue #13194: URL: https://github.com/apache/lucene/issues/13194#issuecomment-2012136360 > > I just noticed, for madvise no file descriptor is needed. We can implement that straight away using the memory segments. > > ++ I think we have all we need enable this

Re: [I] Create a read-only index that drops index files not needed for searching [lucene]

2024-03-21 Thread via GitHub
mikemccand commented on issue #13158: URL: https://github.com/apache/lucene/issues/13158#issuecomment-2012195070 This is a cool idea @msokolov. It is wasteful to lug around those `float32` precision vectors out to the searchers in an NRT segment replication architecture. In practice, they

Re: [I] Create a read-only index that drops index files not needed for searching [lucene]

2024-03-21 Thread via GitHub
benwtrent commented on issue #13158: URL: https://github.com/apache/lucene/issues/13158#issuecomment-2012237415 Don't "deletes" require "writes"? Meaning, if enough docs get deleted in a segment, it will ultimately require to be merged, which then is a "write"? For a quicker win in sc

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-21 Thread via GitHub
benwtrent merged PR #13190: URL: https://github.com/apache/lucene/pull/13190 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

[PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
uschindler opened a new pull request, #13196: URL: https://github.com/apache/lucene/pull/13196 This is a first idea how we can use Panama Foreign to pass `madvise()` hints to the kernel when mapping memory segments. The code looks up the function pointer from stdlib (libc) on Linux an

Re: [I] Should we fold DirectIODirectory into FSDirectory? [lucene]

2024-03-21 Thread via GitHub
uschindler commented on issue #13194: URL: https://github.com/apache/lucene/issues/13194#issuecomment-2012529188 I created a draft PR for IOContext#readOnce: #13196 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [I] Create a read-only index that drops index files not needed for searching [lucene]

2024-03-21 Thread via GitHub
msokolov commented on issue #13158: URL: https://github.com/apache/lucene/issues/13158#issuecomment-2012935594 > Don't "deletes" require "writes"? Meaning, if enough docs get deleted in a segment, it will ultimately require to be merged, which then is a "write"? The goal of segment-re

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
jpountz commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2012967809 I remember this kind of things being discussed more than 10 years ago, it's extremely exciting to see it close to being included in the default `Directory`! > current "readOnce =>

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
ChrisHegarty commented on code in PR #13196: URL: https://github.com/apache/lucene/pull/13196#discussion_r1534254114 ## lucene/core/src/java21/org/apache/lucene/store/NativeAccess.java: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
uschindler commented on code in PR #13196: URL: https://github.com/apache/lucene/pull/13196#discussion_r1534481239 ## lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java: ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
uschindler commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2013348340 I refactored the code a bit: - @jpountz I added the merge context (I checked DirectIODirectory). - I retrieve the pageSize to check if a segment is correctly aligned. Unfortunate

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
uschindler commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2013453556 I removed the page size retrieval again. No need for it, we just be safe and only apply the advice, when the chunk size is large enough. -- This is an automated message from the Apa

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
uschindler commented on PR #13196: URL: https://github.com/apache/lucene/pull/13196#issuecomment-2013499830 Unfortunately after the problems with other constants I tend to hide the advice constants in the class any use some abstraction, so we can call We should maybe also simply remov

[PR] New int4 scalar quantization [lucene]

2024-03-21 Thread via GitHub
benwtrent opened a new pull request, #13197: URL: https://github.com/apache/lucene/pull/13197 So, there are a handful of new and interesting things that this PR adds: - Confidence interval optimizations, unlocked even smaller quantization bytes - New int4 codec which is 8x smalle

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
ChrisHegarty commented on code in PR #13196: URL: https://github.com/apache/lucene/pull/13196#discussion_r1534598790 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -92,10 +110,19 @@ private final MemorySegment[] map( }

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
ChrisHegarty commented on code in PR #13196: URL: https://github.com/apache/lucene/pull/13196#discussion_r1534636363 ## lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
ChrisHegarty commented on code in PR #13196: URL: https://github.com/apache/lucene/pull/13196#discussion_r1534678873 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -92,10 +110,19 @@ private final MemorySegment[] map( }

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
uschindler commented on code in PR #13196: URL: https://github.com/apache/lucene/pull/13196#discussion_r1534727218 ## lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] DRAFT: Add support for posix_madvise to Java 21 MMapDirectory [lucene]

2024-03-21 Thread via GitHub
ChrisHegarty commented on code in PR #13196: URL: https://github.com/apache/lucene/pull/13196#discussion_r1534775594 ## lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [I] Improve AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight ScorerSupplier cost estimation [lucene]

2024-03-21 Thread via GitHub
msfroh commented on issue #13029: URL: https://github.com/apache/lucene/issues/13029#issuecomment-2014023729 I think the `else` clause for the cost estimate is also not great. I came across this same problem where a user was essentially running a single-term `TermInSetQuery` (that ac

Re: [I] Improve AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight ScorerSupplier cost estimation [lucene]

2024-03-21 Thread via GitHub
msfroh commented on issue #13029: URL: https://github.com/apache/lucene/issues/13029#issuecomment-2014084615 I get that part of the point of this cost estimate is to avoid the (potentially-expensive) rewrite if, e.g. we can do a doc-value rewrite instead, but I'm thinking we could do someth