[PR] Enhancing the Turkish stop word list with additional common words [lucene]
HakanBayazitHabes opened a new pull request, #14549: URL: https://github.com/apache/lucene/pull/14549 ### Description This pull request proposes the addition of several frequently used Turkish stopwords to the stopwords.txt file. These words are commonly considered non-informative in NLP tasks and are consistent with standard Turkish stopword sets. - Words like şu, şöyle, şayet, and öz were added - Alphabetically sorted for consistency - Based on analysis of multiple Turkish NLP resources This enhancement improves coverage and ensures better compatibility with text processing tasks involving Turkish. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make competitive iterators more robust. [lucene]
jpountz commented on PR #14532: URL: https://github.com/apache/lucene/pull/14532#issuecomment-2827499874 Benchmark results with #14550 applied: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value OrStopWords 33.75 (8.8%) 32.53 (7.2%) -3.6% ( -18% - 13%) 0.399 IntervalsOrdered2.29 (4.1%)2.23 (5.3%) -2.4% ( -11% -7%) 0.334 AndStopWords 30.50 (5.6%) 29.81 (5.9%) -2.3% ( -13% -9%) 0.462 OrMany 19.52 (4.4%) 19.15 (5.7%) -1.9% ( -11% -8%) 0.483 Or2Terms2StopWords 163.00 (5.5%) 159.95 (4.7%) -1.9% ( -11% -8%) 0.492 Or3Terms 169.56 (5.8%) 166.70 (4.8%) -1.7% ( -11% -9%) 0.553 OrHighHigh 52.71 (5.7%) 51.88 (3.8%) -1.6% ( -10% -8%) 0.542 And3Terms 175.92 (4.0%) 173.27 (4.4%) -1.5% ( -9% -7%) 0.504 And2Terms2StopWords 165.34 (3.5%) 162.88 (3.9%) -1.5% ( -8% -6%) 0.456 DismaxOrHighHigh 117.18 (2.3%) 115.70 (1.6%) -1.3% ( -5% -2%) 0.236 Term 441.32 (3.6%) 435.76 (4.9%) -1.3% ( -9% -7%) 0.586 OrHighMed 194.68 (4.8%) 192.28 (3.2%) -1.2% ( -8% -7%) 0.569 AndHighOrMedMed 47.26 (0.9%) 46.68 (0.9%) -1.2% ( -2% -0%) 0.011 DismaxOrHighMed 171.78 (2.7%) 170.05 (1.8%) -1.0% ( -5% -3%) 0.408 CombinedOrHighHigh 19.21 (1.1%) 19.03 (0.9%) -0.9% ( -2% -1%) 0.074 CombinedTerm 31.67 (4.4%) 31.40 (3.9%) -0.8% ( -8% -7%) 0.703 CombinedOrHighMed 73.65 (0.7%) 73.04 (0.9%) -0.8% ( -2% -0%) 0.059 CombinedAndHighHigh 11.61 (0.8%) 11.51 (1.5%) -0.8% ( -3% -1%) 0.211 Fuzzy1 101.38 (1.6%) 100.61 (2.3%) -0.8% ( -4% -3%) 0.462 CombinedAndHighMed 39.16 (1.2%) 38.87 (1.1%) -0.8% ( -3% -1%) 0.236 Fuzzy2 85.09 (1.5%) 84.46 (2.1%) -0.7% ( -4% -2%) 0.444 FilteredOrStopWords 47.30 (1.6%) 47.01 (2.2%) -0.6% ( -4% -3%) 0.550 FilteredPhrase 33.45 (1.4%) 33.25 (1.4%) -0.6% ( -3% -2%) 0.439 AndHighHigh 43.56 (0.9%) 43.36 (1.4%) -0.5% ( -2% -1%) 0.460 CountAndHighHigh 359.83 (1.3%) 358.29 (3.2%) -0.4% ( -4% -4%) 0.743 Prefix3 160.80 (2.7%) 160.13 (5.2%) -0.4% ( -8% -7%) 0.851 CountAndHighMed 309.90 (2.0%) 308.69 (1.6%) -0.4% ( -3% -3%) 0.682 FilteredAnd3Terms 183.33 (2.0%) 182.64 (1.6%) -0.4% ( -3% -3%) 0.700 DismaxTerm 486.91 (1.4%) 485.15 (1.9%) -0.4% ( -3% -3%) 0.688 OrHighRare 274.71 (2.8%) 273.73 (4.4%) -0.4% ( -7% -7%) 0.857 FilteredAndStopWords 46.11 (1.6%) 45.95 (1.8%) -0.4% ( -3% -3%) 0.701 FilteredAndHighMed 128.57 (2.5%) 128.13 (2.6%) -0.3% ( -5% -4%) 0.801 AndMedOrHighHigh 66.66 (0.9%) 66.43 (1.8%) -0.3% ( -3% -2%) 0.664 TermMonthSort 3290.83 (2.4%) 3281.44 (3.0%) -0.3% ( -5% -5%) 0.846 CountPhrase4.21 (1.8%)4.20 (4.4%) -0.3% ( -6% -6%) 0.882 CountFilteredOrHighMed 149.22 (0.5%) 148.84 (0.7%) -0.3% ( -1% -0%) 0.449 FilteredOr2Terms2StopWords 149.95 (1.2%) 149.59 (1.4%) -0.2% ( -2% -2%) 0.729 AndHighMed 136.50 (0.8%) 136.19 (0.8%) -0.2% ( -1% -1%) 0.616 FilteredOr3Terms 168.18 (1.1%) 167.82 (1.5%) -0.2% ( -2% -2%) 0.751 FilteredTerm 160.37 (1.8%) 160.11 (1.1%) -0.2% ( -3% -2%) 0.838 CountOrHighMed 369.17 (1.4%) 368.63 (2.5%) -0.1% ( -3% -3%) 0
Re: [I] Allow Histogram Collection using PointTree when SortedNumericDocValues is absent [lucene]
jpountz commented on issue #14536: URL: https://github.com/apache/lucene/issues/14536#issuecomment-2827511771 In my opinion, it's fine to require doc values to be indexed for faceting to work. I don't think we should try to support faceting (or sorting) when the field has a points index but no doc values. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Remove DISIDocIdStream. [lucene]
jpountz opened a new pull request, #14550: URL: https://github.com/apache/lucene/pull/14550 I had initially introduced `DISIDocIdStream` to avoid introducing regressions when `DenseConjunctionBulkScorer` started accepting single clauses. However, benchmarks on #14532 suggested that going through `DISIDocIdStream` is slower than loading docs into a bit set first and then iterating the bit set, when the postings list has many of its blocks encoded as bit sets. This makes sense, the way how `BitSetDocIdStream` iterates set bits saves a number of operations compared with calling `FixedBitSet#nextSetBit` in a loop. So I'm suggesting removing `DISIDocIdStream` for now for simplicity. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]
gf2121 merged PR #14529: URL: https://github.com/apache/lucene/pull/14529 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] Add `SkipIndex` in SortedNumericDocValuesSetQuery [lucene]
kkewwei opened a new issue, #14551: URL: https://github.com/apache/lucene/issues/14551 ### Description In #13449, SkipIndex was introduced. It is primarily utilized in range query like `SortedSetDocValuesRangeQuery` and `SortedNumericDocValuesRangeQuery`. I'm wondering if we could incorporate it into `SortedNumericDocValuesSetQuery`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Add a timeout for forceMergeDeletes in IndexWriter [lucene]
mikemccand commented on issue #14431: URL: https://github.com/apache/lucene/issues/14431#issuecomment-2828061328 > > I don't know if we are already doing this -- is this TieredMergePolicy's default behavior (1 -> 1) for forceMergeDeletes? I don't think so? > > It's not the default indeed. TieredMergePolicy always optimizes for lower write amplification over latency to my knowledge. I suspect it's a better default, not knowing anything about the user's use-case. Yeah +1 to defaulting that way. > > Yeah we actually set natural merging to tolerate lower deletes after forceMergeDeletes finishes. > > Out of curiosity, what about making natural merging the way how deletes are reclaimed instead of using `forceMergeDeletes`? Something like > > * Update `TieredMergePolicy` to tolerate lower deletes. > * Call `IndexWriter#maybeMerge` > * Poll until the deletes percent is below a tolerable value or the timeout. > > The benefit I see is that `findMerges` naturally creates more and smaller merges vs `findForced(Deletes)Merges`, and prioritizes merges that are cheaper and reclaim more deletes. Ahhh that is a brilliant idea! It also avoids the possible discontinuity of the different merge selection producing "exotic" segment geometry from `forcMergeDeletes` since it'd always be natural merges that are selected. I like this idea! Thanks @jpountz. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]
jpountz commented on code in PR #14529: URL: https://github.com/apache/lucene/pull/14529#discussion_r2058189258 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java: ## @@ -693,6 +723,34 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int target) throws IOException disi.index = disi.numberOfOnes - Long.bitCount(leftBits); return (leftBits & 1L) != 0; } + + @Override + boolean intoBitSetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet bitSet, int offset) + throws IOException { +if (disi.bitSet == null) { + disi.bitSet = new FixedBitSet(BLOCK_SIZE); +} + +int sourceFrom = disi.doc & 0x; +int sourceTo = Math.min(upTo - disi.block, BLOCK_SIZE); +int destFrom = disi.block - offset + sourceFrom; Review Comment: Is this the same as `disi.doc - offset`? ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java: ## @@ -693,6 +723,34 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int target) throws IOException disi.index = disi.numberOfOnes - Long.bitCount(leftBits); return (leftBits & 1L) != 0; } + + @Override + boolean intoBitSetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet bitSet, int offset) + throws IOException { +if (disi.bitSet == null) { + disi.bitSet = new FixedBitSet(BLOCK_SIZE); +} + +int sourceFrom = disi.doc & 0x; +int sourceTo = Math.min(upTo - disi.block, BLOCK_SIZE); +int destFrom = disi.block - offset + sourceFrom; + +long fp = disi.slice.getFilePointer(); +disi.slice.seek(fp - Long.BYTES); Review Comment: nit: maybe leave a small comment explaining why we're seeking backwards, I believe that it's to include the word that contains the current doc, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Encode numeric docvalue with per block gcd [lucene]
jpountz commented on issue #14545: URL: https://github.com/apache/lucene/issues/14545#issuecomment-2827604411 If you're interested in storage efficiency, you'd probably want to use a doc-values format like this one: https://github.com/apache/lucene/issues/11072. It stores data into blocks of 128 values like postings, where each block is encoded with its own GCD among other things. Elasticsearch uses something similar for logs and metrics. It wasn't adopted by Lucene because the way how it decodes whole blocks when it needs a single value in a block slowed down sparse queries but IMO it would be a better trade-off for a number of applications. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
expani commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2058428825 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -282,6 +290,10 @@ public PostingsEnum postings( @Override public ImpactsEnum impacts(FieldInfo fieldInfo, BlockTermState state, int flags) throws IOException { +if (state.docFreq <= BLOCK_SIZE) { + // no skip data + return new SlowImpactsEnum(postings(fieldInfo, state, null, flags)); +} Review Comment: Sure I saw that removing this also works. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce NeighborArray heap memory [lucene]
benwtrent commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2827970657 @weizijun I am running some benchmarking as well. But the key thing is to update the parameters in `knnPerfTest.py` making sure `numMergeWorker` and `numMergeThread` are greater than `1` ,etc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce NeighborArray heap memory [lucene]
weizijun commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2827978234 @benwtrent I see the default parameters: ``` "numMergeWorker": (12,), "numMergeThread": (4,), ``` Is the current merger effective with these parameters? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce NeighborArray heap memory [lucene]
weizijun commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2827261485 > We need to make sure that there are no significant performance or concurrency bugs introduced with this. Could you test with https://github.com/mikemccand/luceneutil to verify recall, multi-threaded merge, etc. ? hi, @benwtrent , could you tell me which benchmark tool can test this case? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make task executor non-final [lucene]
jpountz commented on PR #14524: URL: https://github.com/apache/lucene/pull/14524#issuecomment-2827538594 IMO Lucene should own how queries execute concurrently instead of making it pluggable. So I'd rather not allow users to pass a custom `TaskExecutor`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Ensuring skip list is read for fields indexed with only DOCS [lucene]
jpountz commented on code in PR #14511: URL: https://github.com/apache/lucene/pull/14511#discussion_r2058399082 ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1286,14 +1298,11 @@ public long cost() { @Override public int numLevels() { -return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2; +return level1LastDocID == NO_MORE_DOCS ? 1 : 2; } @Override public int getDocIdUpTo(int level) { -if (indexHasFreq == false) { - return NO_MORE_DOCS; -} Review Comment: Likewise, let's keep this? ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1286,14 +1298,11 @@ public long cost() { @Override public int numLevels() { -return indexHasFreq == false || level1LastDocID == NO_MORE_DOCS ? 1 : 2; +return level1LastDocID == NO_MORE_DOCS ? 1 : 2; Review Comment: I would keep it the way it was for now. Exposing a single level that covers the whole doc ID space may cause inefficiencies with some scorers, but we should fix these scorers instead of tuning this class. If all docs have the same score, it should be just fine to return a single level of impact that covers the whole doc ID space. ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -282,6 +290,10 @@ public PostingsEnum postings( @Override public ImpactsEnum impacts(FieldInfo fieldInfo, BlockTermState state, int flags) throws IOException { +if (state.docFreq <= BLOCK_SIZE) { + // no skip data + return new SlowImpactsEnum(postings(fieldInfo, state, null, flags)); +} Review Comment: Let's not use `SlowImpactsEnum` again or it would cancel the speedup from https://github.com/apache/lucene/pull/14017 (annotation HJ at https://benchmarks.mikemccandless.com/OrHighHigh.html). ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -66,13 +67,20 @@ public final class Lucene103PostingsReader extends PostingsReaderBase { static final VectorizationProvider VECTORIZATION_PROVIDER = VectorizationProvider.getInstance(); + // Dummy impacts, composed of the maximum possible term frequency and the lowest possible // (unsigned) norm value. This is typically used on tail blocks, which don't actually record - // impacts as the storage overhead would not be worth any query evaluation speedup, since there's + // impacts as the storage overhead would not be worth any query evaluation speedup, since + // there's // less than 128 docs left to evaluate anyway. private static final List DUMMY_IMPACTS = Collections.singletonList(new Impact(Integer.MAX_VALUE, 1L)); + // We stopped storing a placeholder impact with freq=1 for fields with IndexOptions.DOCS after + // 9.12.0 + private static final List NON_COMPETITIVE_IMPACTS = + Collections.singletonList(new Impact(1, 1L)); Review Comment: Maybe call it `DUMMY_IMPACTS_NO_FREQS` or something like that to draw a parallel with `DUMMY_IMPACTS` above? ## lucene/core/src/java/org/apache/lucene/codecs/lucene103/Lucene103PostingsReader.java: ## @@ -1309,8 +1318,9 @@ public List getImpacts(int level) { if (level == 1) { return readImpacts(level1SerializedImpacts, level1Impacts); } + return DUMMY_IMPACTS; } -return DUMMY_IMPACTS; +return NON_COMPETITIVE_IMPACTS; Review Comment: I would keep it the way it was before and just add this at the beginning of this method (similar to `getDocIdUpTo`): ```java if (indexHasFreq == false) { return DUMMY_IMPACTS_NO_FREQS; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Speed up flush of softdelete by intoBitset [lucene]
gf2121 opened a new pull request, #14552: URL: https://github.com/apache/lucene/pull/14552 Speed up the flush of softdelete by intoBitset. relates: https://github.com/apache/lucene/issues/14521 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
jpountz commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2059102197 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: > After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation My reading of recent reports is that users want to be able to customize it because there isn't a one-size-fits-all. For instance, vectors work better with `ReadAdvice.NORMAL` when they can fit in the page cache, and better with `ReadAdvice.RANDOM` when they significantly exceed the size of the page cache. So I agree that Lucene should come with sensible defaults, but I think that we also need a way to diverge from Lucene's defaults? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
ChrisHegarty commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057896977 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The final api position as I understand it, `IOContext` has no knowledge of `ReadAdvice` ( `IOConext::withReadAdvice` and `IOContext::readAdvice` will be removed ). `ReadAdvice` is rather determined by the directory when given the higher-level hints from the IOContext. For example, the `MMapDirectory` implementation of openInput would do something like. ```java public IndexInput openInput(String name, IOContext context) throws IOException { ... return PROVIDER.openInput(..., toReadAdvice(context), preload.test(name, context),...); } ``` ... so it's somewhat akin to the the current preLoad model. For a custom filter wrapper one would do: ```java Directory dir = new FilterDirectory(mmapDirectory) { @Override public ReadAdvice toReadAdvice(IOContext context) { if (someCondition) { // enforce specific read advice instead of relying on hints return ReadAdvice.NORMAL; } // hints to readAdvice } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The `MMapDirectory` wouldn't call the overridden `toReadAdvice`, as thats on a different class, not in the same type hierarchy. Hmm... One way to think of it might be that trying to set a specific ReadAdvice is wrong. You should instead be setting the correct hints _such that_ the directory chooses the behaviour you want. So rather than setting `ReadAdvice.NORMAL` directly, you would specify hints to make MMapDirectory select the `ReadAdvice` that you want - or whatever else MMapDirectory chooses to do to follow the hints you've specified. The `ReadAdvice` is entirely opaque within the Directory implementation, and you can _only_ set hints. After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation, which may or may not be using certain ReadAdvices (or any other setting it chooses). So the code would be: ``` Directory dir = new FilterDirectory(mmapDirectory) { @Override public IndexInput openInput(String name, IOContext context) { if (someCondition()) { // be more specific in what this file is being opened for, // MMapDirectory chooses what to do with this info context = context.withHints(FileDataHint.VECTORS, DataAccessHint.RANDOM); } return in.openInput(name, context); } } ``` If we don't want to do that, then the hint -> ReadAdvice mapping would indeed need to be on the IOContext, and _not_ the responsibility of the `Directory`; or we use some sort of lambda passed in to the relevant Directory constructor to do that mapping, which feels kinda hacky -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The `MMapDirectory` wouldn't call the overridden `toReadAdvice`, as thats on a different class, not in the same type hierarchy. Hmm... One way to think of it might be that trying to set a specific ReadAdvice is wrong. You should instead be setting the correct hints _such that_ the directory chooses the behaviour you want. So rather than setting `ReadAdvice.NORMAL` directly, you would specify hints to make MMapDirectory select the `ReadAdvice` that you want - or whatever else MMapDirectory chooses to do to follow the hints you've specified. The `ReadAdvice` is entirely opaque within the Directory implementation, and you can _only_ set hints. After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation, which may or may not be using certain ReadAdvices (or any other setting it chooses). So the code would be: ``` Directory dir = new FilterDirectory(mmapDirectory) { @Override public IndexInput openInput(String name, IOContext context) { if (someCondition()) { // be more specific in what this file is being opened for, // MMapDirectory chooses what to do with it context = context.withHints(FileDataHint.VECTORS, DataAccessHint.RANDOM); } return in.openInput(name, context); } } ``` If we don't want to do that, then the hint -> ReadAdvice mapping would indeed need to be on the IOContext, and _not_ the responsibility of the `Directory`; or we use some sort of lambda passed in to the relevant Directory constructor to do that mapping, which feels kinda hacky -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The `MMapDirectory` wouldn't call the overridden `toReadAdvice`, as thats on a different class, not in the same type hierarchy. Hmm... One way to think of it might be that trying to set a specific ReadAdvice is wrong. You should instead be setting the correct hints _such that_ the directory chooses the behaviour you want. So rather than setting `ReadAdvice.NORMAL` directly, you would specify hints to make MMapDirectory select the `ReadAdvice` that you want - or whatever else MMapDirectory chooses to do to follow the hints you've specified. The `ReadAdvice` is entirely opaque within the Directory implementation, and you can _only_ set hints. After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation, which may or may not be using certain ReadAdvices (or any other setting it chooses). So the code would be: ``` Directory dir = new FilterDirectory(mmapDirectory) { @Override public IndexInput openInput(String name, IOContext context) { if (someCondition()) { // be more specific in what this file is being opened for, // MMapDirectory chooses what to do with this info context = context.withHints(FileDataHint.VECTORS, DataAccessHint.RANDOM); } return in.openInput(name, context); } } ``` If we don't want to do that, then the hint -> ReadAdvice mapping would indeed need to be on the IOContext, and _not_ the responsibility of the `Directory`; or we use some sort of lambda override similar to `setPreload` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The `MMapDirectory` wouldn't call the overridden `toReadAdvice`, as thats on a different class, not in the same type hierarchy. Hmm... One way to think of it might be that trying to set a specific ReadAdvice is wrong. You should instead be setting the correct hints _such that_ the directory chooses the behaviour you want. So rather than setting `ReadAdvice.NORMAL` directly, you would specify hints to make MMapDirectory select the `ReadAdvice` that you want - or whatever else MMapDirectory chooses to do to follow the hints you've specified. The `ReadAdvice` is entirely opaque within the Directory implementation, and you can _only_ set hints. After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]
gf2121 commented on code in PR #14529: URL: https://github.com/apache/lucene/pull/14529#discussion_r2057923869 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java: ## @@ -693,6 +727,44 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int target) throws IOException disi.index = disi.numberOfOnes - Long.bitCount(leftBits); return (leftBits & 1L) != 0; } + + @Override + boolean intoBitSetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet bitSet, int offset) + throws IOException { +if (disi.doc >= upTo) { + advanceWithinBlock(disi, disi.doc); Review Comment: This was useful before we advancing `disi.block`, but not now. I like the simplification more :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The `MMapDirectory` wouldn't call the overridden `toReadAdvice`, as thats on a different class, not in the same type hierarchy. Hmm... One way to think of it might be that trying to set a specific ReadAdvice is wrong. You should instead be setting the correct hints _such that_ the directory chooses the behaviour you want. So rather than setting `ReadAdvice.NORMAL` directly, you would specify hints to make MMapDirectory select the `ReadAdvice` that you want - or whatever else MMapDirectory chooses to do to follow the hints you've specified. The `ReadAdvice` is entirely opaque within the Directory implementation, and you can _only_ set hints. After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation, which may or may not be using certain ReadAdvices (or any other setting it chooses) If we don't want to do that, then the hint -> ReadAdvice mapping would indeed need to be on the IOContext, or we use some sort of lambda passed in to the relevant Directory constructor to override it, which feels kinda hacky -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The `MMapDirectory` wouldn't call the overridden `toReadAdvice`, as thats on a different class, not in the same type hierarchy. Hmm... One way to think of it might be that trying to set a specific ReadAdvice is wrong. You should instead be setting the correct hints _such that_ the directory chooses the behaviour you want. So rather than setting `ReadAdvice.NORMAL` directly, you would specify hints to make MMapDirectory select the `ReadAdvice` that you want - or whatever else MMapDirectory chooses to do to follow the hints you've specified. The `ReadAdvice` is entirely opaque within the Directory implementation, and you can _only_ set hints. After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation, which may or may not be using certain ReadAdvices (or any other setting it chooses) If we don't want to do that, then the hint -> ReadAdvice mapping would indeed need to be on the IOContext, and _not_ the responsibility of the `Directory`; or we use some sort of lambda passed in to the relevant Directory constructor to override it, which feels kinda hacky -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The `MMapDirectory` wouldn't call the overridden `toReadAdvice`, as thats on a different class, not in the same type hierarchy. Hmm... One way to think of it might be that trying to set a specific ReadAdvice is wrong. You should instead be setting the correct hints _such that_ the directory chooses the behaviour you want. So rather than setting `ReadAdvice.NORMAL` directly, you would specify hints to make MMapDirectory select the `ReadAdvice` that you want - or whatever else MMapDirectory chooses to do to follow the hints you've specified. The `ReadAdvice` is entirely opaque within the Directory implementation, and you can _only_ set hints. After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation, which may or may not be using certain ReadAdvices (or any other setting it chooses) If we don't want to do that, then the hint -> ReadAdvice mapping would indeed need to be on the IOContext, and _not_ the responsibility of the `Directory`; or we use some sort of lambda passed in to the relevant Directory constructor to do that mapping, which feels kinda hacky -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
ChrisHegarty commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057828737 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { Review Comment: >It does mean that no hint at all can have more than one instance as a hint though - but that might be ok? That limitation seems ok to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make competitive iterators more robust. [lucene]
jpountz commented on PR #14532: URL: https://github.com/apache/lucene/pull/14532#issuecomment-2826606964 I could confirm that it's due to `#docIdEnd()` now being implemented on the competitive iterators. This triggers usage of `DISIDocIdStream`, which calls `nextDoc()` in a loop, and calling `nextDoc()` in a loop happens to be a slower way of iterating doc IDs than calling `intoBitSet` and then iterating set bits when blocks are encoded as a bit set. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]
gf2121 commented on code in PR #14529: URL: https://github.com/apache/lucene/pull/14529#discussion_r2057706654 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java: ## @@ -491,6 +492,14 @@ public int advance(int target) throws IOException { return doc; } + @Override + public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws IOException { +assert doc >= offset; +while (method.intoBitsetWithinBlock(this, upTo, bitSet, offset) == false) { + readBlockHeader(); Review Comment: I like the simplification! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]
gf2121 commented on code in PR #14529: URL: https://github.com/apache/lucene/pull/14529#discussion_r2057680596 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java: ## @@ -625,6 +634,29 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int target) throws IOException disi.exists = false; return false; } + + @Override + boolean intoBitsetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet bitSet, int offset) + throws IOException { +if (disi.doc >= upTo) { + return true; +} +bitSet.set(disi.doc - offset); + +for (int i = disi.index + 1, to = disi.nextBlockIndex; i <= to; i++) { Review Comment: In `advanceWithinBlock`, `disi.index++` is executed in the loop body before `if (doc >= targetInBlock)` so it is not actually excluded. `advanceWithinBlock` can return true when `disi.index` equals `disi.disi.nextBlockIndex` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]
gf2121 commented on code in PR #14529: URL: https://github.com/apache/lucene/pull/14529#discussion_r2057720050 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java: ## @@ -625,6 +634,29 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int target) throws IOException disi.exists = false; return false; } + + @Override + boolean intoBitsetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet bitSet, int offset) + throws IOException { +if (disi.doc >= upTo) { + return true; +} +bitSet.set(disi.doc - offset); + +for (int i = disi.index + 1, to = disi.nextBlockIndex; i <= to; i++) { Review Comment: I update to the pattern like `advanceWithinBlock` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]
jpountz commented on code in PR #14529: URL: https://github.com/apache/lucene/pull/14529#discussion_r2057729121 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java: ## @@ -693,6 +727,44 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int target) throws IOException disi.index = disi.numberOfOnes - Long.bitCount(leftBits); return (leftBits & 1L) != 0; } + + @Override + boolean intoBitSetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet bitSet, int offset) + throws IOException { +if (disi.doc >= upTo) { + advanceWithinBlock(disi, disi.doc); Review Comment: Do we really need to call `advanceWithinBlock`? If `disi.doc` is already `>= upTo`, there shouldn't be anything to do? ## lucene/core/src/test/org/apache/lucene/codecs/lucene90/TestIndexedDISI.java: ## @@ -555,6 +565,47 @@ private void assertAdvanceExactRandomized( } } + private void assertIntoBitsetRandomized( + IndexedDISI disi, BitSetIterator disi2, int disi2length, int step) throws IOException { +int index = -1; +FixedBitSet set1 = new FixedBitSet(step); +FixedBitSet set2 = new FixedBitSet(step); + +for (int upTo = 0; upTo < disi2length; ) { + int lastUpTo = upTo; + upTo += TestUtil.nextInt(random(), 0, step); + int offset = TestUtil.nextInt(random(), lastUpTo, upTo); + + if (disi.docID() < offset) { +disi.advance(offset); + } + int doc = disi2.docID(); + while (doc < offset) { +index++; +doc = disi2.nextDoc(); + } + while (doc < upTo) { +set2.set(doc - offset); +index++; +doc = disi2.nextDoc(); + } + + disi.intoBitSet(upTo, set1, offset); + assertEquals(index, disi.index()); + BitSetIterator expected = new BitSetIterator(set2, set2.cardinality()); + BitSetIterator actual = new BitSetIterator(set1, set1.cardinality()); + for (int expectedDoc = expected.nextDoc(); + expectedDoc != DocIdSetIterator.NO_MORE_DOCS; + expectedDoc = expected.nextDoc()) { +int actualDoc = actual.nextDoc(); +assertEquals(expectedDoc + offset, actualDoc + offset); // plus offset for better message. + } + assertEquals(DocIdSetIterator.NO_MORE_DOCS, actual.nextDoc()); + set1.clear(); + set2.clear(); Review Comment: Can you also check that nextDoc() returns the expected value after `intoBitSet` returns? To make sure that all required state is set correctly? ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java: ## @@ -693,6 +727,44 @@ boolean advanceExactWithinBlock(IndexedDISI disi, int target) throws IOException disi.index = disi.numberOfOnes - Long.bitCount(leftBits); return (leftBits & 1L) != 0; } + + @Override + boolean intoBitSetWithinBlock(IndexedDISI disi, int upTo, FixedBitSet bitSet, int offset) + throws IOException { +if (disi.doc >= upTo) { + advanceWithinBlock(disi, disi.doc); + return true; +} + +int sourceFrom = disi.doc & 0x; +int sourceStartWord = sourceFrom >>> 6; +int sourceTo = Math.min(upTo - disi.block, BLOCK_SIZE); +int numWords = FixedBitSet.bits2words(sourceTo) - sourceStartWord; + +if (disi.bitSet == null || disi.bitSet.getBits().length < numWords) { Review Comment: Should we always create a bit set of size `BLOCK_SIZE` when it's null for simplicity? This is what users will get in the worst-case scenario anyway, and it's not crazy (65k bits = 8kB)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
ChrisHegarty commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057896977 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The final api position as I understand it, `IOContext` has no knowledge of `ReadAdvice` ( `IOConext::withReadAdvice` and `IOContext::readAdvice` will be removed ). EDIT: this is incorrect. D'oh!! ~`ReadAdvice` is rather determined by the directory when given the higher-level hints from the IOContext. For example, the `MMapDirectory` implementation of openInput would do something like.~ ```java public IndexInput openInput(String name, IOContext context) throws IOException { ... return PROVIDER.openInput(..., toReadAdvice(context), preload.test(name, context),...); } ``` ~... so it's somewhat akin to the the current preLoad model.~ ~For a custom filter wrapper one would do:~ ```java Directory dir = new FilterDirectory(mmapDirectory) { @Override public ReadAdvice toReadAdvice(IOContext context) { if (someCondition) { // enforce specific read advice instead of relying on hints return ReadAdvice.NORMAL; } // hints to readAdvice } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
thecoop commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057921646 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: The `MMapDirectory` wouldn't call the overridden `toReadAdvice`, as thats on a different class, not in the same type hierarchy. Hmm... One way to think of it might be that trying to set a specific ReadAdvice is wrong. You should instead be setting the correct hints _such that_ the directory chooses the behaviour you want. So rather than setting `ReadAdvice.NORMAL` directly, you would specify hints to make MMapDirectory select the `ReadAdvice` that you want - or whatever else MMapDirectory chooses to do to follow the hints you've specified. The `ReadAdvice` is entirely hidden within the Directory implementation, and you can _only_ set hints. After all, you're specifying a specific read advice for a reason - if you encode the reason (as hints), rather than the specific behaviour that you want, then the directory can choose the best way to do that according to the directory implementation, which may or may not be using certain ReadAdvices (or any other setting it chooses). So the code would be: ``` Directory dir = new FilterDirectory(mmapDirectory) { @Override public IndexInput openInput(String name, IOContext context) { if (someCondition()) { // be more specific in what this file is being opened for, // MMapDirectory chooses what to do with this info context = context.withHints(FileDataHint.VECTORS, DataAccessHint.RANDOM); } return in.openInput(name, context); } } ``` If we don't want to do that, then the hint -> ReadAdvice mapping would indeed need to be on the IOContext, and _not_ the responsibility of the `Directory`; or we use some sort of lambda override similar to `setPreload` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
ChrisHegarty commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2057993129 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: > or we use some sort of lambda override similar to setPreload I think that we can have both. Ultimately, as the author of a directory implementation or writing tests, or adapting to certain system environments, I want the ability to override. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Impl intoBitset for IndexedDISI and Docvalues [lucene]
gf2121 commented on code in PR #14529: URL: https://github.com/apache/lucene/pull/14529#discussion_r2058000145 ## lucene/core/src/test/org/apache/lucene/codecs/lucene90/TestIndexedDISI.java: ## @@ -555,6 +565,47 @@ private void assertAdvanceExactRandomized( } } + private void assertIntoBitsetRandomized( + IndexedDISI disi, BitSetIterator disi2, int disi2length, int step) throws IOException { +int index = -1; +FixedBitSet set1 = new FixedBitSet(step); +FixedBitSet set2 = new FixedBitSet(step); + +for (int upTo = 0; upTo < disi2length; ) { + int lastUpTo = upTo; + upTo += TestUtil.nextInt(random(), 0, step); + int offset = TestUtil.nextInt(random(), lastUpTo, upTo); + + if (disi.docID() < offset) { +disi.advance(offset); + } + int doc = disi2.docID(); + while (doc < offset) { +index++; +doc = disi2.nextDoc(); + } + while (doc < upTo) { +set2.set(doc - offset); +index++; +doc = disi2.nextDoc(); + } + + disi.intoBitSet(upTo, set1, offset); + assertEquals(index, disi.index()); + BitSetIterator expected = new BitSetIterator(set2, set2.cardinality()); + BitSetIterator actual = new BitSetIterator(set1, set1.cardinality()); + for (int expectedDoc = expected.nextDoc(); + expectedDoc != DocIdSetIterator.NO_MORE_DOCS; + expectedDoc = expected.nextDoc()) { +int actualDoc = actual.nextDoc(); +assertEquals(expectedDoc + offset, actualDoc + offset); // plus offset for better message. + } + assertEquals(DocIdSetIterator.NO_MORE_DOCS, actual.nextDoc()); + set1.clear(); + set2.clear(); Review Comment: This helps me find a bug: `upTo` could be a number in last block when `RANGE` do `intoBitSetWithinBlock`, i'm missing a `doc > upTo` check there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make competitive iterators more robust. [lucene]
jpountz commented on PR #14532: URL: https://github.com/apache/lucene/pull/14532#issuecomment-2826597720 I'm seeing a reproducible slowdown with this change: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value TermDTSort 399.37 (2.2%) 367.70 (2.3%) -7.9% ( -12% - -3%) 0.000 TermTitleSort 87.74 (4.2%) 83.02 (5.5%) -5.4% ( -14% -4%) 0.001 IntervalsOrdered2.31 (2.0%)2.27 (7.1%) -1.6% ( -10% -7%) 0.324 CountPhrase4.18 (2.2%)4.13 (5.6%) -1.3% ( -8% -6%) 0.345 PKLookup 315.43 (5.0%) 312.54 (6.4%) -0.9% ( -11% - 11%) 0.614 TermDayOfYearSort 286.52 (1.8%) 284.23 (1.7%) -0.8% ( -4% -2%) 0.144 CombinedAndHighHigh 11.51 (3.4%) 11.44 (5.9%) -0.6% ( -9% -8%) 0.690 CombinedAndHighMed 38.75 (3.1%) 38.52 (5.2%) -0.6% ( -8% -7%) 0.663 Phrase 14.38 (1.5%) 14.30 (2.3%) -0.6% ( -4% -3%) 0.363 FilteredPrefix3 151.34 (3.7%) 150.70 (3.9%) -0.4% ( -7% -7%) 0.726 CountAndHighMed 308.56 (1.9%) 307.41 (1.5%) -0.4% ( -3% -3%) 0.498 FilteredTerm 159.62 (2.2%) 159.03 (2.3%) -0.4% ( -4% -4%) 0.606 Prefix3 159.86 (3.9%) 159.40 (4.1%) -0.3% ( -7% -8%) 0.822 TermMonthSort 3191.33 (3.5%) 3186.61 (3.8%) -0.1% ( -7% -7%) 0.899 FilteredIntNRQ 302.80 (1.2%) 302.51 (1.4%) -0.1% ( -2% -2%) 0.819 CombinedTerm 30.71 (4.7%) 30.69 (4.7%) -0.1% ( -9% -9%) 0.960 FilteredOr3Terms 165.78 (1.3%) 165.83 (1.6%)0.0% ( -2% -3%) 0.951 CountFilteredOrHighHigh 136.74 (1.0%) 136.79 (1.1%)0.0% ( -1% -2%) 0.906 CountAndHighHigh 358.72 (2.2%) 358.91 (2.5%)0.1% ( -4% -4%) 0.944 CountFilteredOrHighMed 148.20 (0.7%) 148.31 (0.8%)0.1% ( -1% -1%) 0.760 FilteredOrHighHigh 67.56 (1.6%) 67.61 (1.9%)0.1% ( -3% -3%) 0.891 FilteredOrHighMed 152.31 (1.0%) 152.49 (1.3%)0.1% ( -2% -2%) 0.755 DismaxTerm 477.31 (2.6%) 477.99 (2.5%)0.1% ( -4% -5%) 0.860 FilteredPhrase 33.05 (1.4%) 33.10 (1.7%)0.1% ( -2% -3%) 0.769 FilteredOr2Terms2StopWords 147.00 (1.2%) 147.26 (1.6%)0.2% ( -2% -2%) 0.688 FilteredOrStopWords 46.26 (2.0%) 46.34 (2.6%)0.2% ( -4% -4%) 0.799 Fuzzy1 99.78 (2.8%) 99.99 (3.3%)0.2% ( -5% -6%) 0.827 FilteredAndStopWords 45.41 (2.1%) 45.50 (2.8%)0.2% ( -4% -5%) 0.785 CombinedOrHighHigh 18.78 (2.9%) 18.82 (2.8%)0.2% ( -5% -6%) 0.813 IntNRQ 309.33 (1.1%) 310.04 (1.5%)0.2% ( -2% -2%) 0.580 FilteredAnd3Terms 178.96 (1.7%) 179.39 (2.0%)0.2% ( -3% -4%) 0.683 CountOrMany 30.46 (1.9%) 30.55 (2.4%)0.3% ( -3% -4%) 0.656 FilteredOrMany 16.44 (1.5%) 16.49 (2.4%)0.3% ( -3% -4%) 0.628 FilteredAnd2Terms2StopWords 177.94 (1.4%) 178.49 (1.6%)0.3% ( -2% -3%) 0.511 Fuzzy2 83.87 (2.4%) 84.14 (2.9%)0.3% ( -4% -5%) 0.704 CombinedOrHighMed 71.67 (2.8%) 71.90 (2.9%)0.3% ( -5% -6%) 0.719 CountFilteredOrMany 27.41 (1.8%) 27.51 (1.9%)0.3% ( -3% -4%) 0.558 Wildcard 91.57 (2.9%) 91.90 (2.9%)0.4% ( -5% -6%) 0.700 FilteredAndHighHigh 65.89 (1.9%) 66.13 (2.2%)0.4% ( -3% -4%) 0.566 AndHighMed 133.85 (2.6%) 134.37 (2.1%)0.4% ( -4% -5%) 0.608 CountOrHighHigh 346.70 (2.1%) 348.06 (2.3%)0.4% (
Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]
rmuir commented on issue #14408: URL: https://github.com/apache/lucene/issues/14408#issuecomment-2828267119 IMO it would be better to make a 10.2.2 if you want to do that? Especially since I don't think it would be a trivial change: I'm concerned that changing just the "default" would have any benefit, since it is hardcoded in other places in codecs. AFAIK #14422 is working on fixing that "real problem". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]
jainankitk commented on code in PR #14439: URL: https://github.com/apache/lucene/pull/14439#discussion_r2059224354 ## lucene/CHANGES.txt: ## @@ -78,6 +78,8 @@ Optimizations - * GITHUB#14418: Quick exit on filter query matching no docs when rewriting knn query. (Pan Guixin) +* GITHUB#14439: Efficient Histogram Collection using multi range traversal over PointTrees Review Comment: ah, my bad. Fixed! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Move sloppySin into SloppyMath from GeoUtils [lucene]
jainankitk commented on PR #14516: URL: https://github.com/apache/lucene/pull/14516#issuecomment-2828864383 > thanks for adding tests and benchies Thank you for the pointers on jmh benchmark. Helped me reason about and demonstrate the performance improvements for Histogram Collector PR - https://github.com/apache/lucene/pull/14439#issuecomment-2819914834 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add AnytimeRankingSearcher for SLA-Aware Early Termination with Bin-Based Score Boosting [lucene]
atris commented on PR #14525: URL: https://github.com/apache/lucene/pull/14525#issuecomment-2828868856 @jpountz thanks for looking! Just for my understanding, the first PR should contain the index time binning logic that is currently in this PR, just with a simpler model of rank on bin boosting? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] How to control the total number of merging threads when vector data merging easily leads to memory overflow and high CPU cost [lucene]
weizijun opened a new issue, #14554: URL: https://github.com/apache/lucene/issues/14554 ### Description When there are many shards to merge, vector data merging can easily lead to memory overflow and high CPU cost. The index.merge.scheduler.max_thread_count parameter can't control the merge thread count, it only pause the writeByte by MergeRateLimiter when the merge thread is bigger then max_thread_count. But OnHeapHnswGraph has been built during the pause phase, and it will take up so much memory that the Java heap is not enough. This problem can easily be caused when a datanode with a 32G heap size holds 2-3TB of vector documents(with bbq, the node can contain these data). The PR https://github.com/apache/lucene/pull/14527 can reduce the heap size, but it don't solve the problem totally. Is there any solution to this problem? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Speed up flush of softdelete by intoBitset [lucene]
gf2121 commented on PR #14552: URL: https://github.com/apache/lucene/pull/14552#issuecomment-2829443943 I managed to get some numbers on this change: ``` Baseline: Soft delete flush total took: 30311ms, IndexedDISI#writeBitset total took: 8324ms. Candidate: Soft delete flush total took: 22635ms, IndexedDISI#writeBitset total took: 964ms. ``` BTW, i added some docvalue fields in my benchmark. Then i'm seeing rather huge cost of virtual calls represented by `vtable stub` when computing min/max/gcd.  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Allow docID == NO_MORE_DOCS for asserting leaf reader [lucene]
gf2121 opened a new pull request, #14555: URL: https://github.com/apache/lucene/pull/14555 `exists` will be set false if `docID() == NO_MORE_DOCS`, while it is allowed by contract for `intoBitset`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Encode numeric docvalue with per block gcd [lucene]
HUSTERGS closed issue #14545: Encode numeric docvalue with per block gcd URL: https://github.com/apache/lucene/issues/14545 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Encode numeric docvalue with per block gcd [lucene]
HUSTERGS commented on issue #14545: URL: https://github.com/apache/lucene/issues/14545#issuecomment-2829240138 Got it! Thanks for your reply, close the issue for now :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix leadCost calculation in BooleanScorerSupplier.requiredBulkScorer [lucene]
jpountz commented on PR #14543: URL: https://github.com/apache/lucene/pull/14543#issuecomment-2827497431 Thanks for catching this! For reference, this was introduced in 10.0 (by me). The change looks good, let me know if you need help writing a test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] removing constructor with deprecated attribute 'onlyLongestMatch [lucene]
renatoh commented on PR #14356: URL: https://github.com/apache/lucene/pull/14356#issuecomment-2828336563 @rmuir Could you please have a look at the PR, it has been open for more than a month. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]
stefanvodita commented on code in PR #14439: URL: https://github.com/apache/lucene/pull/14439#discussion_r2059125314 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/PointTreeBulkCollector.java: ## @@ -0,0 +1,219 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.sandbox.facet.plain.histograms; + +import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS; + +import java.io.IOException; +import java.util.function.Function; +import org.apache.lucene.index.PointValues; +import org.apache.lucene.internal.hppc.LongIntHashMap; +import org.apache.lucene.search.CollectionTerminatedException; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.util.ArrayUtil; + +class PointTreeBulkCollector { + static boolean collect( + final PointValues pointValues, + final long bucketWidth, + final LongIntHashMap collectorCounts, + final int maxBuckets) + throws IOException { +// TODO: Do we really need pointValues.getDocCount() == pointValues.size() +if (pointValues == null +|| pointValues.getNumDimensions() != 1 +|| pointValues.getDocCount() != pointValues.size() +|| ArrayUtil.getValue(pointValues.getBytesPerDimension()) == null) { + return false; Review Comment: I see your point and I'm happy to have a follow-up issue if necessary for this and the related comment below. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Logic for collecting Histogram efficiently using Point Trees [lucene]
stefanvodita commented on code in PR #14439: URL: https://github.com/apache/lucene/pull/14439#discussion_r2059130450 ## lucene/CHANGES.txt: ## @@ -78,6 +78,8 @@ Optimizations - * GITHUB#14418: Quick exit on filter query matching no docs when rewriting knn query. (Pan Guixin) +* GITHUB#14439: Efficient Histogram Collection using multi range traversal over PointTrees Review Comment: You should add your name too! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Speed up filtered disjunctions by loading the filter into a bit set. [lucene]
github-actions[bot] commented on PR #14024: URL: https://github.com/apache/lucene/pull/14024#issuecomment-2829119993 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add support for querying multiple fields to QueryBuilder. [lucene]
github-actions[bot] commented on PR #14262: URL: https://github.com/apache/lucene/pull/14262#issuecomment-2829119918 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce NeighborArray heap memory [lucene]
weizijun commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2828149196 > I left a comment on the underlying structures used. > > Please update CHANGES.txt under 10.3 optimizations for this nice optimization! It will be very nice to have better heap utilization on graph building! Ok, and about the OnHeapHnswGraph.ramBytesUsed, I didn't change to code. But it maybe need to change the logic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce NeighborArray heap memory [lucene]
benwtrent commented on code in PR #14527: URL: https://github.com/apache/lucene/pull/14527#discussion_r2058763024 ## lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java: ## @@ -32,13 +33,15 @@ public class NeighborArray { private final boolean scoresDescOrder; private int size; - private final float[] scores; - private final int[] nodes; + private final int maxSize; + private final FloatArrayList scores; + private final IntArrayList nodes; private int sortedNodeSize; public NeighborArray(int maxSize, boolean descOrder) { -nodes = new int[maxSize]; -scores = new float[maxSize]; +this.maxSize = maxSize; +nodes = new IntArrayList(); +scores = new FloatArrayList(); Review Comment: two things, I think these should be initialized to something like `maxSize/4` or `maxSize/8`. Additionally, the array lists should enforce the max size and ensure the underlying buffer does not get bigger than the expected max. I think you can do this pretty simply by having something like `MaxSizedIntArrayList` that accepts an init & max size parameters in its ctor, and overrides `ensureBufferSpace` so that it: - disallows growth past `maxSize` - Instead of doing `ArrayUtil.grow` it does `ArrayUtil.growInRange(buffer, elementsCount + expectedAdditions, maxSize)` This will prevent overallocation of the underlying buffer and enforce max size limitations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Make task executor non-final [lucene]
Shibi-bala commented on PR #14524: URL: https://github.com/apache/lucene/pull/14524#issuecomment-2828849510 @jpountz `IMO Lucene should own how queries execute concurrently instead of making it pluggable` then why allow an executor service to be passed in? Previously, lucene didn't have control of this because a user's executor service could do whatever it wanted with the tasks. Now, control over execution is split between the user's executor service and what lucene does on the main query thread. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce NeighborArray heap memory [lucene]
msokolov commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2828649522 re: concurrency; in theory it should be safe since we lock a row before inserting anything into it. Consider that even with fixed-size arrays we need to track the occupancy (how full the array is) in a thread-safe way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Create file open hints on IOContext to replace ReadAdvice [lucene]
jpountz commented on code in PR #14482: URL: https://github.com/apache/lucene/pull/14482#discussion_r2059104981 ## lucene/core/src/java/org/apache/lucene/store/Directory.java: ## @@ -79,6 +83,31 @@ public abstract class Directory implements Closeable { */ public abstract long fileLength(String name) throws IOException; + protected void validateIOContext(IOContext context) { +Map, List> hintClasses = + context.hints().stream().collect(Collectors.groupingBy(IOContext.FileOpenHint::getClass)); + +// there should only be one of FileType, FileData, DataAccess +List fileTypes = +hintClasses.getOrDefault(FileTypeHint.class, List.of()); +if (fileTypes.size() > 1) { + throw new IllegalArgumentException("Multiple file type hints specified: " + fileTypes); +} +List fileData = hintClasses.getOrDefault(FileDataHint.class, List.of()); +if (fileData.size() > 1) { + throw new IllegalArgumentException("Multiple file data hints specified: " + fileData); +} +List dataAccess = +hintClasses.getOrDefault(DataAccessHint.class, List.of()); +if (dataAccess.size() > 1) { + throw new IllegalArgumentException("Multiple data access hints specified: " + dataAccess); +} + } + + protected ReadAdvice toReadAdvice(IOContext context) { Review Comment: > or we use some sort of lambda override similar to setPreload FWIW one thing I like about it is that the `ReadAdvice` then becomes an implementation detail of `MMapDirectory` instead of a general API on `Directory`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] DrillSideways does not support intra-segment concurrency [lucene]
kashkambath commented on issue #13753: URL: https://github.com/apache/lucene/issues/13753#issuecomment-2828705042 Hi @javanna, Your explanation makes sense that intra-segment concurrency can't be leveraged by the existing `DrillSidewaysScorer` since it goes through all the docs in a segment. However, `DrillSidewaysScorer` is only used when a non-concurrent execution of drill sideways occurs via [`DrillSideways#searchSequentially()`](https://github.com/apache/lucene/blob/e2e44bcf6baf7dc97f72b999a8d7e236700d574c/lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java#L441-L465) AFAIK. If an `Executor` is passed into `DrillSideways` at its instantiation, then `DrillSideways` will execute the [`DrillSideways#searchConcurrently()`](https://github.com/apache/lucene/blob/e2e44bcf6baf7dc97f72b999a8d7e236700d574c/lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java#L483) codepath, which does not leverage the `DrillSidewaysScorer`. `searchConcurrently` instead runs in parallel N+1 queries for N facets (one base query + N holdout queries for facet counts.) I believe each of these N+1 queries are traditional search queries which could leverage inter-segment/intra-segment parallelism. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Fail spotless check for wildcard imports [lucene]
rmuir commented on issue #14553: URL: https://github.com/apache/lucene/issues/14553#issuecomment-2828727305 I can help with eclipse formatter. It has a limit that you set in the preference files and I think the solution is to set it to an obnoxious value (999 or something like that) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Add AnytimeRankingSearcher for SLA-Aware Early Termination with Bin-Based Score Boosting [lucene]
jpountz commented on PR #14525: URL: https://github.com/apache/lucene/pull/14525#issuecomment-2828722257 > the implementation is more ambitious I like ambition, but it also makes this change harder to review/integrate, especially with the high LOC count. I would suggest splitting this PR into multiple PRs, for instance: - First PR just works with indexes created with existing recursive graph bisection and uses basic heuristics to determine which ranges of doc IDs to score first (e.g. using impacts) to hopefully increase the top-k score quickly. No extra data stored in the index. All code under lucene/misc rather than core. - Another PR can introduce the SLA-based termination logic. - Another PR can introduce topical clustering mechanism of Kulkarni and Callan, that the paper suggests combining with recursive graph bisection. - Another PR can discuss augmenting index formats to enhance the range selection logic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Fail spotless check for wildcard imports [lucene]
dweiss commented on issue #14553: URL: https://github.com/apache/lucene/issues/14553#issuecomment-2828723311 I've adopted and used what opensearch came up with. It would be ideal to also fix wildcard imports - not just detect them. It should be possible with intellij or eclipse formatter somehow but I couldn't figure it out in the limited time I had. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Examine the affects of MADV_RANDOM when MGLRU is enabled in Linux kernel [lucene]
vigyasharma commented on issue #14408: URL: https://github.com/apache/lucene/issues/14408#issuecomment-2828222009 Should we target a revert of the default ReadAdvice.RANDOM setting, for the 10.2.1 RC that's currently being evaluated? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Reduce NeighborArray heap memory [lucene]
benwtrent commented on PR #14527: URL: https://github.com/apache/lucene/pull/14527#issuecomment-2828219136 > Ok, and about the OnHeapHnswGraph.ramBytesUsed, I didn't change to code. But it maybe need to change the logic. I think utilizing the underlying array estimations is what we need and the testing needs to be updated to assert within a valid range. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Fail spotless check for wildcard imports [lucene]
dweiss commented on issue #14553: URL: https://github.com/apache/lucene/issues/14553#issuecomment-2828789291 > I can help with eclipse formatter. It has a limit that you set in the preference files and I think the solution is to set it to an obnoxious value (999 or something like that) The problem I had was to only expand wildcard imports, without touching anything else. In theory, re-applying gjf should bring the code back to the same state but in practice there were subtle differences before and after. I don't know if you fan "only" expand wildcards, without reformatting the rest of the code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org