Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]
jpountz commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2435611867 Yes, exactly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges [lucene]
shatejas commented on issue #13920:
URL: https://github.com/apache/lucene/issues/13920#issuecomment-2435944343

> @shatejas I think all the required details are present, so are you going to raise a PR for this?

Yeah I am working on it, I have the changes and I am trying to figure out a good way to benchmark lucene
Re: [I] Check ahead of time if the `count` can be obtained [lucene]
LuXugang closed issue #13890: Check ahead of time if the `count` can be obtained URL: https://github.com/apache/lucene/issues/13890
Re: [PR] Check ahead if we can get the count [lucene]
LuXugang merged PR #13899: URL: https://github.com/apache/lucene/pull/13899
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
yugushihuang commented on PR #13572:
URL: https://github.com/apache/lucene/pull/13572#issuecomment-2435780436

We have measured performance using [knnPerfTest.py](https://github.com/mikemccand/luceneutil/blob/main/src/python/knnPerfTest.py) in luceneutil with this PR [commit](https://github.com/goankur/lucene/commit/85d78116f87b679078a80cf606cd4bc7219ee793) as the candidate branch.

### cmd
```
'/usr/lib/jvm/java-21-amazon-corretto/bin/java', '-cp', [...], '--add-modules', 'jdk.incubator.vector', '-Djava.library.path=/home/[user_name]/lucene_candidate/lucene/native/build/libs/dotProduct/shared', 'knn.KnnGraphTester', '-quantize', '-ndoc', '150', '-maxConn', '32', '-beamWidthIndex', '50', '-fanout', '6', '-quantizeBits', '7', '-numMergeWorker', '12', '-numMergeThread', '4', '-encoding', 'float32', '-topK', '10', '-dim', '768', '-docs', 'enwiki-20120502-lines-1k-mpnet.vec', '-reindex', '-search-and-stats', 'enwiki-20120502-mpnet.vec', '-forceMerge', '-quiet'
```

### Lucene_Baseline
```
Graph level=3 size=46, connectedness=1.00
Graph level=2 size=1405, connectedness=1.00
Graph level=1 size=46174, connectedness=1.00
Graph level=0 size=150, connectedness=1.00
Results:
recall latency (ms) nDoc topK fanout maxConn beamWidth quantized index s force merge s num segments index size (MB)
0.332 0.333 15010 6 32 50 7 bits 432.69 271.51 1 5558.90
```

### Lucene_Candidate
```
Graph level=3 size=46, connectedness=1.00
Graph level=2 size=1410, connectedness=1.00
Graph level=1 size=46205, connectedness=1.00
Graph level=0 size=150, connectedness=1.00
Results:
recall latency (ms) nDoc topK fanout maxConn beamWidth quantized index s force merge s num segments index size (MB)
0.337 0.260 15010 6 32 50 7 bits 441.25 293.41 1 5558.91
```

The latency has dropped from 0.333ms to 0.26ms.
Re: [PR] Check ahead if we can get the count [lucene]
jpountz commented on code in PR #13899:
URL: https://github.com/apache/lucene/pull/13899#discussion_r1815247300

## lucene/core/src/java/org/apache/lucene/search/IndexSortSortedNumericDocValuesRangeQuery.java: ##
@@ -186,10 +186,44 @@ public boolean isCacheable(LeafReaderContext ctx) {
   @Override
   public int count(LeafReaderContext context) throws IOException {
     if (context.reader().hasDeletions() == false) {
-      IteratorAndCount itAndCount = getDocIdSetIteratorOrNull(context);
+      if (lowerValue > upperValue) {
+        return 0;
+      }

Review Comment:
This could be moved before the check of whether the segment has deletes?
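A minimal sketch of the reordering suggested in the review comment above, under stated assumptions: `RangeCountSketch` and its `count(boolean, IntSupplier)` signature are hypothetical stand-ins, not the real `IndexSortSortedNumericDocValuesRangeQuery`. It only illustrates that an empty range yields 0 whether or not the segment has deletions, so that check can come first. Returning -1 follows the `Weight#count` convention for "count not cheaply available".

```java
import java.util.function.IntSupplier;

// Hypothetical stand-in for illustration; not Lucene's actual query class.
final class RangeCountSketch {
  private final long lowerValue;
  private final long upperValue;

  RangeCountSketch(long lowerValue, long upperValue) {
    this.lowerValue = lowerValue;
    this.upperValue = upperValue;
  }

  /**
   * Returns the number of matches when it is cheap to compute, or -1 otherwise.
   *
   * @param hasDeletions whether the (imaginary) segment has deleted docs
   * @param cheapCount stand-in for the index-sort based counting path
   */
  int count(boolean hasDeletions, IntSupplier cheapCount) {
    // An empty range matches nothing, regardless of deletions, so test it first.
    if (lowerValue > upperValue) {
      return 0;
    }
    if (hasDeletions == false) {
      return cheapCount.getAsInt();
    }
    return -1; // caller has to fall back to iterating over matches
  }

  public static void main(String[] args) {
    System.out.println(new RangeCountSketch(10, 5).count(true, () -> 42)); // empty range
    System.out.println(new RangeCountSketch(1, 5).count(false, () -> 42)); // cheap path
    System.out.println(new RangeCountSketch(1, 5).count(true, () -> 42)); // fallback
  }
}
```

With the empty-range check hoisted, a segment with deletions still gets an exact answer for an inverted range instead of falling through to the slow path.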
Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]
mikemccand commented on code in PR #13950:
URL: https://github.com/apache/lucene/pull/13950#discussion_r1814888763

## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ##
@@ -87,6 +87,28 @@ public Builder add(BooleanClause clause) {
       return this;
     }

+    /**
+     * Add a collection of BooleanClause's to this {@link Builder}. Note that the order in which

Review Comment:
Remove the `'` -- just `BooleanClauses`.
Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]
iverase commented on code in PR #13948:
URL: https://github.com/apache/lucene/pull/13948#discussion_r1816322441

## lucene/core/src/java/org/apache/lucene/store/RandomAccessInputDataInput.java: ##
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+
+/**
+ * DataInput backed by a {@link RandomAccessInput}. WARNING: This class omits all low-level
+ * checks.
+ *
+ * @lucene.experimental
+ */
+public final class RandomAccessInputDataInput extends DataInput {
+
+  private RandomAccessInput input;
+
+  private long pos;
+
+  public RandomAccessInputDataInput() {}
+
+  // NOTE: sets pos to 0, which is not right if you had
+  // called reset w/ non-zero offset!!
+  public void rewind() {
+    pos = 0;
+  }
+
+  public long getPosition() {
+    return pos;
+  }
+
+  public void setPosition(long pos) {
+    this.pos = pos;
+  }
+
+  public void reset(RandomAccessInput input) {
+    this.input = input;
+    pos = 0;
+  }

Review Comment:
done
Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]
iverase commented on code in PR #13948:
URL: https://github.com/apache/lucene/pull/13948#discussion_r1816321990

## lucene/core/src/java/org/apache/lucene/index/BinaryDocValues.java: ##
@@ -33,4 +34,15 @@ protected BinaryDocValues() {}
    * @return binary value
    */
   public abstract BytesRef binaryValue() throws IOException;
+
+  /**
+   * Returns the binary value as a {@link RandomAccessInput} for the current document ID. The bytes
+   * start at position 0 up to {@link RandomAccessInput#length()}. It is illegal to call this method
+   * after {@link #advanceExact(int)} returned {@code false}.

Review Comment:
done
Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]
iverase commented on code in PR #13948:
URL: https://github.com/apache/lucene/pull/13948#discussion_r1816323891

## lucene/core/src/java/org/apache/lucene/store/RandomAccessInputDataInput.java: ##
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+
+/**
+ * DataInput backed by a {@link RandomAccessInput}. WARNING: This class omits all low-level
+ * checks.
+ *
+ * @lucene.experimental
+ */
+public final class RandomAccessInputDataInput extends DataInput {
+
+  private RandomAccessInput input;
+
+  private long pos;
+
+  public RandomAccessInputDataInput() {}
+
+  // NOTE: sets pos to 0, which is not right if you had
+  // called reset w/ non-zero offset!!
+  public void rewind() {
+    pos = 0;
+  }
+
+  public long getPosition() {
+    return pos;
+  }
+
+  public void setPosition(long pos) {
+    this.pos = pos;
+  }
+
+  public void reset(RandomAccessInput input) {
+    this.input = input;
+    pos = 0;
+  }
+
+  public long length() {
+    return input.length();
+  }
+
+  @Override
+  public void skipBytes(long count) {
+    pos += count;
+  }
+
+  @Override
+  public short readShort() throws IOException {
+    try {
+      return input.readShort(pos);
+    } finally {
+      pos += Short.BYTES;

Review Comment:
This class is a copy / paste from ByteArrayDataInput so it makes me wonder if that's something we need to change in that implementation too.
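The quoted hunk advances `pos` inside a `finally` block. The reviewer's original remark is not quoted in this message, so the following is only a sketch of what that pattern implies in plain Java, not a statement of the concern that was raised. `PositionAdvanceSketch` is a hypothetical byte-array reader, and the little-endian decoding is chosen just for the example: with `finally`, the position moves even when the read throws, whereas read-then-advance leaves it untouched on failure.

```java
// Hypothetical reader for illustration; not Lucene's ByteArrayDataInput or RandomAccessInput.
final class PositionAdvanceSketch {
  private final byte[] bytes;
  long pos;

  PositionAdvanceSketch(byte[] bytes) {
    this.bytes = bytes;
  }

  // Decodes a little-endian short at an absolute position.
  // Throws ArrayIndexOutOfBoundsException when fewer than two bytes remain.
  private short readShortAt(long position) {
    int i = (int) position;
    return (short) ((bytes[i] & 0xFF) | ((bytes[i + 1] & 0xFF) << 8));
  }

  // Pattern from the quoted hunk: the finally block runs even if the read throws.
  short readShortWithFinally() {
    try {
      return readShortAt(pos);
    } finally {
      pos += Short.BYTES;
    }
  }

  // Alternative: the position only moves after a successful read.
  short readShortThenAdvance() {
    short value = readShortAt(pos);
    pos += Short.BYTES;
    return value;
  }

  public static void main(String[] args) {
    PositionAdvanceSketch withFinally = new PositionAdvanceSketch(new byte[] {0x34, 0x12, 0x01});
    System.out.println(Integer.toHexString(withFinally.readShortWithFinally()));
    try {
      withFinally.readShortWithFinally(); // only one byte left, so this throws
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("pos after failed read (finally): " + withFinally.pos);
    }

    PositionAdvanceSketch thenAdvance = new PositionAdvanceSketch(new byte[] {0x34, 0x12, 0x01});
    thenAdvance.readShortThenAdvance();
    try {
      thenAdvance.readShortThenAdvance(); // throws as well
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("pos after failed read (then-advance): " + thenAdvance.pos);
    }
  }
}
```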
Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]
ljak commented on PR #13944:
URL: https://github.com/apache/lucene/pull/13944#issuecomment-2435609721

Ha, I see. Could we say that the new `List orderedQueries` would have the same behavior that `Query[] disjuncts` had before https://github.com/apache/lucene/pull/110/files ? If yes, I presume it would work.
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
msokolov commented on code in PR #13872:
URL: https://github.com/apache/lucene/pull/13872#discussion_r1816770842

## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorerSupplier.java: ##
@@ -112,20 +96,20 @@ static final class CosineSupplier extends Lucene99MemorySegmentByteVectorScorerS
     @Override
     public RandomVectorScorer scorer(int ord) {
       checkOrdinal(ord);
+      MemorySegmentAccessInput slice = input.clone();
+      byte[] scratch1 = new byte[vectorByteSize];
+      byte[] scratch2 = new byte[vectorByteSize];

Review Comment:
I'm not sure I understand your idea, Chris, but if you want to have a go at it, by all means please do, and maybe I'll understand then :)
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
benwtrent commented on PR #13872:
URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437820707

I think a "merging scorer" would be good. The only place the "scorer supplier" is used is during graph building.

My initial concern with a "mutable scorer" is that it would also make the single scorer mutable, which seems weird to me. But I am happy to revisit this, especially since it's blocking a nice refactor.

Given that all this random scorer stuff is internal API, we can do whatever is best with what we have.
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
ChrisHegarty commented on code in PR #13872:
URL: https://github.com/apache/lucene/pull/13872#discussion_r1816687000

## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorerSupplier.java: ##
@@ -112,20 +96,20 @@ static final class CosineSupplier extends Lucene99MemorySegmentByteVectorScorerS
     @Override
     public RandomVectorScorer scorer(int ord) {
       checkOrdinal(ord);
+      MemorySegmentAccessInput slice = input.clone();
+      byte[] scratch1 = new byte[vectorByteSize];
+      byte[] scratch2 = new byte[vectorByteSize];

Review Comment:
I have another idea. Maybe we just delegate the null cases to the other on-heap scorer. That might be simpler. We do something similar in the native scorer we have in Elasticsearch. I can see how this looks in the branch, if you like?
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
msokolov commented on PR #13872:
URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437836935

Yes, OK I now see quite a bit of this is a "preexisting condition" and maybe not exacerbated by this change. We are still creating more scratch arrays than we did before though, I think, because previously we would `copy()` the VectorValues in a caller, and allocate a new scratch array there, whereas now since we have pushed down the "create new scratch array" into the Scorer creation, and this happens many more times than we would previously have copied the VectorValues, we are creating and destroying many more of these scratch arrays. Maybe this is acceptable and we can iterate in a further cleanup?

Let me try a few more benchmarking runs and be a little clearer about the impact on query and indexing times. I'd like to also report allocations, but not sure how to do that w/luceneutil
Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]
jpountz commented on PR #13948: URL: https://github.com/apache/lucene/pull/13948#issuecomment-2437732473 In my experience, binary doc values are more often used to encode structured data, such as maps that help build scoring signals, geo shapes, etc. than actual binary content, so this change makes sense to me. I'm interested in having more opinions though. Would be nice to extend AssertingBinaryDocValues to make sure that all reads in the input are within bounds.
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
msokolov commented on PR #13872:
URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437752945

> Can you clarify which allocation is the problematic one, and where it's done on the indexing path?

See Ben's comments from ~2 weeks ago where he calls out the problem of overallocation. During indexing we call HnswGraphBuilder.diversityCheck() multiple times for each document (graph node) we insert, and in each of those calls we create scorers multiple times -- this is an n^2 algorithm (with n ~ number of neighbors). I'm proposing that instead of calling scorer() and creating a new scorer each time (which may in turn create a MemorySegment or a scratch array of some sort), we have a mutable Scorer that can accept a new target vector.
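To make the proposal above concrete, here is a minimal sketch of a scorer that is re-pointed at a new target rather than re-created. Everything in it is an assumption for illustration: `ReusableScorerSketch`, `setTarget`, the in-memory `byte[][]` storage and the dot-product scoring are hypothetical and are not Lucene's `RandomVectorScorer` API. The only point is that the scratch buffer is allocated once per scorer, not once per comparison.

```java
// Hypothetical sketch; names and shapes are illustrative, not Lucene's scorer API.
final class ReusableScorerSketch {
  private final byte[][] vectors; // stand-in for vector storage
  private final byte[] target; // scratch buffer, allocated once per scorer

  ReusableScorerSketch(byte[][] vectors, int dimension) {
    this.vectors = vectors;
    this.target = new byte[dimension];
  }

  /** Re-points this scorer at a new target vector without allocating. */
  void setTarget(int ord) {
    System.arraycopy(vectors[ord], 0, target, 0, target.length);
  }

  /** Dot product between the current target and the vector at {@code ord}. */
  int score(int ord) {
    byte[] other = vectors[ord];
    int sum = 0;
    for (int i = 0; i < target.length; i++) {
      sum += target[i] * other[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    byte[][] vectors = {{1, 2}, {3, 4}, {5, 6}};
    ReusableScorerSketch scorer = new ReusableScorerSketch(vectors, 2);
    // Pairwise comparisons, loosely like a neighbor diversity check: one scorer, many targets.
    for (int i = 0; i < vectors.length; i++) {
      scorer.setTarget(i);
      for (int j = 0; j < i; j++) {
        System.out.println(i + " vs " + j + ": " + scorer.score(j));
      }
    }
  }
}
```

The trade-off is the one raised elsewhere in the thread: a mutable scorer is not safe to share across threads, so each thread would need its own instance.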
Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]
ljak commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2438170606 Done. Thanks for reviewing!
[PR] Speed up advancing within a block, take 2. [lucene]
jpountz opened a new pull request, #13958:
URL: https://github.com/apache/lucene/pull/13958

PR #13692 tried to speed up advancing by using branchless binary search, but while this yielded a speedup on my machine, it yielded a slowdown on nightly benchmarks.

This PR tries a different approach using vectorization. Experimentation suggests that it slightly slows down queries where advancing often goes to the very next doc ID, such as term queries and `OrHighNotXXX` tasks. But it speeds up queries that advance to the next few doc IDs, such as `AndHighHigh`.

I think that this is a good trade-off since it slows down some queries that are already plenty fast in exchange for a speedup on some more expensive queries.

Here is a `luceneutil` run on `wikibigall` with `-searchConcurrency 0`:

```
Task                   QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff                p-value
OrHighNotHigh                302.78  (2.4%)                   283.75  (2.9%)   -6.3% ( -11% -   -1%)    0.000
OrHighNotMed                 384.69  (3.0%)                   363.33  (2.8%)   -5.6% ( -10% -    0%)    0.000
MedTerm                      564.86  (2.2%)                   537.04  (3.5%)   -4.9% ( -10% -    0%)    0.000
LowTerm                     1014.02  (2.2%)                   967.37  (3.6%)   -4.6% ( -10% -    1%)    0.000
OrHighNotLow                 446.38  (3.4%)                   427.10  (3.3%)   -4.3% ( -10% -    2%)    0.000
HighTerm                     485.41  (1.9%)                   464.49  (3.2%)   -4.3% (  -9% -    0%)    0.000
OrNotHighHigh                229.78  (2.4%)                   221.51  (3.1%)   -3.6% (  -8% -    1%)    0.000
OrNotHighMed                 396.63  (2.7%)                   382.41  (3.1%)   -3.6% (  -9% -    2%)    0.000
Prefix3                      145.65  (3.6%)                   142.39  (3.7%)   -2.2% (  -9% -    5%)    0.051
IntNRQ                       158.04  (4.7%)                   154.77  (5.6%)   -2.1% ( -11% -    8%)    0.205
CountTerm                   8320.96  (3.2%)                  8198.56  (4.7%)   -1.5% (  -9% -    6%)    0.246
PKLookup                     273.35  (3.6%)                   269.71  (5.2%)   -1.3% (  -9% -    7%)    0.345
Wildcard                      83.30  (3.4%)                    82.28  (3.1%)   -1.2% (  -7% -    5%)    0.234
HighTermMonthSort           3235.98  (3.1%)                  3198.04  (2.9%)   -1.2% (  -6% -    4%)    0.215
HighTermTitleSort            148.94  (2.5%)                   148.38  (2.6%)   -0.4% (  -5% -    4%)    0.638
CountOrHighMed               104.51  (2.0%)                   104.22  (1.7%)   -0.3% (  -3% -    3%)    0.640
HighTermTitleBDVSort          14.67  (5.3%)                    14.64  (5.9%)   -0.2% ( -10% -   11%)    0.899
AndStopWords                  30.68  (3.0%)                    30.66  (2.7%)   -0.1% (  -5% -    5%)    0.941
CountOrHighHigh               50.17  (2.0%)                    50.19  (1.9%)    0.0% (  -3% -    3%)    0.947
OrHighRare                   273.82  (4.5%)                   273.96  (3.8%)    0.0% (  -7% -    8%)    0.971
TermDTSort                   353.37  (6.4%)                   354.23  (6.7%)    0.2% ( -12% -   14%)    0.907
Fuzzy1                        77.85  (2.6%)                    78.12  (2.0%)    0.3% (  -4% -    4%)    0.633
Fuzzy2                        73.23  (2.5%)                    73.50  (1.9%)    0.4% (  -3% -    4%)    0.594
HighTermDayOfYearSort        836.62  (3.1%)                   841.07  (4.0%)    0.5% (  -6% -    7%)    0.639
And2Terms2StopWords          154.49  (1.8%)                   155.41  (2.1%)    0.6% (  -3% -    4%)    0.340
OrHighLow                    771.90  (2.0%)                   778.20  (2.2%)    0.8% (  -3% -    5%)    0.217
And3Terms                    167.63  (2.3%)                   169.23  (2.2%)    1.0% (  -3% -    5%)    0.176
OrStopWords                   33.99  (4.6%)                    34.39  (4.1%)    1.2% (  -7% -   10%)    0.388
CountAndHighMed              148.01  (2.4%)                   149.91  (1.0%)    1.3% (  -2% -    4%)    0.025
Or2Terms2StopWords           156.93  (2.8%)                   159.21  (3.0%)    1.5% (  -4% -    7%)    0.117
AndHighHigh                   67.06  (1.3%)                    68.07  (1.6%)    1.5% (  -1% -    4%)    0.001
OrMany                        18.67  (2.9%)                    18.96  (2.9%)    1.5% (  -4% -    7%)    0.089
AndHighMed                   185.02  (1.6%)                   189.06  (1.3%)    2.2% (   0% -    5%)    0.000
AndHighLow                   948.34  (2.6%)                   970.47  (2.6%)    2.3% (  -2% -    7%)    0.004
OrHighHigh                    68.42  (1.4%)                    70.08  (1.3%)    2.4% (   0% -    5%)    0.000
Or3Terms
```
Re: [I] Absolutely horrible Lucene performance with JDK 23 (Lucene 9.11.1 and 10.0.0) [lucene]
derreisende77 commented on issue #13959:
URL: https://github.com/apache/lucene/issues/13959#issuecomment-2438658215

I made some tests with Ubuntu 24.10:
JDK 23: 9.9 seconds
JDK 22: 1.4 seconds
Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]
jpountz commented on code in PR #13950:
URL: https://github.com/apache/lucene/pull/13950#discussion_r1815173658

## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ##
@@ -136,20 +158,20 @@ public List clauses() {
   }

   /** Return the collection of queries for the given {@link Occur}. */
-  Collection getClauses(Occur occur) {
+  public Collection getClauses(Occur occur) {
     return clauseSets.get(occur);
   }

   /**
    * Whether this query is a pure disjunction, ie. it only has SHOULD clauses and it is enough for a
    * single clause to match for this boolean query to match.
    */
-  boolean isPureDisjunction() {
+  public boolean isPureDisjunction() {
     return clauses.size() == getClauses(Occur.SHOULD).size() && minimumNumberShouldMatch <= 1;
   }

   /** Whether this query is a two clause disjunction with two term query clauses. */
-  boolean isTwoClausePureDisjunctionWithTerms() {
+  public boolean isTwoClausePureDisjunctionWithTerms() {

Review Comment:
I can understand why someone would want to make `getClauses` public, but I wouldn't make the two above methods public, these are just implementation details of some rewrite rules?
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
jpountz commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437740226 Can you clarify which allocation is the problematic one, and where it's done on the indexing path?
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
ChrisHegarty commented on code in PR #13872:
URL: https://github.com/apache/lucene/pull/13872#discussion_r1816669062

## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorerSupplier.java: ##
@@ -112,20 +96,20 @@ static final class CosineSupplier extends Lucene99MemorySegmentByteVectorScorerS
     @Override
     public RandomVectorScorer scorer(int ord) {
       checkOrdinal(ord);
+      MemorySegmentAccessInput slice = input.clone();
+      byte[] scratch1 = new byte[vectorByteSize];
+      byte[] scratch2 = new byte[vectorByteSize];

Review Comment:
We don't know during construction whether or not access to the vector data in backing segment will *always* be available. The main reason is that a vector may span across multiple memory segments. (one MSIndexInput can be made up of several memory segments)

This change is not right. The scratch buffers are created per supplier, since we know with the threading model that that is safe. Creating scratch buffers per scorer will be too expensive.
Re: [PR] Make DirectMonotonicReader.Meta more compact [lucene]
jpountz commented on PR #13864: URL: https://github.com/apache/lucene/pull/13864#issuecomment-2437765755 Sorry, I don't feel good about relying on `paddingBitsNeeded` on the read path. I suggest we close this PR, IMO the better fix would be to change the way we store terms dictionaries to rely less on `DirectMonotonicReader`.
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
ChrisHegarty commented on PR #13872:
URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437761782

> that we instead have a mutable Scorer that can accept a new target vector.

Yes, that is something that I've noodled on for a while now too - a scorer that accepts two ords, and returns the score. This will save gigabytes of garbage, which can be seen in the blunder output of the nightly luceneutil runs. Though, you do not have to do it all in this PR.
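A rough sketch of the "scorer that accepts two ords" shape mentioned above, with loud caveats: `OrdPairScorerSketch` and `DotProductPairScorer` are hypothetical names, the vectors live in a plain `float[][]`, and dot product stands in for whatever similarity is configured. Nothing here is an existing Lucene interface. The design point is that no per-target object or scratch array is created between comparisons.

```java
// Hypothetical interface for illustration; not an existing Lucene API.
interface OrdPairScorerSketch {
  /** Scores the vector at firstOrd against the vector at secondOrd. */
  float score(int firstOrd, int secondOrd);
}

// Hypothetical implementation over in-memory vectors, using dot product as the similarity.
final class DotProductPairScorer implements OrdPairScorerSketch {
  private final float[][] vectors;

  DotProductPairScorer(float[][] vectors) {
    this.vectors = vectors;
  }

  @Override
  public float score(int firstOrd, int secondOrd) {
    float[] a = vectors[firstOrd];
    float[] b = vectors[secondOrd];
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0.5f, 2f}, {2f, 2f}};
    OrdPairScorerSketch scorer = new DotProductPairScorer(vectors);
    // The same scorer instance serves every pair; nothing is allocated per call.
    System.out.println(scorer.score(0, 1));
    System.out.println(scorer.score(1, 2));
  }
}
```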
Re: [PR] Make DirectMonotonicReader.Meta more compact [lucene]
original-brownbear closed pull request #13864: Make DirectMonotonicReader.Meta more compact URL: https://github.com/apache/lucene/pull/13864
Re: [PR] Make DirectMonotonicReader.Meta more compact [lucene]
original-brownbear commented on PR #13864: URL: https://github.com/apache/lucene/pull/13864#issuecomment-2437848161 Yea, that's cool, sorry, I forgot about this one. For starters we can just store the offsets in a more compact form; that'll help already. I'll open a PR once I find a little time :)
Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]
msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2437853233 Maybe we could add a `RandomVectorScorer.setTarget(int node)` method that would only be implemented by the Scorers returned from ScorerSuppliers?
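The idea of a scorer whose target can be re-pointed, rather than allocating a new scorer per target, might look roughly like the sketch below. It is only an illustration under stated assumptions: `ReusableDotProductScorer` is a hypothetical class, the on-heap `byte[][]` stands in for the real vector storage, and none of this is Lucene's actual `RandomVectorScorer` API.

```java
/** Hypothetical sketch, not Lucene's RandomVectorScorer API. */
final class ReusableDotProductScorer {
  private final byte[][] vectors; // stand-in for the real (possibly off-heap) vector storage
  private int target = -1; // ordinal currently being scored against; -1 means "not set"

  ReusableDotProductScorer(byte[][] vectors) {
    this.vectors = vectors;
  }

  /** Re-points this scorer at a new target ordinal without allocating anything. */
  void setTarget(int node) {
    if (node < 0 || node >= vectors.length) {
      throw new IllegalArgumentException("ordinal out of range: " + node);
    }
    target = node;
  }

  /** Scores the given node against the current target using a plain dot product. */
  int score(int node) {
    if (target < 0) {
      throw new IllegalStateException("call setTarget first");
    }
    byte[] a = vectors[target];
    byte[] b = vectors[node];
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```

A caller would create one such scorer per thread and call `setTarget` each time the target changes, which is where the garbage savings mentioned in the thread would come from.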
Re: [PR] Remove some useless code in TopScoreDocCollector. [lucene]
jpountz merged PR #13955: URL: https://github.com/apache/lucene/pull/13955
Re: [PR] Make some BooleanQuery methods public and a new `#add(Collection)` method for BQ builder [lucene]
jpountz merged PR #13950: URL: https://github.com/apache/lucene/pull/13950
Re: [PR] Add MIGRATE entry about the fact that readVLong() may now read negative values, and up to 10 bytes. [lucene]
jpountz merged PR #13956: URL: https://github.com/apache/lucene/pull/13956
Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]
jpountz commented on PR #13944: URL: https://github.com/apache/lucene/pull/13944#issuecomment-2437549776 Can you add an entry to `lucene/CHANGES.txt` under version 10.1.0? Then I'll merge.
Re: [PR] Ensure doc order for TestCommonTermsQuery#testMinShouldMatch [lucene]
benwtrent merged PR #13953: URL: https://github.com/apache/lucene/pull/13953
Re: [I] TestCommonTermsQuery.testMinShouldMatch test failure [lucene]
benwtrent closed issue #13946: TestCommonTermsQuery.testMinShouldMatch test failure URL: https://github.com/apache/lucene/issues/13946
[I] Absolutely horrible Lucene performance with JDK 23 (Lucene 9.11.1 and 10.0.0) [lucene]
derreisende77 opened a new issue, #13959: URL: https://github.com/apache/lucene/issues/13959

### Description

I have been using Lucene in my app happily for several years with JDKs up to 22. My use case searches through film data and Lucene can return fairly huge result sets to my app, which until now was never a problem. I upgraded my app to JDK 23.0.1 on my MacBook Air (macOS 15.01, 16GB RAM):

```
openjdk version "23.0.1" 2024-10-15
OpenJDK Runtime Environment (build 23.0.1+13)
OpenJDK 64-Bit Server VM (build 23.0.1+13, mixed mode, sharing)
```

and started to notice **horrible** Lucene performance. I query my results with the following code snippet:

```java
var reader = DirectoryReader.open(list.getLuceneDirectory());
final var searcher = new IndexSearcher(reader);
final var docs = searcher.search(finalQuery, list.size());
final var hit_length = docs.scoreDocs.length;
var storedFields = searcher.storedFields();
// the for loop takes ages with JDK23...
for (final var hit : docs.scoreDocs) {
  var docId = hit.doc;
  //storedFields.prefetch(docId); //<-- doesn't change anything
  var d = storedFields.document(docId, INTEREST_SET); //<-- this takes ages
  //filmNrSet.add(Integer.parseInt(d.get(LuceneIndexKeys.ID)));
}
```

In the explored use case Lucene always returned the expected 558333 hits out of 802k documents. 99% of the app runs take **5.7 seconds** to get the result. However, when I am lucky, *1% of the app runs* get the same result back in **756 milliseconds**. If Lucene is delivering fast, it stays fast; if it is slow, it remains slow. I have no idea how this is triggered. I moved from `NRTCachingDirectory` to `MMapDirectory` but the performance remained bad. I tried some other suggestions from the internet, with the same result. I switched back to JDK 22:

```
openjdk version "22.0.2" 2024-07-16
OpenJDK Runtime Environment (build 22.0.2+11)
OpenJDK 64-Bit Server VM (build 22.0.2+11, mixed mode, sharing)
```

The same source which performed just horrible with JDK23 was **consistently fast** with JDK 22:

```
Search took: 763.8 ms
```

I am using the following flags for the JVM:

```
-ea -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=compact -XX:+UseStringDeduplication -XX:MaxRAMPercentage=50.0 --enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.vector
```

I removed `--enable-native-access=ALL-UNNAMED` and `--add-modules jdk.incubator.vector` for testing purposes but the performance remained bad with JDK23. I made the same tests with JDK23 on my Windows 11 AMD Ryzen 4900H 16GB RAM laptop. There I get the results back in **13.63 seconds** with JDK 23. JDK 22 does the same consistently in **2.1 seconds**. Switching between Lucene `9.11.1` and `10.0.0` made no difference: always shitty performance with JDK 23 on both macOS and Windows, and consistent performance with JDK 22.

### Version and environment details

_No response_
Re: [PR] Ensure stability of clause order for DisjunctionMaxQuery toString [lucene]
jpountz merged PR #13944: URL: https://github.com/apache/lucene/pull/13944
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
goankur commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1817245059 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java: ## @@ -146,6 +146,7 @@ public float getScoreCorrectionConstant(int targetOrd) throws IOException { } slice.seek(((long) targetOrd * byteSize) + numBytes); slice.readFloats(scoreCorrectionConstant, 0, 1); +lastOrd = targetOrd; Review Comment: Got it! I will remove this in the next revision. I was just trying to optimize for the case when `getScoreCorrectionConstant(int targetOrd)` gets invoked with the same `targetOrd` multiple times in succession.
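The optimization being described (and withdrawn from this PR) is a one-entry cache keyed by the last ordinal. A minimal sketch of that pattern follows, assuming a hypothetical `FloatReader` callback in place of the real `slice.seek` / `slice.readFloats` calls; `LastOrdCache` and its `reads` counter are illustrative names, not part of Lucene.

```java
/** Hypothetical sketch of a one-entry cache keyed by the last requested ordinal. */
final class LastOrdCache {
  /** Stand-in for the underlying read (e.g. a seek plus a float read on an index slice). */
  interface FloatReader {
    float read(int ord);
  }

  private final FloatReader reader;
  private int lastOrd = -1; // no ordinal cached yet
  private float lastValue;
  int reads; // how many times the underlying reader was actually invoked

  LastOrdCache(FloatReader reader) {
    this.reader = reader;
  }

  float get(int ord) {
    if (ord == lastOrd) {
      return lastValue; // repeated request for the same ordinal: skip the read
    }
    lastValue = reader.read(ord);
    reads++;
    lastOrd = ord; // only remember the ordinal once the read has succeeded
    return lastValue;
  }
}
```

The cache only pays off when the same ordinal is requested several times in a row, which is exactly the case the comment says it was aimed at.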
Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]
benwtrent commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2438671673 Hey @vigyasharma there is a lot of good work here. I am going to shift my focus and see about how I can help here more fully. What are the next steps? I am guessing handling all the merging from main, I can take care of that sometime next week. Just wondering where I can help.
Re: [I] Absolutely horrible Lucene performance with JDK 23 (Lucene 9.11.1 and 10.0.0) [lucene]
benwtrent commented on issue #13959: URL: https://github.com/apache/lucene/issues/13959#issuecomment-2438673337 @derreisende77 do you have profiling of the two different runs? Maybe through async-profiler? It would be interesting to see where the time is being spent.
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
goankur commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1817385236

## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ##

@@ -84,6 +91,76 @@ public void init() {
       floatsA[i] = random.nextFloat();
       floatsB[i] = random.nextFloat();
     }
+    // Java 21+ specific initialization
+    final int runtimeVersion = Runtime.version().feature();
+    if (runtimeVersion >= 21) {
+      // Reflection based code to eliminate the use of Preview classes in JMH benchmarks
+      try {
+        final Class vectorUtilSupportClass = VectorUtil.getVectorUtilSupportClass();
+        final var className = "org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport";
+        if (vectorUtilSupportClass.getName().equals(className) == false) {
+          nativeBytesA = null;
+          nativeBytesB = null;
+        } else {
+          MethodHandles.Lookup lookup = MethodHandles.lookup();
+          final var MemorySegment = "java.lang.foreign.MemorySegment";
+          final var methodType =
+              MethodType.methodType(lookup.findClass(MemorySegment), byte[].class);
+          MethodHandle nativeMemorySegment =
+              lookup.findStatic(vectorUtilSupportClass, "nativeMemorySegment", methodType);
+          byte[] a = new byte[size];

Review Comment: Yes, this is the setup code for the benchmark. We run setup once per `iteration`, for a total of `15` iterations across `3` forks (5 iterations per fork) for each `size` being tested. Each fork is preceded by 3 warm-up iterations. So before **each** iteration we generate random numbers in the range [0, 127] in two on-heap `byte[]`, allocate off-heap memory segments and populate them with the contents of the `byte[]`. These off-heap memory segments are provided to the `VectorUtil.NATIVE_DOT_PRODUCT` method handle. (Code snippet below for reference)

```
@Param({"1", "128", "207", "256", "300", "512", "702", "1024"})
int size;

@Setup(Level.Iteration)
public void init() {
  ...
}
```

> I wonder if we would see something different if we generated a large number of vectors and randomized which ones we compare on each run. Also would performance vary if the vectors are sequential in their buffer (ie vector 0 starts at 0, vector 1 starts at size...)

I guess the question you are hinting at is how the performance varies when the two candidate vectors are further apart in memory (L1 cache / L2 cache / L3 cache / main memory). Do the gains from the native implementation become insignificant with increasing distance? It's an interesting question and I propose that we add benchmark method(s) to answer it in a follow-up PR. Does that sound reasonable?
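The follow-up experiment suggested in the quoted question (many vectors laid out sequentially in one buffer, with the compared pair chosen at random) could start from something like the sketch below. It is plain Java rather than a JMH benchmark, it uses an on-heap `byte[]` instead of off-heap memory segments, and `PackedVectorDotProduct` and its methods are hypothetical names introduced only for illustration.

```java
import java.util.Random;

/** Hypothetical sketch: int8 vectors packed back to back in one buffer. */
final class PackedVectorDotProduct {

  /** Dot product of two size-byte vectors starting at the given offsets of one backing array. */
  static int dotProduct(byte[] buffer, int offsetA, int offsetB, int size) {
    int sum = 0;
    for (int i = 0; i < size; i++) {
      sum += buffer[offsetA + i] * buffer[offsetB + i];
    }
    return sum;
  }

  /** Fills count sequential vectors (vector i starts at i * size) with values in [0, 127]. */
  static byte[] randomVectors(int count, int size, long seed) {
    Random random = new Random(seed);
    byte[] buffer = new byte[count * size];
    for (int i = 0; i < buffer.length; i++) {
      buffer[i] = (byte) random.nextInt(128);
    }
    return buffer;
  }

  /** Scores `runs` randomly chosen pairs and returns a checksum so the work is not optimized away. */
  static long scoreRandomPairs(byte[] buffer, int count, int size, int runs, long seed) {
    Random picker = new Random(seed);
    long checksum = 0;
    for (int run = 0; run < runs; run++) {
      int a = picker.nextInt(count);
      int b = picker.nextInt(count);
      checksum += dotProduct(buffer, a * size, b * size, size);
    }
    return checksum;
  }
}
```

With a large `count`, most randomly chosen pairs land far apart in memory, which is the cache-distance effect the question is about; a real measurement would still need a proper harness such as JMH.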
Re: [I] Absolutely horrible Lucene performance with JDK 23 (Lucene 9.11.1 and 10.0.0) [lucene]
derreisende77 commented on issue #13959: URL: https://github.com/apache/lucene/issues/13959#issuecomment-2438907870 @benwtrent I have JProfiler but I am not really experienced in using it - or profiling at all. I made two runs on macOS and made screenshots from the hotspot page.

JDK23: [screenshot]
JDK22: [screenshot]

The highlighted line in the JDK22 image marks the function from which I posted the code snippet at the beginning. The `StoredFields.document` line above it corresponds to the `var d = storedFields.document(docId, INTEREST_SET);` line. What I have seen during the profile run:

- `ArrayUtil.growExact` takes a lot more time on JDK 23 than on JDK 22.
- `UnicodeUtil.UTF16toUTF8` calls were made during the index creation phase and take almost the same time on JDK 22 and 23.

HTH
Re: [PR] Speed up advancing within a block, take 2. [lucene]
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438911637 And I seem to be getting a better speedup by using `trueCount()` instead of `firstTrue()`:

```
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
CountTerm                  8621.82   (5.6%)                  8504.44   (4.6%)     -1.4%  (-10% -   9%)  0.401
AndStopWords                 31.14   (1.4%)                    30.83   (4.6%)     -1.0%  ( -6% -   5%)  0.363
Prefix3                      96.42   (5.7%)                    95.50   (4.4%)     -1.0%  (-10% -   9%)  0.557
HighTermTitleBDVSort         15.80   (6.0%)                    15.65   (5.0%)     -0.9%  (-11% -  10%)  0.587
OrStopWords                  34.67   (2.9%)                    34.45   (5.7%)     -0.6%  ( -8% -   8%)  0.657
OrNotHighMed                385.71   (4.2%)                   384.12   (3.2%)     -0.4%  ( -7% -   7%)  0.725
TermDTSort                  346.51   (5.7%)                   345.26   (6.2%)     -0.4%  (-11% -  12%)  0.847
HighTermTitleSort           153.13   (1.7%)                   152.59   (3.3%)     -0.4%  ( -5% -   4%)  0.670
OrMany                       19.06   (1.6%)                    18.99   (3.2%)     -0.3%  ( -5% -   4%)  0.671
HighTermMonthSort          3126.69   (2.9%)                  3117.99   (3.7%)     -0.3%  ( -6% -   6%)  0.791
CountOrHighHigh              50.32   (1.6%)                    50.26   (2.1%)     -0.1%  ( -3% -   3%)  0.862
CountOrHighMed              104.69   (1.7%)                   104.70   (2.0%)      0.0%  ( -3% -   3%)  0.981
PKLookup                    270.86   (2.7%)                   270.98   (2.7%)      0.0%  ( -5% -   5%)  0.960
OrHighRare                  281.93   (3.4%)                   282.35   (4.8%)      0.1%  ( -7% -   8%)  0.911
Wildcard                     49.07   (3.7%)                    49.15   (4.2%)      0.2%  ( -7% -   8%)  0.893
Or2Terms2StopWords          160.10   (1.5%)                   160.52   (3.5%)      0.3%  ( -4% -   5%)  0.756
And2Terms2StopWords         156.75   (1.5%)                   157.35   (2.8%)      0.4%  ( -3% -   4%)  0.586
OrHighLow                   855.65   (2.4%)                   859.93   (2.7%)      0.5%  ( -4% -   5%)  0.542
HighTermDayOfYearSort       800.87   (2.8%)                   805.06   (2.9%)      0.5%  ( -5% -   6%)  0.562
And3Terms                   169.90   (1.5%)                   170.87   (3.1%)      0.6%  ( -3% -   5%)  0.455
Fuzzy1                       77.88   (3.3%)                    78.52   (2.9%)      0.8%  ( -5% -   7%)  0.409
Fuzzy2                       73.27   (3.0%)                    73.93   (2.4%)      0.9%  ( -4% -   6%)  0.295
OrNotHighLow               1099.84   (3.7%)                  1114.61   (3.8%)      1.3%  ( -5% -   9%)  0.260
Or3Terms                    169.45   (1.5%)                   171.80   (3.7%)      1.4%  ( -3% -   6%)  0.118
CountAndHighMed             148.89   (2.5%)                   151.58   (3.0%)      1.8%  ( -3% -   7%)  0.040
LowTerm                    1033.62   (3.6%)                  1052.61   (2.8%)      1.8%  ( -4% -   8%)  0.075
OrHighNotMed                371.62   (3.1%)                   378.74   (3.5%)      1.9%  ( -4% -   8%)  0.066
OrHighNotHigh               296.15   (3.1%)                   302.30   (3.1%)      2.1%  ( -4% -   8%)  0.036
AndHighHigh                  70.55   (1.6%)                    72.20   (2.4%)      2.3%  ( -1% -   6%)  0.000
OrHighHigh                   94.03   (1.6%)                    96.25   (2.0%)      2.4%  ( -1% -   6%)  0.000
OrHighNotLow                442.74   (3.0%)                   454.42   (3.6%)      2.6%  ( -3% -   9%)  0.011
OrHighMed                   232.09   (2.5%)                   238.43   (2.5%)      2.7%  ( -2% -   7%)  0.001
IntNRQ                      110.25  (15.4%)                   113.35  (17.9%)      2.8%  (-26% -  42%)  0.594
MedTerm                     601.09   (3.7%)                   619.19   (2.2%)      3.0%  ( -2% -   9%)  0.002
AndHighMed                  221.49   (1.9%)                   228.33   (2.4%)      3.1%  ( -1% -   7%)  0.000
HighTerm                    520.52   (3.4%)                   537.37   (2.6%)      3.2%  ( -2% -   9%)  0.001
AndHighLow                 1047.38   (2.8%)                  1082.62   (2.7%)      3.4%  ( -2% -   9%)  0.000
OrNotHighHigh               276.13   (3.5%)                   286.23   (3.4%)      3.7%  ( -3% -  10%)  0.001
CountAndHighHigh             49.28   (2.3%)                    54.98   (2.4%)     11.6%  (  6% -  16%)  0.000
```
Re: [PR] Speed up advancing within a block, take 2. [lucene]
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438925486 you are using VectorMask, only use this where implemented in HW (AVX-512 and ARM SVE).
Re: [PR] Speed up advancing within a block, take 2. [lucene]
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438919587 I ran this PR on my Mac laptop (M3), where this gives a massive slowdown, I imagine because some of the vector operations I'm using are emulated. I need to find what to check against in order to avoid this like we did for vectors with `PanamaVectorConstants.HAS_FAST_INTEGER_VECTORS`.
Re: [PR] Speed up advancing within a block, take 2. [lucene]
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438947715 For these uses of VectorMask you are ok with AVX2 (so just use the existing FAST_INTEGER_VECTORS check): https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L1597-L1603 So if you want to add this one without slowdowns, I would check: `FAST_INTEGER_VECTORS && amd64`
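The suggested guard, `FAST_INTEGER_VECTORS && amd64`, can be sketched as below. This is an assumption-laden illustration: `VectorMaskGate` is a hypothetical helper, the boolean parameter stands in for whatever constant the codebase exposes for fast integer vectors, and the architecture test simply compares the `os.arch` system property against the two common names for 64-bit x86.

```java
import java.util.Locale;

/** Hypothetical sketch of gating a VectorMask-based code path on CPU architecture. */
final class VectorMaskGate {

  /** True when the given os.arch value names a 64-bit x86 platform. */
  static boolean isAmd64(String osArch) {
    String arch = osArch.toLowerCase(Locale.ROOT);
    return arch.equals("amd64") || arch.equals("x86_64");
  }

  /**
   * Only take the mask-based path when integer vectors are fast AND we are on x86-64;
   * elsewhere (for example aarch64 without SVE) the mask operations may be emulated.
   */
  static boolean useVectorMaskPath(boolean hasFastIntegerVectors, String osArch) {
    return hasFastIntegerVectors && isAmd64(osArch);
  }
}
```

At runtime the second argument would typically be `System.getProperty("os.arch")`; on an Apple M3 this reports `aarch64`, so the gate would stay closed there, matching the slowdown reported in the thread.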
Re: [PR] Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. [lucene]
vsop-479 commented on PR #13915: URL: https://github.com/apache/lucene/pull/13915#issuecomment-2437267763 I will close it, since it is insignificant.
Re: [PR] Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. [lucene]
vsop-479 closed pull request #13915: Early reset scratchBytes in Lucene90BlockTreeTermsWriter.compileIndex. URL: https://github.com/apache/lucene/pull/13915
[PR] Remove LeafSimScorer abstraction. [lucene]
jpountz opened a new pull request, #13957: URL: https://github.com/apache/lucene/pull/13957 `LeafSimScorer` is a specialization of a `SimScorer` for a given segment. It doesn't add much value, but benchmarks suggest that it adds measurable overhead to queries sorted by score. Here is a `luceneutil` run with `-searchConcurrency 0` on `wikibigall`:

```
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
CountAndHighMed             148.80   (3.6%)                   146.79   (3.3%)     -1.4%  ( -8% -   5%)  0.219
Prefix3                     210.12   (3.4%)                   208.12   (3.1%)     -1.0%  ( -7% -   5%)  0.355
OrNotHighLow                930.49   (2.9%)                   922.26   (2.8%)     -0.9%  ( -6% -   4%)  0.326
CountOrHighMed              104.34   (1.6%)                   103.50   (1.5%)     -0.8%  ( -3% -   2%)  0.099
CountAndHighHigh             48.93   (3.6%)                    48.55   (3.4%)     -0.8%  ( -7% -   6%)  0.485
HighTermMonthSort          3011.98   (2.9%)                  2989.18   (4.1%)     -0.8%  ( -7% -   6%)  0.498
TermDTSort                  342.40   (7.1%)                   340.02   (6.1%)     -0.7%  (-13% -  13%)  0.741
CountOrHighHigh              49.93   (1.6%)                    49.76   (1.2%)     -0.3%  ( -3% -   2%)  0.451
HighTermTitleSort           111.58   (2.3%)                   111.22   (3.1%)     -0.3%  ( -5% -   5%)  0.710
OrNotHighHigh               308.36   (3.1%)                   307.70   (3.3%)     -0.2%  ( -6% -   6%)  0.835
Fuzzy2                       71.17   (1.6%)                    71.07   (2.2%)     -0.1%  ( -3% -   3%)  0.824
OrHighLow                   726.98   (1.6%)                   727.36   (2.5%)      0.1%  ( -4% -   4%)  0.939
HighTermDayOfYearSort       764.56   (3.8%)                   765.85   (3.4%)      0.2%  ( -6% -   7%)  0.882
OrNotHighMed                350.64   (3.4%)                   351.46   (4.3%)      0.2%  ( -7% -   8%)  0.848
Fuzzy1                       75.46   (1.9%)                    75.80   (1.8%)      0.5%  ( -3% -   4%)  0.448
IntNRQ                      139.45  (13.7%)                   140.08  (14.5%)      0.5%  (-24% -  33%)  0.918
HighTermTitleBDVSort         15.35   (5.7%)                    15.42   (5.5%)      0.5%  (-10% -  12%)  0.781
PKLookup                    265.51   (2.5%)                   267.01   (1.6%)      0.6%  ( -3% -   4%)  0.389
AndHighLow                  989.77   (1.9%)                   995.39   (2.2%)      0.6%  ( -3% -   4%)  0.387
CountTerm                  7984.92   (3.9%)                  8051.09   (5.0%)      0.8%  ( -7% -  10%)  0.557
OrHighNotHigh               321.43   (2.7%)                   324.15   (3.1%)      0.8%  ( -4% -   6%)  0.357
OrMany                       18.24   (2.4%)                    18.45   (2.1%)      1.1%  ( -3% -   5%)  0.107
Wildcard                    117.97   (3.2%)                   119.40   (3.2%)      1.2%  ( -5% -   7%)  0.230
OrHighRare                  269.54   (5.3%)                   273.78   (6.5%)      1.6%  ( -9% -  14%)  0.401
OrHighMed                   219.25   (2.5%)                   222.89   (2.7%)      1.7%  ( -3% -   7%)  0.044
And2Terms2StopWords         151.65   (1.8%)                   154.21   (1.6%)      1.7%  ( -1% -   5%)  0.002
Or2Terms2StopWords          153.46   (3.1%)                   156.15   (2.8%)      1.8%  ( -4% -   7%)  0.061
Or3Terms                    164.81   (2.4%)                   168.57   (2.9%)      2.3%  ( -2% -   7%)  0.007
MedTerm                     610.37   (3.5%)                   625.30   (3.7%)      2.4%  ( -4% -  10%)  0.032
OrHighNotMed                417.48   (2.8%)                   427.78   (3.1%)      2.5%  ( -3% -   8%)  0.008
LowTerm                     981.78   (2.8%)                  1008.35   (3.8%)      2.7%  ( -3% -   9%)  0.010
And3Terms                   165.41   (1.8%)                   170.05   (1.7%)      2.8%  (  0% -   6%)  0.000
AndStopWords                 30.15   (3.0%)                    31.07   (3.8%)      3.0%  ( -3% -  10%)  0.005
HighTerm                    455.84   (3.4%)                   469.91   (4.0%)      3.1%  ( -4% -  10%)  0.009
OrHighHigh                   68.52   (1.7%)                    70.69   (3.7%)      3.2%  ( -2% -   8%)  0.000
OrHighNotLow                412.63   (2.8%)                   427.86   (3.5%)      3.7%  ( -2% -  10%)  0.000
OrStopWords                  33.50   (3.8%)                    34.75   (5.1%)      3.7%  ( -4% -  13%)  0.009
AndHighMed                  165.41   (1.9%)                   171.81   (1.7%)      3.9%  (  0% -   7%)  0.000
AndHighHigh                  72.22   (1.7%)                    76.11   (1.4%)      5.4%  (  2% -   8%)  0.000
```
Re: [PR] Disable exchanging minimum scores across slices for exhaustive evaluation. [lucene]
jpountz merged PR #13954: URL: https://github.com/apache/lucene/pull/13954
Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]
jpountz commented on PR #13951: URL: https://github.com/apache/lucene/pull/13951#issuecomment-2437616406 > I couldn't think of a clean way to integrate the two... but I'll give it some more thought For what it's worth, these classes are package-private, so we can feel free to change their API.
Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]
HoustonPutman commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1810967900

## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ##
@@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) {
    * is used to compute the equi-weight per bin.
    */
   public static List computeDynamicNumericRanges(
-      long[] values, long[] weights, int len, long totalWeight, int topN) {
+      long[] values, long[] weights, int len, long totalValue, long totalWeight, int topN) {
     assert values.length == weights.length && len <= values.length && len >= 0;
     assert topN >= 0;
     List dynamicRangeResult = new ArrayList<>();
     if (len == 0 || topN == 0) {
       return dynamicRangeResult;
     }
-    new InPlaceMergeSorter() {
-      @Override
-      protected int compare(int index1, int index2) {
-        int cmp = Long.compare(values[index1], values[index2]);
-        if (cmp == 0) {
-          // If the values are equal, sort based on the weights.
-          // Any weight order is correct as long as it's deterministic.
-          return Long.compare(weights[index1], weights[index2]);
-        }
-        return cmp;
-      }
+    double rangeWeightTarget = (double) totalWeight / topN;
+    double[] kWeights = new double[topN];
+    for (int i = 0; i < topN; i++) {
+      kWeights[i] = (i == 0 ? 0 : kWeights[i - 1]) + rangeWeightTarget;

Review Comment:
   Wow yeah, both are better (though I like the first). This is the beauty of PR reviews haha. When you are 500 lines into a change, who knows what dumb things you will write...

## lucene/core/src/java/org/apache/lucene/util/WeightedSelector.java: ##
@@ -0,0 +1,407 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.SplittableRandom;
+
+/**
+ * Adaptive selection algorithm based on the introspective quick select algorithm. The quick select
+ * algorithm uses an interpolation variant of Tukey's ninther median-of-medians for pivot, and
+ * Bentley-McIlroy 3-way partitioning. For the introspective protection, it shuffles the sub-range
+ * if the max recursive depth is exceeded.
+ *
+ * This selection algorithm is fast on most data shapes, especially on nearly sorted data, or
+ * when k is close to the boundaries. It runs in linear time on average.
+ *
+ * @lucene.internal
+ */
+public abstract class WeightedSelector {
+
+  // This selector is used repeatedly by the radix selector for sub-ranges of less than
+  // 100 entries. This means this selector is also optimized to be fast on small ranges.
+  // It uses the variant of medians-of-medians and 3-way partitioning, and finishes the
+  // last tiny range (3 entries or less) with a very specialized sort.
+
+  private SplittableRandom random;
+
+  protected abstract long getWeight(int i);
+
+  protected abstract long getValue(int i);
+
+  public final WeightRangeInfo[] select(

Review Comment:
   Absolutely. Was going to go through and add docs, just wanted to make sure it was a good direction to go in first. Probably worth doing the benchmarking first 🥹

## lucene/core/src/java/org/apache/lucene/util/WeightedSelector.java: ##
@@ -0,0 +1,407 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Arrays;
+import
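For readers following the `kWeights` loop quoted in the DynamicRangeUtil hunk of this message: it builds cumulative weight targets (`rangeWeightTarget`, `2 * rangeWeightTarget`, and so on) by adding to the previous element. A minimal sketch of one possible simplification is below, computing each target directly as a multiple. The class name `CumulativeTargets` and method name `cumulativeTargets` are hypothetical, and this is not necessarily either of the alternatives the reviewer proposed, since those suggestions are not quoted in this message.

```java
import java.util.Arrays;

public final class CumulativeTargets {

  /**
   * Returns topN cumulative weight thresholds: target, 2 * target, ..., topN * target,
   * where target = totalWeight / topN. Hypothetical helper for illustration only.
   */
  public static double[] cumulativeTargets(long totalWeight, int topN) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be positive");
    }
    double rangeWeightTarget = (double) totalWeight / topN;
    double[] kWeights = new double[topN];
    for (int i = 0; i < topN; i++) {
      // Multiply instead of reading kWeights[i - 1]; no special case for i == 0.
      kWeights[i] = (i + 1) * rangeWeightTarget;
    }
    return kWeights;
  }

  public static void main(String[] args) {
    // prints [2.5, 5.0, 7.5, 10.0]
    System.out.println(Arrays.toString(cumulativeTargets(10, 4)));
  }
}
```

Multiplying rounds each target once, whereas repeated addition can accumulate rounding error, so the two forms may differ in the last bits for some inputs.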
Re: [PR] Speed up advancing within a block, take 2. [lucene]
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438973598 Maybe it's a bug that it doesn't work on your Mac either, because elsewhere they have code that looks like it is supposed to be doing this stuff: https://github.com/openjdk/jdk/blob/f1a9a8d25b2e1f9b5dbe8719abb66ec4cd9057dc/src/hotspot/cpu/aarch64/aarch64_vector_ad.m4#L3782
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
goankur commented on code in PR #13572: URL: https://github.com/apache/lucene/pull/13572#discussion_r1817415010

## lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/VectorUtilBenchmark.java: ##
@@ -84,6 +91,76 @@ public void init() {
       floatsA[i] = random.nextFloat();
       floatsB[i] = random.nextFloat();
     }
+    // Java 21+ specific initialization
+    final int runtimeVersion = Runtime.version().feature();
+    if (runtimeVersion >= 21) {
+      // Reflection based code to eliminate the use of Preview classes in JMH benchmarks
+      try {
+        final Class vectorUtilSupportClass = VectorUtil.getVectorUtilSupportClass();
+        final var className = "org.apache.lucene.internal.vectorization.PanamaVectorUtilSupport";
+        if (vectorUtilSupportClass.getName().equals(className) == false) {
+          nativeBytesA = null;
+          nativeBytesB = null;
+        } else {
+          MethodHandles.Lookup lookup = MethodHandles.lookup();
+          final var MemorySegment = "java.lang.foreign.MemorySegment";
+          final var methodType =
+              MethodType.methodType(lookup.findClass(MemorySegment), byte[].class);
+          MethodHandle nativeMemorySegment =
+              lookup.findStatic(vectorUtilSupportClass, "nativeMemorySegment", methodType);
+          byte[] a = new byte[size];

Review Comment:
   Nonetheless I will simplify the setup code to make it a bit more readable in the next iteration.
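The setup code quoted in this message resolves a class by name and looks up a static method through `MethodHandles`, so the benchmark source carries no compile-time reference to preview classes. A minimal, self-contained sketch of that lookup pattern is below. It targets `java.lang.Math.max` purely as a stand-in, and the class name `ReflectiveStaticCall` and method name `maxViaMethodHandle` are hypothetical; it does not reproduce the PR's `nativeMemorySegment` helper.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public final class ReflectiveStaticCall {

  /** Calls Math.max(int, int) without a compile-time reference to the target class. */
  public static int maxViaMethodHandle(int a, int b) {
    try {
      MethodHandles.Lookup lookup = MethodHandles.lookup();
      // Resolve the class by name, as the quoted code does for MemorySegment.
      Class<?> target = lookup.findClass("java.lang.Math");
      // Describe the signature: int max(int, int).
      MethodType type = MethodType.methodType(int.class, int.class, int.class);
      MethodHandle max = lookup.findStatic(target, "max", type);
      return (int) max.invoke(a, b);
    } catch (Throwable t) {
      // MethodHandle.invoke declares Throwable; wrap it for callers.
      throw new IllegalStateException("reflective lookup failed", t);
    }
  }

  public static void main(String[] args) {
    System.out.println(maxViaMethodHandle(3, 7)); // prints 7
  }
}
```

In a benchmark, a handle like this would typically be resolved once during setup and reused, rather than looked up on every call.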
Re: [PR] Speed up advancing within a block, take 2. [lucene]
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438737799 Specializing `ImpactsDISI#nextDoc()` helped get rid of the slowdown:

```
Task                    QPS baseline  StdDev   QPS my_modified_version  StdDev   Pct diff           p-value
AndStopWords               31.34 (1.8%)     30.84 (4.0%)   -1.6% ( -7% -  4%)  0.105
CountTerm                8573.12 (3.8%)   8449.05 (4.7%)   -1.4% ( -9% -  7%)  0.284
CountOrHighMed            105.75 (2.1%)    104.50 (1.4%)   -1.2% ( -4% -  2%)  0.039
TermDTSort                363.06 (6.4%)    358.98 (6.6%)   -1.1% (-13% - 12%)  0.585
CountOrHighHigh            50.62 (2.4%)     50.28 (1.7%)   -0.7% ( -4% -  3%)  0.305
IntNRQ                    453.67 (4.7%)    451.13 (4.5%)   -0.6% ( -9% -  9%)  0.700
OrHighRare                283.32 (3.8%)    282.52 (3.8%)   -0.3% ( -7% -  7%)  0.813
Fuzzy1                     78.58 (2.1%)     78.42 (3.0%)   -0.2% ( -5% -  5%)  0.812
HighTermDayOfYearSort     850.86 (4.4%)    849.52 (3.0%)   -0.2% ( -7% -  7%)  0.895
HighTermTitleBDVSort       13.97 (6.3%)     13.96 (5.5%)   -0.1% (-11% - 12%)  0.974
And2Terms2StopWords       157.31 (1.3%)    157.27 (2.2%)   -0.0% ( -3% -  3%)  0.965
LowTerm                   985.67 (3.0%)    986.01 (1.8%)    0.0% ( -4% -  4%)  0.964
HighTermMonthSort        3216.69 (2.2%)   3217.92 (3.9%)    0.0% ( -5% -  6%)  0.969
Fuzzy2                     73.69 (2.0%)     73.74 (2.4%)    0.1% ( -4% -  4%)  0.910
AndHighHigh                65.88 (2.1%)     66.18 (2.0%)    0.5% ( -3% -  4%)  0.472
And3Terms                 169.85 (2.0%)    170.81 (2.4%)    0.6% ( -3% -  5%)  0.424
OrMany                     19.10 (1.7%)     19.22 (1.7%)    0.6% ( -2% -  4%)  0.237
Or2Terms2StopWords        160.88 (1.4%)    161.91 (2.0%)    0.6% ( -2% -  4%)  0.241
OrStopWords                34.90 (1.4%)     35.15 (3.9%)    0.7% ( -4% -  6%)  0.450
OrHighLow                 799.18 (1.6%)    805.33 (1.5%)    0.8% ( -2% -  3%)  0.117
CountAndHighMed           149.99 (3.1%)    151.23 (1.1%)    0.8% ( -3% -  5%)  0.261
Wildcard                   88.47 (2.7%)     89.32 (3.2%)    1.0% ( -4% -  7%)  0.309
PKLookup                  270.87 (3.8%)    273.47 (1.7%)    1.0% ( -4% -  6%)  0.307
Prefix3                    93.00 (8.2%)     94.14 (6.3%)    1.2% (-12% - 17%)  0.599
MedTerm                   690.05 (2.6%)    701.55 (1.3%)    1.7% ( -2% -  5%)  0.010
OrHighNotMed              359.57 (2.7%)    366.02 (1.9%)    1.8% ( -2% -  6%)  0.014
Or3Terms                  170.81 (1.3%)    173.98 (2.1%)    1.9% ( -1% -  5%)  0.001
OrHighNotLow              432.25 (3.4%)    440.76 (2.4%)    2.0% ( -3% -  8%)  0.035
HighTermTitleSort         159.15 (4.8%)    162.44 (2.9%)    2.1% ( -5% - 10%)  0.096
AndHighMed                225.25 (2.6%)    229.93 (1.4%)    2.1% ( -1% -  6%)  0.002
HighTerm                  455.45 (2.4%)    465.69 (2.1%)    2.2% ( -2% -  6%)  0.002
OrHighHigh                 78.87 (1.5%)     80.64 (1.5%)    2.3% (  0% -  5%)  0.000
OrHighNotHigh             218.32 (2.7%)    224.10 (2.0%)    2.6% ( -2% -  7%)  0.000
OrNotHighLow                 .11 (2.8%)   1144.28 (2.5%)    3.0% ( -2% -  8%)  0.000
OrHighMed                 267.13 (1.8%)    275.57 (1.3%)    3.2% (  0% -  6%)  0.000
OrNotHighMed              303.24 (3.0%)    313.56 (2.5%)    3.4% ( -2% -  9%)  0.000
OrNotHighHigh             230.18 (2.8%)    238.62 (2.2%)    3.7% ( -1% -  8%)  0.000
AndHighLow                866.39 (2.7%)    903.54 (2.4%)    4.3% (  0% -  9%)  0.000
CountAndHighHigh           49.60 (3.1%)     53.54 (0.9%)    7.9% (  3% - 12%)  0.000
```
Re: [PR] Speed up advancing within a block, take 2. [lucene]
rmuir commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2438944785 https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64_vector.ad#L280-L283
Re: [PR] Simplify leaf slice calculation [lucene]
github-actions[bot] commented on PR #13893: URL: https://github.com/apache/lucene/pull/13893#issuecomment-2439076438 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Optimize slice calculation in IndexSearcher a little [lucene]
github-actions[bot] commented on PR #13860: URL: https://github.com/apache/lucene/pull/13860#issuecomment-2439076472 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Reduce allocations in BKDReaderDocIDSetIterator [lucene]
github-actions[bot] commented on PR #13888: URL: https://github.com/apache/lucene/pull/13888#issuecomment-2439076449 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!