[PR] Determinize automata used by IntervalsSource.regexp [lucene]
ChrisHegarty opened a new pull request, #13718: URL: https://github.com/apache/lucene/pull/13718 This commit determinizes internal automata used in the construction of the IntervalsSource created by the `regexp` factory. relates #13715 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] move Operations.sameLanguage/subsetOf to AutomatonTestUtil in test-framework [lucene]
rmuir merged PR #13708: URL: https://github.com/apache/lucene/pull/13708
Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]
rmuir commented on issue #13719: URL: https://github.com/apache/lucene/issues/13719#issuecomment-2331298246 I do this on build side, rather than locally. so you might want to tweak it if you want to ignore "explicitly git-added files" which I think is our use-case. `git status` has a lot of options to do just that?
Re: [PR] Relax Operations.isTotal() to work with a deterministic automaton [lucene]
rmuir merged PR #13707: URL: https://github.com/apache/lucene/pull/13707
Re: [PR] Relax Operations.isTotal() to work with a deterministic automaton [lucene]
mikemccand commented on code in PR #13707: URL: https://github.com/apache/lucene/pull/13707#discussion_r1745429485

## lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java: ##
@@ -857,22 +857,38 @@ public static boolean isEmpty(Automaton a) {
     return true;
   }

-  /** Returns true if the given automaton accepts all strings. The automaton must be minimized. */
+  /** Returns true if the given automaton accepts all strings. */
   public static boolean isTotal(Automaton a) {
     return isTotal(a, Character.MIN_CODE_POINT, Character.MAX_CODE_POINT);
   }

   /**
    * Returns true if the given automaton accepts all strings for the specified min/max range of the
-   * alphabet. The automaton must be minimized.
+   * alphabet.
    */
   public static boolean isTotal(Automaton a, int minAlphabet, int maxAlphabet) {
-    if (a.isAccept(0) && a.getNumTransitions(0) == 1) {
-      Transition t = new Transition();
-      a.getTransition(0, 0, t);
-      return t.dest == 0 && t.min == minAlphabet && t.max == maxAlphabet;
+    BitSet states = getLiveStates(a);
+    Transition spare = new Transition();
+    int seenStates = 0;
+    for (int state = states.nextSetBit(0); state >= 0; state = states.nextSetBit(state + 1)) {
+      // all reachable states must be accept states
+      if (a.isAccept(state) == false) return false;

Review Comment:
It can return a false `false`! (When the automaton is non-deterministic).
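To make the "false `false`" concrete, here is a minimal sketch on a toy automaton over the one-letter alphabet {a}. `ToyNfa` and its methods are hypothetical stand-ins written for this illustration; they are not Lucene's `Automaton` or `Operations` API, and the naive check walks reachable states rather than calling `getLiveStates`. In the example, state 1 is reachable, can get back to an accept state, and is not itself accepting, yet every string is still accepted through the self-loop on state 0.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Toy automaton over the alphabet {a}. Hypothetical; NOT Lucene's Automaton class. */
public class ToyNfa {
  private final boolean[] accept; // accept[s] is true when state s is accepting
  private final int[][] next;     // next[s] lists the states reachable from s on 'a'

  public ToyNfa(boolean[] accept, int[][] next) {
    this.accept = accept;
    this.next = next;
  }

  /** Naive check: every state reachable from state 0 must be an accept state. */
  public boolean allReachableStatesAccept() {
    Set<Integer> seen = new HashSet<>();
    Deque<Integer> todo = new ArrayDeque<>();
    seen.add(0);
    todo.push(0);
    while (!todo.isEmpty()) {
      int s = todo.pop();
      if (!accept[s]) return false;
      for (int t : next[s]) {
        if (seen.add(t)) todo.push(t);
      }
    }
    return true;
  }

  /** Exact check by subset construction: every reachable state SET must contain an accept state. */
  public boolean acceptsAllStrings() {
    Set<Set<Integer>> seen = new HashSet<>();
    Set<Integer> current = new TreeSet<>();
    current.add(0);
    while (seen.add(current)) {
      boolean anyAccept = false;
      Set<Integer> successor = new TreeSet<>();
      for (int s : current) {
        anyAccept |= accept[s];
        for (int t : next[s]) successor.add(t);
      }
      if (!anyAccept) return false; // the string read so far is rejected
      current = successor;          // with one letter there is exactly one successor set
    }
    return true;
  }

  public static void main(String[] args) {
    // State 0: start, accepting, a -> {0, 1}. State 1: not accepting, a -> {0}.
    ToyNfa nfa = new ToyNfa(new boolean[] {true, false}, new int[][] {{0, 1}, {0}});
    System.out.println(nfa.allReachableStatesAccept()); // false, although the language is total
    System.out.println(nfa.acceptsAllStrings());        // true
  }
}
```

On a deterministic automaton the two checks agree in this toy setting, which is consistent with the PR title restricting `isTotal()` to deterministic input.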
Re: [I] Reproducible test failure in TestTaxonomyFacetAssociations.testFloatSumAssociation -- ULP float issue? [lucene]
stefanvodita commented on issue #13720: URL: https://github.com/apache/lucene/issues/13720#issuecomment-2331754886

It's one of those float-summation-is-not-associative errors: the result depends on the order in which the values are added.

First ordering:
```
1> 0.0 + 575310.1 = 575310.1
1> 575310.1 + 701147.2 = 1276457.2
1> 1276457.2 + 555620.8 = 1832078.0
```
Second ordering:
```
1> 0.0 + 575310.1 = 575310.1
1> 575310.1 + 555620.8 = 1130931.0
1> 1130931.0 + 701147.2 = 1832078.2
```
The results are compared with an epsilon of 0.2. The short-term solution can be to increase that even more. The long-term solution can be #13011.
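The two orderings can be reproduced with plain `float` arithmetic. This is a small standalone sketch, not the test's code; the class and method names are made up for illustration, and only the three values come from the comment above.

```java
public class FloatSumOrder {
  /** Sums the values left to right in float32, as a simple accumulator would. */
  public static float sumInOrder(float... values) {
    float sum = 0.0f;
    for (float v : values) {
      sum += v;
    }
    return sum;
  }

  public static void main(String[] args) {
    float first = sumInOrder(575310.1f, 701147.2f, 555620.8f);
    float second = sumInOrder(575310.1f, 555620.8f, 701147.2f);
    System.out.println(first + " vs " + second);
    // The gap between the two sums, and the spacing of floats at this magnitude.
    System.out.println("difference = " + (second - first));
    System.out.println("ulp = " + Math.ulp(first));
  }
}
```

The two sums differ by 0.25, which is two float32 ULPs (0.125 each) at this magnitude and therefore larger than the 0.2 epsilon used by the test.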
[PR] Follow-up to GH#13702 [lucene]
gsmiller opened a new pull request, #13722: URL: https://github.com/apache/lucene/pull/13722 Ensures we retain pre-existing (but strange) inconsistency in DrillSideways#search(DrillDownQuery, Collector). This is a deprecated method so I propose we retain this inconsistency since the method will go away with 10.0.
Re: [PR] Add dynamic range facets [lucene]
mikemccand commented on code in PR #13689: URL: https://github.com/apache/lucene/pull/13689#discussion_r1745595485

## lucene/demo/src/java/org/apache/lucene/demo/facet/package-info.java: ##
@@ -385,6 +385,12 @@
  * Sampling support is implemented in {@link
  * org.apache.lucene.facet.RandomSamplingFacetsCollector}.
  *
+ * Dynamic Range Facets

Review Comment:
Thank you for updating package javadocs!

## lucene/CHANGES.txt: ##
@@ -303,6 +303,9 @@ New Features

 * GITHUB#13678: Add support JDK 23 to the Panama Vectorization Provider. (Chris Hegarty)

+* GITHUB#13689: Dynamic range facets - create weighted ranges over numeric fields with counts per range.

Review Comment:
Maybe make this a bit more verbose? E.g. something like:
```
Add a new faceting feature, dynamic range facets, which automatically picks a balanced set of numeric ranges based on the distribution of values that occur across all hits. For use cases that have a highly variable numeric doc values field, such as "price" in an e-commerce application, this facet method is powerful as it allows the presented ranges to adapt depending on what hits the query actually matches. This is in contrast to existing range faceting that requires the application to provide the specific fixed ranges up front.
```

## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ##
@@ -0,0 +1,276 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.range;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import org.apache.lucene.facet.FacetsCollector;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.LongValues;
+import org.apache.lucene.search.LongValuesSource;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.InPlaceMergeSorter;
+
+/**
+ * Methods to create dynamic ranges for numeric fields.
+ *
+ * @lucene.experimental
+ */
+public final class DynamicRangeUtil {
+
+  private DynamicRangeUtil() {}
+
+  /**
+   * Construct dynamic ranges using the specified weight field to generate equi-weight range for the
+   * specified numeric bin field
+   *
+   * @param weightFieldName Name of the specified weight field
+   * @param weightValueSource Value source of the weight field
+   * @param fieldValueSource Value source of the value field
+   * @param facetsCollector FacetsCollector
+   * @param topN Number of requested ranges
+   * @param exec An executor service that is used to do the computation
+   * @return A list of DynamicRangeInfo that contains count, relevance, min, max, and centroid for
+   *     each range
+   */
+  public static List computeDynamicRanges(
+      String weightFieldName,
+      LongValuesSource weightValueSource,
+      LongValuesSource fieldValueSource,
+      FacetsCollector facetsCollector,
+      int topN,
+      ExecutorService exec)
+      throws IOException {
+
+    List matchingDocsList = facetsCollector.getMatchingDocs();
+    int totalDoc = matchingDocsList.stream().mapToInt(matchingDoc -> matchingDoc.totalHits).sum();
+    long[] values = new long[totalDoc];
+    long[] weights = new long[totalDoc];
+    long totalWeight = 0;
+    int overallLength = 0;
+
+    List> futures = new ArrayList<>();
+    List tasks = new ArrayList<>();
+    for (FacetsCollector.MatchingDocs matchingDocs : matchingDocsList) {
+      if (matchingDocs.totalHits > 0) {
+        SegmentOutput segmentOutput = new SegmentOutput(matchingDocs.totalHits);
+
+        // [1] retrieve values and associated weights concurrently
+        SegmentTask task =
+            new SegmentTask(matchingDocs, fieldValueSource, weightValueSource, segmentOutput);
+        tasks.add(task);
+        futures.add(exec.submit(task));
+      }
+    }
+
+    // [2] wait for all segment runs to finish
+    for (Future future : futures) {
+      try {
+        future.get();
+      } catch (InterruptedException ie) {
+        throw new RuntimeExcept
Re: [I] Reproducible test failure in TestTaxonomyFacetAssociations.testFloatSumAssociation -- ULP float issue? [lucene]
mikemccand commented on issue #13720: URL: https://github.com/apache/lucene/issues/13720#issuecomment-2331878119 I wish the test assert APIs allowed us to express the allowed epsilon in ULPs (1 or 2 or so), not a fixed float. The expected/allowed absolute error varies with how large the float is. In this case 0.25 is 2 ULPs at 1832078.25 for `float32` I think.
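A ULP-based comparison along these lines could look roughly like the sketch below. It is a simplified illustration built on `Math.ulp`, with a hypothetical class and method name; it is not an existing Lucene test API, it does not try to handle infinities specially, and it scales the tolerance by the larger of the two operands' ULPs rather than counting representable floats between them.

```java
public class UlpCompare {
  /**
   * Returns true when a and b differ by at most maxUlps units of least precision,
   * measured at the larger of the two magnitudes. NaN never compares equal.
   */
  public static boolean equalsWithinUlps(float a, float b, int maxUlps) {
    if (Float.isNaN(a) || Float.isNaN(b)) {
      return false;
    }
    // Math.ulp(x) is the distance from |x| to the next larger float.
    float tolerance = maxUlps * Math.max(Math.ulp(a), Math.ulp(b));
    return Math.abs(a - b) <= tolerance;
  }

  public static void main(String[] args) {
    float first = 1832078.0f;
    float second = 1832078.25f;
    System.out.println(Math.abs(first - second) <= 0.2f);    // fixed epsilon from the test
    System.out.println(equalsWithinUlps(first, second, 2));  // tolerance of 2 ULPs
    System.out.println(equalsWithinUlps(first, second, 1));  // tolerance of 1 ULP
  }
}
```

The tolerance grows and shrinks with the magnitude of the operands, which is the property a fixed epsilon lacks.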
Re: [PR] Early exit from Operations#removeDeadStates when an automaton doesn't have dead states. [lucene]
jpountz merged PR #13721: URL: https://github.com/apache/lucene/pull/13721
Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]
jpountz commented on code in PR #13697: URL: https://github.com/apache/lucene/pull/13697#discussion_r1745619666

## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ##
@@ -440,6 +478,101 @@ private String formatScoreExplanation(int matches, int start, int end, ScoreMode
     }
   }

+  private abstract static class BatchAwareLeafCollector extends FilterLeafCollector {
+    public BatchAwareLeafCollector(LeafCollector in) {
+      super(in);
+    }
+
+    public void endBatch() throws IOException {}
+  }
+
+  private static class BlockJoinBulkScorer extends BulkScorer {
+    private final BulkScorer childBulkScorer;
+    private final ScoreMode scoreMode;
+    private final BitSet parents;
+    private final int parentsLength;
+
+    public BlockJoinBulkScorer(BulkScorer childBulkScorer, ScoreMode scoreMode, BitSet parents) {
+      this.childBulkScorer = childBulkScorer;
+      this.scoreMode = scoreMode;
+      this.parents = parents;
+      this.parentsLength = parents.length();
+    }
+
+    @Override
+    public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+        throws IOException {
+      // Subtract one because max is exclusive w.r.t. score but inclusive w.r.t prevSetBit
+      int lastParent = parents.prevSetBit(Math.min(parentsLength, max) - 1);

Review Comment:
I believe it would be legal to call BulkScorer.score with min=max=0, and this would give an out-of-bounds exception. We may need to check for the case when max == 0 explicitly like we do below for `min`?
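The guard being suggested can be sketched in isolation. Everything below is a hypothetical stand-in: a `boolean[]` plays the role of the parents bit set, and `prevSetBit` is written to reject negative indexes, which is assumed here to mirror the behavior the review comment is worried about. It is not the code that ended up in the PR.

```java
public class LastParentGuard {
  /** Stand-in lookup that, like the assumed real bit set, requires a valid index. */
  public static int prevSetBit(boolean[] bits, int index) {
    if (index < 0 || index >= bits.length) {
      throw new IndexOutOfBoundsException("index=" + index);
    }
    for (int i = index; i >= 0; i--) {
      if (bits[i]) {
        return i;
      }
    }
    return -1;
  }

  /** Last parent strictly below max (max is exclusive), or -1 when the range is empty. */
  public static int lastParentBefore(boolean[] parents, int max) {
    int upper = Math.min(parents.length, max);
    if (upper <= 0) {
      return -1; // covers max == 0, where upper - 1 would be an invalid index
    }
    return prevSetBit(parents, upper - 1);
  }

  public static void main(String[] args) {
    boolean[] parents = {false, false, true, false, false, true};
    System.out.println(lastParentBefore(parents, 0)); // guarded case, no exception
    System.out.println(lastParentBefore(parents, 6));
  }
}
```

Returning -1 for the empty range matches the sentinel the quoted code already uses for `prevParent` when `min == 0`.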
Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]
jpountz commented on code in PR #13697: URL: https://github.com/apache/lucene/pull/13697#discussion_r1745752639

## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ##
@@ -440,6 +478,114 @@ private String formatScoreExplanation(int matches, int start, int end, ScoreMode
     }
   }

+  private abstract static class BatchAwareLeafCollector extends FilterLeafCollector {
+    public BatchAwareLeafCollector(LeafCollector in) {
+      super(in);
+    }
+
+    public void endBatch() throws IOException {}
+  }
+
+  private static class BlockJoinBulkScorer extends BulkScorer {
+    private final BulkScorer childBulkScorer;
+    private final ScoreMode scoreMode;
+    private final BitSet parents;
+    private final int parentsLength;
+
+    public BlockJoinBulkScorer(BulkScorer childBulkScorer, ScoreMode scoreMode, BitSet parents) {
+      this.childBulkScorer = childBulkScorer;
+      this.scoreMode = scoreMode;
+      this.parents = parents;
+      this.parentsLength = parents.length();
+    }
+
+    @Override
+    public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+        throws IOException {
+      // Subtract one because max is exclusive w.r.t. score but inclusive w.r.t prevSetBit
+      int lastParent = parents.prevSetBit(Math.min(parentsLength, max) - 1);
+      int prevParent = min == 0 ? -1 : parents.prevSetBit(min - 1);
+      if (lastParent == prevParent) {
+        // No parent docs in this range.
+        // If we've scored the last parent in the bit set, return NO_MORE_DOCS to indicate we are
+        // done scoring.
+        return max >= parentsLength ? NO_MORE_DOCS : max;
+      }
+
+      BatchAwareLeafCollector wrappedCollector = wrapCollector(collector);
+      childBulkScorer.score(wrappedCollector, acceptDocs, prevParent + 1, lastParent + 1);
+      wrappedCollector.endBatch();
+
+      // If we've scored the last parent in the bit set, return NO_MORE_DOCS to indicate we are done
+      // scoring
+      return lastParent + 1 >= parentsLength ? NO_MORE_DOCS : max;
+    }
+
+    @Override
+    public long cost() {
+      return childBulkScorer.cost();
+    }
+
+    private BatchAwareLeafCollector wrapCollector(LeafCollector collector) {
+      return new BatchAwareLeafCollector(collector) {
+        private final Score currentParentScore = new Score(scoreMode);
+        private int currentParent = -1;
+        private Scorable scorer = null;
+
+        @Override
+        public void setScorer(Scorable scorer) throws IOException {
+          assert scorer != null;
+          this.scorer = scorer;
+
+          super.setScorer(
+              new Scorable() {
+                @Override
+                public float score() {
+                  return currentParentScore.score();
+                }
+
+                @Override
+                public void setMinCompetitiveScore(float minScore) throws IOException {
+                  if (scoreMode == ScoreMode.None || scoreMode == ScoreMode.Max) {
+                    scorer.setMinCompetitiveScore(minScore);
+                  }
+                }
+              });

Review Comment:
It looks good to me this way.
Re: [PR] Add Bulk Scorer For ToParentBlockJoinQuery [lucene]
Mikep86 commented on code in PR #13697: URL: https://github.com/apache/lucene/pull/13697#discussion_r1745751201

## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ##
@@ -440,6 +478,114 @@ private String formatScoreExplanation(int matches, int start, int end, ScoreMode
     }
   }

+  private abstract static class BatchAwareLeafCollector extends FilterLeafCollector {
+    public BatchAwareLeafCollector(LeafCollector in) {
+      super(in);
+    }
+
+    public void endBatch() throws IOException {}
+  }
+
+  private static class BlockJoinBulkScorer extends BulkScorer {
+    private final BulkScorer childBulkScorer;
+    private final ScoreMode scoreMode;
+    private final BitSet parents;
+    private final int parentsLength;
+
+    public BlockJoinBulkScorer(BulkScorer childBulkScorer, ScoreMode scoreMode, BitSet parents) {
+      this.childBulkScorer = childBulkScorer;
+      this.scoreMode = scoreMode;
+      this.parents = parents;
+      this.parentsLength = parents.length();
+    }
+
+    @Override
+    public int score(LeafCollector collector, Bits acceptDocs, int min, int max)
+        throws IOException {
+      // Subtract one because max is exclusive w.r.t. score but inclusive w.r.t prevSetBit
+      int lastParent = parents.prevSetBit(Math.min(parentsLength, max) - 1);
+      int prevParent = min == 0 ? -1 : parents.prevSetBit(min - 1);
+      if (lastParent == prevParent) {
+        // No parent docs in this range.
+        // If we've scored the last parent in the bit set, return NO_MORE_DOCS to indicate we are
+        // done scoring.
+        return max >= parentsLength ? NO_MORE_DOCS : max;
+      }
+
+      BatchAwareLeafCollector wrappedCollector = wrapCollector(collector);
+      childBulkScorer.score(wrappedCollector, acceptDocs, prevParent + 1, lastParent + 1);
+      wrappedCollector.endBatch();
+
+      // If we've scored the last parent in the bit set, return NO_MORE_DOCS to indicate we are done
+      // scoring
+      return lastParent + 1 >= parentsLength ? NO_MORE_DOCS : max;
+    }
+
+    @Override
+    public long cost() {
+      return childBulkScorer.cost();
+    }
+
+    private BatchAwareLeafCollector wrapCollector(LeafCollector collector) {
+      return new BatchAwareLeafCollector(collector) {
+        private final Score currentParentScore = new Score(scoreMode);
+        private int currentParent = -1;
+        private Scorable scorer = null;
+
+        @Override
+        public void setScorer(Scorable scorer) throws IOException {
+          assert scorer != null;
+          this.scorer = scorer;
+
+          super.setScorer(
+              new Scorable() {
+                @Override
+                public float score() {
+                  return currentParentScore.score();
+                }
+
+                @Override
+                public void setMinCompetitiveScore(float minScore) throws IOException {
+                  if (scoreMode == ScoreMode.None || scoreMode == ScoreMode.Max) {
+                    scorer.setMinCompetitiveScore(minScore);
+                  }
+                }
+              });

Review Comment:
LMKWYT of this approach. I originally added the child scorer to `currentParentScore`'s constructor and created `currentParentScore` in `setScorer`, but that didn't work because `setScorer` is potentially called multiple times. I then tried creating `currentParentScore` once, on the first call to `setScorer`, and checking that the same child scorer is passed in subsequent calls in an assertion. At least in the tests, this didn't work because the child scorer is wrapped in a different `AssertingScorable` in each call.

This approach seemed like the simplest that avoids the above problems. I could go back to them though if we can assume that subsequent calls to `setScorer` will pass the same child scorer without checking in an assertion.
Re: [I] Reproducible test failure in TestTaxonomyFacetAssociations.testFloatSumAssociation -- ULP float issue? [lucene]
stefanvodita commented on issue #13720: URL: https://github.com/apache/lucene/issues/13720#issuecomment-2332088174 I like the idea of comparing based on ULP! I'll poach some code for [float comparison](https://github.com/apache/commons-numbers/blob/master/commons-numbers-core/src/main/java/org/apache/commons/numbers/core/Precision.java#L224) from Apache Commons Numbers.
Re: [PR] New JMH benchmark method - vdot8s that implement int8 dotProduct in C… [lucene]
mikemccand commented on PR #13572: URL: https://github.com/apache/lucene/pull/13572#issuecomment-2332099352

I'm trying to understand the status of this PR... so far it's a standalone JMH benchy that shows that using [FFM](https://openjdk.org/jeps/454) to invoke our own native C implementation of `dotProduct` on ints (e.g. needed for `int7`, `int4` HNSW scalar quantization in Lucene) that's carefully optimized to use the right SIMD instructions (switching depending on the capabilities of the CPU) shows sizable speedups (up to 10X) over our current "pure java" Panama vector API implementation?

This is a micro-benchmark of just `dotProduct`, so the overall gains to Lucene's HNSW indexing and searching will be less than 10X. But, `dotProduct` is [very much the hotspot of Lucene's HNSW](https://github.com/mikemccand/luceneutil/issues/256#issuecomment-215955) indexing and searching, so the end gains will likely be significant, especially on CPUs that perform poorly today via Panama (e.g. ARM).

To make this actually accessible to users I think we would still need to:

* Add this `nativeDotProduct` somewhere, likely `lucene/misc` module, and get gradle to compile it all
* Make an alternative `NativeFlatVectorScorer` (in `misc`) that uses the `nativeDotProduct`
* Make a `NativeLucene99HnswVectorsFormat` that is just like the default for `core` (`Lucene99HnswVectorsFormat` now), but it invokes `NativeFlatVectorScorer` instead. Hopefully this can be done without too much code duplication.
* Show examples of how one could use Lucene's default `Codec` but swap in this `NativeLucene99HnswVectorsFormat` for KNN fields

I think this all can be done after 10.0 -- there's no particular reason why we need a major release to add this? It's entirely new (optimized) added feature.
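For readers unfamiliar with the operation being accelerated, the sketch below is a plain scalar `int8` dot product. It is only a reference for what `dotProduct` computes; the class name is made up, and it is neither the native C kernel from the PR nor Lucene's Panama implementation.

```java
public class ScalarDotProduct {
  /** Scalar reference: sum of element-wise products of two equal-length byte vectors. */
  public static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector lengths differ: " + a.length + " != " + b.length);
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      // Each byte is widened to int before multiplying, so a single product cannot overflow.
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(dotProduct(new byte[] {1, 2, 3}, new byte[] {4, 5, 6})); // 32
  }
}
```

A SIMD or native implementation computes the same value while processing many lanes per instruction, which is where the reported speedups would come from.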
Re: [PR] Add support for intra-segment search concurrency [lucene]
javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2332114836

Hey all, I have done some benchmarking with two main goals: 1) ensure there are no regressions introduced by the proposed change 2) ensure there is some performance gain when intra-segment is activated, as basic as its support is in this initial proposed step.

I ran `wikimediumall` benchmarks with the default parameters and manually added count queries to the tasks executed. The default search concurrency is automatic, meaning it will create an executor based on the number of CPUs available. The index is not force merged, there are multiple segments.

The first run is main (baseline) against my current branch (my_modified_version):

Task   QPS baseline   StdDev   QPS my_modified_version   StdDev   Pct diff   p-value
CountTerm                     4252.98   (6.3%)    3928.20   (5.2%)   -7.6% (-18% -   4%)  0.000
HighIntervalsOrdered            18.28   (5.7%)      17.80   (5.4%)   -2.7% (-12% -   8%)  0.129
CountOrHighMed                 319.35   (5.3%)     311.55   (4.5%)   -2.4% (-11% -   7%)  0.118
HighTermMonthSort             1225.80   (3.8%)    1200.00   (5.3%)   -2.1% (-10% -   7%)  0.148
LowSpanNear                     17.17   (5.3%)      16.89   (4.4%)   -1.7% (-10% -   8%)  0.280
MedIntervalsOrdered             62.72   (4.4%)      61.69   (5.7%)   -1.6% (-11% -   8%)  0.310
Fuzzy1                          65.29   (6.7%)      64.36   (4.6%)   -1.4% (-11% -  10%)  0.431
MedTermDayTaxoFacets            17.35   (6.9%)      17.11   (7.0%)   -1.4% (-14% -  13%)  0.537
HighTermTitleBDVSort            19.73   (5.3%)      19.50   (4.1%)   -1.2% (-10% -   8%)  0.441
AndHighLow                     954.69   (4.2%)     943.83   (4.1%)   -1.1% ( -9% -   7%)  0.384
BrowseDayOfYearTaxoFacets        3.87   (8.4%)       3.83   (4.9%)   -1.1% (-13% -  13%)  0.603
BrowseDateTaxoFacets             3.83   (8.8%)       3.79   (5.9%)   -1.1% (-14% -  14%)  0.641
AndHighHighDayTaxoFacets        11.99   (4.9%)      11.86   (6.5%)   -1.1% (-11% -  10%)  0.556
LowPhrase                      176.29   (3.3%)     174.85   (3.7%)   -0.8% ( -7% -   6%)  0.460
BrowseRandomLabelTaxoFacets      3.20   (3.9%)       3.18   (4.3%)   -0.8% ( -8% -   7%)  0.547
HighTermTitleSort               97.01   (5.0%)      96.30   (4.6%)   -0.7% ( -9% -   9%)  0.628
HighTerm                       410.73   (5.6%)     407.91   (7.1%)   -0.7% (-12% -  12%)  0.736
HighPhrase                      68.29   (4.3%)      67.87   (4.0%)   -0.6% ( -8% -   7%)  0.641
CountOrHighHigh                 34.41  (25.3%)      34.22  (20.2%)   -0.5% (-36% -  60%)  0.942
OrHighNotHigh                  326.55   (6.4%)     324.92   (5.6%)   -0.5% (-11% -  12%)  0.793
AndHighMed                     232.64   (3.5%)     231.61   (4.8%)   -0.4% ( -8% -   8%)  0.739
PKLookup                       163.46   (8.6%)     162.75   (6.9%)   -0.4% (-14% -  16%)  0.860
OrNotHighLow                  1042.34   (3.9%)    1039.01   (4.2%)   -0.3% ( -8% -   8%)  0.803
HighSloppyPhrase                19.17   (4.4%)      19.12   (5.7%)   -0.3% ( -9% -  10%)  0.855
BrowseMonthTaxoFacets            4.08   (5.0%)       4.07   (7.4%)   -0.2% (-11% -  12%)  0.908
HighSpanNear                    17.99   (5.9%)      18.00   (6.0%)    0.0% (-11% -  12%)  0.984
OrHighMedDayTaxoFacets           2.49   (7.3%)       2.50   (6.7%)    0.2% (-12% -  15%)  0.936
OrNotHighMed                   296.13   (5.0%)     297.20   (5.9%)    0.4% ( -9% -  11%)  0.833
OrHighMed                      350.64   (4.2%)     352.22   (4.6%)    0.5% ( -8% -   9%)  0.748
MedPhrase                       60.88   (3.8%)      61.18   (4.4%)    0.5% ( -7% -   8%)  0.695
CountAndHighMed                272.12   (3.4%)     273.55   (4.8%)    0.5% ( -7% -   9%)  0.691
Respell                         35.73   (4.9%)      35.93   (6.7%)    0.6% (-10% -  12%)  0.763
MedSloppyPhrase                 19.80   (6.6%)      19.92   (7.3%)    0.6% (-12% -  15%)  0.778
LowSloppyPhrase                 17.75   (4.3%)      17.86   (3.7%)    0.6% ( -7% -   9%)  0.622
Prefix3                       1050.17   (3.9%)    1057.01   (4.8%)    0.7% ( -7% -   9%)  0.636
Wildcard                       143.63   (4.1
[PR] Add unit-of-least-precision float comparison [lucene]
stefanvodita opened a new pull request, #13723: URL: https://github.com/apache/lucene/pull/13723 Comparing floats with a fixed epsilon doesn't really work. We add comparison based on unit-of-least-precision (ULP) and use it to fix a failing test. Closes #13720
Re: [PR] Dry up TestScorerPerf [lucene]
javanna merged PR #13712: URL: https://github.com/apache/lucene/pull/13712
[I] Dimensionality reduction in Lucene [lucene]
tanyaroosta opened a new issue, #13727: URL: https://github.com/apache/lucene/issues/13727 ### Description Hi. I am opening a new issue to follow up on a [discussion in this issue](https://github.com/apache/lucene/issues/13403#issuecomment-2132043000) regarding using the segment size to decide if we should do vector quantization or not. There are two things to figure out with this: 1) whether it would be helpful, and 2) how to build and tune it. I would like to start the conversation and get some ideas and feedback from the community. Thanks.
Re: [PR] Add support for intra-segment search concurrency [lucene]
msokolov commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2332479198 Thanks for the testing, @javanna! Indeed it is clear that this change does *something* and could be useful as-is for some query loads. I'm also encouraged by Adrien's comments. Although ideally I would have preferred to see us get to a really good place, I get the PNP argument, and appreciate that we don't want to do everything all at once. Regarding the small regression due to the collector overhead, I wonder if it would be possible to create some kind of coupling between Collector and IndexSearcher such that the Collector can assert that it is to be used with an IndexSearcher that "enables cross-segment concurrency". Then it would be up to the user to select the proper Collector (or collector option) to use with the IndexSearcher they have configured. Or perhaps IndexSearcher could be a Collector factory?
Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]
rmuir commented on issue #13719: URL: https://github.com/apache/lucene/issues/13719#issuecomment-2332541693 @dweiss yes that's the case i hit, it is just the "switching branches from main" use-case and it seems to trip on things such as buildSrc and benchmark-jmh, i end up manually rm -rf'ing them after switching. in my uses of git-status checks, I actually run the git-status in CI after all the build logic runs, to make sure no tasks/tests/etc modify the tree. It is a different concern maybe than what we might be doing with it (I'm not sure). e.g. I'm not trying to detect "you forgot to git-add"
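The CI-side check described in this comment (run `git status` after the build and fail if anything changed) could look roughly like the sketch below. It is a minimal illustration, assuming `git` is on the PATH and using the porcelain v1 output format; the class `WorkingCopyCheck` and its methods are hypothetical names and are not part of Lucene's build.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Lucene's build.
final class WorkingCopyCheck {

  /**
   * Extracts the path part of each `git status --porcelain` (v1) entry.
   * Every entry has the shape "XY <path>": two status characters, a space, then the path.
   * Rename entries keep their "old -> new" text unchanged.
   */
  static List<String> dirtyPaths(String porcelainOutput) {
    return porcelainOutput.lines()
        .filter(line -> line.length() > 3)
        .map(line -> line.substring(3))
        .collect(Collectors.toList());
  }

  /** Runs git in repoRoot and returns everything it reports as modified, staged or untracked. */
  static List<String> check(Path repoRoot) throws IOException, InterruptedException {
    Process process = new ProcessBuilder("git", "status", "--porcelain")
        .directory(repoRoot.toFile())
        .redirectErrorStream(true)
        .start();
    String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
    if (process.waitFor() != 0) {
      throw new IOException("git status failed: " + output);
    }
    return dirtyPaths(output);
  }

  public static void main(String[] args) throws Exception {
    // Intended to run as the last CI step, after all build logic has finished.
    List<String> dirty = check(Path.of("."));
    if (!dirty.isEmpty()) {
      System.err.println("Build modified the working copy: " + dirty);
      System.exit(1);
    }
  }
}
```

Because this reports untracked files as well as modifications, it answers "did the build touch the tree?" rather than "did you forget to git-add?", which matches the distinction drawn in the comment.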
Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]
dweiss commented on issue #13719: URL: https://github.com/apache/lucene/issues/13719#issuecomment-2332544468 I think this logic is flawed:
```
// git ignores any folders which are empty (this includes folders with recursively empty sub-folders).
def untrackedNonEmptyFolders = status.untrackedFolders.findAll { path ->
  File location = file("${rootProject.projectDir}/${path}")
  boolean hasFiles = false
  Files.walkFileTree(location.toPath(), new SimpleFileVisitor<Path>() {
    @Override
    FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
      hasFiles = true
      // Terminate early.
      return FileVisitResult.TERMINATE
    }
  })
  return hasFiles
}
```
we shouldn't do anything about untrackedFolders. Looking at jgit's implementation, I don't even see the reason it's there and I can't remember why I (or somebody else) added it.
Re: [I] simplify checkWorkingCopyClean to make backporting easier? [lucene]
dweiss commented on issue #13719: URL: https://github.com/apache/lucene/issues/13719#issuecomment-2332550998 https://github.com/apache/lucene/pull/13728 ?
Re: [PR] Add support for intra-segment search concurrency [lucene]
javanna commented on PR #13542: URL: https://github.com/apache/lucene/pull/13542#issuecomment-2332571178
> I think we could keep it simple and provide a separate collector manager perhaps that supports intra-segment concurrency for now
Having thought a little more, I am not sure this is a good idea. The first problem I have with it is naming: what would we call this other collector manager? :) We could otherwise add a constructor to the existing collector manager that takes the slices as an argument. Not very clean, but it would allow the manager to go through the slices and determine what collector impl to return once `newCollector` is called later. We could make this new constructor optional, and leave the default constructor untouched for now (when created through it the manager won't support intra-segment concurrency), so we don't break existing usages.
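The constructor-based option floated in this comment can be sketched as follows. This is only an illustration of the idea under stated assumptions: every type here (`Partition`, `Collector`, `Manager`) is a hypothetical stand-in rather than Lucene's API, and "a slice contains a partial segment" is assumed to be the signal that intra-segment concurrency is in play.

```java
import java.util.List;

// Every type below is a hypothetical stand-in for illustration, not Lucene's API.
final class SliceAwareManagerSketch {

  /** One unit of work in a slice: the doc-id range [minDocId, maxDocId) of a segment. */
  static final class Partition {
    final int minDocId;
    final int maxDocId;
    final int segmentMaxDoc;

    Partition(int minDocId, int maxDocId, int segmentMaxDoc) {
      this.minDocId = minDocId;
      this.maxDocId = maxDocId;
      this.segmentMaxDoc = segmentMaxDoc;
    }

    boolean coversWholeSegment() {
      return minDocId == 0 && maxDocId == segmentMaxDoc;
    }
  }

  /** Stand-in collector that only reports which implementation was chosen. */
  interface Collector {
    String kind();
  }

  static final class Manager {
    private final boolean partitioned;

    /** Default constructor: existing behaviour, no intra-segment support. */
    Manager() {
      this.partitioned = false;
    }

    /** Optional constructor: inspects the slices once, up front. */
    Manager(List<List<Partition>> slices) {
      this.partitioned = slices.stream()
          .flatMap(List::stream)
          .anyMatch(p -> !p.coversWholeSegment());
    }

    /** Picks the collector implementation based on what the constructor saw. */
    Collector newCollector() {
      return partitioned ? () -> "partition-aware" : () -> "whole-segment";
    }
  }
}
```

Existing callers that use the no-argument constructor keep getting the whole-segment collector, so the cost of partition support is paid only when the slices actually split a segment.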
Re: [I] Should the static search methods in FacetsCollector take a FacetsCollector as last argument? [lucene]
gsmiller commented on issue #13725: URL: https://github.com/apache/lucene/issues/13725#issuecomment-2332818574 My vote would be to be more restrictive with these signatures and specify `FacetsCollectorManager`, given that these are sugar methods meant to make it easier to do faceting alongside the "main" search. Today's faceting module all works against `FacetsCollector` explicitly in the various APIs, so having a generic `Collector` or `CollectorManager` here seems confusing IMO.
Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]
github-actions[bot] commented on PR #13398: URL: https://github.com/apache/lucene/pull/13398#issuecomment-2332950383 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
[I] Gradle builds slow to start [lucene]
dweiss opened a new issue, #13730: URL: https://github.com/apache/lucene/issues/13730 ### Description This has been mentioned by Mike Sokolov, I think. Gradle builds have become slow to start as we upgraded from version to version. Interestingly, I've come across this hint: https://docs.gradle.org/current/userguide/sharing_build_logic_between_subprojects.html#sec:convention_plugins_vs_cross_configuration which is the exact opposite of what we do. I do have some background in aspect-oriented programming so, to me, this kind of cross-configuration and separation of concerns is a way to clean up the build configuration and separate different parts of it. Well, gradle folks clearly think otherwise. I have not debugged this intensively but when you run even the smallest task with the ```--debug``` option, you'll see a lot of this:
```
2024-09-06T08:31:11.679+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 11: acquired lock on worker lease
2024-09-06T08:31:11.679+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 11: released lock on worker lease
2024-09-06T08:31:11.679+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 10: acquired lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 10: released lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 9: acquired lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 9: released lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 8: acquired lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 8: released lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 2: acquired lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 2: released lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 3: acquired lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 3: released lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 5: acquired lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 5: released lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 4: acquired lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 4: released lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 6: acquired lock on worker lease
2024-09-06T08:31:11.680+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 6: released lock on worker lease
2024-09-06T08:31:11.681+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 7: acquired lock on worker lease
2024-09-06T08:31:11.681+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 7: released lock on worker lease
2024-09-06T08:31:11.681+0200 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationRunner] Build operation 'Cross-configure project :lucene:analysis:opennlp' started
2024-09-06T08:31:11.681+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 6: acquired lock on worker lease
2024-09-06T08:31:11.681+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 6: released lock on worker lease
2024-09-06T08:31:11.681+0200 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationRunner] Completing Build operation 'Cross-configure project :lucene:analysis:opennlp'
2024-09-06T08:31:11.681+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Execution worker Thread 4: acquired lock on worker lease
2024-09-06T08:31:11.681+0200 [DEBUG] [org.gradle.internal.resources.AbstractTrackedReso