[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r803395995

## File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java ##
@@ -147,6 +165,11 @@ NeighborQueue searchLevel(
           continue;
         }
 
+        numVisited++;
+        if (numVisited > visitedLimit) {
+          throw new CollectionTerminatedException();

Review comment:
This may be an abuse of `CollectionTerminatedException`. Another idea would be to try to pass back the information that the search was terminated early in `TopDocs.TotalHits` (but this also doesn't seem ideal).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
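The alternative raised in the review comment, reporting early termination in the returned result rather than through an exception, could look roughly like the sketch below. This is a toy breadth-first walk, not Lucene's `HnswGraphSearcher`; `SearchResult`, its `incomplete` flag and the `search` signature are assumptions made only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy graph walk that stops after visiting a bounded number of nodes. */
final class VisitLimitSketch {

  /** Hypothetical result type: the visited nodes plus an "incomplete" flag. */
  static final class SearchResult {
    final List<Integer> visited;
    final boolean incomplete; // true if the walk stopped because the limit was hit

    SearchResult(List<Integer> visited, boolean incomplete) {
      this.visited = visited;
      this.incomplete = incomplete;
    }
  }

  /** Walks the graph from entryPoint, visiting at most visitedLimit nodes. */
  static SearchResult search(int[][] neighbors, int entryPoint, int visitedLimit) {
    List<Integer> visited = new ArrayList<>();
    Set<Integer> seen = new HashSet<>();
    Deque<Integer> queue = new ArrayDeque<>();
    queue.add(entryPoint);
    seen.add(entryPoint);
    while (!queue.isEmpty()) {
      if (visited.size() >= visitedLimit) {
        // Work remains but the budget is spent: report it in the result, do not throw.
        return new SearchResult(visited, true);
      }
      int node = queue.poll();
      visited.add(node);
      for (int next : neighbors[node]) {
        if (seen.add(next)) {
          queue.add(next);
        }
      }
    }
    return new SearchResult(visited, false);
  }

  public static void main(String[] args) {
    int[][] graph = {{1, 2}, {3}, {3}, {}};
    SearchResult full = search(graph, 0, 10);
    SearchResult cut = search(graph, 0, 2);
    System.out.println(full.visited + " incomplete=" + full.incomplete);
    System.out.println(cut.visited + " incomplete=" + cut.incomplete);
  }
}
```

With a flag in the result, the caller decides what to do with a partial answer, whereas an exception forces every caller to handle the control-flow jump.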
[jira] [Created] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
Feng Guo created LUCENE-10417:

Summary: IntNRQ task performance decreased in nightly benchmark
Key: LUCENE-10417
URL: https://issues.apache.org/jira/browse/LUCENE-10417
Project: Lucene - Core
Issue Type: Bug
Components: core/codecs
Reporter: Feng Guo

Probably related to LUCENE-LUCENE-10315, I'll dig.

--
This message was sent by Atlassian Jira (v8.20.1#820001)

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo updated LUCENE-10417:
Description:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-LUCENE-10315, I'll dig.

was:
Probably related to LUCENE-LUCENE-10315, I'll dig.

> IntNRQ task performance decreased in nightly benchmark
>
> Key: LUCENE-10417
> URL: https://issues.apache.org/jira/browse/LUCENE-10417
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Reporter: Feng Guo
> Priority: Major
>
> Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
> Probably related to LUCENE-LUCENE-10315, I'll dig.
[jira] [Updated] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo updated LUCENE-10417:
Description:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-10315, I'll dig.

was:
Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
Probably related to LUCENE-LUCENE-10315, I'll dig.

> IntNRQ task performance decreased in nightly benchmark
>
> Key: LUCENE-10417
> URL: https://issues.apache.org/jira/browse/LUCENE-10417
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Reporter: Feng Guo
> Priority: Major
>
> Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
> Probably related to LUCENE-10315, I'll dig.
[jira] [Assigned] (LUCENE-10417) IntNRQ task performance decreased in nightly benchmark
[ https://issues.apache.org/jira/browse/LUCENE-10417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Guo reassigned LUCENE-10417:
Assignee: Feng Guo

> IntNRQ task performance decreased in nightly benchmark
>
> Key: LUCENE-10417
> URL: https://issues.apache.org/jira/browse/LUCENE-10417
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/codecs
> Reporter: Feng Guo
> Assignee: Feng Guo
> Priority: Major
>
> Link: https://home.apache.org/~mikemccand/lucenebench/2022.02.07.18.02.48.html
> Probably related to LUCENE-10315, I'll dig.
[GitHub] [lucene] mocobeta commented on a change in pull request #671: Add custom composite action to set up CI environments
mocobeta commented on a change in pull request #671:
URL: https://github.com/apache/lucene/pull/671#discussion_r803409108

## File path: .github/workflows/gradle-precommit.yml ##
@@ -26,12 +26,9 @@ jobs:
     steps:
     - uses: actions/checkout@v2
 
-    - name: Set up JDK
-      uses: actions/setup-java@v2
+    - uses: ./.github/actions/setup-action
       with:
-        distribution: 'adopt-hotspot'
         java-version: ${{ matrix.java }}

Review comment:
I don't think the strategy matrix can be shared across workflows. If the target Java versions have to be hard-coded in workflow files anyway, the shared action in this PR wouldn't be much help for us.
[jira] [Created] (LUCENE-10418) Improve Query rewriting for non-scoring clauses
Adrien Grand created LUCENE-10418:

Summary: Improve Query rewriting for non-scoring clauses
Key: LUCENE-10418
URL: https://issues.apache.org/jira/browse/LUCENE-10418
Project: Lucene - Core
Issue Type: Task
Reporter: Adrien Grand

Query rewriting is occasionally important for performance, e.g. it may allow using an optimized bulk scorer instead of the default bulk scorer like in the example from LUCENE-10412.

One case when we could simplify queries is in the non-scoring case. All layers of query wrappers that only affect scoring like BoostQuery and ConstantScoreQuery can be removed, which might help identify new opportunities for rewriting. For instance, we have several rewrite rules that optimize for MatchAllDocsQuery and would fail to recognize it if it is behind a ConstantScoreQuery or a BoostQuery. Boolean queries can also simplify themselves in the non-scoring case, by changing MUST clauses to FILTER clauses, or removing fully optional SHOULD clauses.
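The simplifications listed in the description can be sketched on a toy query model. None of the classes below are Lucene's; `Boost`, `ConstantScore`, `Bool` and `rewriteNoScore` are hypothetical stand-ins, and the sketch assumes a boolean query with no minimum-should-match requirement, so that SHOULD clauses next to a required clause are fully optional.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy query model (not Lucene's classes) illustrating non-scoring simplification. */
final class NonScoringRewriteSketch {

  enum Occur { MUST, FILTER, SHOULD, MUST_NOT }

  interface Query {}

  static final class MatchAll implements Query {}

  static final class Term implements Query {
    final String text;
    Term(String text) { this.text = text; }
  }

  /** Wrapper that only changes scores. */
  static final class Boost implements Query {
    final Query inner;
    final float boost;
    Boost(Query inner, float boost) { this.inner = inner; this.boost = boost; }
  }

  /** Wrapper that only changes scores. */
  static final class ConstantScore implements Query {
    final Query inner;
    ConstantScore(Query inner) { this.inner = inner; }
  }

  static final class Clause {
    final Occur occur;
    final Query query;
    Clause(Occur occur, Query query) { this.occur = occur; this.query = query; }
  }

  /** Boolean query; this sketch assumes minimumShouldMatch == 0. */
  static final class Bool implements Query {
    final List<Clause> clauses;
    Bool(List<Clause> clauses) { this.clauses = clauses; }
  }

  /** Simplifies a query whose scores will never be read. */
  static Query rewriteNoScore(Query q) {
    if (q instanceof Boost) {
      return rewriteNoScore(((Boost) q).inner); // scoring-only wrapper: unwrap
    }
    if (q instanceof ConstantScore) {
      return rewriteNoScore(((ConstantScore) q).inner); // scoring-only wrapper: unwrap
    }
    if (q instanceof Bool) {
      List<Clause> in = ((Bool) q).clauses;
      boolean hasRequired = false;
      for (Clause c : in) {
        hasRequired |= c.occur == Occur.MUST || c.occur == Occur.FILTER;
      }
      List<Clause> out = new ArrayList<>();
      for (Clause c : in) {
        if (c.occur == Occur.SHOULD && hasRequired) {
          continue; // fully optional: affects scores only, never the match set
        }
        Occur occur = c.occur == Occur.MUST ? Occur.FILTER : c.occur;
        out.add(new Clause(occur, rewriteNoScore(c.query)));
      }
      return new Bool(out);
    }
    return q;
  }

  public static void main(String[] args) {
    Query wrapped = new Boost(new ConstantScore(new MatchAll()), 2.0f);
    System.out.println(rewriteNoScore(wrapped) instanceof MatchAll);
  }
}
```

Once the wrappers are gone, a rule that looks for a bare match-all query can fire, which is the kind of follow-on opportunity the description mentions.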
[jira] [Commented] (LUCENE-10418) Improve Query rewriting for non-scoring clauses
[ https://issues.apache.org/jira/browse/LUCENE-10418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490053#comment-17490053 ]

Adrien Grand commented on LUCENE-10418:

I initially thought of adding a `boolean needsScores` parameter to {{Query#rewrite}} to address this case, but non-scoring optimizations are mostly applicable to ConstantScoreQuery, BoostQuery and BooleanQuery, so I tried an approach that only specializes rewriting for these queries while keeping the existing API.

> Improve Query rewriting for non-scoring clauses
>
> Key: LUCENE-10418
> URL: https://issues.apache.org/jira/browse/LUCENE-10418
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
>
> Query rewriting is occasionally important for performance, e.g. it may allow
> using an optimized bulk scorer instead of the default bulk scorer like in the
> example from LUCENE-10412.
> One case when we could simplify queries is in the non-scoring case. All
> layers of query wrappers that only affect scoring like BoostQuery and
> ConstantScoreQuery can be removed, which might help identify new
> opportunities for rewriting. For instance, we have several rewrite rules that
> optimize for MatchAllDocsQuery and would fail to recognize it if it is behind
> a ConstantScoreQuery or a BoostQuery. Boolean queries can also simplify
> themselves in the non-scoring case, by changing MUST clauses to FILTER
> clauses, or removing fully optional SHOULD clauses.
[GitHub] [lucene] romseygeek commented on a change in pull request #668: LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser
romseygeek commented on a change in pull request #668:
URL: https://github.com/apache/lucene/pull/668#discussion_r803476165

## File path: lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/nodes/intervalfn/FuzzyTerm.java ##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queryparser.flexible.standard.nodes.intervalfn;
+
+import java.util.Locale;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.queries.intervals.Intervals;
+import org.apache.lucene.queries.intervals.IntervalsSource;
+import org.apache.lucene.search.FuzzyQuery;
+
+/**
+ * An interval function equivalent to {@link FuzzyQuery}. A fuzzy term expands to a disjunction of
+ * intervals of terms that are within the specified {@code maxEdits} from the provided term. A limit
+ * of {@code maxExpansions} prevents the internal implementation from blowing up on too many
+ * potential candidate terms.
+ */
+public class FuzzyTerm extends IntervalFunction {
+  private final String term;
+  private final int maxEdits;
+  private final Integer maxExpansions;
+
+  public FuzzyTerm(String term, Integer maxEdits, Integer maxExpansions) {
+    this.term = term;
+    this.maxEdits = maxEdits == null ? FuzzyQuery.defaultMaxEdits : maxEdits;
+    this.maxExpansions = maxExpansions == null ? Intervals.DEFAULT_MAX_EXPANSIONS : maxExpansions;
+  }
+
+  @Override
+  public IntervalsSource toIntervalSource(String field, Analyzer analyzer) {
+    var fuzzyQuery = new FuzzyQuery(new Term(field, term), maxEdits);

Review comment:
A static method on FuzzyQuery fits with what we have elsewhere for PrefixQuery and WildcardQuery, let's do that
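The Javadoc quoted above describes two limits: candidate terms must lie within `maxEdits` of the query term, and at most `maxExpansions` of them are kept. A minimal sketch of that idea follows. It assumes plain Levenshtein distance and a linear scan over an in-memory dictionary, and it simply stops at the first `maxExpansions` matches; it does not use or reproduce `FuzzyQuery` or `Intervals`, and `expand` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy fuzzy-term expansion: edit-distance filter with a cap on the number of expansions. */
final class FuzzyExpansionSketch {

  /** Classic dynamic-programming Levenshtein distance (insert, delete, substitute). */
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Returns dictionary terms within maxEdits of term, keeping at most maxExpansions of them. */
  static List<String> expand(
      String term, List<String> dictionary, int maxEdits, int maxExpansions) {
    List<String> matches = new ArrayList<>();
    for (String candidate : dictionary) {
      if (matches.size() >= maxExpansions) {
        break; // the cap keeps the resulting disjunction small
      }
      if (editDistance(term, candidate) <= maxEdits) {
        matches.add(candidate);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<String> dictionary = List.of("lucene", "lucent", "lucerne", "solr", "lucern");
    System.out.println(expand("lucene", dictionary, 1, 10));
    System.out.println(expand("lucene", dictionary, 1, 2));
  }
}
```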
[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r803485066

## File path: lucene/core/src/java/org/apache/lucene/index/PointValues.java ##
@@ -369,6 +369,52 @@ private void intersect(IntersectVisitor visitor, PointTree pointTree) throws IOE
     }
   }
 
+  /**
+   * Finds the number of points matching the provided range conditions. Using this method is faster
+   * than calling {@link #intersect(IntersectVisitor)} to get the count of intersecting points. This
+   * method does not enforce live documents, therefore it should only be used when there are no
+   * deleted documents.
+   */
+  public final long countPoints(IntersectVisitor visitor) throws IOException {
+    final PointTree pointTree = getPointTree();
+    long countPoints = countPoints(visitor, pointTree);
+    assert pointTree.moveToParent()
+        == false; // just checking to make sure we ended the tree search at the root node
+    return countPoints;
+  }
+
+  private long countPoints(IntersectVisitor visitor, PointTree pointTree) throws IOException {
+    Relation r = visitor.compare(pointTree.getMinPackedValue(), pointTree.getMaxPackedValue());
+    switch (r) {
+      case CELL_OUTSIDE_QUERY:
+        // This cell is fully outside the query shape: return 0 as the count of its nodes
+        return 0;
+      case CELL_INSIDE_QUERY:
+        // This cell is fully inside the query shape: return the size of the entire node as the
+        // count
+        return pointTree.size();
+      case CELL_CROSSES_QUERY:
+        /*
+        The cell crosses the shape boundary, or the cell fully contains the query, so we fall
+        through and do full counting.
+        */
+        if (pointTree.moveToChild()) {
+          int cellCount = 0;
+          do {
+            cellCount += countPoints(visitor, pointTree);
+          } while (pointTree.moveToSibling());
+          pointTree.moveToParent();
+          return cellCount;
+        } else {
+          // we have reached a leaf node here.
+          pointTree.visitDocValues(visitor);
+          return 0; // the visitor has safely recorded the number of leaf nodes that matched
+        }
+      default:
+        throw new IllegalArgumentException("Unreachable code");
+    }
+  }
+

Review comment:
Got it. Makes sense. This implementation is only dealing with query specific loopholes. `PointValues` has nothing to do with these query level optimizations. Fixed in the next commit
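The counting strategy in the quoted diff (skip cells outside the range, take the whole cell size when it is inside, descend only when the cell crosses the boundary) can be illustrated on a toy one-dimensional tree. The `Node` class, its factories and `count` are hypothetical and unrelated to Lucene's `PointTree` API; unlike the quoted code, this sketch returns the leaf matches directly instead of recording them through a visitor, and it accumulates into a `long`.

```java
import java.util.Arrays;
import java.util.List;

/** Toy 1-D tree that counts values in [lo, hi] without visiting fully covered cells. */
final class RangeCountSketch {

  enum Relation { CELL_OUTSIDE_QUERY, CELL_INSIDE_QUERY, CELL_CROSSES_QUERY }

  /** Toy node: a leaf holds sorted values, an inner node holds children. */
  static final class Node {
    final int min;
    final int max;
    final int[] values; // non-null for leaves only
    final List<Node> children; // non-null for inner nodes only

    private Node(int min, int max, int[] values, List<Node> children) {
      this.min = min;
      this.max = max;
      this.values = values;
      this.children = children;
    }

    /** Builds a leaf; the values are assumed to be non-empty and sorted. */
    static Node leaf(int... sortedValues) {
      return new Node(sortedValues[0], sortedValues[sortedValues.length - 1], sortedValues, null);
    }

    static Node inner(Node... children) {
      int min = Integer.MAX_VALUE;
      int max = Integer.MIN_VALUE;
      for (Node c : children) {
        min = Math.min(min, c.min);
        max = Math.max(max, c.max);
      }
      return new Node(min, max, null, Arrays.asList(children));
    }

    long size() {
      if (values != null) {
        return values.length;
      }
      long total = 0;
      for (Node c : children) {
        total += c.size();
      }
      return total;
    }
  }

  static Relation compare(int lo, int hi, int min, int max) {
    if (max < lo || min > hi) {
      return Relation.CELL_OUTSIDE_QUERY;
    }
    if (min >= lo && max <= hi) {
      return Relation.CELL_INSIDE_QUERY;
    }
    return Relation.CELL_CROSSES_QUERY;
  }

  static long count(Node node, int lo, int hi) {
    switch (compare(lo, hi, node.min, node.max)) {
      case CELL_OUTSIDE_QUERY:
        return 0;
      case CELL_INSIDE_QUERY:
        return node.size(); // whole cell matches: no need to look at its values
      default:
        long total = 0;
        if (node.children != null) {
          for (Node child : node.children) {
            total += count(child, lo, hi);
          }
        } else {
          for (int v : node.values) {
            if (v >= lo && v <= hi) {
              total++;
            }
          }
        }
        return total;
    }
  }

  public static void main(String[] args) {
    Node root = Node.inner(Node.leaf(1, 2, 3), Node.leaf(4, 5, 6), Node.leaf(7, 8, 9));
    System.out.println(count(root, 2, 7));
  }
}
```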
[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
gautamworah96 commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r803485794

## File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java ##
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
Done. Thanks for the idea @iverase. Looks much cleaner now (+ removes the inconsistency of adding the leaf node count separately).
[GitHub] [lucene] dweiss commented on a change in pull request #668: LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser
dweiss commented on a change in pull request #668:
URL: https://github.com/apache/lucene/pull/668#discussion_r803527778

## File path: lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/nodes/intervalfn/FuzzyTerm.java ##
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.queryparser.flexible.standard.nodes.intervalfn;
+
+import java.util.Locale;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.queries.intervals.Intervals;
+import org.apache.lucene.queries.intervals.IntervalsSource;
+import org.apache.lucene.search.FuzzyQuery;
+
+/**
+ * An interval function equivalent to {@link FuzzyQuery}. A fuzzy term expands to a disjunction of
+ * intervals of terms that are within the specified {@code maxEdits} from the provided term. A limit
+ * of {@code maxExpansions} prevents the internal implementation from blowing up on too many
+ * potential candidate terms.
+ */
+public class FuzzyTerm extends IntervalFunction {
+  private final String term;
+  private final int maxEdits;
+  private final Integer maxExpansions;
+
+  public FuzzyTerm(String term, Integer maxEdits, Integer maxExpansions) {
+    this.term = term;
+    this.maxEdits = maxEdits == null ? FuzzyQuery.defaultMaxEdits : maxEdits;
+    this.maxExpansions = maxExpansions == null ? Intervals.DEFAULT_MAX_EXPANSIONS : maxExpansions;
+  }
+
+  @Override
+  public IntervalsSource toIntervalSource(String field, Analyzer analyzer) {
+    var fuzzyQuery = new FuzzyQuery(new Term(field, term), maxEdits);

Review comment:
I have already - see commit 86c9756
[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
iverase commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r803536317

## File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java ##
@@ -369,6 +378,100 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          return (int) pointCount(values.getPointTree(), this::relate, this::matches);
+        }
+        return super.count(context);
+      }
+
+      /**
+       * Finds the number of points matching the provided range conditions. Using this method is
+       * faster than calling {@link PointValues#intersect(IntersectVisitor)} to get the count of
+       * intersecting points. This method does not enforce live documents, therefore it should only
+       * be used when there are no deleted documents.
+       *
+       * @param pointTree start node of the count operation
+       * @param nodeComparator comparator to be used for checking whether the internal node is
+       *     inside the range
+       * @param leafComparator comparator to be used for checking whether the leaf node is inside
+       *     the range
+       * @return count of points that match the range
+       */
+      private long pointCount(
+          PointValues.PointTree pointTree,
+          BiFunction nodeComparator,
+          Predicate leafComparator)
+          throws IOException {
+        final int[] matchingLeafNodeCount = {0};
+        // create a custom IntersectVisitor that records the number of leafNodes that matched
+        final IntersectVisitor visitor =
+            new IntersectVisitor() {
+              @Override
+              public void visit(int docID) {
+                // this branch should be unreachable
+                throw new UnsupportedOperationException(
+                    "This IntersectVisitor does not perform any actions on a "
+                        + "docID="
+                        + docID
+                        + " node being visited");
+              }
+
+              @Override
+              public void visit(int docID, byte[] packedValue) {
+                if (leafComparator.test(packedValue)) {
+                  matchingLeafNodeCount[0]++;
+                }
+              }
+
+              @Override
+              public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+                return nodeComparator.apply(minPackedValue, maxPackedValue);
+              }
+            };
+        Relation r =

Review comment:
I think we should move the recursive part into its own method and reuse the IntersectVisitor?
[jira] [Assigned] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned LUCENE-10419:
Assignee: Dawid Weiss

> Identify occasional validateSourcePatterns error on CI servers
>
> Key: LUCENE-10419
> URL: https://issues.apache.org/jira/browse/LUCENE-10419
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
>
> {code}
> What went wrong: Execution failed for task
> ':lucene:analysis:icu:validateSourcePatterns'. > start 1, end 0, length 0
> {code}
>
> This annoys me. It's a message from stringbuilder.substring somewhere - let's
> get the stack of that first and see where the bug is.
[jira] [Created] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
Dawid Weiss created LUCENE-10419:

Summary: Identify occasional validateSourcePatterns error on CI servers
Key: LUCENE-10419
URL: https://issues.apache.org/jira/browse/LUCENE-10419
Project: Lucene - Core
Issue Type: Bug
Reporter: Dawid Weiss

{code}
What went wrong: Execution failed for task ':lucene:analysis:icu:validateSourcePatterns'.
> start 1, end 0, length 0
{code}

This annoys me. It's a message from stringbuilder.substring somewhere - let's get the stack of that first and see where the bug is.
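For readers unfamiliar with that message shape: an out-of-range `StringBuilder.substring(start, end)` call throws `StringIndexOutOfBoundsException`, and the message reports the start, end and length involved. The sketch below only reproduces that kind of failure on an empty builder; it says nothing about where the actual bug in `validateSourcePatterns` lives, and `failureMessage` is a hypothetical helper. The exact wording of the message may differ between JDK versions, so none is asserted here.

```java
/** Reproduces an out-of-range StringBuilder.substring call and captures its message. */
final class SubstringBoundsSketch {

  /** Returns the exception message of sb.substring(start, end), or null if the call succeeds. */
  static String failureMessage(CharSequence text, int start, int end) {
    StringBuilder sb = new StringBuilder(text);
    try {
      sb.substring(start, end);
      return null;
    } catch (StringIndexOutOfBoundsException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // start > end on an empty builder, the same shape as the values in the report
    System.out.println(failureMessage("", 1, 0));
    // a valid range produces no exception, so this prints "null"
    System.out.println(failureMessage("abc", 0, 2));
  }
}
```

Printing the full stack trace rather than only the message, as the issue proposes, is what identifies the call site.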
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490121#comment-17490121 ]

ASF subversion and git services commented on LUCENE-10419:

Commit 1f1da12c89baea3db689135cf4325d231c7025f3 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1f1da12 ]

LUCENE-10419: add debugging code.

> Identify occasional validateSourcePatterns error on CI servers
>
> Key: LUCENE-10419
> URL: https://issues.apache.org/jira/browse/LUCENE-10419
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
>
> {code}
> What went wrong: Execution failed for task
> ':lucene:analysis:icu:validateSourcePatterns'. > start 1, end 0, length 0
> {code}
>
> This annoys me. It's a message from stringbuilder.substring somewhere - let's
> get the stack of that first and see where the bug is.
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490124#comment-17490124 ]

ASF subversion and git services commented on LUCENE-10419:

Commit 9289b94329adcf712c72bb2cbe056c161b7d7188 in lucene's branch refs/heads/branch_9x from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9289b94 ]

LUCENE-10419: add debugging code.

> Identify occasional validateSourcePatterns error on CI servers
>
> Key: LUCENE-10419
> URL: https://issues.apache.org/jira/browse/LUCENE-10419
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
>
> {code}
> What went wrong: Execution failed for task
> ':lucene:analysis:icu:validateSourcePatterns'. > start 1, end 0, length 0
> {code}
>
> This annoys me. It's a message from stringbuilder.substring somewhere - let's
> get the stack of that first and see where the bug is.
[GitHub] [lucene] dweiss merged pull request #668: LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser
dweiss merged pull request #668:
URL: https://github.com/apache/lucene/pull/668
[jira] [Commented] (LUCENE-10414) Add fn:fuzzyTerm interval function to flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-10414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490129#comment-17490129 ]

ASF subversion and git services commented on LUCENE-10414:

Commit f6cebac3337926ca871b922241976a4ba4799c70 in lucene's branch refs/heads/main from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f6cebac ]

LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (#668)

> Add fn:fuzzyTerm interval function to flexible query parser
>
> Key: LUCENE-10414
> URL: https://issues.apache.org/jira/browse/LUCENE-10414
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Searching for "fuzzy" terms within interval expressions is currently
> impossible. The Intervals class does expose the necessary low-level machinery
> to make it happen though.
>
> PR: [https://github.com/apache/lucene/pull/668/files]
[jira] [Commented] (LUCENE-10414) Add fn:fuzzyTerm interval function to flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-10414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490130#comment-17490130 ]

ASF subversion and git services commented on LUCENE-10414:

Commit 9a293da5967ff272529a532106e64baecf28f24c in lucene's branch refs/heads/branch_9x from Dawid Weiss
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9a293da ]

LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (#668)

> Add fn:fuzzyTerm interval function to flexible query parser
>
> Key: LUCENE-10414
> URL: https://issues.apache.org/jira/browse/LUCENE-10414
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Searching for "fuzzy" terms within interval expressions is currently
> impossible. The Intervals class does expose the necessary low-level machinery
> to make it happen though.
>
> PR: [https://github.com/apache/lucene/pull/668/files]
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490161#comment-17490161 ] Praveen Nishchal commented on LUCENE-8739: -- Hi Adrien, Thank you for your feedback! I am a little unclear as to why we should wait for Panama before having a new JNI-based codec. That codec would not be part of the Lucene core; as mentioned, it would be an unofficial codec included under Lucene/codecs. Given the tremendous performance benefits, shouldn’t the customers (users) be allowed to use JNI in their deployments if they choose to? > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Attachments: image-2022-01-11-02-18-11-402.png, > image-2022-01-11-02-18-57-752.png > > Time Spent: 2h 40m > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-10420) Move functional interfaces in IOUtils to top-level interfaces
Tomoko Uchida created LUCENE-10420: -- Summary: Move functional interfaces in IOUtils to top-level interfaces Key: LUCENE-10420 URL: https://issues.apache.org/jira/browse/LUCENE-10420 Project: Lucene - Core Issue Type: Improvement Reporter: Tomoko Uchida Suggested at https://github.com/apache/lucene/pull/643#discussion_r802285404. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490174#comment-17490174 ] Adrien Grand commented on LUCENE-8739: -- My opinion is that there are interesting benefits, but they are not worth the cost of adding an extra dependency on the library that provides the JNI bindings. Sure it performs better on retrieval than BEST_COMPRESSION, but if retrieval is what a user cares most about then BEST_SPEED is an even better option. > ZSTD Compressor support in Lucene > - > > Key: LUCENE-8739 > URL: https://issues.apache.org/jira/browse/LUCENE-8739 > Project: Lucene - Core > Issue Type: New Feature > Components: core/codecs >Reporter: Sean Torres >Priority: Minor > Labels: features > Attachments: image-2022-01-11-02-18-11-402.png, > image-2022-01-11-02-18-57-752.png > > Time Spent: 2h 40m > Remaining Estimate: 0h > > ZStandard has a great speed and compression ratio tradeoff. > ZStandard is open source compression from Facebook. > More about ZSTD > [https://github.com/facebook/zstd] > [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a change in pull request #643: LUCENE-10400: revise binary dictionaries' constructor in kuromoji
mocobeta commented on a change in pull request #643: URL: https://github.com/apache/lucene/pull/643#discussion_r803637475 ## File path: lucene/core/src/java/org/apache/lucene/util/IOUtils.java ## @@ -526,4 +526,17 @@ public static void fsync(Path fileToSync, boolean isDir) throws IOException { public interface IOFunction { R apply(T t) throws IOException; } + + /** + * A resource supplier function that may throw an IOException. + * + * Note that this would open a resource such as a File. Consumers should make sure to close the + * resource (e.g., use try-with-resources) + * + * @see java.util.function.Supplier + */ + @FunctionalInterface Review comment: Hi, could anybody review #673? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490193#comment-17490193 ] Dawid Weiss commented on LUCENE-10419: -- Captured this: {code:java} > Task :lucene:analysis:icu:validateSourcePatterns FAILED java.lang.StringIndexOutOfBoundsException: start 1, end 854, length 0 at java.base/java.lang.AbstractStringBuilder.checkRangeSIOOBE(AbstractStringBuilder.java:1810) at java.base/java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:1070) at java.base/java.lang.StringBuilder.substring(StringBuilder.java:87) at java.base/java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:1022) at java.base/java.lang.StringBuilder.substring(StringBuilder.java:87) at org.apache.rat.analysis.license.FullTextMatchingLicense.match(FullTextMatchingLicense.java:100) at org.apache.rat.analysis.util.HeaderMatcherMultiplexer.match(HeaderMatcherMultiplexer.java:40) at org.apache.rat.analysis.IHeaderMatcher$match$0.call(Unknown Source) at ValidateSourcePatternsTask$_check_closure2$_closure6.doCall(/home/jenkins/workspace/Lucene-main-Linux/gradle/validation/validate-source-patterns.gradle:177) at jdk.internal.reflect.GeneratedMethodAccessor698.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035) at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:38) at org.codehaus.groovy.runtime.callsite.BooleanReturningMethodInvoker.invoke(BooleanReturningMethodInvoker.java:49) at 
org.codehaus.groovy.runtime.callsite.BooleanClosureWrapper.call(BooleanClosureWrapper.java:52) at org.codehaus.groovy.runtime.DefaultGroovyMethods.any(DefaultGroovyMethods.java:2642) at org.codehaus.groovy.runtime.DefaultGroovyMethods.any(DefaultGroovyMethods.java:2674) at org.codehaus.groovy.runtime.dgm$13.invoke(Unknown Source) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:247) at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139) at ValidateSourcePatternsTask$_check_closure2.doCall(/home/jenkins/workspace/Lucene-main-Linux/gradle/validation/validate-source-patterns.gradle:177) at jdk.internal.reflect.GeneratedMethodAccessor699.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107) at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035) at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:38) at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148) at ValidateSourcePatternsTask$_check_closure3.doCall(/home/jenkins/workspace/Lucene-main-Linux/gradle/validation/validate-source-patterns.gradle:186) at jdk.internal.reflect.GeneratedMethodAccessor702.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107) at 
groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323) at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263) at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035) at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:38) at ValidateSourcePatternsTask$_check_closure5.doCall(/home/jenkins/workspace/Lucene-main-Linux/gradle/validation/validate-source-patterns.gradle:244) at jdk.internal.reflect.GeneratedMethodAccessor700.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethod
[GitHub] [lucene] msokolov commented on a change in pull request #673: LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces
msokolov commented on a change in pull request #673: URL: https://github.com/apache/lucene/pull/673#discussion_r803661345 ## File path: lucene/core/src/java/org/apache/lucene/util/IOUtils.java ## @@ -521,22 +523,11 @@ public static void fsync(Path fileToSync, boolean isDir) throws IOException { * A Function that may throw an IOException * * @see java.util.function.Function + * @deprecated was replaced by {@link org.apache.lucene.util.IOFunction}. */ @FunctionalInterface + @Deprecated(forRemoval = true, since = "9.1") public interface IOFunction { R apply(T t) throws IOException; } - - /** - * A resource supplier function that may throw an IOException. - * - * Note that this would open a resource such as a File. Consumers should make sure to close the - * resource (e.g., use try-with-resources) - * - * @see java.util.function.Supplier - */ - @FunctionalInterface - public interface IOSupplier { Review comment: just curious; why are we able to remove this one, while the others are merely deprecated? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10177) Rename VectorValues#dimension to VectorValues#getNumDimensions?
[ https://issues.apache.org/jira/browse/LUCENE-10177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490196#comment-17490196 ] Michael Sokolov commented on LUCENE-10177: -- Heh, I prefer {{dimension()}} and would probably do the rename in the other direction, but I won't block this > Rename VectorValues#dimension to VectorValues#getNumDimensions? > --- > > Key: LUCENE-10177 > URL: https://issues.apache.org/jira/browse/LUCENE-10177 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Priority: Major > > This would make it consistent with PointValues#getNumDimensions. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490212#comment-17490212 ] Uwe Schindler commented on LUCENE-10419: Hi, Wouldn't it be a good idea to pass --stacktrace by default on Jenkins jobs? I can change this. This would have made the debugging code obsolete. > Identify occasional validateSourcePatterns error on CI servers > -- > > Key: LUCENE-10419 > URL: https://issues.apache.org/jira/browse/LUCENE-10419 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > > {code} > What went wrong: Execution failed for task > ':lucene:analysis:icu:validateSourcePatterns'. > start 1, end 0, length 0 > {code} > > This annoys me. It's a message from stringbuilder.substring somewhere - let's > get the stack of that first and see where the bug is. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490214#comment-17490214 ] Uwe Schindler commented on LUCENE-10419: Looks like a bug in Rat. Maybe it found an empty file? > Identify occasional validateSourcePatterns error on CI servers > -- > > Key: LUCENE-10419 > URL: https://issues.apache.org/jira/browse/LUCENE-10419 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > > {code} > What went wrong: Execution failed for task > ':lucene:analysis:icu:validateSourcePatterns'. > start 1, end 0, length 0 > {code} > > This annoys me. It's a message from stringbuilder.substring somewhere - let's > get the stack of that first and see where the bug is. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490218#comment-17490218 ] Uwe Schindler commented on LUCENE-10419: We should log the file path in the catch block. Maybe it tried some binary ICU file or, as said before, an empty one. > Identify occasional validateSourcePatterns error on CI servers > -- > > Key: LUCENE-10419 > URL: https://issues.apache.org/jira/browse/LUCENE-10419 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > > {code} > What went wrong: Execution failed for task > ':lucene:analysis:icu:validateSourcePatterns'. > start 1, end 0, length 0 > {code} > > This annoys me. It's a message from stringbuilder.substring somewhere - let's > get the stack of that first and see where the bug is. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on a change in pull request #630: LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order
mikemccand commented on a change in pull request #630: URL: https://github.com/apache/lucene/pull/630#discussion_r803674222 ## File path: lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java ## @@ -84,6 +99,28 @@ public void execute() throws Exception { } executor.shutdown(); } +List ordered = new ArrayList<>(); +try (IndexReader reader = DirectoryReader.open(output)) { + for (DocumentSelector ds : documentSelectors) { +boolean found = false; +for (LeafReaderContext context : reader.leaves()) { + SegmentReader sr = (SegmentReader) context.reader(); + if (ds.getFilteredLiveDocs(sr).nextSetBit(0) != DocIdSetIterator.NO_MORE_DOCS) { +if (found) { + throw new IllegalStateException( + "A document selector can't match more than 1 rearranged segments"); Review comment: Hmm maybe include some details in the exception message about which doc(s) in which segment(s) were duplicated? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
msokolov commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r803668310 ## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -76,17 +81,23 @@ public KnnVectorQuery(String field, float[] target, int k, Query filter) { @Override public Query rewrite(IndexReader reader) throws IOException { -BitSet[] bitSets = null; +TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()]; +BitSetCollector filterCollector = null; if (filter != null) { + filterCollector = new BitSetCollector(reader.leaves().size()); IndexSearcher indexSearcher = new IndexSearcher(reader); - bitSets = new BitSet[reader.leaves().size()]; - indexSearcher.search(filter, new BitSetCollector(bitSets)); + indexSearcher.search(filter, filterCollector); Review comment: for another day, but I am realizing that we have no opportunity to make use of per-segment concurrency here, as we ordinarily do in `IndexSearcher.search()`. To do so, we'd need to consider some API change though. Perhaps instead of using `rewrite` for this, we could make use of `Query`'s two-phase iteration mode of operation. Just a thought for later - I'll go open an issue elsewhere. 
## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException { return createRewrittenQuery(reader, topK); } - private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter) + private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector) throws IOException { -// If the filter is non-null, then it already handles live docs -if (bitsFilter == null) { - bitsFilter = ctx.reader().getLiveDocs(); + +if (filterCollector == null) { + Bits acceptDocs = ctx.reader().getLiveDocs(); + return ctx.reader() + .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE); +} else { + BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord); + if (filterIterator == null || filterIterator.cost() == 0) { +return NO_RESULTS; + } + + if (filterIterator.cost() <= k) { +// If there <= k possible matches, short-circuit and perform exact search, since HNSW must +// always visit at least k documents +return exactSearch(ctx, target, k, filterIterator); + } + + try { +// The filter iterator already incorporates live docs +Bits acceptDocs = filterIterator.getBitSet(); +int visitedLimit = (int) filterIterator.cost(); +return ctx.reader().searchNearestVectors(field, target, kPerLeaf, acceptDocs, visitedLimit); + } catch ( + @SuppressWarnings("unused") + CollectionTerminatedException e) { +// We stopped the kNN search because it visited too many nodes, so fall back to exact search +return exactSearch(ctx, target, k, filterIterator); + } } + } -TopDocs results = ctx.reader().searchNearestVectors(field, target, kPerLeaf, bitsFilter); -if (results == null) { + private TopDocs exactSearch( + LeafReaderContext context, float[] target, int k, DocIdSetIterator acceptIterator) + throws IOException { +FieldInfo fi = context.reader().getFieldInfos().fieldInfo(field); +if (fi == null || fi.getVectorDimension() == 0) { + // The 
field does not exist or does not index vectors return NO_RESULTS; } -if (ctx.docBase > 0) { - for (ScoreDoc scoreDoc : results.scoreDocs) { -scoreDoc.doc += ctx.docBase; - } + +VectorSimilarityFunction similarityFunction = fi.getVectorSimilarityFunction(); +VectorValues vectorValues = context.reader().getVectorValues(field); + +HitQueue queue = new HitQueue(k, false); Review comment: Did you consider using the pre-populated version? We might be creating and discarding a lot of `ScoreDoc`s here. ## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException { return createRewrittenQuery(reader, topK); } - private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter) + private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector) throws IOException { -// If the filter is non-null, then it already handles live docs -if (bitsFilter == null) { - bitsFilter = ctx.reader().getLiveDocs(); + +if (filterCollector == null) { + Bits acceptDocs = ctx.reader().getLiveDocs(); + return ctx.reader() + .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE); +} else { + BitSetIterator filterIterator = filterCollector.g
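The "pre-populated" idea raised in the review comment above can be illustrated with a minimal, self-contained sketch. It is not Lucene's `HitQueue`: the `Hit` and `PrePopulatedTopK` names are hypothetical, and tie-breaking on document id is ignored. The assumption being illustrated is that a fixed-size min-heap filled with sentinel entries up front lets each competitive candidate overwrite the current minimum in place, so no per-candidate object is allocated.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a scored hit; not Lucene's ScoreDoc. */
final class Hit {
  int doc;
  float score;

  Hit(int doc, float score) {
    this.doc = doc;
    this.score = score;
  }
}

/** Fixed-size min-heap that is filled with sentinel hits at construction time. */
final class PrePopulatedTopK {
  private final Hit[] heap; // heap[0] is always the least competitive entry

  PrePopulatedTopK(int k) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    heap = new Hit[k];
    for (int i = 0; i < k; i++) {
      // Sentinel: doc = -1 and the lowest possible score, so any real hit beats it.
      heap[i] = new Hit(-1, Float.NEGATIVE_INFINITY);
    }
  }

  /** Reuses the current minimum entry if the candidate beats it; allocates nothing. */
  boolean offer(int doc, float score) {
    Hit top = heap[0];
    if (score <= top.score) {
      return false; // not competitive
    }
    top.doc = doc;
    top.score = score;
    siftDown(0);
    return true;
  }

  /** Restores the min-heap property after the root was overwritten. */
  private void siftDown(int i) {
    int n = heap.length;
    while (true) {
      int left = 2 * i + 1;
      int right = left + 1;
      int smallest = i;
      if (left < n && heap[left].score < heap[smallest].score) {
        smallest = left;
      }
      if (right < n && heap[right].score < heap[smallest].score) {
        smallest = right;
      }
      if (smallest == i) {
        return;
      }
      Hit tmp = heap[i];
      heap[i] = heap[smallest];
      heap[smallest] = tmp;
      i = smallest;
    }
  }

  /** Doc ids of the real (non-sentinel) hits, best score first. */
  int[] topDocs() {
    return Arrays.stream(heap)
        .filter(h -> h.doc >= 0)
        .sorted((a, b) -> Float.compare(b.score, a.score))
        .mapToInt(h -> h.doc)
        .toArray();
  }
}
```

With `k = 2`, offering (1, 0.5), (2, 0.9), (3, 0.1) and (4, 0.7) in that order leaves docs 2 and 4 in the queue. The trade-off is that callers must skip leftover sentinels when fewer than `k` candidates were offered.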
[GitHub] [lucene] mocobeta commented on a change in pull request #673: LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces
mocobeta commented on a change in pull request #673: URL: https://github.com/apache/lucene/pull/673#discussion_r803700144 ## File path: lucene/core/src/java/org/apache/lucene/util/IOUtils.java ## @@ -521,22 +523,11 @@ public static void fsync(Path fileToSync, boolean isDir) throws IOException { * A Function that may throw an IOException * * @see java.util.function.Function + * @deprecated was replaced by {@link org.apache.lucene.util.IOFunction}. */ @FunctionalInterface + @Deprecated(forRemoval = true, since = "9.1") public interface IOFunction { R apply(T t) throws IOException; } - - /** - * A resource supplier function that may throw an IOException. - * - * Note that this would open a resource such as a File. Consumers should make sure to close the - * resource (e.g., use try-with-resources) - * - * @see java.util.function.Supplier - */ - @FunctionalInterface - public interface IOSupplier { Review comment: This was added in #643 by me and is still not shipped to the public (I will remove this also from the 9x branch.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
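The deprecate-then-move pattern discussed in this thread can be sketched in a few lines. This is only an illustration under stated assumptions: `IOUtilsSketch` and `IOFunctions.fromLegacy` are hypothetical names, the top-level `IOFunction` here is a stand-in for `org.apache.lucene.util.IOFunction`, and the adapter is not part of the pull request.

```java
import java.io.IOException;

/** Top-level replacement interface (stand-in for the new top-level type). */
@FunctionalInterface
interface IOFunction<T, R> {
  R apply(T t) throws IOException;
}

/** Hypothetical utility class that still carries the old nested interface. */
final class IOUtilsSketch {
  private IOUtilsSketch() {}

  /**
   * Already-released nested interface: it is kept for one more release line and
   * marked for removal, rather than deleted outright.
   *
   * @deprecated replaced by the top-level {@code IOFunction}.
   */
  @Deprecated(forRemoval = true, since = "9.1")
  @FunctionalInterface
  interface IOFunction<T, R> {
    R apply(T t) throws IOException;
  }
}

/** Hypothetical adapter so callers holding the legacy type can migrate gradually. */
final class IOFunctions {
  private IOFunctions() {}

  @SuppressWarnings("removal")
  static <T, R> IOFunction<T, R> fromLegacy(IOUtilsSketch.IOFunction<T, R> legacy) {
    return legacy::apply;
  }
}
```

An interface that was never part of a release, as described for `IOSupplier` above, has no external callers to break, which is why it can be removed directly instead of going through this deprecation step.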
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490270#comment-17490270 ] Dawid Weiss commented on LUCENE-10419: -- I do log the path - see the bottom of that quote. I don't have the time to look into this now - will do it later. Indeed looks like a bug in rat somewhere. > Identify occasional validateSourcePatterns error on CI servers > -- > > Key: LUCENE-10419 > URL: https://issues.apache.org/jira/browse/LUCENE-10419 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > > {code} > What went wrong: Execution failed for task > ':lucene:analysis:icu:validateSourcePatterns'. > start 1, end 0, length 0 > {code} > > This annoys me. It's a message from stringbuilder.substring somewhere - let's > get the stack of that first and see where the bug is. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #673: LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces
mocobeta commented on pull request #673: URL: https://github.com/apache/lucene/pull/673#issuecomment-1034956590 Thanks @msokolov for taking a look. I will keep this open for a day or two, then merge it if there is no disapproval. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on a change in pull request #672: LUCENE-10418: Optimize `Query#rewrite` in the non-scoring case.
msokolov commented on a change in pull request #672: URL: https://github.com/apache/lucene/pull/672#discussion_r803701669 ## File path: lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java ## @@ -191,51 +191,55 @@ boolean isPureDisjunction() { return clauses.iterator(); } - private BooleanQuery rewriteNoScoring() { -boolean keepShould = + BooleanQuery rewriteNoScoring() { +boolean actuallyRewritten = false; +BooleanQuery.Builder newQuery = +new BooleanQuery.Builder().setMinimumNumberShouldMatch(getMinimumNumberShouldMatch()); + +final boolean keepShould = getMinimumNumberShouldMatch() > 0 || (clauseSets.get(Occur.MUST).size() + clauseSets.get(Occur.FILTER).size() == 0); -if (clauseSets.get(Occur.MUST).size() == 0 && keepShould) { - return this; -} -BooleanQuery.Builder newQuery = new BooleanQuery.Builder(); - -newQuery.setMinimumNumberShouldMatch(getMinimumNumberShouldMatch()); for (BooleanClause clause : clauses) { - switch (clause.getOccur()) { -case MUST: - { -newQuery.add(clause.getQuery(), Occur.FILTER); -break; - } -case SHOULD: - { -if (keepShould) { - newQuery.add(clause); -} -break; - } -case FILTER: -case MUST_NOT: -default: - { -newQuery.add(clause); - } + Query query = clause.getQuery(); + Query rewritten = ConstantScoreQuery.rewriteNoScoring(query); + BooleanClause.Occur occur = clause.getOccur(); + if (occur == Occur.SHOULD && keepShould == false) { +// ignore clause +actuallyRewritten = true; + } else if (occur == Occur.MUST) { +// replace MUST clauses with FILTER clauses +newQuery.add(rewritten, Occur.FILTER); +actuallyRewritten = true; + } else if (query != rewritten) { +newQuery.add(rewritten, occur); +actuallyRewritten = true; + } else { +newQuery.add(clause); } } +if (actuallyRewritten == false) { + return this; +} + return newQuery.build(); } @Override public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException { -BooleanQuery query = this; if (scoreMode.needsScores() == false) { - query = 
rewriteNoScoring(); + Query rewritten = rewriteNoScoring(); + if (this != rewritten) { +// Pass it back to IndexSearcher#rewrite, which might find new opportunities for rewriting Review comment: this goes beyond the non-scoring case right? In theory it could result in additional rewrites for scoring queries as well? ## File path: lucene/core/src/java/org/apache/lucene/search/ConstantScoreQuery.java ## @@ -63,6 +65,22 @@ public Query rewrite(IndexReader reader) throws IOException { return super.rewrite(reader); } + /** + * Perform some simplifications that are only legal when a query is not expected to produce + * scores. + */ + static Query rewriteNoScoring(Query query) { Review comment: It might be nice to enable other queries to also be aware of the scoring/nonscoring mode? I think we have other queries that can have child queries like `DisjunctionMaxQuery` maybe positional queries? I mean this is already a step forward - progress! Just wondering if there are other opportunities -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on a change in pull request #672: LUCENE-10418: Optimize `Query#rewrite` in the non-scoring case.
rmuir commented on a change in pull request #672: URL: https://github.com/apache/lucene/pull/672#discussion_r803745240 ## File path: lucene/core/src/java/org/apache/lucene/search/ConstantScoreQuery.java ## @@ -63,6 +65,22 @@ public Query rewrite(IndexReader reader) throws IOException { return super.rewrite(reader); } + /** + * Perform some simplifications that are only legal when a query is not expected to produce + * scores. + */ + static Query rewriteNoScoring(Query query) { Review comment:

I question how much complexity we should add to optimize really degenerate inputs such as `ConstantScoreQuery(DisjunctionMaxQuery())`. I also think it might be better to put such logic here, not in e.g. BooleanQuery. The optimization is specific to CSQ, no? For example for your DisjunctionMaxQuery case:

```java
} else if (query instanceof DisjunctionMaxQuery) {
  // since we don't care about scoring, turn it into a simple booleanquery... does this even make it faster?
  var builder = new BooleanQuery.Builder();
  for (Query subQuery : (DisjunctionMaxQuery) query) {
    builder.add(subQuery, Occur.SHOULD);
  }
  return builder.build();
}
```

It might also make the logic easier to follow for the BooleanQuery case too, especially the recursive piece. I personally think it is a lot better than adding `if needsScores == false` conditional logic everywhere to that already hairy code. If you move it to CSQ, then there's no conditional anymore, and it just seems like a better home.

In general, it's messy either way because I think we make it messy. I hate that we have `Query.rewrite` but here now we have it happening in `Query.createWeight` too. It is also unclear to me if this optimization happens in all the correct places where scores are not needed. This doesn't necessarily mean we need to add more abstractions or API complexity to make it work cleanly.

For example in `IndexSearcher.count`, when it has to fall back to `search()` to do the counting, it doesn't need scores. It can wrap the query in a ConstantScoreQuery to get the optimizations. Probably BooleanQuery could do the same with its `FILTER` clauses? It is just one potential option, to really make this "non-scoring rewrite" case easier to optimize everywhere: wrap it in a ConstantScoreQuery and you get all the optimizations. There are probably other alternatives we can consider too.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta opened a new pull request #674: trivial updates on github actions
mocobeta opened a new pull request #674: URL: https://github.com/apache/lucene/pull/674
- upgrade actions/setup-java to v2 in the hunspell regression workflow (aligned with the main workflow)
- migrate the distribution to 'temurin' ([supported distributions](https://github.com/actions/setup-java#supported-version-syntax))
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #674: trivial updates on github actions
mocobeta commented on pull request #674: URL: https://github.com/apache/lucene/pull/674#issuecomment-1035042297 Do you have a minute to take a look? @dweiss -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #674: trivial updates on github actions
dweiss commented on pull request #674: URL: https://github.com/apache/lucene/pull/674#issuecomment-1035046672 It'd be interesting to randomize those distributions using a custom action, perhaps? I have no experience here whatsoever, but I bet it's possible... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
mayya-sharipova commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r803804328

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ##
@@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException {
     return createRewrittenQuery(reader, topK);
   }

-  private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter)
+  private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector)
       throws IOException {
-    // If the filter is non-null, then it already handles live docs
-    if (bitsFilter == null) {
-      bitsFilter = ctx.reader().getLiveDocs();
+
+    if (filterCollector == null) {
+      Bits acceptDocs = ctx.reader().getLiveDocs();
+      return ctx.reader()
+          .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE);
+    } else {
+      BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord);
+      if (filterIterator == null || filterIterator.cost() == 0) {
+        return NO_RESULTS;
+      }
+
+      if (filterIterator.cost() <= k) {
+        // If there <= k possible matches, short-circuit and perform exact search, since HNSW must
+        // always visit at least k documents
+        return exactSearch(ctx, target, k, filterIterator);
+      }
+
+      try {
+        // The filter iterator already incorporates live docs
+        Bits acceptDocs = filterIterator.getBitSet();
+        int visitedLimit = (int) filterIterator.cost();
+        return ctx.reader().searchNearestVectors(field, target, kPerLeaf, acceptDocs, visitedLimit);
+      } catch (
+          @SuppressWarnings("unused")
+          CollectionTerminatedException e) {

Review comment:
I agree; also, throwing an Exception is an expensive operation in comparison with just returning a value.
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
mayya-sharipova commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r803804328 ## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException { return createRewrittenQuery(reader, topK); } - private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter) + private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector) throws IOException { -// If the filter is non-null, then it already handles live docs -if (bitsFilter == null) { - bitsFilter = ctx.reader().getLiveDocs(); + +if (filterCollector == null) { + Bits acceptDocs = ctx.reader().getLiveDocs(); + return ctx.reader() + .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE); +} else { + BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord); + if (filterIterator == null || filterIterator.cost() == 0) { +return NO_RESULTS; + } + + if (filterIterator.cost() <= k) { +// If there <= k possible matches, short-circuit and perform exact search, since HNSW must +// always visit at least k documents +return exactSearch(ctx, target, k, filterIterator); + } + + try { +// The filter iterator already incorporates live docs +Bits acceptDocs = filterIterator.getBitSet(); +int visitedLimit = (int) filterIterator.cost(); +return ctx.reader().searchNearestVectors(field, target, kPerLeaf, acceptDocs, visitedLimit); + } catch ( + @SuppressWarnings("unused") + CollectionTerminatedException e) { Review comment: I agree and also prefer not to throw an Exception if possible; it is an expensive operation to throw an Exception in comparison with just returning a value. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [lucene] javanna opened a new pull request #675: LUCENE-10385: Avoid SimpleText codec in TestIndexSortSortedNumericDocValuesRangeQuery
javanna opened a new pull request #675: URL: https://github.com/apache/lucene/pull/675 The recently introduced testCount (added with LUCENE-10385) verifies that the Weight#count optimization kicks in. When SimpleText codec is used, `DocValues#unwrapSingleton` returns null which disables the optimization and makes the test fail. Relates to #635 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #675: LUCENE-10385: Avoid SimpleText codec in TestIndexSortSortedNumericDocValuesRangeQuery
jpountz merged pull request #675: URL: https://github.com/apache/lucene/pull/675 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #674: trivial updates on github actions
mocobeta commented on pull request #674:
URL: https://github.com/apache/lucene/pull/674#issuecomment-1035107842

Thank you, I will merge this soon.

> It'd be interesting to randomize those distributions using a custom action, perhaps? I have no experience here whatsoever, but I bet it's possible...

I have never tried writing such a complex action; maybe it could be implemented with a bash script? (I am not sure what level of flexibility the Actions sandbox gives users.)
[GitHub] [lucene] mocobeta merged pull request #674: trivial updates on github actions
mocobeta merged pull request #674: URL: https://github.com/apache/lucene/pull/674 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #674: trivial updates on github actions
dweiss commented on pull request #674: URL: https://github.com/apache/lucene/pull/674#issuecomment-1035112715 I think these actions are javascript, basically. I've never written one myself, so can't help. Don't worry about it, it was just a wild idea - we have lots of randomization elsewhere. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on pull request #671: Add custom composite action to set up CI environments
mocobeta commented on pull request #671:
URL: https://github.com/apache/lucene/pull/671#issuecomment-1035126874

I'm closing this. I think we'd need a more complex custom action, or one written entirely from scratch, to avoid duplicating the JDK set-up across workflows.
[GitHub] [lucene] mocobeta closed pull request #671: Add custom composite action to set up CI environments
mocobeta closed pull request #671: URL: https://github.com/apache/lucene/pull/671 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490365#comment-17490365 ]

Dawid Weiss commented on LUCENE-10419:

I did take a look at rat's source code. It looks like a concurrency bug somewhere, with the StringBuilder containing junk. I can't reproduce the same error locally though, no matter what. Very strange.

I upgraded rat on main to 0.13; I can't see how it's going to help, but who knows, it's better than nothing.

> Identify occasional validateSourcePatterns error on CI servers
>
> Key: LUCENE-10419
> URL: https://issues.apache.org/jira/browse/LUCENE-10419
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
>
> {code}
> What went wrong: Execution failed for task
> ':lucene:analysis:icu:validateSourcePatterns'.
> start 1, end 0, length 0
> {code}
>
> This annoys me. It's a message from stringbuilder.substring somewhere - let's
> get the stack of that first and see where the bug is.
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490372#comment-17490372 ] ASF subversion and git services commented on LUCENE-10419: -- Commit 21c5b42063e7a82339136f4da1041d1d7d3d3c1f in lucene's branch refs/heads/main from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene.git;h=21c5b42 ] LUCENE-10419: upgrade rat to 0.13. > Identify occasional validateSourcePatterns error on CI servers > -- > > Key: LUCENE-10419 > URL: https://issues.apache.org/jira/browse/LUCENE-10419 > Project: Lucene - Core > Issue Type: Bug >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Trivial > > {code} > What went wrong: Execution failed for task > ':lucene:analysis:icu:validateSourcePatterns'. > start 1, end 0, length 0 > {code} > > This annoys me. It's a message from stringbuilder.substring somewhere - let's > get the stack of that first and see where the bug is. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
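For readers unfamiliar with the `start 1, end 0, length 0` text quoted in the issue: the description attributes it to `StringBuilder.substring`. The snippet below is a minimal sketch of that kind of failure, assuming an empty builder and inverted indices. It does not reproduce the code path inside rat, and since the exact message wording differs between JDK releases, it relies only on the exception type.

```java
/** Sketch only: shows the exception type, not the actual failing code inside rat. */
final class SubstringBoundsSketch {

  /** Returns the message of the exception raised by an out-of-range substring call. */
  static String failingSubstring() {
    StringBuilder sb = new StringBuilder(); // empty builder, so its length is 0
    try {
      sb.substring(1, 0); // start beyond the length and greater than end
      return null; // not reached: the call above always throws
    } catch (StringIndexOutOfBoundsException e) {
      return e.getMessage(); // wording depends on the JDK version
    }
  }

  public static void main(String[] args) {
    System.out.println(failingSubstring());
  }
}
```

A builder that is shared between threads without synchronization could end up in such a state, which would be consistent with the concurrency suspicion voiced in the comment, though nothing here confirms it.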
[GitHub] [lucene] jpountz commented on a change in pull request #672: LUCENE-10418: Optimize `Query#rewrite` in the non-scoring case.
jpountz commented on a change in pull request #672: URL: https://github.com/apache/lucene/pull/672#discussion_r803883301 ## File path: lucene/core/src/java/org/apache/lucene/search/ConstantScoreQuery.java ## @@ -63,6 +65,22 @@ public Query rewrite(IndexReader reader) throws IOException { return super.rewrite(reader); } + /** + * Perform some simplifications that are only legal when a query is not expected to produce + * scores. + */ + static Query rewriteNoScoring(Query query) { Review comment: This was my reasoning too, `DisjunctionMaxQuery` suggests scoring matters, so it didn't look worth optimizing for. I tried to improve the PR a bit with your ideas @rmuir. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a change in pull request #672: LUCENE-10418: Optimize `Query#rewrite` in the non-scoring case.
jpountz commented on a change in pull request #672:
URL: https://github.com/apache/lucene/pull/672#discussion_r803887266

## File path: lucene/core/src/java/org/apache/lucene/search/ConstantScoreQuery.java ##
@@ -114,7 +124,19 @@ public long cost() {
   @Override
   public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
       throws IOException {
-    final Weight innerWeight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1f);
+    final ScoreMode innerScoreMode;
+    switch (scoreMode) {
+      case TOP_SCORES:
+        innerScoreMode = ScoreMode.COMPLETE_NO_SCORES;
+        break;
+      case TOP_DOCS_WITH_SCORES:
+        innerScoreMode = ScoreMode.TOP_DOCS;
+        break;
+      default:
+        innerScoreMode = scoreMode;
+        break;
+    }

Review comment:
I had to add this because the additional wrapping in IndexSearcher made a couple of tests fail: ConstantScoreQuery was not propagating the score mode correctly.
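The score-mode mapping from the diff above can be isolated into a small function for experimentation. This is a sketch under assumptions: the enum is a local copy of the `ScoreMode` constant names so that the snippet compiles without Lucene, and `innerScoreMode` is a hypothetical helper name, not a method from the PR.

```java
/** Sketch only: a local enum that mirrors the ScoreMode names appearing in the diff. */
final class InnerScoreModeSketch {

  enum ScoreMode {
    COMPLETE,
    COMPLETE_NO_SCORES,
    TOP_SCORES,
    TOP_DOCS,
    TOP_DOCS_WITH_SCORES
  }

  /** Chooses the mode handed to the wrapped query, whose own scores are never read. */
  static ScoreMode innerScoreMode(ScoreMode outer) {
    switch (outer) {
      case TOP_SCORES:
        // The wrapper assigns a constant score, so the inner query only needs to match.
        return ScoreMode.COMPLETE_NO_SCORES;
      case TOP_DOCS_WITH_SCORES:
        // Keep the top-docs request but drop the demand for scores.
        return ScoreMode.TOP_DOCS;
      default:
        // Modes that already ask for no scores (and COMPLETE) pass through, as in the diff.
        return outer;
    }
  }

  public static void main(String[] args) {
    for (ScoreMode mode : ScoreMode.values()) {
      System.out.println(mode + " -> " + innerScoreMode(mode));
    }
  }
}
```

Compared with always passing `COMPLETE_NO_SCORES`, this keeps `TOP_DOCS` visible to the inner query, which is the propagation problem the comment describes.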
[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r803937272 ## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException { return createRewrittenQuery(reader, topK); } - private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter) + private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector) throws IOException { -// If the filter is non-null, then it already handles live docs -if (bitsFilter == null) { - bitsFilter = ctx.reader().getLiveDocs(); + +if (filterCollector == null) { + Bits acceptDocs = ctx.reader().getLiveDocs(); + return ctx.reader() + .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE); +} else { + BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord); + if (filterIterator == null || filterIterator.cost() == 0) { Review comment: I can add a comment explaining how I'm using the `BitSetIterator` here to capture both the bitset and the (exact) cardinality. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r803950304 ## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException { return createRewrittenQuery(reader, topK); } - private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter) + private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector) throws IOException { -// If the filter is non-null, then it already handles live docs -if (bitsFilter == null) { - bitsFilter = ctx.reader().getLiveDocs(); + +if (filterCollector == null) { + Bits acceptDocs = ctx.reader().getLiveDocs(); + return ctx.reader() + .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE); +} else { + BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord); + if (filterIterator == null || filterIterator.cost() == 0) { +return NO_RESULTS; + } + + if (filterIterator.cost() <= k) { +// If there <= k possible matches, short-circuit and perform exact search, since HNSW must +// always visit at least k documents +return exactSearch(ctx, target, k, filterIterator); + } + + try { +// The filter iterator already incorporates live docs +Bits acceptDocs = filterIterator.getBitSet(); +int visitedLimit = (int) filterIterator.cost(); +return ctx.reader().searchNearestVectors(field, target, kPerLeaf, acceptDocs, visitedLimit); + } catch ( + @SuppressWarnings("unused") + CollectionTerminatedException e) { +// We stopped the kNN search because it visited too many nodes, so fall back to exact search +return exactSearch(ctx, target, k, filterIterator); + } } + } -TopDocs results = ctx.reader().searchNearestVectors(field, target, kPerLeaf, bitsFilter); -if (results == null) { + private TopDocs exactSearch( + LeafReaderContext context, float[] target, int k, DocIdSetIterator acceptIterator) + throws IOException { 
+FieldInfo fi = context.reader().getFieldInfos().fieldInfo(field); +if (fi == null || fi.getVectorDimension() == 0) { + // The field does not exist or does not index vectors return NO_RESULTS; } -if (ctx.docBase > 0) { - for (ScoreDoc scoreDoc : results.scoreDocs) { -scoreDoc.doc += ctx.docBase; - } + +VectorSimilarityFunction similarityFunction = fi.getVectorSimilarityFunction(); +VectorValues vectorValues = context.reader().getVectorValues(field); + +HitQueue queue = new HitQueue(k, false); Review comment: Oh this is good to know about, I'll try to switch over. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r803965732

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ##
@@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException {
     return createRewrittenQuery(reader, topK);
   }

-  private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter)
+  private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector)
       throws IOException {
-    // If the filter is non-null, then it already handles live docs
-    if (bitsFilter == null) {
-      bitsFilter = ctx.reader().getLiveDocs();
+
+    if (filterCollector == null) {
+      Bits acceptDocs = ctx.reader().getLiveDocs();
+      return ctx.reader()
+          .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE);
+    } else {
+      BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord);
+      if (filterIterator == null || filterIterator.cost() == 0) {
+        return NO_RESULTS;
+      }
+
+      if (filterIterator.cost() <= k) {
+        // If there <= k possible matches, short-circuit and perform exact search, since HNSW must
+        // always visit at least k documents
+        return exactSearch(ctx, target, k, filterIterator);
+      }
+
+      try {
+        // The filter iterator already incorporates live docs
+        Bits acceptDocs = filterIterator.getBitSet();
+        int visitedLimit = (int) filterIterator.cost();
+        return ctx.reader().searchNearestVectors(field, target, kPerLeaf, acceptDocs, visitedLimit);
+      } catch (
+          @SuppressWarnings("unused")
+          CollectionTerminatedException e) {

Review comment:
I agree, it's nice to avoid using exceptions for normal control flow. I'm not too concerned from a performance perspective though, exceptions aren't thrown in a "hot loop" and I didn't see a perf hit in testing.

If we go the route of using `TopDocs`, I'd prefer to avoid 'null' since that's a bit overloaded (indicates the field is missing or does not have vectors). Brainstorming ideas:
* Just return `EMPTY_TOPDOCS`.
* Still return best score docs and the visited count. But use `EQUAL_TO` for `TotalHits.Relation` if the search completed normally, otherwise use `GREATER_THAN_OR_EQUAL_TO`.
* Use a special subtype of `TopDocs` instead, which has an explicit "complete" flag?
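The last idea in the list above, a result type with an explicit "complete" flag, can be sketched as follows. Everything here is hypothetical and deliberately simplified: `Result`, `limitedSearch`, and `exactSearch` are illustration-only names, documents are plain ints with no scoring, and the "graph search" is a linear walk. The point is only the control flow, where the caller checks a flag rather than catching `CollectionTerminatedException`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Sketch only: hypothetical types, not Lucene's TopDocs or its HNSW searcher. */
final class EarlyTerminationSketch {

  /** Search result that carries an explicit "complete" flag. */
  static final class Result {
    final List<Integer> docs;
    final boolean complete;

    Result(List<Integer> docs, boolean complete) {
      this.docs = docs;
      this.complete = complete;
    }
  }

  /** Stand-in for the approximate search: walks docs in visitOrder until the budget runs out. */
  static Result limitedSearch(int[] visitOrder, BitSet accept, int k, int visitedLimit) {
    List<Integer> hits = new ArrayList<>();
    int visited = 0;
    for (int doc : visitOrder) {
      if (hits.size() == k) {
        break; // collected enough accepted docs
      }
      if (visited >= visitedLimit) {
        return new Result(hits, false); // budget exhausted: report it instead of throwing
      }
      visited++;
      if (accept.get(doc)) {
        hits.add(doc);
      }
    }
    return new Result(hits, true);
  }

  /** Stand-in for the exact search: iterates only over accepted docs. */
  static List<Integer> exactSearch(BitSet accept, int k) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = accept.nextSetBit(0);
        doc >= 0 && hits.size() < k;
        doc = accept.nextSetBit(doc + 1)) {
      hits.add(doc);
    }
    return hits;
  }

  /** Caller-side fallback driven by the flag rather than by a catch block. */
  static List<Integer> search(int[] visitOrder, BitSet accept, int k, int visitedLimit) {
    Result result = limitedSearch(visitOrder, accept, k, visitedLimit);
    return result.complete ? result.docs : exactSearch(accept, k);
  }

  public static void main(String[] args) {
    BitSet accept = new BitSet();
    accept.set(4);
    accept.set(5);
    int[] visitOrder = {0, 1, 2, 3, 4, 5};
    // With a budget of 3 the limited search gives up early and the exact path answers.
    System.out.println(search(visitOrder, accept, 2, 3));
  }
}
```

Unlike returning `null` or an empty result, the flag keeps "terminated early" distinct from "field missing" and from "no matches", which is the overloading concern raised in the comment.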
[GitHub] [lucene] jtibshirani commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
jtibshirani commented on a change in pull request #656: URL: https://github.com/apache/lucene/pull/656#discussion_r804009517 ## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ## @@ -70,18 +118,104 @@ public Query rewrite(IndexReader reader) throws IOException { return createRewrittenQuery(reader, topK); } - private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf) throws IOException { -Bits liveDocs = ctx.reader().getLiveDocs(); -TopDocs results = ctx.reader().searchNearestVectors(field, target, kPerLeaf, liveDocs); -if (results == null) { + private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector) + throws IOException { + +if (filterCollector == null) { + Bits acceptDocs = ctx.reader().getLiveDocs(); + return ctx.reader() + .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE); +} else { + BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord); + if (filterIterator == null || filterIterator.cost() == 0) { +return NO_RESULTS; + } + + if (filterIterator.cost() <= k) { +// If there <= k possible matches, short-circuit and perform exact search, since HNSW must +// always visit at least k documents +return exactSearch(ctx, target, k, filterIterator); + } + + try { +// The filter iterator already incorporates live docs +Bits acceptDocs = filterIterator.getBitSet(); +int visitedLimit = (int) filterIterator.cost(); +return ctx.reader().searchNearestVectors(field, target, kPerLeaf, acceptDocs, visitedLimit); + } catch ( + @SuppressWarnings("unused") + CollectionTerminatedException e) { +// We stopped the kNN search because it visited too many nodes, so fall back to exact search +return exactSearch(ctx, target, k, filterIterator); + } +} + } + + private TopDocs exactSearch( + LeafReaderContext context, float[] target, int k, DocIdSetIterator acceptIterator) + throws IOException { +FieldInfo fi = context.reader().getFieldInfos().fieldInfo(field); +if (fi == null 
|| fi.getVectorDimension() == 0) { + // The field does not exist or does not index vectors return NO_RESULTS; } -if (ctx.docBase > 0) { - for (ScoreDoc scoreDoc : results.scoreDocs) { -scoreDoc.doc += ctx.docBase; + +VectorSimilarityFunction similarityFunction = fi.getVectorSimilarityFunction(); +VectorValues vectorValues = context.reader().getVectorValues(field); + +HitQueue queue = new HitQueue(k, false); +DocIdSetIterator iterator = +ConjunctionUtils.intersectIterators(List.of(acceptIterator, vectorValues)); Review comment: I just noticed: maybe we should move this intersection earlier to when we execute the filter into a bitset. The way we do it now, our assessment of the filter selectivity is inaccurate when docs are missing vectors. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
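The selectivity concern in this comment can be illustrated with plain `java.util.BitSet` values. This is a sketch under assumptions, not the PR's code: `effectiveCardinality` is a hypothetical helper, and the two bitsets stand in for "docs matching the filter" and "docs that have a vector".

```java
import java.util.BitSet;

/** Sketch only: plain bitsets standing in for the filter and the vector field. */
final class FilterSelectivitySketch {

  /** Counts the docs that pass the filter and also have a vector. */
  static int effectiveCardinality(BitSet filterMatches, BitSet docsWithVectors) {
    BitSet both = (BitSet) filterMatches.clone(); // leave the caller's bitset untouched
    both.and(docsWithVectors);
    return both.cardinality();
  }

  public static void main(String[] args) {
    BitSet filterMatches = new BitSet();
    filterMatches.set(0, 10); // docs 0..9 match the filter
    BitSet docsWithVectors = new BitSet();
    docsWithVectors.set(0, 3); // only docs 0..2 have a vector

    int k = 5;
    // Looking at the filter alone suggests more than k candidates.
    System.out.println("filter only > k: " + (filterMatches.cardinality() > k));
    // After intersecting, there are at most k candidates, so exact search would be chosen.
    System.out.println(
        "intersected <= k: " + (effectiveCardinality(filterMatches, docsWithVectors) <= k));
  }
}
```

In this example the decision between approximate and exact search flips once the intersection is taken into account, which is why doing it before measuring the cost matters.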
[GitHub] [lucene] mayya-sharipova commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery
mayya-sharipova commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r804028845

## File path: lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java ##
@@ -96,43 +107,98 @@ public Query rewrite(IndexReader reader) throws IOException {
     return createRewrittenQuery(reader, topK);
   }

-  private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, Bits bitsFilter)
+  private TopDocs searchLeaf(LeafReaderContext ctx, int kPerLeaf, BitSetCollector filterCollector)
       throws IOException {
-    // If the filter is non-null, then it already handles live docs
-    if (bitsFilter == null) {
-      bitsFilter = ctx.reader().getLiveDocs();
+
+    if (filterCollector == null) {
+      Bits acceptDocs = ctx.reader().getLiveDocs();
+      return ctx.reader()
+          .searchNearestVectors(field, target, kPerLeaf, acceptDocs, Integer.MAX_VALUE);
+    } else {
+      BitSetIterator filterIterator = filterCollector.getIterator(ctx.ord);
+      if (filterIterator == null || filterIterator.cost() == 0) {
+        return NO_RESULTS;
+      }
+
+      if (filterIterator.cost() <= k) {
+        // If there <= k possible matches, short-circuit and perform exact search, since HNSW must
+        // always visit at least k documents
+        return exactSearch(ctx, target, k, filterIterator);
+      }
+
+      try {
+        // The filter iterator already incorporates live docs
+        Bits acceptDocs = filterIterator.getBitSet();
+        int visitedLimit = (int) filterIterator.cost();
+        return ctx.reader().searchNearestVectors(field, target, kPerLeaf, acceptDocs, visitedLimit);
+      } catch (
+          @SuppressWarnings("unused")
+          CollectionTerminatedException e) {

Review comment:
I very much like the idea of using a special subtype of `TopDocs` that has an explicit "complete" flag.
[GitHub] [lucene] gautamworah96 commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
gautamworah96 commented on a change in pull request #658: URL: https://github.com/apache/lucene/pull/658#discussion_r804116906

## File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java ##

@@ -369,6 +378,100 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
     return scorerSupplier.get(Long.MAX_VALUE);
   }
+  @Override
+  public int count(LeafReaderContext context) throws IOException {
+    LeafReader reader = context.reader();
+
+    PointValues values = reader.getPointValues(field);
+    if (checkValidPointValues(values) == false) {
+      return 0;
+    }
+
+    if (reader.hasDeletions() == false
+        && numDims == 1
+        && values.getDocCount() == values.size()) {
+      // if all documents have at-most one point
+      return (int) pointCount(values.getPointTree(), this::relate, this::matches);
+    }
+    return super.count(context);
+  }
+
+  /**
+   * Finds the number of points matching the provided range conditions. Using this method is
+   * faster than calling {@link PointValues#intersect(IntersectVisitor)} to get the count of
+   * intersecting points. This method does not enforce live documents, therefore it should only
+   * be used when there are no deleted documents.
+   *
+   * @param pointTree start node of the count operation
+   * @param nodeComparator comparator to be used for checking whether the internal node is
+   *     inside the range
+   * @param leafComparator comparator to be used for checking whether the leaf node is inside
+   *     the range
+   * @return count of points that match the range
+   */
+  private long pointCount(
+      PointValues.PointTree pointTree,
+      BiFunction<byte[], byte[], Relation> nodeComparator,
+      Predicate<byte[]> leafComparator)
+      throws IOException {
+    final int[] matchingLeafNodeCount = {0};
+    // create a custom IntersectVisitor that records the number of leafNodes that matched
+    final IntersectVisitor visitor =
+        new IntersectVisitor() {
+          @Override
+          public void visit(int docID) {
+            // this branch should be unreachable
+            throw new UnsupportedOperationException(
+                "This IntersectVisitor does not perform any actions on a "
+                    + "docID="
+                    + docID
+                    + " node being visited");
+          }
+
+          @Override
+          public void visit(int docID, byte[] packedValue) {
+            if (leafComparator.test(packedValue)) {
+              matchingLeafNodeCount[0]++;
+            }
+          }
+
+          @Override
+          public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+            return nodeComparator.apply(minPackedValue, maxPackedValue);
+          }
+        };
+    Relation r =

Review comment: I've implemented a method signature that I thought would be simpler to understand. It restricts all increment/counting operations to the `matchingNodeCount` array. The second `pointCount` function just returns `void`. IMO, the other, slightly more complex approach resulted in a method signature like

```
private long pointCount(
    IntersectVisitor visitor,
    PointValues.PointTree pointTree,
    BiFunction<byte[], byte[], Relation> nodeComparator,
    Predicate<byte[]> leafComparator,
    int[] matchingLeafNodeCount)
```

Here is a [branch](https://github.com/gautamworah96/lucene/commit/fe937df49def4dc3cd512fef6c7d39ef53023fb1) that implements this method signature and adds `matchingLeafNodeCount[0]` to the final count.
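For readers unfamiliar with the `final int[] matchingLeafNodeCount = {0}` idiom in the hunk above: a local variable captured by an anonymous class must be effectively final, so a one-element array is used as a mutable cell. The following is a minimal sketch of that idiom only; `ValueVisitor` and `CounterCapture` are hypothetical names for illustration and are not part of Lucene.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for a visitor callback; this is NOT Lucene's IntersectVisitor.
interface ValueVisitor {
  void visit(byte[] packedValue);
}

final class CounterCapture {
  // Counts the values accepted by leafComparator. The one-element array acts as a
  // mutable cell: the local variable itself stays effectively final, so the
  // anonymous class may capture it and still increment its content.
  static long countMatches(byte[][] values, Predicate<byte[]> leafComparator) {
    final int[] matchingCount = {0};
    ValueVisitor visitor =
        new ValueVisitor() {
          @Override
          public void visit(byte[] packedValue) {
            if (leafComparator.test(packedValue)) {
              matchingCount[0]++;
            }
          }
        };
    for (byte[] value : values) {
      visitor.visit(value);
    }
    return matchingCount[0];
  }
}
```

Keeping every increment inside the visitor, as the comment describes, means the traversal helper itself does not need to return or accumulate anything.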
[GitHub] [lucene] jtibshirani commented on a change in pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
jtibshirani commented on a change in pull request #649: URL: https://github.com/apache/lucene/pull/649#discussion_r804189512

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsWriter.java ##

@@ -138,9 +140,20 @@ public void writeField(FieldInfo fieldInfo, KnnVectorsReader knnVectorsReader)
   long vectorIndexOffset = vectorIndex.getFilePointer();
   // build the graph using the temporary vector data
+  int count = docsWithField.cardinality();
+  int[] docIds = null;
+  if (count < maxDoc) {

Review comment: Although it was a bit fragile, I preferred the previous approach of passing `null` with a clear comment. Now it seems like we're doing (potentially significant?) extra work that will not be used.

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsWriter.java ##

@@ -206,14 +214,19 @@ private void writeMeta(
   meta.writeVLong(vectorIndexOffset);
   meta.writeVLong(vectorIndexLength);
   meta.writeInt(field.getVectorDimension());
-  meta.writeInt(docIds.length);
-  for (int docId : docIds) {
-    // TODO: delta-encode, or write as bitset
-    meta.writeVInt(docId);
+
+  // write docIDs
+  meta.writeInt(count);
+  if (docIds == null) {
+    meta.writeShort((short) -1); // dense marker, each document has a vector value

Review comment: Any reason not to use `writeByte` here?

## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java ##

@@ -372,7 +393,9 @@ int size() {
   implements RandomAccessVectorValues, RandomAccessVectorValuesProducer {
   final int dimension;
+  final int size;

Review comment: Small comment: maybe we can make all of these variables (including the new ones) private.
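The dense-marker idea in the second hunk can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's format: it uses `java.nio.ByteBuffer` rather than Lucene's output classes, the `SPARSE` marker value and the fixed-width doc ID layout are invented for the example, and it writes the marker as a single byte, as the review comment suggests.

```java
import java.nio.ByteBuffer;

final class DocIdEncodingSketch {
  static final byte DENSE = -1; // every document has a vector; no doc IDs are stored
  static final byte SPARSE = 0; // hypothetical marker: explicit doc IDs follow

  // Writes count, then a one-byte marker, then (sparse case only) the doc IDs.
  // Assumes docIds is either null (dense) or has exactly `count` entries.
  static byte[] write(int count, int[] docIds) {
    int idBytes = docIds == null ? 0 : Integer.BYTES * docIds.length;
    ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + 1 + idBytes);
    out.putInt(count);
    if (docIds == null) {
      out.put(DENSE);
    } else {
      out.put(SPARSE);
      for (int docId : docIds) {
        out.putInt(docId);
      }
    }
    return out.array();
  }

  // Reads the doc IDs back; in the dense case ordinal i maps to doc ID i.
  static int[] read(byte[] encoded) {
    ByteBuffer in = ByteBuffer.wrap(encoded);
    int count = in.getInt();
    byte marker = in.get();
    int[] docIds = new int[count];
    for (int i = 0; i < count; i++) {
      docIds[i] = marker == DENSE ? i : in.getInt();
    }
    return docIds;
  }
}
```

In this sketch the dense case costs five bytes regardless of how many documents there are, which is the saving the marker is meant to buy.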
[GitHub] [lucene] jtibshirani commented on pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors
jtibshirani commented on pull request #649: URL: https://github.com/apache/lucene/pull/649#issuecomment-1035648183 Additional motivation for this PR: it could help with performance of exact search (in https://github.com/apache/lucene/pull/656). When all docs have vectors, we can avoid a binary search in `VectorValues#advance`.
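A rough sketch of why the dense case avoids the binary search: when every document has a vector, a vector's ordinal equals its doc ID, so advancing to a target is a bounds check. The class and method names below are hypothetical and only model the lookup; they are not the `VectorValues` API.

```java
import java.util.Arrays;

final class AdvanceSketch {
  // Sparse case: docIds is sorted. Returns the ordinal of the first doc ID that is
  // >= target, or -1 if there is none. Requires a binary search.
  static int sparseOrd(int[] docIds, int target) {
    int idx = Arrays.binarySearch(docIds, target);
    if (idx < 0) {
      idx = -idx - 1; // convert "not found" result to the insertion point
    }
    return idx < docIds.length ? idx : -1;
  }

  // Dense case: doc IDs are 0..size-1, so ordinal == doc ID and no search is needed.
  // Assumes target >= 0.
  static int denseOrd(int size, int target) {
    return target < size ? target : -1;
  }
}
```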
[jira] [Commented] (LUCENE-10176) Remove VectorValues#size()
[ https://issues.apache.org/jira/browse/LUCENE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490670#comment-17490670 ]

spike liu commented on LUCENE-10176:

I would like to work on this.

> Remove VectorValues#size()
> --
>
> Key: LUCENE-10176
> URL: https://issues.apache.org/jira/browse/LUCENE-10176
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Priority: Major
>
> This method doesn't seem to be used anywhere except by
> SimpleTextKnnVectorsReader#search, which uses it in an incorrect way by using
> it as the total number of hits matching a nearest-neighbor search (it is
> incorrect because this number might be higher than the number of vectors
> having a value because of deletes).
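The discrepancy described in the issue can be shown with a toy model. This is only a sketch under assumptions: `docsWithVector` and `liveDocs` are hypothetical bit sets standing in for per-segment state, and neither method is a Lucene API.

```java
import java.util.BitSet;

final class VectorCountSketch {
  // Number of documents that have a vector, including deleted ones.
  static int size(BitSet docsWithVector) {
    return docsWithVector.cardinality();
  }

  // Number of documents that have a vector and are not deleted.
  static int liveVectorCount(BitSet docsWithVector, BitSet liveDocs) {
    BitSet live = (BitSet) docsWithVector.clone();
    live.and(liveDocs);
    return live.cardinality();
  }
}
```

With four documents carrying vectors and one of them deleted, `size` reports 4 while only 3 can ever be returned as hits, which is why using it as a total hit count overstates the result.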
[GitHub] [lucene] spike-liu opened a new pull request #676: Lucene-10176: Remove VectorValues#size()
spike-liu opened a new pull request #676: URL: https://github.com/apache/lucene/pull/676 https://issues.apache.org/jira/browse/LUCENE-10176
[jira] [Commented] (LUCENE-10176) Remove VectorValues#size()
[ https://issues.apache.org/jira/browse/LUCENE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490671#comment-17490671 ]

spike liu commented on LUCENE-10176:

https://github.com/apache/lucene/pull/676

> Remove VectorValues#size()
> --
>
> Key: LUCENE-10176
> URL: https://issues.apache.org/jira/browse/LUCENE-10176
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Priority: Major
>
> This method doesn't seem to be used anywhere except by
> SimpleTextKnnVectorsReader#search, which uses it in an incorrect way by using
> it as the total number of hits matching a nearest-neighbor search (it is
> incorrect because this number might be higher than the number of vectors
> having a value because of deletes).
[GitHub] [lucene] mocobeta commented on pull request #674: trivial updates on github actions
mocobeta commented on pull request #674: URL: https://github.com/apache/lucene/pull/674#issuecomment-1035913042 I am not sure if randomizing the distribution per test run is possible without forking the setup-java action, but I think a matrix test may be easy (if it makes sense to run workflows for multiple distributions on every PR.)
[jira] [Commented] (LUCENE-10419) Identify occasional validateSourcePatterns error on CI servers
[ https://issues.apache.org/jira/browse/LUCENE-10419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490698#comment-17490698 ]

Dawid Weiss commented on LUCENE-10419:

{code:java}
Unhandled exception while validating patterns on file: /home/jenkins/workspace/Lucene-9.x-Linux/lucene/test-framework/src/java/org/apache/lucene/tests/analysis/standard/WordBreakTestUnicode_12_1_0.java
{code}

Different file. This has to be a race condition or a JVM bug somewhere on your machine, Uwe. This doesn't happen anywhere else as far as I remember - only on policeman jenkins. Very strange.

> Identify occasional validateSourcePatterns error on CI servers
> --
>
> Key: LUCENE-10419
> URL: https://issues.apache.org/jira/browse/LUCENE-10419
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Trivial
>
> {code}
> What went wrong: Execution failed for task
> ':lucene:analysis:icu:validateSourcePatterns'.
> start 1, end 0, length 0
> {code}
>
> This annoys me. It's a message from stringbuilder.substring somewhere - let's
> get the stack of that first and see where the bug is.
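For context on where such a message can come from: calling `StringBuilder.substring` with a start index past the end index throws `StringIndexOutOfBoundsException`, and the exception message reports the offending indices. The sketch below only reproduces that kind of failure on an empty builder; the class name is hypothetical, and the exact wording of the message depends on the JDK version, so it is not asserted here.

```java
final class SubstringRepro {
  // Triggers the out-of-range failure on an empty StringBuilder and returns the
  // exception message, or null if (unexpectedly) no exception is thrown.
  static String failureMessage() {
    StringBuilder sb = new StringBuilder(); // length 0
    try {
      sb.substring(1, 0); // start > end and start > length
      return null;
    } catch (StringIndexOutOfBoundsException e) {
      return e.getMessage();
    }
  }
}
```

Since the message alone does not say which call site failed, logging the full stack trace together with the file being validated, as proposed in the issue, is what would narrow it down.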
[GitHub] [lucene] iverase commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery
iverase commented on a change in pull request #658: URL: https://github.com/apache/lucene/pull/658#discussion_r804415672

## File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java ##

@@ -369,6 +378,100 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
     return scorerSupplier.get(Long.MAX_VALUE);
   }
+  @Override
+  public int count(LeafReaderContext context) throws IOException {
+    LeafReader reader = context.reader();
+
+    PointValues values = reader.getPointValues(field);
+    if (checkValidPointValues(values) == false) {
+      return 0;
+    }
+
+    if (reader.hasDeletions() == false
+        && numDims == 1
+        && values.getDocCount() == values.size()) {
+      // if all documents have at-most one point
+      return (int) pointCount(values.getPointTree(), this::relate, this::matches);
+    }
+    return super.count(context);
+  }
+
+  /**
+   * Finds the number of points matching the provided range conditions. Using this method is
+   * faster than calling {@link PointValues#intersect(IntersectVisitor)} to get the count of
+   * intersecting points. This method does not enforce live documents, therefore it should only
+   * be used when there are no deleted documents.
+   *
+   * @param pointTree start node of the count operation
+   * @param nodeComparator comparator to be used for checking whether the internal node is
+   *     inside the range
+   * @param leafComparator comparator to be used for checking whether the leaf node is inside
+   *     the range
+   * @return count of points that match the range
+   */
+  private long pointCount(
+      PointValues.PointTree pointTree,
+      BiFunction<byte[], byte[], Relation> nodeComparator,
+      Predicate<byte[]> leafComparator)
+      throws IOException {
+    final int[] matchingLeafNodeCount = {0};
+    // create a custom IntersectVisitor that records the number of leafNodes that matched
+    final IntersectVisitor visitor =
+        new IntersectVisitor() {
+          @Override
+          public void visit(int docID) {
+            // this branch should be unreachable
+            throw new UnsupportedOperationException(
+                "This IntersectVisitor does not perform any actions on a "
+                    + "docID="
+                    + docID
+                    + " node being visited");
+          }
+
+          @Override
+          public void visit(int docID, byte[] packedValue) {
+            if (leafComparator.test(packedValue)) {
+              matchingLeafNodeCount[0]++;
+            }
+          }
+
+          @Override
+          public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
+            return nodeComparator.apply(minPackedValue, maxPackedValue);
+          }
+        };
+    Relation r =

Review comment: That is correct, but why are you passing the `nodeComparator` and the `leafComparator` here? They are not needed anymore, as they are part of the `IntersectVisitor`.
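The point of the review comment can be illustrated with a toy traversal in which the visitor alone carries the range logic, so the recursive helper needs only the visitor and the current node. Everything here is a hypothetical model over 1D `int` points (`RangeVisitor`, `Node`, `range`); it is not Lucene's `PointTree` or `IntersectVisitor`, and it is not taken from the PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PointCountSketch {
  enum Relation { INSIDE, OUTSIDE, CROSSES }

  // Hypothetical visitor: it alone knows the query range.
  interface RangeVisitor {
    Relation compare(int min, int max);
    boolean matches(int value);
  }

  // Hypothetical tree node: a leaf holds sorted values, an inner node holds children.
  static final class Node {
    final int min;
    final int max;
    final int[] values;
    final List<Node> children;

    Node(int... sortedValues) { // leaf
      this.values = sortedValues;
      this.children = Collections.emptyList();
      this.min = sortedValues[0];
      this.max = sortedValues[sortedValues.length - 1];
    }

    Node(Node left, Node right) { // inner node
      this.values = new int[0];
      this.children = Arrays.asList(left, right);
      this.min = left.min;
      this.max = right.max;
    }

    long size() {
      long n = values.length;
      for (Node child : children) n += child.size();
      return n;
    }
  }

  static RangeVisitor range(int lower, int upper) {
    return new RangeVisitor() {
      @Override
      public Relation compare(int min, int max) {
        if (max < lower || min > upper) return Relation.OUTSIDE;
        if (min >= lower && max <= upper) return Relation.INSIDE;
        return Relation.CROSSES;
      }

      @Override
      public boolean matches(int value) {
        return value >= lower && value <= upper;
      }
    };
  }

  // No comparator parameters: the visitor already encapsulates them.
  static long pointCount(RangeVisitor visitor, Node node) {
    Relation r = visitor.compare(node.min, node.max);
    if (r == Relation.OUTSIDE) return 0; // skip the whole subtree
    if (r == Relation.INSIDE) return node.size(); // count without visiting values
    long count = 0;
    for (int v : node.values) if (visitor.matches(v)) count++;
    for (Node child : node.children) count += pointCount(visitor, child);
    return count;
  }
}
```

In this model, fully contained subtrees contribute their size directly and only cells that cross the range boundary have their values checked one by one.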