[jira] [Commented] (LUCENE-10431) AssertionError in BooleanQuery.hashCode()
[ https://issues.apache.org/jira/browse/LUCENE-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498799#comment-17498799 ]

Alan Woodward commented on LUCENE-10431:

I think the issue is that BooleanQuery is expecting all its subqueries to be immutable, but MultiTermQuery isn't - you can set the rewrite method, which changes the hash. I think ideally we'd make MTQ properly immutable and have the rewrite method as part of the constructor, especially as there are already cases like FuzzyQuery that have specific rewrite methods that shouldn't be externally settable, but that is a pretty big change. A more immediate fix would be to remove the rewrite method from MTQ's hash calculation.

> AssertionError in BooleanQuery.hashCode()
> -----------------------------------------
>
>                 Key: LUCENE-10431
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10431
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.11.1
>            Reporter: Michael Bien
>            Priority: Major
>
> Hello devs,
> the constructor of BooleanQuery can under some circumstances trigger a hash code computation before "clauseSets" is fully filled. Since BooleanClause is using its query field for the hash code too, it can happen that the "wrong" hash code is stored, since adding the clause to the set triggers its hashCode().
> If assertions are enabled the check in BooleanQuery, which recomputes the hash code, will notice it and throw an error.
> exception:
> {code:java}
> java.lang.AssertionError
>     at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:614)
>     at java.base/java.util.Objects.hashCode(Objects.java:103)
>     at java.base/java.util.HashMap$Node.hashCode(HashMap.java:298)
>     at java.base/java.util.AbstractMap.hashCode(AbstractMap.java:527)
>     at org.apache.lucene.search.Multiset.hashCode(Multiset.java:119)
>     at java.base/java.util.EnumMap.entryHashCode(EnumMap.java:717)
>     at java.base/java.util.EnumMap.hashCode(EnumMap.java:709)
>     at java.base/java.util.Arrays.hashCode(Arrays.java:4498)
>     at java.base/java.util.Objects.hash(Objects.java:133)
>     at org.apache.lucene.search.BooleanQuery.computeHashCode(BooleanQuery.java:597)
>     at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:611)
>     at java.base/java.util.HashMap.hash(HashMap.java:340)
>     at java.base/java.util.HashMap.put(HashMap.java:612)
>     at org.apache.lucene.search.Multiset.add(Multiset.java:82)
>     at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:154)
>     at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:42)
>     at org.apache.lucene.search.BooleanQuery$Builder.build(BooleanQuery.java:133)
> {code}
> I noticed this while trying to upgrade the NetBeans maven indexer modules from lucene 5.x to 8.x: https://github.com/apache/netbeans/pull/3558
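A minimal reproduction sketch of the mutability problem described above, written against the Lucene 8.x API (where MultiTermQuery#setRewriteMethod is still public). The field and term names are made up for illustration; this is not code from the issue itself:

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.WildcardQuery;

public class MtqMutabilityDemo {
  public static void main(String[] args) {
    WildcardQuery wildcard = new WildcardQuery(new Term("field", "foo*"));
    BooleanQuery bq = new BooleanQuery.Builder()
        .add(wildcard, BooleanClause.Occur.MUST)
        .build();

    int before = bq.hashCode(); // BooleanQuery caches this value internally

    // Mutating the rewrite method changes WildcardQuery#hashCode(), and therefore the hash
    // of the clause that the BooleanQuery has already cached.
    wildcard.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_REWRITE);

    // With assertions enabled (-ea) this call trips the consistency assert in
    // BooleanQuery#hashCode(), because the cached value no longer matches the recomputed one.
    int after = bq.hashCode();
    System.out.println(before + " vs " + after);
  }
}
{code}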
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r815721410

## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
## @@ -0,0 +1,117 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.lucene.monitor;

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.IOUtils;

class ReadonlyQueryIndex extends QueryIndex {

  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
    if (configuration.getDirectoryProvider() == null) {
      throw new IllegalStateException(
          "You must specify a Directory when configuring a Monitor as read-only.");
    }
    Directory directory = configuration.getDirectoryProvider().get();
    this.queries = new HashMap<>();
    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
    this.decomposer = configuration.getQueryDecomposer();
    this.serializer = configuration.getQuerySerializer();
    this.populateQueryCache(serializer, decomposer);
  }

  @Override
  public void commit(List updates) throws IOException {
    throw new IllegalStateException("Monitor is readOnly cannot commit");
  }

  @Override
  long search(final Query query, QueryCollector matcher) throws IOException {
    QueryBuilder builder = termFilter -> query;
    return search(builder, matcher);
  }

  @Override
  public long search(QueryBuilder queryBuilder, QueryCollector matcher) throws IOException {
    IndexSearcher searcher = null;
    try {
      searcher = manager.acquire();
      return searchInMemory(queryBuilder, matcher, searcher, this.queries);
    } finally {
      if (searcher != null) {
        manager.release(searcher);
      }
    }
  }

  @Override
  public void purgeCache() throws IOException {
    this.populateQueryCache(serializer, decomposer);
    lastPurged = System.nanoTime();
  }

  @Override
  void purgeCache(CachePopulator populator) throws IOException {
    manager.maybeRefresh();
```

Review comment:
I think actually the best solution is to remove the query cache entirely for this impl, which is where you started out - sorry for all the back and forth here. We can have a background thread that calls maybeRefresh() on the manager to keep up with updates, but all the queries will be read directly from the searcher and parsed as they are executed. The in-memory cache works when the Monitor in question is handling updates as well, but trying to do that when you have no idea what the changes are between IndexReaders is going to be nasty.
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r815732492

## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
## @@ -0,0 +1,117 @@

Review comment:
Ok, I think it was the best solution too, I'll work on getting back to that solution. Don't worry, all the back and forth got me to understand everything better!
[jira] [Commented] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498818#comment-17498818 ]

Lu Xugang commented on LUCENE-10442:

Furthermore: could we leverage Weight#count() to get a ConstantScoreScorer when count == reader.maxDoc() in the implementation of Weight#scorerSupplier(LeafReaderContext context)? If indexWeight.count(LeafReaderContext) or dvWeight.count(LeafReaderContext) equals reader.maxDoc(), does that mean the query matches everything in this segment?

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
> -----------------------------------------------------------
>
>                 Key: LUCENE-10442
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10442
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Lu Xugang
>            Priority: Trivial
>             Fix For: 9.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> IndexOrDocValuesQuery is typically useful for range queries. When indexQuery was rewritten to MatchAllDocsQuery and IndexOrDocValuesQuery is not the lead iterator, it is most likely that dvQuery, not indexQuery, will supply the Scorer.
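A hedged sketch of the idea in the comment above (this is not the actual IndexOrDocValuesQuery code; the helper name and the way the parent Weight, score and ScoreMode are passed in are assumptions made for illustration): if either sub-weight already reports that it matches every document in the segment, a match-all ConstantScoreScorer could be returned up front instead of relying on the threshold <= leadCost heuristic.

{code:java}
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.ConstantScoreScorer;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.ScorerSupplier;
import org.apache.lucene.search.Weight;

final class MatchAllShortCircuit {
  private MatchAllShortCircuit() {}

  /**
   * Returns a match-all ScorerSupplier when either sub-weight reports that it matches every
   * document in the segment, or null when the caller should fall back to the usual
   * index-vs-doc-values decision. Sketch only.
   */
  static ScorerSupplier matchAllOrNull(
      Weight parent,
      Weight indexWeight,
      Weight dvWeight,
      ScoreMode scoreMode,
      float score,
      LeafReaderContext context)
      throws IOException {
    final int maxDoc = context.reader().maxDoc();
    if (indexWeight.count(context) != maxDoc && dvWeight.count(context) != maxDoc) {
      return null; // no exact count available, or not every document matches
    }
    return new ScorerSupplier() {
      @Override
      public Scorer get(long leadCost) {
        // every doc in the segment matches, so a constant-score iterator over [0, maxDoc) suffices
        return new ConstantScoreScorer(parent, score, scoreMode, DocIdSetIterator.all(maxDoc));
      }

      @Override
      public long cost() {
        return maxDoc;
      }
    };
  }
}
{code}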
[GitHub] [lucene] codaitya commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
codaitya commented on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-1054179398

Sorry for the delay in getting back to this - I got busy with work and also needed time to study how Lucene does segment merges in more detail.

> Why do we need to exclude small segments from regular merges?

The idea was that since writer threads can flush on their own, the new segments are eligible for regular merges. Regular merges can pick up these small segments, spend a lot of time on them, and they might then be unavailable for fullFlush merges. But I think this step (writer threads flushing on their own) should normally only kick in once the RAMBuffer fills up, which would mean that the resulting segment isn't that small. So I agree small segments need not be excluded from regular merges.

I will update the PR to not override the `findMerges` function and just do the computation of the small-segment merge in `findFullFlushMerges`.
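For readers following along, a hedged sketch of the shape such a policy might take (this is not the PR's code; the class name, the `smallSegmentThresholdMB` parameter and the size check are illustrative assumptions): regular merges are delegated unchanged, and only `findFullFlushMerges` gathers the small segments into a single merge before the commit/getReader point-in-time view is published.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

/** Illustrative sketch only; not the actual merge policy from the PR. */
class SmallSegmentOnFlushMergePolicy extends FilterMergePolicy {

  private final double smallSegmentThresholdMB;

  SmallSegmentOnFlushMergePolicy(MergePolicy in, double smallSegmentThresholdMB) {
    super(in);
    this.smallSegmentThresholdMB = smallSegmentThresholdMB;
  }

  @Override
  public MergeSpecification findFullFlushMerges(
      MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergeContext mergeContext)
      throws IOException {
    List<SegmentCommitInfo> smallSegments = new ArrayList<>();
    for (SegmentCommitInfo sci : segmentInfos) {
      boolean alreadyMerging = mergeContext.getMergingSegments().contains(sci);
      double sizeMB = sci.sizeInBytes() / 1024.0 / 1024.0;
      if (alreadyMerging == false && sizeMB <= smallSegmentThresholdMB) {
        smallSegments.add(sci);
      }
    }
    if (smallSegments.size() < 2) {
      return null; // nothing worth merging before publishing the point-in-time view
    }
    MergeSpecification spec = new MergeSpecification();
    spec.add(new OneMerge(smallSegments));
    return spec;
  }

  // findMerges is intentionally not overridden: regular merges still see the small segments.
}
```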
[GitHub] [lucene] wjp719 commented on pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
wjp719 commented on pull request #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1054244294

@iverase Hi, I moved the bkd binary to IndexSortSortedNumericDocValuesRangeQuery as you suggested; please help review it, thanks.
[GitHub] [lucene] wjp719 edited a comment on pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
wjp719 edited a comment on pull request #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1054244294

@iverase Hi, I moved the bkd binary search to IndexSortSortedNumericDocValuesRangeQuery as you suggested; please help review it, thanks.
[GitHub] [lucene] jpountz commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
jpountz commented on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-1054266698

> I also noticed that in IndexWriter where we call findFullFlushMerges, we only do so for merge triggers GET_READER and COMMIT, but not for trigger FULL_FLUSH, which seems quite confusing. I wonder if we could find a better name for findFullFlushMerges.

I agree the naming makes it a bit confusing. One name that came to mind was `findPointInTimeMerges`, since these two merge triggers map to merges that must run before creating a new point-in-time view of the index, while FULL_FLUSH runs after the new point-in-time view has been created. Clarifying ordering in the `MergeTrigger` javadocs would probably help too.

> given that both findMerges and findFullFlushMerges are both called from the same switch statement, and for different triggers, and the trigger is passed in as an argument -- we could get rid of findFullFlushMerges, always call findMerges, and let the merge policy decide what to do based on the value of trigger. @s1monw WDTY?

FWIW I don't dislike the current approach. I would expect merge policies to generally ignore the `mergeTrigger` parameter, as it makes sense to always make the same decisions for the triggers that are covered by `findFullFlushMerges` on the one hand, and by `findMerges` on the other hand. But it would be wrong to make the same decisions in `findFullFlushMerges` and `findMerges`, as that would force reopens to wait for `maxFullFlushMergeWaitMillis` millis every time a non-trivial merge is computed.

> So I agree small segments need not be excluded from regular merges.

+1 to not exclude small segments from regular merges
[GitHub] [lucene] jpountz edited a comment on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
jpountz edited a comment on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-1054266698

> I also noticed that in IndexWriter where we call findFullFlushMerges, we only do so for merge triggers GET_READER and COMMIT, but not for trigger FULL_FLUSH, which seems quite confusing. I wonder if we could find a better name for findFullFlushMerges.

I agree the naming makes it a bit confusing. One name that came to mind was `findPointInTimeMerges`, since these two merge triggers map to merges that must run before creating a new point-in-time view of the index, while FULL_FLUSH runs after the new point-in-time view has been created. Clarifying ordering in the `MergeTrigger` javadocs would probably help too.

> given that both findMerges and findFullFlushMerges are both called from the same switch statement, and for different triggers, and the trigger is passed in as an argument -- we could get rid of findFullFlushMerges, always call findMerges, and let the merge policy decide what to do based on the value of trigger. @s1monw WDTY?

FWIW I don't dislike the current approach. I would expect merge policies to generally ignore the `mergeTrigger` parameter, as it makes sense to always make the same decisions for the triggers that are covered by `findFullFlushMerges` on the one hand, and by `findMerges` on the other hand. But it would be wrong to make the same decisions in `findFullFlushMerges` and `findMerges`, as that would force reopens to wait for `maxFullFlushMergeWaitMillis` millis every time a non-trivial merge is computed.
[GitHub] [lucene] jpountz commented on a change in pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
jpountz commented on a change in pull request #446:
URL: https://github.com/apache/lucene/pull/446#discussion_r815897598

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/index/MergeOnFlushMergePolicy.java
## @@ -0,0 +1,85 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.lucene.sandbox.index;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.*;

/**
 * A simple extension to wrap {@link MergePolicy} to merge all tiny segments (or at least segments
 * smaller than specified in setSmallSegmentThresholdMB) into one segment on commit.
```

Review comment:
nit: put a link on `setSmallSegmentThresholdMB`
```suggestion
 * smaller than specified in {@link MergeOnFlushMergePolicy#setSmallSegmentThresholdMB}) into one segment on commit.
```

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/index/package-info.java
## @@ -0,0 +1,19 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/** Experimental classes for merge policy */
```

Review comment:
Let's not make this merge-related since we could add non merge-related classes in this package in the future?
```suggestion
/** Experimental index-related classes */
```
[GitHub] [lucene] jpountz commented on pull request #715: LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
jpountz commented on pull request #715:
URL: https://github.com/apache/lucene/pull/715#issuecomment-1054280947

+1 can you add a CHANGES entry?
[GitHub] [lucene] iverase commented on a change in pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
iverase commented on a change in pull request #687:
URL: https://github.com/apache/lucene/pull/687#discussion_r815921747

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
## @@ -181,12 +189,143 @@ public int count(LeafReaderContext context) throws IOException {

```java
      };
  }

  /**
   * Returns the first document whose packed value is greater than or equal (if allowEqual is true)
   * to the provided packed value, or -1 if all packed values are smaller than the provided one.
   */
  public final int nextDoc(PointValues values, byte[] packedValue, boolean allowEqual) throws IOException {
    final int numIndexDimensions = values.getNumIndexDimensions();
    final int bytesPerDim = values.getBytesPerDimension();
    final ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
    final Predicate<byte[]> biggerThan = testPackedValue -> {
      for (int dim = 0; dim < numIndexDimensions; dim++) {
        final int offset = dim * bytesPerDim;
        if (allowEqual) {
          if (comparator.compare(testPackedValue, offset, packedValue, offset) < 0) {
            return false;
          }
        } else {
          if (comparator.compare(testPackedValue, offset, packedValue, offset) <= 0) {
            return false;
          }
        }
      }
      return true;
    };
    return nextDoc(values.getPointTree(), biggerThan);
  }

  private int nextDoc(PointValues.PointTree pointTree, Predicate<byte[]> biggerThan) throws IOException {
    if (biggerThan.test(pointTree.getMaxPackedValue()) == false) {
      // doc is before us
      return -1;
    } else if (pointTree.moveToChild()) {
      // navigate down
      do {
        final int doc = nextDoc(pointTree, biggerThan);
        if (doc != -1) {
          return doc;
        }
      } while (pointTree.moveToSibling());
      pointTree.moveToParent();
      return -1;
    } else {
      // doc is in this leaf
      final int[] doc = { -1 };
      pointTree.visitDocValues(new IntersectVisitor() {
        @Override
        public void visit(int docID) {
          throw new AssertionError("Invalid call to visit(docID)");
        }

        @Override
        public void visit(int docID, byte[] packedValue) {
          if (doc[0] == -1 && biggerThan.test(packedValue)) {
            doc[0] = docID;
          }
        }

        @Override
        public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
          return Relation.CELL_CROSSES_QUERY;
        }
      });
      return doc[0];
    }
  }

  private boolean matchAll(PointValues points, byte[] queryLowerPoint, byte[] queryUpperPoint) throws IOException {
    final ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(points.getBytesPerDimension());
    for (int dim = 0; dim < points.getNumDimensions(); dim++) {
      int offset = dim * points.getBytesPerDimension();
      if (comparator.compare(points.getMinPackedValue(), offset, queryUpperPoint, offset) > 0) {
        return false;
      }
      if (comparator.compare(points.getMaxPackedValue(), offset, queryLowerPoint, offset) < 0) {
        return false;
      }
      if (comparator.compare(points.getMinPackedValue(), offset, queryLowerPoint, offset) < 0
          || comparator.compare(points.getMaxPackedValue(), offset, queryUpperPoint, offset) > 0) {
        return false;
      }
    }
    return true;
  }

  private BoundedDocSetIdIterator getDocIdSetIteratorOrNullFromBkd(LeafReaderContext context, DocIdSetIterator delegate)
      throws IOException {
    Sort indexSort = context.reader().getMetaData().getSort();
    if (indexSort != null
        && indexSort.getSort().length > 0
        && indexSort.getSort()[0].getField().equals(field)
        && !indexSort.getSort()[0].getReverse()) {
```

Review comment:
We prefer explicitly comparing to false over using the `!` operator, for readability.

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
## @@ -308,8 +449,10 @@ public int advance(int target) throws IOException {

```diff
       if (target < firstDoc) {
         target = firstDoc;
       }
-
-      int result = delegate.advance(target);
+      int result = target;
+      if(!allDocExist) {
```

Review comment:
We prefer explicitly comparing to false over using the `!` operator, for readability.

## File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericD
[GitHub] [lucene] LuXugang commented on pull request #715: LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
LuXugang commented on pull request #715:
URL: https://github.com/apache/lucene/pull/715#issuecomment-1054355756

> +1 can you add a CHANGES entry?

OK
[GitHub] [lucene-solr] thelabdude commented on a change in pull request #2644: SOLR-16009 Add custom udfs for filtering inside multi-valued fields
thelabdude commented on a change in pull request #2644:
URL: https://github.com/apache/lucene-solr/pull/2644#discussion_r816037662

## File path: solr/core/src/test/org/apache/solr/handler/TestSQLHandler.java
## @@ -2388,6 +2388,7 @@ public void testMultiValuedFieldHandling() throws Exception {

```diff
     update.add("id", String.valueOf(maxDocs)); // all multi-valued fields are null
     update.commit(cluster.getSolrClient(), COLLECTIONORALIAS);
+    expectResults("SELECT stringxmv, stringsx, booleans FROM $ALIAS WHERE stringxmv IN ('a') AND stringxmv IN ('b')", 10);
```

Review comment:
how is this working? is calcite just matching all rows here? I thought the bug here was that calcite was erasing the two IN's and then matching none ;-)
[jira] [Commented] (LUCENE-10431) AssertionError in BooleanQuery.hashCode()
[ https://issues.apache.org/jira/browse/LUCENE-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499074#comment-17499074 ]

Adrien Grand commented on LUCENE-10431:
---------------------------------------

I wonder if the most immediate fix could consist of setting a flag when hashCode() or equals() is called the first time and rejecting any calls to setRewriteMethod after that, in order to better point users to where the problem in their code is.

> AssertionError in BooleanQuery.hashCode()
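A hedged sketch of what such a guard could look like (plain-Java illustration only, not Lucene's MultiTermQuery; the class name, the flag, and the use of a plain Object in place of RewriteMethod are all stand-ins):

{code:java}
// Illustration only: once the hash has been observed (e.g. by a BooleanQuery caching its
// clauses), later attempts to mutate the rewrite method fail fast with a clear message.
class GuardedQuery {
  private Object rewriteMethod = "CONSTANT_SCORE_REWRITE"; // stand-in for MultiTermQuery.RewriteMethod
  private boolean hashObserved = false;

  void setRewriteMethod(Object method) {
    if (hashObserved) {
      throw new IllegalStateException(
          "rewrite method changed after hashCode()/equals() was called; "
              + "this query may already be cached under its previous hash");
    }
    this.rewriteMethod = method;
  }

  @Override
  public int hashCode() {
    hashObserved = true; // from now on the hash must stay stable
    return 31 * getClass().hashCode() + rewriteMethod.hashCode();
  }

  @Override
  public boolean equals(Object other) {
    hashObserved = true;
    return other instanceof GuardedQuery
        && rewriteMethod.equals(((GuardedQuery) other).rewriteMethod);
  }
}
{code}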
[GitHub] [lucene] jpountz merged pull request #715: LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
jpountz merged pull request #715:
URL: https://github.com/apache/lucene/pull/715
[jira] [Commented] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499077#comment-17499077 ]

ASF subversion and git services commented on LUCENE-10442:
----------------------------------------------------------

Commit 6224d0b157f9339f9048f33bd65436b2ebf5d9b8 in lucene's branch refs/heads/main from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6224d0b ]

LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery (#715)

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[jira] [Commented] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499087#comment-17499087 ]

ASF subversion and git services commented on LUCENE-10442:
----------------------------------------------------------

Commit 9497524cc2d1eea24c5dd3da10e46eda991a7df7 in lucene's branch refs/heads/branch_9x from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9497524 ]

LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery (#715)

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[jira] [Resolved] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-10442.
-----------------------------------
    Resolution: Fixed

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[jira] [Commented] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499099#comment-17499099 ]

Adrien Grand commented on LUCENE-10442:
---------------------------------------

Thanks [~ChrisLu]!

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[jira] [Comment Edited] (LUCENE-10428) getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop
[ https://issues.apache.org/jira/browse/LUCENE-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497658#comment-17497658 ]

Ankit Jain edited comment on LUCENE-10428 at 2/28/22, 6:14 PM:
---------------------------------------------------------------

{quote}I opened a pull request that doesn't fix the bug but at least makes it an error instead of an infinite loop.
{quote}
[~jpountz] - Can you share a link to this PR? Also, we should capture all the debug information as part of that error to understand this further.

was (Author: akjain):
{quote}I opened a pull request that doesn't fix the bug but at least makes it an error instead of an infinite loop.
{quote}
Can you share a link to this PR? Also, we should capture all the debug information as part of that error to understand this further.

> getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10428
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10428
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/query/scoring, core/search
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: Flame_graph.png
>
> Customers complained about high CPU for an Elasticsearch cluster in production. We noticed that a few search requests were stuck for a long time:
> {code:java}
> % curl -s localhost:9200/_cat/tasks?v
> indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205 AmMLzDQ4RrOJievRDeGFZw:569204 direct 1645195007282 14:36:47 6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075 emjWc5bUTG6lgnCGLulq-Q:502074 direct 1645195037259 14:37:17 6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270 emjWc5bUTG6lgnCGLulq-Q:583269 direct 1645201316981 16:21:56 4.5h
> {code}
> Flame graphs indicated that CPU time is mostly going into *getMinCompetitiveScore method in MaxScoreSumPropagator*. After doing some live JVM debugging, we found that org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound had around 4 million invocations every second.
> Values of some parameters figured out from live debugging:
> {code:java}
> minScoreSum = 3.5541441
> minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = 3.554144322872162
> returnObj scoreSumUpperBound = 3.5541444
> Math.ulp(minScoreSum) = 2.3841858E-7
> {code}
> Example code snippet:
> {code:java}
> double sumOfOtherMaxScores = 3.554144322872162;
> double minScoreSum = 3.5541441;
> float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
> while (scoreSumUpperBound(minScore + sumOfOtherMaxScores) > minScoreSum) {
>     minScore -= Math.ulp(minScoreSum);
>     System.out.printf("%.20f, %.100f\n", minScore, Math.ulp(minScoreSum));
> }
> {code}
[GitHub] [lucene-solr] kiranchitturi commented on a change in pull request #2644: SOLR-16009 Add custom udfs for filtering inside multi-valued fields
kiranchitturi commented on a change in pull request #2644:
URL: https://github.com/apache/lucene-solr/pull/2644#discussion_r816219452

## File path: solr/core/src/test/org/apache/solr/handler/TestSQLHandler.java
## @@ -2388,6 +2388,7 @@ public void testMultiValuedFieldHandling() throws Exception {

```diff
     update.add("id", String.valueOf(maxDocs)); // all multi-valued fields are null
     update.commit(cluster.getSolrClient(), COLLECTIONORALIAS);
+    expectResults("SELECT stringxmv, stringsx, booleans FROM $ALIAS WHERE stringxmv IN ('a') AND stringxmv IN ('b')", 10);
```

Review comment:
that was a temporary change that got pushed accidentally. the assert actually fails. I have removed it in the next commit
[GitHub] [lucene] wjp719 commented on a change in pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
wjp719 commented on a change in pull request #687:
URL: https://github.com/apache/lucene/pull/687#discussion_r816404708

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
## @@ -308,8 +449,10 @@ public int advance(int target) throws IOException {

Review comment:
done

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
## @@ -181,12 +189,143 @@ public int count(LeafReaderContext context) throws IOException {

Review comment:
done
[GitHub] [lucene] wjp719 commented on pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
wjp719 commented on pull request #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1054926857

@iverase I added a random test, please review it again
[jira] [Created] (LUCENE-10446) Add a precise cost of score in ScorerSupplier
Lu Xugang created LUCENE-10446:
----------------------------------

             Summary: Add a precise cost of score in ScorerSupplier
                 Key: LUCENE-10446
                 URL: https://issues.apache.org/jira/browse/LUCENE-10446
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Lu Xugang

Some queries could sometimes provide a precise cost of score, like RangeFieldQuery, PointRangeQuery, SpatialQuery. Maybe we could do some optimization by using this precise cost. For example, in IndexOrDocValuesQuery, when indexScorerSupplier's or/and dvScorerSupplier's precise cost is reader.maxDoc, we could supply the right Scorer directly instead of relying on the threshold <= leadCost condition, which sometimes supplies an inappropriate Scorer when IndexOrDocValuesQuery is not the lead iterator.
[jira] [Commented] (LUCENE-10431) AssertionError in BooleanQuery.hashCode()
[ https://issues.apache.org/jira/browse/LUCENE-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499321#comment-17499321 ]

Michael Bien commented on LUCENE-10431:
---------------------------------------

IMO: if you don't want client code to use setters, deprecate them. Setters should either work or they shouldn't; it shouldn't depend on implementation details like eager hashcode initialization and fail due to a certain query type in the tree.

I would also investigate the following: does the lazy hashcode logic make sense in the context of the constructor essentially initializing it eagerly anyway?

The problem for the NetBeans module I am attempting to migrate, though, is that some of the queries are not created by NetBeans. As you can see in this code (https://github.com/apache/netbeans/blob/04fa8fba812566a211462fc3eef73597fbf3a975/java/maven.indexer/src/org/netbeans/modules/maven/indexer/NexusRepositoryIndexerImpl.java#L1389-L1457), they are created by maven-indexer, a third-party dependency. So you could remove the setters, but this would slow the lucene 5->8 migration down (for this particular part of NB at least; lucene is used in several places), since someone would have to update the API in maven-indexer first, which would have to happen after it is fixed in lucene. NB would be last in the chain.

> AssertionError in BooleanQuery.hashCode()