[jira] [Commented] (LUCENE-10431) AssertionError in BooleanQuery.hashCode()
[ https://issues.apache.org/jira/browse/LUCENE-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498799#comment-17498799 ]

Alan Woodward commented on LUCENE-10431:

I think the issue is that BooleanQuery is expecting all its subqueries to be immutable, but MultiTermQuery isn't - you can set the rewrite method, which changes the hash. I think ideally we'd make MTQ properly immutable and have the rewrite method as part of the constructor, especially as there are already cases like FuzzyQuery that have specific rewrite methods that shouldn't be externally settable, but that is a pretty big change. A more immediate fix would be to remove the rewrite method from MTQ's hash calculation.

> AssertionError in BooleanQuery.hashCode()
> -----------------------------------------
>
>                 Key: LUCENE-10431
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10431
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.11.1
>            Reporter: Michael Bien
>            Priority: Major
>
> Hello devs,
> the constructor of BooleanQuery can under some circumstances trigger a hash code computation before "clauseSets" is fully filled. Since BooleanClause is using its query field for the hash code too, it can happen that the "wrong" hash code is stored, since adding the clause to the set triggers its hashCode().
> If assertions are enabled the check in BooleanQuery, which recomputes the hash code, will notice it and throw an error.
> exception:
> {code:java}
> java.lang.AssertionError
>     at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:614)
>     at java.base/java.util.Objects.hashCode(Objects.java:103)
>     at java.base/java.util.HashMap$Node.hashCode(HashMap.java:298)
>     at java.base/java.util.AbstractMap.hashCode(AbstractMap.java:527)
>     at org.apache.lucene.search.Multiset.hashCode(Multiset.java:119)
>     at java.base/java.util.EnumMap.entryHashCode(EnumMap.java:717)
>     at java.base/java.util.EnumMap.hashCode(EnumMap.java:709)
>     at java.base/java.util.Arrays.hashCode(Arrays.java:4498)
>     at java.base/java.util.Objects.hash(Objects.java:133)
>     at org.apache.lucene.search.BooleanQuery.computeHashCode(BooleanQuery.java:597)
>     at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:611)
>     at java.base/java.util.HashMap.hash(HashMap.java:340)
>     at java.base/java.util.HashMap.put(HashMap.java:612)
>     at org.apache.lucene.search.Multiset.add(Multiset.java:82)
>     at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:154)
>     at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:42)
>     at org.apache.lucene.search.BooleanQuery$Builder.build(BooleanQuery.java:133)
> {code}
> I noticed this while trying to upgrade the NetBeans maven indexer modules from lucene 5.x to 8.x: https://github.com/apache/netbeans/pull/3558
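A minimal reproduction sketch of the mutability problem described above, written against the Lucene 8.x API (where MultiTermQuery#setRewriteMethod is still public). The field and term names are made up for illustration; this is not code from the issue itself:

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MultiTermQuery;
import org.apache.lucene.search.WildcardQuery;

public class MtqMutabilityDemo {
  public static void main(String[] args) {
    WildcardQuery wildcard = new WildcardQuery(new Term("field", "foo*"));
    BooleanQuery bq = new BooleanQuery.Builder()
        .add(wildcard, BooleanClause.Occur.MUST)
        .build();

    int before = bq.hashCode(); // BooleanQuery caches this value internally

    // Mutating the rewrite method changes WildcardQuery#hashCode(), and therefore the hash
    // of the clause that the BooleanQuery has already cached.
    wildcard.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_REWRITE);

    // With assertions enabled (-ea) this call trips the consistency assert in
    // BooleanQuery#hashCode(), because the cached value no longer matches the recomputed one.
    int after = bq.hashCode();
    System.out.println(before + " vs " + after);
  }
}
{code}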
[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422
romseygeek commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r815721410

## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
## @@ -0,0 +1,117 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.lucene.monitor;

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.IOUtils;

class ReadonlyQueryIndex extends QueryIndex {

  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
    if (configuration.getDirectoryProvider() == null) {
      throw new IllegalStateException(
          "You must specify a Directory when configuring a Monitor as read-only.");
    }
    Directory directory = configuration.getDirectoryProvider().get();
    this.queries = new HashMap<>();
    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
    this.decomposer = configuration.getQueryDecomposer();
    this.serializer = configuration.getQuerySerializer();
    this.populateQueryCache(serializer, decomposer);
  }

  @Override
  public void commit(List updates) throws IOException {
    throw new IllegalStateException("Monitor is readOnly cannot commit");
  }

  @Override
  long search(final Query query, QueryCollector matcher) throws IOException {
    QueryBuilder builder = termFilter -> query;
    return search(builder, matcher);
  }

  @Override
  public long search(QueryBuilder queryBuilder, QueryCollector matcher) throws IOException {
    IndexSearcher searcher = null;
    try {
      searcher = manager.acquire();
      return searchInMemory(queryBuilder, matcher, searcher, this.queries);
    } finally {
      if (searcher != null) {
        manager.release(searcher);
      }
    }
  }

  @Override
  public void purgeCache() throws IOException {
    this.populateQueryCache(serializer, decomposer);
    lastPurged = System.nanoTime();
  }

  @Override
  void purgeCache(CachePopulator populator) throws IOException {
    manager.maybeRefresh();
```

Review comment:
I think actually the best solution is to remove the query cache entirely for this impl, which is where you started out - sorry for all the back and forth here. We can have a background thread that calls maybeRefresh() on the manager to keep up with updates, but all the queries will be read directly from the searcher and parsed as they are executed. The in-memory cache works when the Monitor in question is handling updates as well, but trying to do that when you have no idea what the changes are between IndexReaders is going to be nasty.
[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422
mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r815732492

## File path: lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
## @@ -0,0 +1,117 @@

Review comment:
Ok, I think it was the best solution too, I'll work on getting back to that solution. Don't worry, all the back and forth got me to understand everything better!
[jira] [Commented] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498818#comment-17498818 ]

Lu Xugang commented on LUCENE-10442:

Furthermore: could we leverage Weight#count() to get a ConstantScoreScorer when count == reader.maxDoc() in the implementation of Weight#scorerSupplier(LeafReaderContext context)? If indexWeight.count(LeafReaderContext) or dvWeight.count(LeafReaderContext) equals reader.maxDoc(), does that mean the query matches everything in this segment?

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
> -----------------------------------------------------------
>
>                 Key: LUCENE-10442
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10442
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Lu Xugang
>            Priority: Trivial
>             Fix For: 9.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> IndexOrDocValuesQuery is typically useful for range queries. When indexQuery was rewritten to MatchAllDocsQuery and IndexOrDocValuesQuery is not the lead iterator, it is most likely that dvQuery, not indexQuery, will supply the Scorer.
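A hedged sketch of the idea in the comment above (this is not the actual IndexOrDocValuesQuery code; the helper name and the way the parent Weight, score and ScoreMode are passed in are assumptions made for illustration): if either sub-weight already reports that it matches every document in the segment, a match-all ConstantScoreScorer could be returned up front instead of relying on the threshold <= leadCost heuristic.

{code:java}
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.ConstantScoreScorer;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.ScorerSupplier;
import org.apache.lucene.search.Weight;

final class MatchAllShortCircuit {
  private MatchAllShortCircuit() {}

  /**
   * Returns a match-all ScorerSupplier when either sub-weight reports that it matches every
   * document in the segment, or null when the caller should fall back to the usual
   * index-vs-doc-values decision. Sketch only.
   */
  static ScorerSupplier matchAllOrNull(
      Weight parent,
      Weight indexWeight,
      Weight dvWeight,
      ScoreMode scoreMode,
      float score,
      LeafReaderContext context)
      throws IOException {
    final int maxDoc = context.reader().maxDoc();
    if (indexWeight.count(context) != maxDoc && dvWeight.count(context) != maxDoc) {
      return null; // no exact count available, or not every document matches
    }
    return new ScorerSupplier() {
      @Override
      public Scorer get(long leadCost) {
        // every doc in the segment matches, so a constant-score iterator over [0, maxDoc) suffices
        return new ConstantScoreScorer(parent, score, scoreMode, DocIdSetIterator.all(maxDoc));
      }

      @Override
      public long cost() {
        return maxDoc;
      }
    };
  }
}
{code}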
[GitHub] [lucene] codaitya commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
codaitya commented on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-1054179398

Sorry for the delay in getting back to this - I got busy with work and also needed time to study how Lucene does segment merges in more detail.

> Why do we need to exclude small segments from regular merges?

The idea was that since writer threads can flush on their own, the new segments are eligible for regular merges. Regular merges can pick up these small segments, spend a lot of time on them, and they might then be unavailable for fullFlush merges. But I think this step (writer threads flushing on their own) should normally only kick in once the RAMBuffer fills up, which would mean that the resulting segment isn't that small. So I agree small segments need not be excluded from regular merges.

I will update the PR to not override the `findMerges` function and just do the computation of the small-segment merge in `findFullFlushMerges`.
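For readers following along, a hedged sketch of the shape such a policy might take (this is not the PR's code; the class name, the `smallSegmentThresholdMB` parameter and the size check are illustrative assumptions): regular merges are delegated unchanged, and only `findFullFlushMerges` gathers the small segments into a single merge before the commit/getReader point-in-time view is published.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.FilterMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

/** Illustrative sketch only; not the actual merge policy from the PR. */
class SmallSegmentOnFlushMergePolicy extends FilterMergePolicy {

  private final double smallSegmentThresholdMB;

  SmallSegmentOnFlushMergePolicy(MergePolicy in, double smallSegmentThresholdMB) {
    super(in);
    this.smallSegmentThresholdMB = smallSegmentThresholdMB;
  }

  @Override
  public MergeSpecification findFullFlushMerges(
      MergeTrigger mergeTrigger, SegmentInfos segmentInfos, MergeContext mergeContext)
      throws IOException {
    List<SegmentCommitInfo> smallSegments = new ArrayList<>();
    for (SegmentCommitInfo sci : segmentInfos) {
      boolean alreadyMerging = mergeContext.getMergingSegments().contains(sci);
      double sizeMB = sci.sizeInBytes() / 1024.0 / 1024.0;
      if (alreadyMerging == false && sizeMB <= smallSegmentThresholdMB) {
        smallSegments.add(sci);
      }
    }
    if (smallSegments.size() < 2) {
      return null; // nothing worth merging before publishing the point-in-time view
    }
    MergeSpecification spec = new MergeSpecification();
    spec.add(new OneMerge(smallSegments));
    return spec;
  }

  // findMerges is intentionally not overridden: regular merges still see the small segments.
}
```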
[GitHub] [lucene] wjp719 commented on pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
wjp719 commented on pull request #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1054244294

@iverase Hi, I moved the bkd binary to IndexSortSortedNumericDocValuesRangeQuery as you suggested; please help review it, thanks.
[GitHub] [lucene] wjp719 edited a comment on pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
wjp719 edited a comment on pull request #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1054244294

@iverase Hi, I moved the bkd binary search to IndexSortSortedNumericDocValuesRangeQuery as you suggested; please help review it, thanks.
[GitHub] [lucene] jpountz commented on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
jpountz commented on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-1054266698

> I also noticed that in IndexWriter where we call findFullFlushMerges, we only do so for merge triggers GET_READER and COMMIT, but not for trigger FULL_FLUSH, which seems quite confusing. I wonder if we could find a better name for findFullFlushMerges.

I agree the naming makes it a bit confusing. One name that came to mind was `findPointInTimeMerges`, since these two merge triggers map to merges that must run before creating a new point-in-time view of the index, while FULL_FLUSH runs after the new point-in-time view has been created. Clarifying ordering in the `MergeTrigger` javadocs would probably help too.

> given that both findMerges and findFullFlushMerges are both called from the same switch statement, and for different triggers, and the trigger is passed in as an argument -- we could get rid of findFullFlushMerges, always call findMerges, and let the merge policy decide what to do based on the value of trigger. @s1monw WDTY?

FWIW I don't dislike the current approach. I would expect merge policies to generally ignore the `mergeTrigger` parameter, as it makes sense to always make the same decisions for the triggers that are covered by `findFullFlushMerges` on the one hand, and by `findMerges` on the other hand. But it would be wrong to make the same decisions in `findFullFlushMerges` and `findMerges`, as that would force reopens to wait for `maxFullFlushMergeWaitMillis` millis every time a non-trivial merge is computed.

> So I agree small segments need not be excluded from regular merges.

+1 to not exclude small segments from regular merges
[GitHub] [lucene] jpountz edited a comment on pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
jpountz edited a comment on pull request #446:
URL: https://github.com/apache/lucene/pull/446#issuecomment-1054266698

> I also noticed that in IndexWriter where we call findFullFlushMerges, we only do so for merge triggers GET_READER and COMMIT, but not for trigger FULL_FLUSH, which seems quite confusing. I wonder if we could find a better name for findFullFlushMerges.

I agree the naming makes it a bit confusing. One name that came to mind was `findPointInTimeMerges`, since these two merge triggers map to merges that must run before creating a new point-in-time view of the index, while FULL_FLUSH runs after the new point-in-time view has been created. Clarifying ordering in the `MergeTrigger` javadocs would probably help too.

> given that both findMerges and findFullFlushMerges are both called from the same switch statement, and for different triggers, and the trigger is passed in as an argument -- we could get rid of findFullFlushMerges, always call findMerges, and let the merge policy decide what to do based on the value of trigger. @s1monw WDTY?

FWIW I don't dislike the current approach. I would expect merge policies to generally ignore the `mergeTrigger` parameter, as it makes sense to always make the same decisions for the triggers that are covered by `findFullFlushMerges` on the one hand, and by `findMerges` on the other hand. But it would be wrong to make the same decisions in `findFullFlushMerges` and `findMerges`, as that would force reopens to wait for `maxFullFlushMergeWaitMillis` millis every time a non-trivial merge is computed.
[GitHub] [lucene] jpountz commented on a change in pull request #446: LUCENE-10237 : Add MergeOnCommitTieredMergePolicy to sandbox
jpountz commented on a change in pull request #446:
URL: https://github.com/apache/lucene/pull/446#discussion_r815897598

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/index/MergeOnFlushMergePolicy.java
## @@ -0,0 +1,85 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.lucene.sandbox.index;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.*;

/**
 * A simple extension to wrap {@link MergePolicy} to merge all tiny segments (or at least segments
 * smaller than specified in setSmallSegmentThresholdMB) into one segment on commit.
```

Review comment:
nit: put a link on `setSmallSegmentThresholdMB`
```suggestion
 * smaller than specified in {@link MergeOnFlushMergePolicy#setSmallSegmentThresholdMB}) into one segment on commit.
```

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/index/package-info.java
## @@ -0,0 +1,19 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/** Experimental classes for merge policy */
```

Review comment:
Let's not make this merge-related since we could add non merge-related classes in this package in the future?
```suggestion
/** Experimental index-related classes */
```
[GitHub] [lucene] jpountz commented on pull request #715: LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
jpountz commented on pull request #715:
URL: https://github.com/apache/lucene/pull/715#issuecomment-1054280947

+1 can you add a CHANGES entry?
[GitHub] [lucene] iverase commented on a change in pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
iverase commented on a change in pull request #687:
URL: https://github.com/apache/lucene/pull/687#discussion_r815921747

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
## @@ -181,12 +189,143 @@ public int count(LeafReaderContext context) throws IOException {

```java
      };
  }

  /**
   * Returns the first document whose packed value is greater than or equal (if allowEqual is true)
   * to the provided packed value, or -1 if all packed values are smaller than the provided one.
   */
  public final int nextDoc(PointValues values, byte[] packedValue, boolean allowEqual) throws IOException {
    final int numIndexDimensions = values.getNumIndexDimensions();
    final int bytesPerDim = values.getBytesPerDimension();
    final ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(bytesPerDim);
    final Predicate<byte[]> biggerThan = testPackedValue -> {
      for (int dim = 0; dim < numIndexDimensions; dim++) {
        final int offset = dim * bytesPerDim;
        if (allowEqual) {
          if (comparator.compare(testPackedValue, offset, packedValue, offset) < 0) {
            return false;
          }
        } else {
          if (comparator.compare(testPackedValue, offset, packedValue, offset) <= 0) {
            return false;
          }
        }
      }
      return true;
    };
    return nextDoc(values.getPointTree(), biggerThan);
  }

  private int nextDoc(PointValues.PointTree pointTree, Predicate<byte[]> biggerThan) throws IOException {
    if (biggerThan.test(pointTree.getMaxPackedValue()) == false) {
      // doc is before us
      return -1;
    } else if (pointTree.moveToChild()) {
      // navigate down
      do {
        final int doc = nextDoc(pointTree, biggerThan);
        if (doc != -1) {
          return doc;
        }
      } while (pointTree.moveToSibling());
      pointTree.moveToParent();
      return -1;
    } else {
      // doc is in this leaf
      final int[] doc = { -1 };
      pointTree.visitDocValues(new IntersectVisitor() {
        @Override
        public void visit(int docID) {
          throw new AssertionError("Invalid call to visit(docID)");
        }

        @Override
        public void visit(int docID, byte[] packedValue) {
          if (doc[0] == -1 && biggerThan.test(packedValue)) {
            doc[0] = docID;
          }
        }

        @Override
        public Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
          return Relation.CELL_CROSSES_QUERY;
        }
      });
      return doc[0];
    }
  }

  private boolean matchAll(PointValues points, byte[] queryLowerPoint, byte[] queryUpperPoint) throws IOException {
    final ByteArrayComparator comparator = ArrayUtil.getUnsignedComparator(points.getBytesPerDimension());
    for (int dim = 0; dim < points.getNumDimensions(); dim++) {
      int offset = dim * points.getBytesPerDimension();
      if (comparator.compare(points.getMinPackedValue(), offset, queryUpperPoint, offset) > 0) {
        return false;
      }
      if (comparator.compare(points.getMaxPackedValue(), offset, queryLowerPoint, offset) < 0) {
        return false;
      }
      if (comparator.compare(points.getMinPackedValue(), offset, queryLowerPoint, offset) < 0
          || comparator.compare(points.getMaxPackedValue(), offset, queryUpperPoint, offset) > 0) {
        return false;
      }
    }
    return true;
  }

  private BoundedDocSetIdIterator getDocIdSetIteratorOrNullFromBkd(LeafReaderContext context, DocIdSetIterator delegate)
      throws IOException {
    Sort indexSort = context.reader().getMetaData().getSort();
    if (indexSort != null
        && indexSort.getSort().length > 0
        && indexSort.getSort()[0].getField().equals(field)
        && !indexSort.getSort()[0].getReverse()) {
```

Review comment:
We prefer explicitly comparing to false over using the `!` operator, for readability.

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
## @@ -308,8 +449,10 @@ public int advance(int target) throws IOException {

```diff
       if (target < firstDoc) {
         target = firstDoc;
       }
-
-      int result = delegate.advance(target);
+      int result = target;
+      if(!allDocExist) {
```

Review comment:
We prefer explicitly comparing to false over using the `!` operator, for readability.

## File path: lucene/sandbox/src/test/org/apache/lucene/sandbox/search/TestIndexSortSortedNumericD
[GitHub] [lucene] LuXugang commented on pull request #715: LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
LuXugang commented on pull request #715:
URL: https://github.com/apache/lucene/pull/715#issuecomment-1054355756

> +1 can you add a CHANGES entry?

OK
[GitHub] [lucene-solr] thelabdude commented on a change in pull request #2644: SOLR-16009 Add custom udfs for filtering inside multi-valued fields
thelabdude commented on a change in pull request #2644:
URL: https://github.com/apache/lucene-solr/pull/2644#discussion_r816037662

## File path: solr/core/src/test/org/apache/solr/handler/TestSQLHandler.java
## @@ -2388,6 +2388,7 @@ public void testMultiValuedFieldHandling() throws Exception {

```diff
     update.add("id", String.valueOf(maxDocs)); // all multi-valued fields are null
     update.commit(cluster.getSolrClient(), COLLECTIONORALIAS);
+    expectResults("SELECT stringxmv, stringsx, booleans FROM $ALIAS WHERE stringxmv IN ('a') AND stringxmv IN ('b')", 10);
```

Review comment:
how is this working? is calcite just matching all rows here? I thought the bug here was that calcite was erasing the two IN's and then matching none ;-)
[jira] [Commented] (LUCENE-10431) AssertionError in BooleanQuery.hashCode()
[ https://issues.apache.org/jira/browse/LUCENE-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499074#comment-17499074 ]

Adrien Grand commented on LUCENE-10431:
---------------------------------------

I wonder if the most immediate fix could consist of setting a flag when hashCode() or equals() is called the first time and rejecting any calls to setRewriteMethod after that, in order to better point users to where the problem in their code is.

> AssertionError in BooleanQuery.hashCode()
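A hedged sketch of what such a guard could look like (plain-Java illustration only, not Lucene's MultiTermQuery; the class name, the flag, and the use of a plain Object in place of RewriteMethod are all stand-ins):

{code:java}
// Illustration only: once the hash has been observed (e.g. by a BooleanQuery caching its
// clauses), later attempts to mutate the rewrite method fail fast with a clear message.
class GuardedQuery {
  private Object rewriteMethod = "CONSTANT_SCORE_REWRITE"; // stand-in for MultiTermQuery.RewriteMethod
  private boolean hashObserved = false;

  void setRewriteMethod(Object method) {
    if (hashObserved) {
      throw new IllegalStateException(
          "rewrite method changed after hashCode()/equals() was called; "
              + "this query may already be cached under its previous hash");
    }
    this.rewriteMethod = method;
  }

  @Override
  public int hashCode() {
    hashObserved = true; // from now on the hash must stay stable
    return 31 * getClass().hashCode() + rewriteMethod.hashCode();
  }

  @Override
  public boolean equals(Object other) {
    hashObserved = true;
    return other instanceof GuardedQuery
        && rewriteMethod.equals(((GuardedQuery) other).rewriteMethod);
  }
}
{code}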
[GitHub] [lucene] jpountz merged pull request #715: LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
jpountz merged pull request #715:
URL: https://github.com/apache/lucene/pull/715
[jira] [Commented] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499077#comment-17499077 ]

ASF subversion and git services commented on LUCENE-10442:
----------------------------------------------------------

Commit 6224d0b157f9339f9048f33bd65436b2ebf5d9b8 in lucene's branch refs/heads/main from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=6224d0b ]

LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery (#715)

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[jira] [Commented] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499087#comment-17499087 ]

ASF subversion and git services commented on LUCENE-10442:
----------------------------------------------------------

Commit 9497524cc2d1eea24c5dd3da10e46eda991a7df7 in lucene's branch refs/heads/branch_9x from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=9497524 ]

LUCENE-10442: When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery (#715)

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[jira] [Resolved] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand resolved LUCENE-10442.
-----------------------------------
    Resolution: Fixed

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[jira] [Commented] (LUCENE-10442) When indexQuery or/and dvQuery be a MatchAllDocsQuery then IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[ https://issues.apache.org/jira/browse/LUCENE-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499099#comment-17499099 ]

Adrien Grand commented on LUCENE-10442:
---------------------------------------

Thanks [~ChrisLu]!

> When indexQuery or/and dvQuery be a MatchAllDocsQuery then
> IndexOrDocValuesQuery should be rewrite to MatchAllDocsQuery
[jira] [Comment Edited] (LUCENE-10428) getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop
[ https://issues.apache.org/jira/browse/LUCENE-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497658#comment-17497658 ]

Ankit Jain edited comment on LUCENE-10428 at 2/28/22, 6:14 PM:
---------------------------------------------------------------

{quote}I opened a pull request that doesn't fix the bug but at least makes it an error instead of an infinite loop.
{quote}
[~jpountz] - Can you share a link to this PR? Also, we should capture all the debug information as part of that error to understand this further.

was (Author: akjain):
{quote}I opened a pull request that doesn't fix the bug but at least makes it an error instead of an infinite loop.
{quote}
Can you share a link to this PR? Also, we should capture all the debug information as part of that error to understand this further.

> getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10428
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10428
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/query/scoring, core/search
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: Flame_graph.png
>
> Customers complained about high CPU for an Elasticsearch cluster in production. We noticed that a few search requests were stuck for a long time:
> {code:java}
> % curl -s localhost:9200/_cat/tasks?v
> indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205 AmMLzDQ4RrOJievRDeGFZw:569204 direct 1645195007282 14:36:47 6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075 emjWc5bUTG6lgnCGLulq-Q:502074 direct 1645195037259 14:37:17 6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270 emjWc5bUTG6lgnCGLulq-Q:583269 direct 1645201316981 16:21:56 4.5h
> {code}
> Flame graphs indicated that CPU time is mostly going into *getMinCompetitiveScore method in MaxScoreSumPropagator*. After doing some live JVM debugging, we found that org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound had around 4 million invocations every second.
> Values of some parameters figured out from live debugging:
> {code:java}
> minScoreSum = 3.5541441
> minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = 3.554144322872162
> returnObj scoreSumUpperBound = 3.5541444
> Math.ulp(minScoreSum) = 2.3841858E-7
> {code}
> Example code snippet:
> {code:java}
> double sumOfOtherMaxScores = 3.554144322872162;
> double minScoreSum = 3.5541441;
> float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
> while (scoreSumUpperBound(minScore + sumOfOtherMaxScores) > minScoreSum) {
>     minScore -= Math.ulp(minScoreSum);
>     System.out.printf("%.20f, %.100f\n", minScore, Math.ulp(minScoreSum));
> }
> {code}
[GitHub] [lucene-solr] kiranchitturi commented on a change in pull request #2644: SOLR-16009 Add custom udfs for filtering inside multi-valued fields
kiranchitturi commented on a change in pull request #2644:
URL: https://github.com/apache/lucene-solr/pull/2644#discussion_r816219452

## File path: solr/core/src/test/org/apache/solr/handler/TestSQLHandler.java
## @@ -2388,6 +2388,7 @@ public void testMultiValuedFieldHandling() throws Exception {

```diff
     update.add("id", String.valueOf(maxDocs)); // all multi-valued fields are null
     update.commit(cluster.getSolrClient(), COLLECTIONORALIAS);
+    expectResults("SELECT stringxmv, stringsx, booleans FROM $ALIAS WHERE stringxmv IN ('a') AND stringxmv IN ('b')", 10);
```

Review comment:
that was a temporary change that got pushed accidentally. the assert actually fails. I have removed it in the next commit
[GitHub] [lucene] wjp719 commented on a change in pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
wjp719 commented on a change in pull request #687:
URL: https://github.com/apache/lucene/pull/687#discussion_r816404708

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
## @@ -308,8 +449,10 @@ public int advance(int target) throws IOException {

Review comment:
done

## File path: lucene/sandbox/src/java/org/apache/lucene/sandbox/search/IndexSortSortedNumericDocValuesRangeQuery.java
## @@ -181,12 +189,143 @@ public int count(LeafReaderContext context) throws IOException {

Review comment:
done
[GitHub] [lucene] wjp719 commented on pull request #687: LUCENE-10425:speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search
wjp719 commented on pull request #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1054926857

@iverase I added a random test, please review it again
[jira] [Created] (LUCENE-10446) Add a precise cost of score in ScorerSupplier
Lu Xugang created LUCENE-10446:
----------------------------------

             Summary: Add a precise cost of score in ScorerSupplier
                 Key: LUCENE-10446
                 URL: https://issues.apache.org/jira/browse/LUCENE-10446
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Lu Xugang

Some queries could sometimes provide a precise cost of score, like RangeFieldQuery, PointRangeQuery, SpatialQuery. Maybe we could do some optimization by using this precise cost. For example, in IndexOrDocValuesQuery, when indexScorerSupplier's or/and dvScorerSupplier's precise cost is reader.maxDoc, we could supply the right Scorer directly instead of relying on the threshold <= leadCost condition, which sometimes supplies an inappropriate Scorer when IndexOrDocValuesQuery is not the lead iterator.
[jira] [Commented] (LUCENE-10431) AssertionError in BooleanQuery.hashCode()
[ https://issues.apache.org/jira/browse/LUCENE-10431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499321#comment-17499321 ]

Michael Bien commented on LUCENE-10431:
---------------------------------------

IMO: if you don't want client code to use setters, deprecate them. Setters should either work or they shouldn't; it shouldn't depend on implementation details like eager hashcode initialization and fail due to a certain query type in the tree.

I would also investigate the following: does the lazy hashcode logic make sense in the context of the constructor essentially initializing it eagerly anyway?

The problem for the NetBeans module I am attempting to migrate, though, is that some of the queries are not created by NetBeans. As you can see in this code (https://github.com/apache/netbeans/blob/04fa8fba812566a211462fc3eef73597fbf3a975/java/maven.indexer/src/org/netbeans/modules/maven/indexer/NexusRepositoryIndexerImpl.java#L1389-L1457), they are created by maven-indexer, a third-party dependency. So you could remove the setters, but this would slow the lucene 5->8 migration down (for this particular part of NB at least; lucene is used in several places), since someone would have to update the API in maven-indexer first, which would have to happen after it is fixed in lucene. NB would be last in the chain.

> AssertionError in BooleanQuery.hashCode()