[GitHub] [lucene] jpountz commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


jpountz commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047526553


   In my opinion the API as it is today isn't bad. The only thing we might want 
to change is to make `DocIdSetBuilder#grow` take a long instead of an int.
   
   Maybe it's a javadocs issue because `DocIdSetBuilder#grow` says that it 
returns "a `BulkAdder` object that can be used to add up to `numDocs` 
documents", which might suggest that `numDocs` is the number of unique 
documents contributed, when in fact this number is simply an upper bound of the 
number of times that you may call `BulkAdder#add` on the returned `BulkAdder` 
object.
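To make the contract above concrete, here is a minimal toy sketch, not Lucene's `DocIdSetBuilder`. `ToyDocIdSetBuilder` and `ToyBulkAdder` are hypothetical names. The argument to `grow` (called `numDocs` in the real javadocs) is treated purely as an upper bound on `add` calls, so a multi-valued document may be added more than once and is only de-duplicated at build time.

```java
import java.util.Arrays;

// Hypothetical toy builder illustrating the grow()/add() contract discussed above.
// Not Lucene's DocIdSetBuilder: names and behavior are simplified assumptions.
final class ToyDocIdSetBuilder {
  private int[] buffer = new int[0];
  private int size = 0;

  /** Toy stand-in for BulkAdder: each add() call consumes one slot reserved by grow(). */
  interface ToyBulkAdder {
    void add(int docId);
  }

  /**
   * Reserves room for up to numAdds calls to add(). This is an upper bound on the number
   * of add() calls (one per matching value), not a count of unique documents.
   */
  ToyBulkAdder grow(int numAdds) {
    if (numAdds > buffer.length - size) {
      buffer = Arrays.copyOf(buffer, size + numAdds);
    }
    // Calling add() more times than reserved is a caller bug; here it may throw
    // ArrayIndexOutOfBoundsException.
    return docId -> buffer[size++] = docId;
  }

  /** Sorts the buffered doc IDs and removes duplicates coming from multi-valued docs. */
  int[] build() {
    int[] sorted = Arrays.copyOf(buffer, size);
    Arrays.sort(sorted);
    int unique = 0;
    for (int i = 0; i < sorted.length; i++) {
      if (unique == 0 || sorted[i] != sorted[unique - 1]) {
        sorted[unique++] = sorted[i];
      }
    }
    return Arrays.copyOf(sorted, unique);
  }
}
```

For example, reserving 4 calls and adding doc 3 twice (two matching values) plus docs 7 and 1 yields the three unique doc IDs `[1, 3, 7]`.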
   
   > I'm still a bit confused about why we need to grow(long) on a bitset that 
can only hold Integer.MAX_VALUE elements.
   
   This doesn't have anything to do with the `long counter` that you looked at.
   
   The point of `BulkAdder#add` is to call it every time we find a matching 
(docID, value) pair, and the number of matching pairs may be larger than 
`Integer#MAX_VALUE` (e.g. a range over a multi-valued field that matches all 
docs but one), hence the long. This is the same reason why e.g. 
`SortedSetDocValues#nextOrd` returns a long.
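A quick back-of-the-envelope check of why the pair count needs 64 bits, using hypothetical numbers: a segment with 2 billion documents, each with two values, and a range that matches the values of every document except one.

```java
// Hypothetical numbers: a multi-valued field where every document has two values and
// a range query matches every (docID, value) pair except those of a single document.
final class MatchingPairsArithmetic {
  static long matchingPairs(long maxDoc, long valuesPerDoc) {
    // all docs but one match, each contributing valuesPerDoc pairs
    return (maxDoc - 1) * valuesPerDoc;
  }

  public static void main(String[] args) {
    long pairs = matchingPairs(2_000_000_000L, 2);
    System.out.println(pairs);       // 3999999998
    System.out.println((int) pairs); // -294967298: the count does not fit in an int
  }
}
```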
   
   > in the sparse/buffer case, wouldn't a much simpler estimation simply be 
the length of int array?
   
   This is already the case today, see the `else` block in 
`DocIdSetBuilder#build`. The cost estimation logic only happens in the dense 
case when a `FixedBitSet` is used to hold the set of matching docs.
   
   FWIW we could change the estimation logic to perform a popCount over a 
subset of the `FixedBitSet` and scale it according to the size of the bitset or 
something along these lines, if we think that it would be better than tracking 
this counter and dividing it by the number of values per doc.
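A minimal sketch of the popCount-and-scale idea, assuming the bits are available as a plain `long[]` of 64-bit words. The sampling scheme (every `stride`-th word) and the name `SampledCardinality` are hypothetical choices for illustration, not what Lucene does.

```java
// Estimates the number of set bits in a dense bitset by counting bits in a sample
// of its 64-bit words and scaling the result up to the full number of words.
final class SampledCardinality {
  static long estimate(long[] words, int stride) {
    if (stride < 1) {
      throw new IllegalArgumentException("stride must be >= 1");
    }
    long sampledBits = 0;
    int sampledWords = 0;
    for (int i = 0; i < words.length; i += stride) {
      sampledBits += Long.bitCount(words[i]);
      sampledWords++;
    }
    if (sampledWords == 0) {
      return 0;
    }
    // Scale the sampled density up to the whole bitset.
    return Math.round((double) sampledBits * words.length / sampledWords);
  }
}
```

With `stride == 1` this degenerates into an exact popCount; larger strides trade accuracy for fewer words read.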
   
   > I'm also confused why we have this sorted array buffer case instead of 
using SparseFixedBitSet
   
   `SparseFixedBitSet` is the right choice for the sparse case when you need 
something that implements the `BitSet` API. Here we only need to produce a 
`DocIdSet` and buffering doc IDs into an array and sorting them using radix 
sort proved to be faster than accumulating doc IDs into a `SparseFixedBitSet`.
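The buffering approach can be sketched as follows for non-negative doc IDs. This is a simplified LSD radix sort (one byte per pass) followed by de-duplication, written for illustration only; it is not Lucene's sorter, and `BufferedDocIds` is a hypothetical name.

```java
import java.util.Arrays;

// Sorts buffered (non-negative) doc IDs with an LSD radix sort, 8 bits per pass,
// then removes duplicates so the result can back a sorted DocIdSet.
final class BufferedDocIds {
  static int[] sortedUnique(int[] buffer, int length) {
    int[] src = Arrays.copyOf(buffer, length);
    int[] dst = new int[length];
    for (int shift = 0; shift < 32; shift += 8) {
      int[] counts = new int[257];
      for (int i = 0; i < length; i++) {
        counts[((src[i] >>> shift) & 0xFF) + 1]++;
      }
      for (int b = 0; b < 256; b++) {
        counts[b + 1] += counts[b]; // counts[d] becomes the start offset of bucket d
      }
      for (int i = 0; i < length; i++) {
        dst[counts[(src[i] >>> shift) & 0xFF]++] = src[i]; // stable scatter
      }
      int[] tmp = src;
      src = dst;
      dst = tmp;
    }
    // After 4 passes (an even number of swaps) the sorted data is in src.
    int unique = 0;
    for (int i = 0; i < length; i++) {
      if (unique == 0 || src[i] != src[unique - 1]) {
        src[unique++] = src[i];
      }
    }
    return Arrays.copyOf(src, unique);
  }
}
```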


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #698: LUCENE-10429: Change how DocIdSetBuilder compute the cost of the dense iterator

2022-02-22 Thread GitBox


jpountz commented on pull request #698:
URL: https://github.com/apache/lucene/pull/698#issuecomment-1047529128


   > This is inconsistent with the #grow method, where the counter is increased 
as it expects grow to be called for documents and not values.
   
   Actually my expectation is that `grow()` is called with a number of values, 
not unique documents. Javadocs say "documents" today, which might be a source 
of confusion, but it is really an upper bound of the number of times 
`BulkAdder#add` may be called, i.e. an upper bound of the number of matching 
*values*?
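Under that reading, the counter tracks values rather than documents, which is why the dense-case estimate divides it by the average number of values per document. A minimal sketch with hypothetical names, assuming the field statistics provide the total number of values and the number of documents that have a value:

```java
// Converts a count of matching values (one per potential BulkAdder#add call) into an
// approximate number of matching documents. All names here are hypothetical.
final class CounterCostEstimate {
  static long estimateDocs(long valueCounter, long totalValues, int docsWithField) {
    if (docsWithField <= 0) {
      return 0;
    }
    double valuesPerDoc = (double) totalValues / docsWithField;
    return Math.round(valueCounter / valuesPerDoc);
  }
}
```

For instance, a field with 3,000,000 values spread over 1,000,000 documents averages 3 values per document, so a counter of 3,000 translates into an estimate of about 1,000 documents.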





[jira] [Commented] (LUCENE-10424) Optimize the "everything matches" case for count query in PointRangeQuery

2022-02-22 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17495945#comment-17495945
 ] 

Adrien Grand commented on LUCENE-10424:
---

With the linked pull request, we limit this new case to single-valued 1D 
fields, but it actually works with fields that have multiple dimensions and/or 
that are multi-valued?
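A minimal sketch of the shortcut generalized to several dimensions, under stated assumptions: values are modeled as one `long` per dimension (real point fields compare encoded `byte[]` values), range bounds are inclusive, and the segment has no deleted documents. If every document has at least one value and the query range contains the field's [min, max] in every dimension, then every value matches, so every document matches regardless of how many values it has. `EverythingMatchesCount` is a hypothetical name.

```java
// Hypothetical sketch of the "everything matches" count shortcut for N dimensions.
// Assumes inclusive bounds and no deleted documents in the segment.
final class EverythingMatchesCount {
  /** Returns maxDoc if the shortcut applies, or -1 if a full count is needed. */
  static int countIfEverythingMatches(
      int maxDoc, int docCount, long[] fieldMin, long[] fieldMax, long[] lower, long[] upper) {
    if (docCount != maxDoc) {
      return -1; // some documents have no value, so they cannot match
    }
    for (int dim = 0; dim < fieldMin.length; dim++) {
      if (lower[dim] > fieldMin[dim] || upper[dim] < fieldMax[dim]) {
        return -1; // some values may fall outside the range in this dimension
      }
    }
    return maxDoc;
  }
}
```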

> Optimize the "everything matches" case for count query in PointRangeQuery
> -
>
> Key: LUCENE-10424
> URL: https://issues.apache.org/jira/browse/LUCENE-10424
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.1
>Reporter: Lu Xugang
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In the implementation of Weight#count in PointRangeQuery, should we additionally 
> consider the case where PointValues#getDocCount() == IndexReader#maxDoc(), the 
> range's lower bound is less than the field's min value, and the range's upper 
> bound is greater than the field's max value, and in that case return 
> reader.maxDoc() directly?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] iverase commented on pull request #698: LUCENE-10429: Change how DocIdSetBuilder compute the cost of the dense iterator

2022-02-22 Thread GitBox


iverase commented on pull request #698:
URL: https://github.com/apache/lucene/pull/698#issuecomment-1047575915


   > Actually my expectation is that grow() is called with a number of values, 
not unique documents.
   
   Then isn't it wrong that it accepts an int, and shouldn't it accept a long? 
That is what Robert is complaining about.





[jira] [Created] (LUCENE-10431) AssertionError in BooleanQuery.hashCode()

2022-02-22 Thread Michael Bien (Jira)
Michael Bien created LUCENE-10431:
-

 Summary: AssertionError in BooleanQuery.hashCode()
 Key: LUCENE-10431
 URL: https://issues.apache.org/jira/browse/LUCENE-10431
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 8.11.1
Reporter: Michael Bien


Hello devs,

The constructor of BooleanQuery can, under some circumstances, trigger a hash 
code computation before "clauseSets" is fully filled. Since BooleanClause also 
uses its query field to compute its hash code, the "wrong" hash code can end up 
being stored, because adding the clause to the set triggers its hashCode().

If assertions are enabled the check in BooleanQuery, which recomputes the hash 
code, will notice it and throw an error.

exception:
{code:java}
java.lang.AssertionError
    at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:614)
    at java.base/java.util.Objects.hashCode(Objects.java:103)
    at java.base/java.util.HashMap$Node.hashCode(HashMap.java:298)
    at java.base/java.util.AbstractMap.hashCode(AbstractMap.java:527)
    at org.apache.lucene.search.Multiset.hashCode(Multiset.java:119)
    at java.base/java.util.EnumMap.entryHashCode(EnumMap.java:717)
    at java.base/java.util.EnumMap.hashCode(EnumMap.java:709)
    at java.base/java.util.Arrays.hashCode(Arrays.java:4498)
    at java.base/java.util.Objects.hash(Objects.java:133)
    at org.apache.lucene.search.BooleanQuery.computeHashCode(BooleanQuery.java:597)
    at org.apache.lucene.search.BooleanQuery.hashCode(BooleanQuery.java:611)
    at java.base/java.util.HashMap.hash(HashMap.java:340)
    at java.base/java.util.HashMap.put(HashMap.java:612)
    at org.apache.lucene.search.Multiset.add(Multiset.java:82)
    at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:154)
    at org.apache.lucene.search.BooleanQuery.<init>(BooleanQuery.java:42)
    at org.apache.lucene.search.BooleanQuery$Builder.build(BooleanQuery.java:133)

{code}

I noticed this while trying to upgrade the NetBeans Maven indexer modules from 
Lucene 5.x to 8.x: https://github.com/apache/netbeans/pull/3558
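A hypothetical, self-contained illustration (not Lucene code) of the failure mode the reporter describes: an object caches its hash code on the first `hashCode()` call, but that first call happens while the object is still being populated, so a later consistency check sees a stale value. `EagerlyHashedNode` and its "registry" set are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: hashing `this` before construction completes caches a
// hash code computed from partially filled state.
final class EagerlyHashedNode {
  private final List<String> parts = new ArrayList<>();
  private int cachedHash;
  private boolean hashComputed;

  EagerlyHashedNode(List<String> initialParts, Set<EagerlyHashedNode> registry) {
    for (String part : initialParts) {
      parts.add(part);
      // BUG: adding `this` to a hash-based set mid-construction calls hashCode()
      // while `parts` is still incomplete.
      registry.add(this);
    }
  }

  private int computeHash() {
    return parts.hashCode();
  }

  @Override
  public int hashCode() {
    if (!hashComputed) {
      cachedHash = computeHash();
      hashComputed = true;
    }
    // Mirrors the kind of consistency check that fires when assertions are enabled.
    if (cachedHash != computeHash()) {
      throw new AssertionError("cached hash code is stale");
    }
    return cachedHash;
  }
}
```

With two parts, the second `registry.add(this)` finds a hash cached from the one-element list and fails, much like the trace above fails inside `HashMap.put` during construction. The usual remedy is to finish populating the object before anything can hash it.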






[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-22 Thread GitBox


mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r811815617



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SearcherManager;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex extends QueryIndex {
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();
+    this.populateQueryCache(serializer, decomposer);

Review comment:
   Yes.
   @romseygeek Do you think it would make sense to use the purge executor here 
too?







[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-22 Thread GitBox


mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r811817151



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,100 @@
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SearcherManager;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex extends QueryIndex {
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();
+    this.populateQueryCache(serializer, decomposer);
+  }
+
+  @Override
+  public void commit(List updates) throws IOException {
+    throw new IllegalStateException("Monitor is readOnly cannot commit");
+  }
+
+  @Override
+  long search(final Query query, QueryCollector matcher) throws IOException {
+    QueryBuilder builder = termFilter -> query;
+    return search(builder, matcher);
+  }
+
+  @Override
+  public long search(QueryBuilder queryBuilder, QueryCollector matcher) throws IOException {
+    IndexSearcher searcher = null;
+    try {
+      searcher = manager.acquire();
+      return searchInMemory(queryBuilder, matcher, searcher, this.queries);
+    } finally {
+      if (searcher != null) {
+        manager.release(searcher);
+      }
+    }
+  }
+
+  @Override
+  void purgeCache(CachePopulator populator) throws IOException {
+    final ConcurrentMap newCache = new ConcurrentHashMap<>();

Review comment:
   True, but then we have to assign it to `queries`, which is in the abstract 
class and is concurrent.
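A minimal sketch of the pattern under discussion, with hypothetical class names and `String` standing in for the real cache entry type: the new cache is built off to the side and then published with a single write to a `volatile` field, so concurrent readers see either the old or the new map, never a half-built one. Creating the new map as a `ConcurrentHashMap` keeps it assignable to the field's concurrent type.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch, not the Monitor module's code: rebuild a cache and publish it
// atomically through a volatile field declared in the abstract base class.
abstract class ToyQueryIndex {
  protected volatile ConcurrentMap<String, String> queries = new ConcurrentHashMap<>();

  String lookup(String id) {
    return queries.get(id);
  }
}

final class ToyReadonlyQueryIndex extends ToyQueryIndex {
  void purgeCache(Map<String, String> storedQueries) {
    ConcurrentMap<String, String> newCache = new ConcurrentHashMap<>();
    newCache.putAll(storedQueries); // rebuild from the stored queries
    queries = newCache; // single volatile write publishes the new cache
  }
}
```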







[jira] [Commented] (LUCENE-10416) Update Korean Dictionary for Nori

2022-02-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496032#comment-17496032
 ] 

ASF subversion and git services commented on LUCENE-10416:
--

Commit c22d6d09d9b9b9d44fd88e886ed3105c5a927a63 in lucene's branch 
refs/heads/branch_9x from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c22d6d0 ]

Revert "LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 
for Nori"

This reverts commit b2b35964663bfbf2063884d7dcda6818d5b590e1.


> Update Korean Dictionary for Nori
> -
>
> Key: LUCENE-10416
> URL: https://issues.apache.org/jira/browse/LUCENE-10416
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Uihyun Kim
>Priority: Minor
> Fix For: 9.1, 10.0 (main)
>
> Attachments: LUCENE-10416.patch
>
>
> For Nori, the Korean analyzer, there is a Korean dictionary named mecab-ko-dic, 
> which is available under an Apache license here: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic]
>  
> Nori's copy of the dictionary hasn't been updated, although upstream has released 
> updates that provide better analysis results. Downloads are available here: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads]
>  * Currently used in Nori: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.0.3-20170922.tar.gz]
>  * Latest: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz]
>  
> There are changes between the currently used version and the latest release 
> version (change log: 
> [https://bitbucket.org/eunjeon/mecab-ko-dic/src/master/CHANGES.md])
>  * New feature: added semantic classes for NNG: 장소 (place), 행위 (action), 상태변화 (change of state), 정적상태 (static state)
>  * Fix: corrected an unexpectedly huge cost on NNG/장소 (place)
>  * New words
>  
> Running :lucene:analysis:nori:test and building a new binary both work without 
> issues.






[jira] [Commented] (LUCENE-10416) Update Korean Dictionary for Nori

2022-02-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496034#comment-17496034
 ] 

ASF subversion and git services commented on LUCENE-10416:
--

Commit f8040d565fc25c6b7388d9300c2cc890315bc9cd in lucene's branch 
refs/heads/main from Tomoko Uchida
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f8040d5 ]

LUCENE-10416: move changes entry to v10.0.0


> Update Korean Dictionary for Nori
> -
>
> Key: LUCENE-10416
> URL: https://issues.apache.org/jira/browse/LUCENE-10416
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Uihyun Kim
>Priority: Minor
> Fix For: 9.1, 10.0 (main)
>
> Attachments: LUCENE-10416.patch
>






[jira] [Updated] (LUCENE-10416) Update Korean Dictionary for Nori

2022-02-22 Thread Tomoko Uchida (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida updated LUCENE-10416:
---
Fix Version/s: (was: 9.1)

> Update Korean Dictionary for Nori
> -
>
> Key: LUCENE-10416
> URL: https://issues.apache.org/jira/browse/LUCENE-10416
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Uihyun Kim
>Priority: Minor
> Fix For: 10.0 (main)
>
> Attachments: LUCENE-10416.patch
>






[jira] [Commented] (LUCENE-10416) Update Korean Dictionary for Nori

2022-02-22 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496038#comment-17496038
 ] 

Tomoko Uchida commented on LUCENE-10416:


I reverted it from the 9x branch since I can't estimate its impact. It would be 
easy to backport it to 9x again; let me know if you'd like to have this in 
9.1.

> Update Korean Dictionary for Nori
> -
>
> Key: LUCENE-10416
> URL: https://issues.apache.org/jira/browse/LUCENE-10416
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Uihyun Kim
>Priority: Minor
> Fix For: 10.0 (main)
>
> Attachments: LUCENE-10416.patch
>






[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-22 Thread GitBox


mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r811871746



##
File path: 
lucene/monitor/src/test/org/apache/lucene/monitor/TestMonitorReadonly.java
##
@@ -0,0 +1,165 @@
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import java.util.Collections;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.index.IndexNotFoundException;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.store.FSDirectory;
+import org.junit.Test;
+
+public class TestMonitorReadonly extends MonitorTestBase {
+  private static final Analyzer ANALYZER = new WhitespaceAnalyzer();
+
+  @Test
+  public void testReadonlyMonitorThrowsOnInexistentIndex() {
+    Path indexDirectory = createTempDir();
+    MonitorConfiguration config =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse),
+                true);
+    assertThrows(
+        IndexNotFoundException.class,
+        () -> {
+          new Monitor(ANALYZER, config);
+        });
+  }
+
+  @Test
+  public void testReadonlyMonitorThrowsWhenCallingWriteRequests() throws IOException {
+    Path indexDirectory = createTempDir();
+    MonitorConfiguration writeConfig =
+        new MonitorConfiguration()
+            .setIndexPath(
+                indexDirectory, MonitorQuerySerializer.fromParser(MonitorTestBase::parse));
+
+    // this will create the index
+    Monitor writeMonitor = new Monitor(ANALYZER, writeConfig);
+    writeMonitor.close();
+
+    MonitorConfiguration config =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse),
+                true);
+    try (Monitor monitor = new Monitor(ANALYZER, config)) {
+      assertThrows(
+          IllegalStateException.class,
+          () -> {
+            TermQuery query = new TermQuery(new Term(FIELD, "test"));
+            monitor.register(
+                new MonitorQuery("query1", query, query.toString(), Collections.emptyMap()));
+          });
+
+      assertThrows(
+          IllegalStateException.class,
+          () -> {
+            monitor.deleteById("query1");
+          });
+
+      assertThrows(
+          IllegalStateException.class,
+          () -> {
+            monitor.clear();
+          });
+    }
+  }
+
+  @Test
+  public void testSettingCustomDirectory() throws IOException {
+    Path indexDirectory = createTempDir();
+    Document doc = new Document();
+    doc.add(newTextField(FIELD, "This is a Foobar test document", Field.Store.NO));
+
+    MonitorConfiguration writeConfig =
+        new MonitorConfiguration()
+            .setDirectoryProvider(
+                () -> FSDirectory.open(indexDirectory),
+                MonitorQuerySerializer.fromParser(MonitorTestBase::parse));
+
+    try (Monitor writeMonitor = new Monitor(ANALYZER, writeConfig)) {
+      TermQuery query = new TermQuery(new Term(FIELD, "test"));
+      writeMonitor.register(
+          new MonitorQuery("query1", query, query.toString(), Collections.emptyMap()));
+      TermQuery query2 = new TermQuery(new Term(FIELD, "Foobar"));
+      writeMonitor.register(
+          new MonitorQuery("query2", query2, query.toString(), Collections.emptyMap()));
+      MatchingQueries matches = writeMonitor.match(doc, QueryMatch.SIMPLE_MATCHER);
+      assertNotNull(matches.getMatches());
+      assertEquals(2, matches.getMatchCount());
+      assertNotNull(matches.matches("query2"));
+    }
+  }
+
+  public void testMonitorReadOnlyCouldReadOnTheSameIndex() throws IOException {
+    Path indexDirectory = createTempDir();
+    Document doc = new Document();
+    doc.add(newTextField(FIELD, "This is a te

[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047729980


   > In my opinion the API as it is today isn't bad. The only thing we might 
want to change is to make `DocIdSetBuilder#grow` take a long instead of an int.
   
   I've really tried, I think I have to just give up. Having a `grow(long)` on 
something with `DocIdSet` in its name is beyond bad, it is terrible.
   
   Please, please, please don't make this change to take a long.
   
   > > I'm still a bit confused about why we need to grow(long) on a bitset 
that can only hold Integer.MAX_VALUE elements.
   > 
   > This doesn't have anything to do with the `long counter` that you looked 
at.
   > 
   > The point of `BulkAdder#add` is to call it every time we find a matching 
(docID, value) pair, and the number of matching pairs may be larger than 
`Integer#MAX_VALUE` (e.g. a range over a multi-valued field that matches all 
docs but one), hence the long. This is the same reason why e.g. 
`SortedSetDocValues#nextOrd` returns a long.
   
   Sure it does. I'm looking at the only code using the 64-bit value, and 
that's the `counter`.





[GitHub] [lucene] jpountz commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


jpountz commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047748712


   > Having a grow(long) on something with DocIdSet in its name is beyond bad, 
it is terrible.
   
   Would it look better if we gave it a different name that doesn't suggest 
that it relates to the number of docs in the set, e.g. `prepareAdd` or 
something along these lines?
   
   > Please, please, please don't make this change to take a long.
   
   I have a preference for making it a long but I'm ok with keeping it an 
integer. The downside is that it pushes the problem to callers, which need to 
make sure that they never add more than `Integer.MAX_VALUE` documents with the 
same `BulkAdder`.
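A hypothetical sketch of that caller-side burden: a caller that might need more than `Integer.MAX_VALUE` `add` calls has to split its reservation into int-sized chunks and obtain a new adder for each one. `ChunkedGrow` is an illustrative name, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Splits a long reservation into chunks that each fit in an int, so that no single
// adder is asked to accept more than Integer.MAX_VALUE add() calls.
final class ChunkedGrow {
  static List<Integer> chunks(long totalAdds) {
    if (totalAdds < 0) {
      throw new IllegalArgumentException("totalAdds must be >= 0");
    }
    List<Integer> result = new ArrayList<>();
    long remaining = totalAdds;
    while (remaining > 0) {
      int chunk = (int) Math.min(remaining, Integer.MAX_VALUE);
      result.add(chunk); // the caller would call grow(chunk) for each entry
      remaining -= chunk;
    }
    return result;
  }
}
```

For example, 3,999,999,998 reservations become two chunks, `2147483647` and `1852516351`, each needing its own `grow` call.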





[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047750276


   [Screenshot: https://user-images.githubusercontent.com/504194/155133007-71ec1d81-a2bd-485d-b7e6-17a10cd78fdf.png]
   
   I've uploaded a screenshot here showing that the only thing using 64 bits is this 
stupid `counter`. Guys, we really have to agree on this simple fact to proceed. 
It is a fact!





[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047752932


   Yeah, there seems to be some disagreement about what the code is actually 
doing, probably because it is too confusing. I recommend (as I did before) 
temporarily removing `counter` and cost estimation from here. Then you will see 
that 64 bits are not needed.





[GitHub] [lucene] iverase commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


iverase commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047752116


   I don't understand all this discussion. Looking at the cost of a 
DocIdSetIterator:
   
   ```java
   /**
    * Returns the estimated cost of this {@link DocIdSetIterator}.
    *
    * This is generally an upper bound of the number of documents this iterator might match, but
    * may be a rough heuristic, hardcoded value, or otherwise completely inaccurate.
    */
   public abstract long cost();
   ```
   
   Why is a long OK here? I think the dance we are doing in the BKD reader 
when we are visiting more than Integer.MAX_VALUE documents is wrong and should 
be fixed. 





[GitHub] [lucene] iverase edited a comment on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


iverase edited a comment on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047752116


   I don't understand all this discussion. Looking at the cost of a 
DocIdSetIterator:
   
   ```java
   /**
    * Returns the estimated cost of this {@link DocIdSetIterator}.
    *
    * This is generally an upper bound of the number of documents this iterator might match, but
    * may be a rough heuristic, hardcoded value, or otherwise completely inaccurate.
    */
   public abstract long cost();
   ```
   
   Why is a long OK here? I think the dance we are doing in the BKD reader 
when we are visiting more than Integer.MAX_VALUE ~documents~ points is wrong 
and should be fixed. 





[GitHub] [lucene] iverase commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


iverase commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047786510


   If you go a bit higher up in that class:
   
   https://user-images.githubusercontent.com/29038686/155139791-fb87fedb-22a0-44a7-86a6-60b6af84f177.png
   
   Are we throwing away 32 bits there now? 





[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047788256


   it's fine to do that since only 32 bits are needed.
   
   nothing uses 64 bits here, hence changing the API signature to a `long` is 
wrong.
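Whether dropping the upper 32 bits is harmless depends only on whether the value can ever exceed `Integer.MAX_VALUE`. The generic Java behaviour, independent of the Lucene code in the screenshots, is sketched below: a plain `(int)` cast silently keeps the low 32 bits, while `Math.toIntExact` throws `ArithmeticException` instead. `NarrowingDemo` is an illustrative name, not a Lucene class.

```java
/** Shows what discarding the upper 32 bits of a long does in plain Java. */
class NarrowingDemo {

  /** Plain cast: silently keeps only the low 32 bits. */
  static int truncate(long value) {
    return (int) value;
  }

  /** Checked conversion: throws ArithmeticException if the value does not fit in an int. */
  static int checked(long value) {
    return Math.toIntExact(value);
  }

  public static void main(String[] args) {
    System.out.println(truncate(1_000_000L)); // fits: unchanged
    System.out.println(truncate(3_000_000_000L)); // too large: wraps to a negative number
    try {
      checked(3_000_000_000L);
    } catch (ArithmeticException e) {
      System.out.println("does not fit in an int");
    }
  }
}
```

For example, `(int) 3_000_000_000L` evaluates to `-1294967296`, because only the low 32 bits survive.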





[GitHub] [lucene] rmuir commented on pull request #692: LUCENE-10311: Different implementations of DocIdSetBuilder for points and terms

2022-02-22 Thread GitBox


rmuir commented on pull request #692:
URL: https://github.com/apache/lucene/pull/692#issuecomment-1047789904


   Seriously, let's remove this `counter` and cost estimation. @jpountz tells 
me I am wrong, but you can plainly see from the code that this issue is all about 
that. Everything else is only using 32 bits.
   
   If we remove the silly `counter` and bad cost estimator, it will be clear 
that adding a `long` to this API is not needed: nothing needs the extra 32 
bits, nothing uses the extra 32 bits!
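The disagreement comes down to two quantities with different ranges. Per jpountz's earlier explanation in this thread, `counter` tracks matching (docID, value) pairs and is divided by the number of values per document to estimate cost. The hypothetical sketch below (not Lucene's code; `PairCountEstimate`, `estimateDocs` and its parameters are invented) shows the asymmetry: the pair count needs 64 bits, but the resulting document estimate is bounded by `maxDoc` and fits in an `int`.

```java
/**
 * Hypothetical sketch, not Lucene's implementation: estimate the number of matching documents
 * from a count of matching (docID, value) pairs.
 */
class PairCountEstimate {

  /**
   * @param pairCount matching (docID, value) pairs seen so far; may exceed Integer.MAX_VALUE
   * @param totalValues total number of values indexed for the field (assumed > 0)
   * @param docsWithField number of documents with at least one value (assumed > 0)
   * @param maxDoc number of documents in the segment
   * @return estimated number of matching documents, never more than maxDoc
   */
  static int estimateDocs(long pairCount, long totalValues, int docsWithField, int maxDoc) {
    // Average number of values per document for this field.
    double valuesPerDoc = (double) totalValues / docsWithField;
    // The input needs 64 bits, but the quotient is a document count.
    long estimate = (long) Math.ceil(pairCount / valuesPerDoc);
    // A document count is bounded by maxDoc, so the result always fits in an int.
    return (int) Math.min(estimate, maxDoc);
  }

  public static void main(String[] args) {
    // 3 billion pairs on a field averaging 4 values per document.
    System.out.println(
        estimateDocs(3_000_000_000L, 4_000_000_000L, 1_000_000_000, 1_000_000_000));
  }
}
```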





[jira] [Commented] (LUCENE-10424) Optimize the "everything matches" case for count query in PointRangeQuery

2022-02-22 Thread Lu Xugang (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496168#comment-17496168
 ] 

Lu Xugang commented on LUCENE-10424:


??but it actually works with fields that have multiple dimensions and/or that 
are multi-valued??
Yes, but I am not sure why the implementation of Weight#count only considers 
the 1D case; it seems the count query could also work on multiple dimensions. 
Please tell me if I missed something.

??we limit this new case to single-valued 1D fields??
If so, maybe we should support multiple dimensions in Weight#count?


> Optimize the "everything matches" case for count query in PointRangeQuery
> -
>
> Key: LUCENE-10424
> URL: https://issues.apache.org/jira/browse/LUCENE-10424
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.1
>Reporter: Lu Xugang
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In the implementation of Weight#count in PointRangeQuery, should we also 
> handle the case where PointValues#getDocCount() == IndexReader#maxDoc(), the 
> range's lower bound is less than the field's min value, and the range's upper 
> bound is greater than the field's max value, and return reader.maxDoc() 
> directly?
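The proposed shortcut can be sketched as below, under stated assumptions: `EverythingMatches.countIfAllMatch` is a hypothetical helper, values are plain `long`s per dimension rather than Lucene's encoded bytes, range bounds are inclusive, and the segment is assumed to have no deleted documents (otherwise `maxDoc` would over-count). The `-1` return mirrors `Weight#count`'s convention for "count not available cheaply". In this simplified model the check does not depend on the number of dimensions or on whether the field is multi-valued, which is the point raised in the comment above; it does not explain why the existing code was restricted.

```java
/**
 * Hypothetical sketch of the "everything matches" shortcut from LUCENE-10424. Points are
 * modelled as one long per dimension; Lucene compares encoded byte[] values instead.
 */
class EverythingMatches {

  /**
   * Returns maxDoc when every document is guaranteed to match, or -1 when the shortcut does
   * not apply and the count has to be computed some other way.
   */
  static int countIfAllMatch(
      int docCount,
      int maxDoc,
      boolean hasDeletions,
      long[] lower,
      long[] upper,
      long[] fieldMin,
      long[] fieldMax) {
    // Deletions would make maxDoc an over-count; missing values mean some docs cannot match.
    if (hasDeletions || docCount != maxDoc) {
      return -1;
    }
    for (int dim = 0; dim < lower.length; dim++) {
      // Inclusive bounds: the query box must contain the field's range in every dimension.
      if (lower[dim] > fieldMin[dim] || upper[dim] < fieldMax[dim]) {
        return -1;
      }
    }
    // Every value of every document lies inside the query box, so every document matches.
    return maxDoc;
  }

  public static void main(String[] args) {
    long[] lower = {0L};
    long[] upper = {100L};
    System.out.println(
        countIfAllMatch(50, 50, false, lower, upper, new long[] {10L}, new long[] {90L}));
  }
}
```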



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-10424) Optimize the "everything matches" case for count query in PointRangeQuery

2022-02-22 Thread Lu Xugang (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496168#comment-17496168
 ] 

Lu Xugang edited comment on LUCENE-10424 at 2/22/22, 3:22 PM:
--

{quote}but it actually works with fields that have multiple dimensions and/or 
that are multi-valued{quote}
Yes, but I am not sure why the implementation of Weight#count only considers 
the 1D case; it seems the count query could also work on multiple dimensions. 
Please tell me if I missed something.

{quote}we limit this new case to single-valued 1D fields{quote}
If so, maybe we should support multiple dimensions in Weight#count?



was (Author: chrislu):
??but it actually works with fields that have multiple dimensions and/or that 
are multi-valued??
Yes, but I am not sure why the implementation of Weight#count only considers 
the 1D case; it seems the count query could also work on multiple dimensions. 
Please tell me if I missed something.

??we limit this new case to single-valued 1D fields??
If so, maybe we should support multiple dimensions in Weight#count?








[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-22 Thread GitBox


mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r811871746



##
File path: 
lucene/monitor/src/test/org/apache/lucene/monitor/TestMonitorReadonly.java
##
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import java.util.Collections;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.Field;
+import org.apache.lucene.index.IndexNotFoundException;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.store.FSDirectory;
+import org.junit.Test;
+
+public class TestMonitorReadonly extends MonitorTestBase {
+  private static final Analyzer ANALYZER = new WhitespaceAnalyzer();
+
+  @Test
+  public void testReadonlyMonitorThrowsOnInexistentIndex() {
+Path indexDirectory = createTempDir();
+MonitorConfiguration config =
+new MonitorConfiguration()
+.setDirectoryProvider(
+() -> FSDirectory.open(indexDirectory),
+MonitorQuerySerializer.fromParser(MonitorTestBase::parse),
+true);
+assertThrows(
+IndexNotFoundException.class,
+() -> {
+  new Monitor(ANALYZER, config);
+});
+  }
+
+  @Test
+  public void testReadonlyMonitorThrowsWhenCallingWriteRequests() throws IOException {
+Path indexDirectory = createTempDir();
+MonitorConfiguration writeConfig =
+new MonitorConfiguration()
+.setIndexPath(
+indexDirectory, MonitorQuerySerializer.fromParser(MonitorTestBase::parse));
+
+// this will create the index
+Monitor writeMonitor = new Monitor(ANALYZER, writeConfig);
+writeMonitor.close();
+
+MonitorConfiguration config =
+new MonitorConfiguration()
+.setDirectoryProvider(
+() -> FSDirectory.open(indexDirectory),
+MonitorQuerySerializer.fromParser(MonitorTestBase::parse),
+true);
+try (Monitor monitor = new Monitor(ANALYZER, config)) {
+  assertThrows(
+  IllegalStateException.class,
+  () -> {
+TermQuery query = new TermQuery(new Term(FIELD, "test"));
+monitor.register(
+new MonitorQuery("query1", query, query.toString(), Collections.emptyMap()));
+  });
+
+  assertThrows(
+  IllegalStateException.class,
+  () -> {
+monitor.deleteById("query1");
+  });
+
+  assertThrows(
+  IllegalStateException.class,
+  () -> {
+monitor.clear();
+  });
+}
+  }
+
+  @Test
+  public void testSettingCustomDirectory() throws IOException {
+Path indexDirectory = createTempDir();
+Document doc = new Document();
+doc.add(newTextField(FIELD, "This is a Foobar test document", Field.Store.NO));
+
+MonitorConfiguration writeConfig =
+new MonitorConfiguration()
+.setDirectoryProvider(
+() -> FSDirectory.open(indexDirectory),
+MonitorQuerySerializer.fromParser(MonitorTestBase::parse));
+
+try (Monitor writeMonitor = new Monitor(ANALYZER, writeConfig)) {
+  TermQuery query = new TermQuery(new Term(FIELD, "test"));
+  writeMonitor.register(
+  new MonitorQuery("query1", query, query.toString(), Collections.emptyMap()));
+  TermQuery query2 = new TermQuery(new Term(FIELD, "Foobar"));
+  writeMonitor.register(
+  new MonitorQuery("query2", query2, query2.toString(), Collections.emptyMap()));
+  MatchingQueries matches = writeMonitor.match(doc, QueryMatch.SIMPLE_MATCHER);
+  assertNotNull(matches.getMatches());
+  assertEquals(2, matches.getMatchCount());
+  assertNotNull(matches.matches("query2"));
+}
+  }
+
+  public void testMonitorReadOnlyCouldReadOnTheSameIndex() throws IOException {
+Path indexDirectory = createTempDir();
+Document doc = new Document();
+doc.add(newTextField(FIELD, "This is a te

[jira] [Resolved] (LUCENE-10412) Improve handling of MatchNoDocsQuery in rewrite rules

2022-02-22 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-10412.
---
Fix Version/s: 9.1
   Resolution: Fixed

> Improve handling of MatchNoDocsQuery in rewrite rules
> -
>
> Key: LUCENE-10412
> URL: https://issues.apache.org/jira/browse/LUCENE-10412
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Having MatchNoDocsQuery in your query tree usually doesn't make the query 
> slower, but by recognizing it in rewrite rules, we could perform rewrites 
> which would then sometimes unlock other rewrite rules.
> For instance if you have a boolean query with 2 should clauses where one is a 
> MatchAllDocsQuery and the other one is a MatchNoDocsQuery, we would naively 
> run it as a disjunction today, while we could rewrite it to a 
> MatchAllDocsQuery and leverage its specialized bulk scorer.
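The rewrite described in this issue can be illustrated with a toy query tree. The types below (`Q`, `MatchAll`, `MatchNone`, `Term`, `Or`, `NoMatchRewrite`) are invented for the sketch and are not Lucene's `BooleanQuery` API; scoring and boosts are ignored. Dropping a match-nothing clause from a disjunction is safe because it can never contribute a hit, and a single remaining clause can then replace the disjunction, which is what "unlocks" the further rewrite to match-all.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy query model, invented for this sketch. */
interface Q {}

record MatchAll() implements Q {}

record MatchNone() implements Q {}

record Term(String field, String text) implements Q {}

/** A pure disjunction: a document matches if any SHOULD clause matches. */
record Or(List<Q> should) implements Q {}

class NoMatchRewrite {

  /** Removes clauses that match nothing from disjunctions, then simplifies what is left. */
  static Q rewrite(Q query) {
    if (!(query instanceof Or or)) {
      return query;
    }
    List<Q> kept = new ArrayList<>();
    for (Q clause : or.should()) {
      Q rewritten = rewrite(clause);
      // A clause that matches no documents can never add a hit to a disjunction.
      if (!(rewritten instanceof MatchNone)) {
        kept.add(rewritten);
      }
    }
    if (kept.isEmpty()) {
      return new MatchNone();
    }
    if (kept.size() == 1) {
      // A single-clause disjunction matches exactly what its clause matches.
      return kept.get(0);
    }
    return new Or(kept);
  }

  public static void main(String[] args) {
    Q query = new Or(List.of(new MatchAll(), new MatchNone()));
    System.out.println(rewrite(query)); // rewritten to the MatchAll clause
  }
}
```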






[GitHub] [lucene] mogui commented on pull request #679: Monitor Improvements LUCENE-10422

2022-02-22 Thread GitBox


mogui commented on pull request #679:
URL: https://github.com/apache/lucene/pull/679#issuecomment-1047984458


   @romseygeek I think I have fixed everything, and I also added a few lines of docs to 
explain the read-only behaviour.





[jira] [Created] (LUCENE-10432) Add optional 'name' property to org.apache.lucene.search.Explanation

2022-02-22 Thread Andriy Redko (Jira)
Andriy Redko created LUCENE-10432:
-

 Summary: Add optional 'name' property to 
org.apache.lucene.search.Explanation 
 Key: LUCENE-10432
 URL: https://issues.apache.org/jira/browse/LUCENE-10432
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Andriy Redko


Right now, the `Explanation` class has the `description` property, which is used 
pretty much as a placeholder for a free-style, human-readable summary of what is 
happening. This is totally fine, but it would be great to have a somewhat more formal 
way to link the explanation with the corresponding function / query / filter, if 
supported by the underlying engine.

Example: OpenSearch / Elasticsearch has the concept of named queries / filters 
[1]. This is not supported by Apache Lucene at the moment, but it would be 
helpful to propagate this information back as part of the Explanation tree, for 
example by introducing an optional 'name' property:

 
{noformat}
{
    "value": 0.0,
    "description": "script score function, computed with script: ...",
    "name": "script1",
    "details": [
        {
            "value": 1.0,
            "description": "_score: ",
            "details": [
                {
                    "value": 1.0,
                    "description": "*:*",
                    "details": []
                }
            ]
        }
    ]
}{noformat}
 

On the other hand, the `name` property may not seem to belong here; an 
alternative suggestion would be to add support for a `properties` / `parameters` 
/ `tags` key/value bag, for example:

 
{noformat}
{
    "value": 0.0,
    "description": "script score function, computed with script: ...",
    "tags": [
        { "name": "script1" }
    ],
    "details": [
        {
            "value": 1.0,
            "description": "_score: ",
            "details": [
                {
                    "value": 1.0,
                    "description": "*:*",
                    "details": []
                }
            ]
        }
    ]
}{noformat}
The change should be non-breaking but quite useful for engines to enrich the 
`Explanation` with additional context.

[1] 
https://www.elastic.co/guide/en/elasticsearch/reference/7.16/query-dsl-bool-query.html#named-queries
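A minimal sketch of the first option (an optional name next to the description) might look like this. `NamedExplanation` is a hypothetical record, not Lucene's `org.apache.lucene.search.Explanation`, and `toJson` does no string escaping; it only shows that the name can be left out of the output when absent, so unnamed nodes render exactly as before. The `tags` alternative would replace `Optional<String>` with a `Map<String, String>`.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical shape for the LUCENE-10432 proposal: an explanation node with an optional name.
 * This is not Lucene's Explanation class.
 */
record NamedExplanation(
    float value, String description, Optional<String> name, List<NamedExplanation> details) {

  /** Renders the tree as JSON (no escaping; illustration only). "name" is omitted when empty. */
  String toJson() {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"value\": ").append(value);
    sb.append(", \"description\": \"").append(description).append('"');
    name.ifPresent(n -> sb.append(", \"name\": \"").append(n).append('"'));
    sb.append(", \"details\": [");
    for (int i = 0; i < details.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(details.get(i).toJson());
    }
    return sb.append("]}").toString();
  }

  public static void main(String[] args) {
    NamedExplanation matchAll = new NamedExplanation(1.0f, "*:*", Optional.empty(), List.of());
    NamedExplanation script =
        new NamedExplanation(
            0.0f, "script score function", Optional.of("script1"), List.of(matchAll));
    System.out.println(script.toJson());
  }
}
```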

 






[jira] [Commented] (LUCENE-10432) Add optional 'name' property to org.apache.lucene.search.Explanation

2022-02-22 Thread Andriy Redko (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496292#comment-17496292
 ] 

Andriy Redko commented on LUCENE-10432:
---

[~jpountz] my apologies for pinging you directly; I'm curious whether this small 
improvement makes sense before doing any work on a pull request. Thank you!

> Add optional 'name' property to org.apache.lucene.search.Explanation 
> -
>
> Key: LUCENE-10432
> URL: https://issues.apache.org/jira/browse/LUCENE-10432
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Andriy Redko
>Priority: Minor
>
> Right now, the `Explanation` class has the `description` property, which is 
> used pretty much as a placeholder for a free-style, human-readable summary of 
> what is happening. This is totally fine, but it would be great to have a 
> somewhat more formal way to link the explanation with the corresponding 
> function / query / filter, if supported by the underlying engine.
> Example: OpenSearch / Elasticsearch has the concept of named queries / filters 
> [1]. This is not supported by Apache Lucene at the moment, but it would be 
> helpful to propagate this information back as part of the Explanation tree, for 
> example by introducing an optional 'name' property:
>  
> {noformat}
> {
>     "value": 0.0,
>     "description": "script score function, computed with script: ...",
>     "name": "script1",
>     "details": [
>         {
>             "value": 1.0,
>             "description": "_score: ",
>             "details": [
>                 {
>                     "value": 1.0,
>                     "description": "*:*",
>                     "details": []
>                 }
>             ]
>         }
>     ]
> }{noformat}
>  
> On the other hand, the `name` property may not seem to belong here; an 
> alternative suggestion would be to add support for a `properties` / 
> `parameters` / `tags` key/value bag, for example:
>  
> {noformat}
> {
>     "value": 0.0,
>     "description": "script score function, computed with script: ...",
>     "tags": [
>         { "name": "script1" }
>     ],
>     "details": [
>         {
>             "value": 1.0,
>             "description": "_score: ",
>             "details": [
>                 {
>                     "value": 1.0,
>                     "description": "*:*",
>                     "details": []
>                 }
>             ]
>         }
>     ]
> }{noformat}
> The change should be non-breaking but quite useful for engines to enrich the 
> `Explanation` with additional context.
> [1] 
> https://www.elastic.co/guide/en/elasticsearch/reference/7.16/query-dsl-bool-query.html#named-queries
>  






[GitHub] [lucene-solr] andywebb1975 opened a new pull request #2643: Make Config API work for warming queries

2022-02-22 Thread GitBox


andywebb1975 opened a new pull request #2643:
URL: https://github.com/apache/lucene-solr/pull/2643


   This is my attempt at resolving 
https://issues.apache.org/jira/browse/SOLR-9359. It's still very much a 
work in progress, hence all the debug output etc., but if anyone has thoughts on 
it, please let me know.
   
   Is there a better way to do this without all the 
`getClass()`/`instanceof` checking?
   
   With this patch in place it becomes possible to send `add/update-listener` 
commands to the Config API like this, and they take effect as expected rather 
than throwing a `ClassCastException`:
   
   ```json
   {
     "update-listener": {
       "name": "warming-queries",
       "event": "newSearcher",
       "class": "solr.QuerySenderListener",
       "queries": [
         [
           {
             "q": "foo"
           },
           {
             "q": "bar"
           }
         ]
       ]
     }
   }
   ```
   
   Note the nested array: without it, only the first query in the list is 
picked up; the rest don't appear in the `getArgs().get("queries")` response at 
all. I don't know if that's fixable, but I suspect it would require more widespread 
changes, so I've steered clear of that thus far.
   
   (Also, this class is virtually the same in the new Solr repo; I'd raise a 
PR for that too.)





[jira] [Created] (LUCENE-10433) we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()

2022-02-22 Thread kkewwei (Jira)
kkewwei created LUCENE-10433:


 Summary: we should pass l instead of d to 
getFallbackSelector(d).select in RadixSelector.select()
 Key: LUCENE-10433
 URL: https://issues.apache.org/jira/browse/LUCENE-10433
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 8.6.2
Reporter: kkewwei


In `RadixSelector.select`:
{code:java}
  private void select(int from, int to, int k, int d, int l) {
if (to - from <= LENGTH_THRESHOLD || d >= LEVEL_THRESHOLD) { 
  getFallbackSelector(d).select(from, to, k); 
} else {
  radixSelect(from, to, k, d, l); 
}
  }
{code}
We know that `l` represents the level of recursion, not `d`, but when we 
check the level of recursion, we use `d >= LEVEL_THRESHOLD`, which is not right.
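To see why the distinction matters, the two conditions can be compared side by side. This is a toy model, not `RadixSelector` itself: it assumes, following the report, that `d` is the byte offset into the keys and `l` the recursion level, and that `d` can advance by several bytes in one step (for example when a shared prefix is skipped) while `l` stays the same. `RadixFallbackCheck` is an invented name, and the threshold values are assumptions chosen for illustration.

```java
/** Toy comparison of the fallback condition keyed on d (as written) versus l (as proposed). */
class RadixFallbackCheck {
  // Assumed values for illustration only; see RadixSelector for the real constants.
  static final int LENGTH_THRESHOLD = 100;
  static final int LEVEL_THRESHOLD = 8;

  /** Condition as in the quoted snippet: compares the byte offset d to LEVEL_THRESHOLD. */
  static boolean fallbackByOffset(int from, int to, int d) {
    return to - from <= LENGTH_THRESHOLD || d >= LEVEL_THRESHOLD;
  }

  /** Condition the report argues for: compares the recursion level l to LEVEL_THRESHOLD. */
  static boolean fallbackByLevel(int from, int to, int l) {
    return to - from <= LENGTH_THRESHOLD || l >= LEVEL_THRESHOLD;
  }

  public static void main(String[] args) {
    // 1000 entries, first recursion level, but already 10 bytes into the keys.
    int from = 0, to = 1000, d = 10, l = 1;
    System.out.println("by offset: " + fallbackByOffset(from, to, d));
    System.out.println("by level:  " + fallbackByLevel(from, to, l));
  }
}
```

With `d = 10` and `l = 1` the two versions disagree: the first falls back to the other selector after a single level of recursion, while the second keeps radix-selecting.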






[jira] [Updated] (LUCENE-10433) we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()

2022-02-22 Thread kkewwei (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kkewwei updated LUCENE-10433:
-
Component/s: core/other

> we should pass l instead of d to getFallbackSelector(d).select in 
> RadixSelector.select()
> 
>
> Key: LUCENE-10433
> URL: https://issues.apache.org/jira/browse/LUCENE-10433
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: 8.6.2
>Reporter: kkewwei
>Priority: Major
>
> In `RadixSelector.select`:
> {code:java}
>   private void select(int from, int to, int k, int d, int l) {
> if (to - from <= LENGTH_THRESHOLD || d >= LEVEL_THRESHOLD) { 
>   getFallbackSelector(d).select(from, to, k); 
> } else {
>   radixSelect(from, to, k, d, l); 
> }
>   }
> {code}
> We know that `l` represents the level of recursion, not `d`, but when we 
> check the level of recursion, we use `d >= LEVEL_THRESHOLD`, which is not right.






[jira] [Resolved] (LUCENE-10433) we should pass l instead of d to getFallbackSelector(d).select in RadixSelector.select()

2022-02-22 Thread kkewwei (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kkewwei resolved LUCENE-10433.
--
Resolution: Resolved







[GitHub] [lucene] jtibshirani opened a new pull request #699: LUCENE-10054: Make sure to use Lucene90 codec in unit tests

2022-02-22 Thread GitBox


jtibshirani opened a new pull request #699:
URL: https://github.com/apache/lucene/pull/699


   Previously we were using the default Lucene91 codec, so we weren't exercising the
   old format.





[GitHub] [lucene] jtibshirani opened a new pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-22 Thread GitBox


jtibshirani opened a new pull request #700:
URL: https://github.com/apache/lucene/pull/700


   The original PR that added kNN filtering support overlooked non-default codecs.
   This follow-up ensures that other codecs work with the new filtering logic:
   * Make sure to check the visited nodes limit in `SimpleTextKnnVectorsReader`
     and `Lucene90HnswVectorsReader`
   * Add a test to `BaseKnnVectorsFormatTestCase` to cover this case
   * Fix failures in `TestKnnVectorQuery#testRandomWithFilter`, whose assumptions
     don't hold when SimpleText is used
   
   This PR also clarifies the limit checking logic for
   `Lucene91HnswVectorsReader`. Now we always check the limit before visiting a
   new node, whereas before we only checked it in an outer loop.
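The difference between checking the limit in an outer loop and checking it before every new node can be shown with a plain breadth-first traversal. This is not HNSW search and not Lucene code; `LimitedTraversal`, `traverse`, `Result` and `visitedLimit` are names invented for the sketch. Because the check sits inside the neighbour loop, a node with many neighbours cannot push the visited count past the limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

/** Toy breadth-first traversal with a cap on the number of visited nodes (not HNSW). */
class LimitedTraversal {

  /** Nodes visited in order, and whether the limit stopped the traversal early. */
  record Result(List<Integer> visited, boolean incomplete) {}

  static Result traverse(int[][] neighbors, int entryPoint, int visitedLimit) {
    List<Integer> visited = new ArrayList<>();
    if (visitedLimit <= 0) {
      return new Result(visited, true);
    }
    BitSet seen = new BitSet(neighbors.length);
    Deque<Integer> queue = new ArrayDeque<>();
    seen.set(entryPoint);
    visited.add(entryPoint);
    queue.add(entryPoint);
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int next : neighbors[node]) {
        if (seen.get(next)) {
          continue;
        }
        // Check the limit right before visiting each new node, instead of once per
        // iteration of the outer loop, so a high-degree node cannot overshoot it.
        if (visited.size() >= visitedLimit) {
          return new Result(visited, true);
        }
        seen.set(next);
        visited.add(next);
        queue.add(next);
      }
    }
    return new Result(visited, false);
  }

  public static void main(String[] args) {
    // Star graph: node 0 is linked to nodes 1..5.
    int[][] star = {{1, 2, 3, 4, 5}, {0}, {0}, {0}, {0}, {0}};
    System.out.println(traverse(star, 0, 3));
  }
}
```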





[GitHub] [lucene] jtibshirani commented on pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-22 Thread GitBox


jtibshirani commented on pull request #700:
URL: https://github.com/apache/lucene/pull/700#issuecomment-1048394775


   This will fix the nightly test failures. Example repro:
   
   ```
   ./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter -Dtests.seed=C4BEEB7EDCFB4E6C -Dtests.slow=true
   ```
   





[jira] [Created] (LUCENE-10434) Improve handling of DocValuesRangeQuery in rewrite rules

2022-02-22 Thread Lu Xugang (Jira)
Lu Xugang created LUCENE-10434:
--

 Summary: Improve handling of DocValuesRangeQuery in rewrite rules
 Key: LUCENE-10434
 URL: https://issues.apache.org/jira/browse/LUCENE-10434
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Lu Xugang


Since DocValuesFieldExistsQuery's rewrite rule has been implemented in 
[LUCENE-10084|https://issues.apache.org/jira/browse/LUCENE-10084], maybe 
queries that rewrite to DocValuesFieldExistsQuery should be rewritten further?






[jira] [Resolved] (LUCENE-10434) Improve handling of DocValuesRangeQuery in rewrite rules

2022-02-22 Thread Lu Xugang (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xugang resolved LUCENE-10434.

Resolution: Not A Problem

> Improve handling of DocValuesRangeQuery in rewrite rules
> 
>
> Key: LUCENE-10434
> URL: https://issues.apache.org/jira/browse/LUCENE-10434
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Priority: Minor
>
> Since DocValuesFieldExistsQuery's rewrite rule has been implemented in 
> [LUCENE-10084|https://issues.apache.org/jira/browse/LUCENE-10084], maybe 
> queries that rewrite to DocValuesFieldExistsQuery should be rewritten 
> further?






[jira] [Commented] (LUCENE-10434) Improve handling of DocValuesRangeQuery in rewrite rules

2022-02-22 Thread Lu Xugang (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496460#comment-17496460 ]

Lu Xugang commented on LUCENE-10434:


Oh, it seems IndexSearcher#rewrite will handle this.
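As far as we know, IndexSearcher#rewrite(Query) keeps calling Query#rewrite until the returned query is the same instance as its input, so a query that rewrites to DocValuesFieldExistsQuery gets another pass in which the LUCENE-10084 rule can fire. The Lucene-free sketch below models that loop. `SimpleQuery`, `Range`, `Exists`, `MatchAll` and the `allDocsHaveField` flag are hypothetical stand-ins, not Lucene classes, and the "unbounded range only needs the field to exist" rule is an illustrative assumption.

```java
/**
 * Lucene-free sketch of rewriting a query until it reaches a fixpoint.
 * SimpleQuery, Range, Exists and MatchAll are hypothetical stand-ins, not Lucene
 * classes; the allDocsHaveField flag replaces reading per-segment statistics.
 */
class RewriteToFixpointSketch {

  interface SimpleQuery {
    // Returns a simpler equivalent query, or this same instance if no rule applies.
    SimpleQuery rewrite(boolean allDocsHaveField);
  }

  static final class MatchAll implements SimpleQuery {
    @Override
    public SimpleQuery rewrite(boolean allDocsHaveField) {
      return this; // nothing simpler than matching every document
    }

    @Override
    public String toString() {
      return "MatchAll";
    }
  }

  static final class Exists implements SimpleQuery {
    final String field;

    Exists(String field) {
      this.field = field;
    }

    @Override
    public SimpleQuery rewrite(boolean allDocsHaveField) {
      // Analogous to the LUCENE-10084 rule: if every doc has the field, match all.
      return allDocsHaveField ? new MatchAll() : this;
    }

    @Override
    public String toString() {
      return "Exists(" + field + ")";
    }
  }

  static final class Range implements SimpleQuery {
    final String field;
    final long lower;
    final long upper;

    Range(String field, long lower, long upper) {
      this.field = field;
      this.lower = lower;
      this.upper = upper;
    }

    @Override
    public SimpleQuery rewrite(boolean allDocsHaveField) {
      // Assumed rule: a fully unbounded range only requires the field to exist.
      if (lower == Long.MIN_VALUE && upper == Long.MAX_VALUE) {
        return new Exists(field);
      }
      return this;
    }

    @Override
    public String toString() {
      return "Range(" + field + ":[" + lower + " TO " + upper + "])";
    }
  }

  // Keep rewriting until rewrite() hands back the same instance.
  static SimpleQuery rewriteFully(SimpleQuery query, boolean allDocsHaveField) {
    SimpleQuery current = query;
    for (SimpleQuery next = current.rewrite(allDocsHaveField);
        next != current;
        next = current.rewrite(allDocsHaveField)) {
      current = next;
    }
    return current;
  }

  public static void main(String[] args) {
    SimpleQuery unbounded = new Range("price", Long.MIN_VALUE, Long.MAX_VALUE);
    System.out.println(rewriteFully(unbounded, true));  // MatchAll
    System.out.println(rewriteFully(unbounded, false)); // Exists(price)
    System.out.println(rewriteFully(new Range("price", 0, 10), true)); // Range(price:[0 TO 10])
  }
}
```

Because each pass feeds one rule's output into the next rule, no query needs to know about the rules that apply after it, which is presumably why no dedicated change was needed here.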







[GitHub] [lucene] LuXugang commented on a change in pull request #677: LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field

2022-02-22 Thread GitBox


LuXugang commented on a change in pull request #677:
URL: https://github.com/apache/lucene/pull/677#discussion_r812597747



##
File path: lucene/core/src/java/org/apache/lucene/search/DocValuesFieldExistsQuery.java
##
@@ -64,6 +67,24 @@ public void visit(QueryVisitor visitor) {
     }
   }
 
+  @Override
+  public Query rewrite(IndexReader reader) throws IOException {
+    int rewritableReaders = 0;
+    for (LeafReaderContext context : reader.leaves()) {
+      LeafReader leaf = context.reader();
+      Terms terms = leaf.terms(field);
+      PointValues pointValues = leaf.getPointValues(field);
+      if ((terms != null && terms.getDocCount() == leaf.maxDoc())

Review comment:
   If the condition is false, maybe we should break out of the for loop early?
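The suggestion can be sketched as below. This is a simplified model under stated assumptions: the quoted diff is cut off after the first half of the `if`, so only the `terms.getDocCount() == leaf.maxDoc()` check is modeled (the point-values half is omitted), and we assume the original loop counts qualifying segments in `rewritableReaders` and compares the total with the number of leaves. `SegmentStats` and both methods are hypothetical.

```java
import java.util.List;

/**
 * Sketch of checking, segment by segment, whether every document has a field.
 * SegmentStats is a hypothetical stand-in for a LeafReader; docCountWithField
 * plays the role of Terms#getDocCount() (0 when the field is absent).
 */
class EarlyBreakSketch {

  static final class SegmentStats {
    final int maxDoc;
    final int docCountWithField;

    SegmentStats(int maxDoc, int docCountWithField) {
      this.maxDoc = maxDoc;
      this.docCountWithField = docCountWithField;
    }

    boolean allDocsHaveField() {
      return docCountWithField == maxDoc;
    }
  }

  // Counting style (assumed): visits every segment, then compares with the segment count.
  static boolean allSegmentsQualifyCounting(List<SegmentStats> segments) {
    int rewritableSegments = 0;
    for (SegmentStats segment : segments) {
      if (segment.allDocsHaveField()) {
        rewritableSegments++;
      }
    }
    return rewritableSegments == segments.size();
  }

  // Early-exit style: one failing segment already rules out the MatchAllDocsQuery rewrite.
  static boolean allSegmentsQualifyEarlyExit(List<SegmentStats> segments) {
    for (SegmentStats segment : segments) {
      if (!segment.allDocsHaveField()) {
        return false; // later segments cannot change the answer
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<SegmentStats> segments = List.of(
        new SegmentStats(100, 100),
        new SegmentStats(50, 49), // one document lacks the field
        new SegmentStats(80, 80));
    System.out.println(allSegmentsQualifyCounting(segments));  // false
    System.out.println(allSegmentsQualifyEarlyExit(segments)); // false, third segment never inspected
  }
}
```

Both versions return the same answer; the early-exit one simply stops reading per-segment statistics once the answer is known, which saves little with a few segments but avoids needless work when an index has many.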







[jira] [Created] (LUCENE-10435) Break loop early while checking whether DocValuesFieldExistsQuery can be rewritten to MatchAllDocsQuery

2022-02-22 Thread Lu Xugang (Jira)
Lu Xugang created LUCENE-10435:
--

     Summary: Break loop early while checking whether DocValuesFieldExistsQuery can be rewritten to MatchAllDocsQuery
         Key: LUCENE-10435
         URL: https://issues.apache.org/jira/browse/LUCENE-10435
     Project: Lucene - Core
  Issue Type: Improvement
    Reporter: Lu Xugang


In the implementation of Query#rewrite in DocValuesFieldExistsQuery, as soon as one 
segment fails the condition, maybe we should break out of the loop immediately.






[GitHub] [lucene] LuXugang opened a new pull request #701: LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewritten to MatchAllDocsQuery

2022-02-22 Thread GitBox


LuXugang opened a new pull request #701:
URL: https://github.com/apache/lucene/pull/701


   In the implementation of Query#rewrite in DocValuesFieldExistsQuery, as soon as 
one segment fails the condition, maybe we should break out of the loop 
immediately.
   
   

