date:20220217

[jira] [Created] (LUCENE-10424) Optimize the "everything matches" case for count queries in PointRangeQuery

2022-02-17 Thread Lu Xugang (Jira)

Lu Xugang created LUCENE-10424:
--

 Summary: Optimize the "everything matches" case for count queries 
in PointRangeQuery
 Key: LUCENE-10424
 URL: https://issues.apache.org/jira/browse/LUCENE-10424
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 9.1
Reporter: Lu Xugang


In the implementation of Weight#count in PointRangeQuery, should we add a 
special case for when everything matches? That is, when 
PointValues#getDocCount() == IndexReader#maxDoc(), the range's lower bound is 
less than the field's min value, and the range's upper bound is greater than 
the field's max value, return reader.maxDoc() directly.
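
A minimal sketch of the shortcut being asked about, not the actual PointRangeQuery code. It assumes a single-dimension field whose values are compared as unsigned byte arrays of equal length. It also adds a hasDeletions check, which is an assumption beyond the description above, made because maxDoc also counts deleted documents. The class and method names are hypothetical, and the sketch treats both bounds as inclusive.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the "everything matches" count shortcut. */
final class EverythingMatchesCount {

  /**
   * Returns maxDoc when every document in the segment is guaranteed to match
   * the range, or -1 when the shortcut does not apply and the caller has to
   * fall back to a regular count.
   */
  static int count(
      int docCount,
      int maxDoc,
      boolean hasDeletions,
      byte[] lower,
      byte[] upper,
      byte[] fieldMin,
      byte[] fieldMax) {
    // Assumption: with deletions, or with documents lacking a value,
    // maxDoc would overcount, so the shortcut is skipped.
    if (hasDeletions || docCount != maxDoc) {
      return -1;
    }
    // The range must cover every value that the field actually holds.
    boolean coversMin = Arrays.compareUnsigned(lower, fieldMin) <= 0;
    boolean coversMax = Arrays.compareUnsigned(upper, fieldMax) >= 0;
    return (coversMin && coversMax) ? maxDoc : -1;
  }
}
```

The -1 return mirrors the convention that a count of -1 means "cannot be computed cheaply", so a caller could fall through to the existing logic unchanged.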



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] mayya-sharipova commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery

2022-02-17 Thread GitBox



mayya-sharipova commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r808862885



##
File path: lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java
##
@@ -455,6 +484,61 @@ public void testRandom() throws IOException {
 }
   }
 
+  /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */
+  public void testRandomWithFilter() throws IOException {
+    int numDocs = 200;
+    int dimension = atLeast(5);
+    int numIters = atLeast(10);
+    try (Directory d = newDirectory()) {
+      RandomIndexWriter w = new RandomIndexWriter(random(), d);
+      for (int i = 0; i < numDocs; i++) {
+        Document doc = new Document();
+        doc.add(new KnnVectorField("field", randomVector(dimension)));
+        doc.add(new NumericDocValuesField("tag", i));
+        doc.add(new IntPoint("tag", i));
+        w.addDocument(doc);
+      }
+      w.close();
+
+      try (IndexReader reader = DirectoryReader.open(d)) {
+        IndexSearcher searcher = newSearcher(reader);
+        for (int i = 0; i < numIters; i++) {
+          int lower = random().nextInt(50);
+
+          // Check that when filter is restrictive, we use exact search
+          Query filter = IntPoint.newRangeQuery("tag", lower, lower + 6);
+          KnnVectorQuery query = new KnnVectorQuery("field", randomVector(dimension), 5, filter);
+          TopDocs results = searcher.search(query, numDocs);
+          assertEquals(TotalHits.Relation.EQUAL_TO, results.totalHits.relation);
+          assertEquals(results.totalHits.value, 5);

Review comment:
   Thanks for the explanation, I missed the part about rewriting to 
`DocAndScoreQuery`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [lucene] mayya-sharipova merged pull request #649: LUCENE-10408 Better encoding of doc Ids in vectors

2022-02-17 Thread GitBox



mayya-sharipova merged pull request #649:
URL: https://github.com/apache/lucene/pull/649


   



[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



romseygeek commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r808905427



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.*;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex implements QueryIndex {
+  private final SearcherManager manager;
+  private final QueryDecomposer decomposer;
+  private final MonitorQuerySerializer serializer;
+
+  final Map termFilters = new HashMap<>();
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();

Review comment:
   AIUI, this implementation doesn't have an in-memory query cache, and 
re-parses the queries every time we do a match.  I think having a lazy parser 
is definitely a valid use-case but I think we should decouple it from the 
notion of a read-only monitor.

##
File path: lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java
##
@@ -125,14 +108,21 @@ public Monitor(Analyzer analyzer, Presearcher presearcher, MonitorConfiguration
    * Monitor's queryindex
    *
    * @param listener listener to register
+   * @throws IllegalStateException when Monitor is readonly
    */
   public void addQueryIndexUpdateListener(MonitorUpdateListener listener) {
-    listeners.add(listener);

Review comment:
   I think we can just make `addListener()` a method on `QueryIndex` and 
delegate there? And then we don't need the `readOnly` member variable on 
`Monitor`
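
A rough sketch of the delegation suggested in this comment, using simplified stand-in types rather than the real lucene-monitor classes. Every name below (`UpdateListener`, `QueryIndexSketch`, `MonitorSketch`, and so on) is hypothetical. The point is only that the monitor forwards the call and the index implementation decides whether listeners are supported.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a monitor update listener; hypothetical, for illustration only. */
interface UpdateListener {
  void afterUpdate(int updatedQueries);
}

/** Stand-in for the query index abstraction, which now owns addListener. */
abstract class QueryIndexSketch {
  abstract void addListener(UpdateListener listener);
}

/** A writable index keeps its own listener list. */
final class WritableIndexSketch extends QueryIndexSketch {
  final List<UpdateListener> listeners = new ArrayList<>();

  @Override
  void addListener(UpdateListener listener) {
    listeners.add(listener);
  }
}

/** A read-only index rejects listeners, since it never applies updates. */
final class ReadonlyIndexSketch extends QueryIndexSketch {
  @Override
  void addListener(UpdateListener listener) {
    throw new IllegalStateException("Monitor is read-only; listeners are not supported");
  }
}

/** The monitor only delegates, so it needs no readOnly member variable. */
final class MonitorSketch {
  private final QueryIndexSketch queryIndex;

  MonitorSketch(QueryIndexSketch queryIndex) {
    this.queryIndex = queryIndex;
  }

  void addQueryIndexUpdateListener(UpdateListener listener) {
    queryIndex.addListener(listener);
  }
}
```

With this shape, the `IllegalStateException` mentioned in the Javadoc change comes from the read-only index itself rather than from a flag checked in the monitor.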


[jira] [Commented] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat

2022-02-17 Thread Mayya Sharipova (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493858#comment-17493858
 ] 

Mayya Sharipova commented on LUCENE-10408:
--

This issue is concerned with the dense case, where all documents have vectors. 
We will explore the sparse case in a follow-up.

> Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
> -
>
> Key: LUCENE-10408
> URL: https://issues.apache.org/jira/browse/LUCENE-10408
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Assignee: Mayya Sharipova
>Priority: Minor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently we write doc Ids of all documents that have vectors as is.  We 
> should improve their encoding either using delta encoding or bitset.




[jira] [Updated] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat

2022-02-17 Thread Mayya Sharipova (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova updated LUCENE-10408:
-
Summary: Better dense encoding of doc Ids in Lucene91HnswVectorsFormat  
(was: Better encoding of doc Ids in Lucene91HnswVectorsFormat)


[jira] [Resolved] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat

2022-02-17 Thread Mayya Sharipova (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova resolved LUCENE-10408.
--
Fix Version/s: 9.1
   Resolution: Fixed

> Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
> -
>
> Key: LUCENE-10408
> URL: https://issues.apache.org/jira/browse/LUCENE-10408
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Assignee: Mayya Sharipova
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently we write doc Ids of all documents that have vectors as is.  We 
> should improve their encoding either using delta encoding or bitset.




[jira] [Closed] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat

2022-02-17 Thread Mayya Sharipova (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayya Sharipova closed LUCENE-10408.



[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r808924539



##
File path: lucene/monitor/src/java/org/apache/lucene/monitor/Monitor.java
##
@@ -125,14 +108,21 @@ public Monitor(Analyzer analyzer, Presearcher presearcher, MonitorConfiguration
    * Monitor's queryindex
    *
    * @param listener listener to register
+   * @throws IllegalStateException when Monitor is readonly
    */
   public void addQueryIndexUpdateListener(MonitorUpdateListener listener) {
-    listeners.add(listener);

Review comment:
   yes better





[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r808924635



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/MonitorConfiguration.java
##
@@ -47,16 +49,39 @@ private static IndexWriterConfig defaultIndexWriterConfig() {
     return iwc;
   }
 
+  public Boolean isReadOnly() {

Review comment:
   sure
   





[jira] [Commented] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493867#comment-17493867
 ] 

ASF subversion and git services commented on LUCENE-10408:
--

Commit 3355273630b8396cfd51a770caf6213a9e2fba3f in lucene's branch 
refs/heads/branch_9x from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=3355273 ]

LUCENE-10408 Better encoding of doc Ids in vectors (#649)

Better encoding of doc Ids in Lucene91HnswVectorsFormat
for a dense case where all docs have vectors.

Currently we write the doc Ids of all documents that have vectors
not very efficiently.
This improves their encoding: when all documents have vectors,
we don't write document IDs, but just write a single short
value, a dense marker.
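
A small sketch of the idea in this commit message, not the actual Lucene91HnswVectorsFormat writer. The marker values `DENSE` and `SPARSE`, the class name, and the sparse layout (a count followed by raw ints) are assumptions made for illustration. The sketch also assumes the doc Ids are unique and lie in [0, maxDoc), so that a count equal to maxDoc implies every document has a vector.

```java
import java.nio.ByteBuffer;

/** Hypothetical encoder: a single short marker replaces the doc Id list when dense. */
final class DenseDocIdEncodingSketch {
  static final short DENSE = -1; // assumed marker value, not taken from Lucene
  static final short SPARSE = 0; // assumed marker value, not taken from Lucene

  static byte[] encode(int[] docIds, int maxDoc) {
    if (docIds.length == maxDoc) {
      // Dense case: every document has a vector, so the Ids are implied
      // (0..maxDoc-1) and only the marker is written.
      return ByteBuffer.allocate(Short.BYTES).putShort(DENSE).array();
    }
    // Sparse case: marker, then the number of Ids, then each Id as written before.
    ByteBuffer buf = ByteBuffer.allocate(Short.BYTES + Integer.BYTES * (1 + docIds.length));
    buf.putShort(SPARSE).putInt(docIds.length);
    for (int docId : docIds) {
      buf.putInt(docId);
    }
    return buf.array();
  }
}
```

In the dense case the output is two bytes regardless of segment size, which is where the saving comes from.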


[jira] [Commented] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493866#comment-17493866
 ] 

ASF subversion and git services commented on LUCENE-10408:
--

Commit f8c5408be78fe98e1e8ed61ce999d6fb1f643eb2 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=f8c5408 ]

LUCENE-10408 Better encoding of doc Ids in vectors (#649)

Better encoding of doc Ids in Lucene91HnswVectorsFormat
for a dense case where all docs have vectors.

Currently we write the doc Ids of all documents that have vectors
not very efficiently.
This improves their encoding: when all documents have vectors,
we don't write document IDs, but just write a single short
value, a dense marker.


[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r808925141



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.*;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex implements QueryIndex {
+  private final SearcherManager manager;
+  private final QueryDecomposer decomposer;
+  private final MonitorQuerySerializer serializer;
+
+  final Map termFilters = new HashMap<>();
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();
+  }
+
+  @Override
+  public void commit(List updates) throws IOException {
+    throw new IllegalStateException("Monitor is readOnly cannot commit");
+  }
+
+  @Override
+  public MonitorQuery getQuery(String queryId) throws IOException {
+    if (serializer == null) {
+      throw new IllegalStateException(
+          "Cannot get queries from an index with no MonitorQuerySerializer");
+    }
+    BytesRef[] bytesHolder = new BytesRef[1];
+    search(
+        new TermQuery(new Term(WritableQueryIndex.FIELDS.query_id, queryId)),

Review comment:
   Yes, it's something that I thought about. I'll drop the interface for an 
abstract class.





[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r808930507



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.*;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex implements QueryIndex {
+  private final SearcherManager manager;
+  private final QueryDecomposer decomposer;
+  private final MonitorQuerySerializer serializer;
+
+  final Map termFilters = new HashMap<>();
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();

Review comment:
   Yes, it all goes to the `MonitorQueryCollector`, which relies on the 
in-memory query cache and is an internal class of `WritableQueryIndex`.
   
   Are you suggesting decoupling `ReadonlyMonitorQueryCollector` as a lazy 
query parser, outside the read-only Monitor?
   Am I getting it right? @romseygeek





[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r808938064



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.*;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex implements QueryIndex {
+  private final SearcherManager manager;
+  private final QueryDecomposer decomposer;
+  private final MonitorQuerySerializer serializer;
+
+  final Map termFilters = new HashMap<>();
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();

Review comment:
   Or maybe restore all the query cache logic in the abstract class and 
selectively choose whether to use it in each of the two implementations?





[jira] [Resolved] (LUCENE-10420) Move functional interfaces in IOUtils to top-level interfaces

2022-02-17 Thread Tomoko Uchida (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoko Uchida resolved LUCENE-10420.

Fix Version/s: 9.1
   10.0 (main)
   Resolution: Fixed

> Move functional interfaces in IOUtils to top-level interfaces
> -
>
> Key: LUCENE-10420
> URL: https://issues.apache.org/jira/browse/LUCENE-10420
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Tomoko Uchida
>Priority: Minor
> Fix For: 9.1, 10.0 (main)
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Suggested at https://github.com/apache/lucene/pull/643#discussion_r802285404.




[jira] [Created] (LUCENE-10425) Lucene supports bkd binary search and return current index of posting list

2022-02-17 Thread jianping weng (Jira)

jianping weng created LUCENE-10425:
--

 Summary: Lucene supports bkd binary search and return current 
index of posting list
 Key: LUCENE-10425
 URL: https://issues.apache.org/jira/browse/LUCENE-10425
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: jianping weng







[GitHub] [lucene] wjp719 opened a new pull request #687: support binary search in PointValues

2022-02-17 Thread GitBox



wjp719 opened a new pull request #687:
URL: https://github.com/apache/lucene/pull/687


Add a common function that lets callers binary search in bkdPointTree.
   
   One possible use case: for log data where the index is sorted in ascending 
order by the @timestamp field, a count aggregation over many time intervals 
can use the binary search to find the min/max docId in each time interval, 
and the doc count is then max docId - min docId + 1.
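
The arithmetic in this use case can be sketched on a plain sorted array, without any Lucene API. The sketch assumes a single segment with no deletions, where every document has exactly one timestamp and docIds follow timestamp order, so `timestampsByDocId[docId]` is ascending. The class and method names are hypothetical and are not the functions added by the PR.

```java
/** Hypothetical illustration of counting docs in a time interval on a sorted index. */
final class SortedIndexRangeCount {

  /** Index of the first element >= target, or sorted.length if there is none. */
  static int lowerBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Index of the first element > target, or sorted.length if there is none. */
  static int upperBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] <= target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Number of documents whose timestamp lies in [from, to]. */
  static int count(long[] timestampsByDocId, long from, long to) {
    int minDocId = lowerBound(timestampsByDocId, from);
    int maxDocId = upperBound(timestampsByDocId, to) - 1;
    // An empty interval yields maxDocId < minDocId, so clamp at zero.
    return Math.max(0, maxDocId - minDocId + 1);
  }
}
```

Each interval costs two binary searches, so the count is logarithmic in the segment size instead of linear in the number of matching documents.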
   



[GitHub] [lucene] wjp719 opened a new pull request #688: PostingsEnum supports to return current index of postings

2022-02-17 Thread GitBox



wjp719 opened a new pull request #688:
URL: https://github.com/apache/lucene/pull/688


   PostingsEnum supports returning the current index of postings
   
   As we know, the docId list in the .doc file is like an array of integers 
where every element is a docId. When we call nextDoc or advance, the enum 
moves to another element of this array. This PR adds support for returning the 
current index of that element in the posting list. We can use this method to 
compute the difference in index across a call to advance; the difference is 
the number of docs skipped in the docId list.
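
To make the index difference concrete, here is a toy model of a posting list as a sorted int array. It is not the real PostingsEnum, and `PostingsIndexSketch`, `currentIndex`, and `countBetween` are hypothetical names. It assumes unique, ascending docIds that are all smaller than Integer.MAX_VALUE. To include a posting equal to the upper docId, the sketch advances to `maxDocId + 1`, which is an adjustment made here rather than something stated in the PR.

```java
import java.util.Arrays;

/** Hypothetical toy posting list that exposes the index of the current element. */
final class PostingsIndexSketch {
  private final int[] docIds; // sorted ascending, unique
  private int index = -1;     // position of the current element, -1 before the first call

  PostingsIndexSketch(int[] sortedDocIds) {
    this.docIds = sortedDocIds;
  }

  /** Moves to the first docId >= target; returns Integer.MAX_VALUE when exhausted. */
  int advance(int target) {
    int from = Math.max(index, 0);
    int pos = Arrays.binarySearch(docIds, from, docIds.length, target);
    index = pos >= 0 ? pos : -pos - 1; // a negative result encodes the insertion point
    return index < docIds.length ? docIds[index] : Integer.MAX_VALUE;
  }

  /** Index of the current element within the posting list. */
  int currentIndex() {
    return index;
  }

  /** Counts postings with minDocId <= docId <= maxDocId using only index differences. */
  static int countBetween(int[] sortedDocIds, int minDocId, int maxDocId) {
    PostingsIndexSketch postings = new PostingsIndexSketch(sortedDocIds);
    postings.advance(minDocId);
    int start = postings.currentIndex();
    postings.advance(maxDocId + 1); // assumes maxDocId < Integer.MAX_VALUE
    return postings.currentIndex() - start;
  }
}
```

For example, with postings 2, 5, 7, 9, 12, 15, the range from 5 to 12 moves the index from 1 to 5, giving a count of 4 without visiting the postings in between.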
   



[jira] [Updated] (LUCENE-10425) Lucene supports bkd binary search and return current index of posting list

2022-02-17 Thread jianping weng (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianping weng updated LUCENE-10425:
---
Component/s: core/search
Description: 
In logging scenarios, we usually want to know the number of documents in each 
time interval. One possible optimization is to sort the documents within a 
segment in ascending order of the @timestamp field. Then we can use the PR 
https://github.com/apache/lucene/pull/687 to find the min/max docId in one 
time interval.

If there is no other filter query, the doc count of one time interval is (max 
docId - min docId + 1).

If there is exactly one additional term filter query, we can use the PR 
https://github.com/apache/lucene/pull/688 to get the current index in the 
posting list after calling advance(minId) and after calling advance(maxId); 
the difference between the two index values is also the doc count of that 
time interval.
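
The counting argument for the no-filter case can be sketched in plain Java. This is only an illustration under stated assumptions: the segment is modelled as a `long[]` in which position i holds the @timestamp of docId i, the array is ascending because the segment is assumed to be sorted by that field, and the class and method names are hypothetical rather than taken from either PR.

```java
/** Hypothetical helper (not a Lucene class): counts docs in a timestamp range. */
final class SortedSegmentCount {

  /** First docId whose timestamp is >= target, or timestamps.length if none. */
  static int firstAtLeast(long[] timestamps, long target) {
    int lo = 0, hi = timestamps.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (timestamps[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** First docId whose timestamp is > target, or timestamps.length if none. */
  static int firstGreaterThan(long[] timestamps, long target) {
    int lo = 0, hi = timestamps.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (timestamps[mid] <= target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Number of docs with from <= timestamp <= to, via two binary searches. */
  static int count(long[] timestamps, long from, long to) {
    int minDocId = firstAtLeast(timestamps, from);
    int maxDocId = firstGreaterThan(timestamps, to) - 1;
    // An empty interval shows up as maxDocId < minDocId.
    return maxDocId < minDocId ? 0 : maxDocId - minDocId + 1;
  }
}
```

Each interval costs two O(log n) searches instead of visiting every matching document, which is where the proposed saving would come from.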

 

> Lucene supports bkd binary search and return current index of posting list
> --
>
> Key: LUCENE-10425
> URL: https://issues.apache.org/jira/browse/LUCENE-10425
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: jianping weng
>Priority: Major
>
> In logging scenarios, we usually want to know the number of documents in 
> each time interval. One possible optimization is to sort the documents 
> within a segment in ascending order of the @timestamp field. Then we can use 
> the PR https://github.com/apache/lucene/pull/687 to find the min/max docId 
> in one time interval.
> If there is no other filter query, the doc count of one time interval is (max 
> docId - min docId + 1).
> If there is exactly one additional term filter query, we can use the PR 
> https://github.com/apache/lucene/pull/688 to get the current index in the 
> posting list after calling advance(minId) and after calling advance(maxId); 
> the difference between the two index values is also the doc count of that 
> time interval.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Updated] (LUCENE-10425) Lucene supports bkd binary search and return current index of posting list

2022-02-17 Thread jianping weng (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianping weng updated LUCENE-10425:
---
Description: 
In logging scenarios, we usually want to know the number of documents in each 
time interval. One possible optimization is to sort the documents within a 
segment in ascending order of the @timestamp field. Then we can use the PR 
https://github.com/apache/lucene/pull/687 to find the min/max docId in one 
time interval.

If there is no other filter query, the doc count of one time interval is (max 
docId - min docId + 1).

If there is exactly one additional term filter query, we can use the PR 
https://github.com/apache/lucene/pull/688 to get the current index in the 
posting list after calling advance(minId) and after calling advance(maxId); 
the difference between the two index values is also the doc count of that 
time interval.

 





[jira] [Updated] (LUCENE-10425) Lucene supports bkd binary search and return current index of posting list

2022-02-17 Thread jianping weng (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianping weng updated LUCENE-10425:
---
Description: 
In logging scenarios, we usually want to know the number of documents in each 
time interval. One possible optimization is to sort the documents within a 
segment in ascending order of the @timestamp field. Then we can use the PR 
https://github.com/apache/lucene/pull/687 to find the min/max docId in one 
time interval.

If there is no other filter query, the doc count of one time interval is (max 
docId - min docId + 1).

If there is exactly one additional term filter query, we can use the PR 
https://github.com/apache/lucene/pull/688 to get the current index in the 
posting list after calling advance(minId) and after calling advance(maxId); 
the difference between the two index values is also the doc count of that 
time interval.

 





[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



romseygeek commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r809038865



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.search.*;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex implements QueryIndex {
+  private final SearcherManager manager;
+  private final QueryDecomposer decomposer;
+  private final MonitorQuerySerializer serializer;
+
+  final Map termFilters = new HashMap<>();
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();

Review comment:
   So the way the Monitor works at the moment is that it parses all the 
serialized queries in its QueryIndex on startup, and stores them in an 
in-memory cache.  This means that when we run a document through the Monitor, 
once it has identified which candidate queries to run against it we don't need 
to re-parse them, they are already instantiated in RAM.  The alternative, which 
I think is what you've implemented here, is to re-parse the query every time we 
need to run it.  This is perfectly reasonable (in fact it's what 
elasticsearch's percolator does) but it is a significant change in behaviour so 
I don't think we should fold it in as part of this ticket.
   
   The Writeable query index needs to have all the tricksy behaviour around 
re-populating the cache, as we need to remove deleted entries or replace 
updated entries when queries are added or deleted (cache invalidation is hard, 
apparently!); the read-only index can just hold everything in a single Map that 
is populated on startup and never changes.
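
The trade-off described in this comment can be sketched with plain Java collections. Everything below is hypothetical and only illustrates the two strategies: queries are modelled as strings, the "parser" is any `Function<String, String>`, and none of the class names come from the Lucene monitor module.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical illustration of eager caching versus re-parsing on each use. */
final class QueryCacheSketch {

  /** Counts parser invocations so the two strategies can be compared. */
  static final class CountingParser implements Function<String, String> {
    int calls = 0;

    @Override
    public String apply(String serialized) {
      calls++;
      return "parsed(" + serialized + ")";
    }
  }

  /** Eager: parse every stored query once at startup; lookups never parse. */
  static final class EagerIndex {
    private final Map<String, String> cache = new HashMap<>();

    EagerIndex(Map<String, String> stored, Function<String, String> parser) {
      stored.forEach((id, serialized) -> cache.put(id, parser.apply(serialized)));
    }

    String get(String id) {
      return cache.get(id);
    }
  }

  /** Lazy: keep only the serialized form and parse on every lookup. */
  static final class LazyIndex {
    private final Map<String, String> stored;
    private final Function<String, String> parser;

    LazyIndex(Map<String, String> stored, Function<String, String> parser) {
      this.stored = stored;
      this.parser = parser;
    }

    String get(String id) {
      String serialized = stored.get(id);
      return serialized == null ? null : parser.apply(serialized);
    }
  }
}
```

The eager variant pays the full parsing cost and memory up front and needs invalidation only if the stored queries can change; the lazy variant has no cache to invalidate but parses on every match attempt.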





[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r809070886



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##

Review comment:
   OK, it's clearer now.
   I can do it that way, but the read-only monitor would need a way to 
repopulate the cache too. Assuming there are other writers that insert and 
delete on the same index, the read-only monitor would never see the changes 
until it is re-instantiated to populate its Map. 
   That way it would not be very useful.





[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r809076507



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##

Review comment:
   What do you think about keeping all the in-memory cache logic, along with 
the purgeExecutor, in the abstract QueryIndex class and letting the read-only 
monitor use it too, which would close this round of improvements?
   I could then open another changeset to implement the lazy parsing.
   





[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



romseygeek commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r809078793



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##

Review comment:
   At the moment you wouldn't pick up any changes anyway, because you're 
not calling `maybeRefresh()` on the SearcherManager so you would always get the 
same view of the index.  If you want a dynamic view then you'll need a 
background refresh thread.
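
A background refresh of the kind mentioned here could look roughly like the sketch below. It is an assumption-laden illustration, not the monitor's implementation: `Refreshable` is a hypothetical interface standing in for whatever object exposes a `maybeRefresh()`-style call, and the scheduling uses only `java.util.concurrent`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical periodic refresher; not part of Lucene. */
final class BackgroundRefresher implements AutoCloseable {

  /** Hypothetical stand-in for an object with a maybeRefresh()-style method. */
  interface Refreshable {
    void maybeRefresh() throws Exception;
  }

  // Single daemon thread, so the refresher never keeps the JVM alive on its own.
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor(
          runnable -> {
            Thread thread = new Thread(runnable, "index-refresher");
            thread.setDaemon(true);
            return thread;
          });

  BackgroundRefresher(Refreshable target, long period, TimeUnit unit) {
    executor.scheduleWithFixedDelay(
        () -> {
          try {
            target.maybeRefresh();
          } catch (Exception e) {
            // Swallowed on purpose: an uncaught exception would cancel all
            // later runs of a scheduled task. Real code should log it.
          }
        },
        period,
        period,
        unit);
  }

  /** Stops the refresh thread. */
  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

A caller would wrap the manager's refresh call in a lambda, for example `new BackgroundRefresher(() -> manager.maybeRefresh(), 1, TimeUnit.SECONDS)`, and close the refresher together with the index.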





[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r809081950



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##

Review comment:
   Ah, good to know! What do you think is best to do? Does what I proposed 
make sense?





[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-17 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r809081950



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##

Review comment:
   Ah, good to know! What do you think is best to do? Does what I proposed 
make sense?





[GitHub] [lucene] msokolov commented on pull request #688: LUCENE-10425：PostingsEnum supports to return current index of postings

2022-02-17 Thread GitBox



msokolov commented on pull request #688:
URL: https://github.com/apache/lucene/pull/688#issuecomment-1043123035


   I'm concerned exposing this could limit future implementations by requiring 
them to support it. Do you have a proposed use case that would justify needing 
to add this? Maybe post to java-u...@lucene.apache.org explaining the use case?



[GitHub] [lucene] gsmiller commented on pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader

2022-02-17 Thread GitBox



gsmiller commented on pull request #678:
URL: https://github.com/apache/lucene/pull/678#issuecomment-1043211273


   Going ahead and merging this now since there's been no opposition (of course 
it could always be backed out later if someone did take strong issue with 
this). Thanks again @spike-liu !



[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494092#comment-17494092
 ] 

ASF subversion and git services commented on LUCENE-10398:
--

Commit fc3c790ab421122e7aa2f20453cb468def712123 in lucene's branch 
refs/heads/main from spike.liu
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fc3c790 ]

LUCENE-10398: Add static method for getting Terms from LeafReader (#678)

Co-authored-by: cheng...@ctrip.com 

> Add static method for getting Terms from LeafReader
> ---
>
> Key: LUCENE-10398
> URL: https://issues.apache.org/jira/browse/LUCENE-10398
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} 
> that return {{null}} values if the field is not indexed. These methods also 
> have equivalent {{DocValues}} static methods, such as 
> {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a 
> {{null}} if there is no field. I noticed that {{Terms}} does not have an 
> equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or 
> something similar. I was wondering if there was a reason for this, or if a 
> method like this could be useful. Thanks!
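
The pattern requested in the issue, a static accessor that returns an empty object instead of null, can be sketched independently of Lucene. The types below are hypothetical stand-ins (a `Reader` whose `terms` may return null, and a `FieldTerms` with a shared `EMPTY` instance); the exact name and signature of the method added by the merged change are not shown in this thread, so none is assumed here.

```java
import java.util.Collections;
import java.util.Iterator;

/** Hypothetical illustration of a null-safe static accessor; not Lucene code. */
final class NullSafeTermsSketch {

  /** Hypothetical minimal per-field terms view. */
  interface FieldTerms extends Iterable<String> {
    /** Shared empty instance, returned when a field has no terms. */
    FieldTerms EMPTY = () -> Collections.<String>emptyIterator();

    @Override
    Iterator<String> iterator();
  }

  /** Hypothetical reader whose terms() returns null for fields that are not indexed. */
  interface Reader {
    FieldTerms terms(String field);
  }

  /** Never returns null: falls back to FieldTerms.EMPTY for missing fields. */
  static FieldTerms getTerms(Reader reader, String field) {
    FieldTerms terms = reader.terms(field);
    return terms == null ? FieldTerms.EMPTY : terms;
  }
}
```

Callers can then iterate over `getTerms(reader, field)` directly, without a null check at every call site.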




[GitHub] [lucene] gsmiller merged pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader

2022-02-17 Thread GitBox



gsmiller merged pull request #678:
URL: https://github.com/apache/lucene/pull/678


   



[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494102#comment-17494102
 ] 

ASF subversion and git services commented on LUCENE-10398:
--

Commit 00029f1ec4a952b4345d966c00dc5abe7b9b8af1 in lucene's branch 
refs/heads/main from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=00029f1 ]

Add CHANGES entry for LUCENE-10398


> Add static method for getting Terms from LeafReader
> ---
>
> Key: LUCENE-10398
> URL: https://issues.apache.org/jira/browse/LUCENE-10398
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} 
> that return {{null}} values if the field is not indexed. These methods also 
> have equivalent {{DocValues}} static methods, such as 
> {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a 
> {{null}} if there is no field. I noticed that {{Terms}} does not have an 
> equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or 
> something similar. I was wondering if there was a reason for this, or if a 
> method like this could be useful. Thanks!
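
For readers unfamiliar with the pattern under discussion, here is a minimal sketch in plain Java. It does not use Lucene: `FieldTerms`, `Segment`, and `termsOrEmpty` are hypothetical stand-ins, and the only assumption carried over from the issue is that the reader-style accessor returns null for a missing field while the static helper returns an empty instance instead.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class EmptyTermsSketch {
  /** Hypothetical stand-in for a per-field terms view; this is not Lucene's Terms class. */
  interface FieldTerms {
    Iterator<String> iterator();
  }

  /** Shared "no terms" instance, playing the role of an empty Terms. */
  static final FieldTerms EMPTY = () -> Collections.<String>emptyIterator();

  /** Hypothetical segment whose accessor returns null for a missing field. */
  static final class Segment {
    private final Map<String, List<String>> termsByField;

    Segment(Map<String, List<String>> termsByField) {
      this.termsByField = termsByField;
    }

    FieldTerms terms(String field) {
      List<String> terms = termsByField.get(field);
      if (terms == null) {
        return null; // reader-style contract: null means "field not indexed here"
      }
      return terms::iterator;
    }
  }

  /** Static helper in the spirit of the proposal: never returns null. */
  static FieldTerms termsOrEmpty(Segment segment, String field) {
    FieldTerms terms = segment.terms(field);
    return terms == null ? EMPTY : terms;
  }

  /** Caller code needs no null check because a missing field iterates as empty. */
  static int countTerms(Segment segment, String field) {
    int count = 0;
    Iterator<String> it = termsOrEmpty(segment, field).iterator();
    while (it.hasNext()) {
      it.next();
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    Segment segment = new Segment(Map.of("body", List.of("apple", "pear")));
    System.out.println(countTerms(segment, "body"));
    System.out.println(countTerms(segment, "missing"));
  }
}
```

The design trade-off is the usual one for a null object: callers lose the ability to distinguish "field absent" from "field present but empty" unless they call the nullable accessor directly.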




[GitHub] [lucene] gsmiller opened a new pull request #689: LUCENE-10398: Add static method for getting Terms from LeafReader

2022-02-17 Thread GitBox



gsmiller opened a new pull request #689:
URL: https://github.com/apache/lucene/pull/689


   Backporting



[GitHub] [lucene] gsmiller merged pull request #689: LUCENE-10398: Add static method for getting Terms from LeafReader

2022-02-17 Thread GitBox



gsmiller merged pull request #689:
URL: https://github.com/apache/lucene/pull/689


   



[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494136#comment-17494136
 ] 

ASF subversion and git services commented on LUCENE-10398:
--

Commit db2cd347a7aa543d123d157a9ca8e2c63844a82d in lucene's branch 
refs/heads/branch_9x from Greg Miller
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=db2cd34 ]

LUCENE-10398: Add static method for getting Terms from LeafReader (#689)

Co-authored-by: spike.liu 

> Add static method for getting Terms from LeafReader
> ---
>
> Key: LUCENE-10398
> URL: https://issues.apache.org/jira/browse/LUCENE-10398
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} 
> that return {{null}} values if the field is not indexed. These methods also 
> have equivalent {{DocValues}} static methods, such as 
> {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a 
> {{null}} if there is no field. I noticed that {{Terms}} does not have an 
> equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or 
> something similar. I was wondering if there was a reason for this, or if a 
> method like this could be useful. Thanks!




[jira] [Resolved] (LUCENE-10398) Add static method for getting Terms from LeafReader

2022-02-17 Thread Greg Miller (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller resolved LUCENE-10398.
--
Fix Version/s: 9.1
   Resolution: Fixed

> Add static method for getting Terms from LeafReader
> ---
>
> Key: LUCENE-10398
> URL: https://issues.apache.org/jira/browse/LUCENE-10398
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} 
> that return {{null}} values if the field is not indexed. These methods also 
> have equivalent {{DocValues}} static methods, such as 
> {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a 
> {{null}} if there is no field. I noticed that {{Terms}} does not have an 
> equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or 
> something similar. I was wondering if there was a reason for this, or if a 
> method like this could be useful. Thanks!




[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader

2022-02-17 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494137#comment-17494137
 ] 

Greg Miller commented on LUCENE-10398:
--

Merged onto {{main}} and {{branch_9x}}. Resolving. Thanks again [~spike.liu]!

> Add static method for getting Terms from LeafReader
> ---
>
> Key: LUCENE-10398
> URL: https://issues.apache.org/jira/browse/LUCENE-10398
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} 
> that return {{null}} values if the field is not indexed. These methods also 
> have equivalent {{DocValues}} static methods, such as 
> {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a 
> {{null}} if there is no field. I noticed that {{Terms}} does not have an 
> equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or 
> something similar. I was wondering if there was a reason for this, or if a 
> method like this could be useful. Thanks!




[jira] [Created] (LUCENE-10426) Should we create a static factory method for loading VectorValues?

2022-02-17 Thread Greg Miller (Jira)

Greg Miller created LUCENE-10426:


 Summary: Should we create a static factory method for loading 
VectorValues?
 Key: LUCENE-10426
 URL: https://issues.apache.org/jira/browse/LUCENE-10426
 Project: Lucene - Core
  Issue Type: Wish
  Components: core/index
Reporter: Greg Miller


Similar to the recent work in LUCENE-10398, it might be useful to add a static 
factory method for loading {{VectorValues}} that returns an "empty" 
{{VectorValues}} instance if the field doesn't exist in a segment (and also 
throws if the field is not configured as a vector field). This follows the same 
pattern as the static factory methods found in {{DocValues}}. 

I'm less convinced this is useful to add right now since I don't really see any 
existing usages of {{LeafReader#getVectorValues}}, so maybe the value isn't 
there right now? 
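
A minimal sketch of the two behaviours described above (empty result for a missing field, exception for a field of the wrong kind), in plain Java with no Lucene dependency. Every name here (`Segment`, `FieldKind`, `getVectors`) is hypothetical, and the choice of `IllegalStateException` is an assumption made for illustration rather than a statement about what Lucene would throw.

```java
import java.util.Map;

final class VectorValuesFactorySketch {
  /** Hypothetical field kinds; a real index tracks much more per field. */
  enum FieldKind { VECTOR, OTHER }

  /** Shared empty result, standing in for an "empty" VectorValues. */
  static final float[][] EMPTY = new float[0][];

  /** Hypothetical segment: field kinds plus the vectors stored for vector fields. */
  static final class Segment {
    final Map<String, FieldKind> fieldKinds;
    final Map<String, float[][]> vectors;

    Segment(Map<String, FieldKind> fieldKinds, Map<String, float[][]> vectors) {
      this.fieldKinds = fieldKinds;
      this.vectors = vectors;
    }

    /** Reader-style accessor: null when the segment has no vectors for the field. */
    float[][] getVectorValues(String field) {
      return vectors.get(field);
    }
  }

  /** Static factory: never returns null, and rejects fields of the wrong kind. */
  static float[][] getVectors(Segment segment, String field) {
    float[][] values = segment.getVectorValues(field);
    if (values != null) {
      return values;
    }
    FieldKind kind = segment.fieldKinds.get(field);
    if (kind != null && kind != FieldKind.VECTOR) {
      // Assumption for this sketch: a misconfigured field is a programming error.
      throw new IllegalStateException("field '" + field + "' is not a vector field");
    }
    return EMPTY; // field does not exist in this segment
  }

  public static void main(String[] args) {
    Segment segment = new Segment(
        Map.of("embedding", FieldKind.VECTOR, "title", FieldKind.OTHER),
        Map.of("embedding", new float[][] {{1f, 2f}}));
    System.out.println(getVectors(segment, "embedding").length);
    System.out.println(getVectors(segment, "missing").length);
  }
}
```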




[GitHub] [lucene] jtibshirani merged pull request #677: LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field

2022-02-17 Thread GitBox



jtibshirani merged pull request #677:
URL: https://github.com/apache/lucene/pull/677


   



[jira] [Commented] (LUCENE-10084) Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494169#comment-17494169
 ] 

ASF subversion and git services commented on LUCENE-10084:
--

Commit c132bbf677b5eb4d3ff0acf838b4d8f2c4e0327e in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c132bbf ]

LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all 
docs have the field (#677)

Since all documents are required to use the same features (LUCENE-9334) we can
rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or
points have a docCount that is equal to maxDoc.
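
The rewrite condition in this commit message can be sketched as follows. This is a plain-Java illustration, not the committed code: `SegmentStats` and `Plan` are hypothetical names, and requiring the condition on every segment is an assumption of the sketch. The reasoning it encodes is that, with per-field consistency enforced, a document that has a point (or term) for the field also has doc values for it, so docCount == maxDoc implies that every document has the field.

```java
import java.util.List;

final class FieldExistsRewriteSketch {
  /** The two outcomes of the rewrite in this sketch. */
  enum Plan { MATCH_ALL_DOCS, FIELD_EXISTS }

  /** Hypothetical per-segment statistics for one field. */
  static final class SegmentStats {
    final int docCount; // docs that have at least one point (or term) for the field
    final int maxDoc;   // total docs in the segment

    SegmentStats(int docCount, int maxDoc) {
      this.docCount = docCount;
      this.maxDoc = maxDoc;
    }
  }

  /** Rewrites to match-all only when every segment has the field on every doc. */
  static Plan rewrite(List<SegmentStats> segments) {
    for (SegmentStats stats : segments) {
      if (stats.docCount != stats.maxDoc) {
        return Plan.FIELD_EXISTS; // at least one doc lacks the field: keep the original query
      }
    }
    return Plan.MATCH_ALL_DOCS;
  }

  public static void main(String[] args) {
    System.out.println(rewrite(List.of(new SegmentStats(10, 10), new SegmentStats(3, 3))));
    System.out.println(rewrite(List.of(new SegmentStats(10, 10), new SegmentStats(2, 3))));
  }
}
```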

> Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == 
> maxDoc
> 
>
> Key: LUCENE-10084
> URL: https://issues.apache.org/jira/browse/LUCENE-10084
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Now that we require all documents to use the same features (LUCENE-9334) we 
> could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms 
> or points have a docCount that is equal to maxDoc.




[jira] [Commented] (LUCENE-9334) Require consistency between data-structures on a per-field basis

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494170#comment-17494170
 ] 

ASF subversion and git services commented on LUCENE-9334:
-

Commit c132bbf677b5eb4d3ff0acf838b4d8f2c4e0327e in lucene's branch 
refs/heads/main from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=c132bbf ]

LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all 
docs have the field (#677)

Since all documents are required to use the same features (LUCENE-9334) we can
rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or
points have a docCount that is equal to maxDoc.

> Require consistency between data-structures on a per-field basis
> 
>
> Key: LUCENE-9334
> URL: https://issues.apache.org/jira/browse/LUCENE-9334
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Follow-up of 
> https://lists.apache.org/thread.html/r747de568afd7502008c45783b74cc3aeb31dab8aa60fcafaf65d5431%40%3Cdev.lucene.apache.org%3E.
> We would like to start requiring consistency across data-structures on a 
> per-field basis in order to make it easier to do the right thing by default: 
> range queries can run faster if doc values are enabled, sorted queries can 
> run faster if points are indexed, etc.
> This would be a big change, so it should be rolled out in a major.
> Strict validation is tricky to implement, but we should still implement 
> best-effort validation:
>  - Documents all use the same data-structures, e.g. it is illegal for a 
> document to only enable points and another document to only enable doc values,
>  - When possible, check whether values are consistent too.




[jira] [Commented] (LUCENE-9334) Require consistency between data-structures on a per-field basis

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494176#comment-17494176
 ] 

ASF subversion and git services commented on LUCENE-9334:
-

Commit a9532f32866cad79c65fb4b0f220140b69757c42 in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a9532f3 ]

LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all 
docs have the field (#677)

Since all documents are required to use the same features (LUCENE-9334) we can
rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or
points have a docCount that is equal to maxDoc.

> Require consistency between data-structures on a per-field basis
> 
>
> Key: LUCENE-9334
> URL: https://issues.apache.org/jira/browse/LUCENE-9334
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Follow-up of 
> https://lists.apache.org/thread.html/r747de568afd7502008c45783b74cc3aeb31dab8aa60fcafaf65d5431%40%3Cdev.lucene.apache.org%3E.
> We would like to start requiring consistency across data-structures on a 
> per-field basis in order to make it easier to do the right thing by default: 
> range queries can run faster if doc values are enabled, sorted queries can 
> run faster if points are indexed, etc.
> This would be a big change, so it should be rolled out in a major.
> Strict validation is tricky to implement, but we should still implement 
> best-effort validation:
>  - Documents all use the same data-structures, e.g. it is illegal for a 
> document to only enable points and another document to only enable doc values,
>  - When possible, check whether values are consistent too.




[jira] [Commented] (LUCENE-10084) Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494175#comment-17494175
 ] 

ASF subversion and git services commented on LUCENE-10084:
--

Commit a9532f32866cad79c65fb4b0f220140b69757c42 in lucene's branch 
refs/heads/branch_9x from Vigya Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a9532f3 ]

LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all 
docs have the field (#677)

Since all documents are required to use the same features (LUCENE-9334) we can
rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or
points have a docCount that is equal to maxDoc.

> Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == 
> maxDoc
> 
>
> Key: LUCENE-10084
> URL: https://issues.apache.org/jira/browse/LUCENE-10084
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Now that we require all documents to use the same features (LUCENE-9334) we 
> could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms 
> or points have a docCount that is equal to maxDoc.




[jira] [Resolved] (LUCENE-10084) Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == maxDoc

2022-02-17 Thread Julie Tibshirani (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julie Tibshirani resolved LUCENE-10084.
---
Fix Version/s: 9.1
   Resolution: Fixed

> Rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery when docCount == 
> maxDoc
> 
>
> Key: LUCENE-10084
> URL: https://issues.apache.org/jira/browse/LUCENE-10084
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Now that we require all documents to use the same features (LUCENE-9334) we 
> could rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms 
> or points have a docCount that is equal to maxDoc.




[GitHub] [lucene] jtibshirani merged pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery

2022-02-17 Thread GitBox



jtibshirani merged pull request #656:
URL: https://github.com/apache/lucene/pull/656


   



[jira] [Commented] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494181#comment-17494181
 ] 

ASF subversion and git services commented on LUCENE-10382:
--

Commit 8ca372573dba0f4755b982b0c36a2b87aaf4705b in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8ca3725 ]

LUCENE-10382: Support filtering in KnnVectorQuery (#656)

This PR adds support for a query filter in KnnVectorQuery. First, we gather the
query results for each leaf as a bit set. Then the HNSW search skips over the
non-matching documents (using the same approach as for live docs). To prevent
HNSW search from visiting too many documents when the filter is very selective,
we short-circuit if HNSW has already visited more than the number of documents
that match the filter, and execute an exact search instead. This bounds the
number of visited documents at roughly 2x the cost of just running the exact
filter, while in most cases HNSW completes successfully and does a lot better.

Co-authored-by: Joel Bernstein 
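
The strategy described in this commit message can be sketched in plain Java as below. This is a heavily simplified illustration and not the code from the PR: vectors are one-dimensional, the graph is a single-layer chain rather than a real HNSW graph, the entry point is always doc 0, and all names (`graphSearch`, `exactSearch`, `chainGraph`, `visitLimit`) are hypothetical. What it keeps from the description is the shape of the algorithm: non-matching docs are traversed but not collected, and the graph search gives up once it has visited more docs than the filter matches, at which point an exact scan over the matching docs runs instead.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class FilteredKnnSketch {
  /** Distance between a stored 1-D "vector" and the query. */
  static float distance(float[] vectors, int doc, float query) {
    return Math.abs(vectors[doc] - query);
  }

  /** Brute force over the accepted docs only; returns the k closest, nearest first. */
  static List<Integer> exactSearch(float[] vectors, BitSet accepted, float query, int k) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = accepted.nextSetBit(0); doc >= 0; doc = accepted.nextSetBit(doc + 1)) {
      docs.add(doc);
    }
    docs.sort(Comparator.comparingDouble(d -> distance(vectors, d, query)));
    return new ArrayList<>(docs.subList(0, Math.min(k, docs.size())));
  }

  /** Best-first graph search; returns null once more than visitLimit docs were visited. */
  static List<Integer> graphSearch(
      float[] vectors, int[][] neighbors, BitSet accepted, float query, int k, int visitLimit) {
    Comparator<Integer> byDistance = Comparator.comparingDouble(d -> distance(vectors, d, query));
    PriorityQueue<Integer> candidates = new PriorityQueue<>(byDistance);         // nearest first
    PriorityQueue<Integer> results = new PriorityQueue<>(byDistance.reversed()); // farthest first
    BitSet visited = new BitSet(vectors.length);
    int visitedCount = 1; // the entry point (doc 0) counts as visited
    if (visitedCount > visitLimit) {
      return null;
    }
    visited.set(0);
    candidates.add(0);
    while (!candidates.isEmpty()) {
      int doc = candidates.poll();
      if (results.size() >= k
          && distance(vectors, doc, query) > distance(vectors, results.peek(), query)) {
        break; // no remaining candidate can improve the current top k
      }
      if (accepted.get(doc)) { // filtered-out docs are traversed but never collected
        results.add(doc);
        if (results.size() > k) {
          results.poll();
        }
      }
      for (int next : neighbors[doc]) {
        if (!visited.get(next)) {
          visited.set(next);
          if (++visitedCount > visitLimit) {
            return null; // filter is too selective for the graph to pay off
          }
          candidates.add(next);
        }
      }
    }
    List<Integer> top = new ArrayList<>(results);
    top.sort(byDistance);
    return top;
  }

  /** Graph search bounded by the number of matching docs, with exact search as fallback. */
  static List<Integer> search(
      float[] vectors, int[][] neighbors, BitSet accepted, float query, int k) {
    List<Integer> approximate =
        graphSearch(vectors, neighbors, accepted, query, k, accepted.cardinality());
    return approximate != null ? approximate : exactSearch(vectors, accepted, query, k);
  }

  /** Hypothetical toy graph: doc i is linked to i - 1 and i + 1. */
  static int[][] chainGraph(int n) {
    int[][] neighbors = new int[n][];
    for (int i = 0; i < n; i++) {
      if (n == 1) {
        neighbors[i] = new int[0];
      } else if (i == 0) {
        neighbors[i] = new int[] {1};
      } else if (i == n - 1) {
        neighbors[i] = new int[] {n - 2};
      } else {
        neighbors[i] = new int[] {i - 1, i + 1};
      }
    }
    return neighbors;
  }

  public static void main(String[] args) {
    float[] vectors = {0f, 1f, 2f, 3f, 4f, 5f, 6f, 7f};
    BitSet accepted = new BitSet();
    accepted.set(5, 8); // only docs 5, 6, 7 pass the filter
    System.out.println(search(vectors, chainGraph(8), accepted, 6.2f, 2));
  }
}
```

In this sketch the "roughly 2x" bound falls out directly: the graph search visits at most about as many docs as the filter matches before giving up, and the exact fallback then scores each matching doc once.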

> Allow KnnVectorQuery to operate over a subset of liveDocs
> -
>
> Key: LUCENE-10382
> URL: https://issues.apache.org/jira/browse/LUCENE-10382
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Joel Bernstein
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.




[GitHub] [lucene] jtibshirani opened a new pull request #690: LUCENE-10408: Fix vector values iteration bug

2022-02-17 Thread GitBox



jtibshirani opened a new pull request #690:
URL: https://github.com/apache/lucene/pull/690


   Now that there is special logic to handle the dense case, we need to adjust some
   assertions in VectorValues#advance.



[GitHub] [lucene] jtibshirani commented on pull request #690: LUCENE-10408: Fix vector values iteration bug

2022-02-17 Thread GitBox



jtibshirani commented on pull request #690:
URL: https://github.com/apache/lucene/pull/690#issuecomment-1043407860


   This fixes recent test failures in `TestKnnVectorQuery`. One example:
   
   ```
   ./gradlew test --tests TestKnnVectorQuery.testRandomWithFilter -Dtests.seed=1748D2B9616D6A6B
   ```



[jira] [Commented] (LUCENE-10408) Better dense encoding of doc Ids in Lucene91HnswVectorsFormat

2022-02-17 Thread Julie Tibshirani (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494235#comment-17494235
 ] 

Julie Tibshirani commented on LUCENE-10408:
---

A test we recently added caught a small issue with the iteration logic. I 
opened https://github.com/apache/lucene/pull/690 to address it.

> Better dense encoding of doc Ids in Lucene91HnswVectorsFormat
> -
>
> Key: LUCENE-10408
> URL: https://issues.apache.org/jira/browse/LUCENE-10408
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mayya Sharipova
>Assignee: Mayya Sharipova
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Currently we write doc Ids of all documents that have vectors as is.  We 
> should improve their encoding either using delta encoding or bitset.




[jira] [Commented] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs

2022-02-17 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494283#comment-17494283
 ] 

ASF subversion and git services commented on LUCENE-10382:
--

Commit af40b448227e07e93d12c62f9dcf083b92f6eb51 in lucene's branch 
refs/heads/branch_9x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=af40b44 ]

LUCENE-10382: Support filtering in KnnVectorQuery (#656)

This PR adds support for a query filter in KnnVectorQuery. First, we gather the
query results for each leaf as a bit set. Then the HNSW search skips over the
non-matching documents (using the same approach as for live docs). To prevent
HNSW search from visiting too many documents when the filter is very selective,
we short-circuit if HNSW has already visited more than the number of documents
that match the filter, and execute an exact search instead. This bounds the
number of visited documents at roughly 2x the cost of just running the exact
filter, while in most cases HNSW completes successfully and does a lot better.

Co-authored-by: Joel Bernstein 


> Allow KnnVectorQuery to operate over a subset of liveDocs
> -
>
> Key: LUCENE-10382
> URL: https://issues.apache.org/jira/browse/LUCENE-10382
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Joel Bernstein
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.




[GitHub] [lucene] spike-liu commented on pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader

2022-02-17 Thread GitBox



spike-liu commented on pull request #678:
URL: https://github.com/apache/lucene/pull/678#issuecomment-1043674843


   > Going ahead and merging this now since there's been no opposition (of
   > course it could always be backed out later if someone did take strong issue
   > with this). Thanks again @spike-liu !
   
   It is my pleasure, Greg.



[GitHub] [lucene] spike-liu edited a comment on pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader

2022-02-17 Thread GitBox



spike-liu edited a comment on pull request #678:
URL: https://github.com/apache/lucene/pull/678#issuecomment-1043674843


   > Going ahead and merging this now since there's been no opposition (of
   > course it could always be backed out later if someone did take strong issue
   > with this). Thanks again @spike-liu !
   
   It is my pleasure, Greg.
   
   Really appreciate your note about force-pushing changes, which would save me a
   lot of time.



[jira] [Commented] (LUCENE-10398) Add static method for getting Terms from LeafReader

2022-02-17 Thread spike liu (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494306#comment-17494306
 ] 

spike liu commented on LUCENE-10398:


It is my pleasure, Greg.

> Add static method for getting Terms from LeafReader
> ---
>
> Key: LUCENE-10398
> URL: https://issues.apache.org/jira/browse/LUCENE-10398
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} 
> that return {{null}} values if the field is not indexed. These methods also 
> have equivalent {{DocValues}} static methods, such as 
> {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a 
> {{null}} if there is no field. I noticed that {{Terms}} does not have an 
> equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or 
> something similar. I was wondering if there was a reason for this, or if a 
> method like this could be useful. Thanks!




[jira] [Comment Edited] (LUCENE-10398) Add static method for getting Terms from LeafReader

2022-02-17 Thread spike liu (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494306#comment-17494306
 ] 

spike liu edited comment on LUCENE-10398 at 2/18/22, 12:41 AM:
---

It is my pleasure, [~gsmiller] 


was (Author: spike.liu):
It is my pleasure, Greg.

> Add static method for getting Terms from LeafReader
> ---
>
> Key: LUCENE-10398
> URL: https://issues.apache.org/jira/browse/LUCENE-10398
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Marc D'Mello
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Hi all, {{LeafReader}} has methods like {{getBinaryDocValues(String field)}} 
> that return {{null}} values if the field is not indexed. These methods also 
> have equivalent {{DocValues}} static methods, such as 
> {{DocValues.getBinary()}}, which return an {{emptyBinary()}} rather than a 
> {{null}} if there is no field. I noticed that {{Terms}} does not have an 
> equivalent static method for {{LeafReader.terms()}} like {{Terms.terms()}} or 
> something similar. I was wondering if there was a reason for this, or if a 
> method like this could be useful. Thanks!




[GitHub] [lucene] spike-liu commented on a change in pull request #678: LUCENE-10398: Add static method for getting Terms from LeafReader

2022-02-17 Thread GitBox



spike-liu commented on a change in pull request #678:
URL: https://github.com/apache/lucene/pull/678#discussion_r809637709



##
File path: lucene/core/src/java/org/apache/lucene/document/FeatureQuery.java
##
@@ -111,12 +111,9 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio
 
   @Override
   public Scorer scorer(LeafReaderContext context) throws IOException {
-Terms terms = context.reader().terms(fieldName);
-if (terms == null) {
-  return null;
-}
+Terms terms = Terms.terms(context.reader(), fieldName);
 TermsEnum termsEnum = terms.iterator();
-if (termsEnum.seekExact(new BytesRef(featureName)) == false) {
+if (!termsEnum.seekExact(new BytesRef(featureName))) {

Review comment:
   Gotcha.





[jira] [Updated] (LUCENE-10424) Optimize the "everything matches" case for count query in PointRangeQuery

2022-02-17 Thread Lu Xugang (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xugang updated LUCENE-10424:
---
Summary: Optimize the "everything matches" case for count query in 
PointRangeQuery  (was: Optimize the "everything matches" case for count queries 
in PointRangeQuery)

> Optimize the "everything matches" case for count query in PointRangeQuery
> -
>
> Key: LUCENE-10424
> URL: https://issues.apache.org/jira/browse/LUCENE-10424
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.1
>Reporter: Lu Xugang
>Priority: Minor
>
> In the implementation of Weight#count in PointRangeQuery, should we also 
> handle the case where PointValues#getDocCount() == IndexReader#maxDoc(), the 
> range's lower bound is less than the field's min value, and the range's upper 
> bound is greater than the field's max value, and return reader.maxDoc() 
> directly?




[GitHub] [lucene] LuXugang opened a new pull request #691: LUCENE-10424: Optimize the "everything matches" case for count query in PointRangeQuery

2022-02-17 Thread GitBox



LuXugang opened a new pull request #691:
URL: https://github.com/apache/lucene/pull/691


   In the implementation of Weight#count in PointRangeQuery, should we add a 
special case: when PointValues#getDocCount() == IndexReader#maxDoc(), the 
range's lower bound is less than the field's min value, and the range's upper 
bound is greater than the field's max value, return reader.maxDoc() directly?
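
The shortcut proposed above can be sketched in plain Java, without the Lucene classes. This is only an illustration under stated assumptions: the class name, method name, and parameters are hypothetical, values are modeled as a single `long` dimension, bounds are treated as inclusive, and the segment is assumed to have no deleted documents. The `-1` sentinel mirrors the convention of Weight#count for "cannot be answered cheaply".

```java
/** Hypothetical model of the "everything matches" shortcut; not Lucene's actual code. */
final class EverythingMatchesSketch {

  /** Sentinel meaning "no cheap answer, fall back to the normal counting path". */
  static final int UNKNOWN = -1;

  /**
   * @param maxDoc   models IndexReader#maxDoc() for the segment
   * @param docCount models PointValues#getDocCount(), the docs that have a value
   * @param fieldMin smallest indexed value of the field in this segment
   * @param fieldMax largest indexed value of the field in this segment
   * @param lower    inclusive lower bound of the range query (assumption)
   * @param upper    inclusive upper bound of the range query (assumption)
   */
  static int countIfEverythingMatches(
      int maxDoc, int docCount, long fieldMin, long fieldMax, long lower, long upper) {
    // Every document in the segment carries at least one value for the field.
    boolean everyDocHasValue = docCount == maxDoc;
    // The query range encloses every value that was indexed.
    boolean rangeCoversField = lower <= fieldMin && upper >= fieldMax;
    return (everyDocHasValue && rangeCoversField) ? maxDoc : UNKNOWN;
  }
}
```

For example, `countIfEverythingMatches(100, 100, 10, 20, 5, 25)` yields `100`, while a segment in which only 90 of 100 documents have a value yields `-1`. A real implementation would also have to account for deleted documents, which this sketch ignores.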



[GitHub] [lucene] wjp719 commented on pull request #688: LUCENE-10425：PostingsEnum supports to return current index of postings

2022-02-17 Thread GitBox



wjp719 commented on pull request #688:
URL: https://github.com/apache/lucene/pull/688#issuecomment-1043888635


   @msokolov Hi, I described a use case in this issue: 
https://issues.apache.org/jira/browse/LUCENE-10425. Thanks a lot.



[jira] [Updated] (LUCENE-10425) log count aggregation optimization inside one segment

2022-02-17 Thread jianping weng (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianping weng updated LUCENE-10425:
---
Summary: log count aggregation optimization inside one segment  (was: 
Lucene supports bkd binary search and return current index of posting list)

> log count aggregation optimization inside one segment
> -
>
> Key: LUCENE-10425
> URL: https://issues.apache.org/jira/browse/LUCENE-10425
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: jianping weng
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the log scenario, we usually want to know the number of documents in 
> each time interval. One possible optimization is to sort the documents in 
> ascending order of the @timestamp field within one segment. Then we can use 
> this PR [https://github.com/apache/lucene/pull/687] to find the min/max 
> docId in one time interval.
> If there is no other filter query, the doc count of one time interval is 
> (max docId - min docId + 1).
> If there is exactly one additional term filter query, we can use this PR 
> [https://github.com/apache/lucene/pull/688] to get the difference in posting 
> index between the calls advance(minId) and advance(maxId); that difference 
> is also the doc count of the time interval.
>  
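
The counting idea in the description above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code of either PR: the segment is modeled as a `long[]` of timestamps indexed by docId (ascending, as the description requires), the term's posting list as a sorted `int[]` of docIds, and all class and method names are hypothetical. The interval is taken as half-open, `[start, endExclusive)`, which is a choice made here to avoid off-by-one issues.

```java
/** Hypothetical model of per-interval counting in a timestamp-sorted segment. */
final class SortedSegmentCountSketch {

  /** Index of the first timestamp >= target in an ascending array, or length if none. */
  static int firstAtLeast(long[] sorted, long target) {
    int lo = 0, hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < target) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  /** Index of the first posting >= target docId in an ascending array, or length if none. */
  static int firstAtLeast(int[] sorted, int target) {
    int lo = 0, hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < target) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  /** No other filter: count is (max docId - min docId + 1), clamped at zero. */
  static int countInInterval(long[] timestampsByDocId, long start, long endExclusive) {
    int minDocId = firstAtLeast(timestampsByDocId, start);
    int maxDocId = firstAtLeast(timestampsByDocId, endExclusive) - 1;
    return Math.max(0, maxDocId - minDocId + 1);
  }

  /** One term filter: count is the difference of the two positions in the posting list. */
  static int countInIntervalWithTerm(
      long[] timestampsByDocId, int[] postings, long start, long endExclusive) {
    int minDocId = firstAtLeast(timestampsByDocId, start);
    int endDocId = firstAtLeast(timestampsByDocId, endExclusive); // one past max docId
    int from = firstAtLeast(postings, minDocId); // position an advance(minDocId) would land on
    int to = firstAtLeast(postings, endDocId);   // position just past the last matching posting
    return Math.max(0, to - from);
  }
}
```

With timestamps `{10, 10, 20, 30, 30, 40}` and postings `{1, 3, 5}`, the interval `[10, 31)` covers docIds 0 through 4, so the unfiltered count is 5 and the filtered count is 2 (docIds 1 and 3). Both lookups are binary searches, so neither count visits every matching document.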




[jira] [Updated] (LUCENE-10425) count aggregation optimization inside one segment in log scenario

2022-02-17 Thread jianping weng (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianping weng updated LUCENE-10425:
---
Summary: count aggregation optimization inside one segment in log scenario  
(was: count aggregation optimization inside one segment in log sc)

> count aggregation optimization inside one segment in log scenario
> -
>
> Key: LUCENE-10425
> URL: https://issues.apache.org/jira/browse/LUCENE-10425
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: jianping weng
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the log scenario, we usually want to know the number of documents in 
> each time interval. One possible optimization is to sort the documents in 
> ascending order of the @timestamp field within one segment. Then we can use 
> this PR [https://github.com/apache/lucene/pull/687] to find the min/max 
> docId in one time interval.
> If there is no other filter query, the doc count of one time interval is 
> (max docId - min docId + 1).
> If there is exactly one additional term filter query, we can use this PR 
> [https://github.com/apache/lucene/pull/688] to get the difference in posting 
> index between the calls advance(minId) and advance(maxId); that difference 
> is also the doc count of the time interval.
>  




[jira] [Updated] (LUCENE-10425) count aggregation optimization inside one segment in log sc

2022-02-17 Thread jianping weng (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jianping weng updated LUCENE-10425:
---
Summary: count aggregation optimization inside one segment in log sc  (was: 
log count aggregation optimization inside one segment)

> count aggregation optimization inside one segment in log sc
> ---
>
> Key: LUCENE-10425
> URL: https://issues.apache.org/jira/browse/LUCENE-10425
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: jianping weng
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the log scenario, we usually want to know the number of documents in 
> each time interval. One possible optimization is to sort the documents in 
> ascending order of the @timestamp field within one segment. Then we can use 
> this PR [https://github.com/apache/lucene/pull/687] to find the min/max 
> docId in one time interval.
> If there is no other filter query, the doc count of one time interval is 
> (max docId - min docId + 1).
> If there is exactly one additional term filter query, we can use this PR 
> [https://github.com/apache/lucene/pull/688] to get the difference in posting 
> index between the calls advance(minId) and advance(maxId); that difference 
> is also the doc count of the time interval.
>  



