date:20220223

[GitHub] [lucene] dweiss commented on pull request #696: Custom GitHub action to generate build matrix

2022-02-23 Thread GitBox



dweiss commented on pull request #696:
URL: https://github.com/apache/lucene/pull/696#issuecomment-1048561226


   Hi @mocobeta. It's impressive that this can be done! I kind of like the 
verbose version because you can tell what's going on, whereas the custom 
action is much harder to understand at first glance. I'm fine with 
committing this, just making an observation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Created] (LUCENE-10436) Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery into a single FieldExistsQuery?

2022-02-23 Thread Adrien Grand (Jira)

Adrien Grand created LUCENE-10436:
-

 Summary: Combine DocValuesFieldExistsQuery, NormsFieldExistsQuery 
and KnnVectorFieldExistsQuery into a single FieldExistsQuery?
 Key: LUCENE-10436
 URL: https://issues.apache.org/jira/browse/LUCENE-10436
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand


Now that we require consistency across data structures, we could merge 
DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery 
together into a FieldExistsQuery that would require that the field indexes 
either norms, doc values or vectors?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [lucene] LuXugang commented on pull request #701: LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery

2022-02-23 Thread GitBox



LuXugang commented on pull request #701:
URL: https://github.com/apache/lucene/pull/701#issuecomment-1048582255


   > 
   
   much more thoughtful！



[GitHub] [lucene] LuXugang removed a comment on pull request #701: LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery

2022-02-23 Thread GitBox



LuXugang removed a comment on pull request #701:
URL: https://github.com/apache/lucene/pull/701#issuecomment-1048582255


   > 
   
   much more thoughtful！



[GitHub] [lucene] LuXugang commented on pull request #701: LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery

2022-02-23 Thread GitBox



LuXugang commented on pull request #701:
URL: https://github.com/apache/lucene/pull/701#issuecomment-1048582999


   > then we don't need to maintain the `rewritableReaders` counter?
   
   Much more thoughtful！



[GitHub] [lucene] jpountz opened a new pull request #702: LUCENE-10382: Use `IndexReaderContext#id` to check reader identity.

2022-02-23 Thread GitBox



jpountz opened a new pull request #702:
URL: https://github.com/apache/lucene/pull/702


   `KnnVectorQuery` currently uses the index reader's hashcode to make sure that
   the query it builds runs on the right reader. We added
   `IndexReaderContext#id` a while back for a similar purpose with `TermStates`,
   so let's reuse it?
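
   To illustrate why an identity token is a safer check than a hash code
   (two distinct objects may share a hash code), here is a minimal sketch in
   plain Java. `ReaderContext` and `BoundQuery` are hypothetical stand-ins, not
   Lucene classes, and the real `IndexReaderContext#id` may be implemented
   differently.

```java
// Hypothetical stand-in for a reader context; not a Lucene class.
final class ReaderContext {
  // A fresh object per context: it is equal only to itself, so it cannot collide.
  private final Object identity = new Object();

  Object id() {
    return identity;
  }
}

// Hypothetical query state that remembers which reader it was built for.
final class BoundQuery {
  private final Object readerId;

  BoundQuery(ReaderContext context) {
    this.readerId = context.id();
  }

  // Reference comparison: true only for the exact context used at build time.
  boolean builtFor(ReaderContext context) {
    return readerId == context.id();
  }
}
```

   A query built against one context reports `builtFor(...)` as true for that
   context only, regardless of what the hash codes of the readers happen to be.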



[GitHub] [lucene] jpountz commented on a change in pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jpountz commented on a change in pull request #700:
URL: https://github.com/apache/lucene/pull/700#discussion_r812722194



##
File path: 
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java
##
@@ -821,6 +822,64 @@ public void testRandom() throws Exception {
     }
   }
 
+  public void testSearchWithVisitedLimit() throws Exception {
+    IndexWriterConfig iwc = newIndexWriterConfig();
+    String fieldName = "field";
+    try (Directory dir = newDirectory();
+        IndexWriter iw = new IndexWriter(dir, iwc)) {
+      int numDoc = atLeast(300);
+      int dimension = atLeast(10);
+      for (int i = 0; i < numDoc; i++) {
+        int id = random().nextInt(numDoc);

Review comment:
   maybe just use `i` as the `id`, it makes things a bit confusing to me 
that some docs might have duplicate ids, and that some of the delete-by-term 
calls below might not delete any document?
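
   A small sketch of the concern, in plain Java rather than the Lucene test
   framework. `IdAssignment` and its methods are hypothetical names used only
   for illustration. Drawing `numDocs` ids at random from `[0, numDocs)`
   almost always yields fewer than `numDocs` distinct ids, so some ids are
   duplicated and others are never used, whereas using the loop index gives
   every document a unique id.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class IdAssignment {

  // Number of distinct ids when each document draws a random id in [0, numDocs).
  static int distinctRandomIds(int numDocs, Random random) {
    Set<Integer> ids = new HashSet<>();
    for (int i = 0; i < numDocs; i++) {
      ids.add(random.nextInt(numDocs));
    }
    return ids.size();
  }

  // Number of distinct ids when the loop index itself is used as the id.
  static int distinctSequentialIds(int numDocs) {
    Set<Integer> ids = new HashSet<>();
    for (int i = 0; i < numDocs; i++) {
      ids.add(i);
    }
    return ids.size();
  }
}
```

   Any id that is never drawn corresponds to a delete-by-term call that
   matches no document.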

##
File path: 
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java
##
@@ -821,6 +822,64 @@ public void testRandom() throws Exception {
     }
   }
 
+  public void testSearchWithVisitedLimit() throws Exception {
+    IndexWriterConfig iwc = newIndexWriterConfig();
+    String fieldName = "field";
+    try (Directory dir = newDirectory();
+        IndexWriter iw = new IndexWriter(dir, iwc)) {
+      int numDoc = atLeast(300);
+      int dimension = atLeast(10);
+      for (int i = 0; i < numDoc; i++) {
+        int id = random().nextInt(numDoc);
+        float[] value;
+        if (random().nextInt(7) != 3) {
+          // usually index a vector value for a doc
+          value = randomVector(dimension);
+        } else {
+          value = null;
+        }
+        add(iw, fieldName, id, value, VectorSimilarityFunction.EUCLIDEAN);
+      }
+      iw.forceMerge(1);
+
+      // randomly delete some documents
+      for (int i = 0; i < 30; i++) {
+        int idToDelete = random().nextInt(numDoc);
+        iw.deleteDocuments(new Term("id", Integer.toString(idToDelete)));
+      }
+
+      try (IndexReader reader = DirectoryReader.open(iw)) {
+        for (LeafReaderContext ctx : reader.leaves()) {
+          Bits liveDocs = ctx.reader().getLiveDocs();
+          VectorValues vectorValues = ctx.reader().getVectorValues(fieldName);
+          if (vectorValues == null) {
+            continue;
+          }
+
+          // check the limit is hit when it's very small
+          int k = 5 + random().nextInt(45);
+          int visitedLimit = k + random().nextInt(5);
+          TopDocs results =
+              ctx.reader()
+                  .searchNearestVectors(
+                      fieldName, randomVector(dimension), k, liveDocs, visitedLimit);
+          assertEquals(TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO, results.totalHits.relation);

Review comment:
   Would this be actually guaranteed for every KNN vectors format? It feels 
hard to identify nearest neighbors while visitedLimit is so close to k, but 
maybe some edge cases like completely disconnected graphs could trigger rare 
test failures?

##
File path: 
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
##
@@ -186,6 +188,7 @@ private NeighborQueue searchLevel(
 }
   }
 }
+numVisited++;

Review comment:
   nit: I liked having `numVisited` next to `similarityFunction.compare` 
better since my mental model is that the point of `numVisited` is to compute 
the number of similarity computations that are performed to find the top-k 
nearest neighbors to the query vector.

##
File path: 
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
##
@@ -155,18 +159,13 @@ private NeighborQueue searchLevel(
 if (results.size() >= topK) {
   bound.set(results.topScore());
 }
-while (candidates.size() > 0) {
+while (candidates.size() > 0 && results.incomplete() == false) {

Review comment:
   Maybe extract this to an outer `if` statement so that we only check it 
before entering the while loop. Once we are in the while loop, 
`results.incomplete()` is guaranteed to be `false` since we always break the 
loop after calling `markIncomplete()`?
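
   The equivalence behind this suggestion can be sketched in plain Java.
   `LoopGuardSketch` is a hypothetical stand-in, not the `HnswGraphSearcher`
   code: a boolean flag plays the role of `results.incomplete()`, and setting
   it plays the role of `markIncomplete()`. Because the loop body always
   breaks right after setting the flag, checking the flag once before the loop
   visits the same number of candidates as checking it on every iteration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the search loop; not the HnswGraphSearcher code.
final class LoopGuardSketch {

  // Variant in the diff: the flag is re-checked before every iteration.
  static int checkEveryIteration(List<Integer> candidates, int limit, boolean incomplete) {
    Deque<Integer> queue = new ArrayDeque<>(candidates);
    int visited = 0;
    while (!queue.isEmpty() && incomplete == false) {
      if (visited >= limit) {
        incomplete = true; // stands in for markIncomplete()
        break; // the loop always exits right after the flag is set
      }
      queue.poll();
      visited++;
    }
    return visited;
  }

  // Suggested variant: check the flag once, before entering the loop.
  static int checkOnce(List<Integer> candidates, int limit, boolean incomplete) {
    Deque<Integer> queue = new ArrayDeque<>(candidates);
    int visited = 0;
    if (incomplete == false) {
      while (!queue.isEmpty()) {
        if (visited >= limit) {
          incomplete = true; // stands in for markIncomplete()
          break;
        }
        queue.poll();
        visited++;
      }
    }
    return visited;
  }
}
```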





[GitHub] [lucene-solr] epugh commented on pull request #2643: SOLR-9359 Make Config API work for warming queries

2022-02-23 Thread GitBox



epugh commented on pull request #2643:
URL: https://github.com/apache/lucene-solr/pull/2643#issuecomment-1048668234


   I like what this does, but I totally agree about all the 
`getClass()`/`instanceof` stuff! I wonder if burying all that logic in a 
method with a name like `normalizeArgsToQueries()`, or maybe just 
`bewareHereBeDragons()` ;-), might help someone understand that this logic is 
odd.
   
   Is there a way to reduce the complexity of `NamedLists` and `ArrayLists` and 
the variation in the JSON structure?
   
   Do we have a test that at least demonstrates the fix?



[jira] [Created] (LUCENE-10437) Improve error message in the Tessellator for polygon with all points collinear

2022-02-23 Thread Ignacio Vera (Jira)

Ignacio Vera created LUCENE-10437:
-

 Summary: Improve error message in the Tessellator for polygon with 
all points collinear
 Key: LUCENE-10437
 URL: https://issues.apache.org/jira/browse/LUCENE-10437
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Ignacio Vera


Currently the error that is thrown only says that it is not possible to 
tessellate, but this check is trivial and we can give better information to the 
user.




[GitHub] [lucene] iverase opened a new pull request #703: LUCENE-10437: Improve error message in the Tessellator for polygon with all points collinear

2022-02-23 Thread GitBox



iverase opened a new pull request #703:
URL: https://github.com/apache/lucene/pull/703


The polygon tessellator throws a more informative error message when the 
provided polygon does not contain enough non-collinear points.
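
A minimal sketch of what such a check might look like, in plain Java. This is
not the Tessellator implementation: `CollinearityCheck`, its methods, and the
error message are hypothetical, and the exact comparison of the cross product
with zero is a simplification that ignores floating-point tolerance.

```java
// Hypothetical helper; not the Lucene Tessellator code.
final class CollinearityCheck {

  // Returns true if all points lie on a single line (or are all identical).
  static boolean allCollinear(double[] xs, double[] ys) {
    int n = xs.length;
    // Find the first point that differs from point 0 to define a direction.
    int j = 1;
    while (j < n && xs[j] == xs[0] && ys[j] == ys[0]) {
      j++;
    }
    if (j >= n) {
      return true; // every point is identical
    }
    double dx = xs[j] - xs[0];
    double dy = ys[j] - ys[0];
    for (int i = j + 1; i < n; i++) {
      // The cross product is zero when point i lies on the line through 0 and j.
      double cross = dx * (ys[i] - ys[0]) - dy * (xs[i] - xs[0]);
      if (cross != 0) {
        return false;
      }
    }
    return true;
  }

  // Fails early with a specific message instead of a generic tessellation error.
  static void requireNonCollinear(double[] xs, double[] ys) {
    if (allCollinear(xs, ys)) {
      throw new IllegalArgumentException(
          "Unable to tessellate: all points of the polygon are collinear");
    }
  }
}
```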
   



[GitHub] [lucene] romseygeek commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-23 Thread GitBox



romseygeek commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r812845233



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SearcherManager;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex extends QueryIndex {
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.queries = new HashMap<>();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();
+    this.populateQueryCache(serializer, decomposer);
+  }
+
+  @Override
+  public void commit(List updates) throws IOException {
+    throw new IllegalStateException("Monitor is readOnly cannot commit");
+  }
+
+  @Override
+  long search(final Query query, QueryCollector matcher) throws IOException {
+    QueryBuilder builder = termFilter -> query;
+    return search(builder, matcher);
+  }
+
+  @Override
+  public long search(QueryBuilder queryBuilder, QueryCollector matcher) throws IOException {
+    IndexSearcher searcher = null;
+    try {
+      searcher = manager.acquire();
+      return searchInMemory(queryBuilder, matcher, searcher, this.queries);
+    } finally {
+      if (searcher != null) {
+        manager.release(searcher);
+      }
+    }
+  }
+
+  @Override
+  public void purgeCache() throws IOException {
+    this.populateQueryCache(serializer, decomposer);
+    lastPurged = System.nanoTime();
+  }
+
+  @Override
+  void purgeCache(CachePopulator populator) throws IOException {
+    manager.maybeRefresh();

Review comment:
   There's a race here, I think.  Once you've called 
`manager.maybeRefresh()` then a subsequent call to `search` will return the new 
searcher, but you may not have updated `this.queries` yet.  
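
   One possible way to avoid this kind of race, sketched in plain Java under
   stated assumptions: build the new query cache first, then publish the
   searcher and the cache together as a single immutable snapshot.
   `SnapshotHolder` and `Snapshot` are hypothetical, a `long` generation stands
   in for the searcher, and this is not what the PR does.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder; not part of the Lucene monitor module.
final class SnapshotHolder {

  // Immutable pair: a searcher generation and the query cache built for it.
  static final class Snapshot {
    final long searcherGeneration;
    final Map<String, String> queries;

    Snapshot(long searcherGeneration, Map<String, String> queries) {
      this.searcherGeneration = searcherGeneration;
      this.queries = Map.copyOf(queries); // defensive, unmodifiable copy
    }
  }

  private final AtomicReference<Snapshot> current =
      new AtomicReference<>(new Snapshot(0L, Map.of()));

  // The cache is fully built before this call; both parts become visible at once.
  void publish(long newGeneration, Map<String, String> newQueries) {
    current.set(new Snapshot(newGeneration, newQueries));
  }

  // Readers always see a searcher generation and the cache that matches it.
  Snapshot acquire() {
    return current.get();
  }
}
```

   With this shape, a concurrent `search` sees either the old searcher with the
   old cache or the new searcher with the new cache, never a mix of the two.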





[GitHub] [lucene] iverase merged pull request #703: LUCENE-10437: Improve error message in the Tessellator for polygon with all points collinear

2022-02-23 Thread GitBox



iverase merged pull request #703:
URL: https://github.com/apache/lucene/pull/703


   



[jira] [Commented] (LUCENE-10437) Improve error message in the Tessellator for polygon with all points collinear

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496734#comment-17496734
 ] 

ASF subversion and git services commented on LUCENE-10437:
--

Commit ab47db4feef29f7f5739a21988c40a83755359f5 in lucene's branch 
refs/heads/main from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ab47db4 ]

LUCENE-10437:  Improve error message in the Tessellator for polygon with all 
points collinear (#703)

Polygon tessellator throws a more informative error message when the provided 
polygon does not contain enough no-collinear points.

> Improve error message in the Tessellator for polygon with all points collinear
> --
>
> Key: LUCENE-10437
> URL: https://issues.apache.org/jira/browse/LUCENE-10437
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the error that is thrown only says that it is not possible to 
> tessellate, but this check is trivial and we can give better information to 
> the user.




[GitHub] [lucene] jpountz merged pull request #701: LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery

2022-02-23 Thread GitBox



jpountz merged pull request #701:
URL: https://github.com/apache/lucene/pull/701


   



[jira] [Commented] (LUCENE-10437) Improve error message in the Tessellator for polygon with all points collinear

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496740#comment-17496740
 ] 

ASF subversion and git services commented on LUCENE-10437:
--

Commit fb8d79d96aee1a1e1646e6d23d8997c48b75ff19 in lucene's branch 
refs/heads/branch_9x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=fb8d79d ]

LUCENE-10437:  Improve error message in the Tessellator for polygon with all 
points collinear (#703)

Polygon tessellator throws a more informative error message when the provided 
polygon does not contain enough no-collinear points.

> Improve error message in the Tessellator for polygon with all points collinear
> --
>
> Key: LUCENE-10437
> URL: https://issues.apache.org/jira/browse/LUCENE-10437
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the error that is thrown only says that it is not possible to 
> tessellate, but this check is trivial and we can give better information to 
> the user.




[jira] [Commented] (LUCENE-10435) Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496739#comment-17496739
 ] 

ASF subversion and git services commented on LUCENE-10435:
--

Commit 43e89d6a2920c4b7d0a999133062f0183b0b324d in lucene's branch 
refs/heads/main from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=43e89d6 ]

LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery 
can be rewrite to MatchAllDocsQuery (#701)



> Break loop early while checking whether DocValuesFieldExistsQuery can be 
> rewrite to MatchAllDocsQuery
> -
>
> Key: LUCENE-10435
> URL: https://issues.apache.org/jira/browse/LUCENE-10435
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Priority: Trivial
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In the implementation of Query#rewrite in DocValuesFieldExistsQuery, once one 
> segment fails to match the condition, we could break out of the loop directly.
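
The idea can be sketched in plain Java. `EarlyExit` is a hypothetical helper,
not the DocValuesFieldExistsQuery code: the rewrite is only possible when every
segment satisfies the condition, so the check can stop at the first segment
that fails, and no counter of matching segments is needed.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper; not the DocValuesFieldExistsQuery implementation.
final class EarlyExit {

  // Returns true only if every segment passes; stops at the first failure.
  static <T> boolean allRewritable(List<T> segments, Predicate<T> canRewrite) {
    for (T segment : segments) {
      if (!canRewrite.test(segment)) {
        return false; // no need to inspect the remaining segments
      }
    }
    return true;
  }
}
```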




[jira] [Resolved] (LUCENE-10437) Improve error message in the Tessellator for polygon with all points collinear

2022-02-23 Thread Ignacio Vera (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-10437.
---
Fix Version/s: 9.1
 Assignee: Ignacio Vera
   Resolution: Fixed

> Improve error message in the Tessellator for polygon with all points collinear
> --
>
> Key: LUCENE-10437
> URL: https://issues.apache.org/jira/browse/LUCENE-10437
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Assignee: Ignacio Vera
>Priority: Minor
> Fix For: 9.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the error that is thrown only says that it is not possible to 
> tessellate, but this check is trivial and we can give better information to 
> the user.




[jira] [Updated] (LUCENE-10432) Add optional 'name' property to org.apache.lucene.search.Explanation

2022-02-23 Thread Andriy Redko (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy Redko updated LUCENE-10432:
--
Affects Version/s: 8.10.1
   9.0

> Add optional 'name' property to org.apache.lucene.search.Explanation 
> -
>
> Key: LUCENE-10432
> URL: https://issues.apache.org/jira/browse/LUCENE-10432
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0, 8.10.1
>Reporter: Andriy Redko
>Priority: Minor
>
> Right now, the `Explanation` class has the `description` property, which is 
> used pretty much as a placeholder for a free-style, human-readable summary of 
> what is happening. This is totally fine, but it would be great to have a 
> somewhat more formal way to link the explanation with the corresponding 
> function / query / filter, if supported by the underlying engine.
> Example: OpenSearch / Elasticsearch has the concept of named queries / filters 
> [1]. This is not supported by Apache Lucene at the moment, but it would be 
> helpful to propagate this information back as part of the Explanation tree, 
> for example by introducing an optional 'name' property:
>  
> {noformat}
> {
>   "value": 0.0,
>   "description": "script score function, computed with script: ...",
>   "name": "script1",
>   "details": [
>     {
>       "value": 1.0,
>       "description": "_score: ",
>       "details": [
>         {
>           "value": 1.0,
>           "description": "*:*",
>           "details": []
>         }
>       ]
>     }
>   ]
> }{noformat}
>  
> On the other hand, the `name` property may look out of place here; an 
> alternative would be to add support for a `properties` / `parameters` / 
> `tags` key/value bag, for example:
>  
> {noformat}
> {
>   "value": 0.0,
>   "description": "script score function, computed with script: ...",
>   "tags": [
>     { "name": "script1" }
>   ],
>   "details": [
>     {
>       "value": 1.0,
>       "description": "_score: ",
>       "details": [
>         {
>           "value": 1.0,
>           "description": "*:*",
>           "details": []
>         }
>       ]
>     }
>   ]
> }{noformat}
> The change should be non-breaking but quite useful for engines to enrich the 
> `Explanation` with additional context.
> [1] 
> https://www.elastic.co/guide/en/elasticsearch/reference/7.16/query-dsl-bool-query.html#named-queries
>  
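
A minimal sketch of how an optional name could be carried without breaking
existing callers, in plain Java. `NamedExplanation` is a hypothetical stand-in
and not `org.apache.lucene.search.Explanation`; the factory and method names
are assumptions made for illustration.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the proposal; not org.apache.lucene.search.Explanation.
final class NamedExplanation {
  final double value;
  final String description;
  final Optional<String> name; // empty unless the engine supplies a name
  final List<NamedExplanation> details;

  private NamedExplanation(
      double value, String description, Optional<String> name, List<NamedExplanation> details) {
    this.value = value;
    this.description = description;
    this.name = name;
    this.details = details;
  }

  // Existing-style factory: callers that do not care about names are unaffected.
  static NamedExplanation of(double value, String description, NamedExplanation... details) {
    return new NamedExplanation(value, description, Optional.empty(), List.of(details));
  }

  // Returns a copy that carries the given name.
  NamedExplanation withName(String name) {
    return new NamedExplanation(value, description, Optional.of(name), details);
  }
}
```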




[GitHub] [lucene] jpountz commented on pull request #687: LUCENE-10425：speed up IndexSortSortedNumericDocValuesRangeQuery#count using bkd binary search

2022-02-23 Thread GitBox



jpountz commented on pull request #687:
URL: https://github.com/apache/lucene/pull/687#issuecomment-1048791215


   This looks very similar to the implementation of `Weight#count` on 
`PointRangeQuery` and should only perform marginally faster?
   It's unclear to me whether this PR buys us much.



[jira] [Commented] (LUCENE-10425) count aggregation optimization inside one segment in log scenario

2022-02-23 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496766#comment-17496766
 ] 

Adrien Grand commented on LUCENE-10425:
---

I'm not sure #687 actually helps compared to what we are already doing.

I like the idea of being able to count the number of hits of conjunctions 
efficiently thanks to index sorting but I think we'll need to expose this using 
a better API.

> count aggregation optimization inside one segment in log scenario
> -
>
> Key: LUCENE-10425
> URL: https://issues.apache.org/jira/browse/LUCENE-10425
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/search
>Reporter: jianping weng
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In log scenarios, we usually want to know the number of documents in each 
> time interval. One possible optimization is to sort the documents in 
> ascending order of the @timestamp field within a segment. Then we can use 
> this PR [https://github.com/apache/lucene/pull/687] to find the min/max 
> docId in one time interval.
> If there is no other filter query, the doc count of one time interval is (max 
> docId - min docId + 1).
> If there is only one other term filter query, we can use this PR 
> [https://github.com/apache/lucene/pull/688] to get the difference between the 
> index positions reached by advance(minId) and advance(maxId); that difference 
> is also the doc count of the time interval.
>  
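
The counting idea for the no-filter case can be sketched in plain Java. A
sorted `long[]` stands in for the index-sorted @timestamp values of one
segment, with the array index playing the role of the docId. `SortedRangeCount`
is a hypothetical helper and does not use the BKD tree or any Lucene API.

```java
// Hypothetical helper; the array index stands in for the docId of a sorted segment.
final class SortedRangeCount {

  // First index whose value is >= key (sorted.length if there is none).
  static int lowerBound(long[] sorted, long key) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // First index whose value is > key (sorted.length if there is none).
  static int upperBound(long[] sorted, long key) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] <= key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Count of documents with from <= timestamp <= to, as maxDocId - minDocId + 1.
  static int countInRange(long[] sortedTimestamps, long from, long to) {
    int minDocId = lowerBound(sortedTimestamps, from);
    int maxDocId = upperBound(sortedTimestamps, to) - 1;
    return maxDocId < minDocId ? 0 : maxDocId - minDocId + 1;
  }
}
```

Both bounds are found with a binary search, so the count for one interval costs
a logarithmic number of comparisons instead of a scan over the matching
documents.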




[jira] [Created] (LUCENE-10438) Leverage Weight#count in lucene/facets

2022-02-23 Thread Adrien Grand (Jira)

Adrien Grand created LUCENE-10438:
-

 Summary: Leverage Weight#count in lucene/facets
 Key: LUCENE-10438
 URL: https://issues.apache.org/jira/browse/LUCENE-10438
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand


The facet module could leverage Weight#count in order to give fast counts for 
the browsing use-case?




[GitHub] [lucene] mogui commented on a change in pull request #679: Monitor Improvements LUCENE-10422

2022-02-23 Thread GitBox



mogui commented on a change in pull request #679:
URL: https://github.com/apache/lucene/pull/679#discussion_r812915950



##
File path: 
lucene/monitor/src/java/org/apache/lucene/monitor/ReadonlyQueryIndex.java
##
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.monitor;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SearcherManager;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.IOUtils;
+
+class ReadonlyQueryIndex extends QueryIndex {
+
+  public ReadonlyQueryIndex(MonitorConfiguration configuration) throws IOException {
+    if (configuration.getDirectoryProvider() == null) {
+      throw new IllegalStateException(
+          "You must specify a Directory when configuring a Monitor as read-only.");
+    }
+    Directory directory = configuration.getDirectoryProvider().get();
+    this.queries = new HashMap<>();
+    this.manager = new SearcherManager(directory, new TermsHashBuilder(termFilters));
+    this.decomposer = configuration.getQueryDecomposer();
+    this.serializer = configuration.getQuerySerializer();
+    this.populateQueryCache(serializer, decomposer);
+  }
+
+  @Override
+  public void commit(List updates) throws IOException {
+    throw new IllegalStateException("Monitor is readOnly cannot commit");
+  }
+
+  @Override
+  long search(final Query query, QueryCollector matcher) throws IOException {
+    QueryBuilder builder = termFilter -> query;
+    return search(builder, matcher);
+  }
+
+  @Override
+  public long search(QueryBuilder queryBuilder, QueryCollector matcher) throws IOException {
+    IndexSearcher searcher = null;
+    try {
+      searcher = manager.acquire();
+      return searchInMemory(queryBuilder, matcher, searcher, this.queries);
+    } finally {
+      if (searcher != null) {
+        manager.release(searcher);
+      }
+    }
+  }
+
+  @Override
+  public void purgeCache() throws IOException {
+    this.populateQueryCache(serializer, decomposer);
+    lastPurged = System.nanoTime();
+  }
+
+  @Override
+  void purgeCache(CachePopulator populator) throws IOException {
+    manager.maybeRefresh();

Review comment:
   Should I call `maybeRefreshBlocking` instead?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on pull request #686: LUCENE-10421: use Constant instead of relying upon timestamp

2022-02-23 Thread GitBox



jpountz commented on pull request #686:
URL: https://github.com/apache/lucene/pull/686#issuecomment-1048810781


   Hard not to think of [this XKCD](https://xkcd.com/221/). :) 



[GitHub] [lucene] LuXugang opened a new pull request #704: LUCENE-10435: add CHANGES.txt entry

2022-02-23 Thread GitBox



LuXugang opened a new pull request #704:
URL: https://github.com/apache/lucene/pull/704


   Add entry for [#701](https://github.com/apache/lucene/pull/701)



[GitHub] [lucene] mocobeta commented on pull request #696: Custom GitHub action to generate build matrix

2022-02-23 Thread GitBox



mocobeta commented on pull request #696:
URL: https://github.com/apache/lucene/pull/696#issuecomment-104455


   I didn't explain the implementation, so it might be hard to figure out what's
happening here at first glance. The matrix value is propagated from the
custom action to each job (somewhat like a Unix pipe) by a combination of
several built-in mechanisms. I'd just like to leave references to them:
   - [set-output shell command](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-output-parameter)
   - [composite action outputs](https://docs.github.com/en/actions/creating-actions/metadata-syntax-for-github-actions#outputs-for-composite-actions)
   - [job outputs](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idoutputs)
   - [needs context](https://docs.github.com/en/actions/learn-github-actions/contexts#needs-context)
   - [fromJson expression](https://docs.github.com/en/actions/learn-github-actions/expressions#fromjson)



[GitHub] [lucene] mocobeta commented on pull request #696: Custom GitHub action to generate build matrix

2022-02-23 Thread GitBox



mocobeta commented on pull request #696:
URL: https://github.com/apache/lucene/pull/696#issuecomment-1048944204


   For me, this is a kind of exercise to see how the Actions runner works, but we
wouldn't need it unless we had a lot more jobs to run.



[GitHub] [lucene] jtibshirani merged pull request #699: LUCENE-10054: Make sure to use Lucene90 codec in unit tests

2022-02-23 Thread GitBox



jtibshirani merged pull request #699:
URL: https://github.com/apache/lucene/pull/699


   



[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496888#comment-17496888
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit 4364bdd63ef58b2094dc252018d3c027302af4f4 in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=4364bdd ]

LUCENE-10054: Make sure to use Lucene90 codec in unit tests (#699)

Before we were using the default Lucene91 codec, so we weren't exercising the
old format.

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jtibshirani commented on a change in pull request #704: LUCENE-10435: add CHANGES.txt entry

2022-02-23 Thread GitBox



jtibshirani commented on a change in pull request #704:
URL: https://github.com/apache/lucene/pull/704#discussion_r813077353



##
File path: lucene/CHANGES.txt
##
@@ -191,6 +191,9 @@ Improvements
 * LUCENE-10371: Make IndexRearranger able to arrange segment in a determined order.
   (Patrick Zhai)
 
+* LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery.

Review comment:
   Maybe we could combine this with the original changes entry (also in 
9.1) to keep it simple. The original entry would become:
   
   ```
   * LUCENE-10084, LUCENE-10435: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery whenever
     terms or points have a docCount that is equal to maxDoc. (Vigya Sharma, Lu Xugang)
   ```





[GitHub] [lucene] jtibshirani commented on a change in pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jtibshirani commented on a change in pull request #700:
URL: https://github.com/apache/lucene/pull/700#discussion_r813083832



##
File path: 
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
##
@@ -186,6 +188,7 @@ private NeighborQueue searchLevel(
 }
   }
 }
+numVisited++;

Review comment:
   I started to think of visiting as both performing a similarity 
calculation and potentially updating the queue. I'm happy to move it back there 
though, your perspective makes sense too. 





[GitHub] [lucene] jtibshirani commented on a change in pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jtibshirani commented on a change in pull request #700:
URL: https://github.com/apache/lucene/pull/700#discussion_r813084279



##
File path: 
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java
##
@@ -821,6 +822,64 @@ public void testRandom() throws Exception {
 }
   }
 
+  public void testSearchWithVisitedLimit() throws Exception {
+IndexWriterConfig iwc = newIndexWriterConfig();
+String fieldName = "field";
+try (Directory dir = newDirectory();
+IndexWriter iw = new IndexWriter(dir, iwc)) {
+  int numDoc = atLeast(300);
+  int dimension = atLeast(10);
+  for (int i = 0; i < numDoc; i++) {
+int id = random().nextInt(numDoc);

Review comment:
   Oops, I did not mean to do this (copied this from another test).





[GitHub] [lucene] LuXugang commented on a change in pull request #704: LUCENE-10435: add CHANGES.txt entry

2022-02-23 Thread GitBox



LuXugang commented on a change in pull request #704:
URL: https://github.com/apache/lucene/pull/704#discussion_r813093098



##
File path: lucene/CHANGES.txt
##
@@ -191,6 +191,9 @@ Improvements
 * LUCENE-10371: Make IndexRearranger able to arrange segment in a determined order.
   (Patrick Zhai)
 
+* LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery.

Review comment:
   > Maybe we could combine this with the original changes entry (also in 9.1) to keep it simple. The original entry would become:
   > 
   > ```
   > * LUCENE-10084, LUCENE-10435: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery whenever
   >   terms or points have a docCount that is equal to maxDoc. (Vigya Sharma, Lu Xugang)
   > ```
   
   OK





[GitHub] [lucene] jtibshirani commented on a change in pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jtibshirani commented on a change in pull request #700:
URL: https://github.com/apache/lucene/pull/700#discussion_r813116025



##
File path: 
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java
##
@@ -821,6 +822,64 @@ public void testRandom() throws Exception {
 }
   }
 
+  public void testSearchWithVisitedLimit() throws Exception {
+IndexWriterConfig iwc = newIndexWriterConfig();
+String fieldName = "field";
+try (Directory dir = newDirectory();
+IndexWriter iw = new IndexWriter(dir, iwc)) {
+  int numDoc = atLeast(300);
+  int dimension = atLeast(10);
+  for (int i = 0; i < numDoc; i++) {
+int id = random().nextInt(numDoc);
+float[] value;
+if (random().nextInt(7) != 3) {
+  // usually index a vector value for a doc
+  value = randomVector(dimension);
+} else {
+  value = null;
+}
+add(iw, fieldName, id, value, VectorSimilarityFunction.EUCLIDEAN);
+  }
+  iw.forceMerge(1);
+
+  // randomly delete some documents
+  for (int i = 0; i < 30; i++) {
+int idToDelete = random().nextInt(numDoc);
+iw.deleteDocuments(new Term("id", Integer.toString(idToDelete)));
+  }
+
+  try (IndexReader reader = DirectoryReader.open(iw)) {
+for (LeafReaderContext ctx : reader.leaves()) {
+  Bits liveDocs = ctx.reader().getLiveDocs();
+  VectorValues vectorValues = ctx.reader().getVectorValues(fieldName);
+  if (vectorValues == null) {
+continue;
+  }
+
+  // check the limit is hit when it's very small
+  int k = 5 + random().nextInt(45);
+  int visitedLimit = k + random().nextInt(5);
+  TopDocs results =
+  ctx.reader()
+  .searchNearestVectors(
+  fieldName, randomVector(dimension), k, liveDocs, visitedLimit);
+  assertEquals(TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO, results.totalHits.relation);

Review comment:
   Right, this isn't a behavioral guarantee. I was seeing this test just as 
providing important signal (catching if you forgot to implement it, making sure 
it works in non-edge case scenarios).
   
   I ran it a ton of times locally and didn't see failures. I'll keep an eye on 
builds -- if we encounter a failure because of the random data generation, an 
option is to switch to using a fixed set of vectors.





[jira] [Commented] (LUCENE-10054) Handle hierarchy in HNSW graph

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496910#comment-17496910
 ] 

ASF subversion and git services commented on LUCENE-10054:
--

Commit 458fb1abed45e2b7605b3e89d20ec0709ca755fd in lucene's branch 
refs/heads/branch_9x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=458fb1a ]

LUCENE-10054: Make sure to use Lucene90 codec in unit tests (#699)

Before we were using the default Lucene91 codec, so we weren't exercising the
old format.

> Handle hierarchy in HNSW graph
> --
>
> Key: LUCENE-10054
> URL: https://issues.apache.org/jira/browse/LUCENE-10054
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mayya Sharipova
>Priority: Major
> Fix For: 9.1
>
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> Currently HNSW graph is represented as a single layer graph. 
>  We would like to extend it to handle hierarchy as per 
> [discussion|https://issues.apache.org/jira/browse/LUCENE-9004?focusedCommentId=17393216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17393216].
>  
>  
> TODO tasks:
> - add multiple layers in the HnswGraph class
>  - modify the format in  Lucene90HnswVectorsWriter and 
> Lucene90HnswVectorsReader to handle multiple layers
> - modify graph construction and search algorithm to handle hierarchy
>  - run benchmarks




[GitHub] [lucene] jpountz commented on a change in pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jpountz commented on a change in pull request #700:
URL: https://github.com/apache/lucene/pull/700#discussion_r813130011



##
File path: 
lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java
##
@@ -821,6 +822,64 @@ public void testRandom() throws Exception {
 }
   }
 
+  public void testSearchWithVisitedLimit() throws Exception {
+IndexWriterConfig iwc = newIndexWriterConfig();
+String fieldName = "field";
+try (Directory dir = newDirectory();
+IndexWriter iw = new IndexWriter(dir, iwc)) {
+  int numDoc = atLeast(300);
+  int dimension = atLeast(10);
+  for (int i = 0; i < numDoc; i++) {
+int id = random().nextInt(numDoc);
+float[] value;
+if (random().nextInt(7) != 3) {
+  // usually index a vector value for a doc
+  value = randomVector(dimension);
+} else {
+  value = null;
+}
+add(iw, fieldName, id, value, VectorSimilarityFunction.EUCLIDEAN);
+  }
+  iw.forceMerge(1);
+
+  // randomly delete some documents
+  for (int i = 0; i < 30; i++) {
+int idToDelete = random().nextInt(numDoc);
+iw.deleteDocuments(new Term("id", Integer.toString(idToDelete)));
+  }
+
+  try (IndexReader reader = DirectoryReader.open(iw)) {
+for (LeafReaderContext ctx : reader.leaves()) {
+  Bits liveDocs = ctx.reader().getLiveDocs();
+  VectorValues vectorValues = ctx.reader().getVectorValues(fieldName);
+  if (vectorValues == null) {
+continue;
+  }
+
+  // check the limit is hit when it's very small
+  int k = 5 + random().nextInt(45);
+  int visitedLimit = k + random().nextInt(5);
+  TopDocs results =
+  ctx.reader()
+  .searchNearestVectors(
+  fieldName, randomVector(dimension), k, liveDocs, visitedLimit);
+  assertEquals(TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO, results.totalHits.relation);

Review comment:
   This sounds ok to me, I was mostly curious to better understand how you 
were thinking of it. We can revisit when/if we see failures with a new codec or 
other changes. It's not the only place where we are doing things like that. :)





[jira] [Commented] (LUCENE-10435) Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496920#comment-17496920
 ] 

ASF subversion and git services commented on LUCENE-10435:
--

Commit 701e40132be16e58d1e15bc33835926d59d21faa in lucene's branch 
refs/heads/branch_9x from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=701e401 ]

LUCENE-10435: Break loop early while checking whether DocValuesFieldExistsQuery 
can be rewrite to MatchAllDocsQuery (#701)



> Break loop early while checking whether DocValuesFieldExistsQuery can be 
> rewrite to MatchAllDocsQuery
> -
>
> Key: LUCENE-10435
> URL: https://issues.apache.org/jira/browse/LUCENE-10435
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Priority: Trivial
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In the implementation of Query#rewrite in DocValuesFieldExistsQuery, when one 
> segment fails to match the condition, maybe we should break out of the loop directly.
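
The early exit proposed in this issue can be illustrated with a minimal, self-contained sketch. It is not the actual DocValuesFieldExistsQuery code: the class name, the method name, and the representation of each segment as a `{docCount, maxDoc}` pair are assumptions made for illustration. The rewrite is only valid if every segment qualifies, so the first segment that fails settles the answer and the remaining segments need not be inspected.

```java
final class EarlyBreakSketch {

  // Each row is a hypothetical per-segment statistic: {docCount, maxDoc}.
  // Returns true only if every segment has a value for all of its documents.
  static boolean canRewriteToMatchAll(int[][] segments) {
    boolean canRewrite = true;
    for (int[] segment : segments) {
      int docCount = segment[0];
      int maxDoc = segment[1];
      if (docCount != maxDoc) {
        // One failing segment is enough to rule out the rewrite,
        // so stop looking at the remaining segments.
        canRewrite = false;
        break;
      }
    }
    return canRewrite;
  }

  public static void main(String[] args) {
    int[][] allFull = {{10, 10}, {5, 5}};
    int[][] onePartial = {{10, 10}, {3, 5}, {7, 7}};
    System.out.println(canRewriteToMatchAll(allFull));
    System.out.println(canRewriteToMatchAll(onePartial));
  }
}
```

In the second input the loop stops at the second segment and never looks at the third.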




[GitHub] [lucene] jtibshirani commented on a change in pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jtibshirani commented on a change in pull request #700:
URL: https://github.com/apache/lucene/pull/700#discussion_r813130908



##
File path: 
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
##
@@ -155,18 +159,13 @@ private NeighborQueue searchLevel(
 if (results.size() >= topK) {
   bound.set(results.topScore());
 }
-while (candidates.size() > 0) {
+while (candidates.size() > 0 && results.incomplete() == false) {

Review comment:
   The 'break' only breaks out of the inner while loop, which checks the 
neighbors list. This check is required so that when that break happens, we exit 
the outer loop too. I preferred to do this rather than using a labelled while 
statement, which can catch people by surprise. Did I understand your comment 
correctly?
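
The two options mentioned in this comment can be compared in a small stand-alone sketch. It is not the HnswGraphSearcher code: the class and method names are hypothetical, and plain arrays stand in for the candidate queue and the neighbor lists. The first method exits the outer loop through a flag in its condition, the second through a labelled `break`.

```java
final class NestedLoopExitSketch {

  // Counts items row by row and stops everything once `limit` items were seen.
  // The inner `break` only leaves the inner loop; the `incomplete == false`
  // check in the outer condition is what ends the outer loop as well.
  static int visitWithFlag(int[][] rows, int limit) {
    int visited = 0;
    boolean incomplete = false;
    int r = 0;
    while (r < rows.length && incomplete == false) {
      for (int c = 0; c < rows[r].length; c++) {
        if (visited >= limit) {
          incomplete = true;
          break;
        }
        visited++;
      }
      r++;
    }
    return visited;
  }

  // Same behavior using a labelled break, which leaves both loops at once.
  static int visitWithLabel(int[][] rows, int limit) {
    int visited = 0;
    outer:
    for (int[] row : rows) {
      for (int c = 0; c < row.length; c++) {
        if (visited >= limit) {
          break outer;
        }
        visited++;
      }
    }
    return visited;
  }
}
```

Both variants return the same count; the flag version keeps the exit condition visible in the loop header, which is the readability argument made above.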





[GitHub] [lucene] jpountz commented on a change in pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jpountz commented on a change in pull request #700:
URL: https://github.com/apache/lucene/pull/700#discussion_r813134067



##
File path: 
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
##
@@ -155,18 +159,13 @@ private NeighborQueue searchLevel(
 if (results.size() >= topK) {
   bound.set(results.topScore());
 }
-while (candidates.size() > 0) {
+while (candidates.size() > 0 && results.incomplete() == false) {

Review comment:
   Ah, thanks for clarifying, I had missed that the break was now under the 
inner while loop. Let's keep things as they are in your PR.





[GitHub] [lucene] jpountz commented on a change in pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jpountz commented on a change in pull request #700:
URL: https://github.com/apache/lucene/pull/700#discussion_r813135325



##
File path: 
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java
##
@@ -186,6 +188,7 @@ private NeighborQueue searchLevel(
 }
   }
 }
+numVisited++;

Review comment:
   I'm happy either way, this makes me wonder if renaming this variable 
could help, e.g. `numCandidates` or `numSimilarityComparisons`.





[jira] [Commented] (LUCENE-10438) Leverage Weight#count in lucene/facets

2022-02-23 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497020#comment-17497020
 ] 

Greg Miller commented on LUCENE-10438:
--

Interesting thought. So off the top of my head, it seems like this could be 
useful when the user wants counts for specific values in the "browse" case? In 
that situation, {{Weight#count}} for the requested value could be used. This 
would require deferring the aggregation computation (i.e., we'd have to stop 
eagerly computing aggregations during initialization). Is that sort of what 
you're thinking [~jpountz], or is there more to this that I'm not considering?

> Leverage Weight#count in lucene/facets
> --
>
> Key: LUCENE-10438
> URL: https://issues.apache.org/jira/browse/LUCENE-10438
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
>
> The facet module could leverage Weight#count in order to give fast counts for 
> the browsing use-case?




[jira] [Updated] (LUCENE-10438) Leverage Weight#count in lucene/facets

2022-02-23 Thread Greg Miller (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller updated LUCENE-10438:
-
Component/s: modules/facet

> Leverage Weight#count in lucene/facets
> --
>
> Key: LUCENE-10438
> URL: https://issues.apache.org/jira/browse/LUCENE-10438
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/facet
>Reporter: Adrien Grand
>Priority: Minor
>
> The facet module could leverage Weight#count in order to give fast counts for 
> the browsing use-case?




[jira] [Commented] (LUCENE-10438) Leverage Weight#count in lucene/facets

2022-02-23 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497023#comment-17497023
 ] 

Adrien Grand commented on LUCENE-10438:
---

I believe that it is mostly applicable to term and range faceting. Each bucket 
can be expressed as a filter (TermQuery for terms facets and PointRangeQuery 
for range facets) and the faceting framework would detect when the query 
matches all documents (so that the filters that describe the buckets do not 
need to be intersected with anything) in order to compute counts for the 
buckets by calling Weight#count for each of these filters. For terms, it would 
just read the document frequency from the inverted index. For ranges, it 
would use the new logic we added, which just counts the number of matches on the 
two leaves that intersect the query and adds the number of documents that belong to 
leaves fully contained by the query.

I'm not familiar enough with the faceting framework to comment on how best to 
fold this into the existing logic, but this would need to run before matches 
are loaded into a bitset indeed.
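
As a rough illustration of the term-facet case, here is a toy model in plain Java. It does not use Lucene's Weight#count API: the class `FacetCountSketch`, the postings map, and the `matches` array are all assumptions made for this sketch, and deleted documents are ignored. When the base query matches every document (modelled as `matches == null`), each bucket count is simply the length of the term's postings list, the analogue of a document frequency; otherwise the postings have to be intersected with the match set.

```java
import java.util.HashMap;
import java.util.Map;

final class FacetCountSketch {

  // postings: term -> ids of the documents containing that term (toy inverted index).
  // matches: matches[doc] is true if the base query matches doc; null means "match all".
  static Map<String, Integer> countBuckets(Map<String, int[]> postings, boolean[] matches) {
    Map<String, Integer> counts = new HashMap<>();
    for (Map.Entry<String, int[]> bucket : postings.entrySet()) {
      int[] docs = bucket.getValue();
      int count;
      if (matches == null) {
        // Fast path: nothing to intersect, the document frequency is the count.
        count = docs.length;
      } else {
        // Slow path: visit every posting and check it against the match set.
        count = 0;
        for (int doc : docs) {
          if (matches[doc]) {
            count++;
          }
        }
      }
      counts.put(bucket.getKey(), count);
    }
    return counts;
  }
}
```

The fast path does work proportional to the number of buckets rather than the number of matching documents, which is the benefit this comment is after for the browsing use case.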

> Leverage Weight#count in lucene/facets
> --
>
> Key: LUCENE-10438
> URL: https://issues.apache.org/jira/browse/LUCENE-10438
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/facet
>Reporter: Adrien Grand
>Priority: Minor
>
> The facet module could leverage Weight#count in order to give fast counts for 
> the browsing use-case?




[GitHub] [lucene] jtibshirani merged pull request #700: LUCENE-10382: Ensure kNN filtering works with other codecs

2022-02-23 Thread GitBox



jtibshirani merged pull request #700:
URL: https://github.com/apache/lucene/pull/700


   



[jira] [Commented] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497061#comment-17497061
 ] 

ASF subversion and git services commented on LUCENE-10382:
--

Commit b40a750aa8c0cc05291d8d8673d9d068d078d2de in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=b40a750 ]

LUCENE-10382: Ensure kNN filtering works with other codecs (#700)

The original PR that added kNN filtering support overlooked non-default codecs.
This follow-up ensures that other codecs work with the new filtering logic:
* Make sure to check the visited nodes limit in `SimpleTextKnnVectorsReader`
and `Lucene90HnswVectorsReader`
* Add a test `BaseKnnVectorsFormatTestCase` to cover this case
* Fix failures in `TestKnnVectorQuery#testRandomWithFilter`, whose assumptions
don't hold when SimpleText is used

This PR also clarifies the limit checking logic for
`Lucene91HnswVectorsReader`. Now we always check the limit before visiting a
new node, whereas before we only checked it in an outer loop.
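
The limit check described in this commit message might look like the following simplified sketch. It is a plain breadth-first traversal over an adjacency array, not the HNSW search itself, and `VisitLimitSketch`, `Result`, and `search` are hypothetical names. The point it shows is that the limit is tested immediately before each new node is visited, and that hitting it marks the result as incomplete.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class VisitLimitSketch {

  static final class Result {
    final int numVisited;
    final boolean incomplete;

    Result(int numVisited, boolean incomplete) {
      this.numVisited = numVisited;
      this.incomplete = incomplete;
    }
  }

  // neighbors[n] lists the nodes adjacent to n; entry is the starting node.
  static Result search(int[][] neighbors, int entry, int visitedLimit) {
    if (visitedLimit < 1) {
      return new Result(0, true);
    }
    boolean[] seen = new boolean[neighbors.length];
    Deque<Integer> candidates = new ArrayDeque<>();
    seen[entry] = true;
    candidates.add(entry);
    int numVisited = 1;
    boolean incomplete = false;

    while (!candidates.isEmpty() && incomplete == false) {
      int node = candidates.poll();
      for (int friend : neighbors[node]) {
        if (seen[friend]) {
          continue;
        }
        // Check the limit before visiting each new node, not once per outer iteration.
        if (numVisited >= visitedLimit) {
          incomplete = true;
          break;
        }
        seen[friend] = true;
        numVisited++;
        candidates.add(friend);
      }
    }
    return new Result(numVisited, incomplete);
  }
}
```

With this placement the traversal never visits more than `visitedLimit` nodes, whereas a check made only in the outer loop could overshoot by up to one node's full neighbor list.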

> Allow KnnVectorQuery to operate over a subset of liveDocs
> -
>
> Key: LUCENE-10382
> URL: https://issues.apache.org/jira/browse/LUCENE-10382
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Joel Bernstein
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.




[jira] [Commented] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497066#comment-17497066
 ] 

ASF subversion and git services commented on LUCENE-10382:
--

Commit 29d4adfe60368c0159cd0accd53efba77ca11771 in lucene's branch 
refs/heads/branch_9x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=29d4adf ]

LUCENE-10382: Ensure kNN filtering works with other codecs (#700)

The original PR that added kNN filtering support overlooked non-default codecs.
This follow-up ensures that other codecs work with the new filtering logic:
* Make sure to check the visited nodes limit in `SimpleTextKnnVectorsReader`
  and `Lucene90HnswVectorsReader`
* Add a test to `BaseKnnVectorsFormatTestCase` to cover this case
* Fix failures in `TestKnnVectorQuery#testRandomWithFilter`, whose assumptions
  don't hold when SimpleText is used

This PR also clarifies the limit checking logic for
`Lucene91HnswVectorsReader`. Now we always check the limit before visiting a
new node, whereas before we only checked it in an outer loop.

> Allow KnnVectorQuery to operate over a subset of liveDocs
> -
>
> Key: LUCENE-10382
> URL: https://issues.apache.org/jira/browse/LUCENE-10382
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Joel Bernstein
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.




[jira] [Commented] (LUCENE-10438) Leverage Weight#count in lucene/facets

2022-02-23 Thread Greg Miller (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497068#comment-17497068
 ] 

Greg Miller commented on LUCENE-10438:
--

Thanks [~jpountz]. Yep, that's exactly what I was thinking (but didn't describe 
as well). I like the idea of experimenting with this. I also think this idea 
could be applicable in some "non-browse" cases as well where the user knows the 
values they want counted. There could be some cases where it's more efficient 
to actually intersect those queries with the match set than to accumulate 
counts for all values. I think Solr might have a version of faceting that does 
this? Basically cases where the user knows the values they want counts for 
ahead of time (as opposed to a "top-n" type request), and the number of 
distinct values they want counted is much smaller than the overall cardinality 
of the faceting field.
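
The trade-off described above can be sketched with plain bit sets. This is a hedged illustration only: `FacetCountSketch`, `countAll`, and `countRequested` are hypothetical names, the per-value `BitSet` stands in for whatever per-value query or postings the index provides, and nothing here uses the lucene/facets API or `Weight#count`.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch comparing two ways of counting facet values. */
final class FacetCountSketch {

  /** Accumulate: visit every matching doc and bump the counter of its value. */
  public static Map<String, Integer> countAll(String[] valueByDoc, BitSet matches) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
      counts.merge(valueByDoc[doc], 1, Integer::sum);
    }
    return counts;
  }

  /** Intersect: for each requested value only, AND its doc set with the match set. */
  public static Map<String, Integer> countRequested(
      Map<String, BitSet> docsByValue, BitSet matches, List<String> requested) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String value : requested) {
      BitSet docs = docsByValue.get(value);
      int count = 0;
      if (docs != null) {
        BitSet hits = (BitSet) docs.clone(); // clone so the index data is not mutated
        hits.and(matches);
        count = hits.cardinality();
      }
      counts.put(value, count);
    }
    return counts;
  }
}
```

The cost of `countAll` grows with the size of the match set regardless of how many values are wanted, while `countRequested` does one intersection per requested value, which is why it may win when the requested values are few compared with the field's cardinality.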

> Leverage Weight#count in lucene/facets
> --
>
> Key: LUCENE-10438
> URL: https://issues.apache.org/jira/browse/LUCENE-10438
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/facet
>Reporter: Adrien Grand
>Priority: Minor
>
> The facet module could leverage Weight#count in order to give fast counts for 
> the browsing use-case?




[GitHub] [lucene] jtibshirani merged pull request #704: LUCENE-10435: add CHANGES.txt entry

2022-02-23 Thread GitBox



jtibshirani merged pull request #704:
URL: https://github.com/apache/lucene/pull/704


   



[jira] [Assigned] (LUCENE-10438) Leverage Weight#count in lucene/facets

2022-02-23 Thread Greg Miller (Jira)



 [ 
https://issues.apache.org/jira/browse/LUCENE-10438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller reassigned LUCENE-10438:


Assignee: Greg Miller

> Leverage Weight#count in lucene/facets
> --
>
> Key: LUCENE-10438
> URL: https://issues.apache.org/jira/browse/LUCENE-10438
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/facet
>Reporter: Adrien Grand
>Assignee: Greg Miller
>Priority: Minor
>
> The facet module could leverage Weight#count in order to give fast counts for 
> the browsing use-case?




[jira] [Commented] (LUCENE-10435) Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497069#comment-17497069
 ] 

ASF subversion and git services commented on LUCENE-10435:
--

Commit 7ec89603e388cbb01db7a44f2694d61cfacbe6d6 in lucene's branch 
refs/heads/main from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=7ec8960 ]

LUCENE-10435: add CHANGES.txt entry (#704)



> Break loop early while checking whether DocValuesFieldExistsQuery can be 
> rewrite to MatchAllDocsQuery
> -
>
> Key: LUCENE-10435
> URL: https://issues.apache.org/jira/browse/LUCENE-10435
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Priority: Trivial
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In the implementation of Query#rewrite in DocValuesFieldExistsQuery, when one 
> segment fails to match the condition, maybe we should break out of the loop directly.
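
The early exit proposed in the description can be sketched as follows. This is a simplified illustration, not the code in `DocValuesFieldExistsQuery`: the `SegmentStats` holder, its fields, and the "every doc has a value" condition are assumptions made for the example.

```java
import java.util.List;

/** Hypothetical sketch of breaking out of a per-segment check early. */
final class EarlyBreakSketch {

  /** Assumed per-segment statistics: total docs and docs that have a value for the field. */
  static final class SegmentStats {
    final int maxDoc;
    final int docsWithValue;

    SegmentStats(int maxDoc, int docsWithValue) {
      this.maxDoc = maxDoc;
      this.docsWithValue = docsWithValue;
    }
  }

  /** Returns true only if every segment has a value for every doc. */
  public static boolean allDocsHaveValue(List<SegmentStats> segments) {
    boolean allMatch = true;
    for (SegmentStats segment : segments) {
      if (segment.docsWithValue != segment.maxDoc) {
        allMatch = false;
        break; // one failing segment already decides the answer
      }
    }
    return allMatch;
  }
}
```

Once a single segment fails the condition the overall answer cannot change, so inspecting the remaining segments is wasted work.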




[jira] [Commented] (LUCENE-10435) Break loop early while checking whether DocValuesFieldExistsQuery can be rewrite to MatchAllDocsQuery

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497071#comment-17497071
 ] 

ASF subversion and git services commented on LUCENE-10435:
--

Commit 5aab8a8e40f3d68c82270b1195c3394d9dde87f0 in lucene's branch 
refs/heads/branch_9x from Lu Xugang
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=5aab8a8 ]

LUCENE-10435: add CHANGES.txt entry (#704)



> Break loop early while checking whether DocValuesFieldExistsQuery can be 
> rewrite to MatchAllDocsQuery
> -
>
> Key: LUCENE-10435
> URL: https://issues.apache.org/jira/browse/LUCENE-10435
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Lu Xugang
>Priority: Trivial
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In the implementation of Query#rewrite in DocValuesFieldExistsQuery, when one 
> segment fails to match the condition, maybe we should break out of the loop directly.




[jira] [Commented] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497159#comment-17497159
 ] 

ASF subversion and git services commented on LUCENE-10382:
--

Commit d9c2e46824c8b5be8f471da6ce291e908cc58955 in lucene's branch 
refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=d9c2e46 ]

LUCENE-10382: Fix testSearchWithVisitedLimit failures


> Allow KnnVectorQuery to operate over a subset of liveDocs
> -
>
> Key: LUCENE-10382
> URL: https://issues.apache.org/jira/browse/LUCENE-10382
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Joel Bernstein
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.




[jira] [Commented] (LUCENE-10382) Allow KnnVectorQuery to operate over a subset of liveDocs

2022-02-23 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497161#comment-17497161
 ] 

ASF subversion and git services commented on LUCENE-10382:
--

Commit a3b136573fcb2a1e61dd70519708a5ef36d20eb8 in lucene's branch 
refs/heads/branch_9x from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a3b1365 ]

LUCENE-10382: Fix testSearchWithVisitedLimit failures


> Allow KnnVectorQuery to operate over a subset of liveDocs
> -
>
> Key: LUCENE-10382
> URL: https://issues.apache.org/jira/browse/LUCENE-10382
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: 9.0
>Reporter: Joel Bernstein
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Currently the KnnVectorQuery selects the top K vectors from all live docs.  
> This ticket will change the interface to make it possible for the top K 
> vectors to be selected from a subset of the live docs.




[jira] [Created] (LUCENE-10439) Support multi-valued and multiple dimensions for count query in PointRangeQuery

2022-02-23 Thread Lu Xugang (Jira)

Lu Xugang created LUCENE-10439:
--

 Summary: Support multi-valued and multiple dimensions for count 
query in PointRangeQuery
 Key: LUCENE-10439
 URL: https://issues.apache.org/jira/browse/LUCENE-10439
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Lu Xugang


Follow-up of LUCENE-10424: the count query should also work with fields that 
have multiple dimensions and/or that are multi-valued.




[GitHub] [lucene] LuXugang opened a new pull request #705: LUCENE-10439: Support multi-valued and multiple dimensions for count query in PointRangeQuery

2022-02-23 Thread GitBox



LuXugang opened a new pull request #705:
URL: https://github.com/apache/lucene/pull/705


   Follow-up of 
[LUCENE-10424](https://issues.apache.org/jira/browse/LUCENE-10424): the count 
query should also work with fields that have multiple dimensions and/or that 
are multi-valued.


