[GitHub] [lucene] mocobeta commented on a change in pull request #207: LUCENE-9855: Rename nn search vector format

2021-07-10 Thread GitBox


mocobeta commented on a change in pull request #207:
URL: https://github.com/apache/lucene/pull/207#discussion_r667338802



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -2284,28 +2284,28 @@ static void checkImpacts(Impacts impacts, int lastTarget) {
    *
    * @lucene.experimental
    */
-  public static Status.VectorValuesStatus testVectors(
+  public static Status.NnVectorsStatus testVectors(
       CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException {
     if (infoStream != null) {
       infoStream.print("test: vectors..");

Review comment:
   Fixed in 
https://github.com/apache/lucene/pull/207/commits/4d301c580470a27993393ec50fef258040cfbe6c




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mocobeta commented on a change in pull request #207: LUCENE-9855: Rename nn search vector format

2021-07-10 Thread GitBox


mocobeta commented on a change in pull request #207:
URL: https://github.com/apache/lucene/pull/207#discussion_r667338807



##
File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
##
@@ -2284,28 +2284,28 @@ static void checkImpacts(Impacts impacts, int lastTarget) {
    *
    * @lucene.experimental
    */
-  public static Status.VectorValuesStatus testVectors(
+  public static Status.NnVectorsStatus testVectors(
       CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException {
     if (infoStream != null) {
       infoStream.print("test: vectors..");
     }
     long startNS = System.nanoTime();
     FieldInfos fieldInfos = reader.getFieldInfos();
-    Status.VectorValuesStatus status = new Status.VectorValuesStatus();
+    Status.NnVectorsStatus status = new Status.NnVectorsStatus();
     try {
 
-      if (fieldInfos.hasVectorValues()) {
+      if (fieldInfos.hasNnVectors()) {
         for (FieldInfo fieldInfo : fieldInfos) {
-          if (fieldInfo.hasVectorValues()) {
-            int dimension = fieldInfo.getVectorDimension();
+          if (fieldInfo.hasNnVectors()) {
+            int dimension = fieldInfo.getNnVectorDimension();
             if (dimension <= 0) {
               throw new RuntimeException(
                   "Field \""
                       + fieldInfo.name
                       + "\" has vector values but dimension is "

Review comment:
   Fixed in 
https://github.com/apache/lucene/pull/207/commits/4d301c580470a27993393ec50fef258040cfbe6c







[GitHub] [lucene] mocobeta commented on a change in pull request #207: LUCENE-9855: Rename nn search vector format

2021-07-10 Thread GitBox


mocobeta commented on a change in pull request #207:
URL: https://github.com/apache/lucene/pull/207#discussion_r667339213



##
File path: lucene/CHANGES.txt
##
@@ -7,7 +7,7 @@ http://s.apache.org/luceneversions
 
 New Features
 
-* LUCENE-9322: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)
+* LUCENE-9322 LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)

Review comment:
   I think this can be grouped with LUCENE-9322?







[GitHub] [lucene] mocobeta commented on pull request #207: LUCENE-9855: Rename nn search vector format

2021-07-10 Thread GitBox


mocobeta commented on pull request #207:
URL: https://github.com/apache/lucene/pull/207#issuecomment-877639170


   Thanks @msokolov for reviewing. I'll wait a few more days to let others 
comment on this, then merge it upstream if there is no further feedback.





[GitHub] [lucene] msokolov opened a new pull request #210: LUCENE-10016: remove fanout parameter from nearest neighbor vector search

2021-07-10 Thread GitBox


msokolov opened a new pull request #210:
URL: https://github.com/apache/lucene/pull/210


   I think we can remove this parameter to simplify the nn search API and make 
it more generic. For the HNSW algorithm, searching for top K=M and fanout=N is 
basically the same as searching with top K=M+N and then keeping the top M, so 
callers can easily implement it themselves.
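
A minimal sketch of the caller-side emulation described above, assuming a k-only search method. `NnSearcher`, `Hit`, and `searchWithFanout` are hypothetical names used for illustration; they are not Lucene classes or the API from this pull request.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: emulate a fanout parameter on top of a k-only search. */
final class FanoutEmulation {

  /** Hypothetical scored hit; a stand-in for a (doc, score) pair. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Hypothetical k-only nearest-neighbor search; not a Lucene interface. */
  interface NnSearcher {
    List<Hit> search(float[] target, int k);
  }

  /** Asks for k + fanout candidates, then keeps the k best-scoring ones. */
  static List<Hit> searchWithFanout(NnSearcher searcher, float[] target, int k, int fanout) {
    List<Hit> candidates = new ArrayList<>(searcher.search(target, k + fanout));
    // Highest score first; do not rely on the searcher returning sorted hits.
    candidates.sort((a, b) -> Float.compare(b.score, a.score));
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }
}
```

With this shape the search method itself only needs `k`, and callers that want the wider exploration pay for it explicitly by over-requesting.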





[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?

2021-07-10 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378489#comment-17378489
 ] 

Michael Sokolov commented on LUCENE-10016:
--

I posted a PR removing fanout. For the question of how to integrate with 
"regular" search, handle deletions, etc., let's track that over in LUCENE-9614.

> VectorReader.search needs rethought, o.a.l.search integration?
> --
>
> Key: LUCENE-10016
> URL: https://issues.apache.org/jira/browse/LUCENE-10016
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 9.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There's no search integration (e.g. queries) for the current vector values, 
> no documentation/examples that I can find.
> Instead the codec has this method:
> {code}
> TopDocs search(String field, float[] target, int k, int fanout)
> {code}
> First, the "fanout" parameter needs to go, this is specific to HNSW impl, get 
> it out of here.
> Second, how am I supposed to skip over deleted documents? How can I use 
> filters? How should I search across multiple segments?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (LUCENE-9614) Implement KNN Query

2021-07-10 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378497#comment-17378497
 ] 

Michael Sokolov commented on LUCENE-9614:
-

Doing nn vector search during rewrite has one significant drawback, which is 
that rewrite() cannot make use of IndexSearcher's executor to perform 
concurrent searches across segments (or slices), whereas an implementation that 
does the search in createWeight will naturally get executed concurrently when 
IndexSearcher is configured for that.

To fix that would require some substantial change to pass an executor to 
Query.rewrite, which seems kind of overkill at this point. Instead, perhaps we 
can implement the `createWeight` version that supports concurrency and define 
`equals` and `hashCode` to use object identity in order to prevent spurious 
caching.
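
To illustrate the identity-based `equals` and `hashCode` idea, here is a small sketch. `IdentityKeyedKnnQuery` is a hypothetical class, not the query from LUCENE-9614; it only shows that two structurally identical instances are never treated as the same cache key.

```java
/** Sketch only: a query-like object compared by identity rather than by value. */
final class IdentityKeyedKnnQuery {
  final String field;
  final float[] target;
  final int k;

  IdentityKeyedKnnQuery(String field, float[] target, int k) {
    this.field = field;
    this.target = target;
    this.k = k;
  }

  /** Two instances are equal only if they are the same object. */
  @Override
  public boolean equals(Object other) {
    return this == other;
  }

  /** Consistent with equals: the identity hash of this object. */
  @Override
  public int hashCode() {
    return System.identityHashCode(this);
  }
}
```

A cache keyed on such objects (for example a `HashMap`) will store separate entries for separate instances even when field, target, and k match, which is presumably the effect intended by "prevent spurious caching".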

> Implement KNN Query
> ---
>
> Key: LUCENE-9614
> URL: https://issues.apache.org/jira/browse/LUCENE-9614
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Michael Sokolov
>Priority: Major
>
> Now we have a vector index format, and one vector indexing/KNN search 
> implementation, but the interface is low-level: you can search across a 
> single segment only. We would like to expose a Query implementation. 
> Initially, we want to support a usage where the KnnVectorQuery selects the 
> k-nearest neighbors without regard to any other constraints, and these can 
> then be filtered as part of an enclosing Boolean or other query.
> Later we will want to explore some kind of filtering *while* performing 
> vector search, or a re-entrant search process that can yield further results. 
> Because of the nature of knn search (all documents having any vector value 
> match), it is more like a ranking than a filtering operation, and it doesn't 
> really make sense to provide an iterator interface that can be merged in the 
> usual way, in docid order, skipping ahead. It's not yet clear how to satisfy 
> a query that is "k nearest neighbors satisfying some arbitrary Query", at 
> least not without realizing a complete bitset for the Query. But this is for 
> a later issue; *this* issue is just about performing the knn search in 
> isolation, computing a set of (some given) K nearest neighbors, and providing 
> an iterator over those.
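
The "select k nearest neighbors first, filter afterwards" usage from the description can be sketched as follows. `KnnPostFilter` and its `filter` method are hypothetical; in Lucene the filtering would come from an enclosing Boolean query, not from this helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Sketch only: apply a filter to an already computed nearest-neighbor list. */
final class KnnPostFilter {

  /** Keeps the doc ids accepted by the filter, preserving nearest-first order. */
  static List<Integer> filter(List<Integer> nearestDocs, IntPredicate accept) {
    List<Integer> kept = new ArrayList<>();
    for (int doc : nearestDocs) {
      if (accept.test(doc)) {
        kept.add(doc);
      }
    }
    return kept;
  }
}
```

Because the filter runs after the k neighbors are chosen, fewer than k hits may remain, which is one reason the description defers filtering during the vector search to a later issue.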






[jira] [Comment Edited] (LUCENE-9614) Implement KNN Query

2021-07-10 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378497#comment-17378497
 ] 

Michael Sokolov edited comment on LUCENE-9614 at 7/10/21, 5:03 PM:
---

Doing nn vector search during rewrite has one significant drawback, which is 
that {{rewrite()}} cannot make use of IndexSearcher's executor to perform 
concurrent searches across segments (or slices), whereas an implementation that 
does the search in createWeight will naturally get executed concurrently when 
IndexSearcher is configured for that.

To fix that would require some substantial change to pass an executor to 
{{Query.rewrite}}, which seems kind of overkill at this point. Instead, perhaps 
we can implement the {{createWeight}} version that supports concurrency and 
define {{equals(Object)}} and {{hashCode()}} to use object identity in order to 
prevent spurious caching.
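
For readers unfamiliar with the concurrency being discussed, the following sketch shows the general pattern of running one search per segment on an executor and merging the results. It uses only `java.util.concurrent`; `SegmentSearcher` and `Hit` are hypothetical stand-ins, and this is not how `IndexSearcher` is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Sketch only: fan a nearest-neighbor search out over segments, then merge. */
final class ConcurrentSegmentSearch {

  /** Hypothetical scored hit carrying an index-wide doc id. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Hypothetical per-segment search; not a Lucene interface. */
  interface SegmentSearcher {
    List<Hit> search(float[] target, int k);
  }

  /** Runs one task per segment on the executor and keeps the k best hits overall. */
  static List<Hit> search(
      List<SegmentSearcher> segments, float[] target, int k, ExecutorService executor) {
    List<Future<List<Hit>>> futures = new ArrayList<>();
    for (SegmentSearcher segment : segments) {
      Callable<List<Hit>> task = () -> segment.search(target, k);
      futures.add(executor.submit(task));
    }
    List<Hit> merged = new ArrayList<>();
    for (Future<List<Hit>> future : futures) {
      try {
        merged.addAll(future.get());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while searching", e);
      } catch (ExecutionException e) {
        throw new IllegalStateException("segment search failed", e.getCause());
      }
    }
    // Highest score first, then truncate to k.
    merged.sort((a, b) -> Float.compare(b.score, a.score));
    return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
  }
}
```

The point of the comment above is that a search started from `rewrite()` has no executor to hand to a method like this, whereas work done per segment after `createWeight` can be scheduled concurrently.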


was (Author: sokolov):
Doing nn vector search during rewrite has one significant drawback, which is 
that rewrite() cannot make use of IndexSearcher's executor to perform 
concurrent searches across segments (or slices), whereas an implementation that 
does the search in createWeight will naturally get executed concurrently when 
IndexSearcher is configured for that.

To fix that would require some substantial change to pass an executor to 
Query.rewrite, which seems kind of overkill at this point. Instead, perhaps we 
can implement the `createWeight` version that supports concurrency and define 
`equals` and `hashCode` to use object identity in order to prevent spurious 
caching.







[GitHub] [lucene] zacharymorn commented on pull request #180: LUCENE-9959: Add non thread local based API for term vector reader usage

2021-07-10 Thread GitBox


zacharymorn commented on pull request #180:
URL: https://github.com/apache/lucene/pull/180#issuecomment-877676142


   > Yes; javadocs will need to warn people. This is _already_ a trap, not a 
new one. The method `org.apache.lucene.index.IndexReader#getTermVector` is 
tempting but can be bad for performance unless you only ever need one field's 
TV Terms.
   
   Makes sense. Thanks for the approval! I'll wait for a few more days before 
merging, in case other folks may have further feedback on this.
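
The trap mentioned in the quote can be illustrated with a toy model. This is not Lucene code: `TermVectorLoadCounter` is a hypothetical class that assumes the single-field method is a thin wrapper that loads all of a document's term vectors and then picks one field, so asking for several fields one at a time repeats the load.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model, not Lucene code: counts how often a document's term vectors are loaded. */
final class TermVectorLoadCounter {
  private final Map<String, String> fieldsOfDoc = new HashMap<>();
  int loads = 0;

  TermVectorLoadCounter() {
    fieldsOfDoc.put("title", "terms of title");
    fieldsOfDoc.put("body", "terms of body");
  }

  /** Models loading all term vector fields of one document: one load per call. */
  Map<String, String> getTermVectors(int docId) {
    loads++;
    return fieldsOfDoc;
  }

  /** Models the single-field convenience method: loads everything, returns one field. */
  String getTermVector(int docId, String field) {
    return getTermVectors(docId).get(field);
  }
}
```

In this model, fetching two fields through `getTermVector` costs two loads, while calling `getTermVectors` once and reading both fields from the result costs one.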





[GitHub] [lucene] cpoerschke commented on pull request #201: LUCENE-8638: remove deprecated FixBrokenOffsetsFilter[Factory] classes

2021-07-10 Thread GitBox


cpoerschke commented on pull request #201:
URL: https://github.com/apache/lucene/pull/201#issuecomment-877708925


   Thanks @msokolov for copying across the comments from the JIRA! I'll convert 
the pull request to "draft" to indicate the status of the deprecation process.





[GitHub] [lucene] cpoerschke commented on pull request #202: LUCENE-8682: remove deprecated WordDelimiterFilter[Factory] classes

2021-07-10 Thread GitBox


cpoerschke commented on pull request #202:
URL: https://github.com/apache/lucene/pull/202#issuecomment-877709067


   Thanks @msokolov for copying across the comments from the JIRA! I'll convert 
the pull request to "draft" to indicate the status of the deprecation process.





[GitHub] [lucene-solr] madrob merged pull request #2531: SOLR-15526 Use new cluster for each LeaderTragicEvent test

2021-07-10 Thread GitBox


madrob merged pull request #2531:
URL: https://github.com/apache/lucene-solr/pull/2531


   

