[GitHub] [lucene] mocobeta commented on a change in pull request #207: LUCENE-9855: Rename nn search vector format
mocobeta commented on a change in pull request #207:
URL: https://github.com/apache/lucene/pull/207#discussion_r667338802

## File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java ##
@@ -2284,28 +2284,28 @@ static void checkImpacts(Impacts impacts, int lastTarget) {
    *
    * @lucene.experimental
    */
-  public static Status.VectorValuesStatus testVectors(
+  public static Status.NnVectorsStatus testVectors(
       CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException {
     if (infoStream != null) {
       infoStream.print("test: vectors..");

Review comment:
Fixed in https://github.com/apache/lucene/pull/207/commits/4d301c580470a27993393ec50fef258040cfbe6c

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mocobeta commented on a change in pull request #207: LUCENE-9855: Rename nn search vector format
mocobeta commented on a change in pull request #207:
URL: https://github.com/apache/lucene/pull/207#discussion_r667338807

## File path: lucene/core/src/java/org/apache/lucene/index/CheckIndex.java ##
@@ -2284,28 +2284,28 @@ static void checkImpacts(Impacts impacts, int lastTarget) {
    *
    * @lucene.experimental
    */
-  public static Status.VectorValuesStatus testVectors(
+  public static Status.NnVectorsStatus testVectors(
       CodecReader reader, PrintStream infoStream, boolean failFast) throws IOException {
     if (infoStream != null) {
       infoStream.print("test: vectors..");
     }
     long startNS = System.nanoTime();
     FieldInfos fieldInfos = reader.getFieldInfos();
-    Status.VectorValuesStatus status = new Status.VectorValuesStatus();
+    Status.NnVectorsStatus status = new Status.NnVectorsStatus();
     try {
-      if (fieldInfos.hasVectorValues()) {
+      if (fieldInfos.hasNnVectors()) {
         for (FieldInfo fieldInfo : fieldInfos) {
-          if (fieldInfo.hasVectorValues()) {
-            int dimension = fieldInfo.getVectorDimension();
+          if (fieldInfo.hasNnVectors()) {
+            int dimension = fieldInfo.getNnVectorDimension();
             if (dimension <= 0) {
               throw new RuntimeException(
                   "Field \"" + fieldInfo.name + "\" has vector values but dimension is "

Review comment:
Fixed in https://github.com/apache/lucene/pull/207/commits/4d301c580470a27993393ec50fef258040cfbe6c
[GitHub] [lucene] mocobeta commented on a change in pull request #207: LUCENE-9855: Rename nn search vector format
mocobeta commented on a change in pull request #207:
URL: https://github.com/apache/lucene/pull/207#discussion_r667339213

## File path: lucene/CHANGES.txt ##
@@ -7,7 +7,7 @@ http://s.apache.org/luceneversions
 New Features
-* LUCENE-9322: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)
+* LUCENE-9322 LUCENE-9855: Vector-valued fields, Lucene90 Codec (Mike Sokolov, Julie Tibshirani, Tomoko Uchida)

Review comment:
I think this can be grouped with LUCENE-9322?
[GitHub] [lucene] mocobeta commented on pull request #207: LUCENE-9855: Rename nn search vector format
mocobeta commented on pull request #207:
URL: https://github.com/apache/lucene/pull/207#issuecomment-877639170

Thanks @msokolov for reviewing. I'll wait a few more days to let others comment on this, then merge it upstream if there is no further feedback.
[GitHub] [lucene] msokolov opened a new pull request #210: LUCENE-10016: remove fanout parameter from nearest neighbor vector search
msokolov opened a new pull request #210:
URL: https://github.com/apache/lucene/pull/210

I think we can remove this parameter to simplify the nn search API and make it more generic. For the HNSW algorithm, searching for top K=M with fanout=N is basically the same as searching for top K=M+N and then keeping the top M, so callers can easily implement this themselves.
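The equivalence described in this PR can be sketched in plain Java. This is a minimal illustration, not Lucene code: `Hit`, `topKWithFanout`, and `toySearch` are hypothetical names invented here, and `toySearch` is only a toy stand-in for an approximate search that returns more candidates when asked for a larger k.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

public class FanoutEmulation {

  /** Hypothetical stand-in for a scored hit (not a Lucene class). */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /**
   * Emulates "top k with fanout" on top of a search that only accepts k:
   * ask for k + fanout candidates, then keep the k best.
   */
  static List<Hit> topKWithFanout(IntFunction<List<Hit>> search, int k, int fanout) {
    List<Hit> candidates = new ArrayList<>(search.apply(k + fanout));
    // Highest score first; the underlying search is not assumed to return sorted hits.
    candidates.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
  }

  /** Toy search for the demo only: returns the first n entries of a fixed, unsorted list. */
  static List<Hit> toySearch(int n) {
    List<Hit> all = List.of(
        new Hit(3, 0.40f), new Hit(7, 0.95f), new Hit(1, 0.10f),
        new Hit(9, 0.80f), new Hit(4, 0.60f), new Hit(2, 0.30f));
    return all.subList(0, Math.min(n, all.size()));
  }

  public static void main(String[] args) {
    // Two results wanted, three extra candidates explored.
    for (Hit h : topKWithFanout(FanoutEmulation::toySearch, 2, 3)) {
      System.out.println("doc=" + h.doc + " score=" + h.score);
    }
  }
}
```

With the toy data, asking for k=2 and fanout=3 examines five candidates and keeps docs 7 and 9, while fanout=0 examines only two and keeps docs 7 and 3. The caller pays for the wider search only when it asks for it.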
[jira] [Commented] (LUCENE-10016) VectorReader.search needs rethought, o.a.l.search integration?
[ https://issues.apache.org/jira/browse/LUCENE-10016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378489#comment-17378489 ]

Michael Sokolov commented on LUCENE-10016:
--

I posted a PR removing fanout. For the question of how to integrate with "regular" search, handle deletions, etc., let's track that over in LUCENE-9614.

> VectorReader.search needs rethought, o.a.l.search integration?
> --
>
> Key: LUCENE-10016
> URL: https://issues.apache.org/jira/browse/LUCENE-10016
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
> Priority: Blocker
> Fix For: 9.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> There's no search integration (e.g. queries) for the current vector values,
> no documentation/examples that I can find.
> Instead the codec has this method:
> {code}
> TopDocs search(String field, float[] target, int k, int fanout)
> {code}
> First, the "fanout" parameter needs to go, this is specific to HNSW impl, get
> it out of here.
> Second, How am I supposed to skip over deleted documents? How can I use
> filters? How should I search across multiple segments?

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9614) Implement KNN Query
[ https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378497#comment-17378497 ]

Michael Sokolov commented on LUCENE-9614:
--

Doing nn vector search during rewrite has one significant drawback: rewrite() cannot make use of IndexSearcher's executor to perform concurrent searches across segments (or slices), whereas an implementation that does the search in createWeight will naturally get executed concurrently when IndexSearcher is configured for that. Fixing that would require a substantial change to pass an executor to Query.rewrite, which seems like overkill at this point. Instead, perhaps we can implement the `createWeight` version that supports concurrency and define `equals` and `hashCode` to use object identity in order to prevent spurious caching.

> Implement KNN Query
> --
>
> Key: LUCENE-9614
> URL: https://issues.apache.org/jira/browse/LUCENE-9614
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Michael Sokolov
> Priority: Major
>
> Now we have a vector index format, and one vector indexing/KNN search
> implementation, but the interface is low-level: you can search across a
> single segment only. We would like to expose a Query implementation.
> Initially, we want to support a usage where the KnnVectorQuery selects the
> k-nearest neighbors without regard to any other constraints, and these can
> then be filtered as part of an enclosing Boolean or other query.
> Later we will want to explore some kind of filtering *while* performing
> vector search, or a re-entrant search process that can yield further results.
> Because of the nature of knn search (all documents having any vector value
> match), it is more like a ranking than a filtering operation, and it doesn't
> really make sense to provide an iterator interface that can be merged in the
> usual way, in docid order, skipping ahead. It's not yet clear how to satisfy
> a query that is "k nearest neighbors satisfying some arbitrary Query", at
> least not without realizing a complete bitset for the Query. But this is for
> a later issue; *this* issue is just about performing the knn search in
> isolation, computing a set of (some given) K nearest neighbors, and providing
> an iterator over those.
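The suggestion in the comment above, defining `equals` and `hashCode` by object identity so that a cache never treats two query instances as the same key, can be sketched without any Lucene dependency. This is only an illustration of the equality mechanics: `IdentityKnnQuery` is a hypothetical class invented here (not Lucene's KnnVectorQuery), and the `HashMap` is a toy stand-in for a query cache.

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityEqualsDemo {

  /** Hypothetical query-like object; the fields are illustrative assumptions. */
  static final class IdentityKnnQuery {
    final String field;
    final int k;

    IdentityKnnQuery(String field, int k) {
      this.field = field;
      this.k = k;
    }

    // Object identity: two instances never compare equal, even with the same field and k.
    @Override
    public boolean equals(Object other) {
      return this == other;
    }

    // Consistent with equals(): derived from the instance, not from its fields.
    @Override
    public int hashCode() {
      return System.identityHashCode(this);
    }
  }

  public static void main(String[] args) {
    Map<IdentityKnnQuery, String> cache = new HashMap<>();
    IdentityKnnQuery first = new IdentityKnnQuery("vector", 10);
    IdentityKnnQuery second = new IdentityKnnQuery("vector", 10);

    cache.put(first, "cached result");
    System.out.println("same instance hits cache: " + cache.containsKey(first));
    System.out.println("equal-looking instance hits cache: " + cache.containsKey(second));
  }
}
```

The trade-off is that structurally identical queries can no longer share cached state, which is the point when the cached state would otherwise be wrong or wasteful.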
[jira] [Comment Edited] (LUCENE-9614) Implement KNN Query
[ https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378497#comment-17378497 ]

Michael Sokolov edited comment on LUCENE-9614 at 7/10/21, 5:03 PM:
--

Doing nn vector search during rewrite has one significant drawback, which is that {{rewrite()}} cannot make use of IndexSearcher's executor to perform concurrent searches across segments (or slices), whereas an implementation that does the search in createWeight will naturally get executed concurrently when IndexSearcher is configured for that. To fix that would require some substantial change to pass an executor to {{Query.rewrite}}, which seems kind of overkill at this point. Instead, perhaps we can implement the {{createWeight}} version that supports concurrency and define {{equals(Object)}} and {{hashCode()}} to use object identity in order to prevent spurious caching.
[GitHub] [lucene] zacharymorn commented on pull request #180: LUCENE-9959: Add non thread local based API for term vector reader usage
zacharymorn commented on pull request #180:
URL: https://github.com/apache/lucene/pull/180#issuecomment-877676142

> Yes; javadocs will need to warn people. This is _already_ a trap, not a new one. The method `org.apache.lucene.index.IndexReader#getTermVector` is tempting but can be bad for performance unless you only ever need one field's TV Terms.

Makes sense. Thanks for the approval! I'll wait a few more days before merging, in case other folks have further feedback on this.
[GitHub] [lucene] cpoerschke commented on pull request #201: LUCENE-8638: remove deprecated FixBrokenOffsetsFilter[Factory] classes
cpoerschke commented on pull request #201:
URL: https://github.com/apache/lucene/pull/201#issuecomment-877708925

Thanks @msokolov for copying across the comments from the JIRA! I'll convert the pull request to "draft" to indicate the status of the deprecation process.
[GitHub] [lucene] cpoerschke commented on pull request #202: LUCENE-8682: remove deprecated WordDelimiterFilter[Factory] classes
cpoerschke commented on pull request #202:
URL: https://github.com/apache/lucene/pull/202#issuecomment-877709067

Thanks @msokolov for copying across the comments from the JIRA! I'll convert the pull request to "draft" to indicate the status of the deprecation process.
[GitHub] [lucene-solr] madrob merged pull request #2531: SOLR-15526 Use new cluster for each LeaderTragicEvent test
madrob merged pull request #2531:
URL: https://github.com/apache/lucene-solr/pull/2531