[GitHub] [lucene] jpountz commented on a diff in pull request #12111: SimpleText codec to support writing byte vectors

2023-01-25 Thread via GitHub


jpountz commented on code in PR #12111:
URL: https://github.com/apache/lucene/pull/12111#discussion_r1086310610


##
lucene/core/src/java/org/apache/lucene/codecs/BufferingKnnVectorsWriter.java:
##
@@ -39,79 +37,81 @@
  * @lucene.experimental
  */
 public abstract class BufferingKnnVectorsWriter extends KnnVectorsWriter {
-  private final List fields = new ArrayList<>();
+  private final List> fields = new ArrayList<>();
 
   /** Sole constructor */
   protected BufferingKnnVectorsWriter() {}
 
   @Override
-  public KnnFieldVectorsWriter addField(FieldInfo fieldInfo) throws 
IOException {
-FieldWriter newField = new FieldWriter(fieldInfo);
+  public KnnFieldVectorsWriter addField(FieldInfo fieldInfo) throws 
IOException {

Review Comment:
   Unrelated: your change reminds me that I'd rather like to split this method 
in two: `KnnFieldVectorsWriter addFloatField` and 
`KnnFieldVectorsWriter addByteField`. Otherwise we can never take 
advantage of the type safety of generics? This is a codec internal rather than 
something user-facing so we can look into it after 9.5.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna commented on a diff in pull request #12111: SimpleText codec to support writing byte vectors

2023-01-25 Thread via GitHub


javanna commented on code in PR #12111:
URL: https://github.com/apache/lucene/pull/12111#discussion_r1086409971


##
lucene/core/src/java/org/apache/lucene/codecs/BufferingKnnVectorsWriter.java:
##
@@ -39,79 +37,81 @@
  * @lucene.experimental
  */
 public abstract class BufferingKnnVectorsWriter extends KnnVectorsWriter {
-  private final List fields = new ArrayList<>();
+  private final List> fields = new ArrayList<>();
 
   /** Sole constructor */
   protected BufferingKnnVectorsWriter() {}
 
   @Override
-  public KnnFieldVectorsWriter addField(FieldInfo fieldInfo) throws 
IOException {
-FieldWriter newField = new FieldWriter(fieldInfo);
+  public KnnFieldVectorsWriter addField(FieldInfo fieldInfo) throws 
IOException {

Review Comment:
   yes that's a good point and we should probably look at #11758 as a whole and 
see what we can improve further.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna merged pull request #12111: SimpleText codec to support writing byte vectors

2023-01-25 Thread via GitHub


javanna merged PR #12111:
URL: https://github.com/apache/lucene/pull/12111


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna merged pull request #12110: Fix flaky TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults test

2023-01-25 Thread via GitHub


javanna merged PR #12110:
URL: https://github.com/apache/lucene/pull/12110


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna opened a new pull request, #12112: SimpleText codec to support writing byte vectors (#12111)

2023-01-25 Thread via GitHub


javanna opened a new pull request, #12112:
URL: https://github.com/apache/lucene/pull/12112

   A recent test failure signaled that when the simple text codec was randomly 
selected, byte vectors could not be written. This commit addressed that by 
adding support for writing byte vectors to SimpleTextKnnVectorsWriter.
   
   Note that while support is added to the BufferingKnnVectorsWriter base 
class, 90, 91 and 92 writers don't need to support byte vectors and will throw 
unsupported operation exception when attempting to do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] javanna merged pull request #12112: SimpleText codec to support writing byte vectors (#12111)

2023-01-25 Thread via GitHub


javanna merged PR #12112:
URL: https://github.com/apache/lucene/pull/12112


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] alessandrobenedetti commented on pull request #12048: Move HNSW parameters to the HnswGraphBuilder class

2023-01-25 Thread via GitHub


alessandrobenedetti commented on PR #12048:
URL: https://github.com/apache/lucene/pull/12048#issuecomment-1403387891

   @msokolov  do you see any reason why we shouldn't do it? because I reviewed 
the pull request Daniele is going to publish (Word2Vec for synonyms generation: 
https://www.youtube.com/watch?v=rKYJQhZxQFQ&t=469s) and having those 
constants in the HNSW Graph builder facilitates the things.
   Furthermore, those constants in terms of responsibility affect how the graph 
is built(more than the codec), so in terms of cohesion of the class, it seems 
reasonable to me for them to be in the HNSW Graph builder.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] msokolov commented on pull request #12048: Move HNSW parameters to the HnswGraphBuilder class

2023-01-25 Thread via GitHub


msokolov commented on PR #12048:
URL: https://github.com/apache/lucene/pull/12048#issuecomment-1403475767

   I would say let's make the changes when we can see the purpose of doing it?
   So, if Daniele is planning to make use of this in an upcoming PR, let's
   just incorporate the change into that?
   
   On Wed, Jan 25, 2023 at 10:22 AM Alessandro Benedetti <
   ***@***.***> wrote:
   
   > @msokolov  do you see any reason why we
   > shouldn't do it? because I reviewed the pull request Daniele is going to
   > publish (Word2Vec for synonyms generation:
   > https://www.youtube.com/watch?v=rKYJQhZxQFQ&t=469s) and having those
   > constants in the HNSW Graph builder facilitates the things.
   > Furthermore, those constants in terms of responsibility affect how the
   > graph is built(more than the codec), so in terms of cohesion of the class,
   > it seems reasonable to me for them to be in the HNSW Graph builder.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > , or
   > unsubscribe
   > 

   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent commented on a diff in pull request #12111: SimpleText codec to support writing byte vectors

2023-01-25 Thread via GitHub


benwtrent commented on code in PR #12111:
URL: https://github.com/apache/lucene/pull/12111#discussion_r1086603790


##
lucene/core/src/java/org/apache/lucene/codecs/BufferingKnnVectorsWriter.java:
##
@@ -161,77 +161,113 @@ public int advance(int target) throws IOException {
 }
   }
 
+  /** Sorting FloatVectorValues that iterate over documents in the order of 
the provided sortMap */

Review Comment:
   ```
 /** Sorting SortingByteVectorValues that iterate over documents in the 
order of the provided sortMap */
   ```
   
   Comment is incorrect. But not a huge deal.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] alessandrobenedetti commented on pull request #12048: Move HNSW parameters to the HnswGraphBuilder class

2023-01-25 Thread via GitHub


alessandrobenedetti commented on PR #12048:
URL: https://github.com/apache/lucene/pull/12048#issuecomment-1403736061

   Ok, that's fine! @dantuzi you can cancel this PR and just include the change 
in the final PR.
   Thanks @msokolov for the suggestion!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dnhatn commented on issue #11428: Handle soft deletes via LiveDocsFormat [LUCENE-10392]

2023-01-25 Thread via GitHub


dnhatn commented on issue #11428:
URL: https://github.com/apache/lucene/issues/11428#issuecomment-1404033931

   @zacharymorn Thank you for looking into the issue.
   
   The first proposals of the API: 
https://issues.apache.org/jira/browse/LUCENE-8198 and 
https://issues.apache.org/jira/browse/LUCENE-8200.
   
   > Is that something we could deprecate directly?
   
   Good point, but I don't think we should deprecate it until we figure out how 
to index already-soft-deleted documents.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org