Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-09 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1804243772 Sorry for the late reply. > Since this is a larger API discussion, do we think we can move forward with the way it is now (quantization for HNSW and other vector indices) and itera

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-06 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1795961973 @jimczi thinking about it more, it seems to be a Flat* index format for vector search will require a different API. Right now, kNN search assumes the user provides pre-filters. However

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792854167 > I think we should expose the flat formats in the codec. But the required new functions for indexing the vectors seem to justify a new abstraction. Can we add the abstraction as an

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792686970 @jimczi the HNSWWriter and Readers need the passed flat vector readers and writers to provide specific functions. Like the mergeOneField that returns closeable scorers. I am not

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792590687 > The flat format is an implementation detail. Folks using the quantized hnsw do not have to supply a flat format. We can register the flat format for direct usage (outside of HNSW)

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792546852 @jimczi what do you mean "existing format as implementation detail"? The flat format is an implementation detail. Folks using the quantized hnsw do not have to supply a flat form

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-03 Thread via GitHub
jimczi commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792391067 I agree with Adrien that hardcoded formats with a clear strategy are better. We want to avoid exposing a knn format that takes another abstract format. That would be cryptic and diffic

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-02 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1380023598 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-02 Thread via GitHub
benwtrent commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1379919203 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throw

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-11-01 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1379282330 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1786059581 @jpountz updated. Flat is no longer pluggable, two HNSW formats are exposed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
jpountz commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785712360 Thanks, splitting the way you describe would make me happy. I had not understood that the flat codec was a goal. Now that I think more about it, I wonder if we should better separa

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785563758 OK, @jpountz thinking about it more. To do what you are suggesting, I think the following would work: - Force Lucene99HnswVectorsReader & Lucene99HnswVectorsWriter to take a `F

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
benwtrent commented on PR #12729: URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785376897 @jpountz the goal of this change is not just making code reusable. But: - Allowing folks who don't want HNSW to take advantage of the per-segment quantization and logic. Paging

Re: [PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-30 Thread via GitHub
jpountz commented on code in PR #12729: URL: https://github.com/apache/lucene/pull/12729#discussion_r1376303898 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws

[PR] Adding new flat vector format and refactoring HNSW [lucene]

2023-10-27 Thread via GitHub
benwtrent opened a new pull request, #12729: URL: https://github.com/apache/lucene/pull/12729 Currently the HNSW codec does too many things, it not only indexes vectors, but stores them and determines how to store them given the vector type. This PR extracts out the vector storage int