jimczi commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1804243772
Sorry for the late reply.
> Since this is a larger API discussion, do we think we can move forward
with the way it is now (quantization for HNSW and other vector indices) and
itera
benwtrent commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1795961973
@jimczi thinking about it more, it seems to be a Flat* index format for
vector search will require a different API. Right now, kNN search assumes the
user provides pre-filters. However
jimczi commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792854167
> I think we should expose the flat formats in the codec. But the required
new functions for indexing the vectors seem to justify a new abstraction.
Can we add the abstraction as an
benwtrent commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792686970
@jimczi the HNSWWriter and Readers need the passed flat vector readers and
writers to provide specific functions. Like the mergeOneField that returns
closeable scorers.
I am not
jimczi commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792590687
> The flat format is an implementation detail. Folks using the quantized
hnsw do not have to supply a flat format.
We can register the flat format for direct usage (outside of HNSW)
benwtrent commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792546852
@jimczi what do you mean "existing format as implementation detail"?
The flat format is an implementation detail. Folks using the quantized hnsw
do not have to supply a flat form
jimczi commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1792391067
I agree with Adrien that hardcoded formats with a clear strategy are better.
We want to avoid exposing a knn format that takes another abstract format.
That would be cryptic and diffic
jpountz commented on code in PR #12729:
URL: https://github.com/apache/lucene/pull/12729#discussion_r1380023598
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws
benwtrent commented on code in PR #12729:
URL: https://github.com/apache/lucene/pull/12729#discussion_r1379919203
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throw
jpountz commented on code in PR #12729:
URL: https://github.com/apache/lucene/pull/12729#discussion_r1379282330
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws
benwtrent commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1786059581
@jpountz updated. Flat is no longer pluggable, two HNSW formats are exposed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to
jpountz commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785712360
Thanks, splitting the way you describe would make me happy.
I had not understood that the flat codec was a goal. Now that I think more
about it, I wonder if we should better separa
benwtrent commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785563758
OK, @jpountz thinking about it more. To do what you are suggesting, I think
the following would work:
- Force Lucene99HnswVectorsReader & Lucene99HnswVectorsWriter to take a
`F
benwtrent commented on PR #12729:
URL: https://github.com/apache/lucene/pull/12729#issuecomment-1785376897
@jpountz the goal of this change is not just making code reusable. But:
- Allowing folks who don't want HNSW to take advantage of the per-segment
quantization and logic. Paging
jpountz commented on code in PR #12729:
URL: https://github.com/apache/lucene/pull/12729#discussion_r1376303898
##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -399,41 +281,30 @@ private HnswGraph getGraph(FieldEntry entry) throws
benwtrent opened a new pull request, #12729:
URL: https://github.com/apache/lucene/pull/12729
Currently the HNSW codec does too many things, it not only indexes vectors,
but stores them and determines how to store them given the vector type.
This PR extracts out the vector storage int
16 matches
Mail list logo