LantaoJin opened a new pull request, #16214:
URL: https://github.com/apache/lucene/pull/16214

   ### Description
   
   Adds two `IndexWriter` APIs to update a document's KNN vector **in place**, without reindexing the rest of the document:
   
   ```java
   public long updateFloatVectorValue(Term term, String field, float[] value)
   public long updateByteVectorValue(Term term, String field, byte[] value)
   ```
   
   Today the only way to change a stored embedding (`KnnFloatVectorField` / `KnnByteVectorField`) is `updateDocument`: a delete-by-term followed by re-adding the **whole** document, which re-analyzes, re-indexes, and re-stores every field.
   For workloads that periodically re-embed (e.g. bumping the embedding model version) that is wasteful, because only the vector changed. These APIs mirror `updateDocValues`: they rewrite just the affected field at a new per-segment generation and leave everything else untouched.
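
To make the contrast concrete, here is a minimal toy model of the two strategies. It is not Lucene code and does not reflect how this PR is implemented: `VectorUpdateSketch`, its fields, and the `fieldsWritten` counter are all hypothetical names used only for illustration. It assumes a single in-memory "segment" where an in-place update is recorded as an override layered over an untouched base document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these types are Lucene classes.
final class VectorUpdateSketch {
  final List<Map<String, Object>> docs = new ArrayList<>();      // base documents, never mutated
  final List<Boolean> live = new ArrayList<>();                  // false once a doc is deleted
  final Map<Integer, float[]> vectorOverrides = new HashMap<>(); // docId -> updated vector
  long vectorGen = 0;    // bumped once per in-place vector update
  int fieldsWritten = 0; // how many field values had to be written so far

  // Adding a document writes every one of its fields.
  int addDocument(Map<String, Object> doc) {
    docs.add(Map.copyOf(doc));
    live.add(true);
    fieldsWritten += doc.size();
    return docs.size() - 1;
  }

  // Whole-document replacement: delete the old doc, then re-add every field.
  int replaceDocument(int docId, Map<String, Object> newDoc) {
    live.set(docId, false);
    vectorOverrides.remove(docId);
    return addDocument(newDoc);
  }

  // In-place update: record only the new vector, at a new generation.
  long updateVector(int docId, float[] vector) {
    vectorOverrides.put(docId, vector.clone());
    fieldsWritten += 1;
    return ++vectorGen;
  }

  // Readers see the override if one exists, otherwise the base value.
  float[] vector(int docId) {
    float[] override = vectorOverrides.get(docId);
    return override != null ? override : (float[]) docs.get(docId).get("vector");
  }

  public static void main(String[] args) {
    VectorUpdateSketch segment = new VectorUpdateSketch();
    int id = segment.addDocument(
        Map.of("id", "doc-1", "title", "hello", "vector", new float[] {1f, 0f}));
    long gen = segment.updateVector(id, new float[] {0f, 1f});
    System.out.println("generation=" + gen + " fieldsWritten=" + segment.fieldsWritten);
  }
}
```

In this sketch the in-place path writes one field value per update regardless of how many other fields the document has, while the replacement path rewrites all of them. The real cost model in Lucene is more involved than a field count.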
   
   
   ### Benchmark summary
   
   `dim=768, otherFields=8`, ms per commit (lower is better):
   
   | numDocs | batchSize | updateVectorValue | updateDocument |
   |--:|--:|--:|--:|
   | 50000 | 1 | 238 | 9 |
   | 50000 | 1000 | 251 | 542 |
   | 50000 | 10000 | 384 | 8090 |
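
One way to read the table is as a ratio of the two columns per row. The short sketch below does only that arithmetic on the numbers above; `BenchmarkRatios` and `speedup` are illustrative names, and nothing here is measured beyond what the table already reports.

```java
// Derives per-row ratios from the benchmark table; no new measurements.
final class BenchmarkRatios {
  // updateDocument time divided by updateVectorValue time.
  // A value above 1 means the in-place update was faster for that row.
  static double speedup(double updateVectorMs, double updateDocumentMs) {
    return updateDocumentMs / updateVectorMs;
  }

  public static void main(String[] args) {
    int[] batchSizes = {1, 1000, 10000};
    double[] updateVectorMs = {238, 251, 384};
    double[] updateDocumentMs = {9, 542, 8090};
    for (int i = 0; i < batchSizes.length; i++) {
      System.out.printf(
          "batchSize=%d ratio=%.2f%n",
          batchSizes[i], speedup(updateVectorMs[i], updateDocumentMs[i]));
    }
  }
}
```

On these figures the ratio is about 0.04 at `batchSize=1` (so `updateDocument` is faster for a single update), about 2.16 at 1000, and about 21.07 at 10000.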


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

