benwtrent commented on code in PR #15271:
URL: https://github.com/apache/lucene/pull/15271#discussion_r2395367100
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene104/Lucene104ScalarQuantizedVectorsFormat.java:
##########
@@ -186,16 +203,35 @@ public Lucene104ScalarQuantizedVectorsFormat() {
this(ScalarEncoding.UNSIGNED_BYTE);
}
- /** Creates a new instance with the chosen encoding. */
+ /** Creates a new instance with the chosen symmetric quantization encoding. */
public Lucene104ScalarQuantizedVectorsFormat(ScalarEncoding encoding) {
+ this(encoding, encoding);
+ }
+
+ /** Creates a new instance with the chosen asymmetric quantization encoding. */
+ public Lucene104ScalarQuantizedVectorsFormat(
+ ScalarEncoding encoding, ScalarEncoding queryEncoding) {
super(NAME);
this.encoding = encoding;
+ this.queryEncoding = queryEncoding;
+ // until we have optimized scorers for various other asymmetric encodings, maybe we only allow 1
Review Comment:
> I know for 4:1 we write two vector sets during index building to improve
> scores/graph quality and I'm not sure if that's necessary in other cases but
> I'm also not sure how much it would hurt.

The reason for writing out both is that, during graph building, we don't
want to re-quantize query vectors on every comparison in the diversity
calculation. Possibly we don't need it for the diversity calculation? But we
would need to test this to ensure graph quality doesn't drop significantly.
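
To make the trade-off concrete, here is a minimal Java sketch. It is not Lucene code: the class `AsymmetricQuantizationSketch`, the toy uniform quantizer, and the all-pairs loop standing in for the diversity check are all hypothetical, and it assumes "4:1" means 4-bit query vectors scored against 1-bit index vectors. It only counts how often a vector gets quantized when the query-side encoding is recomputed per comparison versus read from a second stored set.

```java
public class AsymmetricQuantizationSketch {
  /** Counts how many times a float vector was quantized (illustration only). */
  public static int quantizeCalls = 0;

  /** Toy uniform quantizer over [-1, 1]; NOT Lucene's scalar quantizer. Use bits <= 7. */
  public static byte[] quantize(float[] v, int bits) {
    quantizeCalls++;
    int levels = (1 << bits) - 1;
    byte[] out = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(-1f, Math.min(1f, v[i]));
      out[i] = (byte) Math.round((clamped + 1f) / 2f * levels);
    }
    return out;
  }

  /** Plain integer dot product between a query-encoded and an index-encoded vector. */
  public static int dot(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Scores every ordered pair, re-quantizing the query side on each comparison. */
  public static long scoreAllPairsRequantizing(
      float[][] raw, byte[][] indexEncoded, int queryBits) {
    long total = 0;
    for (int i = 0; i < raw.length; i++) {
      for (int j = 0; j < raw.length; j++) {
        if (i != j) {
          total += dot(quantize(raw[i], queryBits), indexEncoded[j]);
        }
      }
    }
    return total;
  }

  /** Scores every ordered pair using a second, pre-computed query-encoded set. */
  public static long scoreAllPairsStored(byte[][] queryEncoded, byte[][] indexEncoded) {
    long total = 0;
    for (int i = 0; i < queryEncoded.length; i++) {
      for (int j = 0; j < indexEncoded.length; j++) {
        if (i != j) {
          total += dot(queryEncoded[i], indexEncoded[j]);
        }
      }
    }
    return total;
  }

  public static void main(String[] args) {
    float[][] raw = {{0.9f, -0.2f}, {-0.5f, 0.4f}, {0.1f, 0.8f}, {-0.7f, -0.6f}};
    byte[][] indexEncoded = new byte[raw.length][];
    for (int i = 0; i < raw.length; i++) {
      indexEncoded[i] = quantize(raw[i], 1); // assumed 1-bit index encoding
    }

    quantizeCalls = 0;
    long requantized = scoreAllPairsRequantizing(raw, indexEncoded, 4);
    int callsWhenRequantizing = quantizeCalls;

    quantizeCalls = 0;
    byte[][] queryEncoded = new byte[raw.length][];
    for (int i = 0; i < raw.length; i++) {
      queryEncoded[i] = quantize(raw[i], 4); // assumed 4-bit query encoding, done once
    }
    long stored = scoreAllPairsStored(queryEncoded, indexEncoded);
    int callsWhenStored = quantizeCalls;

    System.out.println("same scores: " + (requantized == stored));
    System.out.println("quantize calls, re-quantizing: " + callsWhenRequantizing);
    System.out.println("quantize calls, stored set: " + callsWhenStored);
  }
}
```

For n vectors compared pairwise, the first strategy quantizes n * (n - 1) times and the second only n times, at the cost of keeping a second encoded copy around. Whether the diversity check could instead score index-encoded vectors against each other is exactly the open question above, and this sketch says nothing about the effect on graph quality.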
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]