mikemccand commented on PR #14078:
URL: https://github.com/apache/lucene/pull/14078#issuecomment-2569246977

   +1 for proper attribution.
   
   We should give credit where credit is due.  The evolution of this PR clearly 
began with the RaBitQ paper, as seen in the [opening comment on the original 
PR](https://github.com/apache/lucene/pull/13651#issue-2464020838) as well as 
[the original 
issue](https://github.com/apache/lucene/issues/13650#issue-2463854436).
   
   Specifically, for the open source changes proposed here (this pull request 
suggests changes to Lucene's ASL2-licensed source code):
   
     * The CHANGES.txt entry should link to both RaBitQ papers?
   
     * The javadoc for the new `Lucene102BinaryQuantizedVectorsFormat` should 
also link to both papers and describe the provenance (e.g. the algorithm 
described by these papers), along with how this implementation differs from the 
original papers?  We try to do this whenever a paper inspires changes in Lucene, 
e.g. [the algorithm for efficiently building our 
FSTs](https://github.com/apache/lucene/blob/204c39f8eb7fb5fd26a3b9ff41ef7d18fae1c844/lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java#L49-L50),
[the paper that inspired our block-tree terms 
dictionary](https://github.com/apache/lucene/blob/204c39f8eb7fb5fd26a3b9ff41ef7d18fae1c844/lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java#L55-L57),
and the [HNSW approximate KNN search 
algorithm](https://github.com/apache/lucene/blob/204c39f8eb7fb5fd26a3b9ff41ef7d18fae1c844/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java#L32-L35).
   
   Linking to the papers that inspired important changes in Lucene is not only 
about proper attribution; it also gives users a deep resource they can fall 
back on to understand the algorithm, how tunable parameters are expected to 
behave, etc.  It's an important part of the documentation too!  
Also, future developers can re-read the paper, study Lucene's implementation, 
and maybe find bugs or improvement ideas.
   
   For the Elastic-specific artifacts (blog posts, press releases, tweets, 
etc.): I agree that Elastic should also attribute properly, probably with 
an edit/update/sorry-about-the-oversight sort of addition?  But I no longer 
work at Elastic, so this is merely my (external) opinion!  Perhaps a 
future blog post, from Elastic or someone else, could correct the mistake 
(the missed attribution).
   
   Finally, thank you to @gaoj0017 and team for creating RaBitQ and publishing 
these papers -- this is an impactful vector quantization algorithm that can 
help the many Lucene/OpenSearch/Solr/Elasticsearch users building semantic / 
LLM engines these days.
