mikemccand commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2569246977
+1 for proper attribution. We should give credit where credit is due. The evolution of this PR clearly began with the RaBitQ paper, as seen in the [opening comment on the original PR](https://github.com/apache/lucene/pull/13651#issue-2464020838) as well as [the original issue](https://github.com/apache/lucene/issues/13650#issue-2463854436).

Specifically for the open source changes proposed here (this pull request suggesting changes to Lucene's ASL2 licensed source code):

* The CHANGES.txt entry should link to both RaBitQ papers?
* The javadoc for the new `Lucene102BinaryQuantizedVectorsFormat` should also link to both papers, and describe the provenance (e.g. the algorithm described by these papers) along with how this implementation differs from the original papers?

We try to do this when a paper inspires changes in Lucene, e.g. [the algorithm for efficiently building our FSTs](https://github.com/apache/lucene/blob/204c39f8eb7fb5fd26a3b9ff41ef7d18fae1c844/lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java#L49-L50), [the paper that inspired our block-tree terms dictionary](https://github.com/apache/lucene/blob/204c39f8eb7fb5fd26a3b9ff41ef7d18fae1c844/lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsReader.java#L55-L57), the [HNSW approximate KNN search algorithm](https://github.com/apache/lucene/blob/204c39f8eb7fb5fd26a3b9ff41ef7d18fae1c844/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java#L32-L35).

Linking to the papers that inspired important changes in Lucene is not only for proper attribution but also so users have a deep resource they can fall back on to understand the algorithm, understand how tunable parameters are expected to behave, etc. It's an important part of the documentation too! Also, future developers can re-read the paper and study Lucene's implementation and maybe find bugs / improvement ideas.

For the Elastic-specific artifacts (blog posts, press releases, tweets, etc.): I would agree that Elastic should also attribute properly, probably with an edit/update/sorry-about-the-oversight sort of addition? But I no longer work at Elastic, so this is merely my (external) opinion! Perhaps a future blog post, from either Elastic or someone else, could correct the mistake (the missed attribution).

Finally, thank you to @gaoj0017 and team for creating RaBitQ and publishing these papers -- this is an impactful vector quantization algorithm that can help the many Lucene/OpenSearch/Solr/Elasticsearch users building semantic / LLM engines these days.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org