benwtrent opened a new pull request, #12434: URL: https://github.com/apache/lucene/pull/12434
A `join` within Lucene is built by adding child docs and parent docs in order. Since our vector field already supports sparse indexing, it should be able to support parent join indexing.

However, when searching for the closest `k`, the result is still the `k` nearest child vectors, with no way to join back to the parent.

This commit adds this ability through some significant changes:

- New leaf reader function that allows a collector for knn results
- The knn results can then utilize bit-sets to join back to the parent id

This change is fairly large and there are some dragons I am not sure about. So, opening as a draft for deeper discussion.

FYI, I did some testing:

- Indexing time is pretty much unaffected when doing sparse indexing
- Search time with the parent join adds about 20% additional overhead as the change currently is. I suspect it has to do with resolving ordinals and deduplicating/ordering the hash map
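To make the bit-set join concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes the usual block layout where children are indexed immediately before their parent, so the parent of a child doc id is the next set bit in a parent bit-set. It uses plain `java.util.BitSet` as a stand-in for Lucene's own bit-set classes, and `ParentJoinSketch`, `Hit`, and `joinToParents` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParentJoinSketch {

    /** Hypothetical scored hit: a doc id and a similarity score (higher is better). */
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Maps child hits to their parents and keeps the best score per parent.
     * Assumes each parent's bit is set in parentBits and that its children
     * occupy the doc ids directly before it.
     */
    public static List<Hit> joinToParents(List<Hit> childHits, BitSet parentBits, int k) {
        Map<Integer, Float> bestPerParent = new HashMap<>();
        for (Hit child : childHits) {
            // The parent is the first set bit at or after the child's doc id.
            int parent = parentBits.nextSetBit(child.doc);
            if (parent < 0) {
                continue; // no parent after this doc; skip it
            }
            // Deduplicate: several children may resolve to the same parent.
            bestPerParent.merge(parent, child.score, Float::max);
        }

        List<Hit> parents = new ArrayList<>();
        for (Map.Entry<Integer, Float> e : bestPerParent.entrySet()) {
            parents.add(new Hit(e.getKey(), e.getValue()));
        }
        // Order by descending score, then keep at most k parents.
        parents.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(parents.subList(0, Math.min(k, parents.size())));
    }
}
```

The per-parent map and the final sort in this sketch are the same kind of deduplicating and ordering work suspected above as a source of the extra search cost. Note also that joining after the search can return fewer than `k` parents when many of the nearest children share a parent.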