[GitHub] [lucene] benwtrent opened a new pull request, #12434: Add ParentJoin KNN support

via GitHub Tue, 11 Jul 2023 14:28:08 -0700


benwtrent opened a new pull request, #12434:
URL: https://github.com/apache/lucene/pull/12434


   In Lucene, a `join` is built by indexing child docs and parent docs in
order. Since our vector field already supports sparse indexing, it should be
able to support parent-join indexing.
   
   However, a search for the closest `k` still returns the k nearest
child vectors, with no way to join back to the parent.
   
   This commit adds this ability through some significant changes:
    - A new leaf reader function that accepts a collector for knn results
    - The collected knn results can then use bit sets to join back to the parent id
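
To illustrate the bit-set join described above, here is a minimal sketch. It is not the code in this PR: `java.util.BitSet` stands in for Lucene's own bit set, and the class and method names (`ParentJoinSketch`, `parentOf`) are hypothetical. It assumes each block is laid out with the child docs first and their parent doc directly after them, so a child's parent is the first set bit at or after the child's doc id.

```java
import java.util.BitSet;

final class ParentJoinSketch {
    /**
     * Hypothetical helper: `parents` has one bit set per parent doc id.
     * Assuming children are indexed directly before their parent, the
     * parent of a child is the first set bit at or after the child's id.
     * Returns -1 if no parent follows the given doc id.
     */
    static int parentOf(BitSet parents, int childDoc) {
        return parents.nextSetBit(childDoc);
    }

    public static void main(String[] args) {
        // Assumed layout: docs 0-2 are children of parent 3,
        // docs 4-5 are children of parent 6.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);

        // Pretend these are the child doc ids a knn search returned.
        int[] nearestChildren = {1, 4, 2};
        for (int child : nearestChildren) {
            System.out.println("child " + child + " -> parent " + parentOf(parents, child));
        }
    }
}
```

Note that two of the children in this example resolve to the same parent, which is why the joined results need deduplicating before the top `k` parents can be returned.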
    
   This change is fairly large, and there are some dragons I am not sure
about, so I am opening it as a draft for deeper discussion.
   
   FYI, I did some testing:
    - Indexing time is pretty much unaffected when doing sparse indexing
    - With the change as it currently stands, the parent join adds about 20%
overhead to search time. I suspect this comes from resolving ordinals and
deduplicating/ordering the hash map
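
As a rough illustration of where a deduplicate-and-order step could spend time, the sketch below keeps the best child score per parent in a hash map and then sorts the parents by score. It is only an assumption about the general shape of such a step, not the implementation in this PR: the names (`ParentTopK`, `topParents`) are hypothetical, `java.util.BitSet` stands in for Lucene's bit set, and a higher score is assumed to be better.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ParentTopK {
    /**
     * Hypothetical helper: maps each child hit to its parent, keeps the best
     * (highest) child score per parent, and returns up to k parent doc ids
     * ordered by descending score.
     */
    static List<Integer> topParents(int[] childDocs, float[] scores, BitSet parents, int k) {
        Map<Integer, Float> bestPerParent = new HashMap<>();
        for (int i = 0; i < childDocs.length; i++) {
            int parent = parents.nextSetBit(childDocs[i]);
            if (parent < 0) {
                continue; // no parent follows this doc id; skip it
            }
            // Deduplicate: keep only the highest-scoring child for each parent.
            bestPerParent.merge(parent, scores[i], Float::max);
        }

        // Order the surviving parents by score, best first.
        List<Map.Entry<Integer, Float>> entries = new ArrayList<>(bestPerParent.entrySet());
        entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));

        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

In this shape, every child hit costs one bit-set lookup and one boxed hash-map update, and the final sort touches every distinct parent. A bounded heap keyed by parent would be one alternative worth measuring.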


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


