mayya-sharipova commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1919701631
I've re-run the experiments with the latest changes on this PR (candidate) and on the main branch (baseline). I have also run experiments on the Cohere dataset, as seen below:

- For the 10M docs dataset, the speedups with the proposed approach are 1.7-2.5x.
- For the 10M docs dataset with k + fanout = 1000, QPS is close to the QPS of a single segment, while recall is better.

## Cohere/wikipedia-22-12-en-embeddings

- [Cohere/wikipedia-22-12-en-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings) dataset
- 768 dims
- Interval to synchronize with the global queue: 255 visited docs

### 1M vectors

k=10, fanout=90

| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
| Baseline Single segment | | | 0.880 |
| Baseline 8 segments concurrent | 13927 | 815 | 0.974 |
| Candidate2_with_queue | 12670 | 859 | 0.964 |

k=100, fanout=900

| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
| Baseline Single segment | | | 0.964 |
| Baseline 8 segments concurrent | 81824 | 126 | 0.997 |
| Candidate2_with_queue | 62085 | 165 | 0.995 |

### 10M vectors

k=10, fanout=90

| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
| Baseline Single segment | | | 0.929 |
| Baseline 19 segments concurrent | 37656 | 271 | 0.951 |
| Candidate2_with_queue | 21921 | 443 | 0.927 |

k=100, fanout=900

| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
| Baseline Single segment | | | 0.950 |
| Baseline 19 segments concurrent | 229945 | 44 | 0.990 |
| Candidate2_with_queue | 101970 | 91 | 0.984 |
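The comment does not include the PR's code, so the following is only a minimal sketch of the general idea behind the "interval to synchronize with the global queue" setting, under stated assumptions. Per-segment searchers share one thread-safe top-k queue. Every 255 visited candidates, each searcher publishes its new hits and reads back the global k-th best score, and it skips any candidate scoring below that bound, which can no longer reach the final top-k. All names here (`GlobalQueueSketch`, `GlobalTopK`, `searchSegment`, `Hit`) are hypothetical and are not Lucene's API. A flat scan over synthetic scores stands in for HNSW graph traversal. The sketch assumes Java 17 and distinct scores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch, not Lucene code: segment searchers share a global top-k queue. */
public class GlobalQueueSketch {
  // Sync interval taken from the benchmark description; everything else is assumed.
  static final int SYNC_INTERVAL = 255;

  record Hit(int doc, float score) {}

  static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble(Hit::score);

  /** Thread-safe global top-k, kept as a min-heap so the k-th best score is at the head. */
  static final class GlobalTopK {
    private final int k;
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE);

    GlobalTopK(int k) { this.k = k; }

    /** Merges newly collected hits and returns the current minimum competitive score. */
    synchronized float merge(List<Hit> hits) {
      for (Hit h : hits) {
        if (heap.size() < k) {
          heap.add(h);
        } else if (h.score() > heap.peek().score()) {
          heap.poll();
          heap.add(h);
        }
      }
      return heap.size() < k ? Float.NEGATIVE_INFINITY : heap.peek().score();
    }

    /** Returns the collected hits sorted by descending score. */
    synchronized List<Hit> topK() {
      List<Hit> out = new ArrayList<>(heap);
      out.sort(BY_SCORE.reversed());
      return out;
    }
  }

  /** Scans one segment (stand-in for graph traversal); returns how many candidates were pruned. */
  static int searchSegment(float[] scores, int docBase, int k, GlobalTopK global) {
    PriorityQueue<Hit> local = new PriorityQueue<>(BY_SCORE);
    List<Hit> pending = new ArrayList<>(); // hits not yet published to the global queue
    float globalMin = Float.NEGATIVE_INFINITY;
    int pruned = 0;
    for (int i = 0; i < scores.length; i++) {
      if (i > 0 && i % SYNC_INTERVAL == 0) {
        // Periodic sync: publish new local hits and learn the global k-th best score.
        globalMin = global.merge(pending);
        pending.clear();
      }
      float s = scores[i];
      if (s < globalMin) { // cannot enter the final top-k
        pruned++;
        continue;
      }
      if (local.size() < k || s > local.peek().score()) {
        if (local.size() == k) local.poll();
        Hit h = new Hit(docBase + i, s);
        local.add(h);
        pending.add(h);
      }
    }
    global.merge(pending); // final flush of remaining hits
    return pruned;
  }

  public static void main(String[] args) throws Exception {
    int segments = 4, perSegment = 1000, k = 10;
    GlobalTopK global = new GlobalTopK(k);
    ExecutorService pool = Executors.newFixedThreadPool(segments);
    List<Future<Integer>> futures = new ArrayList<>();
    for (int seg = 0; seg < segments; seg++) {
      int docBase = seg * perSegment;
      float[] scores = new float[perSegment];
      for (int i = 0; i < perSegment; i++) {
        // Synthetic, distinct scores (7919 is coprime with 4000, so this is a permutation).
        scores[i] = ((docBase + i) * 7919 % 4000) / 4000f;
      }
      futures.add(pool.submit(() -> searchSegment(scores, docBase, k, global)));
    }
    int pruned = 0;
    for (Future<Integer> f : futures) pruned += f.get();
    pool.shutdown();
    System.out.println("top-" + k + ": " + global.topK());
    System.out.println("pruned candidates: " + pruned);
  }
}
```

Syncing every N visits rather than on every visit is presumably a trade-off between lock contention on the shared queue and how quickly each segment learns a tighter bound. This sketch does not measure that trade-off.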