mayya-sharipova commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-1889934399
I have also run experiments on the Cohere dataset. As seen below, for the 10M docs dataset the proposed approach gives QPS speedups of 1.7x to 2.5x.

## Cohere/wikipedia-22-12-en-embeddings

- [Cohere/wikipedia-22-12-en-embeddings](https://huggingface.co/datasets/Cohere/wikipedia-22-12-en-embeddings) dataset
- 768 dims

### 1M vectors

k=10, fanout=90

| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
| Baseline Single segment | 804 | 3225 | 0.454 |
| Baseline 8 segments concurrent | 1807 | 1831 | 0.887 |
| Candidate2_with_queue | 1807 | 1872 | 0.887 |

k=100, fanout=900

| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
| Baseline Single segment | 4555 | 527 | 0.477 |
| Baseline 8 segments concurrent | 9119 | 261 | 0.923 |
| Candidate2_with_queue | 9119 | 265 | 0.923 |

### 10M vectors

k=10, fanout=90

| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
| Baseline Single segment | | | |
| Baseline 19 segments concurrent | 37726 | 293 | 0.971 |
| Candidate2_with_queue | 20199 | 501 | 0.960 |

k=100, fanout=900

| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
| Baseline Single segment | | | |
| Baseline 19 segments concurrent | 234047 | 47 | 0.992 |
| Candidate2_with_queue | 74995 | 118 | 0.979 |
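
The speedup range quoted for the 10M docs dataset can be reproduced from the QPS columns. The snippet below is only a minimal sketch of that arithmetic: the numbers are copied from the 10M-vector tables in this comment, while the dictionary layout and the `speedup` helper are illustrative names, not part of the benchmark tooling.

```python
# QPS values copied from the 10M-vector tables in this comment.
# The dictionary layout and key names are assumptions made for illustration.
RESULTS_10M = {
    "k=10, fanout=90": {"baseline_qps": 293, "candidate_qps": 501},
    "k=100, fanout=900": {"baseline_qps": 47, "candidate_qps": 118},
}


def speedup(baseline_qps: float, candidate_qps: float) -> float:
    """Return the candidate's QPS as a multiple of the baseline's QPS."""
    return candidate_qps / baseline_qps


# Speedup of Candidate2_with_queue over "Baseline 19 segments concurrent".
speedups = {
    config: round(speedup(row["baseline_qps"], row["candidate_qps"]), 2)
    for config, row in RESULTS_10M.items()
}

for config, value in speedups.items():
    print(f"{config}: {value}x")
```

The ratios come out at roughly 1.7x for k=10 and 2.5x for k=100, which matches the range stated above. For the 1M docs dataset the same calculation gives ratios close to 1.0, since the candidate visits the same number of nodes as the concurrent baseline there.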