msokolov commented on PR #12311: URL: https://github.com/apache/lucene/pull/12311#issuecomment-1561247988
I ran luceneutil with GloVe 300-dim floating-point (fp32) vectors over 1M Wikipedia documents:

```
Task               QPS baseline  StdDev  QPS candidate  StdDev  Pct diff            p-value
PKLookup                 196.01  (3.8%)         192.14  (3.8%)  -2.0% ( -9% -  5%)    0.099
LowTermVector            213.57  (7.2%)         252.31  (3.6%)  18.1% (  6% - 31%)    0.000
AndHighLowVector         185.28  (6.8%)         221.08  (3.5%)  19.3% (  8% - 31%)    0.000
AndHighMedVector         125.91  (5.7%)         152.52  (2.5%)  21.1% ( 12% - 31%)    0.000
HighTermVector           171.95  (7.3%)         208.94  (3.3%)  21.5% ( 10% - 34%)    0.000
AndHighHighVector        123.87  (5.0%)         151.81  (2.9%)  22.6% ( 14% - 32%)    0.000
MedTermVector            119.07  (7.5%)         148.07  (2.8%)  24.4% ( 13% - 37%)    0.000
```

and with GloVe 100-dim 8-bit vectors:

```
Task               QPS baseline  StdDev  QPS candidate  StdDev  Pct diff            p-value
PKLookup                 190.59  (7.4%)         193.25  (5.1%)   1.4% (-10% - 14%)    0.486
LowTermVector            291.71 (24.0%)         341.91 (14.3%)  17.2% (-17% - 73%)    0.006
AndHighMedVector         230.40 (22.6%)         274.26 (13.0%)  19.0% (-13% - 70%)    0.001
MedTermVector            245.36 (22.7%)         292.35 (11.9%)  19.2% (-12% - 69%)    0.001
HighTermVector           296.45 (25.6%)         357.02  (9.8%)  20.4% (-11% - 75%)    0.001
AndHighLowVector         252.70 (23.2%)         308.05 (13.7%)  21.9% (-12% - 76%)    0.000
AndHighHighVector        150.54 (21.0%)         185.45 (13.4%)  23.2% ( -9% - 72%)    0.00
```

I also tried getting some vectors from a different model that produces 384-dim fp32 vectors (`all-MiniLM-L6-v2` from https://www.sbert.net/docs/pretrained_models.html). The methodology here is a bit suspect because we compute embedding vectors per word and then sum them over larger docs, whereas these models are really designed to be applied to larger passages so they can make use of word context. Still, I think the performance measurements will be valid.
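For concreteness, the per-word summing described above amounts to looking up a fixed vector per token and adding them up. This is just an illustrative sketch (the function and data names are hypothetical, not luceneutil code):

```python
def embed_doc(tokens, word_vectors, dim):
    """Sum per-word embedding vectors into a single document vector.

    Tokens with no entry in word_vectors are skipped. This loses word
    context, which is why the methodology is suspect for sentence models.
    """
    doc = [0.0] * dim
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # out-of-vocabulary word: contributes nothing
        for i, component in enumerate(vec):
            doc[i] += component
    return doc

# Toy 3-dim example (real vectors here would be 384-dim fp32):
word_vectors = {"fast": [1.0, 0.0, 0.5], "search": [0.0, 2.0, 0.5]}
print(embed_doc(["fast", "search", "oov"], word_vectors, 3))  # [1.0, 2.0, 1.0]
```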
```
Task               QPS baseline  StdDev  QPS candidate  StdDev  Pct diff            p-value
PKLookup                 173.59  (8.5%)         176.41  (5.7%)   1.6% (-11% - 17%)    0.477
AndHighHighVector        309.15 (26.1%)         346.54 (18.1%)  12.1% (-25% - 76%)    0.089
LowTermVector            305.52 (26.4%)         343.83 (15.9%)  12.5% (-23% - 74%)    0.069
MedTermVector            312.58 (26.6%)         352.51 (18.5%)  12.8% (-25% - 78%)    0.078
HighTermVector           300.84 (30.4%)         345.35 (18.8%)  14.8% (-26% - 92%)    0.064
AndHighMedVector         303.15 (27.8%)         349.09 (18.2%)  15.2% (-24% - 84%)    0.041
AndHighLowVector         233.11 (21.9%)         285.00 (12.5%)  22.3% ( -9% - 72%)    0.000
```

I was surprised this showed less improvement than the smaller vectors, but there is a lot of noise in these benchmarks: the results vary quite a bit from run to run (even averaging over 20 JVMs). I'm currently training up some 768-dim vectors using `all-mpnet-base-v` and will see if I can get measurements from KnnGraphTester, which should be more focused. These tests were run with 609fc9b63f61954a7408faa1669e807a6bbf1da9, so maybe a few commits back.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
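For anyone reading the tables: the Pct diff column is just the relative change in mean QPS from baseline to candidate. A quick sanity check in plain Python (the helper name is mine, not luceneutil's), using the LowTermVector row from the 384-dim run:

```python
def pct_diff(baseline_qps, candidate_qps):
    """Relative QPS change versus baseline, in percent."""
    return (candidate_qps - baseline_qps) / baseline_qps * 100.0

# LowTermVector, 384-dim fp32 run: 305.52 QPS (baseline) -> 343.83 QPS (candidate)
print(round(pct_diff(305.52, 343.83), 1))  # 12.5, matching the table
```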