benwtrent opened a new pull request, #13090: URL: https://github.com/apache/lucene/pull/13090
The initial release of scalar quantization would periodically create a humongous allocation, which can put unwarranted pressure on the GC and on heap usage as a whole. This commit adjusts this by only allocating a float array of `20 * dimensions` and averaging the quantiles discovered from each chunk.

Why does this work?

- Quantiles based on confidence intervals are (generally) unbiased, and averaging them gives statistically good results.
- The selector algorithm scales linearly, so the cost is about the same.
- We need to process more than `1` vector at a time to prevent extreme confidence intervals from interacting strangely with edge cases.

I benchmarked this over 500k vectors.

candidate
```
Force merge done in: 691533 ms
0.817 0.04 500000 0 16 250 2343 596410 1.00 post-filter
```
baseline
```
Force merge done in: 685618 ms
0.818 0.04 500000 0 16 250 2346 582242 1.00 post-filter
```

100k vectors

candidate
```
0.855 0.03 100000 0 16 250 2207 144173 1.00 post-filter
```
baseline
```
0.858 0.03 100000 0 16 250 2205 141578 1.00 post-filter
```

There does seem to be a slight increase in merge time (these are single-threaded numbers) and a slight change in recall, but these seem acceptable given that we are no longer allocating a ginormous array.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
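The chunked averaging described above can be sketched roughly as follows. This is a minimal illustration, not Lucene's actual `ScalarQuantizer` code: the class and method names are hypothetical, and it uses a simple sort to pick quantiles where the real implementation uses a linear-time selector.

```java
import java.util.Arrays;

// Hypothetical sketch of chunked quantile estimation; not Lucene's real API.
public final class ChunkedQuantiles {
  static final int SAMPLE_VECTORS = 20; // vectors per chunk, as in the PR

  /** Returns {lowerQuantile, upperQuantile} averaged over fixed-size chunks. */
  static float[] estimateQuantiles(float[][] vectors, int dim, float confidenceInterval) {
    // Only 20 * dim floats are allocated, instead of totalVectors * dim.
    float[] buffer = new float[SAMPLE_VECTORS * dim];
    double lowerSum = 0, upperSum = 0;
    int chunks = 0;
    for (int start = 0; start < vectors.length; start += SAMPLE_VECTORS) {
      int count = Math.min(SAMPLE_VECTORS, vectors.length - start);
      int n = count * dim;
      for (int i = 0; i < count; i++) {
        System.arraycopy(vectors[start + i], 0, buffer, i * dim, dim);
      }
      // Sort-based quantile pick for clarity; a linear-time selector
      // keeps the per-chunk cost linear in practice.
      float[] window = Arrays.copyOf(buffer, n);
      Arrays.sort(window);
      double tail = (1 - confidenceInterval) / 2;
      lowerSum += window[(int) (tail * (n - 1))];
      upperSum += window[(int) ((1 - tail) * (n - 1))];
      chunks++;
    }
    // Averaging per-chunk quantiles: each chunk's estimate is (roughly)
    // unbiased, so the mean is a statistically reasonable combined estimate.
    return new float[] {(float) (lowerSum / chunks), (float) (upperSum / chunks)};
  }
}
```

Because every chunk is the same size and the quantile estimate is (roughly) unbiased, the averaged result tracks the full-array quantiles closely while the peak allocation stays constant regardless of how many vectors are merged.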