Jackie-Jiang commented on PR #11557: URL: https://github.com/apache/pinot/pull/11557#issuecomment-1718853082
Added a benchmark and here are the results: Cardinality | Percentage | DivideBS | PlainBS | Scan | Divide vs Plain | Divide vs Scan -- | -- | -- | -- | -- | -- | -- 100 | 1 | 7659947.834 | 7193881.873 | 2590606.363 | 1.064786435 | 2.956816575 100 | 2 | 4527399.93 | 3702032.137 | 2297514.754 | 1.222949927 | 1.970564029 100 | 4 | 2298448.048 | 2219638.52 | 884871.898 | 1.035505569 | 2.59749242 100 | 8 | 1624385.705 | 1188408.158 | 730888.462 | 1.366858427 | 2.22248098 100 | 16 | 789605.698 | 561158.422 | 466697.802 | 1.407099434 | 1.691899329 100 | 32 | 451038.526 | 270074.742 | 414345.056 | 1.670050752 | 1.08855776 100 | 64 | 249953.379 | 145491.186 | 306353.403 | 1.717996711 | 0.8158988167 100 | 100 | 193388.569 | 81558.625 | 252231.273 | 2.371160242 | 0.766711307 1000 | 1 | 673035.271 | 604210.035 | 53069.877 | 1.113909455 | 12.68205824 1000 | 2 | 331272.455 | 296964.057 | 56142.286 | 1.115530473 | 5.900587215 1000 | 4 | 192433.934 | 154232.986 | 53796.693 | 1.247683385 | 3.577058798 1000 | 8 | 120568.939 | 67105.009 | 46336.664 | 1.796720406 | 2.602020271 1000 | 16 | 71785.317 | 33548.834 | 41131.107 | 2.139726138 | 1.745280452 1000 | 32 | 42476.434 | 13445.678 | 33515.651 | 3.159114327 | 1.26736115 1000 | 64 | 22895.909 | 7281.486 | 26235.134 | 3.144400607 | 0.8727193465 1000 | 100 | 16424.202 | 4369.1 | 25628.041 | 3.759172827 | 0.6408684144 10000 | 1 | 56327.402 | 36695.237 | 5402.282 | 1.53500581 | 10.42659417 10000 | 2 | 34808.144 | 19821.462 | 5119.112 | 1.756083583 | 6.799644938 10000 | 4 | 17176.683 | 8525.903 | 4878.576 | 2.014646777 | 3.520839483 10000 | 8 | 7889.65 | 4484.064 | 4283.167 | 1.759486484 | 1.842013165 10000 | 16 | 4291.993 | 2175.905 | 3454.031 | 1.97250937 | 1.242604076 10000 | 32 | 2414.918 | 1111.743 | 2897.232 | 2.172190875 | 0.833525931 10000 | 64 | 1514.006 | 569.737 | 1955.941 | 2.657377 | 0.7740550456 10000 | 100 | 1319.788 | 380.992 | 2126.729 | 3.464083235 | 0.6205717795 100000 | 1 | 3995.728 | 2898.882 | 500.751 | 1.378368626 | 7.979470835 100000 | 2 | 2205.853 | 1299.111 | 496.293 | 1.697971151 | 4.4446587 100000 | 4 | 1309.562 | 752.165 | 453.279 | 1.741056816 | 2.889085971 100000 | 8 | 759.235 | 355.94 | 416.158 | 2.133042086 | 1.824391217 100000 | 16 | 413.461 | 168.189 | 338.497 | 2.45831178 | 1.221461342 100000 | 32 | 210.711 | 74.033 | 234.106 | 2.846176705 | 0.9000666365 100000 | 64 | 103.367 | 38.865 | 134.722 | 2.659642352 | 0.7672614718 100000 | 100 | 70.296 | 24.18 | 96.128 | 2.90719603 | 0.7312749667 1000000 | 1 | 330.142 | 184.781 | 47.947 | 1.786666378 | 6.88556114 1000000 | 2 | 147.552 | 89.482 | 48.158 | 1.648957332 | 3.063914614 1000000 | 4 | 75.214 | 48.161 | 38.61 | 1.561720064 | 1.948044548 1000000 | 8 | 39.693 | 23.755 | 30.018 | 1.670932435 | 1.322306616 1000000 | 16 | 23.65 | 12.317 | 23.324 | 1.920110416 | 1.013977019 1000000 | 32 | 12.367 | 5.821 | 13.327 | 2.124549047 | 0.9279657837 1000000 | 64 | 6.577 | 2.959 | 7.576 | 2.222710375 | 0.8681362196 1000000 | 100 | 4.734 | 1.932 | 5.345 | 2.450310559 | 0.8856875585 As shown above, divide binary search always outperforms plain binary search. When the lookup value percentage is very high (over half of the dictionary), scan start outperforming divide binary search, but not by much. Based on this, I've decided to keep all 3 algorithms and use divide binary search as the default -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org