xiangfu0 commented on PR #10254: URL: https://github.com/apache/pinot/pull/10254#issuecomment-1426454024
> > Put some preliminary results here:
> >
> > Dictionary Size = 40243
> > Matched Ids = 92
> >
> > Note that the ratio of hit is only about 0.23%.
> >
> > | IN clause size | Binary search time (ms) | w/ BloomFilter time (ms) | Sort In Clause time (ms) | Merge Sort time (ms) | Merge Sort Optimized time (ms) |
> > | --- | --- | --- | --- | --- | --- |
> > | 1000 | 0.702665593 | 0.477092053 | 1.261689 | 0.456446174 | 0.464422597 |
> > | 2000 | 0.976426609 | 0.469555765 | 1.874155 | 2.35581563 | 1.03641343 |
> > | 3000 | 1.395156561 | 0.54921677 | 2.016092 | 2.303482286 | 1.050509881 |
> > | 4000 | 1.880918723 | 0.750761251 | 2.0648 | 2.362960455 | 1.015883262 |
> > | 5000 | 2.761192677 | 1.066657953 | 2.763625 | 2.96542384 | 1.091565855 |
> > | 6000 | 2.8582977489999997 | 1.1276038309999998 | 2.564282 | 2.592623574 | 1.1344038459999999 |
> > | 7000 | 3.742143986 | 1.413617315 | 2.552672 | 2.599197506 | 1.161713075 |
> > | 8000 | 4.295347995 | 1.532301406 | 2.577749 | 2.6646445659999998 | 1.262171061 |
> > | 9000 | 4.453696857 | 1.839579689 | 2.453438 | 2.9550764199999997 | 1.254242863 |
> > | 10000 | 5.223760384 | 2.141856571 | 2.821829 | 3.604565735 | 1.525939769 |
> > | 11000 | 5.859459020999999 | 2.753079806 | 3.826208 | 3.017545197 | 1.475243883 |
> > | 12000 | 5.9234032800000005 | 2.296612749 | 3.378312 | 3.146833064 | 1.623747765 |
> > | 13000 | 6.706917585 | 2.536116155 | 3.630198 | 3.1185865600000002 | 1.7264183020000001 |
> > | 14000 | 8.850008513 | 3.5461908650000002 | 4.859677 | 3.913546161 | 1.67060298 |
> > | 15000 | 8.187981332 | 3.143937341 | 5.251953 | 3.225024074 | 1.582934511 |
> > | 16000 | 8.206689118 | 3.1599710709999997 | 4.73979 | 3.47502625 | 1.666822424 |
> > | 17000 | 8.672926212 | 3.4895446430000003 | 5.149827 | 3.453563212 | 1.886811357 |
> > | 18000 | 9.326708343 | 3.653541142 | 5.452587 | 3.8414371270000003 | 1.7912497809999999 |
> > | 19000 | 9.526910202 | 3.919721084 | 6.442396 | 3.824844376 | 1.922495875 |
> > | 20000 | 10.616695003999999 | 4.158644734 | 6.126283 | 3.8659291280000003 | 1.9555868 |
> > | 21000 | 10.534084935000001 | 4.347735899 | 6.333432 | 3.969913246 | 2.068954043 |
> > | 22000 | 11.031193614 | 4.452593734 | 6.968554 | 3.994151205 | 2.090246166 |
> > | 23000 | 11.293810134 | 4.814520738000001 | 7.332608 | 4.147165715 | 2.133099357 |
> > | 24000 | 11.919929299 | 4.9986157229999995 | 7.446203 | 4.218936294 | 2.147012917 |
> > | 25000 | 12.658507581 | 5.000232526 | 7.558624 | 4.9323873890000005 | 2.297837296 |
> > | 26000 | 13.544178467 | 5.49112032 | 8.372786 | 4.894930091 | 2.5079797800000003 |
> > | 27000 | 13.328060074000001 | 5.939238627 | 8.529412 | 4.883065706 | 2.467563049 |
> > | 28000 | 14.697835813 | 5.68747577 | 9.463493 | 5.320589308 | 2.632497346 |
>
> The bloom filter approach doesn't appear to dominate much of the parameter space, so is it worth the complexity?

Agreed, so I guess just mergeSort is good enough.
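
For readers skimming the numbers, the two strategies being compared can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the class and method names are hypothetical, the dictionary is assumed to be a sorted `int[]` of distinct values whose index serves as the dictionary id, and the "optimized" merge variant from the table is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from Pinot.
final class SortedInClauseMatch {

  // Baseline: one binary search per IN-clause value, O(m log n) for m values
  // against a dictionary of n entries.
  static int[] matchByBinarySearch(int[] sortedDictionary, int[] inValues) {
    List<Integer> ids = new ArrayList<>();
    for (int value : inValues) {
      int id = Arrays.binarySearch(sortedDictionary, value);
      if (id >= 0) {
        ids.add(id);
      }
    }
    // Duplicate IN values map to the same id, so de-duplicate before returning.
    return ids.stream().mapToInt(Integer::intValue).distinct().sorted().toArray();
  }

  // Merge approach: sort the IN values once, then walk both sorted arrays
  // together, O(m log m + m + n).
  static int[] matchByMerge(int[] sortedDictionary, int[] inValues) {
    int[] sortedIn = inValues.clone();
    Arrays.sort(sortedIn);
    List<Integer> ids = new ArrayList<>();
    int d = 0;
    int v = 0;
    while (d < sortedDictionary.length && v < sortedIn.length) {
      if (sortedDictionary[d] < sortedIn[v]) {
        d++;            // dictionary value too small, advance dictionary
      } else if (sortedDictionary[d] > sortedIn[v]) {
        v++;            // IN value absent (or a duplicate), advance IN list
      } else {
        ids.add(d);     // match: the dictionary index is the id
        d++;
        v++;
      }
    }
    return ids.stream().mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    int[] dictionary = {1, 3, 5, 7, 9, 11};
    int[] inValues = {11, 4, 3, 3, 20};
    System.out.println(Arrays.toString(matchByBinarySearch(dictionary, inValues)));
    System.out.println(Arrays.toString(matchByMerge(dictionary, inValues)));
  }
}
```

Both methods return the same ids. The merge walk replaces a separate search per value with a single pass over both arrays, which may be part of why the merge columns grow more slowly as the IN clause gets larger, although the sketch itself does not measure anything.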