Tony-X commented on issue #12358: URL: https://github.com/apache/lucene/issues/12358#issuecomment-1585758064
Just caught up on this thread -- the design tenet of the current benchmark game is to measure the time taken to do the same work in a contention-free environment. I'm still working to build trust in the benchmarks, so thank you for your evaluation and feedback @uschindler! So far I believe they are doing the "same" work, as I have chased down a few tokenization issues. Right now the indexes on both sides have:

* almost the "same" tokenization -- split on whitespace and remove tokens with length >= 256
* the same index sort
* the same set of deleted docs (2% in total)
* a single segment

Regarding the JVM, here is what we do now:

* warm up the JVM with 6.1k queries for each of `COUNT` and `TOP_10_COUNT`. We could easily increase the warmup iterations [here](https://github.com/Tony-X/search-benchmark-game/blob/4402d42c906830e85d8d79a30ae776f204ade770/Makefile#L18). As I was typing, I changed the warmup iteration count to 3 and kicked off a run.

Admittedly, we haven't looked into playing with different JVM arguments. @mikemccand, thanks for creating https://github.com/Tony-X/search-benchmark-game/issues/37 to explore heap sizes :)

IMO, GC is less of an issue here since we measure the best latency (the min) across 10 runs of each query (a slight favor to the JVM). The probability that all 10 out of 10 runs of the same query hit a GC pause is very small. It would be great if you could share your insights on an optimal JVM setting for this case.
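For readers following along, the tokenization rule described above (split on whitespace, drop tokens of length >= 256) can be sketched in plain Java. This is an illustrative sketch only -- the class and method names are made up for this comment and are not code from either engine:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TokenizeSketch {
    // Cutoff from the benchmark description: tokens with length >= 256 are removed.
    static final int MAX_TOKEN_LEN = 256;

    // Split on runs of whitespace, then drop empty tokens and over-long tokens.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.split("\\s+"))
                .filter(t -> !t.isEmpty() && t.length() < MAX_TOKEN_LEN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A 300-char token exceeds the cutoff and is dropped.
        String longToken = "x".repeat(300);
        List<String> tokens = tokenize("hello world " + longToken + " lucene");
        System.out.println(tokens); // [hello, world, lucene]
    }
}
```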
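The min-of-10-runs argument can be made concrete with a small sketch (names are illustrative, not the benchmark's actual harness code): a single GC pause inflates only the run it lands in, so the reported minimum is unaffected unless every run pauses, which happens with probability p^10 if a pause occurs in any one run with probability p:

```java
import java.util.Arrays;

public class MinLatencySketch {
    // Best (minimum) latency across the runs of one query.
    static long bestLatencyMicros(long[] runsMicros) {
        return Arrays.stream(runsMicros).min().orElseThrow();
    }

    public static void main(String[] args) {
        // One run hit a GC pause (the 90_000 outlier); the min ignores it.
        long[] runs = {1200, 1180, 90_000, 1210, 1195, 1201, 1188, 1192, 1199, 1185};
        System.out.println(bestLatencyMicros(runs)); // 1180

        // The min is distorted only if all 10 runs pause: probability p^10.
        // Even a pessimistic p = 0.05 gives roughly 1e-13.
        double p = 0.05;
        System.out.println(Math.pow(p, 10)); // ~9.77e-14
    }
}
```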