mikemccand commented on PR #13244: URL: https://github.com/apache/lucene/pull/13244#issuecomment-2030128137
Unfortunately, benchmarking the cold index case correctly is not so easy ... I would not trust luceneutil to give accurate results (its queries are synthetically generated). We would instead need a real-world large index (or use `ramhog` to cut back on free OS RAM) and, importantly, matching real-world query traffic that shows the typical/realistic [Zipfian distribution](https://en.wikipedia.org/wiki/Zipf%27s_law) of search terms. The queries should not only be realistic; they should also be delivered to Lucene at the actual times they arrived at the search engine, asynchronously ("open loop"), to avoid the [coordinated omission bug](https://www.scylladb.com/2021/04/22/on-coordinated-omission/).
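To make the "open loop" point concrete, here is a minimal sketch of a replay driver, not anything luceneutil provides. The names `replay_open_loop`, `query_log`, and `search_fn` are hypothetical; `query_log` is assumed to be a list of `(arrival_offset_sec, query)` pairs taken from real traffic, and `search_fn` stands in for whatever actually sends the query to Lucene. The key detail is that latency is measured from the intended arrival time, not from when a worker became free, so time spent queued behind a slow query is counted rather than omitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def replay_open_loop(query_log, search_fn, max_workers=32):
    """Replay (arrival_offset_sec, query) pairs at their recorded times.

    Returns a list of (query, latency_sec) in arrival order. Latency is
    measured from the intended arrival time, so any wait behind earlier
    slow queries is included in the number.
    """
    start = time.monotonic()

    def run(intended_time, query):
        search_fn(query)  # hypothetical: issue the query to the search engine
        return query, time.monotonic() - intended_time

    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for offset, query in sorted(query_log, key=lambda entry: entry[0]):
            intended = start + offset
            delay = intended - time.monotonic()
            if delay > 0:
                time.sleep(delay)  # wait for the recorded arrival time
            # Submit without waiting for earlier queries to finish (open loop).
            futures.append(pool.submit(run, intended, query))
        return [future.result() for future in futures]


if __name__ == "__main__":
    def fake_search(query):
        # Stand-in for a real search call; assumed to take about 50 ms.
        time.sleep(0.05)

    log = [(0.0, "lucene"), (0.0, "search"), (0.2, "index")]
    for query, latency in replay_open_loop(log, fake_search, max_workers=1):
        print(f"{query}: {latency * 1000:.0f} ms")
```

With a single worker, the second query in the demo reports roughly twice the service time, because it arrived at the same moment as the first and had to wait. A closed-loop driver that only sends the next query after the previous one returns would hide that wait.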