mikemccand commented on PR #13244:
URL: https://github.com/apache/lucene/pull/13244#issuecomment-2030128137

   Unfortunately, benchmarking the cold index case correctly is not so easy ... 
I would not trust luceneutil to give accurate results (its queries are 
synthetically generated).
   
   Instead, we would need a large real-world index (or use `ramhog` to cut 
back on free OS RAM), and, importantly, matching real-world query traffic 
that shows the typical/realistic 
[Zipfian distribution](https://en.wikipedia.org/wiki/Zipf%27s_law) of search terms.
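
   To illustrate what a Zipfian distribution of search terms looks like, here is 
a minimal Python sketch. It is only an illustration of the skew, not a substitute 
for the real query logs asked for above. The vocabulary (`term0`, `term1`, ...), 
the exponent `s=1.0`, and the helper names are all assumptions made up for this 
example.

```python
import random
from collections import Counter

def zipf_weights(n_terms, s=1.0):
    # Weight of the term at rank r is proportional to 1 / r**s (Zipf's law).
    return [1.0 / (rank ** s) for rank in range(1, n_terms + 1)]

def sample_terms(terms, n_queries, s=1.0, seed=0):
    # terms must be ordered from most to least popular.
    rng = random.Random(seed)
    return rng.choices(terms, weights=zipf_weights(len(terms), s), k=n_queries)

# Hypothetical vocabulary, ordered by popularity rank.
terms = [f"term{i}" for i in range(1000)]
queries = sample_terms(terms, 10_000)

# A few head terms dominate, and there is a long tail of rare terms.
counts = Counter(queries)
print(counts.most_common(3))
```

With traffic shaped like this, the postings for the head terms stay hot in the OS 
page cache while the long tail keeps touching cold pages, which is the effect a 
cold-index benchmark would need to capture.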
   
   The queries should not only be realistic; they should also be delivered to 
Lucene at the actual times they arrived at the search engine, and 
asynchronously ("open loop"), to avoid the 
[coordinated omission bug](https://www.scylladb.com/2021/04/22/on-coordinated-omission/).
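
   A rough sketch of such an open-loop replay, in Python, under stated 
assumptions: the log is a list of `(arrival_offset_seconds, query)` pairs sorted 
by offset, and `search` is any callable that runs one query. Both the 
`replay_open_loop` helper and `fake_search` are hypothetical stand-ins, not part 
of luceneutil or Lucene.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def replay_open_loop(log, search, max_workers=32):
    """Replay (arrival_offset_seconds, query) pairs at their recorded times."""
    start = time.monotonic()
    futures = []

    def run(intended, query):
        search(query)
        # Measure from the intended arrival time, not the actual send time,
        # so that any queueing delay counts against the latency.
        return time.monotonic() - intended

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for offset, query in log:
            intended = start + offset
            delay = intended - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            # Submit without waiting for earlier queries to finish (open loop).
            futures.append(pool.submit(run, intended, query))
        return [f.result() for f in futures]

def fake_search(query):
    # Stand-in for a real search call; assumed to take about 10 ms.
    time.sleep(0.01)

log = [(0.00, "a"), (0.01, "b"), (0.02, "c")]
latencies = replay_open_loop(log, fake_search)
print([round(latency, 3) for latency in latencies])
```

A closed-loop client would send the next query only after the previous one 
returned, so a slow query would silently delay (and hide) the ones behind it. 
Here a slow query does not hold back later arrivals, as long as `max_workers` is 
large enough that the pool itself does not become the bottleneck.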


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


