iprithv commented on PR #16069: URL: https://github.com/apache/lucene/pull/16069#issuecomment-4493275247
> Thanks @iprithv, getting closer :)
>
> Can you run the luceneutil wikibigall benchmarks (from https://github.com/mikemccand/luceneutil) and post the results here? That should give us an idea of the real-world impact of these changes.

I couldn’t find the wikibigall dataset. The file `enwiki-20120502-lines-with-random-label.txt` doesn’t seem to be available anymore. It looks like the old URL in `constants.py` is gone, and `initial_setup.py` only downloads wikimedium now (maybe related to https://github.com/apache/lucene/issues/13647). I also checked for mirrors but didn’t find anything.

So instead, I ran the wikimediumall benchmark (33M docs, same task file, 5 JVM iterations).

Results show no statistically significant regressions in disjunction queries (every diff is within ±2.4%, and every p-value is above 0.3):

| task | baseline qps | candidate qps | diff | p-value |
| --- | --- | --- | --- | --- |
| OrHighHigh | 87.83 | 86.23 | -1.8% | 0.808 |
| OrHighMed | 244.14 | 238.30 | -2.4% | 0.556 |
| OrHighLow | 925.61 | 908.15 | -1.9% | 0.571 |
| OrNotHighHigh | 505.40 | 507.37 | +0.4% | 0.822 |
| OrNotHighMed | 363.62 | 366.20 | +0.7% | 0.710 |
| OrNotHighLow | 704.81 | 716.40 | +1.6% | 0.337 |

If there’s another place to get the wikibigall dataset, let me know and I can rerun with that. Thanks!
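For anyone who wants to sanity-check the numbers, here is a minimal Python sketch that recomputes the diff column from the baseline and candidate QPS values in the table above and flags any row whose p-value falls below a threshold. The `ALPHA = 0.05` cutoff and the helper names `pct_diff` and `check` are assumptions made for illustration; this is not how luceneutil itself computes or reports its results.

```python
# Rows copied from the table above:
# (task, baseline_qps, candidate_qps, reported_diff_pct, p_value)
ROWS = [
    ("OrHighHigh", 87.83, 86.23, -1.8, 0.808),
    ("OrHighMed", 244.14, 238.30, -2.4, 0.556),
    ("OrHighLow", 925.61, 908.15, -1.9, 0.571),
    ("OrNotHighHigh", 505.40, 507.37, 0.4, 0.822),
    ("OrNotHighMed", 363.62, 366.20, 0.7, 0.710),
    ("OrNotHighLow", 704.81, 716.40, 1.6, 0.337),
]

ALPHA = 0.05  # assumed significance threshold, not taken from luceneutil


def pct_diff(baseline, candidate):
    """Relative change of candidate vs. baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def check(rows, alpha=ALPHA):
    """Return (task, recomputed_diff, matches_reported, significant) per row."""
    report = []
    for task, base, cand, reported, p_value in rows:
        diff = pct_diff(base, cand)
        # The table rounds to one decimal place, so allow half a unit
        # in the last reported digit.
        matches = abs(diff - reported) <= 0.05
        report.append((task, diff, matches, p_value < alpha))
    return report


if __name__ == "__main__":
    for task, diff, matches, significant in check(ROWS):
        print(f"{task:15s} {diff:+.2f}% matches={matches} significant={significant}")
```

With a conventional 0.05 cutoff, none of the six tasks would count as a significant change, which is consistent with reading the small negative diffs as noise.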
