zacharymorn commented on pull request #101: URL: https://github.com/apache/lucene/pull/101#issuecomment-835707134
I cherry-picked your commit and pushed it to this branch / PR to further explore the changes and their effect; I hope that's OK.

> (2) feels natural to avoid doing useless score computations, though it might only work well when score upper bounds are very close to the actual scores. Maybe we should test on wikibig instead of wikimedium to get better confidence that this change makes things better.

> Regarding (3) does it actually push more scorers into nonEssentialScorers? I thought I just reorganized the existing logic a bit. If it pushes more scorers into nonEssentialScorers it's probably a bug. :)

I explored how these two affect QPS by removing them in https://github.com/apache/lucene/pull/101/commits/2835055979fa6a739972d2abf30888e855b7683c and restoring the optimizations in later commits (the luceneutil results for OrMedMedMedMedMed for these changes are included in the git commit messages). From a few runs, it seems that these two accounted for about a 15% improvement (from -7% to +7%). For (3), though, after some more thought I think the two implementations should actually be equivalent, since the scorers were already sorted by maxScore before they were partitioned. The changes to the iterator that avoid unneeded score computation probably account for the rest of the improvement.

I also tried to run `wikibigall`, which seems to require `enwiki-20100302-pages-articles-lines.txt`, but the util does not download it. It appears the archive should come from http://home.apache.org/~mikemccand/enwiki-20100302-pages-articles-lines.txt.bz2, but that URL returns a 404 now. Is there a new location I can download it from?

> Maybe we can also try to port similar changes to the bulk scorer to see if it yields even greater benefits?

Yes, I would love to do that! I will work on that next and try out more queries.

--
This is an automated message from the Apache Git Service.