zacharymorn commented on pull request #101:
URL: https://github.com/apache/lucene/pull/101#issuecomment-835707134


   I cherry-picked your commit and pushed it to this branch / PR to further 
explore the changes and their effect. I hope that's OK.
   
   > (2) feels natural to avoid doing useless score computations, though it 
might only work well when score upper bounds are very close to the actual 
scores. Maybe we should test on wikibig instead of wikimedium to get better 
confidence that this change makes things better.
   
   > Regarding (3) does it actually push more scorers into nonEssentialScorers? 
I thought I just reorganized the existing logic a bit. If it pushes more 
scorers into nonEssentialScorers it's probably a bug. :)
   
   I explored how these two affect QPS by removing them in 
https://github.com/apache/lucene/pull/101/commits/2835055979fa6a739972d2abf30888e855b7683c
 and restoring the optimizations in later commits (the luceneutil results for 
OrMedMedMedMedMed are included in the git commit messages). From a few runs, 
these two seem to account for about a 15% improvement (from -7% to +7%). For 
(3), however, after some more thought I think the different implementations 
should actually be equivalent, since the scorers were already sorted by 
maxScore before they were partitioned. The changes to the iterator that avoid 
unneeded score computation should therefore probably account for the rest of 
the improvement.
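
   To illustrate why the partitioning should come out the same either way, 
here is a minimal sketch. It is not the code in this PR: the class name 
`MaxScorePartition`, the method `firstEssential`, and the strict `<` 
comparison are all assumptions made for illustration. Once the score upper 
bounds are sorted in ascending order, the non-essential scorers are simply the 
longest prefix whose summed upper bounds stay below the minimum competitive 
score, so any loop that walks the sorted array should find the same boundary.

```java
import java.util.Arrays;

public class MaxScorePartition {

    /**
     * Hypothetical helper: given score upper bounds sorted in ascending order,
     * returns the index of the first essential scorer. Scorers before that
     * index are non-essential, meaning that even if all of them match, their
     * summed upper bounds stay below minCompetitiveScore.
     */
    public static int firstEssential(float[] sortedMaxScores, float minCompetitiveScore) {
        double sum = 0;
        int i = 0;
        // Grow the non-essential prefix while it cannot produce a competitive hit.
        while (i < sortedMaxScores.length && sum + sortedMaxScores[i] < minCompetitiveScore) {
            sum += sortedMaxScores[i];
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        float[] maxScores = {2.5f, 0.5f, 1.0f, 4.0f};
        Arrays.sort(maxScores); // ascending: 0.5, 1.0, 2.5, 4.0
        // 0.5 + 1.0 = 1.5 < 2.0, but 1.5 + 2.5 >= 2.0, so the boundary is index 2.
        System.out.println(firstEssential(maxScores, 2.0f)); // prints 2
    }
}
```

   With a minimum competitive score of 2.0, the two lowest-bound scorers are 
non-essential and the other two are essential. Only the sorted order and the 
threshold determine the boundary.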
   
   I also tried to run `wikibigall`, which seems to require 
`enwiki-20100302-pages-articles-lines.txt`, but the util does not download 
that file. It appears the archive should come from 
http://home.apache.org/~mikemccand/enwiki-20100302-pages-articles-lines.txt.bz2,
 but that URL returns a 404 now. Is there a new location for this archive 
that I can download it from?
   
   > Maybe we can also try to port similar changes to the bulk scorer to see if 
it yields even greater benefits?
   
   Yes, I would love to do that! I will work on it next and try out a wider 
range of queries.

