benwtrent commented on PR #14085: URL: https://github.com/apache/lucene/pull/14085#issuecomment-2599236766
I think I have addressed the bug in my implementation. I simplified it greatly, and it now more closely resembles your original change, though with some constants changed. The recall and runtime curves look much better now for most filtered cases. I will post graphs soon, as the raw data is an eyesore. Here is my data: https://docs.google.com/spreadsheets/d/1GqD7Jw42IIqimr2nB78fzEfOohrcBlJzOlpt0NuUVDQ/edit?usp=sharing

It looks like the improvements start tapering off at 0.5 and above, but it never gets significantly worse :/

Here is a graph of some of the more restrictive filters. With a filter that allows only 10% of documents, the new algorithm does much, much better.