[GitHub] [lucene] benwtrent commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

via GitHub Thu, 20 Jul 2023 12:13:56 -0700


benwtrent commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1644461632


   @jmazanec15 I followed your steps with the same data (force-merging as well).
   
   Instead of using `dot_product` as it is, I focused on the non-negative case (which is what it would be if we supported this). So I used your piecewise transformation (negatives fall between 0 and 1, and positives are unscaled scores of 1+).
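
For readers following along, one mapping that matches this description can be sketched as below. The exact formula is not spelled out in this comment, so the `1 / (1 - score)` form for negative scores and the name `scale_dot_product` are assumptions made for illustration only.

```python
def scale_dot_product(score: float) -> float:
    """Map a raw dot product to a non-negative score (hypothetical helper).

    Negative inputs land in (0, 1); non-negative inputs are shifted to 1+
    without rescaling. The 1 / (1 - score) form for negatives is an
    assumption: it is monotonic and meets the shifted branch at score == 0.
    """
    if score < 0:
        return 1.0 / (1.0 - score)
    return score + 1.0


if __name__ == "__main__":
    # Print a few raw scores next to their transformed values.
    for raw in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(raw, scale_dot_product(raw))
```

Because both branches are increasing and agree at zero, the ordering of candidates by raw dot product is preserved; only the numeric range changes.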
   
   This is what I got:
   ```
   recall  latency  nDoc    fanout  maxConn  beamWidth  visited  index ms
   0.989   2.74     400000  200     32       200        210      683712    1.00  post-filter
   ```
   
   So, 0.989 recall at about 2.7ms per query, with the index taking `683712ms` (roughly 11.4 minutes) to build. Not too shabby. It's interesting how the scaling slightly changes the recall number.
   
   We should verify this is OK by feeding the docs in a random order. We might be getting lucky in the graph building.
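
A minimal sketch of how such a check could be set up, assuming the documents (or their vectors) are available as an in-memory sequence before indexing. The helper `shuffled_order` and the seed value are hypothetical and are not part of Lucene or its benchmarking tools.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_order(docs: Sequence[T], seed: int = 42) -> List[T]:
    """Return a copy of docs in a seeded random order (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps the experiment reproducible
    order = list(range(len(docs)))
    rng.shuffle(order)  # permute indices in place
    return [docs[i] for i in order]


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(10)]
    print(shuffled_order(docs, seed=1))
```

Rebuilding the index from a few differently seeded orderings and comparing recall would show whether the 0.989 figure depends on insertion order.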


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

