msokolov commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-1815705050

   Yes, this is a promising avenue to explore! One note of caution: we should
avoid drawing strong inferences from a single dataset. I'm especially wary of
GloVe because I've noticed it seems to have poor numerical properties, and we
definitely should not be testing with random vectors. Ideally we would try
several datasets, but if I had to pick one I'd recommend the minilm (384-dim)
vectors we computed from Wikipedia, or some internal Amazon dataset; I also
know Elastic folks have been testing with a Cohere dataset. If you have an
Apache login, you can download the minilm data with
sftp <username>@home.apache.org; cd /home/sokolov/public_html
You can also regenerate the vectors using infer_vectors.py in luceneutil, but
that takes a little while.
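
As a rough illustration of the caution about random vectors: the comment does not say why they make a poor test, so the reason sketched here (distance concentration in high dimensions) is an assumption, not something measured on this issue. The function name, the Gaussian sampling, and the sample size are all hypothetical choices made for the example. It compares how much the nearest and farthest of a set of random points differ from a random query at 2 dimensions versus 384.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one random query
    to n_points random Gaussian points of the given dimension."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Hypothetical comparison: a low dimension vs. the 384 dims of the minilm vectors.
    for dim in (2, 384):
        print(dim, round(relative_contrast(dim), 3))
```

If the contrast at 384 dimensions comes out much smaller than at 2, nearly all random points are about equally far from the query, which is unlike the clustered structure real embeddings tend to have. That would be one plausible reason results on random vectors may not carry over to real datasets.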


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org