msokolov edited a comment on pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#issuecomment-703872279


   > Thank you for ... the tests catching mis-use where user tries to change dimension or scoring function in an existing field.
   
   Thanks to @mocobeta for those; I was able to carry them forward from her earlier patch.
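The rule those tests enforce is that once a vector field exists with a given dimension and score function, later documents must not change either. Below is a minimal sketch of that kind of consistency check. The `VectorFieldSchema` class and the plain-string score-function names are hypothetical; this is not the Lucene code or API from this PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical sketch: remembers the vector dimension and score function
 * first used for each field and rejects any later attempt to change them.
 * Not Lucene's actual FieldInfos logic, just an illustration of the rule.
 */
final class VectorFieldSchema {
  /** Immutable per-field settings recorded on first use. */
  private static final class Spec {
    final int dimension;
    final String scoreFunction;

    Spec(int dimension, String scoreFunction) {
      this.dimension = dimension;
      this.scoreFunction = scoreFunction;
    }
  }

  private final Map<String, Spec> specs = new HashMap<>();

  /** Records settings for a new field, or verifies they match the existing ones. */
  void register(String field, int dimension, String scoreFunction) {
    Objects.requireNonNull(scoreFunction, "scoreFunction");
    if (dimension <= 0) {
      throw new IllegalArgumentException("dimension must be positive, got " + dimension);
    }
    Spec existing = specs.get(field);
    if (existing == null) {
      specs.put(field, new Spec(dimension, scoreFunction)); // first use fixes the settings
      return;
    }
    if (existing.dimension != dimension) {
      throw new IllegalArgumentException("cannot change dimension of field \"" + field
          + "\" from " + existing.dimension + " to " + dimension);
    }
    if (!existing.scoreFunction.equals(scoreFunction)) {
      throw new IllegalArgumentException("cannot change score function of field \"" + field
          + "\" from " + existing.scoreFunction + " to " + scoreFunction);
    }
  }

  public static void main(String[] args) {
    VectorFieldSchema schema = new VectorFieldSchema();
    schema.register("embedding", 3, "dot_product");
    schema.register("embedding", 3, "dot_product"); // same settings: accepted
    try {
      schema.register("embedding", 4, "dot_product"); // dimension change: rejected
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Re-registering identical settings is deliberately a no-op, so only an actual change to an existing field fails.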
   
   > I see you implemented the two score functions, but are they ever exercised in tests?
   
   True: this was extracted from a bigger change that used those methods as part of KNN search, but they deserve their own unit tests. I'll add some.
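This comment does not name the two score functions. The sketch below assumes they are dot product and Euclidean distance, with the Euclidean one computed as a squared distance (dropping the square root does not change the ranking). The class and method names (`ScoreFunctionsSketch`, `dotProduct`, `squareDistance`) are hypothetical. It shows the kind of small, hand-checkable unit assertions that could exercise such functions directly.

```java
/**
 * Hypothetical sketch of unit checks for two vector score functions.
 * Assumes they are dot product and squared Euclidean distance; names and
 * signatures are illustrative, not the PR's actual API.
 */
final class ScoreFunctionsSketch {
  /** Sum of element-wise products; larger means more similar. */
  static float dotProduct(float[] a, float[] b) {
    checkSameDimension(a, b);
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Sum of squared element-wise differences; smaller means more similar. */
  static float squareDistance(float[] a, float[] b) {
    checkSameDimension(a, b);
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  private static void checkSameDimension(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimensions differ: " + a.length + " != " + b.length);
    }
  }

  private static void check(boolean condition, String what) {
    if (!condition) {
      throw new AssertionError(what);
    }
  }

  public static void main(String[] args) {
    float[] x = {1f, 2f, 3f};
    float[] y = {4f, 5f, 6f};
    check(dotProduct(x, y) == 32f, "dot product of x and y"); // 4 + 10 + 18
    check(squareDistance(x, y) == 27f, "distance between x and y"); // 9 + 9 + 9
    check(squareDistance(x, x) == 0f, "distance to self is zero");
    check(dotProduct(new float[] {1f, 0f}, new float[] {0f, 1f}) == 0f, "orthogonal vectors");
    boolean rejected = false;
    try {
      dotProduct(x, new float[] {1f});
    } catch (IllegalArgumentException e) {
      rejected = true;
    }
    check(rejected, "mismatched dimensions are rejected");
    System.out.println("all score function checks passed");
  }
}
```

The inputs are small integers so the expected values are exact in `float` and can be compared with `==`. Tests on arbitrary real-valued vectors would need a tolerance instead.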
   
   > I would love to see a "Vector Overview" javadoc somewhere ...
   
   Yes, I'll add one to the VectorValues/VectorField class javadocs; I think that's the most natural and visible place.
   
   > I am curious how the basic vector usage performs -- just indexing one vector field, and retrieving it at search time. We can (separately) enable luceneutil to support testing vectors, somehow. But I wonder where we'll get semi-realistic vectors derived from Wikipedia content.
   
   Agreed that benchmarking is needed. I think we can use http://ann-benchmarks.com/ as a guide for some standardized test vectors, although those probably won't be related to Wikipedia. If we get to wanting that, we could also make use of something like https://fasttext.cc/docs/en/pretrained-vectors.html, which seems to be trained on ngrams taken from Wikipedia (for many languages). I don't know how well suited it is; I just found it in a Google search. For that, we'd have to compute document/query vectors based on an ngram-vector dictionary. I think a simple approach is to sum the ngram-vectors for all the ngrams in a document / query.
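Here is a minimal sketch of the summing approach described above. It assumes the pretrained vectors have already been parsed into an in-memory `Map<String, float[]>`; loading the fastText `.vec` file is not shown. The class `BagOfNgramsEmbedder` is hypothetical. The lowercase whitespace tokenizer is a simplification that treats each word as one ngram.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch: builds a document or query vector by summing the
 * pretrained vectors of its tokens. Tokenization is a plain lowercase
 * whitespace split, so each word stands in for one "ngram" here.
 */
final class BagOfNgramsEmbedder {
  private final Map<String, float[]> dictionary;
  private final int dimension;

  BagOfNgramsEmbedder(Map<String, float[]> dictionary, int dimension) {
    // Reject dictionaries whose vectors do not all share the expected dimension.
    for (Map.Entry<String, float[]> entry : dictionary.entrySet()) {
      if (entry.getValue().length != dimension) {
        throw new IllegalArgumentException("vector for \"" + entry.getKey()
            + "\" has dimension " + entry.getValue().length + ", expected " + dimension);
      }
    }
    this.dictionary = dictionary;
    this.dimension = dimension;
  }

  /** Sums the vectors of all known tokens; out-of-vocabulary tokens are skipped. */
  float[] embed(String text) {
    float[] sum = new float[dimension];
    for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
      float[] vector = dictionary.get(token);
      if (vector == null) {
        continue; // unknown token, or the empty string produced by leading whitespace
      }
      for (int i = 0; i < dimension; i++) {
        sum[i] += vector[i];
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    // Toy 2-dimensional dictionary; fastText's pretrained vectors are 300-dimensional.
    Map<String, float[]> dictionary = Map.of(
        "lucene", new float[] {0.5f, 0.5f},
        "vector", new float[] {1f, 0f},
        "search", new float[] {0f, 2f});
    BagOfNgramsEmbedder embedder = new BagOfNgramsEmbedder(dictionary, 2);
    float[] query = embedder.embed("Lucene vector search");
    System.out.println(Arrays.toString(query)); // [1.5, 2.5]
  }
}
```

Plain summing means longer documents produce larger vectors. If the chosen score function is sensitive to magnitude (dot product, Euclidean distance), averaging or normalizing to unit length would remove that length effect.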


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


