jtibshirani commented on pull request #1930:
URL: https://github.com/apache/lucene-solr/pull/1930#issuecomment-711759394


   @msokolov I did a belated review and had some high-level thoughts. Hopefully 
post-commit reviews are still helpful in giving ideas for follow-ups.
   
   > It seems messy to have the ScoringFunction implementation in the main 
VectorValues interface API file. 
   
   It seems like we’re using `ScoreFunction` for two separate use cases:
   1. Configuring the ANN indexing + search strategy (to be introduced later). 
For example, HNSW needs to know up-front what similarity measure to use when 
building the graph.
   2. Providing convenience methods for users that want to incorporate vectors 
into scoring, outside of ANN.
   
   We could think about separating out these cases. Maybe `ScoreFunction` could 
stick closely to case 1 and avoid exposing score implementations. This fits 
well with its role of specifying the ANN index strategy, which might encompass 
other ANN approaches in the future like `KMEANS_EUCLIDEAN`. And for use case 2, 
we could introduce static utility methods like `dotProduct(float[] a, float[] 
b)` ? The ANN implementations would be free to make use of these methods if 
they prove helpful. If we go with this design, perhaps `ScoreFunction` would be 
renamed to something like `SearchStrategy`.
   
   Separately, I wonder if we need to expose a random access interface on the 
public `VectorValues`. When using vectors for scoring, iterator access seems 
sufficient. I’m having trouble thinking of a use for random access outside of 
ANN, and some ANN implementations don’t even need it -- for example most 
k-means and LSH approaches only need forward iteration. Would it make sense to 
keep the `VectorValues` surface area smaller and only provide iterator access? 
Random access could still be supported at a lower level, in a place that's 
available to HNSW. I’m not sure exactly how this would look, perhaps it'd be 
clearer once we see how HNSW will be integrated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to