jtibshirani removed a comment on pull request #1930: URL: https://github.com/apache/lucene-solr/pull/1930#issuecomment-711759394
@msokolov I did a belated review and had some high-level thoughts. Hopefully post-commit reviews are still helpful in giving ideas for follow-ups. > It seems messy to have the ScoringFunction implementation in the main VectorValues interface API file. It seems like we’re using `ScoreFunction` for two separate use cases: 1. Configuring the ANN indexing + search strategy (to be introduced later). For example, HNSW needs to know up-front what similarity measure to use when building the graph. 2. Providing convenience methods for users that want to incorporate vectors into scoring, outside of ANN. We could think about separating out these cases. Maybe `ScoreFunction` could stick closely to case 1 and avoid exposing score implementations. This fits well with its role of specifying the ANN index strategy, which might encompass other ANN approaches in the future like `KMEANS_EUCLIDEAN`. And for use case 2, we could introduce static utility methods like `dotProduct(float[] a, float[] b)` ? The ANN implementations would be free to make use of these methods if they prove helpful. If we go with this design, perhaps `ScoreFunction` would be renamed to something like `SearchStrategy`. Separately, I wonder if we need to expose a random access interface on the public `VectorValues`. When using vectors for scoring, iterator access seems sufficient. I’m having trouble thinking of a use for random access outside of ANN, and some ANN implementations don’t even need it -- for example most k-means and LSH approaches only need forward iteration. Would it make sense to keep the `VectorValues` surface area smaller and only provide iterator access? Random access could still be supported at a lower level, in a place that's available to HNSW. I’m not sure exactly how this would look, perhaps it'd be clearer once we see how HNSW will be integrated. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org