Thanks, Nicholas. There is a sense in which Solr isn't the right tool. However, we already have lots of business rules encapsulated in filter queries, and our content ingestion pipelines are already in place.
TF-IDF similarity is pluggable (even just by sorting on function queries), so I am looking for an alternative way to encapsulate the scoring algorithm.

Upayavira

On Wed, Nov 26, 2014, at 10:14 PM, Nicholas Ding wrote:
> I'm not sure if Solr is the right tool to do this task. You probably need
> a machine learning library like Mahout or Weka.
>
> PS: Lucene doesn't really use Cosine Similarity, it's using a practical
> TF-IDF Similarity.
>
> Nicholas Ding
>
> On Wed, Nov 26, 2014 at 3:05 PM, Upayavira <u...@odoko.co.uk> wrote:
>
> > Hi,
> >
> > I've been asked how to use Solr as a component in a machine learning
> > system, doing document comparison based upon feature vectors.
> >
> > If I have two vectors, one in the index (in some form) and one in the
> > query (in some form), how can I do, for example, a vector multiplication
> > of the two vectors in order to calculate a score?
> >
> > The feature space I am being given has 100 features, with numerical
> > scores for each feature. In this case, it is not sparse - most features
> > will have a value.
> >
> > I have ideas, but it seems they get me some of the way, but not all.
> >
> > Has anyone worked with Solr in this way?
> >
> > Thanks,
> >
> > Upayavira
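
PS: To make the "sorting on function queries" idea concrete, here is a rough sketch of how a dot product between a query vector and per-document feature fields might be expressed as a function-query string. It is only an illustration under assumptions: the helper name dot_product_function_query and the field naming scheme (feature_0, feature_1, ...) are hypothetical, and it presumes each feature is indexed as its own numeric field. It uses only the sum() and product() functions.

```python
def dot_product_function_query(query_vector, field_prefix="feature_"):
    """Build a function-query string for a dot product.

    Assumption (hypothetical schema): feature i of every document is
    stored in a numeric field named f"{field_prefix}{i}".
    """
    terms = [
        # Weights are written in fixed-point notation with six decimals,
        # so values smaller than 0.0000005 would round to zero.
        f"product({field_prefix}{i},{weight:.6f})"
        for i, weight in enumerate(query_vector)
        if weight != 0  # a zero weight contributes nothing to the sum
    ]
    if not terms:
        raise ValueError("query vector has no non-zero weights")
    if len(terms) == 1:
        return terms[0]
    return "sum(" + ",".join(terms) + ")"


if __name__ == "__main__":
    score_fn = dot_product_function_query([0.5, 0.0, 2.0])
    # The resulting string could be used as a sort criterion.
    print("sort=" + score_fn + " desc")
```

With 100 dense features the string gets long, so whether this is practical compared with a custom similarity or a custom function is something I have not measured.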