Upayavira,

On the Lucene list, two tools come up from time to time that might do some of what you are looking for:
- semanticvectors (https://code.google.com/p/semanticvectors)
- word2vec (https://github.com/kojisekig/word2vec-lucene/i)

Maybe they help? My impression, though, is that you are after Lucene's performance rather than these tools, which I see mostly as demonstrations of how vectors can be used for word engineering.
Paul

On 27 nov. 2014, at 07:57, Upayavira <u...@odoko.co.uk> wrote:

> Thanks Nicholas, there is a sense in which Solr isn't the right tool.
> However, we already have lots of business rules encapsulated into filter
> queries, and already have content ingestion pipelines for our content in
> place.
>
> TF-IDF similarity is pluggable (even just by sorting on function
> queries), so am looking for an alternative way to encapsulate the
> scoring algorithm.
>
> Upayavira
>
> On Wed, Nov 26, 2014, at 10:14 PM, Nicholas Ding wrote:
>> I'm not sure if Solr is the right tool to do this task. You probably need
>> a machine learning library like Mahout or Weka.
>>
>> PS: Lucene doesn't really use Cosine Similarity, it's using a practical
>> TF-IDF Similarity.
>>
>> Nicholas Ding
>>
>> On Wed, Nov 26, 2014 at 3:05 PM, Upayavira <u...@odoko.co.uk> wrote:
>>
>>> Hi,
>>>
>>> I've been asked how to use Solr as a component in a machine learning
>>> system, doing document comparison based upon feature vectors.
>>>
>>> If I have two vectors, one in the index (in some form) and one in the
>>> query (in some form), how can I do, for example, a vector multiplication
>>> of the two vectors in order to calculate a score?
>>>
>>> The feature space I am being given has 100 features, with numerical
>>> scores for each feature. In this case, it is not sparse - most features
>>> will have a value.
>>>
>>> I have ideas, but it seems they get me some of the way, but not all.
>>>
>>> Has anyone worked with Solr in this way?
>>>
>>> Thanks,
>>>
>>> Upayavira
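Below is a minimal sketch of one stock-Solr route to the original question (a dense dot product used as the score). It assumes each of the 100 features is indexed in its own numeric field. The field naming scheme feature_{i}_f and the helper names are hypothetical, chosen only for illustration. Solr's sum() and product() function queries would compute the per-document score, and the query vector is baked into the function string on the client side:

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dense dot product: the per-document score we would like Solr to produce."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def _num(w: float) -> str:
    # Plain decimal literal (no exponent), trailing zeros trimmed: 2.0 -> "2", 0.5 -> "0.5".
    return f"{float(w):.8f}".rstrip("0").rstrip(".")


def dot_product_function(query_vec: Sequence[float],
                         field_template: str = "feature_{i}_f") -> str:
    """Build a Solr function query string computing sum_i(field_i * query_vec[i]).

    Hypothetical schema: feature i of each document lives in its own numeric
    field named field_template.format(i=i), e.g. feature_0_f .. feature_99_f.
    """
    terms = [
        f"product({field_template.format(i=i)},{_num(w)})"
        for i, w in enumerate(query_vec)
        if w != 0.0  # a zero weight adds nothing to the sum, so skip it
    ]
    if not terms:
        return "0"  # constant function: every document scores 0
    if len(terms) == 1:
        return terms[0]
    return "sum(" + ",".join(terms) + ")"


if __name__ == "__main__":
    # prints: sum(product(feature_0_f,0.5),product(feature_2_f,2))
    print(dot_product_function([0.5, 0.0, 2.0]))
```

The resulting string could be passed as a sort (sort=<function> desc) or as the main query via the function query parser (q={!func}<function>). In both cases, the existing filter queries carrying the business rules still apply. The score is recomputed for every matching document, so the cost grows with the number of matches. Enabling docValues on the feature fields may be worth it, but whether this is fast enough for a given index is something to measure rather than assume.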