Re: comparing feature vectors using Solr/Lucene

Paul Libbrecht Wed, 26 Nov 2014 23:10:53 -0800

Upayavira,

On the Lucene list, two tools are sometimes mentioned that might do some of
what you are looking for:
- semanticvectors (https://code.google.com/p/semanticvectors)
- word2vec https://github.com/kojisekig/word2vec-lucene/i
Maybe that helps?
My impression, though, is that you are after Lucene's performance rather than
these tools, which I see more as explicit examples of the value of using
vectors for word engineering.


Paul


On 27 nov. 2014, at 07:57, Upayavira <u...@odoko.co.uk> wrote:

> Thanks Nicholas, there is a sense in which Solr isn't the right tool.
> However, we already have lots of business rules encapsulated into filter
> queries, and already have content ingestion pipelines for our content in
> place.
> 
> TF-IDF similarity is pluggable (even just by sorting on function
> queries), so I am looking for an alternative way to encapsulate the
> scoring algorithm.
> 
> Upayavira
> 
> On Wed, Nov 26, 2014, at 10:14 PM, Nicholas Ding wrote:
>> I'm not sure if Solr is the right tool to do this task. You probably need
>> a machine learning library like Mahout or Weka.
>> 
>> PS: Lucene doesn't really use cosine similarity; it uses a practical
>> TF-IDF similarity.
>> 
>> Nicholas Ding
>> 
>> On Wed, Nov 26, 2014 at 3:05 PM, Upayavira <u...@odoko.co.uk> wrote:
>> 
>>> Hi,
>>> 
>>> I've been asked how to use Solr as a component in a machine learning
>>> system, doing document comparison based upon feature vectors.
>>> 
>>> If I have two vectors, one in the index (in some form) and one in the
>>> query (in some form), how can I do, for example, a vector multiplication
>>> of the two vectors in order to calculate a score?
>>> 
>>> The feature space I am being given has 100 features, with numerical
>>> scores for each feature. In this case, it is not sparse - most features
>>> will have a value.
>>> 
>>> I have ideas, but they seem to get me only some of the way.
>>> 
>>> Has anyone worked with Solr in this way?
>>> 
>>> Thanks,
>>> 
>>> Upayavira
>>>
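
As an illustration of the scoring asked about in the original question, here
is a minimal sketch in Python of a dot product and a cosine similarity between
a stored vector and a query vector, plus one possible way to express the dot
product as a Solr function query built from the standard sum() and product()
functions. It assumes each feature is indexed as its own numeric field; the
field names f_0, f_1, ... and the helper names are hypothetical and do not
come from the thread. Whether this performs acceptably with 100 dense features
is not something the thread establishes.

```python
import math
from typing import Sequence


def dot(index_vec: Sequence[float], query_vec: Sequence[float]) -> float:
    """Dot product of two dense feature vectors of equal length."""
    if len(index_vec) != len(query_vec):
        raise ValueError("vectors must have the same number of features")
    return sum(a * b for a, b in zip(index_vec, query_vec))


def cosine(index_vec: Sequence[float], query_vec: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(index_vec, index_vec)) * math.sqrt(dot(query_vec, query_vec))
    return dot(index_vec, query_vec) / norm if norm else 0.0


def function_query(query_vec: Sequence[float], field_prefix: str = "f_") -> str:
    """Build a function-query string of the form sum(product(f_0,q0),...).

    Assumption: feature i is stored in a numeric field named
    f"{field_prefix}{i}". These field names are hypothetical.
    """
    terms = ",".join(
        f"product({field_prefix}{i},{q})" for i, q in enumerate(query_vec)
    )
    return f"sum({terms})"


if __name__ == "__main__":
    stored = [1.0, 2.0, 3.0]
    query = [4.0, 5.0, 6.0]
    print(dot(stored, query))
    print(cosine(stored, query))
    print(function_query([0.5, 2.0]))
```

The generated string could then be used as a sort or boost expression, so that
existing filter queries keep applying the business rules while the function
query supplies the score.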
