Hi Peyman,

I never saw this mentioned on Lucene/Solr MLs, so if anyone has done any work 
on this, I don't think it was shared.

Otis 
----
Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



>________________________________
> From: Peyman Faratin <pey...@robustlinks.com>
>To: solr-user@lucene.apache.org 
>Sent: Monday, April 23, 2012 12:29 PM
>Subject: Kernel methods in SOLR
> 
>Hi
>
>Has there been any work that tries to integrate kernel methods [1] with SOLR? 
>I am interested in using kernel methods to solve synonymy, hyponymy and 
>polysemy (disambiguation) problems, which SOLR's vector space model ("bag of 
>words") does not capture. 
>
>For example, imagine we have only 3 words in our corpus, "puma", "cougar" and 
>"feline". The 3 words obviously have interdependencies (puma disambiguates to 
>cougar, and cougar and puma are both instances of felines, i.e. hyponyms). Now 
>imagine 2 docs, d1 and d2, that have the following TF-IDF vectors. 
>
>             puma,  cougar,  feline
>d1     =  [  2,     0,       0    ]
>d2     =  [  0,     1,       0    ]
>
>i.e. d1 has no mention of the terms cougar or feline and, conversely, d2 has 
>no mention of the terms puma or feline. Hence under the vector approach d1 and 
>d2 are not related at all (each interpretation of the terms has its own 
>orthogonal dimension), which is not what we want to conclude. 
>
>What I need is to include a kernel matrix (as data) such as the following that 
>captures these relationships:
>
>             puma,  cougar,  feline
>puma   =  [  1,     1,       0.4  ]
>cougar =  [  1,     1,       0.4  ]
>feline =  [  0.4,   0.4,     1    ]
>
>then recompute the TF-IDF vector as a product of (1) the original vector and 
>(2) the kernel matrix, resulting in
>
>             puma,  cougar,  feline
>d1     =  [  2,     2,       0.8  ]
>d2     =  [  1,     1,       0.4  ]
>
>(note, the new vectors are much less sparse). 
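
The recomputation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic at the application layer, not anything Solr provides: the names `apply_kernel` and `cosine` are hypothetical helpers, the term order (puma, cougar, feline) is assumed from the example, and cosine similarity is assumed as the relatedness measure.

```python
import math

# Kernel (term-similarity) matrix from the example.
# Row/column order is assumed to be: puma, cougar, feline.
KERNEL = [
    [1.0, 1.0, 0.4],
    [1.0, 1.0, 0.4],
    [0.4, 0.4, 1.0],
]


def apply_kernel(doc_vector, kernel):
    """Multiply a row vector of term weights by the kernel matrix."""
    n = len(kernel)
    return [
        sum(doc_vector[i] * kernel[i][j] for i in range(n))
        for j in range(n)
    ]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Original TF-IDF vectors from the example.
d1 = [2.0, 0.0, 0.0]
d2 = [0.0, 1.0, 0.0]

# Kernel-expanded vectors.
d1_k = apply_kernel(d1, KERNEL)
d2_k = apply_kernel(d2, KERNEL)

print(cosine(d1, d2))      # similarity before applying the kernel
print(cosine(d1_k, d2_k))  # similarity after applying the kernel
```

Before the kernel is applied the two documents share no terms, so their cosine similarity is zero. Afterwards the vectors point in the same direction, because the puma and cougar rows of this particular kernel are identical.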
>
>I can solve this problem (inefficiently) at the application layer, but I was 
>wondering if there have been any attempts within the community to solve 
>similar problems efficiently, without paying a hefty response-time price? 
>
>thank you 
>
>Peyman
>
>[1] http://en.wikipedia.org/wiki/Kernel_methods
>
>
