Re: LSA Implementation

2007-11-28 Thread Eswar K
> To: solr-user@lucene.apache.org > Subject: Re: LSA Implementation > > The languages also include CJK :) among others. > > - Eswar > > On Nov 27, 2007 8:16 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote: > > > The WordNet project at Princeton (USA) is a large d

Re: LSA Implementation

2007-11-27 Thread Grant Ingersoll
apache.org Subject: Re: LSA Implementation The languages also include CJK :) among others. - Eswar On Nov 27, 2007 8:16 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote: The WordNet project at Princeton (USA) is a large database of synonyms. If you're only working in English this might

RE: LSA Implementation

2007-11-27 Thread Norskog, Lance
[mailto:[EMAIL PROTECTED] Sent: Monday, November 26, 2007 6:50 PM To: solr-user@lucene.apache.org Subject: Re: LSA Implementation The languages also include CJK :) among others. - Eswar On Nov 27, 2007 8:16 AM, Norskog, Lance <[EMAIL PROTECTED]> wrote: > The WordNet project at Princ

Re: LSA Implementation

2007-11-26 Thread Marvin Humphrey
On Nov 26, 2007, at 6:34 PM, Eswar K wrote: Although the algorithm doesn't understand anything about what the words *mean*, the patterns it notices can make it seem astonishingly intelligent. When you search such an index, the search engine looks at similarity values it has calculated fo
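For reference, one common textbook formulation of how an LSI-style engine scores a query against the reduced space is shown below. Here A is the term-document matrix, k is the number of retained dimensions, q is the query's term vector, and \hat{d}_j is the j-th row of V_k (document j's reduced vector). This is a general sketch, not a description of any particular Solr or Lucene feature; some variants weight both sides by \Sigma_k before comparing.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q, \qquad
\operatorname{sim}(q, d_j) = \frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Folding a document column a_j in the same way, \Sigma_k^{-1} U_k^{\top} a_j, recovers exactly \hat{d}_j, so queries and documents end up in the same k-dimensional space.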

Re: LSA Implementation

2007-11-26 Thread Eswar K
ng your own analyses. > > http://en.wikipedia.org/wiki/WordNet > http://wordnet.princeton.edu/ > > Lance > > -Original Message- > From: Eswar K [mailto:[EMAIL PROTECTED] > Sent: Monday, November 26, 2007 6:34 PM > To: solr-user@lucene.apache.org > Subject: Re:

RE: LSA Implementation

2007-11-26 Thread Norskog, Lance
EMAIL PROTECTED] Sent: Monday, November 26, 2007 6:34 PM To: solr-user@lucene.apache.org Subject: Re: LSA Implementation In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words.

Re: LSA Implementation

2007-11-26 Thread Eswar K
In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. This algo should consider documents that have many words in common to be semantically close, and ones with few words in
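The intuition quoted above, that documents sharing many words are close, can be illustrated with a minimal Python sketch. Note that this is plain term-vector cosine similarity, not LSA itself; the whitespace tokenizer and the three toy documents are hypothetical. Document d3 shares no words with the others and so scores 0.0, which is the limitation LSA's reduced space is meant to address.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words counts using naive lowercase whitespace tokenization (an assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy collection.
docs = {
    "d1": "solr search engine index",
    "d2": "lucene search index",
    "d3": "wordnet synonyms database",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

# Score every unordered pair of documents by shared-word overlap.
for x in docs:
    for y in docs:
        if x < y:
            print(x, y, round(cosine(vectors[x], vectors[y]), 3))
```

Only exact term overlap counts here: d1 and d2 score about 0.577 because they share "search" and "index", while d3 is unrelated to both regardless of meaning.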

Re: LSA Implementation

2007-11-26 Thread Marvin Humphrey
On Nov 26, 2007, at 6:06 PM, Eswar K wrote: We essentially are looking at having an implementation for doing search which can return documents having conceptually similar words without necessarily having the original word searched for. Very challenging. Say someone searches for "LSA" and h

Re: LSA Implementation

2007-11-26 Thread Eswar K
We essentially are looking at having an implementation for doing search which can return documents having conceptually similar words without necessarily having the original word searched for. - Eswar On Nov 27, 2007 12:06 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Interesting. I am not a

Re: LSA Implementation

2007-11-26 Thread Chris Hostetter
: A more interesting solr related question is where a very heavy process like : SVD would operate. You'd want to run the 'training' half of it separate from a : indexing or querying. It'd almost be like an optimize. Is there any hook right : now to give Solr a "command" like and map it to the clas

Re: LSA Implementation

2007-11-26 Thread Renaud Delbru
LDA (Latent Dirichlet Allocation) is a similar technique that extends pLSI. You can find some implementations in C++ and Java on the Web. Grant Ingersoll wrote: Interesting. I am not a lawyer, but my understanding has always been that this is not something we could do. The question has come up

Re: LSA Implementation

2007-11-26 Thread Grant Ingersoll
Interesting. I am not a lawyer, but my understanding has always been that this is not something we could do. The question has come up from time to time on the Lucene mailing list: http://www.gossamer-threads.com/lists/engine?list=lucene&do=search_results&search_forum=forum_3&search_string=Late

Re: LSA Implementation

2007-11-26 Thread Brian Whitman
On Nov 26, 2007 6:58 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is patented, so it is not likely to happen unless the authors donate the patent to the ASF. -Grant There are many ways to catch a bird... LSA reduces to SVD on the
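The point that LSA reduces to an SVD of the term-document matrix can be sketched in pure Python. This is an illustrative stand-in, not a Solr or Lucene API: `truncated_svd`, `cosine`, and the toy matrix are hypothetical, and the naive dense power iteration on A^T A is only suitable for tiny matrices (real LSA tools typically use sparse iterative solvers). In the example, d3 ("car") and d4 ("auto") share no terms, yet in the rank-1 space every document lies along a single axis driven by the "engine" co-occurrences, so d3 and d4 become maximally similar.

```python
import math
import random


def mat_vec(m, v):
    """Dense matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def normalize(v):
    """Scale v to unit length (returned unchanged if it is the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def cosine(x, y):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx > 0 and ny > 0 else 0.0


def truncated_svd(a, k, iters=500, seed=0):
    """Top-k singular values and right singular vectors of a (terms x docs).

    Uses power iteration on the Gram matrix A^T A, whose eigenvalues are the
    squared singular values of A. Naive and dense: for tiny examples only.
    """
    n_docs = len(a[0])
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
            for i in range(n_docs)]
    rng = random.Random(seed)
    sigmas, vs = [], []
    for _ in range(k):
        # Arbitrary non-uniform start so it is not orthogonal to the target vector.
        v = normalize([rng.random() + 0.1 for _ in range(n_docs)])
        for _ in range(iters):
            w = mat_vec(gram, v)
            # Gram-Schmidt against vectors already found, so we converge to the next one.
            for prev in vs:
                d = sum(x * y for x, y in zip(w, prev))
                w = [x - d * y for x, y in zip(w, prev)]
            v = normalize(w)
        # Rayleigh quotient v^T G v gives the eigenvalue sigma^2.
        eig = sum(x * y for x, y in zip(v, mat_vec(gram, v)))
        sigmas.append(math.sqrt(max(eig, 0.0)))
        vs.append(v)
    return sigmas, vs


# Hypothetical toy term-document matrix: rows are terms, columns are documents.
#        d1  d2  d3  d4
A = [
    [1, 0, 1, 0],  # car
    [0, 1, 0, 1],  # auto
    [1, 1, 0, 0],  # engine
]

sigmas, vs = truncated_svd(A, k=1)
# Reduced document coordinates: doc j -> [sigma_i * v_i[j] for each kept i].
doc_coords = [[s * v[j] for s, v in zip(sigmas, vs)] for j in range(len(A[0]))]

d3, d4 = [row[2] for row in A], [row[3] for row in A]
print("original cosine(d3, d4):", cosine(d3, d4))
print("rank-1 cosine(d3, d4):", round(cosine(doc_coords[2], doc_coords[3]), 6))
```

The choice of k controls how aggressively terms are merged: with k=1 everything collapses onto one "vehicle" axis, while larger k keeps more of the original distinctions.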

Re: LSA Implementation

2007-11-26 Thread Eswar K
I was just searching for info on LSA and came across the Semantic Indexing project under the GNU license... which of course is still under development, in C++ though. - Eswar On Nov 26, 2007 9:56 PM, Jack <[EMAIL PROTECTED]> wrote: > Interesting. Patents are valid for 20 years so it expires next year? :) >

Re: LSA Implementation

2007-11-26 Thread Jack
Interesting. Patents are valid for 20 years so it expires next year? :) PLSA does not seem to have been patented, at least not mentioned in http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis On Nov 26, 2007 6:58 AM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > LSA (http://en.wikip

Re: LSA Implementation

2007-11-26 Thread Grant Ingersoll
LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is patented, so it is not likely to happen unless the authors donate the patent to the ASF. -Grant On Nov 26, 2007, at 8:23 AM, Eswar K wrote: All, Is there any plan to implement Latent Semantic Analysis as part of Solr anyti

LSA Implementation

2007-11-26 Thread Eswar K
All, Is there any plan to implement Latent Semantic Analysis as part of Solr anytime in the near future? Regards, Eswar