Re: Text classification with Solr

2009-01-28 Thread Hannes Carl Meyer
From my past projects, our Lucene classification corpus looked like this:
0|document text...|categoryA
1|document text...|categoryB
2|document text...|categoryA
3|document text...|categoryA
...
800|document text...|categoryC
With the faceting capabilities of Solr it is now possible to design mor

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll wrote:
> One of the things I am interested in is the marriage of Solr and Mahout
> (which has some Genetic Algorithms support) and other ML (Weka, etc.) tools.
[snip]
I love it, good to know you are thinking big here. Here's another big thought:

Re: Text classification with Solr

2009-01-27 Thread Grant Ingersoll
I guess I've been called to the chalkboard... I haven't looked specifically at putting the taxonomy in Lucene/Solr, but it is an interesting idea. In reading the paper you mentioned, there are some interesting ideas there and Solr could obviously just as easily be used as Lucene, I think.

Re: Text classification with Solr

2009-01-27 Thread Karl Wettin
On 27 Jan 2009 at 17:23, Neal Richter wrote:
Is it really necessary to use Solr for it? Things go much faster with the Lucene low-level API, and faster still if you load the classification corpus into RAM.
Good points. At the moment I'd rather have a daemon with a service API.. as

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 1:36 AM, Hannes Carl Meyer wrote:
> Yeah, know it, the challenge on this method is the calculation of the score
> and parametrization of thresholds.
Not as worried about score itself as the score thresholds for prediction in/out.
> Is it really necessary to use Solr for

Re: Text classification with Solr

2009-01-27 Thread Hannes Carl Meyer
>>Instead of indexing documents about 'sports' and searching for hits
>>based upon 'basketball', 'football' etc.. I simply want to index the
>>taxonomy and classify documents into it. This is an ancient
>>AI/Data-Mining discipline.. but the standard methods of 'indexing' the
>>taxonomy are/were

Re: Text classification with Solr

2009-01-26 Thread Neal Richter
Thanks for the link Shalin... played with that a while back.. It's possibly got some indirect possibilities.
On Mon, Jan 26, 2009 at 10:46 AM, Hannes Carl Meyer wrote:
> I didn't understand, is the corpus of documents you want to use to classify
> fixed?
Assume the 'documents' are not stored in th

Re: Text classification with Solr

2009-01-26 Thread Hannes Carl Meyer
Hi Neal, this sounds pretty similar to me. Did a lot of those projects some years ago (with Lucene low-level API)! I didn't understand, is the corpus of documents you want to use to classify fixed?
>>previously suggested procedure of 1) store document 2) execute
>>more-like-this and 3) delete docu

Re: Text classification with Solr

2009-01-26 Thread Shalin Shekhar Mangar
On Mon, Jan 26, 2009 at 10:59 PM, Neal Richter wrote:
> Hey all,
>
> I'm in the process of implementing a system to do 'text
> classification' with Solr. The basic idea is to take an
> ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index
> it and then classify documents into