Re: Text classification with Solr

2009-01-28 Thread Hannes Carl Meyer
From my past projects, our Lucene classification corpus looked like this:

0|document text...|categoryA
1|document text...|categoryB
2|document text...|categoryA
3|document text...|categoryA
...
800|document text...|categoryC

With the faceting capabilities of Solr it is now possible to design mor
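A minimal sketch of reading a corpus in the pipe-delimited layout shown above and counting documents per category, which is roughly the number a facet on a category field would report. The names `LabeledDoc`, `parse_corpus_line`, and `category_counts` are hypothetical and not from the original post, and the handling of a stray `|` inside the document text is an assumption.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LabeledDoc(NamedTuple):
    doc_id: int
    text: str
    category: str


def parse_corpus_line(line: str) -> LabeledDoc:
    """Parse one 'id|document text|category' line.

    Assumption: the id is the first field and the category is the last,
    so any extra '|' characters are treated as part of the text.
    """
    doc_id, rest = line.rstrip("\n").split("|", 1)
    text, category = rest.rsplit("|", 1)
    return LabeledDoc(int(doc_id), text, category)


def category_counts(lines: Iterable[str]) -> Counter:
    """Count documents per category, skipping blank lines."""
    return Counter(
        parse_corpus_line(line).category for line in lines if line.strip()
    )


# Made-up sample rows in the same shape as the corpus above.
SAMPLE = [
    "0|first document text|categoryA",
    "1|second document text|categoryB",
    "2|third | with a pipe|categoryA",
]

if __name__ == "__main__":
    for category, count in category_counts(SAMPLE).most_common():
        print(category, count)
```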

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll wrote:
> One of the things I am interested in is the marriage of Solr and Mahout (which has some Genetic Algorithms support) and other ML (Weka, etc.) tools.

[snip]

I love it, good to know you are thinking big here. Here's another big thought:

Re: Text classification with Solr

2009-01-27 Thread Grant Ingersoll
nce, a reasonable thing to do with the output from the classification is, of course, to facet on them. Neal, what did you have in mind for a JIRA issue? I'd love to see a patch.

On Jan 26, 2009, at 12:29 PM, Neal Richter wrote:

Hey all, I'm in the process of implement

Re: Text classification with Solr

2009-01-27 Thread Karl Wettin
On 27 Jan 2009 at 17:23, Neal Richter wrote: Is it really necessary to use Solr for it? Things go much faster with the Lucene low-level API, and faster still if you load the classification corpus into RAM. Good points. At the moment I'd rather have a daemon with a service API.. as

Re: Text classification with Solr

2009-01-27 Thread Neal Richter
On Tue, Jan 27, 2009 at 1:36 AM, Hannes Carl Meyer wrote:
> Yeah, I know it; the challenge with this method is the calculation of the score and the parametrization of thresholds.

I'm not as worried about the score itself as about the score thresholds for predicting in/out.

> Is it really necessary to use Solr for

Re: Text classification with Solr

2009-01-27 Thread Hannes Carl Meyer
>> Instead of indexing documents about 'sports' and searching for hits based upon 'basketball', 'football' etc., I simply want to index the taxonomy and classify documents into it. This is an ancient AI/Data-Mining discipline.. but the standard methods of 'indexing' the taxonomy are/were

Re: Text classification with Solr

2009-01-26 Thread Neal Richter
Thanks for the link, Shalin... I played with that a while back. It's possibly got some indirect possibilities.

On Mon, Jan 26, 2009 at 10:46 AM, Hannes Carl Meyer wrote:
> I didn't understand: is the corpus of documents you want to use to classify fixed?

Assume the 'documents' are not stored in th

Re: Text classification with Solr

2009-01-26 Thread Hannes Carl Meyer
> I'm in the process of implementing a system to do 'text classification' with Solr. The basic idea is to take an ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index it and then classify documents into the taxonomy by pushing parsed &

Re: Text classification with Solr

2009-01-26 Thread Shalin Shekhar Mangar
On Mon, Jan 26, 2009 at 10:59 PM, Neal Richter wrote:
> Hey all,
>
> I'm in the process of implementing a system to do 'text classification' with Solr. The basic idea is to take an ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,

Text classification with Solr

2009-01-26 Thread Neal Richter
Hey all, I'm in the process of implementing a system to do 'text classification' with Solr. The basic idea is to take an ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index it and then classify documents into the taxonomy by pushing parse
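The idea in this message can be sketched without Solr as a tiny in-memory stand-in: index each taxonomy entry by its tags, then treat a document's tokens as the query and rank labels by how many tags match. This is only an illustration under stated assumptions. The message is cut off above, so the exact query construction is not known; the overlap count here is not Lucene/Solr scoring; and `build_tag_index`, `classify`, and `min_score` are hypothetical names. The `min_score` cut-off stands in for the score threshold discussed later in the thread.

```python
import re
from collections import defaultdict


def build_tag_index(taxonomy):
    """Map each lower-cased tag to the set of labels that carry it.

    Each taxonomy entry is assumed to look like
    {"label": "X", "tags": "a,b,c,d,e"}.
    """
    index = defaultdict(set)
    for entry in taxonomy:
        for tag in entry["tags"].split(","):
            tag = tag.strip().lower()
            if tag:
                index[tag].add(entry["label"])
    return index


def classify(text, index, min_score=1):
    """Rank labels by the number of distinct document tokens matching a tag.

    Labels scoring below min_score are dropped (the in/out threshold).
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    scores = defaultdict(int)
    for token in tokens:
        for label in index.get(token, ()):
            scores[label] += 1
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(label, score) for label, score in ranked if score >= min_score]


if __name__ == "__main__":
    # Made-up taxonomy entries, for illustration only.
    taxonomy = [
        {"label": "sports", "tags": "basketball,football,hockey"},
        {"label": "cooking", "tags": "recipe,oven"},
    ]
    index = build_tag_index(taxonomy)
    print(classify("Football and basketball scores", index))
```

In a real deployment the index and the scoring would come from Solr itself; the sketch only shows the direction of the lookup, with the taxonomy as the indexed side and the document as the query.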