From my past projects, our Lucene classification corpus looked like this:
0|document text...|categoryA
1|document text...|categoryB
2|document text...|categoryA
3|document text...|categoryA
...
800|document text...|categoryC
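
As a rough illustration of the corpus layout above, here is a minimal Python sketch that reads `id|text|category` rows and counts documents per category. The function name `parse_corpus` and the sample rows are hypothetical, and the sketch assumes the id and category fields never contain a pipe character; it is not the code used in the original projects.

```python
from collections import Counter


def parse_corpus(lines):
    """Parse 'id|text|category' rows into (id, text, category) tuples."""
    rows = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '...' elision marker.
        if not line or line == "...":
            continue
        # Split the id off the front and the category off the back,
        # so a pipe inside the document text does not break parsing.
        doc_id, rest = line.split("|", 1)
        text, category = rest.rsplit("|", 1)
        rows.append((int(doc_id), text, category))
    return rows


# Hypothetical sample rows in the same layout.
sample = [
    "0|first document text|categoryA",
    "1|second document text|categoryB",
    "2|third | with a pipe|categoryA",
]
rows = parse_corpus(sample)
counts = Counter(category for _, _, category in rows)
```
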
With the faceting capabilities of Solr it is now possible to design mor
On Tue, Jan 27, 2009 at 2:21 PM, Grant Ingersoll wrote:
> One of the things I am interested in is the marriage of Solr and Mahout
> (which has some Genetic Algorithms support) and other ML (Weka, etc.) tools.
[snip]
I love it, good to know you are thinking big here. Here's another big thought:
I guess I've been called to the chalkboard...
I haven't looked specifically at putting the taxonomy in Lucene/Solr,
but it is an interesting idea. In reading the paper you mentioned,
there are some interesting ideas there and Solr could obviously just
as easily be used as Lucene, I think.
On 27 Jan 2009, at 17:23, Neal Richter wrote:
Is it really necessary to use Solr for it? Things go much faster with
the Lucene low-level API, and faster still if you load the
classification corpus into RAM.
Good points. At the moment I'd rather have a daemon with a service API.. as
On Tue, Jan 27, 2009 at 1:36 AM, Hannes Carl Meyer wrote:
> Yeah, I know it; the challenge with this method is calculating the score
> and parametrizing the thresholds.
I'm not as worried about the score itself as about the score thresholds for predicting in/out.
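
To make the threshold question concrete, here is a minimal Python sketch of classifying a document into a taxonomy of `{label, tags}` entries. Everything in it is an assumption for illustration: the `classify` function, the tag-overlap score and the default threshold of 0.5 are hypothetical stand-ins, not Lucene or Solr relevance scoring.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(document, taxonomy, threshold=0.5):
    """Return (label, score) pairs whose score meets the threshold.

    The score is the fraction of a taxonomy entry's tags that occur in
    the document; this is a toy measure chosen only for illustration.
    """
    doc_terms = tokenize(document)
    results = []
    for entry in taxonomy:
        tags = {t.strip().lower() for t in entry["tags"].split(",") if t.strip()}
        if not tags:
            continue
        score = len(tags & doc_terms) / len(tags)
        # The in/out prediction is decided entirely by this comparison.
        if score >= threshold:
            results.append((entry["label"], score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical taxonomy entries in the {label, tags} shape.
taxonomy = [
    {"label": "sports", "tags": "basketball,football,league,score"},
    {"label": "cooking", "tags": "recipe,oven,flour"},
]
predicted = classify("The football league posted every score last night.", taxonomy)
```

Raising `threshold` trades recall for precision: with the sample above, a threshold of 0.8 would reject the "sports" label even though three of its four tags match.
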
> Is it really necessary to use Solr for
>>Instead of indexing documents about 'sports' and searching for hits
>>based upon 'basketball', 'football' etc.. I simply want to index the
>>taxonomy and classify documents into it. This is an ancient
>>AI/Data-Mining discipline.. but the standard methods of 'indexing' the
>>taxonomy are/were
Thanks for the link, Shalin... I played with that a while back. It
may have some indirect uses.
On Mon, Jan 26, 2009 at 10:46 AM, Hannes Carl Meyer wrote:
> I didn't understand: is the corpus of documents you want to use to classify
> fixed?
Assume the 'documents' are not stored in th
Hi Neal,
This sounds pretty familiar to me. I did a lot of those projects some years ago
(with the Lucene low-level API)!
I didn't understand: is the corpus of documents you want to use to classify
fixed?
>>previously suggested procedure of 1) store document 2) execute
>>more-like-this and 3) delete docu
On Mon, Jan 26, 2009 at 10:59 PM, Neal Richter wrote:
> Hey all,
>
> I'm in the process of implementing a system to do 'text
> classification' with Solr. The basic idea is to take an
> ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index
> it and then classify documents into