Hi Joel,
Do you need supervised or unsupervised classification?
Supervised: you have labeled examples of your classes.
Unsupervised: you don't know your classes in advance.
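To make the distinction concrete, here is a toy sketch in plain Python. It is not how Solr or Mahout do it; the function names, the word-overlap similarity, and the min_shared threshold are all made up for illustration.

```python
from collections import Counter

def tokens(text):
    # Bag of words: lowercase, whitespace-split token counts.
    return Counter(text.lower().split())

def overlap(a, b):
    # Number of word occurrences shared by two bags of words.
    return sum((a & b).values())

def classify(doc, labeled_examples):
    # Supervised: labeled examples exist, so a new document gets the
    # label of the class whose examples it shares the most words with.
    profiles = {}
    for label, text in labeled_examples:
        profiles.setdefault(label, Counter()).update(tokens(text))
    doc_tokens = tokens(doc)
    return max(profiles, key=lambda label: overlap(doc_tokens, profiles[label]))

def cluster(docs, min_shared=2):
    # Unsupervised: no labels at all. Each document joins the first group
    # whose first member shares at least min_shared words with it
    # (hypothetical threshold), otherwise it starts a new group.
    groups = []
    for doc in docs:
        doc_tokens = tokens(doc)
        for group in groups:
            if overlap(doc_tokens, tokens(group[0])) >= min_shared:
                group.append(doc)
                break
        else:
            groups.append([doc])
    return groups
```

With classify you supply the category list yourself through the labeled examples; with cluster the groups come out unnamed and you have to decide what to call them afterwards.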
In the contribs, there is a Solr clustering component which will handle
unsupervised classification:
http://wiki.apache.org/solr/ClusteringComponent
*I think the component is meant to support small quantities of documents.
For supervised solutions (or larger-scale unsupervised solutions), Mahout
could be a good start, as it can use the Solr index.
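If you want to try the clustering component, a request could be built roughly like this. This is only a sketch: the host, port, and the /clustering handler path are assumptions based on the example configuration, and the clustering and clustering.results parameters should be checked against the wiki page above for your version.

```python
from urllib.parse import urlencode

def clustering_url(query, rows=100,
                   base="http://localhost:8983/solr/clustering"):
    # base is an assumed handler URL; adjust it to your own setup.
    params = {
        "q": query,
        "rows": rows,                  # only the returned rows get clustered
        "clustering": "true",          # assumed switch enabling the component
        "clustering.results": "true",  # assumed: cluster the search results
    }
    return base + "?" + urlencode(params)

print(clustering_url("title:solr", rows=50))
```

Since only the returned rows are clustered, the rows value effectively caps how many documents take part, which fits the small-quantities caveat.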
Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com
On 3/25/10 6:40 PM, Joel Nylund wrote:
Hi,
Does Solr have something built in, or a recommended add-on, that does
document categorization? (I found a thread from about a year ago, but
not on exactly the same topic.)
For example, here is a commercial categorization product that will
take a website and categorize it
http://grapeshot.co.uk/online-demo-3.php?url=http://www.solutionstreet.com
I am looking for something similar that works with Solr/Lucene and is
open source based.
Seems like Weka
(http://weka.wikispaces.com/Frequently+Asked+Questions) might be
close, but I'm not sure. I'm also not sure how to come up with a
category list...
thanks
Joel