Hi Joel,
Do you need supervised or unsupervised classification?
supervised: you already have examples of your classes
unsupervised: you don't know your classes in advance
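
To make the distinction concrete, here is a toy sketch in plain Python. It does not use Solr, Mahout or Weka; the function names (train, classify, cluster) and the word-overlap scoring are made up for illustration only, and a real system would use proper features and algorithms.

```python
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenizer (illustrative only, no stopword handling)."""
    return text.lower().split()


# Supervised: labeled examples are available up front.
def train(labeled_docs):
    """Build one word-count profile per known class from (text, label) pairs."""
    profiles = {}
    for text, label in labeled_docs:
        profiles.setdefault(label, Counter()).update(tokens(text))
    return profiles


def classify(profiles, text):
    """Pick the class whose profile has the highest word-overlap score."""
    words = tokens(text)
    return max(
        sorted(profiles),
        key=lambda label: sum(profiles[label][w] for w in words),
    )


# Unsupervised: no labels; the groups are discovered from the documents.
def cluster(docs):
    """Greedy single pass: a doc joins the first group it shares a word with."""
    groups = []
    for text in docs:
        words = set(tokens(text))
        for group in groups:
            if words & group["words"]:
                group["docs"].append(text)
                group["words"] |= words
                break
        else:
            groups.append({"words": set(words), "docs": [text]})
    return [g["docs"] for g in groups]
```

In the supervised case the category list comes from you (the labels in the training pairs); in the unsupervised case the groups come out unnamed and you would still have to label them afterwards.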

In the contribs, there is a Solr clustering component that handles unsupervised classification:
http://wiki.apache.org/solr/ClusteringComponent
*I think the component is meant to support small quantities of documents.
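
As a rough sketch of what a request to that component could look like, the snippet below only builds a query URL. The host, port and handler path in the usage line are assumptions about a local setup, and the component has to be enabled in solrconfig.xml first; check the wiki page above for the parameters your version supports. The helper name clustering_url is hypothetical.

```python
from urllib.parse import urlencode


def clustering_url(base_url, query, rows=100):
    """Build a Solr search URL that also asks for clustered results.

    base_url is an assumed handler URL for your own installation;
    it must point at a handler that has the clustering component enabled.
    """
    params = {
        "q": query,
        "rows": rows,                  # clustering runs over the returned rows
        "clustering": "true",          # turn the clustering component on
        "clustering.results": "true",  # cluster the search results
    }
    return base_url + "?" + urlencode(params)


# Assumed local example installation; adjust to your setup.
url = clustering_url("http://localhost:8983/solr/select", "title:solr", rows=10)
```

Since only the returned rows are clustered, the rows value effectively caps how many documents get grouped per request.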

For supervised solutions (or larger-scale unsupervised solutions), Mahout could be a good start, as it can use the Solr index.

Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com


On 3/25/10 6:40 PM, Joel Nylund wrote:
Hi,

Does Solr have something built in, or a recommended add-on, that does document categorization? (I found a thread from about a year ago, but it was not exactly the same topic.)

For example, here is a commercial categorization product that will take a website and categorize it:

http://grapeshot.co.uk/online-demo-3.php?url=http://www.solutionstreet.com

I am looking for something similar that works with Solr/Lucene and is open source.

Seems like Weka (http://weka.wikispaces.com/Frequently+Asked+Questions) might be close, but I'm not sure. I'm also not sure how to come up with a category list...

thanks
Joel
