Text classification with Solr

Neal Richter Mon, 26 Jan 2009 09:30:08 -0800

Hey all,

  I'm in the process of implementing a system to do 'text
classification' with Solr.  The basic idea is to take an
ontology/taxonomy like dmoz, with entries of the form
{label: "X", tags: "a,b,c,d,e"}, index it, and then classify documents
into the taxonomy by pushing each parsed document into the Solr search
API as a query.  Why?  Lucene/Solr's ability to do weighted term
boosting at both search and index time has lots of obvious uses here.
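
To make the idea concrete, here is a rough sketch of the query side only,
not a finished implementation.  It assumes the taxonomy has already been
indexed with the 'label' and 'tags' fields from the example above.  The
function names, the frequency-as-boost weighting, and the localhost URL
are illustrative assumptions on my part; only the q/fl/rows parameters
and the field:term^boost syntax are standard Solr.

```python
import re
from collections import Counter
from urllib.parse import urlencode


def build_classification_params(text, field="tags", max_terms=10, rows=5):
    """Turn a parsed document into Solr /select parameters.

    Each of the most frequent terms becomes one clause against `field`,
    boosted by its frequency in the document (an assumed weighting).
    """
    # Lowercased alphanumeric tokens only, so nothing needs query escaping
    # and no token can be read as an uppercase AND/OR/NOT operator.
    terms = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(terms).most_common(max_terms)
    clauses = [f"{field}:{term}^{count}" for term, count in counts]
    return {"q": " OR ".join(clauses), "fl": "label,score", "rows": rows}


def classification_url(base_url, text):
    """Build the full select URL; base_url is a hypothetical Solr core URL."""
    params = build_classification_params(text)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance; adjust to your own setup.
    print(classification_url("http://localhost:8983/solr",
                             "solr search solr index"))
```

The top-scoring 'label' values in the response would be the candidate
categories.  Nothing is written to the index, so the document never has
to be stored in Solr at all.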


Has anyone worked on this or a similar project yet?  I've seen some
talk on the list about this area, but it's pretty thin, e.g. the
December thread "Taxonomy Support on Solr".  I'm assuming Grant
Ingersoll is looking at similar things with his 'taming text' project.

I store the 'documents' in another repository, and they are far too
dynamic (write intensive) for direct indexing in Solr, so the
previously suggested procedure of 1) store document, 2) execute
more-like-this, and 3) delete document would be too slow.

If people are interested I could start a JIRA issue on this (I do not
see anything there at the moment).

Thanks - Neal Richter
http://aicoder.blogspot.com
