On Mon, Jan 26, 2009 at 10:59 PM, Neal Richter <nrich...@gmail.com> wrote:
> Hey all,
>
> I'm in the process of implementing a system to do 'text
> classification' with Solr. The basic idea is to take an
> ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index
> it, and then classify documents into the taxonomy by pushing the parsed
> document into the Solr search API. Why? Lucene/Solr's ability to do
> weighted term boosting at both search and index time has lots of
> obvious uses here.
>
> Has anyone worked on this or a similar project yet? I've seen some
> talk on the list about this area but it's pretty thin... December
> thread "Taxonomy Support on Solr". I'm assuming Grant Ingersoll is
> looking at similar things with his 'taming text' project.
>
> I store the 'documents' in another repository and they are far too
> dynamic (write intensive) for direct indexing in Solr... so the
> previously suggested procedure of 1) store document 2) execute
> more-like-this and 3) delete document would be too slow.
>
> If people are interested I could start a JIRA issue on this (I do not
> see anything there at the moment).
>
> Thanks - Neal Richter
> http://aicoder.blogspot.com
>

Grant did some work at https://issues.apache.org/jira/browse/SOLR-769

Take a look and see if that helps.

-- 
Regards,
Shalin Shekhar Mangar.
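
For anyone following the thread, the approach Neal describes (index the taxonomy entries, then query that index with the terms of a parsed document) might be sketched roughly as below. This is only an illustration under assumptions: the field names `label` and `tags` come from his example record, while the helper names, the base URL, and the use of raw term frequency as the query-time boost are hypothetical choices, not anything confirmed in the thread. It only builds the request URL and does not contact a Solr server.

```python
import re
from collections import Counter
from urllib.parse import urlencode


def build_classification_query(text, field="tags", max_terms=10):
    """Turn raw document text into a boosted query over the taxonomy index.

    Assumption: each term is boosted by its frequency in the document,
    using the standard Lucene `term^boost` syntax.
    """
    terms = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(terms)
    clauses = [f"{term}^{count}" for term, count in counts.most_common(max_terms)]
    if not clauses:
        raise ValueError("document contains no indexable terms")
    return f"{field}:({' '.join(clauses)})"


def build_select_url(base_url, text, rows=5):
    """Build a /select URL whose top hits are the candidate taxonomy labels."""
    params = {
        "q": build_classification_query(text),
        "fl": "label,score",  # return only the label and its relevance score
        "rows": rows,
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance holding the indexed taxonomy.
    print(build_select_url("http://localhost:8983/solr", "solr search solr index"))
```

The highest-scoring `label` values in the response would then be read as the document's classification. A real setup would probably also want stopword filtering and a better term weight than raw counts. Because nothing is written to the index, this avoids the store / more-like-this / delete cycle for write-heavy documents.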