On Mon, Jan 26, 2009 at 10:59 PM, Neal Richter wrote:
> Hey all,
>
> I'm in the process of implementing a system to do 'text
> classification' with Solr. The basic idea is to take an
> ontology/taxonomy like dmoz of {label: "X", tags: "a,b,c,d,e"}, index
> it, and then classify documents into the taxonomy by pushing each parsed
> document into the Solr search API. Why? Lucene/Solr's ability to do
> weighted term boosting at both search and index time has lots of
> obvious uses here.
>
> Has anyone worked on this or a similar project yet? I've seen some
> talk on the list about this area, but it's pretty thin: mainly the
> December thread "Taxonomy Support on Solr". I'm assuming Grant Ingersoll is
> looking at similar things with his 'taming text' project.
>
> I store the 'documents' in another repository and they are far too
> dynamic (write-intensive) for direct indexing in Solr, so the
> previously suggested procedure of 1) store the document, 2) execute
> more-like-this, and 3) delete the document would be too slow.
>
> If people are interested I could start a JIRA issue on this (I do not
> see anything there at the moment).
>
> Thanks - Neal Richter
> http://aicoder.blogspot.com
>
Grant did some related work in https://issues.apache.org/jira/browse/SOLR-769
Take a look and see if that helps.
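A rough sketch of the classify-by-search idea, for anyone experimenting with it: each taxonomy entry is treated as a searchable document, the parsed text becomes a query whose terms are boosted by their frequency, and the best-scoring entries are the predicted labels. Everything here is an assumption for illustration, not SOLR-769 and not Solr's API: the sample taxonomy, the "tags" field name, the localhost select URL, and the in-memory classify() scorer, which is a crude idf-weighted term overlap rather than Lucene's actual scoring. The URL is only built, never sent.

```python
import math
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical taxonomy entries in the {label, tags} shape from the post.
TAXONOMY = [
    {"label": "Computers/Search", "tags": "solr,lucene,index,query,search"},
    {"label": "Science/Biology", "tags": "cell,gene,protein,dna,evolution"},
    {"label": "Sports/Soccer", "tags": "goal,match,league,player,search"},
]


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a Solr analyzer chain."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_solr_query(doc_text, field="tags", max_terms=10):
    """Turn a parsed document into a boosted Lucene query on one field.

    Each term is boosted by its frequency in the document (term^freq), one
    way to use search-time boosting. The field name is an assumption."""
    counts = Counter(tokenize(doc_text))
    terms = [f"{t}^{n}" for t, n in counts.most_common(max_terms)]
    return f"{field}:({' '.join(terms)})"


def solr_select_url(query, base="http://localhost:8983/solr/select", rows=3):
    """Build (but do not send) a select request; host and path are assumptions."""
    params = {"q": query, "fl": "label,score", "rows": rows, "wt": "json"}
    return base + "?" + urlencode(params)


def classify(doc_text, taxonomy=TAXONOMY, top_n=3):
    """In-memory toy of classification by search: score each category by the
    summed idf * term frequency of document terms found in its tags."""
    tag_sets = [set(tokenize(entry["tags"])) for entry in taxonomy]
    n = len(taxonomy)
    # Document frequency of each tag across all taxonomy entries.
    df = Counter(t for tags in tag_sets for t in tags)
    counts = Counter(tokenize(doc_text))
    scored = []
    for entry, tags in zip(taxonomy, tag_sets):
        score = sum(freq * math.log(1 + n / df[t])
                    for t, freq in counts.items() if t in tags)
        if score > 0:
            scored.append((entry["label"], round(score, 3)))
    # Highest-scoring categories first, like a ranked search result list.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

For example, classify("Indexing with Solr: a Solr query is a search over a Lucene index.") ranks Computers/Search first, Sports/Soccer second (it matches only on "search"), and leaves Science/Biology out. Against a real index, the equivalent ranking would come from sending build_solr_query(...) to Solr.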
--
Regards,
Shalin Shekhar Mangar.