Looks like a great scraping engine technology :-)

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Mon, 9/20/10, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:

> From: Tommaso Teofili <tommaso.teof...@gmail.com>
> Subject: Solr UIMA integration
> To: solr-user@lucene.apache.org
> Date: Monday, September 20, 2010, 3:35 AM
>
> Hi all,
> I am working on integrating Apache UIMA as an UpdateRequestProcessor for
> Apache Solr and I am now at the first working snapshot.
> I put the code on GoogleCode [1] and you can take a look at the tutorial [2].
>
> I would be glad to donate it to the Apache Solr project, as I think it could
> be a useful module to trigger automatic content extraction while indexing
> documents.
>
> At the moment the UIMAUpdateRequestProcessor base implementation can
> automatically extract a document's sentences, language, keywords, concepts and
> named entities using Apache UIMA's HMMTagger, OpenCalaisAnnotator and
> AlchemyAPIAnnotator components (but it can be easily expanded).
>
> Any feedback is welcome.
> Have a nice day.
> Tommaso
>
> [1] : http://code.google.com/p/solr-uima/
> [2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial