Hi Tommaso,

Really cool what you've done. Looking forward to testing it, and I'm sure it's a welcome contribution to Solr. You can easily contribute your code by opening a JIRA issue and attaching a patch file.

BTW, have you considered making the output field names configurable on a per-instance basis? It could be done as follows:

  <processor class="org.apache.solr.uima.processor.UIMAProcessorFactory">
    <str name="concept_field">concept</str>
    <str name="language_field">concept</str>
    <str name="keyword_field">concept</str>
    ...
  </processor>

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 20. sep. 2010, at 12.35, Tommaso Teofili wrote:

> Hi all,
> I am working on integrating Apache UIMA as an UpdateRequestProcessor for
> Apache Solr and I am now at the first working snapshot.
> I put the code on GoogleCode [1] and you can take a look at the tutorial
> [2].
>
> I would be glad to donate it to the Apache Solr project, as I think it could
> be a useful module to trigger automatic content extraction while indexing
> documents.
>
> At the moment the UIMAUpdateRequestProcessor base implementation can
> automatically extract document's sentences, language, keywords, concepts and
> named entities using Apache UIMA's HMMTagger, OpenCalaisAnnotator and
> AlchemyAPIAnnotator components (but it can be easily expanded).
>
> Any feedback is welcome.
> Have a nice day.
> Tommaso
>
> [1] : http://code.google.com/p/solr-uima/
> [2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial