Hi Tommaso,

Really cool what you've done. Looking forward to testing it, and I'm sure it's 
a welcome contribution to Solr.
You can easily contribute your code by opening a JIRA issue and attaching a 
patch file.

BTW, have you considered making the output field names configurable on a
per-instance basis? It could be done as follows:
<processor class="org.apache.solr.uima.processor.UIMAProcessorFactory">
  <str name="concept_field">concept</str>
  <str name="language_field">language</str>
  <str name="keyword_field">keyword</str>
  ...
</processor>
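
As a rough sketch of what I mean, the factory could read those parameters and fall back to defaults for any that are missing. This is plain Java with no Solr classes; the class name FieldNameConfig, the method fieldFor and the default values are all made up for illustration and are not part of your code or of Solr:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolves output field names from per-instance
// init parameters, falling back to assumed defaults.
public class FieldNameConfig {

    // Assumed defaults, chosen only for this illustration.
    private static final Map<String, String> DEFAULTS = Map.of(
            "concept_field", "concept",
            "language_field", "language",
            "keyword_field", "keyword");

    private final Map<String, String> fieldNames = new HashMap<>(DEFAULTS);

    // initArgs stands in for the <str name="..."> entries of one <processor> element.
    public FieldNameConfig(Map<String, String> initArgs) {
        for (Map.Entry<String, String> e : initArgs.entrySet()) {
            // Only known parameters override a default; anything else is ignored.
            if (DEFAULTS.containsKey(e.getKey())) {
                fieldNames.put(e.getKey(), e.getValue());
            }
        }
    }

    // Returns the output field name configured for the given parameter.
    public String fieldFor(String param) {
        String name = fieldNames.get(param);
        if (name == null) {
            throw new IllegalArgumentException("Unknown parameter: " + param);
        }
        return name;
    }
}
```

With something like this, two processor instances in the same chain could write their concepts to different fields simply by passing different concept_field values.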

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 20. sep. 2010, at 12.35, Tommaso Teofili wrote:

> Hi all,
> I am working on integrating Apache UIMA as an UpdateRequestProcessor for
> Apache Solr and I am now at the first working snapshot.
> I put the code on GoogleCode [1] and you can take a look at the tutorial
> [2].
> 
> I would be glad to donate it to the Apache Solr project, as I think it could
> be a useful module to trigger automatic content extraction while indexing
> documents.
> 
> At the moment the UIMAUpdateRequestProcessor base implementation can
> automatically extract document's sentences, language, keywords, concepts and
> named entities using Apache UIMA's HMMTagger, OpenCalaisAnnotator and
> AlchemyAPIAnnotator components (but it can be easily expanded).
> 
> Any feedback is welcome.
> Have a nice day.
> Tommaso
> 
> [1] : http://code.google.com/p/solr-uima/
> [2] : http://code.google.com/p/solr-uima/wiki/5MinutesTutorial
