As Ahmet says, the update chain is probably the place to integrate such 
document-oriented processing.
See http://www.cominvent.com/2011/04/04/solr-architecture-diagram/ for how it 
integrates with Solr.
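
To make the idea concrete, here is a rough sketch in plain Java with no Solr 
dependencies. It only illustrates the pattern: each processor in the chain 
sees the whole document, so an NER step can read the full content and write 
its annotations into a multivalued field before the document is indexed. 
Everything named here (DocProcessor, EntityRecognizer, NerProcessor, the 
"content" and "entities" fields, the toy recognizer) is made up for 
illustration. In Solr itself you would, as far as I know, subclass 
UpdateRequestProcessor, override processAdd(AddUpdateCommand), and register 
the factory in an updateRequestProcessorChain in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NerUpdateChainSketch {

    /** One step in the chain; a stand-in for Solr's UpdateRequestProcessor. */
    interface DocProcessor {
        void processAdd(Map<String, Object> doc);
    }

    /** Hypothetical hook for the NER framework: whole text in, annotations out. */
    interface EntityRecognizer {
        List<String> annotate(String fullText);
    }

    /** Reads one field in full, stores the annotations in another, then delegates. */
    static final class NerProcessor implements DocProcessor {
        private final EntityRecognizer recognizer;
        private final String sourceField;
        private final String targetField;
        private final DocProcessor next; // may be null at the end of the chain

        NerProcessor(EntityRecognizer recognizer, String sourceField,
                     String targetField, DocProcessor next) {
            this.recognizer = recognizer;
            this.sourceField = sourceField;
            this.targetField = targetField;
            this.next = next;
        }

        @Override
        public void processAdd(Map<String, Object> doc) {
            Object content = doc.get(sourceField);
            if (content != null) {
                List<String> annotations = recognizer.annotate(content.toString());
                if (!annotations.isEmpty()) {
                    // A list value plays the role of a multivalued field here.
                    doc.put(targetField, new ArrayList<>(annotations));
                }
            }
            if (next != null) {
                next.processAdd(doc);
            }
        }
    }

    /** Toy recognizer (assumption): every distinct capitalized word is an entity. */
    static List<String> capitalizedWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("\\b[A-Z][a-z]+\\b").matcher(text);
        while (m.find()) {
            if (!found.contains(m.group())) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("content", "Tobias integrates an annotator into Solr.");

        DocProcessor chain = new NerProcessor(
                NerUpdateChainSketch::capitalizedWords, "content", "entities", null);
        chain.processAdd(doc);

        System.out.println(doc.get("entities"));
    }
}
```

Because the processor runs before analysis, the annotations end up as 
ordinary field values, which is what faceting needs. A filter or tokenizer 
would only ever see the token stream of a single field.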

--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 24 May 2012, at 14:04, Wunderlich, Tobias wrote:

> Hey Guys,
> 
> I have recently been working on a project to integrate a 
> Named-Entity-Recognition (NER) framework into an existing search platform 
> based on Solr. The platform uses ManifoldCF (MCF) to automatically gather 
> the content from various repositories. The NER framework creates 
> annotations/metadata from the given content, which I then want to integrate 
> into the search platform as metadata to use for faceting. Since MCF handles 
> all content gathering, I need a way to integrate the NER framework directly 
> into Solr. The goal is to get all annotations per document into a 
> multivalued field. My first thought was to create a custom filter, which 
> just takes the content and gives back only the annotations. But as I 
> understand it, a filter only processes predetermined tokens, which is 
> useless for my purpose, since the NER framework needs to process the whole 
> content of a document. What about a custom tokenizer? Would it be possible 
> to process the whole text and give back only the annotations as tokens? A 
> third thought was to manipulate the ExtractingRequestHandler (Solr Cell) 
> used by MCF to somehow add the annotations as metadata when the content and 
> metadata are distributed to the different fields.
> 
> I hope my problem description is sufficient. Does anybody have any thoughts 
> on that subject?
> 
> Best regards,
> Tobias
