As Ahmet says, the Update Chain is probably the place to integrate such document-oriented processing. See http://www.cominvent.com/2011/04/04/solr-architecture-diagram/ for how it integrates with Solr.
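As a rough sketch (the field names and the extractEntities() call below are placeholders for your own schema and NER framework, not existing APIs), a custom UpdateRequestProcessorFactory plugged into an updateRequestProcessorChain in solrconfig.xml could add the recognized entities to a multivalued field before the document is indexed:

    import java.io.IOException;

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    /**
     * Registered in an updateRequestProcessorChain in solrconfig.xml, this
     * processor sees every incoming document before it is indexed.
     */
    public class NerUpdateProcessorFactory extends UpdateRequestProcessorFactory {

      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
          SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new NerUpdateProcessor(next);
      }

      static class NerUpdateProcessor extends UpdateRequestProcessor {

        NerUpdateProcessor(UpdateRequestProcessor next) {
          super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.getSolrInputDocument();
          Object content = doc.getFieldValue("content");      // example source field
          if (content != null) {
            // Run the NER framework over the full document text
            for (String entity : extractEntities(content.toString())) {
              doc.addField("entities", entity);               // multivalued field for faceting
            }
          }
          super.processAdd(cmd);                              // pass the document on down the chain
        }

        private Iterable<String> extractEntities(String text) {
          // Placeholder: call into the NER framework here; this stub returns nothing.
          return java.util.Collections.emptyList();
        }
      }
    }

You would then reference the chain from your update handler (or the update.chain request parameter) so that every document MCF posts passes through it.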
--
Jan Høydahl, search solution architect
Cominvent AS - www.facebook.com/Cominvent
Solr Training - www.solrtraining.com

On 24. mai 2012, at 14:04, Wunderlich, Tobias wrote:

> Hey guys,
>
> I have recently been working on a project to integrate a Named-Entity-Recognition
> framework (NER) into an existing search platform based on Solr. The platform uses
> ManifoldCF to automatically gather content from various repositories. The NER
> framework creates annotations/metadata from the given content, which I then want
> to integrate into the search platform as metadata to use for faceting. Since MCF
> handles all content gathering, I need a way to integrate the NER framework
> directly into Solr. The goal is to get all annotations per document into a
> multivalued field.
>
> My first thought was to create a custom filter, which just takes the content and
> gives back only the annotations. But as I understand it, a filter only processes
> predetermined tokens, which is useless for my purpose, since the NER framework
> needs to process the whole content of a document. What about a custom Tokenizer?
> Would it be possible to process the whole text and give back only the annotations
> as tokens? A third thought was to manipulate the ExtractingRequestHandler
> (Solr Cell) used by MCF to somehow add the annotations as metadata when the
> content and metadata are distributed to the different fields.
>
> I hope my problem description is sufficient. Does anybody have any thoughts on
> that subject?
>
> Best regards,
> Tobias