> I have recently been working on a project to integrate a
> named-entity recognition (NER) framework into an existing
> search platform based on Solr. The platform uses ManifoldCF
> (MCF) to automatically gather content from various
> repositories. The NER framework creates annotations/metadata
> from the given content, which I then want to integrate into
> the search platform as metadata to use for faceting. Since
> MCF handles all content gathering, I need a way to integrate
> the NER framework directly into Solr. The goal is to get all
> annotations per document into a multivalued field. My
> first thought was to create a custom filter that just
> takes the content and gives back only the annotations.
> But as I understand it, a filter only processes
> predetermined tokens, which is useless for my purpose, since
> the NER framework needs to process the whole content of a
> document. What about a custom Tokenizer? Would it be
> possible to process the whole text and give back only the
> annotations as tokens? A third thought was to modify the
> ExtractingRequestHandler (Solr Cell) used by MCF to somehow
> add the annotations as metadata when the content and
> metadata are distributed to the different fields.
> 
> I hope my problem description is sufficient. Does anybody
> have any thoughts on that subject?

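An UpdateRequestProcessor is more appropriate in this case. It runs at
index time on the whole document, before field analysis, so it can read
the full content and add new fields to the document. The UIMA
integration is an example of this approach:
http://wiki.apache.org/solr/SolrUIMA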
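
To make the idea concrete, here is a minimal sketch of the per-document step such a processor would perform. It is modeled with plain Java collections so it runs without Solr on the classpath; in a real processor the same logic would sit in UpdateRequestProcessor.processAdd(AddUpdateCommand) and operate on the SolrInputDocument. The class name, the field names "content" and "annotations", and the toy capitalized-word recognizer are all assumptions for illustration, not part of Solr or of any NER framework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class NerAnnotationStep {
    // Hypothetical field names; use whatever the Solr schema defines.
    static final String SOURCE_FIELD = "content";
    static final String TARGET_FIELD = "annotations";

    /**
     * Runs the NER function over the full text of one document and stores
     * the results as a multivalued field on that same document.
     * The Map stands in for a SolrInputDocument.
     */
    static void process(Map<String, Object> doc,
                        Function<String, List<String>> ner) {
        Object text = doc.get(SOURCE_FIELD);
        if (text == null) {
            return; // nothing to annotate
        }
        List<String> annotations = ner.apply(text.toString());
        if (!annotations.isEmpty()) {
            doc.put(TARGET_FIELD, new ArrayList<>(annotations));
        }
    }

    /** Toy stand-in for the real NER framework: capitalized words. */
    static List<String> toyNer(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put(SOURCE_FIELD, "Alice met Bob in Paris");
        process(doc, NerAnnotationStep::toyNer);
        System.out.println(doc.get(TARGET_FIELD)); // prints [Alice, Bob, Paris]
    }
}
```

The point of doing this in an update processor rather than a filter or tokenizer is that the whole field value is available at once, and the result lands in its own field that can be faceted on.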
