> I would like to build a component that during indexing analyses all tokens
> in a stream and adds metadata to a new field based on my analysis. I have
> different tasks that I would like to perform, like basic classification and
> certain more advanced phrase detections. How would I do this? A normal
> TokenFilter can only look at one token at a time, but I need to access a
> larger context.
> 
> I've noticed that there is a TeeSinkTokenFilter that might be useful in
> some way since "It is also useful for doing things like entity extraction or
> proper noun analysis", but I don't understand how.
> 
> Can someone help me with some super-simple stub or similar? What I'm looking
> for is something like:
> 
> class MySmartFilter {
>
>   public AnalyzeTokens(tokenList)
>   {
>     metadataTokens = DoTheAnalysis(tokenList);
>     AddToField("metadata", metadataTokens);
>   }
> }
> 

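An UpdateRequestProcessor may help you: it runs on the whole document before
it is indexed, so it can analyse a field's full text and add new fields.
See http://wiki.apache.org/solr/UpdateRequestProcessor.
The UIMA integration at http://wiki.apache.org/solr/SolrUIMA can serve as an
example.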
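
To make the idea a bit more concrete, here is a rough, Solr-free sketch of the
analysis step only. Everything in it is an assumption for illustration: the
class name MetadataSketch, the analyzeTokens method, the "phrase:" and
"number:" labels, and the toy rules (two adjacent capitalized tokens count as
a phrase, all-digit tokens count as numbers). The point is just that the
method sees the whole token list, so it can look at neighbouring tokens. In an
actual processor you would presumably subclass UpdateRequestProcessor,
override processAdd(AddUpdateCommand cmd), tokenize the source field of
cmd.getSolrInputDocument(), add the results with
addField("metadata", ...), and then call super.processAdd(cmd).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the analysis step; not part of Solr or Lucene.
class MetadataSketch {

    // Receives the complete token list, so each rule can inspect neighbours.
    static List<String> analyzeTokens(List<String> tokens) {
        List<String> metadata = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);

            // Toy "phrase detection": two adjacent capitalized tokens.
            if (i + 1 < tokens.size()
                    && isCapitalized(tok)
                    && isCapitalized(tokens.get(i + 1))) {
                metadata.add("phrase:" + tok + " " + tokens.get(i + 1));
            }

            // Toy "basic classification": tokens made only of digits.
            if (tok.matches("\\d+")) {
                metadata.add("number:" + tok);
            }
        }
        return metadata;
    }

    static boolean isCapitalized(String s) {
        return !s.isEmpty() && Character.isUpperCase(s.charAt(0));
    }

    public static void main(String[] args) {
        // Whitespace-split tokens stand in for real analyzer output.
        List<String> tokens =
            Arrays.asList("I", "visited", "New", "York", "in", "2011");
        // In a processor, these values would go into the "metadata" field.
        System.out.println(analyzeTokens(tokens));
    }
}
```

Doing this in an update processor rather than a TokenFilter means the metadata
ends up in its own field on the document, which is what the question asks for.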
