> I would like to build a component that during indexing analyses all tokens
> in a stream and adds metadata to a new field based on my analysis. I have
> different tasks that I would like to perform, like basic classification and
> certain more advanced phrase detections. How would I do this? A normal
> TokenFilter can only look at one token at a time, but I need to access a
> larger context.
>
> I've noticed that there is a TeeSinkTokenFilter that might be useful in
> some way since "It is also useful for doing things like entity extraction or
> proper noun analysis", but I don't understand how.
>
> Can someone help me with some super-simple stub or similar? What I'm looking
> for is something like:
>
> class MySmartFilter {
>
>     public AnalyzeTokens(tokenList)
>     {
>         metadataTokens = DoTheAnalysis(tokenList);
>         AddToField("metadata", metadataTokens);
>     }
> }
>
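An UpdateRequestProcessor may help you here: http://wiki.apache.org/solr/UpdateRequestProcessor. It runs on the whole document before it is indexed, so it sees the complete field value rather than one token at a time, and it can add a new field to the document. SolrUIMA is an example of this approach: http://wiki.apache.org/solr/SolrUIMA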
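
To make the idea concrete, here is a rough, Solr-free sketch of the shape such a processor could take. Everything in it is hypothetical: the class name MetadataProcessor, the helpers tokenize and detectPhrases, the whitespace tokenizer, and the toy "runs of capitalized words" rule standing in for your real analysis. A plain Map stands in for the document. In an actual Solr setup the same logic would presumably live in a subclass of UpdateRequestProcessor, inside processAdd, working on the SolrInputDocument instead of a Map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an update processor. The "document" is a plain
// map of field name -> values; in Solr this would be a SolrInputDocument.
class MetadataProcessor {

    // Naive whitespace tokenizer (assumption: real code would use its own analysis).
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Toy phrase detection over the WHOLE token list: every run of two or
    // more consecutive capitalized tokens is reported as one phrase.
    static List<String> detectPhrases(List<String> tokens) {
        List<String> phrases = new ArrayList<>();
        List<String> run = new ArrayList<>();
        for (String token : tokens) {
            if (Character.isUpperCase(token.charAt(0))) {
                run.add(token);
            } else {
                if (run.size() >= 2) {
                    phrases.add(String.join(" ", run));
                }
                run.clear();
            }
        }
        // Flush a run that reaches the end of the token list.
        if (run.size() >= 2) {
            phrases.add(String.join(" ", run));
        }
        return phrases;
    }

    // Analyse every value of sourceField and append the results to "metadata".
    static void process(Map<String, List<String>> doc, String sourceField) {
        List<String> metadata = new ArrayList<>();
        List<String> values = doc.getOrDefault(sourceField, Collections.<String>emptyList());
        for (String value : values) {
            metadata.addAll(detectPhrases(tokenize(value)));
        }
        if (!metadata.isEmpty()) {
            doc.computeIfAbsent("metadata", k -> new ArrayList<>()).addAll(metadata);
        }
    }
}
```

Calling MetadataProcessor.process(doc, "text") on a document whose "text" field holds "we visited New York City last year" adds a "metadata" field containing "New York City". The point of doing it at this level is that the analysis gets the full token list in one go, which is exactly what a single TokenFilter call does not give you.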