Re: Document Processing

Michael Kelleher Mon, 05 Dec 2011 12:27:10 -0800

Hello Erik,

I will take a look at both:


org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor

and

org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessor

and figure out what I need to extend to handle processing in the way I am looking for. I am assuming that "component" configuration is handled in a standard way such that I can configure my new UpdateProcessor in the same way I would configure any other UpdateProcessor "component"?


Thanks for the suggestion.

1 more question: given that I am probably going to convert the HTML to XML so I can use XPath expressions to "extract" my content, do you think that this kind of processing will overload Solr? This Solr instance will be used solely for indexing, and will only ever have a single ManifoldCF crawling job feeding it documents at one time.
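
To make the question concrete, here is a minimal sketch of the per-document work I have in mind, assuming the HTML has already been converted to well-formed XML. It uses only the JDK's javax.xml.xpath API; the class name, the extract helper, and the sample markup and expressions are all hypothetical and not part of Solr or ManifoldCF.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: none of these names come from Solr or ManifoldCF.
class XPathExtractSketch {

    // Parses the given well-formed XML and returns the string value of the
    // first node matched by the XPath expression ("" if nothing matches).
    static String extract(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Assumed input: HTML already cleaned up into well-formed XML,
        // with no namespace declarations.
        String xhtml = "<html><head><title>Sample page</title></head>"
                + "<body><div id=\"content\"><p>Body text to index.</p></div></body></html>";

        System.out.println(extract(xhtml, "/html/head/title"));
        System.out.println(extract(xhtml, "//div[@id='content']/p"));
    }
}
```

As written, every call to extract re-parses the document, so pulling several fields from one page would parse it several times. Parsing once into a DOM and evaluating all expressions against that would presumably be cheaper, but I have not measured either approach.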


--mike
