Hi,

In my opinion, instead of hardcoding such functionality into multiple request handlers, we should go in the opposite direction, toward modularization: factoring Tika extraction out into its own UpdateProcessor (https://issues.apache.org/jira/browse/SOLR-1763). The ExtractingRequestHandler would then eventually go away, and you could use Tika extraction and language detection with any request handler you choose, including XML and DIH...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 19 June 2012, at 17:10, Martin Ruckli wrote:

> Hi all,
>
> I just wanted to check if there is a demand for this feature. I had to
> implement this functionality for one of our customers and would like to
> contribute it.
>
> Here is the use case:
> We are using the ExtractingRequestHandler with the extractOnly=true flag set.
> With a request to this handler we get the content of a posted document, as
> we want. We would also like to detect the language and return it as a
> metadata field in the response from Solr.
> As there is already support for language detection based on Tika integrated
> into Solr, the only thing I did was add a new param to enable or disable
> this feature and then do the language detection in nearly the same way as it
> is done in the TikaLanguageIdentifierUpdateProcessor.
> I think this would be a nice addition, especially in the extractOnly mode.
>
> What are your thoughts on this?
>
> Cheers
> Martin