Hi,

In my opinion, instead of hardcoding such functionality into multiple request 
handlers, we should go in the opposite direction, towards modularization: 
factor out Tika extraction into its own UpdateProcessor 
(https://issues.apache.org/jira/browse/SOLR-1763). The 
ExtractingRequestHandler would then eventually go away, and you could use 
extraction and language detection with any request handler you choose, 
including XML and DIH.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 19 June 2012, at 17:10, Martin Ruckli wrote:

> Hi all,
> 
> I just wanted to check if there is a demand for this feature. I had to 
> implement this functionality for one of our customers and would like to 
> contribute it.
> 
> Here is the use case:
> We are using the ExtractingRequestHandler with the extractOnly=true flag set.
> With a request to this handler we get the content of a posted document, as 
> intended. We would also like to detect the language and return it as a 
> metadata field in the response from Solr.
> As Solr already has integrated support for language detection based on Tika, 
> the only thing I did was add a new param to enable or disable this feature 
> and then do the language detection in nearly the same way as it is done in 
> the TikaLanguageIdentifierUpdateProcessor.
> I think this would be a nice addition, especially in the extractOnly mode.
> 
> What are your thoughts on this?
> 
> Cheers
> Martin
> 
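
For readers following the thread, the extract-only request described in the
quoted message could be sketched roughly as below. This is only an
illustration: the base URL, the /update/extract handler path, and the helper
name build_extract_only_url are assumptions, not taken from the thread. The
thread does not name the proposed language-detection parameter, so it is
left out; extra_params is just a placeholder for passing it once it exists.

```python
from urllib.parse import urlencode

# Assumed values for illustration; adjust to your own Solr installation.
SOLR_BASE = "http://localhost:8983/solr"
EXTRACT_PATH = "/update/extract"


def build_extract_only_url(base=SOLR_BASE, path=EXTRACT_PATH, extra_params=None):
    """Build a URL that asks the extracting handler to return the extracted
    content without indexing the document (extractOnly=true)."""
    params = {"extractOnly": "true", "wt": "json"}
    if extra_params:
        # Placeholder hook for additional parameters, e.g. a future
        # language-detection switch (its name is not given in the thread).
        params.update(extra_params)
    return base.rstrip("/") + path + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_extract_only_url())
```

The document itself would then be POSTed to this URL, for example as a
multipart upload; that step is omitted here to keep the sketch
self-contained.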
