Hi,

It should not be too hard. It looks like the current SolrContentHandler 
builds up the document from SAX events, so you could wrap it: pass a 
BoilerpipeContentHandler((ContentHandler)parsingHandler, BoilerpipeExtractor) 
to the parser in ExtractingDocumentLoader.java. It should work.
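
The wrapping idea can be sketched with plain JDK SAX classes, without Tika
or Solr on the classpath. Everything below is a hypothetical stand-in:
MinWordsFilter plays the role of BoilerpipeContentHandler, CollectingHandler
plays the role of SolrContentHandler, and the word-count rule is only a
placeholder, not Boilerpipe's actual algorithm. The point is the plumbing:
the parser talks to the wrapper, and the wrapper decides which text reaches
the delegate.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerWrapDemo {

    /** Hypothetical stand-in for BoilerpipeContentHandler: buffers the text
     *  of each <p> element and forwards it to the delegate only if it
     *  contains at least minWords words. */
    static class MinWordsFilter extends DefaultHandler {
        private final ContentHandler delegate;
        private final int minWords;
        private final StringBuilder block = new StringBuilder();

        MinWordsFilter(ContentHandler delegate, int minWords) {
            this.delegate = delegate;
            this.minWords = minWords;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("p".equals(qName)) {
                block.setLength(0); // start a fresh text block
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            block.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (!"p".equals(qName)) {
                return;
            }
            String text = block.toString().trim();
            block.setLength(0);
            if (!text.isEmpty() && text.split("\\s+").length >= minWords) {
                char[] out = (text + "\n").toCharArray();
                delegate.characters(out, 0, out.length); // pass the block downstream
            }
        }
    }

    /** Hypothetical stand-in for SolrContentHandler: collects the text it receives. */
    static class CollectingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses xml with the filter wrapped around the collecting handler. */
    static String extract(String xml, int minWords) throws Exception {
        CollectingHandler sink = new CollectingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new MinWordsFilter(sink, minWords));
        return sink.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>Home | About</p>"
                + "<p>Tika integrated Boilerpipe a few releases back.</p></body></html>";
        System.out.print(extract(page, 5));
    }
}
```

In the real setup the parser call in ExtractingDocumentLoader.java would
receive the Boilerpipe wrapper in place of the bare parsingHandler, in the
same way that the parse call above receives the filter rather than the sink.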

Markus

 
 
-----Original message-----
> From: Lance Norskog <goks...@gmail.com>
> Sent: Thu 06-Sep-2012 05:51
> To: solr-user@lucene.apache.org
> Subject: Is Boilerpipe usable through Solr ExtractingUpdateHandler or the DIH?
> 
> Tika integrated Boilerpipe a few releases back. Is it possible to invoke it 
> when using the ExtractingUpdateHandler (simple Tika) or the 
> DataImportHandler? 
> 
> http://code.google.com/p/boilerpipe/ 
> 
> 
> 
