Hello,

I would like to use BoilerPipe (a very good program that strips surplus "clutter" from HTML content). I saw that BoilerPipe is included in Tika 0.8, so it should be accessible from Solr. Am I right?

How can I activate BoilerPipe in Solr? Do I need to change solrconfig.xml (with org.apache.solr.handler.extraction.ExtractingRequestHandler)?

Or do I need to modify some code inside Solr?

I saw something like TikaCLI -F on the Tika forum (http://www.lucidimagination.com/search/document/242ce3a17f30f466/boilerpipe_integration). Is that the right way?

Thanks in advance,

Arno.
