Hello,
I would like to use BoilerPipe (a very good program that strips
surplus "clutter" from HTML content).
I saw that BoilerPipe is included in Tika 0.8 and so should be
accessible from Solr. Am I right?
How can I activate BoilerPipe in Solr? Do I need to change
solrconfig.xml (the entry for
org.apache.solr.handler.extraction.ExtractingRequestHandler)?
Or do I need to modify some code inside Solr?
I saw something like TikaCLI -F mentioned on the Tika forum
(http://www.lucidimagination.com/search/document/242ce3a17f30f466/boilerpipe_integration).
Is that the right way?
Thanks in advance,
Arno.