Re: boilerpipe solr tika howto please

2011-01-17 Thread arnaud gaudinat
Thanks Ken, this is what I wanted to know. I'm not very familiar with this kind of modification, but I will try it and ask you for more information if needed. Regards, Arno On 14.01.2011 18:04, Ken Krugler wrote: Hi Arno, On Jan 14, 2011, at 3:57am, arnaud gaudinat

Re: boilerpipe solr tika howto please

2011-01-14 Thread arnaud gaudinat
ncluded in Solr? On Fri, Jan 14, 2011 at 6:57 AM, arnaud gaudinat wrote: Hello, I would like to use BoilerPipe (a very good program that strips surplus "clutter" from HTML content). I saw that BoilerPipe is included in Tika 0.8 and so should be accessible from Solr, am I right? How

Re: Tika Update, no Data

2011-01-14 Thread arnaud gaudinat
On 14.01.2011 16:28, Jörg Agatz wrote: If I understood your problem correctly, try: so with stored="true" to get back the content. Arnaud

Is deduplication possible during Tika extract?

2011-01-14 Thread arnaud gaudinat
Hello, here is an excerpt of my solrconfig.xml: class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy"> dedupe text true ignored_ true links ignored_ and class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> true signature false text
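
The excerpt above lost its XML tags in the archive. A plausible reconstruction is sketched below, assuming the parameter names used in the stock Solr 1.4 example solrconfig.xml. Every `name="..."` attribute, the handler path, and the chain name are assumptions that do not appear in the original message; only the class names and the values come from the post.

```xml
<!-- Sketch only: all name="..." attributes are assumed from the stock
     Solr example config, not taken from the original post. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <!-- assumed: route extracted documents through the "dedupe" chain -->
    <str name="update.processor">dedupe</str>
    <!-- assumed: map the extracted body into the "text" field -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">false</bool>
    <!-- assumed: compute the signature from the "text" field -->
    <str name="fields">text</str>
    <!-- the archived excerpt is cut off here; any further settings are not shown -->
  </processor>
</updateRequestProcessorChain>
```

If this reading is right, the signature would be computed from the same `text` field that receives the Tika-extracted content, which is what deduplication during extraction would require.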

boilerpipe solr tika howto please

2011-01-14 Thread arnaud gaudinat
Hello, I would like to use BoilerPipe (a very good program that strips surplus "clutter" from HTML content). I saw that BoilerPipe is included in Tika 0.8 and so should be accessible from Solr, am I right? How can I activate BoilerPipe in Solr? Do I need to change solrconfig.xml ( with org.