Thanks Charlie...
Here is what confuses me. In the DIH configuration file, on the inner entity
that uses "TikaEntityProcessor" as its processor, I can specify a
tikaConfig attribute pointing to an XML file located in the core's config
folder, and in that file I should be able to override the PDFParser
default properties, as with parseContext.Config.
So I placed my tika-config.xml file in the config folder and set the
"tikaConfig" attribute to "tika-config.xml", but Tika is still not parsing
images inside the PDF file!
Let's say I'm just experimenting with Solr DIH crawling. Why isn't it
working?
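
To make the setup concrete, here is a minimal sketch of the kind of DIH
configuration I am describing. It is not my exact file: the baseDir path, the
entity names ("files", "tika"), the data source name ("bin") and the target
field ("content") are placeholders assumed for illustration. Only the
tikaConfig attribute on the TikaEntityProcessor entity is the part in question.

```xml
<dataConfig>
  <!-- Binary data source so Tika receives the raw file stream -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists the files to crawl (path and pattern are placeholders) -->
    <entity name="files" processor="FileListEntityProcessor" dataSource="null"
            baseDir="/path/to/docs" fileName=".*\.pdf"
            recursive="true" rootEntity="false">
      <!-- Inner entity: tikaConfig points to a file in the core's config folder -->
      <entity name="tika" processor="TikaEntityProcessor" dataSource="bin"
              url="${files.fileAbsolutePath}" format="text"
              tikaConfig="tika-config.xml">
        <!-- "text" is the extracted body; "content" is an assumed schema field -->
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```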

This is my tika-config.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>        
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.pdf.PDFParser">
            <params>
                true
                true               
            </params>
        </parser>
    </parsers>
</properties>

I've read the code in both TikaEntityProcessor and TikaConfig. It should
read the XML file from the config folder, extract the params, and override
the original PDFParser attributes. But it DOESN'T!
Any idea?


