Thanks Charlie. It's just confusing for me. In the DIH configuration file, the inner entity that uses "TikaEntityProcessor" as its processor accepts a tikaConfig attribute that points to an XML file in the core's config folder, and in that file I should be able to override the PDFParser default properties, as in parseContext.Config. I placed my tika-config.xml file in the config folder and set the "tikaConfig" attribute to "tika-config.xml", but Tika is still not parsing the images inside the PDF file! Let's say this is just an experiment with Solr DIH crawling. Why isn't it working?
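For context, a minimal sketch of the kind of DIH data-config this describes. It is an illustration, not my exact file: the baseDir path, the entity names and the "content" field are hypothetical placeholders, and only the processor and tikaConfig attributes come from the setup described above.

```xml
<dataConfig>
  <!-- Binary data source so Tika receives the raw file stream -->
  <dataSource type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists files; baseDir and fileName are placeholders -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.pdf"
            rootEntity="false" dataSource="null">
      <!-- Inner entity: tikaConfig is resolved relative to the core's config folder -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text"
              tikaConfig="tika-config.xml">
        <!-- "text" is the column TikaEntityProcessor emits; "content" is a hypothetical schema field -->
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```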
This is my tika-config.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        true true
      </params>
    </parser>
  </parsers>
</properties>

I've read the code in both TikaEntityProcessor and TikaConfig. It should read the XML file from the config folder, extract the params, and override the original PDFParser attributes. But it doesn't! Any idea?

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
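The two bare "true" values inside <params> in the file above appear to have lost their surrounding markup somewhere in transit. Below is a sketch of how such a file is commonly written in Tika's XML config format. The parameter names extractInlineImages and sortByPosition are assumptions (they are boolean PDFParser settings, but the names in the original file are not recoverable from this message), and the parser-exclude element is likewise an assumption about the setup: without it, DefaultParser also registers its own PDFParser, so it may be worth checking which instance actually handles PDFs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Assumption: exclude PDFParser from DefaultParser so only the
         explicitly configured instance below handles PDF files -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <!-- Assumed parameter names; the originals were stripped from the message -->
        <param name="extractInlineImages" type="bool">true</param>
        <param name="sortByPosition" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```

Whether the Tika version bundled with a given Solr release honors <params> for PDFParser is not confirmed here, and getting text out of extracted inline images generally also needs an OCR parser to be available, so this is something to test rather than a known fix.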