One option (I think; this answer is untested!) is to remove the parsers you don't want from the Tika config file. Make sure to specify the tika.config parameter in your ExtractingRequestHandler in Solr (https://wiki.apache.org/solr/ExtractingRequestHandler). In response to this question, I just added an example to Tika trunk (TIKA-1418) that shows how to dump the current Tika config (org.apache.tika.example.DumpTikaConfigExample). You can use the dumped config file as a starting point and make your modifications there. The last time I looked for a Tika config file, examples were difficult to find.
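
As a rough, equally untested sketch of the Solr side: the handler entry in solrconfig.xml would look something like the fragment below. The handler name matches the update/extract handler mentioned in the question, and the tika.config parameter is the one described on the wiki page above; the file path is only a placeholder for wherever you save your edited copy of the dumped config.

```xml
<!-- solrconfig.xml (fragment); the path below is a placeholder, not a real location -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <!-- Point Solr at the edited Tika config, i.e. the dumped file with the
       unwanted parser entries removed -->
  <str name="tika.config">/path/to/my-tika-config.xml</str>
</requestHandler>
```

In the edited file, the entries to delete would be the parser elements for the video types you want skipped. I have not checked which parser classes cover which movie formats, so look them up in the dumped file for your Tika version rather than trusting any list from memory.
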
An example from the dumper is here:
https://issues.apache.org/jira/secure/attachment/12670000/tika-config-SNAPSHOT-1.7_20140919.xml

Let me know if the above recommendation works! Happy extraction!

Best,
Tim

-----Original Message-----
From: keeblerh [mailto:keebl...@yahoo.com]
Sent: Thursday, September 18, 2014 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: How to exclude a mimetype in tika?

eShard wrote
> Good afternoon,
> I'm using Solr 4.0 Final.
> I have movies "hidden" in zip files that need to be excluded from the
> index.
> I can't filter movies on the crawler because then I would have to exclude
> all zip files.
> I was told I can have Tika skip the movies.
> The details are escaping me at this point.
> How do I exclude a file in the Tika configuration?
> I assume it's something I add in the update/extract handler, but I'm not
> sure.
>
> Thanks,

I am having the same issue. I need to exclude some mime types from the zip files, and I am using Solr 4.8. Did you ever get an answer to this? Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-exclude-a-mimetype-in-tika-tp4127168p4159676.html
Sent from the Solr - User mailing list archive at Nabble.com.