One option (I think; this answer is untested!) is to remove the parsers you 
don't want from the Tika config file.  Make sure to point the tika.config 
parameter of your ExtractingRequestHandler in Solr at that file 
(https://wiki.apache.org/solr/ExtractingRequestHandler).
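
As a rough, untested sketch of the Solr side, the handler entry in 
solrconfig.xml might look something like the fragment below.  The handler 
name and the file path are placeholders I am assuming for illustration; 
adjust them to match your own setup.

```xml
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- your existing extraction defaults stay here -->
  </lst>
  <!-- Assumption: placeholder path. Point this at the edited Tika config
       file from which the unwanted parsers have been removed. -->
  <str name="tika.config">/path/to/tika-config.xml</str>
</requestHandler>
```

Solr needs to reload the core (or restart) before a changed handler 
definition takes effect.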
 
In response to this question, I just added an example to Tika trunk 
(TIKA-1418) that shows how to dump the current Tika config 
(org.apache.tika.example.DumpTikaConfigExample).  Users can take the dumped 
config file as a starting point and modify it.  The last time I looked for a 
Tika config file, examples were difficult to find.

An example from the dumper is here:

https://issues.apache.org/jira/secure/attachment/12670000/tika-config-SNAPSHOT-1.7_20140919.xml

Let me know if the above recommendation works!

Happy extraction!

Best,

          Tim

-----Original Message-----
From: keeblerh [mailto:keebl...@yahoo.com] 
Sent: Thursday, September 18, 2014 10:15 AM
To: solr-user@lucene.apache.org
Subject: Re: How to exclude a mimetype in tika?

eShard wrote
> Good afternoon,
> I'm using solr 4.0 Final
> I need movies "hidden" in zip files that need to be excluded from the
> index.
> I can't filter movies on the crawler because then I would have to exclude
> all zip files.
> I was told I can have tika skip the movies.
> the details are escaping me at this point.
> How do I exclude a file in the tika configuration?
> I assume it's something I add in the update/extract handler but I'm not
> sure.
> 
> Thanks,

I am having the same issue.  I need to exclude some mime types from the zip
files and am using Solr 4.8.  Did you ever get an answer to this?  Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-exclude-a-mimetype-in-tika-tp4127168p4159676.html
Sent from the Solr - User mailing list archive at Nabble.com.
