Is there a way to specify which document types Tika parses? In my DataImportHandler (DIH) configuration I index the content of a SQL database that has a field pointing to each record's binary file (which could be Word, PDF, JPG, MOV, etc.). Tika then uses the document URL to index that document's content. However, there are a lot of document types that Tika cannot parse. I'd like to limit Tika to parsing only Word and PDF documents, so that I don't have to wait for Tika to determine the document type and whether or not it can parse it. I suspect that the exceptions thrown for documents Tika cannot read are increasing my indexing time significantly. Any guidance is appreciated.
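To make the request concrete, here is a minimal sketch of the kind of pre-filter being described: decide from the document URL alone whether the file is worth handing to Tika. It is plain Java and does not use any Solr or Tika API. The class name `ParseableFilter`, the method `shouldParse`, and the extension list are all hypothetical, and it assumes the file type can be judged from the URL's extension, which may not hold for every record.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Solr or Tika: an extension allow-list
// applied to the document URL before any parsing is attempted.
class ParseableFilter {

    // Assumed set of extensions for "Word and PDF"; adjust as needed.
    private static final Set<String> ALLOWED = Set.of("doc", "docx", "pdf");

    static boolean shouldParse(String url) {
        if (url == null) {
            return false;
        }
        // Drop any query string or fragment so "?id=1" does not hide the extension.
        int cut = url.length();
        int query = url.indexOf('?');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        int fragment = url.indexOf('#');
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        String path = url.substring(0, cut);

        // The extension is whatever follows the last dot of the last path segment.
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot < slash || dot == path.length() - 1) {
            return false; // no extension in the final segment
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ALLOWED.contains(ext);
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://host/files/report.PDF",
            "http://host/files/clip.mov",
            "http://host/files/notes.docx?rev=3"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + shouldParse(s));
        }
    }
}
```

A check like this only skips files by name. It does not change how Tika detects types for the files that do get through, so it addresses the wasted attempts on JPG or MOV files rather than Tika's own detection cost.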
-Teague