Is it possible to extract content from file types that Tika doesn't support
without modifying and rebuilding Tika? Do I need to specify a tika.config
file in solrconfig.xml, and if so, what format does that file use?

One case I'm trying to solve is a document management system in which the
files are stored compressed, so I'd like a content extractor that first
decompresses each file and then delegates to the standard Solr content
extraction mechanism. Or is writing a custom extractor more trouble than it
is worth for this use case, and should I just decompress the data before
sending it to Solr?
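For the second option (decompressing on the client before sending to Solr), the decompression step itself is small. Below is a minimal Java sketch that assumes the documents are gzip-compressed; the post does not say which format the document management system uses. The class and method names `Decompress`, `gunzip`, and `gzip` are hypothetical, and the actual posting of the bytes to Solr is not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Decompress {

    // Hypothetical helper: inflates a gzip-compressed document so the plain
    // bytes can be sent to Solr's extraction handler instead of the compressed blob.
    public static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            // Copy the decompressed stream into memory until end of stream.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Hypothetical helper used only to produce sample input for the demo below.
    public static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer before we read bos.
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(plain);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("hello solr".getBytes(StandardCharsets.UTF_8));
        byte[] plain = gunzip(compressed);
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```

Running `main` round-trips a short string and prints `hello solr`. In practice, the byte array returned by `gunzip` is what would be posted to Solr, so Tika only ever sees the uncompressed document and no custom extractor is needed.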
