Re: Custom content extractor for Solr Cell

2011-12-27 Thread Jan Høydahl
Hi John, see the discussion about indexing the contents of ZIP files: https://issues.apache.org/jira/browse/SOLR-2416 Depending on your use case, you may be able to write a Tika parser that handles your specific case, for example by uncompressing a GZIP file and using AutoDetect on its contents
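
To make the "uncompress, then detect" idea concrete, here is a rough sketch in plain Java with no Tika dependency. It is only an illustration under assumptions: the class name GzipUnwrapSketch and the guessType helper are hypothetical, and guessType is a crude magic-byte stand-in for the detection Tika's AutoDetectParser would do. In a real custom parser, the decompressed stream would be handed to Tika rather than to this helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: unwrap a GZIP container, then detect the inner type.
class GzipUnwrapSketch {

    // Compress bytes with GZIP (used here only to build sample input).
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Decompress a GZIP payload fully into memory.
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Assumption: a stand-in for real type detection, checking a few magic bytes.
    static String guessType(byte[] data) {
        if (startsWith(data, 0x1f, 0x8b)) return "application/gzip";
        if (startsWith(data, '%', 'P', 'D', 'F')) return "application/pdf";
        if (startsWith(data, 'P', 'K', 0x03, 0x04)) return "application/zip";
        return "application/octet-stream";
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xff) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] wrapped = gzip("%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));
        System.out.println(guessType(wrapped));          // type of the outer container
        System.out.println(guessType(gunzip(wrapped)));  // type of the inner content
    }
}
```

The point of the design is that the wrapper only knows how to peel off the compression layer; deciding what the inner document is stays with whatever detector receives the unwrapped bytes.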

Custom content extractor for Solr Cell

2011-12-05 Thread John Bartak
Is it possible to extract content for file types that Tika doesn’t support without changing and rebuilding Tika? Do I need to specify a tika.config file in the solrconfig.xml file, and if so, what is the format of that file? One example that I’m trying to solve is for a document management syst