>>Wow, that code looks familiar ;)...

Erick and Paden,

The following is not the source of your problem, but I thought I'd mention it since you reference Erick's fantastic blog post on SolrJ (http://lucidworks.com/blog/indexing-with-solrj/). I tried to comment on the blog post itself, but something went wrong with the website, so I'll take this opportunity.

If you want Tika to parse embedded files (attachments within your .doc, or any other embedded files), you need to pass the AutoDetectParser in through the ParseContext:

    ParseContext context = new ParseContext();
    context.set(Parser.class, autoParser);

Shalin fixed this in DIH in SOLR-7189.

If you don't include the parser in the ParseContext, Tika will only extract text from the container/original file that you send in, and it will ignore all attachments. For some applications this is the desired behavior, but I think users would generally expect Tika to extract everything.

Happy extraction!

Cheers,

Tim

-----Original Message-----
From: Paden [mailto:rumsey...@gmail.com]
Sent: Thursday, July 09, 2015 1:00 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

Haha no need to reinvent wheels. Especially when you don't know java. Just a prototype anyway.