Well, I’d have to do the same thing, go spelunking in Tika. When I used it from SolrJ, I just linked to the Tika distro and it “just worked”, but I admit that was a while ago.
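For the NoClassDefFoundError quoted below, one way to narrow things down is to check which jar, if any, in a given lib directory actually contains the missing class. Here is a minimal sketch using only the JDK. The class name FindClassInJars and its helper methods are made up for illustration, and the default directory is simply the contrib path mentioned further down in this thread, so adjust both to your setup.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Solr or Tika.
public class FindClassInJars {

    // Convert a class name (dotted, or slash-separated as printed by
    // NoClassDefFoundError) into the entry name used inside a jar.
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Return every *.jar directly under libDir that contains the class.
    static List<Path> jarsContaining(Path libDir, String className) throws IOException {
        String entry = toEntryName(className);
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    if (jf.getEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are assumptions taken from this thread.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "solr/contrib/extraction/lib");
        String cls = args.length > 1
                ? args[1]
                : "org/apache/cxf/jaxrs/ext/multipart/ContentDisposition";
        if (!Files.isDirectory(libDir)) {
            System.err.println("Not a directory: " + libDir);
            return;
        }
        for (Path jar : jarsContaining(libDir, cls)) {
            System.out.println(jar);
        }
    }
}
```

Compile it with javac and run it as `java FindClassInJars <lib-dir> <class-name>`. If nothing is printed, none of the jars in that directory carries the class, which would suggest a dependency that is simply not on the classpath.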
Your best bet would probably be the Tika user’s list.

Best,
Erick

> On Mar 20, 2019, at 7:24 AM, Tannen, Lev (USAEO) [Contractor] <lev.tan...@usdoj.gov.INVALID> wrote:
>
> Hi Erick,
>
> I still need your advice. The program I have to fix uses the class
> AutoDetectParser along with SolrJ to parse PDF files before sending the
> result to the Solr server. To do this, it links two Tika jar files taken from
> the Solr distribution, namely tika-core and tika-parsers. It may use some
> other Tika-related files, but I have trouble identifying them among the many
> other jar files linked. The program worked more or less OK, but it gave too
> many warnings of the kind "Font not found". I heard a rumor that this was
> fixed in the next Tika distribution.
>
> I switched from Solr 7.5 to Solr 7.7.1 in the hope that it would solve that
> problem. However, when I switched I encountered another problem:
> java.lang.NoClassDefFoundError:
> org/apache/cxf/jaxrs/ext/multipart/ContentDisposition.
> Apparently I have not included some necessary jars. Those jars are supposed
> to come from a different project called CXF, but because they are related to
> Tika I expected them to be distributed with Solr. However, I did not find
> them in Solr 7.7.1 (or in Solr 7.5).
>
> I found the necessary file in the CXF distribution and included it. It
> then asked for another file, which I included as well. After this I got a
> message that some temporary resources were not closed. Apparently something
> does not match, and now I am stuck. I do not want to start from scratch and
> search the whole Tika and CXF projects for the files I need, and I do not
> want to include all files from those projects, especially because I was not
> able to find a binary distribution. So could you please advise on the best
> way to proceed?
>
> Thank you,
> Lev Tannen
>
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: Tuesday, March 19, 2019 2:48 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Upgrading tika
>
> Yes, Solr is distributed with Tika. Look in:
> ./solr/contrib/extraction/lib
>
> Tika is upgraded when new versions come out, so the underlying files are
> whatever is current at the time.
>
> The integration is a fairly loose coupling. If you're using some external
> program (say a SolrJ program) to parse the files, there's no requirement to
> use the jars distributed with Solr; use whatever suits your fancy. An
> external program just constructs a SolrInputDocument to send to Solr. What
> you use to create that document is irrelevant. See:
> https://lucidworks.com/2012/02/14/indexing-with-solrj/ for some background.
>
> If you're using the ExtractingRequestHandler, where you just send the
> semi-structured docs to Solr (PDFs, Word or whatever), then needing to know
> anything about individual Tika-related jar files is kind of strange.
>
> If your predecessors wrote some custom code that runs as part of Solr, I
> don't know what to say...
>
> Best,
> Erick
>
> On Tue, Mar 19, 2019 at 10:47 AM Tannen, Lev (USAEO) [Contractor] <lev.tan...@usdoj.gov.invalid> wrote:
>>
>> Thank you Shawn.
>> I assumed that Tika had been integrated with Solr. In the project written
>> before me, they used two Tika files taken from the Solr distribution. I am
>> trying to do the same with Solr 7.7.1. However, this version contains a
>> different set of Tika-related files, so I am confused. Does Solr no longer
>> have Tika integrated, or can I just not recognize the files?
>>
>> -----Original Message-----
>> From: Shawn Heisey <apa...@elyograg.org>
>> Sent: Tuesday, March 19, 2019 11:11 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Upgrading tika
>>
>> On 3/19/2019 9:03 AM, levtannen wrote:
>>> Could anybody suggest what files I need in order to use the latest
>>> version of Tika, and where to find them?
>>
>> This mailing list is solr-user. Tika is an entirely separate project from
>> Solr within the Apache Foundation. To get help with Tika, you'll need to
>> ask that project.
>>
>> https://tika.apache.org/mail-lists.html
>>
>> Thanks,
>> Shawn