Hi Erick,

I still need your advice. The program I have to fix uses the AutoDetectParser class together with SolrJ to parse PDF files before sending the result to the Solr server. To do this, it links two Tika jar files taken from the Solr distribution, namely tika-core and tika-parsers. It may use other Tika-related files as well, but I have trouble identifying them among the many other jar files that are linked.

The program worked more or less OK, but it gave too many "Font not found" warnings. I had heard that this was fixed in a later Tika release, so I switched from Solr 7.5 to Solr 7.7.1 in the hope that this would solve the problem. However, after switching I ran into another problem: java.lang.NoClassDefFoundError: org/apache/cxf/jaxrs/ext/multipart/ContentDisposition.

Apparently I have not included some necessary jars. Those jars are supposed to come from a different project called CXF, but because they are related to Tika I expected them to be distributed with Solr. However, I did not find them in Solr 7.7.1 (nor in Solr 7.5). I found the necessary file in the CXF distribution and included it. It then asked for another file, which I included as well. After this I got a message that some temporary resources were not closed. Apparently something does not match, and now I am stuck.

I do not want to start from scratch and search the whole Tika and CXF projects for the files I need, and I do not want to include all files from those projects, especially because I was not able to find a binary distribution. So could you please advise what the best way to proceed is?

Thank you,
Lev Tannen

-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com>
Sent: Tuesday, March 19, 2019 2:48 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Upgrading tika

Yes, Solr is distributed with Tika. Look in:
./solr/contrib/extraction/lib

Tika is upgraded when new versions come out, so the underlying files are whatever is current at the time.

The integration is a fairly loose coupling. If you're using some external program (say a SolrJ program) to parse the files, there's no requirement to use the jars distributed with Solr; use whatever suits your fancy. An external program just constructs a SolrDocument to send to Solr. What you use to create that document is irrelevant. See:
https://lucidworks.com/2012/02/14/indexing-with-solrj/ for some background.

If you're using the ExtractingRequestHandler, where you just send the semi-structured docs to Solr (PDFs, Word or whatever), then needing to know anything about individual Tika-related jar files is kind of strange.

If your predecessors wrote some custom code that runs as part of Solr, I don't know what to say...

Best,
Erick

On Tue, Mar 19, 2019 at 10:47 AM Tannen, Lev (USAEO) [Contractor] <lev.tan...@usdoj.gov.invalid> wrote:
>
> Thank you Shawn.
> I assumed that tika has been integrated with solr. In the project written
> before me they used two tika files taken from the solr distribution. I am trying
> to do the same with solr 7.7.1. However, this version contains a different set
> of tika related files. So I am confused. Does solr not have integrated
> tika anymore, or can I just not recognize them?
>
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Tuesday, March 19, 2019 11:11 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Upgrading tika
>
> On 3/19/2019 9:03 AM, levtannen wrote:
> > Could anybody suggest me what files do I need to use the latest
> > version of Tika and where to find them?
>
> This mailing list is solr-user. Tika is an entirely separate project from
> Solr within the Apache Foundation. To get help with Tika, you'll need to ask
> that project.
>
> https://tika.apache.org/mail-lists.html
>
> Thanks,
> Shawn
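
A note on the NoClassDefFoundError reported at the top of this thread: one way to see which jar actually supplies a class, or whether any jar on the classpath supplies it at all, is to ask the JVM where it loaded the class from. Below is a minimal sketch using only the Java standard library. The names ClassLocator and locate are hypothetical and exist only for this illustration; the two class names in main are simply the ones mentioned in this thread. It only reports locations; it does not say whether the jar versions are compatible with each other.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration only: reports where a class was loaded from.
class ClassLocator {

    /** Returns the jar or directory a class was loaded from, or a note if it cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Core JDK classes typically have no code source
            return src == null ? "(no code source, probably a JDK class)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised for a missing dependency
            return "NOT FOUND: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults are the classes named in this thread; pass others as arguments
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.tika.parser.AutoDetectParser",
            "org.apache.cxf.jaxrs.ext.multipart.ContentDisposition"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as the indexing program, so that the answers reflect the jars that program really sees. A "NOT FOUND" line means no jar on that classpath provides the class (or one of its own dependencies is missing).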