Thank you, Shawn and Erick. I truly did not want to dive into the Tika and CXF worlds, but it looks like I have no choice.
-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
Sent: Wednesday, March 20, 2019 11:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Upgrading tika

On 3/20/2019 8:24 AM, Tannen, Lev (USAEO) [Contractor] wrote:
> I still need your advice. The program I have to fix uses class
> AutoDetectParser along with Solrj for parsing PDF files before sending the
> result to the solr server. To do this it linked two tika jar files taken from
> the solr distribution. Namely: tika-core and tika-parsers. Maybe it used some
> other tika related files but I have problems to identify them among a lot of
> other jar files linked. The program worked more or less OK, but it gave too
> many warnings of kind "Font not found". I had a rumor that this was fixed in
> the next tika distribution.

Solr does include a subset of Tika - just enough to make the Extracting Request Handler work. Since you're writing your own program that uses Tika, the dependencies you need could be very different than what Solr needs for its Tika integration.

It is strongly recommended, as Erick mentioned, to never use Solr's Tika integration in production. Tika has a tendency to crash with some input files, especially PDF, and if it crashes when it is running inside Solr, then Solr will crash too. No more search engine.

> I switched from Solr 7.5 to solr 7.7.1 in a hope that will solve that
> problem. However when I switched I encountered an another problem:
> java.lang.NoClassDefFoundError:
> org/apache/cxf/jaxrs/ext/multipart/ContentDisposition.
> Apparently I have not included some necessary jars. Those jars supposed to
> come from a different project called cxf, but because they are related to
> tika I expected them be distributed with solr. However I did not find them in
> the solr 7.7.1 (in the solr 7.5 as well).

I had never heard of CXF before. It is not included with Solr. The Extracting Request Handler must not use the part of Tika that needs CXF, so we don't include it.

> So could you please advise what is the best way to proceed.

If you want to know how to use Tika in your program and what you need for your particular use case, talk to the Tika project. There is at least one person from the Tika project subscribed, but questions about that project are off-topic on this list.

Thanks,
Shawn
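
For anyone hitting the same NoClassDefFoundError, one low-effort way to see which jars are missing is to ask the JVM whether it can resolve the classes in question, using the same classpath as the failing program. The sketch below is only a diagnostic aid and not something proposed in this thread. The class name ClasspathCheck and the helper isOnClasspath are made up for illustration. The two names it probes are the CXF class from the error above and org.apache.tika.parser.AutoDetectParser, which is the fully qualified name of the parser class mentioned in the question.

```java
// Diagnostic sketch (hypothetical helper, not part of Tika, CXF, or Solr):
// reports whether selected classes can be resolved on the current classpath.
class ClasspathCheck {

    // Returns true if the named class can be located. The class is loaded
    // but not initialized, so no static initializers run.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised while resolving
            // a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.tika.parser.AutoDetectParser",
            "org.apache.cxf.jaxrs.ext.multipart.ContentDisposition"
        };
        for (String name : names) {
            String status = isOnClasspath(name) ? "found" : "MISSING";
            System.out.println(name + " -> " + status);
        }
    }
}
```

Running it with the program's own classpath (for example via java -cp) shows which of the two is absent. It does not say which jar should supply a missing class; that remains a question for the Tika project.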