Thank you, Shawn and Erick,
 I truly did not want to dive into the Tika and CXF worlds, but it looks like I 
have no choice.

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org> 
Sent: Wednesday, March 20, 2019 11:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Upgrading tika

On 3/20/2019 8:24 AM, Tannen, Lev (USAEO) [Contractor] wrote:
> I still need your advice. The program I have to fix uses the class 
> AutoDetectParser along with SolrJ to parse PDF files before sending the 
> result to the Solr server. To do this, it links two Tika jar files taken from 
> the Solr distribution, namely tika-core and tika-parsers. It may use some 
> other Tika-related files, but I have trouble identifying them among the many 
> other jar files linked. The program worked more or less OK, but it gave too 
> many warnings of the kind "Font not found". I heard that this was fixed in 
> the next Tika release.

Solr does include a subset of Tika - just enough to make the Extracting Request 
Handler work.

Since you're writing your own program that uses Tika, the dependencies you need 
could be very different than what Solr needs for its Tika integration.

It is strongly recommended, as Erick mentioned, to never use Solr's Tika 
integration in production.  Tika has a tendency to crash with some input files, 
especially PDF, and if it crashes when it is running inside Solr, then Solr 
will crash too.  No more search engine.

>   I switched from Solr 7.5 to Solr 7.7.1 in the hope that it would solve that 
> problem. However, when I switched I encountered another problem:
> java.lang.NoClassDefFoundError: 
> org/apache/cxf/jaxrs/ext/multipart/ContentDisposition.
> Apparently I have not included some necessary jars. Those jars are supposed to 
> come from a different project called CXF, but because they are related to 
> Tika I expected them to be distributed with Solr. However, I did not find them 
> in Solr 7.7.1 (nor in Solr 7.5).

I had never heard of CXF before.  It is not included with Solr.  The Extracting 
Request Handler evidently does not use the part of Tika that needs CXF, so we 
don't include it.
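
One way to see which classes are actually reachable from the program's 
classpath, and which jar each one comes from, is a small probe like the one 
below. This is only a sketch using the standard Java library; the class name 
ClasspathProbe and the method locate are made up for illustration, and the two 
default class names are simply the ones mentioned in this thread. Run it with 
the same classpath as the failing program.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Solr, Tika, or CXF): reports where a class is loaded from.
final class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or null if it cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK itself have no code source
            if (src == null || src.getLocation() == null) {
                return "(JDK runtime)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or something it needs in order to load) is missing from the classpath
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults are the class names mentioned in this thread; pass others as arguments
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.tika.parser.AutoDetectParser",
            "org.apache.cxf.jaxrs.ext.multipart.ContentDisposition"
        };
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND on classpath" : where));
        }
    }
}
```

A class reported as not found points at a jar that still has to be added; a 
class that is found shows which of the many linked jars supplies it.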

> So could you please advise what is the best way to proceed.

If you want to know how to use Tika in your program and what you need for your 
particular use case, talk to the Tika project.  There is at least one person 
from the Tika project subscribed, but questions about that project are 
off-topic on this list.

Thanks,
Shawn
