On 3/20/2019 8:24 AM, Tannen, Lev (USAEO) [Contractor] wrote:
I still need your advice. The program I have to fix uses the class AutoDetectParser along 
with SolrJ to parse PDF files before sending the result to the Solr server. To do this, 
it links two Tika jar files taken from the Solr distribution, namely tika-core and 
tika-parsers. It may use other Tika-related files as well, but I have trouble identifying 
them among the many other jar files linked. The program worked more or less OK, but it 
gave too many warnings of the kind "Font not found". I heard a rumor that this was 
fixed in the next Tika distribution.

Solr does include a subset of Tika - just enough to make the Extracting Request Handler work.

Since you're writing your own program that uses Tika, the dependencies you need could be very different than what Solr needs for its Tika integration.

It is strongly recommended, as Erick mentioned, to never use Solr's Tika integration in production. Tika has a tendency to crash with some input files, especially PDF, and if it crashes when it is running inside Solr, then Solr will crash too. No more search engine.

I switched from Solr 7.5 to Solr 7.7.1 in the hope that it would solve that 
problem. However, when I switched I encountered another problem:
java.lang.NoClassDefFoundError: 
org/apache/cxf/jaxrs/ext/multipart/ContentDisposition.
Apparently I have not included some necessary jars. Those jars are supposed to come 
from a different project called CXF, but because they are related to Tika I 
expected them to be distributed with Solr. However, I did not find them in Solr 
7.7.1 (or in Solr 7.5).

I had never heard of CXF before. It is not included with Solr. The Extracting Request Handler must not use the part of Tika that needs CXF, so we don't include it.
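
A NoClassDefFoundError like the one above means the named class was not on the classpath at runtime. As a rough way to see which jar, if any, supplies a given class, something like the following could be run with the same classpath as your program. It is only a sketch using the plain JDK: the class name ClasspathCheck and the method locate are made up for illustration, and the two class names checked are simply the ones mentioned in this thread.

```java
import java.security.CodeSource;

public class ClasspathCheck {
    /**
     * Returns a description of where the class was loaded from,
     * or null if the class cannot be found on the current classpath.
     */
    public static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes that belong to the JDK itself usually report no code source
            return src == null ? "(JDK runtime)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is missing, or something it depends on is missing
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.tika.parser.AutoDetectParser",
            "org.apache.cxf.jaxrs.ext.multipart.ContentDisposition"
        };
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

If the second class prints NOT FOUND, the jar that provides it is missing from your program's classpath. Which jar that is, and whether your use case needs it at all, is a question for the Tika project.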

So could you please advise what is the best way to proceed?

If you want to know how to use Tika in your program and what you need for your particular use case, talk to the Tika project. There is at least one person from the Tika project subscribed, but questions about that project are off-topic on this list.

Thanks,
Shawn
