Hi Erick,

I still need your advice. The program I have to fix uses the AutoDetectParser 
class along with SolrJ to parse PDF files before sending the result to the Solr 
server. To do this, it links two Tika jar files taken from the Solr 
distribution, namely tika-core and tika-parsers. It may use other Tika-related 
files as well, but I have trouble identifying them among the many other jar 
files it links. The program worked more or less OK, but it produced too many 
"Font not found" warnings. I had heard that this was fixed in a later Tika 
release.
I switched from Solr 7.5 to Solr 7.7.1 in the hope that this would solve the 
problem. However, after switching I ran into another problem: 
java.lang.NoClassDefFoundError: 
org/apache/cxf/jaxrs/ext/multipart/ContentDisposition.
Apparently I have not included some necessary jars. Those jars are supposed to 
come from a different project called CXF, but because they are related to Tika 
I expected them to be distributed with Solr. However, I did not find them in 
Solr 7.7.1 (or in Solr 7.5). 
I found the necessary file in the CXF distribution and included it. The program 
then asked for another file, which I included as well. After this I got a 
message that some temporary resources were not closed. Apparently something 
does not match, and now I am stuck. I do not want to start from scratch and 
search the whole Tika and CXF projects for the files I need, and I do not want 
to include all files from those projects, especially because I was not able to 
find a binary distribution. So could you please advise on the best way to 
proceed?
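
One way to narrow down which jar, if any, provides a class named in a 
NoClassDefFoundError is to scan a directory of jars for the matching class 
file. The following is a minimal sketch using only the Java standard library. 
The class name JarClassFinder and its methods are hypothetical helpers written 
for this illustration; they are not part of Solr, Tika, or CXF.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: lists the jars in one directory that contain a class.
public class JarClassFinder {

    // Accepts either the slash form printed by NoClassDefFoundError
    // (org/example/Foo) or the dotted form (org.example.Foo) and returns
    // the entry path used inside a jar (org/example/Foo.class).
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns every *.jar directly inside dir that contains the class.
    // Subdirectories are not searched.
    static List<Path> findJarsContaining(Path dir, String className)
            throws IOException {
        String entry = toEntryName(className);
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassFinder <jar-directory> <class-name>");
            return;
        }
        List<Path> hits = findJarsContaining(Paths.get(args[0]), args[1]);
        if (hits.isEmpty()) {
            System.out.println("No jar in " + args[0] + " contains " + args[1]);
        }
        for (Path jar : hits) {
            System.out.println(jar);
        }
    }
}
```

Run against the directory that holds the linked jars, with 
org/apache/cxf/jaxrs/ext/multipart/ContentDisposition as the second argument, 
it would print the jars that ship that class. An empty result for a Solr lib 
directory would be consistent with the jar not being distributed there.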

Thank you,
Lev Tannen

-----Original Message-----
From: Erick Erickson <erickerick...@gmail.com> 
Sent: Tuesday, March 19, 2019 2:48 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Upgrading tika

Yes, Solr is distributed with Tika. Look in:
./solr/contrib/extraction/lib

Tika is upgraded when new versions come out, so the underlying files are 
whatever are current at the time.

The integration is a fairly loose coupling. If you're using some external 
program (say a SolrJ program) to parse the files, there's no requirement to use 
the jars distributed with Solr; use whatever suits your fancy. An external 
program just constructs a SolrInputDocument to send to Solr. What you use to 
create that document is irrelevant. See:
https://lucidworks.com/2012/02/14/indexing-with-solrj/ for some background.

If you're using the ExtractingRequestHandler, where you just send the 
semi-structured docs to Solr (PDFs, Word or whatever), then needing to know 
anything about individual Tika-related jar files is kind of strange.

If your predecessors wrote some custom code that runs as part of Solr, I don't 
know what to say...

Best,
Erick

On Tue, Mar 19, 2019 at 10:47 AM Tannen, Lev (USAEO) [Contractor] 
<lev.tan...@usdoj.gov.invalid> wrote:
>
> Thank you Shawn.
> I assumed that Tika had been integrated with Solr. In the project written 
> before me, they used two Tika files taken from the Solr distribution. I am 
> trying to do the same with Solr 7.7.1. However, this version contains a 
> different set of Tika-related files, so I am confused. Does Solr no longer 
> have Tika integrated, or can I just not recognize the files?
>
> -----Original Message-----
> From: Shawn Heisey <apa...@elyograg.org>
> Sent: Tuesday, March 19, 2019 11:11 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Upgrading tika
>
> On 3/19/2019 9:03 AM, levtannen wrote:
> > Could anybody suggest which files I need to use the latest 
> > version of Tika and where to find them?
>
> This mailing list is solr-user.  Tika is an entirely separate project from 
> Solr within the Apache Foundation.  To get help with Tika, you'll need to ask 
> that project.
>
> https://tika.apache.org/mail-lists.html
>
> Thanks,
> Shawn
