Well, I’d have to do the same thing: go spelunking in Tika. When I used it 
from SolrJ, I just linked to the Tika distro and it “just worked”, but I admit 
that was a while ago.
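
Before adding more jars one by one, it can help to check which classes the program can actually load and which jar each one comes from. This applies, for example, to the CXF class named in the NoClassDefFoundError quoted below. The following is a minimal sketch that uses only the JDK. The class name ClasspathProbe is made up for illustration, and the two default class names are taken from this thread. Run it with the same classpath as the failing program.

```java
import java.security.CodeSource;

/**
 * Minimal sketch: for each class name, report whether it can be loaded from
 * the current classpath and, if so, which jar or directory it came from.
 */
public class ClasspathProbe {

    static String locate(String className) {
        try {
            // initialize = false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the JDK's bootstrap loader have no CodeSource
            if (src == null || src.getLocation() == null) {
                return "(JDK / bootstrap)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not found)";
        } catch (LinkageError e) {
            // The class file was found but could not be loaded,
            // e.g. because a class it depends on is missing
            return "(found, but failed to load: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Default to the two classes mentioned in this thread
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.tika.parser.AutoDetectParser",
            "org.apache.cxf.jaxrs.ext.multipart.ContentDisposition"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

For example: java -cp "lib/*:." ClasspathProbe (use ";" instead of ":" on Windows). Here "lib" is a placeholder for wherever the program's jars live. Anything reported as "(not found)" is still missing. The printed jar paths also show whether classes are being picked up from an unexpected jar, such as an older Tika version.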

Your best bet would probably be the Tika user’s list.

Best,
Erick

> On Mar 20, 2019, at 7:24 AM, Tannen, Lev (USAEO) [Contractor] 
> <lev.tan...@usdoj.gov.INVALID> wrote:
> 
> Hi Erick,
> 
> I still need your advice. The program I have to fix uses the class 
> AutoDetectParser along with SolrJ to parse PDF files before sending the 
> results to the Solr server. To do this, it linked two Tika jar files taken from 
> the Solr distribution, namely tika-core and tika-parsers. It may also have used some 
> other Tika-related files, but I have trouble identifying them among the many 
> other jar files it linked. The program worked more or less OK, but it gave too 
> many warnings of the kind "Font not found". I heard a rumor that this was fixed in 
> the next Tika distribution.
> I switched from Solr 7.5 to Solr 7.7.1 in the hope that this would solve that 
> problem. However, when I switched I encountered another problem: 
> java.lang.NoClassDefFoundError: 
> org/apache/cxf/jaxrs/ext/multipart/ContentDisposition.
> Apparently I have not included some necessary jars. Those jars are supposed to 
> come from a different project called CXF, but because they are related to 
> Tika I expected them to be distributed with Solr. However, I did not find them in 
> Solr 7.7.1 (nor in Solr 7.5).
> I found the necessary file in the CXF distribution and included it. It 
> asked for another file, which I included as well. After this I got a 
> message that some temporary resources were not closed. Apparently something 
> is mismatched, and now I am stuck. I do not want to start from scratch and 
> search the whole Tika and CXF projects for the files I need, and I do not want 
> to include all the files from those projects, especially because I was not able to 
> find a binary distribution. Could you please advise on the best way 
> to proceed?
> 
> Thank you,
> Lev Tannen
> 
> -----Original Message-----
> From: Erick Erickson <erickerick...@gmail.com> 
> Sent: Tuesday, March 19, 2019 2:48 PM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Upgrading tika
> 
> Yes, Solr is distributed with Tika. Look in:
> ./solr/contrib/extraction/lib
> 
> Tika is upgraded when new versions come out, so the underlying files are 
> whatever are current at the time.
> 
> The integration is a fairly loose coupling. If you're using some external 
> program (say a SolrJ program) to parse the files, there's no requirement to 
> use the jars distributed with Solr; use whatever suits your fancy. An 
> external program just constructs a SolrInputDocument to send to Solr. What you use 
> to create that document is irrelevant. See:
> https://lucidworks.com/2012/02/14/indexing-with-solrj/ for some background.
> 
> If you're using the ExtractingRequestHandler, where you just send the 
> semi-structured docs to Solr (PDFs, Word or whatever), then needing to know 
> anything about individual Tika-related jar files is kind of strange.
> 
> If your predecessors wrote some custom code that runs as part of Solr, I 
> don't know what to say...
> 
> Best,
> Erick
> 
> On Tue, Mar 19, 2019 at 10:47 AM Tannen, Lev (USAEO) [Contractor] 
> <lev.tan...@usdoj.gov.invalid> wrote:
>> 
>> Thank you Shawn.
>> I assumed that Tika had been integrated with Solr. In the project written 
>> before me, they used two Tika files taken from the Solr distribution. I am trying 
>> to do the same with Solr 7.7.1. However, this version contains a different 
>> set of Tika-related files, so I am confused. Does Solr no longer have 
>> Tika integrated, or am I just not recognizing the files?
>> 
>> -----Original Message-----
>> From: Shawn Heisey <apa...@elyograg.org>
>> Sent: Tuesday, March 19, 2019 11:11 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Upgrading tika
>> 
>> On 3/19/2019 9:03 AM, levtannen wrote:
>>> Could anybody tell me what files I need to use the latest 
>>> version of Tika and where to find them?
>> 
>> This mailing list is solr-user.  Tika is an entirely separate project from 
>> Solr within the Apache Foundation.  To get help with Tika, you'll need to 
>> ask that project.
>> 
>> https://tika.apache.org/mail-lists.html
>> 
>> Thanks,
>> Shawn
