Thanks for the explanation, it makes sense. I have noticed that sometimes a PDF 
document with spaces in the name can kill Tika and, as a result, Solr, so I get 
your point. I was trying to keep my webApp all JavaScript/TypeScript, so I went 
with the exposed update/extract handler. In my looking around I did see some 
Python wrappers around a Tika service that might be a better solution. I can’t 
afford to crash the Solr server, and it’s been many years since I wrote any Java 
code! I remember we did use SolrJ back in the day (I think the Lucene version 
was 3.5 and the Solr was similar). May have to dust off that code. Anyway, 
thanks for the thorough explanation. Gonna have to rethink my design.
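
A rough sketch of one way such a redesign could stay in TypeScript: send the 
file to a standalone Tika server first, then post only the extracted text to 
Solr's regular update handler. Everything specific here is an assumption for 
illustration, not something from this thread: the host names, the 
`commitWithin` value, the `content_txt` field name, and the helper names 
(`buildSolrDoc`, `buildUpdateUrl`, `extractText`, `indexFile`). It also assumes 
a runtime with a global `fetch` (a browser or Node 18+).

```typescript
// Sketch only: URLs, field names, and helper names are assumptions.
const TIKA_URL = "http://localhost:9998/tika"; // assumed standalone Tika server
const SOLR_BASE = "http://localhost:8983/solr"; // assumed Solr base URL

interface SolrDoc {
  id: string;
  content_txt: string; // hypothetical field name; match it to your schema
}

// Build the document sent to Solr from text that Tika already extracted.
function buildSolrDoc(id: string, extractedText: string): SolrDoc {
  return { id, content_txt: extractedText.trim() };
}

// Build the update URL; the collection name is URL-encoded.
function buildUpdateUrl(collection: string): string {
  return `${SOLR_BASE}/${encodeURIComponent(collection)}/update?commitWithin=5000`;
}

// Ask the separate Tika server for plain text. If Tika chokes on a file,
// only this call fails; Solr never sees the raw document.
async function extractText(file: Blob): Promise<string> {
  const res = await fetch(TIKA_URL, {
    method: "PUT",
    headers: { Accept: "text/plain" },
    body: file,
  });
  if (!res.ok) throw new Error(`Tika extraction failed: HTTP ${res.status}`);
  return res.text();
}

// Extract with Tika, then index the resulting text as an ordinary JSON document.
async function indexFile(collection: string, id: string, file: Blob): Promise<void> {
  const text = await extractText(file);
  const res = await fetch(buildUpdateUrl(collection), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify([buildSolrDoc(id, text)]),
  });
  if (!res.ok) throw new Error(`Solr update failed: HTTP ${res.status}`);
}
```

With this split, a call such as `indexFile("docs", "my report.pdf", file)` 
would touch Solr only after extraction has succeeded, so a file that upsets 
Tika should surface as a failed request rather than a crashed Solr node.
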
Geoff

> On Mar 20, 2019, at 12:51 PM, Geoffrey Willis <gwilli...@yahoo.com.INVALID> 
> wrote:
> 
> Could you expand on that please? I’m currently building a webApp that submits 
> documents to Solr/Tika via the update/extract handler and it’s working fine. 
> What do you mean when you say “You do not want to have your Solr instance 
> processing via Tika”? If that’s a bad design choice please elaborate. 
> Thanks,
> Geoff
> 
> 
>> On Mar 19, 2019, at 5:15 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
>> 
>> As per Erick's advice, I would strongly recommend that you do anything Tika 
>> in a separate SolrJ programme. You do not want to have your Solr instance 
>> processing via Tika.
>> 
>> -----Original Message-----
>> From: Tannen, Lev (USAEO) [Contractor] <lev.tan...@usdoj.gov.INVALID>
>> Sent: Wednesday, 20 March 2019 08:17
>> To: solr-user@lucene.apache.org
>> Subject: RE: Upgrading tika
>> 
>> Sorry Erick,
>> Please disregard my previous message. Somehow I downloaded the version 
>> without those two files. I am going to download the latest version, Solr 
>> 8.0.0, and try it.
>> Best
>> Lev Tannen
>> 
>> -----Original Message-----
>> From: Erick Erickson <erickerick...@gmail.com>
>> Sent: Tuesday, March 19, 2019 2:48 PM
>> To: solr-user <solr-user@lucene.apache.org>
>> Subject: Re: Upgrading tika
>> 
>> Yes, Solr is distributed with Tika. Look in:
>> ./solr/contrib/extraction/lib
>> 
>> Tika is upgraded when new versions come out, so the underlying files are 
>> whatever are current at the time.
>> 
>> The integration is a fairly loose coupling. If you're using some external 
>> program (say a SolrJ program) to parse the files, there's no requirement to 
>> use the jars distributed with Solr; use whatever suits your fancy. An 
>> external program just constructs a SolrInputDocument to send to Solr. What 
>> you use to create that document is irrelevant. See:
>> https://lucidworks.com/2012/02/14/indexing-with-solrj/ for some background.
>> 
>> If you're using the ExtractingRequestHandler, where you just send the 
>> semi-structured docs to Solr (PDFs, Word or whatever), then needing to know 
>> anything about individual Tika-related jar files is kind of strange.
>> 
>> If your predecessors wrote some custom code that runs as part of Solr, I 
>> don't know what to say...
>> 
>> Best,
>> Erick
>> 
>> On Tue, Mar 19, 2019 at 10:47 AM Tannen, Lev (USAEO) [Contractor] 
>> <lev.tan...@usdoj.gov.invalid> wrote:
>>> 
>>> Thank you Shawn.
>>> I assumed that Tika had been integrated with Solr. In the project written 
>>> before me they used two Tika files taken from the Solr distribution. I am 
>>> trying to do the same with Solr 7.7.1. However, this version contains a 
>>> different set of Tika-related files, so I am confused. Does Solr not have 
>>> Tika integrated anymore, or can I just not recognize the files?
>>> 
>>> -----Original Message-----
>>> From: Shawn Heisey <apa...@elyograg.org>
>>> Sent: Tuesday, March 19, 2019 11:11 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Upgrading tika
>>> 
>>> On 3/19/2019 9:03 AM, levtannen wrote:
>>>> Could anybody tell me what files I need in order to use the latest
>>>> version of Tika and where to find them?
>>> 
>>> This mailing list is solr-user.  Tika is an entirely separate project from 
>>> Solr within the Apache Foundation.  To get help with Tika, you'll need to 
>>> ask that project.
>>> 
>>> https://tika.apache.org/mail-lists.html
>>> 
>>> Thanks,
>>> Shawn
> 
