Re: PDF extraction using Tika

2020-08-26 Thread Walter Underwood
one example. You should run Tika separately as it's entirely possible for it to fail to parse a PDF and crash - and if you're running it in DIH & Solr it then brings down everything. Separate your PDF processing from your Solr &

RE: [EXT] Re: PDF extraction using Tika

2020-08-26 Thread Hanjan, Harinderdeep S.
's JVM's memory footprint. For example, the following will limit it to 2GB > java -Xmx2048m -jar tika-server-1.24.jar - H -Original Message- From: Jan Høydahl [mailto:jan@cominvent.com] Sent: August 26, 2020 6:19 AM To: solr-user Subject: [EXT] Re: PDF extraction using Tika W

Re: PDF extraction using Tika

2020-08-26 Thread Jan Høydahl
e for it to fail to parse a PDF and crash - and if you're running it in DIH & Solr it then brings down everything. Separate your PDF processing from your Solr indexing. Cheers Charlie

Re: PDF extraction using Tika

2020-08-26 Thread Charlie Hull
a PDF and crash - and if you're running it in DIH & Solr it then brings down everything. Separate your PDF processing from your Solr indexing. Cheers Charlie Thanks, Srinivas Kashyap -Original Message- From: Alexandre Rafalovitch Sent: 24 August 2020 20:54 To: solr-user Subj

RE: PDF extraction using Tika

2020-08-25 Thread Srinivas Kashyap
Thanks Phil, I will modify it according to the need. Thanks, Srinivas -Original Message- From: Phil Scadden Sent: 26 August 2020 02:44 To: solr-user@lucene.apache.org Subject: RE: PDF extraction using Tika Code for solrj is going to be very dependent on your needs but the beating

RE: PDF extraction using Tika

2020-08-25 Thread Phil Scadden
Admin", password); UpdateResponse ur = req.process(solr,"prindex"); req.commit(solr, "prindex"); -----Original Message----- From: Srinivas Kashyap Sent: Tuesday, 25 August 2020 17:04 To: solr-user@lucene.apache.org Subject: RE: PDF extraction usi

Re: PDF extraction using Tika

2020-08-25 Thread Joe Doupnik
r Subject: Re: PDF extraction using Tika The issue seems to be more with a specific file and at the level way below Solr's or possibly even Tika's: Caused by: java.io.IOException: expected='>' actual=' ' at offset 2383 at org.apache.pdfbox.pdfpar

Re: PDF extraction using Tika

2020-08-25 Thread Charlie Hull
e Rafalovitch Sent: 24 August 2020 20:54 To: solr-user Subject: Re: PDF extraction using Tika The issue seems to be more with a specific file and at the level way below Solr's or possibly even Tika's: Caused by: java.io.IOException: expected='>' act

RE: PDF extraction using Tika

2020-08-24 Thread Srinivas Kashyap
from PDF and pushes into solr? Thanks, Srinivas Kashyap -Original Message- From: Alexandre Rafalovitch Sent: 24 August 2020 20:54 To: solr-user Subject: Re: PDF extraction using Tika The issue seems to be more with a specific file and at the level way below Solr's or possibly

Re: PDF extraction using Tika

2020-08-24 Thread Alexandre Rafalovitch
The issue seems to be more with a specific file and at the level way below Solr's or possibly even Tika's: Caused by: java.io.IOException: expected='>' actual=' ' at offset 2383 at org.apache.pdfbox.pdfparser.BaseParser.readExpectedChar(BaseParser.java:1045) Are you indexing the sa