Re: TIKA OCR not working

2015-04-29 Thread Erick Erickson
b.vn; solr-user@lucene.apache.org Subject: FW: TIKA OCR not working. Trung, I haven't experimented with our OCR parser yet, but this should give a good start: https://wi

Re: TIKA OCR not working

2015-04-28 Thread trung.ht
/TikaOCR . Have you installed tesseract? Tika colleagues, any other tips? What else has to be configured and how? -----Original Message----- From: trung.ht [mailto:trung...@anlab.vn]

Re: TIKA OCR not working

2015-04-27 Thread trung.ht
d how? -----Original Message----- From: trung.ht [mailto:trung...@anlab.vn] Sent: Friday, April 24, 2015 11:22 PM To: solr-user@lucene.apache.org Subject: Re: TIKA OCR not working. Hi everyone, Does

Re: TIKA OCR not working

2015-04-27 Thread Mattmann, Chris A (3980)
-----Original Message----- From: Konstantin Gribov Reply-To: "u...@tika.apache.org" Date: Monday, April 27, 2015 at 12:43 PM To: "u...@tika.apache.org" Cc: "trung...@anlab.vn", "solr-user@lucene.apache.org" Subject: Re: TIKA OCR not working

Re: TIKA OCR not working

2015-04-27 Thread Konstantin Gribov
v] Sent: Monday, April 27, 2015 4:29 PM To: u...@tika.apache.org Cc: trung...@anlab.vn; solr-user@lucene.apache.org Subject: Re: TIKA OCR not working. It should work out of the box in Solr as long as Tesseract is installed and on the cl

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
t else has to be configured and how? -----Original Message----- From: trung.ht [mailto:trung...@anlab.vn] Sent: Friday, April 24, 2015 11:22 PM To: solr-user@lucene.apache.org Subject: Re: TIKA OCR not working. Hi everyone, Does anyone have the answ

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
che.org Cc: trung...@anlab.vn; solr-user@lucene.apache.org Subject: Re: TIKA OCR not working. It should work out of the box in Solr as long as Tesseract is installed and on the class path. Solr had an issue with it since Tika sends 2 startDocument calls, but
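Tika's OCR support runs the external `tesseract` executable, so a quick first check, before debugging Solr itself, is whether that binary can be found on the machine running Solr. The following is a minimal Python sketch, assuming the default executable name `tesseract` and a lookup on the system PATH; Tika can also be pointed at an explicit install location through its `TesseractOCRConfig`, which this check does not cover. The function name `tesseract_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        # Not on PATH: a PATH-based lookup would find nothing to run for OCR.
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some Tesseract releases print the version banner on stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract not found on PATH; Tika will not be able to OCR images")
    else:
        print("found:", version)
```

Run it as the same user that runs Solr, since PATH can differ between a login shell and a service account.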

Re: TIKA OCR not working

2015-04-27 Thread Mattmann, Chris A (3980)
should give a good start: https://wiki.apache.org/tika/TikaOCR . Have you installed tesseract? Tika colleagues, any other tips? What else has to be configured and how? -----Original Message----- From: trung.ht [mailto:trung...@anlab.vn] Sent: Friday, Apr

Re: TIKA OCR not working

2015-04-24 Thread trung.ht
Hi everyone, Does anyone have the answer to this problem :)? I read the Tika documentation. Tika 1.7 supports OCR and Solr 5.0 uses Tika 1.7, but it looks like it does not work. Does anyone know whether TIKA OCR works automatically with Solr, or do I have to change some settings? Trung. It's n

Re: TIKA OCR not working

2015-04-23 Thread trung.ht
Hi Jack, Alexandre, Thanks for answering. I read the Tika documentation. Tika 1.7 supports OCR and Solr 5.0 uses Tika 1.7, but it looks like it does not work. Does anyone know whether TIKA OCR works automatically with Solr, or do I have to change some settings? Trung. On Thu, Apr 23, 2015 at 10:02 PM, Ja

Re: TIKA OCR not working

2015-04-23 Thread Jack Krupansky
It's not clear if OCR would happen automatically in Solr Cell, or if changes to Solr would be needed. For Tika OCR info, see: https://issues.apache.org/jira/browse/TIKA-93 https://wiki.apache.org/tika/TikaOCR -- Jack Krupansky On Thu, Apr 23, 2015 at 9:14 AM, Alexandre Rafalovitch wrote:

Re: TIKA OCR not working

2015-04-23 Thread Alexandre Rafalovitch
I think OCR is in Tika 1.8, so it might be in Solr 5.?. But I haven't seen it in use yet. Regards, Alex On 23 Apr 2015 10:24 pm, "Ahmet Arslan" wrote: Hi Trung, I didn't know about the OCR capabilities of Tika. Someone who is familiar with solr-cell can tell us whether this functionalit

Re: TIKA OCR not working

2015-04-23 Thread Ahmet Arslan
Hi Trung, I didn't know about the OCR capabilities of Tika. Someone who is familiar with solr-cell can tell us whether this functionality has been added to Solr or not. Ahmet On Thursday, April 23, 2015 2:06 PM, trung.ht wrote: Hi Ahmet, I used a PNG file, not a PDF file. From the documentation, I under

Re: TIKA OCR not working

2015-04-23 Thread trung.ht
Hi Ahmet, I used a PNG file, not a PDF file. From the documentation, I understand that Solr passes the file to Tika, and OCR has been included since Tika 1.7. Is there something I misunderstood? Trung. On Thu, Apr 23, 2015 at 5:59 PM, Ahmet Arslan wrote: Hi Trung, solr-cell (tika) does not do O
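One way to tell whether Tika is producing any OCR text for the PNG is to ask Solr Cell to return what it extracted without indexing anything, using the ExtractingRequestHandler's `extractOnly=true` parameter. The following is a minimal Python sketch that only builds the request; the base URL, the core name `collection1`, and the helper name `extract_only_request` are hypothetical placeholders, and the file bytes are sent as the raw POST body with their MIME type in `Content-Type`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def extract_only_request(base_url: str, core: str, data: bytes, content_type: str) -> Request:
    """Build a POST to /update/extract that returns Tika's output instead of indexing it."""
    params = urlencode({"extractOnly": "true", "wt": "json"})
    url = f"{base_url.rstrip('/')}/solr/{core}/update/extract?{params}"
    # The raw file goes in the request body; Content-Type tells Tika what it is parsing.
    return Request(url, data=data, headers={"Content-Type": content_type}, method="POST")
```

Sending it with `urllib.request.urlopen(req)` against a running Solr should return the extracted content; if that content is empty for a scanned image, OCR is not running, which points back at the Tesseract installation rather than the schema.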

Re: TIKA OCR not working

2015-04-23 Thread Ahmet Arslan
Hi Trung, solr-cell (Tika) does not do OCR. It cannot extract text from image-based PDFs. Ahmet On Thursday, April 23, 2015 7:33 AM, trung.ht wrote: Hi, I want to use Solr to index some scanned documents. After setting up the Solr documents with two fields, "content" and "filename", I tried to up
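For the setup described in the original question (two fields, "content" and "filename"), a Solr Cell indexing request can set `filename` with a `literal.` parameter and route Tika's extracted body text with `fmap.content`, which overrides any `fmap.content` default in `solrconfig.xml`. The following is a minimal Python sketch that only builds the request; the base URL, the core name, the unique key field `id`, and the helper name `index_scan_request` are assumptions, not details from the thread. Even with this in place, text only appears in "content" if Tesseract is installed where Solr runs.

```python
from urllib.parse import urlencode
from urllib.request import Request


def index_scan_request(base_url: str, core: str, doc_id: str, filename: str,
                       data: bytes, content_type: str) -> Request:
    """Build a Solr Cell request that indexes one file into "content" and "filename"."""
    params = urlencode({
        "literal.id": doc_id,           # assumed uniqueKey field named "id"
        "literal.filename": filename,   # stored as given, not extracted by Tika
        "fmap.content": "content",      # send Tika's extracted body text to "content"
        "commit": "true",
    })
    url = f"{base_url.rstrip('/')}/solr/{core}/update/extract?{params}"
    # The raw file is the POST body; Content-Type tells Tika how to parse it.
    return Request(url, data=data, headers={"Content-Type": content_type}, method="POST")
```

Building the request separately from sending it makes it easy to inspect the exact URL and parameters before posting a batch of scans.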