Re: Get content in response from ExtractingRequestHandler

2015-07-15 Thread trung.ht
mplications. If it turns out that you > have to do this, consider running Tika in the app layer and > doing the extraction on demand there. It's not very hard, see: > https://lucidworks.com/blog/indexing-with-solrj/ > and ignore the db bits. > > Best, > Erick > > On T

Get content in response from ExtractingRequestHandler

2015-07-09 Thread trung.ht
Hi everyone, I use Solr to index and search office files (docx, pptx, ...). To reduce the size of the Solr index, I do not store the file content in Solr; however, my customer now wants to preview the content of the files. I have read the documentation for ExtractingRequestHandler, but it seems that

Re: TIKA OCR not working

2015-04-28 Thread trung.ht
Hi Uwe, Today I downloaded Solr 5.1 and it worked fine. It seems that the fix for SOLR-7139 is only included in 5.1, not 5.0. Thanks, everyone, for your support. Trung. On Tue, Apr 28, 2015 at 10:21 AM, trung.ht wrote: > Hi Uwe, > > Thanks for the answer, but it looks like it does no

Re: TIKA OCR not working

2015-04-27 Thread trung.ht
t; > > > I haven't experimented with our OCR parser yet, but this should give a > good > > start: https://wiki.apache.org/tika/TikaOCR . > > > > Have you installed tesseract? > > > > Tika colleagues, > > Any other tips? What else has to be configured an

Re: TIKA OCR not working

2015-04-24 Thread trung.ht
>> > >> > Regards, >> > Alex >> > On 23 Apr 2015 10:24 pm, "Ahmet Arslan" >> wrote: >> > >> > > Hi Trung, >> > > >> > > I didn't know about OCR capabilities of tika. >> > > Someone

Re: TIKA OCR not working

2015-04-23 Thread trung.ht
"Ahmet Arslan" > wrote: > > > > > Hi Trung, > > > > > > I didn't know about OCR capabilities of tika. > > > Someone who is familiar with solr-cell can inform us whether this > > > functionality is added to solr or not. > > >

Re: TIKA OCR not working

2015-04-23 Thread trung.ht
es not do OCR. It cannot extract text from image-based > PDFs. > > Ahmet > > > > On Thursday, April 23, 2015 7:33 AM, trung.ht wrote: > > > > Hi, > > I want to use Solr to index some scanned documents. After setting up a Solr > document with two fields "c

TIKA OCR not working

2015-04-22 Thread trung.ht
Hi, I want to use Solr to index some scanned documents. After setting up a Solr document with two fields, "content" and "filename", I tried to upload the attached file, but the indexed content of the file is only "\n \n \n". However, when I run tesseract from the command line, I get the result corre