subject:"Indexing speed reduced significantly with OCR"

Re: Indexing speed reduced significantly with OCR

2017-03-31 Thread Zheng Lin Edwin Yeo

..@gmail.com] > Sent: Thursday, 30 March 2017 4:53 p.m. > To: solr-user@lucene.apache.org > Subject: Re: Indexing speed reduced significantly with OCR > > Thanks for your reply. > > From what I see, getting more hardware to do the OCR is inevitable? > > Even if we run the OCR o

RE: Indexing speed reduced significantly with OCR

2017-03-30 Thread Phil Scadden

Yes, that would seem an accurate assessment of the problem.

-Original Message-
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
Sent: Thursday, 30 March 2017 4:53 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Indexing speed reduced significantly with OCR

Thanks for your reply

Re: Indexing speed reduced significantly with OCR

2017-03-30 Thread Walter Underwood

om] > Sent: Thursday, March 30, 2017 7:37 AM > To: solr-user@lucene.apache.org > Subject: Re: Indexing speed reduced significantly with OCR > > The workflow is > -/ OCR new documents > -/ check quality and tune until you get good output text -/ keep the output > text in the

RE: Indexing speed reduced significantly with OCR

2017-03-30 Thread Allison, Timothy B.

> Note that the OCRing is a separate task from Solr indexing, and is best done
> on separate machines.

+1

-Original Message-
From: Rick Leir [mailto:rl...@leirtech.com]
Sent: Thursday, March 30, 2017 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing speed r

Re: Indexing speed reduced significantly with OCR

2017-03-30 Thread Rick Leir

The workflow is
-/ OCR new documents
-/ check quality and tune until you get good output text
-/ keep the output text in the file system
-/ index and re-index to Solr as necessary from the file system

Note that the OCRing is a separate task from Solr indexing, and is best done on separate mach
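The workflow in this message (OCR once, keep the text on the file system, index from there) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `ocr_to_cache`, `build_solr_docs`, the `ocr` callable, and the `id`/`content` field names are hypothetical and do not come from the thread. Sending the documents to Solr is left out.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def ocr_to_cache(sources: Iterable[Path], cache_dir: Path,
                 ocr: Callable[[Path], str]) -> List[Path]:
    """Run OCR only for documents that have no cached text yet.

    `ocr` is a hypothetical callable (for example a wrapper around
    Tesseract) that takes a source file and returns its text.
    Returns the list of text files written during this call.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sources:
        out = cache_dir / (src.stem + ".txt")
        if out.exists():
            continue  # already OCRed; re-indexing never repeats the slow step
        out.write_text(ocr(src), encoding="utf-8")
        written.append(out)
    return written


def build_solr_docs(cache_dir: Path) -> List[Dict[str, str]]:
    """Build index-ready documents from the cached text files.

    The field names "id" and "content" are assumptions; use whatever
    the target Solr schema defines.
    """
    return [
        {"id": p.stem, "content": p.read_text(encoding="utf-8")}
        for p in sorted(cache_dir.glob("*.txt"))
    ]
```

Because the slow OCR step writes to disk, a schema change only needs `build_solr_docs` to run again. The OCR machines and the Solr machine then share nothing but the cache directory.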

Re: Indexing speed reduced significantly with OCR

2017-03-29 Thread Zheng Lin Edwin Yeo

Thanks for your reply. From what I see, getting more hardware to do the OCR is inevitable? Even if we run the OCR outside of the Solr indexing stream, it will still take a long time to process if it is on just one machine. And we still need to wait for the OCR to finish converting before we can r
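If the OCR work does end up spread over several machines, one simple way to divide it is to assign each document to a worker by hashing its identifier. The sketch below is an assumption for illustration, not something proposed in the thread; `shard_for` and `partition` are hypothetical names.

```python
import hashlib
from typing import Iterable, List


def shard_for(doc_id: str, num_workers: int) -> int:
    """Map a document id to a worker index in [0, num_workers).

    A SHA-1 digest is used instead of Python's built-in hash() so the
    assignment is identical on every machine and across restarts.
    """
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(doc_ids: Iterable[str], num_workers: int) -> List[List[str]]:
    """Split document ids into one work list per OCR worker."""
    shards: List[List[str]] = [[] for _ in range(num_workers)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_workers)].append(doc_id)
    return shards
```

Each worker can compute its own list without a coordinator, and a document always lands on the same worker, so nothing is OCRed twice.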

RE: Indexing speed reduced significantly with OCR

2017-03-28 Thread Phil Scadden

Well, I haven’t had to deal with a problem that size, but it seems to me that you have little alternative except to throw more computer hardware at it. For the job I did, I ran OCR to convert PDFs to searchable PDFs outside the indexing workflow. I used the pdftotext utility to extract text from the PDFs. If t
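One way to avoid OCR where it is not needed is to try plain text extraction first and fall back to OCR only when almost nothing comes out. The sketch below assumes the `pdftotext` command from poppler-utils is on the PATH. The `needs_ocr` helper and its `min_chars` threshold are hypothetical and would need tuning against real documents.

```python
import subprocess
from pathlib import Path


def extract_text(pdf: Path) -> str:
    """Extract the text layer of a PDF with pdftotext.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def needs_ocr(text: str, min_chars: int = 20) -> bool:
    """Guess whether a PDF is scanned, from its extracted text.

    Counts non-whitespace characters; page-break form feeds and blank
    lines do not count. The default threshold is an arbitrary assumption.
    """
    return len("".join(text.split())) < min_chars
```

A caller would run `needs_ocr(extract_text(path))` per file and send only the files that return `True` to the OCR queue.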

Re: Indexing speed reduced significantly with OCR

2017-03-28 Thread Walter Underwood

t is searchable (text extractable without OCR). >>> >>> -Original Message- >>> From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] >>> Sent: Tuesday, 28 March 2017 4:13 p.m. >>> To: solr-user@lucene.apache.org >>> Subject: Index

Re: Indexing speed reduced significantly with OCR

2017-03-28 Thread Zheng Lin Edwin Yeo

documents I am working with OCR can be 100 times slower than indexing a PDF >> that is searchable (text extractable without OCR). >> >> -Original Message- >> From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] >> Sent: Tuesday, 28 March 2017 4:13 p.m. >>

Re: Indexing speed reduced significantly with OCR

2017-03-27 Thread Zheng Lin Edwin Yeo

with OCR can be 100 times slower than indexing a PDF > that is searchable (text extractable without OCR). > > -Original Message- > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] > Sent: Tuesday, 28 March 2017 4:13 p.m. > To: solr-user@lucene.apache.org > S

RE: Indexing speed reduced significantly with OCR

2017-03-27 Thread Phil Scadden

: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] Sent: Tuesday, 28 March 2017 4:13 p.m. To: solr-user@lucene.apache.org Subject: Indexing speed reduced significantly with OCR Hi, Is Solr's indexing speed reduced significantly when we use Tesseract OCR to extract scanned inline

Indexing speed reduced significantly with OCR

2017-03-27 Thread Zheng Lin Edwin Yeo

Hi, Is Solr's indexing speed reduced significantly when we use Tesseract OCR to extract scanned inline images from PDFs? After I implemented the solution to extract those scanned images from PDFs, indexing became more than 10 times slower. I'm using Solr

12 matches
