> From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> Sent: Thursday, 30 March 2017 4:53 p.m.
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing speed reduced significantly with OCR
>
> Thanks for your reply.
>
> From what I see, getting more hardware to do the OCR is inevitable?
>
> Even if we run the OCR outside of the Solr indexing stream, it will still
> take a long time to process on just one machine. […]
Yes, that would seem an accurate assessment of the problem.
-Original Message-
From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
Sent: Thursday, 30 March 2017 4:53 p.m.
To: solr-user@lucene.apache.org
Subject: Re: Indexing speed reduced significantly with OCR
Thanks for your reply. […]
> From: Rick Leir [mailto:rl...@leirtech.com]
> Sent: Thursday, March 30, 2017 7:37 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing speed reduced significantly with OCR
>
> The workflow is
> -/ OCR new documents
> -/ check quality and tune until you get good output text
> -/ keep the output text in the file system
> -/ index and re-index to Solr as necessary from the file system
> Note that the OCRing is a separate task from Solr indexing, and is best done
> on separate machines.
+1
-Original Message-
From: Rick Leir [mailto:rl...@leirtech.com]
Sent: Thursday, March 30, 2017 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing speed reduced significantly with OCR
The workflow is
-/ OCR new documents
-/ check quality and tune until you get good output text
-/ keep the output text in the file system
-/ index and re-index to Solr as necessary from the file system
Note that the OCRing is a separate task from Solr indexing, and is best done on
separate machines.
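As a rough illustration of the last two steps (keep the output text in the file system, then index from there), here is a minimal Python sketch. The core name (`docs`), field name (`content_txt`), directory path, and Solr URL are all invented for the example; the posting itself is just a JSON array sent to Solr's standard `/update` endpoint.

```python
import json
import pathlib
import urllib.request

# Hypothetical locations -- adjust to your own layout and core name.
TEXT_DIR = pathlib.Path("/data/ocr-output")
SOLR_UPDATE_URL = "http://localhost:8983/solr/docs/update?commit=true"

def text_file_to_doc(path):
    """Turn one OCR output file into a Solr document (id = file stem)."""
    return {"id": path.stem, "content_txt": path.read_text(encoding="utf-8")}

def index_directory(text_dir=TEXT_DIR, url=SOLR_UPDATE_URL):
    """Send all extracted text files to Solr in one JSON update request."""
    docs = [text_file_to_doc(p) for p in sorted(text_dir.glob("*.txt"))]
    req = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because the text files persist on disk, re-indexing (schema changes, new Solr version) never has to repeat the expensive OCR step.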
Thanks for your reply.
From what I see, getting more hardware to do the OCR is inevitable?
Even if we run the OCR outside of the Solr indexing stream, it will still
take a long time to process on just one machine. And we still need
to wait for the OCR to finish converting before we can […]
Well, I haven't had to deal with a problem that size, but it seems to me that
you have little alternative except to throw more computer hardware at it. For
the job I did, I OCRed to convert PDF to searchable PDF outside the indexing
workflow. I used the pdftotext utility to extract text from the PDF. […]
Converting from PDF to text is embarrassingly parallel. You can throw as many
machines at it as you want. This is a great time to use a cloud computing
service. Need 1000 machines? No problem.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
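Since each PDF converts independently of the others, distributing the work is just a matter of splitting the file list. A minimal round-robin sharder (how each bucket gets dispatched to its machine, cloud instance or otherwise, is left open):

```python
def shard(paths, n_machines):
    """Split a corpus into n_machines roughly equal work lists.
    Round-robin assignment; each list can go to its own OCR worker."""
    buckets = [[] for _ in range(n_machines)]
    for i, path in enumerate(paths):
        buckets[i % n_machines].append(path)
    return buckets
```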
Hi,
Do you have any suggestions on how we can cope with the expensive process
of indexing documents which require OCR?
For my current situation, the indexing takes about 2 weeks to complete. If
the average indexing speed is, say, 50 times slower, it means it will
require 100 weeks to index […]
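The arithmetic above (2 weeks at a 50x slowdown giving 100 weeks) generalises to a simple back-of-envelope estimator. The division by machine count assumes near-linear scaling, which only holds because OCR is embarrassingly parallel:

```python
def weeks_to_index(baseline_weeks, slowdown_factor, n_machines=1):
    """Estimate wall-clock weeks: baseline indexing time, scaled by the
    per-document OCR slowdown, divided across independent OCR machines."""
    return baseline_weeks * slowdown_factor / n_machines

# The thread's numbers: 2 weeks of plain indexing, OCR ~50x slower.
one_machine = weeks_to_index(2, 50)       # 100 weeks on a single box
many_machines = weeks_to_index(2, 50, 25) # ~4 weeks across 25 machines
```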
Yes, the sample document sizes are not very big. Also, the sample set is a
mixture of documents that consist of inline images and documents which are
searchable (text extractable without OCR).
I suppose only those documents which require OCR will slow down the
indexing? Which is […]
Only by 10? You must have quite small documents. OCR is an extremely expensive
process; indexing is trivial by comparison. For the quite large documents I am
working with, OCR can be 100 times slower than indexing a PDF that is searchable
(text extractable without OCR).
Sorry, here are some details:
requestHandler: XmlUpdateRequestHandler
protocol: http (10 concurrent threads)
document: 1kb size, 15 fields
cpu load: 20%
memory usage: 50%
But generally speaking, is that normal, or must there be something wrong with my
configuration?
2011/6/17 Erick Erickson
No, generally this isn't what I'd expect. There will be periodic
slowdowns when segments are flushed (I'm assuming
you're not using trunk, there have been speedups here, see:
http://blog.jteam.nl/2011/04/01/gimme-all-resources-you-have-i-can-use-them/)
Does your config have any parameters set? […]
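For reference, the flush-related knobs Erick alludes to live in solrconfig.xml. In recent Solr versions they sit under <indexConfig> (in the 3.x era of this thread the equivalent settings lived under <indexDefaults>/<mainIndex>); the values below are purely illustrative, not recommendations:

```xml
<!-- solrconfig.xml: illustrative values only -->
<indexConfig>
  <!-- Flush the in-memory buffer to a segment once it reaches 256 MB -->
  <ramBufferSizeMB>256</ramBufferSizeMB>
  <!-- ...or once this many docs are buffered, whichever comes first -->
  <maxBufferedDocs>100000</maxBufferedDocs>
</indexConfig>
```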
Well, it's kinda hard to say anything pertinent with so little
information. How are you indexing things? What kind of documents?
How are you feeding docs to Solr?
You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Fri, Jun 17, 2011 at 8:10 AM, Mark Schoy wrote:
> Hi, […]
On 16-Sep-07, at 8:01 PM, erolagnab wrote:
Hi,
Just an FYI.
I've seen some posts mention that Solr can index 100-150 docs/s, and the
comparison between embedded Solr and HTTP. I've tried to do the indexing
with 1.7+ million docs; each doc has 30 fields, among which 10 fields are
indexed […]
On 8/15/07, Lance Norskog <[EMAIL PROTECTED]> wrote:
> Is indexing via solrj faster than going through the web service? There are
> three cases:
> Read a file from a local file system and index it directly,
> Read a file on one machine and index it on another, and
> Run solrj and […]