ae...@dot.wi.gov]
> Sent: Monday, August 15, 2011 14:54
> To: solr-user@lucene.apache.org
> Subject: RE: ideas for indexing large amount of pdf docs
Note on i: Solr replication provides pretty good clustering support
out-of-the-box, including replication of multiple cores. Read the Wiki on
replication (Google +solr +replication if you don't know where it is).
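For reference, a minimal sketch of the solrconfig.xml pieces involved (the
hostname, core name, and conf file list here are placeholders, not from this
thread):

    <!-- on the master (indexing) node: publish the index after each commit -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on each slave (search) node: poll the master for new index versions -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>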
In my experience, the problem with indexing PDFs is that it takes a lot of CPU
on the machine doing the text extraction.
Date: Sat, 13 Aug 2011 15:34:19 -0400
Subject: Re: ideas for indexing large amount of pdf docs
Ahhh, ok, my reply was irrelevant ...
Here's a good write-up on this problem:
http://www.lucidimagination.com/content/scaling-lucene-and-solr
> idea to minimize this time as much as possible when we go into production.
>
> Best,
>
> Rode.
>
> -----Original Message-----
> From: Erick Erickson
> To: solr-user@lucene.apache.org
> Date: Sat, 13 Aug 2011 12:13:27 -0400
> Subject: Re: ideas for indexing large amount of pdf docs
You could send the PDFs for processing using a queue solution like Amazon SQS,
and kick off Amazon instances to work through the queue (see the sketch below).
Once you have extracted the text with Tika, just send the update to Solr.
Bill Bell
Sent from mobile
On Aug 13, 2011, at 10:13 AM, Erick Erickson wrote:
> Yeah, parsing PDF files
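A minimal sketch of that worker loop with the AWS SDK for Java. The queue URL,
the credentials, and the convention that each message body holds one PDF
location are all assumptions; indexPdf is a hypothetical hook for the
Tika/SolrJ step sketched further down the thread:

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClient;
    import com.amazonaws.services.sqs.model.DeleteMessageRequest;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

    public class PdfQueueWorker {

        public static void main(String[] args) throws Exception {
            AmazonSQS sqs = new AmazonSQSClient(
                    new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
            String queueUrl = "https://queue.amazonaws.com/123456789012/pdf-queue";

            while (true) {
                // Each message body is assumed to carry the location of one PDF
                for (Message m : sqs.receiveMessage(
                        new ReceiveMessageRequest(queueUrl)).getMessages()) {
                    indexPdf(m.getBody());   // Tika extraction + Solr update
                    // Delete only after the document has made it into Solr
                    sqs.deleteMessage(new DeleteMessageRequest(
                            queueUrl, m.getReceiptHandle()));
                }
            }
        }

        private static void indexPdf(String location) {
            // hypothetical hook: see the Tika/SolrJ sketch later in the thread
        }
    }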
idea to minimize this time as much as possible when we go into production.
Best,
Rode.
-----Original Message-----
From: Erick Erickson
To: solr-user@lucene.apache.org
Date: Sat, 13 Aug 2011 12:13:27 -0400
Subject: Re: ideas for indexing large amount of pdf docs
Yeah, parsing PDF files can be pretty resource-intensive, so one solution
is to offload it somewhere else. You can use the Tika libraries in SolrJ
to parse the PDFs on as many clients as you want, just transmitting the
results to Solr for indexing.
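In SolrJ terms, that client-side extraction might look roughly like this (a
sketch against SolrJ 3.x and the Tika facade; the Solr URL and the "id"/"text"
field names are assumptions about the schema, not from this thread):

    import java.io.File;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    public class PdfClientIndexer {

        public static void main(String[] args) throws Exception {
            // Pay the Tika CPU cost on this client, not on the Solr box
            SolrServer solr = new CommonsHttpSolrServer("http://solr-host:8983/solr");
            Tika tika = new Tika();

            for (String path : args) {
                String text = tika.parseToString(new File(path));  // expensive part

                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", path);     // assumes an "id" field in schema.xml
                doc.addField("text", text);   // assumes a "text" field in schema.xml
                solr.add(doc);
            }
            solr.commit();
        }
    }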
How are all these docs being submitted? Is this s