> From: [mailto:ae...@dot.wi.gov]
> Sent: Monday, August 15, 2011 14:54
> To: solr-user@lucene.apache.org
> Subject: RE: ideas for indexing large amount of pdf docs
>
> Note on i: Solr replication provides pretty good clustering support
> out-of-the-box, including replication of m
# Fragment of a Perl test script; the surrounding loop and the
# LWP::UserAgent / HTTP::Request setup were cut off in this excerpt.
{   # per-request block (the original condition was lost in the truncation)
    # dump the query that was sent and the raw Solr response body
    print "Query: lnamesyn:$lname AND fnamesyn:$fname$fuzzy\n";
    print $response->content();
}
print "POST for $fname $lname completed, HTTP status=" .
    $response->code . "\n";
}   # end of the per-name loop
$elapsed = time() - $starttime;
$average = $elapsed / $num_requests;   # assumed: $num_requests = number of POSTs sent (line truncated in the original)
Date: Sat, 13 Aug 2011 15:34:19 -0400
Subject: Re: ideas for indexing large amount of pdf docs
Ahhh, ok, my reply was irrelevant ...
Here's a good write-up on this problem:
http://www.lucidimagination.com/content/scaling-lucene-and-solr
> ... idea to minimize this time as much as possible when we enter production.
>
> Best,
>
> Rode.
>
>
> -----Original Message-----
>
> From: Erick Erickson
>
> To: solr-user@lucene.apache.org
>
> Date: Sat, 13 Aug 2011 12:13:27 -0400
>
> Subject: Re: ideas for indexing large amount of pdf docs
>
>
You could send the PDFs for processing using a queue solution like Amazon SQS,
and kick off Amazon instances to process the queue. Once you have processed
them to text with Tika, just send the update to Solr (a rough sketch of such a
worker follows below).
Bill Bell
Sent from mobile
On Aug 13, 2011, at 10:13 AM, Erick Erickson wrote:
> Yeah, parsing PDF files
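A minimal sketch of the kind of SQS worker described above, assuming the AWS
SDK for Java, Tika's AutoDetectParser, and SolrJ; the queue URL, message
format, and the "id"/"text" field names are placeholders, not details given
in the thread:

    // Sketch only: poll an SQS queue whose messages carry a path to a PDF,
    // extract the text with Tika on this worker, and send only the text to Solr.
    import java.io.FileInputStream;
    import java.io.InputStream;

    import com.amazonaws.services.sqs.AmazonSQSClient;
    import com.amazonaws.services.sqs.model.DeleteMessageRequest;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;

    public class PdfQueueWorker {
        public static void main(String[] args) throws Exception {
            AmazonSQSClient sqs = new AmazonSQSClient();        // default credential chain
            String queueUrl = args[0];                          // hypothetical: queue URL passed in
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

            while (true) {
                for (Message m : sqs.receiveMessage(new ReceiveMessageRequest(queueUrl))
                                    .getMessages()) {
                    String pdfPath = m.getBody();               // assume the body is a path to the PDF
                    BodyContentHandler text = new BodyContentHandler(-1);  // -1 = no size limit
                    InputStream in = new FileInputStream(pdfPath);
                    try {
                        new AutoDetectParser().parse(in, text, new Metadata(), new ParseContext());
                    } finally {
                        in.close();
                    }
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", pdfPath);                // hypothetical schema fields
                    doc.addField("text", text.toString());
                    solr.add(doc);                              // ship only the extracted text
                    sqs.deleteMessage(new DeleteMessageRequest(queueUrl, m.getReceiptHandle()));
                }
                solr.commit();                                  // commit once per poll batch
            }
        }
    }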
... idea to minimize this time as much as possible when we enter production.
Best,
Rode.
-----Original Message-----
From: Erick Erickson
To: solr-user@lucene.apache.org
Date: Sat, 13 Aug 2011 12:13:27 -0400
Subject: Re: ideas for indexing large amount of pdf docs
Yeah, parsing PDF files can be pretty resource-intensive, so one solution
is to offload it somewhere else. You can use the Tika libraries in SolrJ
to parse the PDFs on as many clients as you want, just transmitting the
results to Solr for indexing.
How are all these docs being submitted? Is this s
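A minimal sketch of the client-side approach Erick describes, parsing PDFs
with Tika in a small thread pool and sending only the extracted text to Solr
via SolrJ; the pool size, directory argument, and field names are assumptions:

    // Sketch: parse PDFs on the client with Tika, several at a time,
    // and send only the extracted text to Solr for indexing.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;

    public class ClientSideIndexer {
        public static void main(String[] args) throws Exception {
            final SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            ExecutorService pool = Executors.newFixedThreadPool(4);   // parse 4 PDFs at a time

            for (final File pdf : new File(args[0]).listFiles()) {    // args[0] = directory of PDFs
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            BodyContentHandler text = new BodyContentHandler(-1);
                            InputStream in = new FileInputStream(pdf);
                            try {
                                new AutoDetectParser().parse(in, text, new Metadata(), new ParseContext());
                            } finally {
                                in.close();
                            }
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", pdf.getName());        // hypothetical schema fields
                            doc.addField("text", text.toString());
                            solr.add(doc);                            // send text only, not the PDF
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            solr.commit();
        }
    }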
Hi all,
I want to ask about the best way to implement a solution for indexing a
large number of PDF documents, between 10 and 60 MB each, with 100 to 1000
users connected simultaneously.
I currently have one core of Solr 3.3.0 and it works fine for a small number
of PDF docs, but I'm afraid about the moment