Erick,
That helps; now I can focus on the problem areas. Thanks.
On 3/5/14, 6:03 PM, Erick Erickson wrote:
Here's the easiest way to figure out where to
concentrate your energies: just comment out the
server.add call in your SolrJ program, along with any
commits you're doing from SolrJ.
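Something like this, as a minimal sketch of that experiment
(fetchNextDoc() here is a hypothetical stand-in for whatever
db/web-service reads your program actually does):

    import org.apache.solr.common.SolrInputDocument;

    public class AcquisitionTimer {
        public static void main(String[] args) throws Exception {
            long start = System.currentTimeMillis();
            int count = 0;
            SolrInputDocument doc;
            while ((doc = fetchNextDoc()) != null) {
                // server.add(doc);   // commented out: isolate acquisition from indexing
                count++;
            }
            // server.commit();       // and skip any explicit commits too
            System.out.println(count + " docs acquired in "
                    + (System.currentTimeMillis() - start) + " ms");
        }

        // placeholder for your real source reads (JDBC, web service, files, ...)
        static SolrInputDocument fetchNextDoc() { return null; }
    }

If that loop still takes hours with the add commented out, the
time is going into acquisition, not into Solr.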
My bet: your program will run at about the same speed
it does when you actually index the docs, indicating that
your problem is on the data-acquisition side. Of course,
the older I get, the more times I've been wrong :).
You can also monitor the CPU usage on the box running
Solr. I often see it idling along < 30% when indexing, or
even < 10%, again indicating that the bottleneck is on the
acquisition side.
Note I haven't mentioned any solutions; I'm a believer in
identifying the _problem_ before worrying about a solution.
Best,
Erick
On Wed, Mar 5, 2014 at 4:29 PM, Jack Krupansky <j...@basetechnology.com> wrote:
Make sure you're not doing a commit on each individual document add.
Committing every few minutes, or every few hundred to few thousand
documents, is sufficient. You can set up autoCommit in solrconfig.xml.
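For example, a minimal SolrJ sketch of batched adds with a single
commit at the end (4.x-era API, as in this thread; buildDoc() is a
hypothetical stand-in for your real document construction):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 6000000; i++) {
                batch.add(buildDoc(i));
                if (batch.size() == 1000) {  // one round trip per 1000 docs, no commit
                    server.add(batch);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) server.add(batch);
            server.commit();  // one commit at the end, not one per document
        }

        static SolrInputDocument buildDoc(int i) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            return doc;
        }
    }

With autoCommit (maxDocs/maxTime) configured in solrconfig.xml, you
could drop even that final explicit commit.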
-- Jack Krupansky
-----Original Message-----
From: Rallavagu
Sent: Wednesday, March 5, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: Indexing huge data
All,
Wondering about best/common practices for indexing/re-indexing a huge
amount of data in Solr. The data is about 6 million entries spread
across a db and other sources (the data is not located in one place).
I am trying a SolrJ-based solution to collect the data from the
different sources and index it into Solr. It takes hours to index.
Thanks in advance