Don't load 50M documents in one shot. Break it up into reasonable chunks 
(100K?) with commits at each point.
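Here is a minimal sketch of that chunked load in Python. It assumes the CSV has a header row and that the core accepts CSV posts at a URL like `http://localhost:8983/solr/collection1/update/csv`, which is a placeholder; adjust host, core, and handler path for your setup. Each chunk is sent with `commit=true`, so Solr commits every 100K rows instead of once at the end. The `pause_s` argument is a hypothetical client-side throttle, not a Solr setting. It only sleeps between chunks to give the disks time to catch up.

```python
import csv
import io
import time
import urllib.parse
import urllib.request
from itertools import islice
from typing import Iterator, TextIO


def csv_chunks(stream: TextIO, chunk_size: int = 100_000) -> Iterator[str]:
    """Yield CSV text with the header row plus at most chunk_size data rows."""
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:  # empty file: nothing to send
        return
    while True:
        rows = list(islice(reader, chunk_size))
        if not rows:
            return
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)  # repeat the header so each chunk stands alone
        writer.writerows(rows)
        yield buf.getvalue()


def update_url(base_url: str, commit: bool = True) -> str:
    """Append Solr's commit parameter to an (assumed) CSV update URL."""
    query = urllib.parse.urlencode({"commit": "true" if commit else "false"})
    return f"{base_url}?{query}"


def post_chunk(url: str, body: str) -> int:
    """POST one CSV chunk and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
        return resp.status


def load_in_chunks(path: str, base_url: str, chunk_size: int = 100_000,
                   pause_s: float = 0.0) -> int:
    """Send the file in chunks, committing after each; return the chunk count."""
    url = update_url(base_url, commit=True)
    sent = 0
    with open(path, newline="", encoding="utf-8") as f:
        for chunk in csv_chunks(f, chunk_size):
            post_chunk(url, chunk)
            sent += 1
            if pause_s > 0:
                time.sleep(pause_s)  # hypothetical client-side throttle
    return sent
```

For example, you could call `load_in_chunks("docs.csv", "http://localhost:8983/solr/collection1/update/csv", pause_s=5.0)`, where the file name and URL are placeholders. The sketch parses rows with the `csv` module rather than splitting on newlines, so quoted fields that contain line breaks stay intact.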
You will have a bottleneck somewhere, usually disk or CPU. Yours appears to be 
disk. If you get faster disks, it might become the CPU.
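One way to check which resource is saturated on each node is to sample the kernel's per-device I/O counters. Below is a minimal sketch that assumes Linux nodes. It reads `/proc/diskstats` twice and computes the fraction of the interval each device spent doing I/O, which is roughly the `%util` figure that `iostat -x` reports. The function names are illustrative and are not part of Solr or any library.

```python
import time
from typing import Dict


def read_io_ticks(diskstats_text: str) -> Dict[str, int]:
    """Map device name to total ms spent doing I/O (the 10th stat field)."""
    ticks: Dict[str, int] = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) < 14:  # major, minor, name + at least 11 stat fields
            continue
        ticks[parts[2]] = int(parts[12])
    return ticks


def utilization(before: Dict[str, int], after: Dict[str, int],
                interval_ms: float) -> Dict[str, float]:
    """Fraction of the interval each device was busy, capped at 1.0."""
    return {
        dev: min(1.0, (after[dev] - before[dev]) / interval_ms)
        for dev in after
        if dev in before
    }


def sample_disk_utilization(interval_s: float = 1.0) -> Dict[str, float]:
    """Take two snapshots of /proc/diskstats (Linux only) and compare them."""
    with open("/proc/diskstats") as f:
        before = read_io_ticks(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = read_io_ticks(f.read())
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return utilization(before, after, elapsed_ms)
```

If you run `sample_disk_utilization()` on a node during the load, the index disk should read close to 1.0 when disk is the bottleneck. If the disks are mostly idle while CPU is pegged, the bottleneck has probably moved to the CPU.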

wunder

On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> Bumping this one again, any suggestions?
> 
> 
> On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
> 
>> Hello,
>> 
>> I load data from CSV into Solr via UpdateCSV. There are about 50M documents
>> with 10 columns in each document. The index size is about 15GB, and I am
>> using a 3-node distributed Solr cluster.
>> 
>> While loading the data, disk I/O goes to 100%. If the load balancer in
>> front of Solr sends a request to the machine that is doing the processing,
>> the request times out. In general, though, requests to all the machines
>> become slow. I have attached a screenshot of the disk I/O and CPU usage.
>> 
>> Is there a fix in Solr that can throttle the load, or is this possibly
>> due to the MergePolicy? How can I debug Solr to find the exact cause?
>> 
>> --
>> Thanks,
>> -Utkarsh
>> 

--
Walter Underwood
wun...@wunderwood.org
