Don't load all 50M documents in one shot. Break the load into reasonably sized chunks (100K documents?) and commit after each chunk.
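
Here is a rough sketch of that chunked load in Python, not a tested loader. The update URL, the file path, and the `text/csv` content type are assumptions about your setup, and the helper names (`iter_csv_chunks`, `post_chunk`, `load_file`) are made up for illustration. It rereads the CSV with the `csv` module so quoted fields with embedded newlines are not split across chunks, repeats the header row in every chunk, and asks Solr to commit after each POST.

```python
import csv
import io
import urllib.request
from itertools import islice


def iter_csv_chunks(rows, chunk_size=100_000):
    """Yield CSV text chunks of at most chunk_size data rows, each with the header."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input, nothing to send
    while True:
        batch = list(islice(rows, chunk_size))
        if not batch:
            return
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)   # every chunk must be a valid CSV on its own
        writer.writerows(batch)
        yield buf.getvalue()


def post_chunk(update_url, csv_text):
    """POST one chunk and ask Solr to commit it (update_url is an assumption)."""
    req = urllib.request.Request(
        update_url + "?commit=true",
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def load_file(path, update_url, chunk_size=100_000):
    """Stream the file chunk by chunk instead of sending it in one request."""
    with open(path, newline="", encoding="utf-8") as f:
        for chunk in iter_csv_chunks(csv.reader(f), chunk_size):
            post_chunk(update_url, chunk)
```

Usage would look something like `load_file("docs.csv", "http://localhost:8983/solr/update/csv")`, with both values replaced by your own. Tune `chunk_size` by watching disk I/O while the load runs.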
You will have a bottleneck somewhere, usually disk or CPU. Yours appears to be disk. If you get faster disks, it might become the CPU.

wunder

On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> Bumping this one again, any suggestions?
>
>
> On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>
>> Hello,
>>
>> I load data from csv to solr via UpdateCSV. There are about 50M documents
>> with 10 columns in each document. The index size is about 15GB and I am
>> using a 3 node distributed solr cluster.
>>
>> While loading the data the disk IO goes to 100%. if the load balancer in
>> front of solr hits the machine which is doing the processing then the
>> request times out. But in general, requests to all the machines become
>> slow. I have attached a screenshot of the diskI/O and CPU usage.
>>
>> Is there a fix in solr which can possibly throttle the load or maybe its
>> due to MergePolicy? How can I debug solr to get the exact cause?
>>
>> --
>> Thanks,
>> -Utkarsh
>>
>
>
>
> --
> Thanks,
> -Utkarsh

--
Walter Underwood
wun...@wunderwood.org