Thanks guys! I will start by splitting the file into chunks of 5M documents (10 chunks) and reduce the chunk size further if needed.
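
For reference, here is a rough Python sketch of the chunked load I have in mind, assuming Solr's /update/csv handler with commit=true after each chunk; the host, core name, file path, and chunk size below are illustrative placeholders, not values from my actual setup:

import urllib.request

SOLR_URL = "http://localhost:8983/solr/collection1/update/csv"  # assumed endpoint
CSV_PATH = "/data/docs.csv"  # assumed input file
CHUNK_ROWS = 5_000_000  # 5M rows per chunk; shrink this if disk I/O still saturates

def post_chunk(header, rows):
    # POST one chunk with commit=true so Solr flushes between chunks.
    body = (header + "".join(rows)).encode("utf-8")
    req = urllib.request.Request(
        SOLR_URL + "?commit=true",
        data=body,
        headers={"Content-Type": "text/csv; charset=utf-8"},
    )
    urllib.request.urlopen(req).read()  # raises on HTTP errors

with open(CSV_PATH, encoding="utf-8") as f:
    header = next(f)  # repeat the CSV header line in every chunk
    rows = []
    for line in f:
        rows.append(line)
        if len(rows) >= CHUNK_ROWS:
            post_chunk(header, rows)
            rows = []
    if rows:
        post_chunk(header, rows)  # flush the final partial chunk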
Thanks,
-Utkarsh

On Wed, Nov 13, 2013 at 9:08 AM, Walter Underwood <wun...@wunderwood.org> wrote:

> Don't load 50M documents in one shot. Break it up into reasonable chunks
> (100K?) with commits at each point.
>
> You will have a bottleneck somewhere, usually disk or CPU. Yours appears
> to be disk. If you get faster disks, it might become the CPU.
>
> wunder
>
> On Nov 13, 2013, at 8:22 AM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>
>> Bumping this one again, any suggestions?
>>
>> On Tue, Nov 12, 2013 at 3:58 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I load data from CSV into Solr via UpdateCSV. There are about 50M
>>> documents with 10 columns in each document. The index size is about
>>> 15GB and I am using a 3-node distributed Solr cluster.
>>>
>>> While loading the data, the disk I/O goes to 100%. If the load balancer
>>> in front of Solr hits the machine which is doing the processing, then
>>> the request times out. But in general, requests to all the machines
>>> become slow. I have attached a screenshot of the disk I/O and CPU usage.
>>>
>>> Is there a fix in Solr which can possibly throttle the load, or is it
>>> maybe due to the MergePolicy? How can I debug Solr to find the exact
>>> cause?
>>>
>>> --
>>> Thanks,
>>> -Utkarsh
>>
>> --
>> Thanks,
>> -Utkarsh
>
> --
> Walter Underwood
> wun...@wunderwood.org

--
Thanks,
-Utkarsh