Thanks Erick,
I think I have been able to exhaust a resource: if I split the data in 2 and upload it with 2 clients, with settings like benchmark 1.1, it takes 120s, and here the bottleneck is my LAN. If I use settings like benchmark 1, the bottleneck is probably the ramBuffer.
I'm going to buy a Gigabit ethernet cable so I can run a better test.

About the OutOfMemory error: it's the solrj client that crashes. I'm using solr 4.2.1 and the corresponding solrj client. HttpSolrServer works fine, ConcurrentUpdateSolrServer gives me problems, and I didn't understand how to size the queueSize parameter optimally.
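For reference, the client is roughly along these lines (a simplified sketch, not my exact code: the URL and field values are illustrative, and I've put a much smaller queueSize here than the 20k of benchmark 3, since as far as I understand queueSize counts queued update requests rather than documents, so 20k queued batches of 1k docs could mean an enormous amount of client memory):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ConcurrentBulkIndexer {

        public static void main(String[] args) throws Exception {
            // queueSize = 100 pending update requests, 4 sender threads.
            // With 1k-doc batches this bounds the client-side buffer to ~100k docs.
            // URL and sizing are illustrative, not the benchmark values.
            ConcurrentUpdateSolrServer server =
                    new ConcurrentUpdateSolrServer("http://solrhost:8983/solr/collection1", 100, 4);

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(1000);
            for (int i = 0; i < 1000000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", i);
                doc.addField("title", "title " + i);
                doc.addField("date", new java.util.Date());
                doc.addField("body", "1kb of text...");   // placeholder body
                batch.add(doc);
                if (batch.size() == 1000) {
                    server.add(batch);   // queues the batch; waits when the internal queue is full
                    batch = new ArrayList<SolrInputDocument>(1000);
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);
            }
            server.blockUntilFinished();  // drain the queue before the final commit
            server.commit();
            server.shutdown();
        }
    }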
On 07/Oct/2013, at 14:03, Erick Erickson wrote:

> Just skimmed, but the usual reason you can't max out the server
> is that the client can't go fast enough. Very quick experiment:
> comment out the server.add line in your client and run it again,
> does that speed up the client substantially? If not, then the time
> is being spent on the client.
>
> Or split your csv file into, say, 5 parts and run it from 5 different
> PCs in parallel.
>
> bq: I can't rely on auto commit, otherwise I get an OutOfMemory error
> This shouldn't be happening, I'd get to the bottom of this. Perhaps simply
> allocating more memory to the JVM running Solr.
>
> bq: committing every 100k docs gives worse performance
> It'll be best to specify openSearcher=false for max indexing throughput,
> BTW. You should be able to do this quite frequently, 15 seconds seems
> quite reasonable.
>
> Best,
> Erick
>
> On Sun, Oct 6, 2013 at 12:19 PM, Matteo Grolla <matteo.gro...@gmail.com>
> wrote:
>> I'd like to have some suggestions on how to improve the indexing performance
>> in the following scenario.
>> I'm uploading 1M docs to solr;
>> every doc has
>>   id: sequential number
>>   title: small string
>>   date: date
>>   body: 1kb of text
>>
>> Here are my benchmarks (they are all single executions, not averages from
>> multiple executions):
>>
>> 1) using the updaterequesthandler
>> and streaming docs from a csv file on the same disk as solr,
>> auto commit every 15s with openSearcher=false and commit after the last
>> document
>>
>> total time: 143035ms
>>
>> 1.1) using the updaterequesthandler
>> and streaming docs from a csv file on the same disk as solr,
>> auto commit every 15s with openSearcher=false and commit after the last
>> document
>> <ramBufferSizeMB>500</ramBufferSizeMB>
>> <maxBufferedDocs>100000</maxBufferedDocs>
>>
>> total time: 134493ms
>>
>> 1.2) using the updaterequesthandler
>> and streaming docs from a csv file on the same disk as solr,
>> auto commit every 15s with openSearcher=false and commit after the last
>> document
>> <mergeFactor>30</mergeFactor>
>>
>> total time: 143134ms
>>
>> 2) using a solrj client from another pc on the lan (100Mbps)
>> with httpsolrserver
>> with javabin format
>> add documents to the server in batches of 1k docs ( server.add( <collection> ) )
>> auto commit every 15s with openSearcher=false and commit after the last
>> document
>>
>> total time: 139022ms
>>
>> 3) using a solrj client from another pc on the lan (100Mbps)
>> with concurrentupdatesolrserver
>> with javabin format
>> add documents to the server in batches of 1k docs ( server.add( <collection> ) )
>> server queue size=20k
>> server threads=4
>> no auto-commit and commit every 100k docs
>>
>> total time: 167301ms
>>
>>
>> --On the solr server--
>> cpu averages 25%, at best 100% for 1 core
>> IO is still far from being saturated
>> iostat gives a pattern like this (every 5s):
>>
>> time(s)  %util
>> 100      45,20
>> 105       1,68
>> 110      17,44
>> 115      76,32
>> 120       2,64
>> 125      68,00
>> 130       1,28
>>
>> I thought that by using concurrentupdatesolrserver I would be able to max out
>> cpu or IO, but I wasn't.
>> With concurrentupdatesolrserver I can't rely on auto commit, otherwise I get
>> an OutOfMemory error,
>> and I found that committing every 100k docs gives worse performance than
>> auto commit every 15s (benchmark 3 with httpsolrserver took 193515ms).
>>
>> I'd really like to understand why I can't max out the resources on the
>> server hosting solr (the disk above all),
>> and I'd really like to understand what I'm doing wrong with
>> concurrentupdatesolrserver.
>>
>> thanks
>>
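P.S. For completeness, the plain HttpSolrServer client of benchmark 2 is roughly along these lines (again a simplified sketch, not the exact benchmark code; URL and field values are illustrative). Commenting out the server.add(batch) line, as Erick suggests, shows how much time the client spends just building documents:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class HttpBatchIndexer {

        public static void main(String[] args) throws Exception {
            // autoCommit every 15s with openSearcher=false is configured server-side
            // in solrconfig.xml, so the client only commits once at the end.
            HttpSolrServer server = new HttpSolrServer("http://solrhost:8983/solr/collection1");

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(1000);
            for (int i = 0; i < 1000000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", i);
                doc.addField("title", "title " + i);
                doc.addField("date", new java.util.Date());
                doc.addField("body", "1kb of text...");   // placeholder body
                batch.add(doc);
                if (batch.size() == 1000) {
                    server.add(batch);   // comment this out to measure pure client-side cost
                    batch = new ArrayList<SolrInputDocument>(1000);
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);
            }
            server.commit();      // commit after the last document
            server.shutdown();
        }
    }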