On 12/22/2015 6:46 AM, Bram Van Dam wrote:
> This indexing job has been running for about 5 days now, and is pretty
> much IO-bound. CPU usage is ~50%. The load average, on the other hand,
> has been 128 for 5 days straight. Which is high, but fine: the machine
> is responsive.

A load average of 128 does not sound fine to me, unless you've got 128
CPU cores in this machine. That much CPU power is achievable, but it is
very expensive, and your specs don't sound like anywhere near that many
cores. This load average definitely sounds like a problem.

> Memory usage is fine. Most of it is going towards file system caches
> and the like. Each Solr instance has 8GB Xmx, and is currently using
> about 7GB. I haven't noticed any OutOfMemoryErrors in the log files.

You can't tell anything about JVM heap health from a single data point.
You have to watch usage over time, with samples every few seconds.
Seeing 7GB in use at one moment says nothing about whether the heap is
big enough; what matters is the shape of the usage graph and how often
the garbage collector has to run.
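As an illustration of the kind of sampling I mean, here's a minimal
Java sketch (my own, hypothetical -- in practice you would point
jconsole, jvisualvm, or "jstat -gcutil <pid> 5000" at the Solr process
instead). It reads the same MemoryMXBean and OperatingSystemMXBean
values that remote JMX exposes; run standalone like this, it measures
its own JVM rather than Solr's:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    // Sample heap usage and system load every 5 seconds. A usage graph
    // that repeatedly climbs near the max and barely drops after each
    // collection is the classic sign of a heap that is too small.
    public class HeapSampler {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            int cores = Runtime.getRuntime().availableProcessors();
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                // 1-minute load average; -1.0 if the OS doesn't report it.
                double load = ManagementFactory.getOperatingSystemMXBean()
                        .getSystemLoadAverage();
                System.out.printf("heap %d/%d MB, load %.1f on %d cores%n",
                        heap.getUsed() >> 20, heap.getMax() >> 20,
                        load, cores);
                Thread.sleep(5000);
            }
        }
    }

Logging the load average next to the core count also makes the first
point above concrete: a sustained load of 128 on a handful of cores
means a long queue of threads waiting on CPU or disk.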
> Monitoring shows that both Solr instances have been up throughout
> these proceedings.
>
> Now, I'm willing to accept that these Solr instances don't have
> enough memory, or anything else, but I'm not seeing any of this
> reflected in the log files, which I'm finding troubling.

General advice on how much hardware you need is nearly impossible to
give; there are simply too many variables. I do have some educated
guesses for your situation, though.

200 million documents per collection, even if the documents are small,
likely means dozens or hundreds of gigabytes of index data, with a
fairly high heap requirement to match. You said that the same 200
million docs were loaded into three different collections, which seems
very odd, as it roughly triples the resource requirements.

How many 64GB machines do you have in your SolrCloud? For what you are
asking it to do (600 million docs total), I hope that it's at *least*
6 servers for each replica, and depending on the actual index size on
disk, more may be needed. As a rough, back-of-the-envelope check: one
64GB machine has perhaps 56GB left for the OS disk cache after a
single 8GB Solr heap (less if it runs both instances), so six machines
give roughly 336GB of combined cache -- in the right ballpark only if
the three collections together come to a few hundred gigabytes on
disk. If there are fewer servers, and especially if you've only got
one, your index is far too big for your hardware.

I suspect that you are having two problems, quite possibly at the same
time:

1) Your heap is too small, but not so small that you hit OOM errors.
Java is frequently doing full garbage collections, but each one frees
just enough memory to keep working. Full collections are
stop-the-world pauses, so this shows up as sluggishness rather than as
errors in the log files -- which would fit what you're describing. (A
sketch for checking this is in the P.S. below.)

2) You don't have enough total memory in the machine for the operating
system to cache your index data effectively, so reads keep going to
disk -- consistent with the IO-bound behavior you're seeing.

The only general advice I have regarding how much memory you need is
condensed into this wiki page:

https://wiki.apache.org/solr/SolrPerformanceProblems

That page mentions the "ideal" setup -- enough memory to cache your
entire index. With very large indexes, the budget required to reach
that goal is rarely available, but the ideal setup is not usually
necessary either. There merely needs to be enough memory that most
index reads while querying are cache hits.

Thanks,
Shawn
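P.S. If you want evidence for guess #1 before resizing anything, GC
logging (-verbose:gc, or -Xloggc with a file) is the authoritative
source, but the same counters are available programmatically. Here's a
hedged sketch along the same lines as the sampler above -- note that
bean names vary by collector (e.g. "ConcurrentMarkSweep", "PS
MarkSweep", or "G1 Old Generation" for the old generation):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Print cumulative collection counts and accumulated time for each
    // collector. Poll these beans a minute apart (locally, or over
    // remote JMX against the Solr JVM): an old-generation count that
    // climbs steadily while its time grows by whole seconds means the
    // JVM is spending most of its life in full GCs.
    public class GcCheck {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(),
                        gc.getCollectionTime());
            }
        }
    }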