On 12/22/2015 6:46 AM, Bram Van Dam wrote:
> This indexing job has been running for about 5 days now, and is pretty
> much IO-bound. CPU usage is ~50%. The load average, on the other hand,
> has been 128 for 5 days straight. Which is high, but fine: the machine
> is responsive.

A load average of 128 does not sound fine to me, unless you've got 128
CPU cores in this machine.  That much CPU power is achievable, but it is
very expensive.  Your specs don't sound like you've got anywhere near
that many CPU cores, so this load average definitely sounds like a problem.
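
If you want to put a number on it, compare the load average to the
core count.  Here's a minimal sketch using the standard JMX beans
(the "load per core" rule of thumb is mine, not anything official):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;

    public class LoadCheck {
        public static void main(String[] args) {
            OperatingSystemMXBean os =
                    ManagementFactory.getOperatingSystemMXBean();
            int cores = Runtime.getRuntime().availableProcessors();
            // 1-minute load average, or a negative value if the
            // platform doesn't support it.
            double load = os.getSystemLoadAverage();
            System.out.printf("load=%.2f cores=%d load/core=%.2f%n",
                    load, cores, load / cores);
            // A load/core ratio persistently above ~1.0 means work
            // is queuing faster than the CPUs can service it.
        }
    }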

> Memory usage is fine. Most of it is going towards file system caches and
> the like. Each Solr instance has 8GB Xmx, and is currently using about
> 7GB. I haven't noticed any OutOfMemoryErrors in the log files.

You can't tell anything about JVM heap usage unless you watch it over
time, with samples taken every few seconds.  A single reading of 7GB
tells you nothing about how healthy the heap is, because used-heap at
any instant includes garbage that hasn't been collected yet.  What
matters is the sawtooth pattern -- in particular, how far usage drops
after each full GC.
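
If you don't have a monitoring tool handy, even a crude sampler
shows the pattern.  This sketch watches its own JVM's heap with the
standard MemoryMXBean; to watch Solr itself you'd connect over JMX,
or simply run jstat against the Solr pid (the five-second interval
is arbitrary):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class HeapWatch {
        public static void main(String[] args)
                throws InterruptedException {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = mem.getHeapMemoryUsage();
                // The number that matters is how low "used" drops
                // right after a full GC, not any single reading.
                System.out.printf("heap used=%dMB max=%dMB%n",
                        heap.getUsed() >> 20, heap.getMax() >> 20);
                Thread.sleep(5000);
            }
        }
    }

In practice, "jstat -gcutil <solr-pid> 5000" against the running
Solr process gives you the same picture without writing any code.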

> Monitoring shows that both Solr instances have been up throughout these
> proceedings.
>
> Now, I'm willing to accept that these Solr instances don't have enough
> memory, or anything else, but I'm not seeing any of this reflected in
> the log files, which I'm finding troubling.

Giving general advice about how much hardware you need is nearly
impossible.  There are simply too many variables to consider.  I do
have some educated guesses for your situation, though.

200 million documents per collection, even if they are small, will
likely result in dozens or hundreds of gigabytes of index data.  An
index that size also has a fairly high heap requirement.  You said
that the same 200 million docs were loaded into three different
collections, which seems very odd, as it roughly triples the resource
requirements.
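
For a rough sense of scale, here's the kind of math I mean -- the
per-document size is purely an assumption for illustration:

    200,000,000 docs x ~500 bytes indexed each = ~100 GB per collection
    ~100 GB x 3 collections                    = ~300 GB of index data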

How many 64GB machines do you have in your SolrCloud?  For what you
are asking it to do (600 million docs total), I hope that it's at
*least* 6 servers for each full copy of the index.  Depending on the
actual index size on disk, more may be needed.  If there are fewer
servers, and especially if you've only got one, your index is far too
big for your hardware.
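
To make that concrete, here's the back-of-the-envelope version,
reusing the assumed ~300 GB figure from above:

    ~300 GB of index / 6 servers        = ~50 GB of index per server
    64 GB RAM - one or two 8 GB heaps   = roughly 48-56 GB left per
                                          server for the OS disk cache

With six servers those numbers line up; with one or two they don't
come close.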

I suspect that you are having two problems, quite possibly at the same
time:

1) Your heap is too small, but not so small that you hit OOM errors.
   Java is frequently doing full garbage collections, but each full
   GC frees just enough memory to keep working (the sketch below
   shows one way to confirm this).

2) You don't have enough total memory in the machine to effectively
   cache your index data.
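
To confirm the full-GC suspicion, turn on GC logging (on Java 7/8:
"-verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps") or watch the collector beans.  A minimal
sketch -- collector names vary by GC algorithm; the old-generation
bean is typically "PS MarkSweep", "ConcurrentMarkSweep", or
"MarkSweepCompact":

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcWatch {
        public static void main(String[] args)
                throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc :
                        ManagementFactory.getGarbageCollectorMXBeans()) {
                    // A collection count that climbs steadily on the
                    // old-generation bean means frequent full GCs.
                    System.out.printf("%s: count=%d time=%dms%n",
                            gc.getName(), gc.getCollectionCount(),
                            gc.getCollectionTime());
                }
                Thread.sleep(5000);
            }
        }
    }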

The only general advice I have regarding how much memory you need is
condensed into this wiki page:

https://wiki.apache.org/solr/SolrPerformanceProblems

On that page, there is mention of the "ideal" setup -- where you have
enough memory to cache your entire index.  With very large indexes,
the budget required to reach this goal is rarely available.  The
ideal setup is not usually necessary, though.  There merely needs to
be enough memory for very frequent cache hits while querying the
index.
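
Applying that to the numbers above (again assuming ~300 GB of index
and ~48 GB of free RAM per server across 6 servers):

    6 servers x ~48 GB free RAM = ~288 GB of cache for ~300 GB of index

That's close to the ideal even without caching everything.  With
only one or two servers, the cached fraction drops far enough that
most queries will go to disk.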

Thanks,
Shawn
