Hi,
Thanks Erick for your input. I've added GC logging, but it looked
normal when the error came back this morning. I was adding a large
collection (27 GB): on the first server all went well, but as soon as I
created the core on a second server, that server was almost immediately
disconnected from the cloud.
Right, do note that when you _do_ hit an OOM, you really
should restart the JVM, as nothing is _really_ certain after
that.
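One way to make sure a wounded JVM doesn't linger: the JVM's
OnOutOfMemoryError hook can be pointed at a kill script so your service
manager brings up a fresh process (recent Solr releases ship an
oom_solr.sh in bin/ for exactly this, if I remember right). A minimal
sketch; the script path, port and log directory below are placeholders
for your install:

   # added to the Solr JVM options at startup (e.g. via SOLR_OPTS)
   # on OOM, kill the process so it can be restarted cleanly
   -XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs"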
You're right, just bumping the memory is a band-aid, but
whatever gets you by. Lucene makes heavy use of
MMapDirectory, which uses OS memory rather than JVM
memory, so you're r
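In practice that means leaving a good chunk of RAM outside the Java
heap so the OS page cache can hold the memory-mapped index files. A
rough sketch, assuming you start Solr via bin/solr and the stock
solr.in.sh (the size here is only a placeholder, not a recommendation
for your boxes):

   # solr.in.sh
   # Keep the heap modest; whatever RAM is left over is what the OS
   # page cache (and therefore MMapDirectory) gets to work with.
   SOLR_HEAP="8g"   # bin/solr turns this into -Xms8g -Xmx8g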
OK, great. I've eliminated the OOM errors after increasing the memory
allocated to Solr: 12 GB out of 20 GB. It's probably not an optimal
setting, but it's all I can get right now on the Solr machines. I'll
look into GC logging too.
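From a first look it seems GC logging can be switched on from the same
place the heap is set; this is what I plan to try, assuming the
GC_LOG_OPTS hook in solr.in.sh exists in my version (the log path is
just where I'd put the file):

   # solr.in.sh (Java 8 style flags)
   GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+PrintGCTimeStamps -Xloggc:/var/solr/logs/solr_gc.log"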
Turning to the Solr logs, a quick sweep revealed a lot of "Caused by
ZK isn't pushed all that heavily, although all things are possible. Still,
for maintenance, putting ZK on separate machines is a good idea. They
don't have to be very beefy machines.
Look in your logs for LeaderInitiatedRecovery messages. If you find them,
then _probably_ you have some issues with t
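A quick way to check is to sweep the logs on every node; a sketch,
assuming the default log directory of a service-style install (adjust
the path if your logs live elsewhere):

   # look for leader-initiated recovery events in all Solr logs on a node
   grep -ri "LeaderInitiatedRecovery" /var/solr/logs/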
Thanks, I'll give it a try. Can the load on the Solr servers impair the ZK
response time in the current situation, which would cause the desync? Is
this the reason for the recommended change?
John.
On 21/12/15 16:45, Erik Hatcher wrote:
> John - the first recommendation that pops out is to run (only) 3 zookeep
John - the first recommendation that pops out is to run (only) 3 zookeepers,
entirely separate from the Solr servers, and then as many Solr servers as you
need to scale indexing and querying. Sounds like 3 ZKs + 2 Solrs is a good
start, given you have 5 servers at your disposal.
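To make that concrete: once the external ensemble is up, each Solr node
just gets pointed at it when it starts in cloud mode. A sketch with
placeholder host names and the default ZooKeeper client port (this
assumes you start Solr with the bin/solr script):

   # on each Solr server: cloud mode against the external 3-node ensemble
   bin/solr start -c -z zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181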