This is a minor follow-up to this thread, which includes the required context:

http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html

I haven't solved the problem, but since negative results can also be
useful I thought I would share them.  Things I tried unsuccessfully (on
individual nodes except for the upgrade):

- Upgrade from Cassandra 0.6 to 0.7
- Different collectors: -XX:+UseParallelGC -XX:+UseParallelOldGC
- JNA (but not mlockall)
- Switch disk_access_mode from standard to mmap_index_only (obviously in
this case RSS is less than useful, but the overall memory graph still
looked bad, like this [1]).


On #cassandra there was speculation that a large (200k) row cache may be
inducing heap fragmentation.  I have not ruled this out, but I have been
unable to reproduce it in standalone ConcurrentLinkedHashMap stress
testing.  Since turning off the row cache would be a cure worse than the
disease, I have not tried that yet with a real cluster.
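
For anyone who wants to try a similar standalone test, here is a rough
sketch of the kind of churn loop I mean.  It is not the harness I ran.
To stay self-contained it swaps in an access-ordered java.util.LinkedHashMap
as a single-threaded LRU stand-in for ConcurrentLinkedHashMap, and the
class name, cache size, key space and value sizes are all made-up
assumptions (the capacity is far below 200k so it fits in a default heap).
It only prints used heap per round, so it can hint at growth but does not
measure fragmentation directly; running it with -verbose:gc and reading
the GC log would be the more telling comparison.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stress sketch: churn a bounded LRU cache with variable-size
// values and watch used heap. All names and sizes here are illustrative.
class RowCacheChurn {

    /** Bounded LRU map; a single-threaded stand-in, NOT ConcurrentLinkedHashMap. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = access order, so reads refresh recency
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict the least recently used entry once the bound is exceeded.
            return size() > capacity;
        }
    }

    /** Reads random keys and inserts a random-size value on each miss. */
    static int churn(Map<Integer, byte[]> cache, int keySpace, int operations,
                     int maxValueBytes, Random rnd) {
        for (int i = 0; i < operations; i++) {
            Integer key = rnd.nextInt(keySpace);
            if (cache.get(key) == null) {
                // Varying allocation sizes is what might stress the old generation.
                cache.put(key, new byte[1 + rnd.nextInt(maxValueBytes)]);
            }
        }
        return cache.size();
    }

    /** Heap currently in use, as reported by the JVM (includes garbage not yet collected). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        int capacity = 20_000; // assumption: scaled down from the 200k row cache
        Map<Integer, byte[]> cache = new LruCache<>(capacity);
        Random rnd = new Random(42L);
        for (int round = 1; round <= 10; round++) {
            int size = churn(cache, capacity * 4, 1_000_000, 2048, rnd);
            System.gc(); // only a hint to the JVM, not a guaranteed full collection
            System.out.printf("round %d: entries=%d usedHeap=%d MB%n",
                    round, size, usedHeapBytes() / (1024 * 1024));
        }
    }
}
```

If used heap after each round stays roughly flat, the cache churn alone is
probably not the culprit under these (artificial) conditions; a steady
climb would be worth a closer look with the collector settings in question.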

Future possibilities would be getting the limits set right for mlockall,
trying combinations of the above, and running without caches.

I have gc logs if anyone is interested.

[1] http://img194.imageshack.us/img194/383/2weekmem.png
