OOM on ccm with large cluster on a single node

2020-10-27 Thread onmstester onmstester
Hi, I'm using ccm to create a cluster of 80 nodes on a physical server with 10 cores and 64GB of RAM, but the 43rd node always fails to start with the error: java.lang.OutOfMemoryError: unable to create new native thread. Apache Cassandra 3.11.2, Xmx 600M per node, and 30GB of memory is still free
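"unable to create new native thread" usually points at an OS thread/process limit rather than heap exhaustion: 80 JVMs on one box multiply Cassandra's per-process thread count past the default `ulimit -u`. A rough back-of-the-envelope check (the per-node thread count below is an assumption; actual counts vary with configuration):

```shell
# Estimate total native threads demanded by the whole ccm cluster.
NODES=80
THREADS_PER_NODE=300   # assumed; even a mostly idle node runs hundreds of threads
DEMAND=$((NODES * THREADS_PER_NODE))
echo "estimated threads needed: $DEMAND"
# Compare against the per-user process/thread limit before raising it:
echo "current limit: $(ulimit -u)"
```

If the estimate exceeds the limit, raising `nproc` in /etc/security/limits.conf (or lowering the node count) is the usual fix; the 30GB of free RAM is a red herring, since the kernel refuses the thread's stack allocation, not its heap.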

Re: OOM only on one datacenter nodes

2020-04-06 Thread Reid Pinchback
"user@cassandra.apache.org" Cc: Reid Pinchback Subject: Re: OOM only on one datacenter nodes Message from External Sender We are using JRE and not JDK, hence not able to take heap dump. On Sun, 5 Apr 2020 at 19:21, Jeff Jirsa <jji...@gmail.com> wrote: Set the jvm flags to

Re: OOM only on one datacenter nodes

2020-04-05 Thread Jeff Jirsa
We are using JRE and not JDK, hence not able to take heap dump. > On Sun, 5 Apr 2020 at 19:21, Jeff Jirsa wrote: > Set the jvm flags to heap dump on oom > Open up the result in a heap inspector of your preference (like yourkit or simil

Re: OOM only on one datacenter nodes

2020-04-05 Thread Surbhi Gupta
We are using JRE and not JDK , hence not able to take heap dump . On Sun, 5 Apr 2020 at 19:21, Jeff Jirsa wrote: > > Set the jvm flags to heap dump on oom > > Open up the result in a heap inspector of your preference (like yourkit or > similar) > > Find a view that co

Re: OOM only on one datacenter nodes

2020-04-05 Thread Jeff Jirsa
Set the jvm flags to heap dump on oom Open up the result in a heap inspector of your preference (like yourkit or similar) Find a view that counts objects by total retained size. Take a screenshot. Send that. > On Apr 5, 2020, at 6:51 PM, Surbhi Gupta wrote: > >  > I just
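The flags Jeff refers to can be set in cassandra-env.sh; `-XX:+HeapDumpOnOutOfMemoryError` and `-XX:HeapDumpPath` are standard HotSpot options (the dump path below is an assumption, adjust for your install):

```shell
# cassandra-env.sh — write an .hprof file to a known location when the JVM OOMs
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra/java_heapdump.hprof"  # assumed path
echo "$JVM_OPTS"
```

Note that these flags are handled by the JVM itself, so a plain JRE produces the dump too; a JDK (jmap, jhat) is only needed for *on-demand* dumps, which is relevant to the "we are using JRE" reply above.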

Re: OOM only on one datacenter nodes

2020-04-05 Thread Surbhi Gupta
difference in how repairs are handled. Somewhere, there is a > difference. I’d start with focusing on that. > > RP> From: Erick Ramirez > RP> Reply-To: "user@cassandra.apache.org" > RP> Date: Saturday, April 4, 2020 at 8:28 PM > RP> To: "user@cassandra.ap

Re: OOM only on one datacenter nodes

2020-04-05 Thread Alex Ott
RP> To: "user@cassandra.apache.org" RP> Subject: Re: OOM only on one datacenter nodes RP> Message from External Sender RP> With a lack of heapdump for you to analyse, my hypothesis is that your DC2 nodes are taking on traffic (from some client somewhere) but you

Re: OOM only on one datacenter nodes

2020-04-04 Thread Reid Pinchback
difference in how repairs are handled. Somewhere, there is a difference. I’d start with focusing on that. From: Erick Ramirez Reply-To: "user@cassandra.apache.org" Date: Saturday, April 4, 2020 at 8:28 PM To: "user@cassandra.apache.org" Subject: Re: OOM only on one datacenter

Re: OOM only on one datacenter nodes

2020-04-04 Thread Erick Ramirez
With a lack of heapdump for you to analyse, my hypothesis is that your DC2 nodes are taking on traffic (from some client somewhere) but you're just not aware of it. The hints replay is just a side-effect of the nodes getting overloaded. To rule out my hypothesis in the first instance, my recommend

OOM only on one datacenter nodes

2020-04-04 Thread Surbhi Gupta
In our DC2, where we don't have live traffic, we see lots of OOM (Out of Memory) errors and nodes go down (only on DC2 nodes). We were using a 16GB heap with G1GC in both DC1 and DC2. As the DC2 nodes were going OOM, we increased 16GB to 24GB and then to 32GB, but DC2 nodes still go down with OOM, though obviously not as frequently

RE: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-30 Thread Steinmaurer, Thomas
Subject: Re: Cassandra going OOM due to tombstones (heapdump screenshots provided) It looks like the number of tables is the problem, with 5,000 - 10,000 tables, that is way above the recommendations. Take a look here: https://docs.datastax.com/en/dse-planning/doc/planning/planningAntiPatterns.html

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Erick Ramirez
> > It looks like the number of tables is the problem, with 5,000 - 10,000 > tables, that is way above the recommendations. > Take a look here: > https://docs.datastax.com/en/dse-planning/doc/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatTooManyTables > This suggests that 5-10GB o

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Hannu Kröger
It means that you are using 5-10GB of memory just to hold information about tables. Memtables hold the data that is written to the database until they are flushed to disk, which happens when memory is low or some other threshold is reached. Every table will have a memtable that takes at
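The flush thresholds mentioned above are governed by a handful of cassandra.yaml settings; a sketch of the relevant knobs for a 3.11.x node (the values shown are illustrative, not recommendations):

```yaml
# cassandra.yaml — knobs that trigger memtable flushes
memtable_allocation_type: heap_buffers   # or offheap_buffers / offheap_objects
memtable_heap_space_in_mb: 2048          # defaults to 1/4 of the heap when unset
memtable_offheap_space_in_mb: 2048
memtable_cleanup_threshold: 0.11         # flush the largest memtable past this fraction
memtable_flush_writers: 2
```

With thousands of tables, even empty memtables plus per-table metadata consume heap continuously, which is why table count rather than write volume is the suspect in this thread.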

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Behroz Sikander
It doesn't seem to be the problem, but I do not have deep knowledge of C* internals. When do memtables come into play? Only at startup?

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Paul Chandler
Hi Behroz, It looks like the number of tables is the problem, with 5,000 - 10,000 tables, that is way above the recommendations. Take a look here: https://docs.datastax.com/en/dse-planning/doc/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatTooManyTables

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Hannu Kröger
IIRC there is an overhead of about 1MB per table, which with about 5000-10000 tables => 5GB - 10GB of overhead from just having that many tables. To me it looks like you need to increase the heap size and later potentially work on the data models to have fewer tables. Hannu > On 29. Jan 2020, at 15
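Hannu's arithmetic can be checked directly; the ~1MB-per-table figure is a rule of thumb from this thread, not a measured constant:

```shell
# ~1MB of heap overhead per table, for 5,000-10,000 tables
awk 'BEGIN {
  per_table_mb = 1   # rule-of-thumb overhead cited in the thread
  printf "~%d-%d GB of heap just for table metadata\n",
         5000 * per_table_mb / 1000, 10000 * per_table_mb / 1000
}'
```

That overhead exists whether or not the tables are written to, so it squeezes the space left for memtables, key cache, and request processing on a fixed-size heap.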

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Behroz Sikander
>> If it's after the host comes online and it's hint replay from the other hosts, you probably want to throttle hint replay significantly on the rest of the cluster. Whatever your hinted handoff throttle is, consider dropping it by 50-90% to work around whichever of those two problems it is. This
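The 50-90% throttle reduction suggested above can be applied at runtime with nodetool. The 1024 KB/s starting value below is the cassandra.yaml default for `hinted_handoff_throttle_in_kb`, used here as an assumption about this cluster:

```shell
# hinted_handoff_throttle_in_kb defaults to 1024; cut it by 75%
CURRENT_KB=1024
REDUCED_KB=$((CURRENT_KB * 25 / 100))
echo "new throttle: ${REDUCED_KB} KB/s"
# apply on every *other* node in the cluster, no restart required:
# nodetool sethintedhandoffthrottlekb "$REDUCED_KB"
```

The throttle limits outbound hint replay, so it must be lowered on the peers sending hints, not on the recovering node itself.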

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Behroz Sikander
>> Startup would replay commitlog, which would re-materialize all of those mutations and put them into the memtable. The memtable would flush over time to disk, and clear the commitlog. From our observation, the node is already online and it seems to be happening after the commit log replay has

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Behroz Sikander
>> Some environment details like Cassandra version, amount of physical RAM, JVM configs (heap and others), and any other non-default cassandra.yaml configs would help. The amount of data, number of keyspaces & tables, since you mention "clients", would also be helpful for people to suggest tun

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Reid Pinchback
Just a thought along those lines. If the memtable flush isn’t keeping up, you might find that manifested in the I/O queue length and dirty page stats leading into the time the OOM event took place. If you do see that, then you might need to do some I/O tuning as well. From: Jeff Jirsa Reply

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Jeff Jirsa
How much memory does the machine have? How much of that is allocated to > the heap? What are your memtable settings? Do you see log lines about > flushing memtables to free room (probably something like the slab pool > cleaner)? > > On Fri, Jan 24, 2020 at 3:16 AM Behroz Sikander > wrote:

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Jeff Jirsa
On Fri, Jan 24, 2020 at 3:16 AM Behroz Sikander wrote: > We recently had a lot of OOM in C* and it was generally happening during > startup. > We took some heap dumps but still cannot pin point the exact reason. So, > we need some help from experts. > > Our clients are not expli

Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Michael Shuler
people to suggest tuning improvements. Michael On 1/24/20 5:16 AM, Behroz Sikander wrote: We recently had a lot of OOM in C* and it was generally happening during startup. We took some heap dumps but still cannot pin point the exact reason. So, we need some help from experts. Our clients are not expli

Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-24 Thread Behroz Sikander
We recently had a lot of OOM in C* and it was generally happening during startup. We took some heap dumps but still cannot pin point the exact reason. So, we need some help from experts. Our clients are not explicitly deleting data but they have TTL enabled. C* details: > show version [cq

Re: Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Reid Pinchback
From: "Steinmaurer, Thomas" Reply-To: "user@cassandra.apache.org" Date: Wednesday, November 6, 2019 at 2:43 PM To: "user@cassandra.apache.org" Subject: RE: Cassandra 3.0.18 went OOM several hours after joining a cluster Message from External Sender Reid, thanks for thoughts. I agree

RE: Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Steinmaurer, Thomas
ion usage. To avoid double work, I will try to continue providing additional information / thoughts on the Cassandra ticket. Regards, Thomas From: Reid Pinchback Sent: Mittwoch, 06. November 2019 18:28 To: user@cassandra.apache.org Subject: Re: Cassandra 3.0.18 went OOM several hours after joinin

Re: Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Reid Pinchback
of them, until eventually…pop. From: Reid Pinchback Reply-To: "user@cassandra.apache.org" Date: Wednesday, November 6, 2019 at 12:11 PM To: "user@cassandra.apache.org" Subject: Re: Cassandra 3.0.18 went OOM several hours after joining a cluster Message from External Sender My

Re: Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Reid Pinchback
underlying problem. From: "Steinmaurer, Thomas" Reply-To: "user@cassandra.apache.org" Date: Wednesday, November 6, 2019 at 11:27 AM To: "user@cassandra.apache.org" Subject: Cassandra 3.0.18 went OOM several hours after joining a cluster Message from External Sender He

Cassandra 3.0.18 went OOM several hours after joining a cluster

2019-11-06 Thread Steinmaurer, Thomas
Hello, after moving from 2.1.18 to 3.0.18, we are facing OOM situations after several hours a node has successfully joined a cluster (via auto-bootstrap). I have created the following ticket trying to describe the situation, including hprof / MAT screens: https://issues.apache.org/jira/browse

RE: cassandra node was put down with oom error

2019-05-01 Thread ZAIDI, ASAD A
round -Original Message- From: Mia [mailto:yeomii...@gmail.com] Sent: Wednesday, May 01, 2019 5:47 AM To: user@cassandra.apache.org Subject: Re: cassandra node was put down with oom error Hello, Ayub. I'm using apache cassandra, not dse edition. So I have never used the dse search feature. In my

Re: cassandra node was put down with oom error

2019-05-01 Thread Steve Lacerda
> > > In open source, the only things offheap that vary significantly are > bloom filters and compression offsets - both scale with disk space, and > both increase during compaction. Large STCS compaction can cause pretty > meaningful allocations for these. Also, if you have an unusually low

Re: cassandra node was put down with oom error

2019-05-01 Thread Sandeep Nethi
search feature. > > > In my case, all the nodes of the cluster have the same problem. > > > > > > Thanks. > > > > > > On 2019/05/01 06:13:06, Ayub M wrote: > > > > Do you have search on the same nodes or is it only cassandra. In my > c

Re: cassandra node was put down with oom error

2019-05-01 Thread Mia
> In my case, all the nodes of the cluster have the same problem. > > > > Thanks. > > > > On 2019/05/01 06:13:06, Ayub M wrote: > > > Do you have search on the same nodes or is it only cassandra. In my case > > it > > > was due to a memory leak

Re: cassandra node was put down with oom error

2019-05-01 Thread Sandeep Nethi
cluster have the same problem. > > Thanks. > > On 2019/05/01 06:13:06, Ayub M wrote: > > Do you have search on the same nodes or is it only cassandra. In my case > it > > was due to a memory leak bug in dse search that consumed more memory > > resulting in oom. >

Re: cassandra node was put down with oom error

2019-05-01 Thread Mia
case it > was due to a memory leak bug in dse search that consumed more memory > resulting in oom. > > On Tue, Apr 30, 2019, 2:58 AM yeomii...@gmail.com > wrote: > > > Hello, > > > > I'm suffering from similar problem with OSS cassandra version3.11.3. >

Re: cassandra node was put down with oom error

2019-04-30 Thread Ayub M
Do you have search on the same nodes or is it only cassandra. In my case it was due to a memory leak bug in dse search that consumed more memory resulting in oom. On Tue, Apr 30, 2019, 2:58 AM yeomii...@gmail.com wrote: > Hello, > > I'm suffering from similar problem with

Re: cassandra node was put down with oom error

2019-04-29 Thread yeomii999
significantly are bloom > filters and compression offsets - both scale with disk space, and both > increase during compaction. Large STCS compaction can cause pretty meaningful > allocations for these. Also, if you have an unusually low compression chunk > size or a very low bloom f

Re: cassandra node was put down with oom error

2019-01-26 Thread Jeff Jirsa
pretty meaningful allocations for these. Also, if you have an unusually low compression chunk size or a very low bloom filter FP ratio, those will be larger. -- Jeff Jirsa > On Jan 26, 2019, at 12:11 PM, Ayub M wrote: > > Cassandra node went down due to OOM, and checking the /var/lo

cassandra node was put down with oom error

2019-01-26 Thread Ayub M
Cassandra node went down due to OOM, and checking the /var/log/message I see below. ``` Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0 Jan 23 20:07:17 ip-xxx-xxx-xxx-xxx kernel: java cpuset=/ mems_allowed=0 Jan 23 20:07:17 ip
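The `java invoked oom-killer` line is the Linux kernel's OOM killer, not a `java.lang.OutOfMemoryError`: the process was killed because the whole box ran out of memory (heap plus offheap plus page cache pressure), which points at total process RSS rather than heap sizing alone. A sketch of telling the two failure modes apart by their log signatures:

```shell
# Distinguish a kernel OOM kill from a JVM heap OOM by the log text.
LOG='Jan 23 20:07:17 kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0'
case "$LOG" in
  *'invoked oom-killer'*)         echo 'kernel OOM killer: whole-box memory exhaustion' ;;
  *'java.lang.OutOfMemoryError'*) echo 'JVM OOM: heap or direct memory exhausted' ;;
esac
```

For the kernel case, the useful evidence is total RSS versus physical RAM (off-heap structures, mmapped files, other processes), not the Java heap dump.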

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-28 Thread Jeff Jirsa
I’ve lost some context but there are two direct memory allocations per sstable - compression offsets and the bloom filter. Both of those get built during sstable creation and the bloom filter’s size is aggressively allocated , so you’ll see a big chunk of memory disappear as compaction kicks off
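The bloom filter's offheap footprint can be estimated with the standard sizing formula m = -n·ln(p)/ln(2)², where n is the partition count and p is `bloom_filter_fp_chance`; the one-billion-partition figure below is purely illustrative:

```shell
# Estimate offheap bloom filter size for n partitions at false-positive chance p.
awk 'BEGIN {
  n = 1e9; p = 0.01               # illustrative: 1B partitions, default STCS fp chance
  bits = -n * log(p) / (log(2)^2) # standard bloom filter sizing formula
  printf "%.2f GB offheap\n", bits / 8 / 1024^3
}'
```

At p = 0.1 (the LCS default) the same table needs roughly half the bits, which is why an unusually low fp chance, mentioned earlier in the thread, can make this allocation large.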

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-28 Thread Yuri de Wit
We On Fri, Dec 28, 2018, 4:23 PM Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Fri, Dec 7, 2018 at 12:43 PM Oleksandr Shulgin < > oleksandr.shul...@zalando.de> wrote: > >> >> After a fresh JVM start the memory allocation looks roughly like this: >> >> total use

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-28 Thread Oleksandr Shulgin
On Fri, Dec 7, 2018 at 12:43 PM Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > > After a fresh JVM start the memory allocation looks roughly like this: > > total used free shared buffers cached > Mem: 14G 14G 173M 1.1M

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-07 Thread Oleksandr Shulgin
On Thu, Dec 6, 2018 at 3:39 PM Riccardo Ferrari wrote: > To be honest I've never seen the OOM in action on those instances. My Xmx > was 8GB just like yours and that let me think you have some process that is > competing for memory, is it? Do you have any cron, any backup, anyth

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-06 Thread Riccardo Ferrari
Hi, To be honest I've never seen the OOM in action on those instances. My Xmx was 8GB just like yours and that let me think you have some process that is competing for memory, is it? Do you have any cron, any backup, anything that can trick the OOMKiller ? My unresponsiveness was seconds

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-06 Thread Oleksandr Shulgin
> iotop that was the kswapd0 process. That system was an ubuntu 16.04 > actually "Ubuntu 16.04.4 LTS". > Riccardo, Did you by chance also observe Linux OOM? How long did the unresponsiveness last in your case? From there I started to dig what kswap process was involved in

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-06 Thread Riccardo Ferrari
Alex, I had a few instances in the past that were showing that unresponsiveness behaviour. Back then I saw with iotop/htop/dstat ... the system was stuck on a single thread processing (full throttle) for seconds. According to iotop that was the kswapd0 process. That system was an ubuntu 16.04 actua

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Oleksandr Shulgin
On Wed, 5 Dec 2018, 19:34 Riccardo Ferrari wrote: Hi Alex, > > I saw that behaviour in the past. > Riccardo, Thank you for the reply! Do you refer to kswapd issue only or have you observed more problems that match behavior I have described? I can tell you the kswapd0 usage is connected to the `disk_a

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Oleksandr Shulgin
On Wed, 5 Dec 2018, 19:53 Jonathan Haddad Seeing high kswapd usage means there's a lot of churn in the page cache. > It doesn't mean you're using swap, it means the box is spending time > clearing pages out of the page cache to make room for the stuff you're > reading now. > Jon, Thanks for your

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Jonathan Haddad
different instances in a seemingly >>> random fashion. Most of the time it affects only one instance, but we've >>> had one incident when 9 nodes (3 from each of the 3 Availability Zones) >>> were down at the same time due to this exact issue. >>> >>> Actu

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Jon Meredith
RAM). >> >> As a mitigation measure we have migrated away from those to r4.2xlarge. >> Then we didn't observe any issues for a few weeks, so we have scaled down >> two times: to r4.xlarge and then to r4.large. The last migration was >> completed before Nov 13th.

Re: Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Riccardo Ferrari
ave 4 vCPUs and 16GB > RAM). > > As a mitigation measure we have migrated away from those to r4.2xlarge. > Then we didn't observe any issues for a few weeks, so we have scaled down > two times: to r4.xlarge and then to r4.large. The last migration was > completed before Nov 13th.

Sporadic high IO bandwidth and Linux OOM killer

2018-12-05 Thread Oleksandr Shulgin
r4.xlarge and then to r4.large. The last migration was completed before Nov 13th. No changes to the cluster or application happened since that time. Now, after some weeks the issue appears again... When we are not fast enough to react and reboot the affected instance, we can see that ultimately Linux

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Laszlo Szabo
The last run I attempted used 135GB of RAM allocated to the JVM (arguments below), and while there are OOM errors, there is not a stack trace in either the system or debug log. On direct memory runs, there is a stack trace. The last Direct memory run used 60GB heaps and 60GB for off heap (that

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Jeff Jirsa
That's a direct memory OOM - it's not the heap, it's the offheap. You can see that gpsmessages.addressreceivedtime_idx is holding about 2GB of offheap memory (most of it for the bloom filter), but none of the others look like they're h

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Jonathan Haddad
By default Cassandra is set to generate a heap dump on OOM. It can be a bit tricky to figure out what’s going on exactly but it’s the best evidence you can work with. On Tue, Aug 7, 2018 at 6:30 AM Laszlo Szabo wrote: > Hi, > > Thanks for the fast response! > > We are not using a

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-07 Thread Laszlo Szabo
Aug 6, 2018 at 8:57 PM, Jeff Jirsa wrote: > > > Upgrading to 3.11.3 May fix it (there were some memory recycling bugs > fixed recently), but analyzing the heap will be the best option > > If you can print out the heap histogram and stack trace or open a heap > dump in your ki

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-06 Thread Jeff Jirsa
>> >> Hello All, >> >> I'm having JVM unstable / OOM errors when attempting to auto bootstrap a 9th >> node to an existing 8 node cluster (256 tokens). Each machine has 24 cores >> 148GB RAM and 10TB (2TB used). Under normal operation the 8 nodes have JVM

Re: Bootstrap OOM issues with Cassandra 3.11.1

2018-08-06 Thread Jeff Jirsa
Are you using materialized views or secondary indices? -- Jeff Jirsa > On Aug 6, 2018, at 3:49 PM, Laszlo Szabo > wrote: > > Hello All, > > I'm having JVM unstable / OOM errors when attempting to auto bootstrap a 9th > node to an existing 8 node cluster (256 tok

Bootstrap OOM issues with Cassandra 3.11.1

2018-08-06 Thread Laszlo Szabo
Hello All, I'm having JVM unstable / OOM errors when attempting to auto bootstrap a 9th node to an existing 8 node cluster (256 tokens). Each machine has 24 cores 148GB RAM and 10TB (2TB used). Under normal operation the 8 nodes have JVM memory configured with Xms35G and Xmx35G, and handl

Re: OOM after a while during compacting

2018-04-05 Thread Nate McCall
> > > - Heap size is set to 8GB > - Using G1GC > - I tried moving the memtable out of the heap. It helped but I still got > an OOM last night > - Concurrent compactors is set to 1 but it still happens and also tried > setting throughput between 16 and 128, no changes. >

Re: OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
o this >> isn't an ideal solution) >> >> Some more info: >> - Version is the newest 3.11.2 with java8u116 >> - Using LeveledCompactionStrategy (we have mostly reads) >> - Heap size is set to 8GB >> - Using G1GC >> - I tried moving the memtable out

Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
>> - Version is the newest 3.11.2 with java8u116 >> - Using LeveledCompactionStrategy (we have mostly reads) >> - Heap size is set to 8GB >> - Using G1GC >> - I tried moving the memtable out of the heap. It helped but I still got an >> OOM last night >> - Co

Re: OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
> - Using LeveledCompactionStrategy (we have mostly reads) > - Heap size is set to 8GB > - Using G1GC > - I tried moving the memtable out of the heap. It helped but I still got > an OOM last night > - Concurrent compactors is set to 1 but it still happens and also tried > setti

Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
>> isn't an ideal solution) >> >> Some more info: >> - Version is the newest 3.11.2 with java8u116 >> - Using LeveledCompactionStrategy (we have mostly reads) >> - Heap size is set to 8GB >> - Using G1GC >> - I tried moving the memtable out of the heap.

Re: OOM after a while during compacting

2018-04-05 Thread Evelyn Smith
- Version is the newest 3.11.2 with java8u116 > - Using LeveledCompactionStrategy (we have mostly reads) > - Heap size is set to 8GB > - Using G1GC > - I tried moving the memtable out of the heap. It helped but I still got an > OOM last night > - Concurrent compactors is set to 1 but it

OOM after a while during compacting

2018-04-05 Thread Zsolt Pálmai
Some more info: - Version is the newest 3.11.2 with java8u116 - Using LeveledCompactionStrategy (we have mostly reads) - Heap size is set to 8GB - Using G1GC - I tried moving the memtable out of the heap. It helped but I still got an OOM last night - Concurrent compactors is set to 1 but it still ha

Re: secondary index creation causes C* oom

2018-01-10 Thread Peng Xiao
Thanks Kurt. -- From: "kurt"; Sent: 2018-01-11 (Thu) 11:46; To: "User"; Subject: Re: secondary index creation causes C* oom 1.not sure if secondary index creation is the same as index rebuild Fairly sure they

Re: secondary index creation causes C* oom

2018-01-10 Thread kurt greaves
> 1.not sure if secondary index creation is the same as index rebuild > Fairly sure they are the same. > 2.we noticed that the memory table flush looks still working,not the same > as CASSANDRA-12796 mentioned,but the compactionExecutor pending is > increasing. > Do you by chance have concurrent_c

secondary index creation causes C* oom

2018-01-09 Thread Peng Xiao
Dear All, We met some C* nodes oom during secondary index creation with C* 2.1.18. As per https://issues.apache.org/jira/browse/CASSANDRA-12796,the flush writer will be blocked by index rebuild.but we still have some confusions: 1.not sure if secondary index creation is the same as index

Re: sstablescrum fails with OOM

2017-11-03 Thread kurt greaves
able >> >> >> -- >> Jeff Jirsa >> >> >> On Nov 2, 2017, at 7:58 PM, sai krishnam raju potturi < >> pskraj...@gmail.com> wrote: >> >> Yes. Move the corrupt sstable, and run a repair on this node, so that it >> gets in sync with

Re: sstablescrum fails with OOM

2017-11-03 Thread Shashi Yachavaram
> On Thu, Nov 2, 2017 at 6:12 PM, Shashi Yachavaram > wrote: > >> We are cassandra 2.0.17 and have corrupted sstables. Ran offline >> sstablescrub but it fails with OOM. Increased the MAX_HEAP_SIZE to 8G it >> still fails. >> >> Can we move the corrupted sstable file and rerun sstablescrub followed by >> repair. >> >> -shashi.. >> > >

Re: sstablescrum fails with OOM

2017-11-02 Thread Jeff Jirsa
aju potturi > wrote: > > Yes. Move the corrupt sstable, and run a repair on this node, so that it gets > in sync with it's peers. > >> On Thu, Nov 2, 2017 at 6:12 PM, Shashi Yachavaram >> wrote: >> We are cassandra 2.0.17 and have corrupted sstables. Ran offli

Re: sstablescrum fails with OOM

2017-11-02 Thread sai krishnam raju potturi
Yes. Move the corrupt sstable, and run a repair on this node, so that it gets in sync with it's peers. On Thu, Nov 2, 2017 at 6:12 PM, Shashi Yachavaram wrote: > We are cassandra 2.0.17 and have corrupted sstables. Ran offline > sstablescrub but it fails with OOM. Increased the MAX_H

sstablescrum fails with OOM

2017-11-02 Thread Shashi Yachavaram
We are on cassandra 2.0.17 and have corrupted sstables. Ran offline sstablescrub but it fails with OOM. Increased the MAX_HEAP_SIZE to 8G but it still fails. Can we move the corrupted sstable file and rerun sstablescrub followed by repair? -shashi..

Re: Nodes just dieing with OOM

2017-10-07 Thread Alain RODRIGUEZ
>> Sorry about that. We eventually found that one column family had some >>> large/corrupt data and causing OOM's >>> >>> Luckily it was a pretty ephemeral data set and we were able to just >>> truncate it. However, it was a guess based on some l

Re: Nodes just dieing with OOM

2017-10-06 Thread Brian Spindler
>> tombstones? Could that be the cause? What else would you recommend? >> >> Thank you in advance. >> >> On Fri, Oct 6, 2017 at 6:33 AM Brian Spindler >> wrote: >> >>> Hi guys, our cluster - around 18 nodes - just starting having nodes die >>> and when restarting them they are dying with OOM. How can we handle this? >>> I've tried adding a couple extra gigs on these machines to help but it's >>> not. >>> >>> Help! >>> -B >>> >>> >

Re: Nodes just dieing with OOM

2017-10-06 Thread Alain RODRIGUEZ
gt; On Fri, Oct 6, 2017 at 6:33 AM Brian Spindler > wrote: > >> Hi guys, our cluster - around 18 nodes - just starting having nodes die >> and when restarting them they are dying with OOM. How can we handle this? >> I've tried adding a couple extra gigs on these machines to help but it's >> not. >> >> Help! >> -B >> >>

Re: Nodes just dieing with OOM

2017-10-06 Thread Brian Spindler
just starting having nodes die > and when restarting them they are dying with OOM. How can we handle this? > I've tried adding a couple extra gigs on these machines to help but it's > not. > > Help! > -B > >

Nodes just dieing with OOM

2017-10-06 Thread Brian Spindler
Hi guys, our cluster - around 18 nodes - just started having nodes die, and when restarting them they are dying with OOM. How can we handle this? I've tried adding a couple extra gigs on these machines to help but it's not helping. Help! -B

Re: Cassandra crashed with OOM, and the system.log and debug.log doesn't match.

2017-07-11 Thread qiang zhang
Maybe that's the reason. > That's a bit thin - depending on data model and data volume, you may be able to construct a read that fills up your 3G heap, and causes you to OOM with a single read. How much data is involved? What does 'nodetool tablestats' look like, and finally, how

Re: Cassandra crashed with OOM, and the system.log and debug.log doesn't match.

2017-07-10 Thread Jeff Jirsa
more INFO level or other level logs in that log file, while there are still > many logs in the system.log after 2017-07-07 14:43:59. Why doesn't these > two log files match? > > My hardware is 4 core cpu and 12G ram, and I'm using windows server 2012 > r2. That's a bi

Re: Cassandra crashed with OOM, and the system.log and debug.log doesn't match.

2017-07-10 Thread 张强
Thanks for your reply! There are 3 column families, they are created by kairosdb, one column family takes almost all the workload. I didn't tune the heap size, so by default it'll be 3GB. I have monitored the cpu and memory usage, the cpu usage is about 30% in average, and the available memory is a

Re: Cassandra crashed with OOM, and the system.log and debug.log doesn't match.

2017-07-10 Thread Varun Barala
Hi, *How many column families are there? What is the heap size?* You can turn off logs for statusLogger.java and gc to optimize heap usage. Can you also monitor cpu usage and memory usage? IMO, in your case memory is the bottle-neck. Thanks!! On Mon, Jul 10, 2017 at 5:07 PM, 张强 wrote: > Hi

Re: cassandra OOM

2017-04-26 Thread Jean Carlo
uva.go...@aspect.com] > *Sent:* Tuesday, April 04, 2017 5:34 PM > *To:* user@cassandra.apache.org > *Subject:* Re: cassandra OOM > > > > Thanks, that’s interesting – so CMS is a better option for > stability/performance? We’ll try this out in our cluster. > > > >

Re: cassandra OOM

2017-04-25 Thread Carlos Rolo
> Sean Durity > > > > *From:* Gopal, Dhruva [mailto:dhruva.go...@aspect.com] > *Sent:* Tuesday, April 04, 2017 5:34 PM > *To:* user@cassandra.apache.org > *Subject:* Re: cassandra OOM > > > > Thanks, that’s interesting – so CMS is a better option for > stabi

RE: cassandra OOM

2017-04-25 Thread Durity, Sean R
We have seen much better stability (and MUCH less GC pauses) from G1 with a variety of heap sizes. I don’t even consider CMS any more. Sean Durity From: Gopal, Dhruva [mailto:dhruva.go...@aspect.com] Sent: Tuesday, April 04, 2017 5:34 PM To: user@cassandra.apache.org Subject: Re: cassandra OOM

Re: cassandra OOM

2017-04-04 Thread Gopal, Dhruva
Thanks, that’s interesting – so CMS is a better option for stability/performance? We’ll try this out in our cluster. From: Alexander Dejanovski Reply-To: "user@cassandra.apache.org" Date: Monday, April 3, 2017 at 10:31 PM To: "user@cassandra.apache.org" Subject: Re: cassa

Re: cassandra OOM

2017-04-03 Thread Alexander Dejanovski
Hi, we've seen G1GC going OOM on production clusters (repeatedly) with a 16GB heap when the workload is intense, and given you're running on m4.2xl I wouldn't go over 16GB for the heap. I'd suggest to revert back to CMS, using a 16GB heap and up to 6GB of new gen
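Alexander's suggested CMS configuration maps onto JVM options roughly as follows — a sketch in cassandra-env.sh style, with standard HotSpot flags but illustrative rather than tuned values:

```shell
# cassandra-env.sh — 16GB heap, 6GB new gen, CMS collector (sketch, not tuned)
JVM_OPTS="$JVM_OPTS -Xms16G -Xmx16G -Xmn6G"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly"
echo "$JVM_OPTS"
```

Setting Xms equal to Xmx avoids heap resizing pauses, and the fixed occupancy fraction makes CMS cycles start predictably instead of relying on the JVM's adaptive heuristics.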

Re: cassandra OOM

2017-04-03 Thread Gopal, Dhruva
ndra.apache.org" Date: Monday, April 3, 2017 at 8:00 AM To: "user@cassandra.apache.org" Subject: Re: cassandra OOM Hi, could you share your GC settings ? G1 or CMS ? Heap size, etc... Thanks, On Sun, Apr 2, 2017 at 10:30 PM Gopal, Dhruva mailto:dhruva.go...@aspect.com>> wro

Re: cassandra OOM

2017-04-03 Thread Alexander Dejanovski
Hi, could you share your GC settings ? G1 or CMS ? Heap size, etc... Thanks, On Sun, Apr 2, 2017 at 10:30 PM Gopal, Dhruva wrote: > Hi – > > We’ve had what looks like an OOM situation with Cassandra (we have a > dump file that got generated) in our staging (performance/

cassandra OOM

2017-04-02 Thread Gopal, Dhruva
Hi – We’ve had what looks like an OOM situation with Cassandra (we have a dump file that got generated) in our staging (performance/load testing environment) and I wanted to reach out to this user group to see if you had any recommendations on how we should approach our investigation as to

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Shravan C
In fact I truncated hints table to stabilize the cluster. Through the heap dumps I was able to identify the table on which there were numerous queries. Then I focused on system_traces.session table around the time OOM occurred. It turned out to be a full table scan on a large table which caused
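The kind of query that surfaces such a scan, assuming tracing was enabled when the OOM happened; the column names are from the system_traces schema in 3.x. The snippet only builds and prints the cqlsh invocation, since actually running it needs a live cluster.

```shell
# Build the cqlsh command used to review recent traced sessions; the
# 'request' column usually names the statement that was traced.
QUERY="SELECT session_id, started_at, request FROM system_traces.sessions LIMIT 20;"
CMD="cqlsh -e \"$QUERY\""
echo "$CMD"
```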

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Jeff Jirsa
On 2017-03-03 09:18 (-0800), Shravan Ch wrote:
> nodetool compactionstats -H
> pending tasks: 3
> compaction type   keyspace   table   completed   total   unit   progress
> Compaction        system     hints   28.
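Earlier in the thread the cluster was stabilized by truncating the hints table; a hedged sketch of the relevant nodetool commands (all three exist in 3.x — note that truncating drops undelivered hints, so some writes may need repair afterwards):

```shell
# Illustrative only -- requires a running node; do not run blindly.
nodetool pausehandoff     # stop delivering stored hints
nodetool truncatehints    # discard this node's stored hints (data-loss trade-off)
nodetool resumehandoff    # re-enable handoff once the node is stable
```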

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-07 Thread Jeff Jirsa
On 2017-03-04 07:23 (-0800), "Thakrar, Jayesh" wrote: > LCS does not rule out frequent updates - it just says that there will be more > frequent compaction, which can potentially increase compaction activity > (which again can be throttled as needed). > But STCS will

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-06 Thread Eric Evans
On Fri, Mar 3, 2017 at 11:18 AM, Shravan Ch wrote: > More than 30 plus Cassandra servers in the primary DC went down OOM > exception below. What puzzles me is the scale at which it happened (at the > same minute). I will share some more details below. You'd be surprised; when it's

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Thakrar, Jayesh
s , "user@cassandra.apache.org" Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time I was looking at nodetool info across all nodes. Consistently JVM heap used is ~ 12GB and off heap is ~ 4-5GB. From: Thakrar, Jayesh Sent: Saturday, M

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Priyanka
Sent from my iPhone > On Mar 3, 2017, at 12:18 PM, Shravan Ch wrote: > > Hello, > > More than 30 plus Cassandra servers in the primary DC went down OOM exception > below. What puzzles me is the scale at which it happened (at the same > minute). I will share so

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Shravan C
I was looking at nodetool info across all nodes. Consistently JVM heap used is ~ 12GB and off heap is ~ 4-5GB. From: Thakrar, Jayesh Sent: Saturday, March 4, 2017 9:23:01 AM To: Shravan C; Joaquin Casares; user@cassandra.apache.org Subject: Re: OOM on Apache

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Edward Capriolo
On Saturday, March 4, 2017, Thakrar, Jayesh wrote: > LCS does not rule out frequent updates - it just says that there will be > more frequent compaction, which can potentially increase compaction > activity (which again can be throttled as needed). > > But STCS will guarantee OOM

Re: OOM on Apache Cassandra on 30 Plus node at the same time

2017-03-04 Thread Thakrar, Jayesh
LCS does not rule out frequent updates - it just says that there will be more frequent compaction, which can potentially increase compaction activity (which again can be throttled as needed). But STCS will guarantee OOM when you have large datasets. Did you have a look at the offheap + onheap
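A sketch of the switch being discussed; my_ks.my_table is a placeholder. LCS increases compaction I/O, which, as noted above, can be throttled as needed.

```shell
# Switch a table from STCS to LCS (placeholder names), then throttle the
# resulting compaction load if it is too heavy.
cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {'class': 'LeveledCompactionStrategy'};"
nodetool setcompactionthroughput 16   # MB/s, example value
```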
