Table Id mismatches

2022-01-24 Thread Amandeep Srivastava
Hi, We're running an embedded JanusGraph on top of Cassandra. On starting the graph, it creates certain tables in Cassandra for its operation. We noticed that there is an Id mismatch for one of the tables named system_properties_lock i.e. Id fetched from schema table of Cassandra (SELECT keyspace_nam

Re: Hanging repairs in Cassandra

2022-01-24 Thread manish khandelwal
TCP aging value is 10 mins. So with 7200 seconds for tcp_keepalive_time the node was going unresponsive. Is the TCP aging value too low or right enough? On Mon, Jan 24, 2022 at 11:32 PM Bowen Song wrote: > Is reconfiguring your firewall an option? A stateful firewall really > shouldn't remove a TCP co

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Bowen Song
From the source code I've read, by default Cassandra will run a cleanup for the system.repairs table every 10 minutes; any row related to a repair that completed over 1 day ago will be automatically removed. I highly doubt that you have run 75,000 repairs in the 24 hours prior to shutting

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Paul Chandler
Hi Bowen, Yes, there do seem to be a lot of rows; on one of the upgraded clusters there are 75,000 rows. I have been experimenting on a test cluster, which has about a 5 minute pause and around 15,000 rows. If I clear the system.repairs table (by deleting the sstables) then this does not pau

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Bowen Song
Hmm, interesting... Try "select * from system.repairs;" in cqlsh on a slow starting node, do you get a lot of rows? This is the most obvious loop run (indirectly) by the ActiveRepairService.start(). On 24/01/2022 13:30, Romain Anselin wrote: Hi everyone, We generated a JFR profile of the st

Re: Hanging repairs in Cassandra

2022-01-24 Thread Bowen Song
Is reconfiguring your firewall an option? A stateful firewall really shouldn't remove a TCP connection in such a short time, unless the number of connections is very large and generally short lived (which is often seen in web servers). On 24/01/2022 13:03, manish khandelwal wrote: Hi All Thanks fo

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Romain Anselin
Hi everyone, We generated a JFR profile of the startup phase of Cassandra with Paul, and it would appear that the time is spent in the ActiveRepairSession within the main thread (11 min of execution of the "main" thread in his environment, vs 15 s in mine), which has been introduced in CASSANDRA-

Re: Hanging repairs in Cassandra

2022-01-24 Thread manish khandelwal
Hi All, Thanks for the suggestions. The issue was that *tcp_keepalive_time* had the default value (7200 seconds). So once the idle connection was broken by the firewall, the application (Cassandra node) was getting notified very late. Thus we were seeing one node sending merkle tree and other not receivi
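
The timing problem described in this thread can be illustrated with a small Python sketch. It only compares the two numbers quoted in the messages (a 7200 second tcp_keepalive_time and a 10 minute firewall aging value); the function name and the 300 second alternative are hypothetical, chosen purely for illustration, and nothing here is taken from Cassandra or from the posters' actual configuration.

```python
def first_probe_beats_firewall(keepalive_time_s: int, firewall_idle_timeout_s: int) -> bool:
    """Return True if the first TCP keepalive probe would be sent
    before the firewall ages out an idle connection."""
    return keepalive_time_s < firewall_idle_timeout_s


# Values quoted in the thread.
DEFAULT_KEEPALIVE_S = 7200   # default tcp_keepalive_time mentioned in the thread
FIREWALL_AGING_S = 10 * 60   # "TCP aging value is 10 mins"

# Hypothetical lower keepalive value, for comparison only.
EXAMPLE_LOWER_KEEPALIVE_S = 300

if __name__ == "__main__":
    for value in (DEFAULT_KEEPALIVE_S, EXAMPLE_LOWER_KEEPALIVE_S):
        ok = first_probe_beats_firewall(value, FIREWALL_AGING_S)
        print(f"keepalive={value}s firewall={FIREWALL_AGING_S}s probe_in_time={ok}")
```

With the default of 7200 seconds the first probe would come long after the firewall has dropped the idle connection, which matches the late notification described above; any keepalive value below the firewall's aging value avoids that particular gap.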