Hi,
We're running an embedded JanusGraph on top of Cassandra. On starting the
graph, it creates certain tables in Cassandra for its operation. We noticed
that there is an ID mismatch for one of the tables, named
system_properties_lock, i.e. the
ID fetched from the schema table of Cassandra (SELECT keyspace_nam
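In case it helps to reproduce, this is roughly how the two IDs can be
compared; the keyspace name ('janusgraph') and the data path below are
placeholders, not the actual values from our setup:

    # table ID according to the schema
    cqlsh -e "SELECT keyspace_name, table_name, id FROM system_schema.tables
              WHERE keyspace_name = 'janusgraph' AND table_name = 'system_properties_lock';"
    # table ID according to the data directory (the hex suffix of the folder name)
    ls -d /var/lib/cassandra/data/janusgraph/system_properties_lock-*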
The TCP aging value is 10 mins. So with tcp_keepalive_time at 7200 seconds,
the node was going unresponsive. Is the TCP aging value too low, or is it
about right?
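For context, the relevant kernel settings on a node can be checked like
this (standard Linux sysctls, same names on any recent kernel):

    # idle seconds before the first keepalive probe (default 7200)
    sysctl net.ipv4.tcp_keepalive_time
    # probe interval and how many unanswered probes mark the connection dead
    sysctl net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes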
On Mon, Jan 24, 2022 at 11:32 PM Bowen Song wrote:
> Is reconfiguring your firewall an option? A stateful firewall really
> shouldn't remove a TCP connection in such a short time.
From the source code I've read, by default Cassandra will run a cleanup
of the system.repairs table every 10 minutes; any row related to a repair
that completed over 1 day ago will be automatically removed.
I highly doubt that you ran 75,000 repairs in the 24 hours prior to
shutting down the node.
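If you are on 4.0 or later, I think nodetool can also show what the
incremental repair machinery still tracks; I'm quoting the subcommand from
memory, so double-check it against "nodetool help":

    # list the repair sessions Cassandra is still tracking (4.0+)
    nodetool repair_admin list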
Hi Bowen,
Yes, there do seem to be a lot of rows; on one of the upgraded clusters there
are 75,000 rows.
I have been experimenting on a test cluster; this has about a 5 minute pause
and around 15,000 rows.
If I clear the system.repairs table (by deleting the SSTables) then it does
not pause.
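For completeness, this is roughly the procedure, sketched from memory; the
data path is the default one and the backup location is arbitrary:

    # stop the node before touching any SSTables
    sudo systemctl stop cassandra
    # move the system.repairs SSTable directory aside rather than deleting it
    mkdir -p /var/tmp/repairs_backup
    sudo mv /var/lib/cassandra/data/system/repairs-* /var/tmp/repairs_backup/
    sudo systemctl start cassandra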
Hmm, interesting... Try "select * from system.repairs;" in cqlsh on a
slow-starting node; do you get a lot of rows? This is the most obvious
loop run (indirectly) by ActiveRepairService.start().
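If the full select floods the terminal, a count answers the same question
(it may time out on a very large table, in which case the paged select is
the fallback):

    cqlsh -e "SELECT COUNT(*) FROM system.repairs;"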
On 24/01/2022 13:30, Romain Anselin wrote:
Hi everyone,
We generated a JFR profile of the startup phase of Cassandra with Paul
Is reconfiguring your firewall an option? A stateful firewall really
shouldn't remove a TCP connection in such a short time, unless the number
of connections is very large and generally short-lived (which is often
seen in web servers).
On 24/01/2022 13:03, manish khandelwal wrote:
Hi All
Thanks for the suggestions.
Hi everyone,
We generated a JFR profile of the startup phase of Cassandra with Paul, and it would appear that
the time is spent in the ActiveRepairSession within the main thread (11mn of execution of the
"main" thread in his environment, vs 15s in mine), which has been introduced in
CASSANDRA-
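In case anyone wants to reproduce the profile, Flight Recorder can be
started by the JVM itself; something like the line below in
jvm-server.options (cassandra-env.sh on 3.x) should do it on JDK 11, with
the duration and file name being examples (older JDK 8 builds need extra
unlock flags):

    # record the first two minutes after JVM start into a .jfr file
    -XX:StartFlightRecording=duration=120s,filename=/tmp/cassandra-startup.jfr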
Hi All
Thanks for the suggestions. The issue was that *tcp_keepalive_time* had the
default value (7200 seconds). So once the idle connection was broken by the
firewall, the application (the Cassandra node) was getting notified very late.
Thus we were seeing one node sending the Merkle tree and the other not
receiving it.
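For anyone hitting the same thing, the fix is to bring the kernel keepalive
well below the firewall's 10 minute idle timeout; the values below are
examples rather than exactly what we deployed:

    # probe after 5 minutes of idle instead of 2 hours
    sudo sysctl -w net.ipv4.tcp_keepalive_time=300
    # persist the setting across reboots
    echo "net.ipv4.tcp_keepalive_time = 300" | sudo tee /etc/sysctl.d/99-tcp-keepalive.conf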