I have a keyspace that is using the NetworkTopologyStrategy. In one of the data
centers there are 3 nodes, and the replication factor for that data center is 3.
I would expect that when I make a request to any of the 3 nodes, that node would
answer the read itself instead of forwarding it to another replica. But that
doesn't always happen, and I can't seem to find any reason that would trigger it.
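For context, the keyspace is defined with NetworkTopologyStrategy roughly like
the sketch below; the keyspace and data center names here are placeholders, and
any other data centers are omitted:

    -- Sketch only: 'cms' and 'dc1' stand in for the real names; the point is
    -- NetworkTopologyStrategy with a replication factor of 3 in the DC in question.
    CREATE KEYSPACE IF NOT EXISTS cms
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc1': 3
      };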
For example, look at the following traces:

 session_id                           | client    | command | coordinator | coordinator_port | duration | request                     | started_at
--------------------------------------+-----------+---------+-------------+------------------+----------+-----------------------------+---------------------------------
 fea76550-3775-11f0-88c7-67d23f0ba311 | 10.0.4.68 | QUERY   | 10.0.4.63   |             7000 |     5024 | Execute CQL3 prepared query | 2025-05-23 01:33:57.157000+0000
 008c2ae0-3776-11f0-88c7-67d23f0ba311 | 10.0.4.68 | QUERY   | 10.0.4.63   |             7000 |     1681 | Execute CQL3 prepared query | 2025-05-23 01:34:00.334000+0000

Parameters (identical for both sessions):

 {'bound_var_0_slugs': '[''rules'']',
  'consistency_level': 'LOCAL_ONE',
  'query': 'SELECT title, content, "tableOfContents" FROM "CmsPage" WHERE slugs = ?',
  'serial_consistency_level': 'SERIAL'}
Trace events:

Events for session fea76550-3775-11f0-88c7-67d23f0ba311:

 event_id                             | activity                                                                 | source     | source_elapsed | source_port | thread
--------------------------------------+--------------------------------------------------------------------------+------------+----------------+-------------+-----------------------------
 fea78c60-3775-11f0-88c7-67d23f0ba311 | reading data from /10.0.3.184:7000                                       | 10.0.4.63  |            674 |        7000 | Native-Transport-Requests-1
 fea78c6a-3775-11f0-88c7-67d23f0ba311 | Sending READ_REQ message to /10.0.3.184:7000 message size 182 bytes      | 10.0.4.63  |            994 |        7000 | Messaging-EventLoop-3-1
 fea828a0-3775-11f0-88c7-67d23f0ba311 | READ_RSP message received from /10.0.3.184:7000                          | 10.0.4.63  |           4558 |        7000 | Messaging-EventLoop-3-3
 fea828a0-3775-11f0-accb-b7ca6218b7fc | READ_REQ message received from /10.0.4.63:7000                           | 10.0.3.184 |             70 |        7000 | Messaging-EventLoop-3-3
 fea828aa-3775-11f0-88c7-67d23f0ba311 | Processing response from /10.0.3.184:7000                                | 10.0.4.63  |           4683 |        7000 | RequestResponseStage-2
 fea84fb0-3775-11f0-accb-b7ca6218b7fc | Executing single-partition query on CmsPage                              | 10.0.3.184 |            718 |        7000 | ReadStage-2
 fea84fba-3775-11f0-accb-b7ca6218b7fc | Acquiring sstable references                                             | 10.0.3.184 |            836 |        7000 | ReadStage-2
 fea84fc4-3775-11f0-accb-b7ca6218b7fc | Merging memtable contents                                                | 10.0.3.184 |            871 |        7000 | ReadStage-2
 fea84fce-3775-11f0-accb-b7ca6218b7fc | Bloom filter allows skipping sstable 3gqh_0etw_07i282pmrgsayqybc9        | 10.0.3.184 |            932 |        7000 | ReadStage-2
 fea84fd8-3775-11f0-accb-b7ca6218b7fc | Partition index found for sstable 3gqh_0etv_3lghs2pmrgsayqybc9, size = 0 | 10.0.3.184 |           1068 |        7000 | ReadStage-2
 fea876c0-3775-11f0-accb-b7ca6218b7fc | Read 1 live rows and 0 tombstone cells                                   | 10.0.3.184 |           1437 |        7000 | ReadStage-2
 fea876ca-3775-11f0-accb-b7ca6218b7fc | Enqueuing response to /10.0.4.63:7000                                    | 10.0.3.184 |           1493 |        7000 | ReadStage-2
 fea876d4-3775-11f0-accb-b7ca6218b7fc | Sending READ_RSP message to /10.0.4.63:7000 message size 2322 bytes      | 10.0.3.184 |           1843 |        7000 | Messaging-EventLoop-3-4
Events for session 008c2ae0-3776-11f0-88c7-67d23f0ba311:

 event_id                             | activity                                                                 | source    | source_elapsed | source_port | thread
--------------------------------------+--------------------------------------------------------------------------+-----------+----------------+-------------+-------------
 008c51f0-3776-11f0-88c7-67d23f0ba311 | Executing single-partition query on CmsPage                              | 10.0.4.63 |            843 |        7000 | ReadStage-2
 008c51fa-3776-11f0-88c7-67d23f0ba311 | Acquiring sstable references                                             | 10.0.4.63 |            948 |        7000 | ReadStage-2
 008c5204-3776-11f0-88c7-67d23f0ba311 | Merging memtable contents                                                | 10.0.4.63 |            989 |        7000 | ReadStage-2
 008c520e-3776-11f0-88c7-67d23f0ba311 | Partition index found for sstable 3go5_02qz_3ttfe2hrx6r82xf1x4, size = 0 | 10.0.4.63 |           1129 |        7000 | ReadStage-2
 008c7900-3776-11f0-88c7-67d23f0ba311 | Read 1 live rows and 0 tombstone cells                                   | 10.0.4.63 |           1417 |        7000 | ReadStage-2
The client is the same, with the same coordinator, the same prepared query, and
the same query parameters, yet only the first request was forwarded to another
replica.
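(For reference, the rows above are just the contents of the standard
system_traces tables; they can be re-queried with something like the following,
substituting the second session_id for the other trace.)

    -- Session summary and per-event rows for the first trace above.
    SELECT * FROM system_traces.sessions
     WHERE session_id = fea76550-3775-11f0-88c7-67d23f0ba311;

    SELECT event_id, activity, source, source_elapsed, source_port, thread
      FROM system_traces.events
     WHERE session_id = fea76550-3775-11f0-88c7-67d23f0ba311;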
Initially, I was worried that it might be a mismatch between prepared queries and
the host they were prepared on. I'm working on improving the Haskell Cassandra
driver and adding a token-aware routing policy. The library currently makes the
following assumption:

    The spec scopes the 'QueryId' to the node the query has been prepared
    with. The spec does not state anything about the format of the
    'QueryId'. However the official Java driver assumes that any given
    'QueryString' yields the same 'QueryId' on every node. This client
    makes the same assumption.
But if that broken assumption were causing the behavior I'm seeing, I would
expect it to be consistent based on the client IP, and I'm not seeing that.
Perhaps if the query were prepared transparently on the coordinator while the
request was being forwarded, that might explain what I'm seeing.
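One way I can think of to sanity-check that assumption: Cassandra 3.10 and later
persist the prepared-statement cache in the system.prepared_statements table, so
comparing it across nodes should show whether the same query string really maps
to the same ID everywhere. Something along these lines, run against each node in
turn:

    -- Compare across nodes: if the 'same QueryId on every node' assumption
    -- holds, the CmsPage query should show the same prepared_id everywhere.
    SELECT prepared_id, logged_keyspace, query_string
      FROM system.prepared_statements;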
I've tried running a full repair on all of the nodes in the cluster for this
keyspace, and it didn't seem to change anything.
I'm happy to gather more information; if there are details I've left out that
would help diagnose this issue, I'd be happy to provide them.
On a related note: I'm not sure whether I should be spreading read requests
across the replicas, or whether they should all go to the primary replica,
assuming it is up.
Hopefully someone can explain what I'm missing.
Thank you,
Kyle