I have a keyspace that is using the NetworkTopologyStrategy. In one of the data
centers there are 3 nodes, and the replication factor for that data center is 3.
I would expect that when I make a request to any of the 3 nodes, that node would
answer the read itself instead of forwarding it to another replica. But that
doesn't always happen, and I can't seem to find any reason that would trigger it.
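For context, the keyspace is defined with NetworkTopologyStrategy roughly like
the sketch below; the keyspace and data center names here are placeholders, and
any other data centers are omitted:

    -- Sketch only: 'cms' and 'dc1' stand in for the real names; the point is
    -- NetworkTopologyStrategy with a replication factor of 3 in the DC in question.
    CREATE KEYSPACE IF NOT EXISTS cms
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc1': 3
      };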
For example, look at the following traces:

 session_id                           | client    | command | coordinator | coordinator_port | duration | request                     | started_at
--------------------------------------+-----------+---------+-------------+------------------+----------+-----------------------------+---------------------------------
 fea76550-3775-11f0-88c7-67d23f0ba311 | 10.0.4.68 | QUERY   | 10.0.4.63   |             7000 |     5024 | Execute CQL3 prepared query | 2025-05-23 01:33:57.157000+0000
 008c2ae0-3776-11f0-88c7-67d23f0ba311 | 10.0.4.68 | QUERY   | 10.0.4.63   |             7000 |     1681 | Execute CQL3 prepared query | 2025-05-23 01:34:00.334000+0000

Parameters (identical for both sessions):

 {'bound_var_0_slugs': '[''rules'']',
  'consistency_level': 'LOCAL_ONE',
  'query': 'SELECT title, content, "tableOfContents" FROM "CmsPage" WHERE slugs = ?',
  'serial_consistency_level': 'SERIAL'}
Trace events:

Events for session fea76550-3775-11f0-88c7-67d23f0ba311:

 event_id                             | activity                                                                 | source     | source_elapsed | source_port | thread
--------------------------------------+--------------------------------------------------------------------------+------------+----------------+-------------+-----------------------------
 fea78c60-3775-11f0-88c7-67d23f0ba311 | reading data from /10.0.3.184:7000                                       | 10.0.4.63  |            674 |        7000 | Native-Transport-Requests-1
 fea78c6a-3775-11f0-88c7-67d23f0ba311 | Sending READ_REQ message to /10.0.3.184:7000 message size 182 bytes      | 10.0.4.63  |            994 |        7000 | Messaging-EventLoop-3-1
 fea828a0-3775-11f0-88c7-67d23f0ba311 | READ_RSP message received from /10.0.3.184:7000                          | 10.0.4.63  |           4558 |        7000 | Messaging-EventLoop-3-3
 fea828a0-3775-11f0-accb-b7ca6218b7fc | READ_REQ message received from /10.0.4.63:7000                           | 10.0.3.184 |             70 |        7000 | Messaging-EventLoop-3-3
 fea828aa-3775-11f0-88c7-67d23f0ba311 | Processing response from /10.0.3.184:7000                                | 10.0.4.63  |           4683 |        7000 | RequestResponseStage-2
 fea84fb0-3775-11f0-accb-b7ca6218b7fc | Executing single-partition query on CmsPage                              | 10.0.3.184 |            718 |        7000 | ReadStage-2
 fea84fba-3775-11f0-accb-b7ca6218b7fc | Acquiring sstable references                                             | 10.0.3.184 |            836 |        7000 | ReadStage-2
 fea84fc4-3775-11f0-accb-b7ca6218b7fc | Merging memtable contents                                                | 10.0.3.184 |            871 |        7000 | ReadStage-2
 fea84fce-3775-11f0-accb-b7ca6218b7fc | Bloom filter allows skipping sstable 3gqh_0etw_07i282pmrgsayqybc9        | 10.0.3.184 |            932 |        7000 | ReadStage-2
 fea84fd8-3775-11f0-accb-b7ca6218b7fc | Partition index found for sstable 3gqh_0etv_3lghs2pmrgsayqybc9, size = 0 | 10.0.3.184 |           1068 |        7000 | ReadStage-2
 fea876c0-3775-11f0-accb-b7ca6218b7fc | Read 1 live rows and 0 tombstone cells                                   | 10.0.3.184 |           1437 |        7000 | ReadStage-2
 fea876ca-3775-11f0-accb-b7ca6218b7fc | Enqueuing response to /10.0.4.63:7000                                    | 10.0.3.184 |           1493 |        7000 | ReadStage-2
 fea876d4-3775-11f0-accb-b7ca6218b7fc | Sending READ_RSP message to /10.0.4.63:7000 message size 2322 bytes      | 10.0.3.184 |           1843 |        7000 | Messaging-EventLoop-3-4
Events for session 008c2ae0-3776-11f0-88c7-67d23f0ba311:

 event_id                             | activity                                                                 | source    | source_elapsed | source_port | thread
--------------------------------------+--------------------------------------------------------------------------+-----------+----------------+-------------+-------------
 008c51f0-3776-11f0-88c7-67d23f0ba311 | Executing single-partition query on CmsPage                              | 10.0.4.63 |            843 |        7000 | ReadStage-2
 008c51fa-3776-11f0-88c7-67d23f0ba311 | Acquiring sstable references                                             | 10.0.4.63 |            948 |        7000 | ReadStage-2
 008c5204-3776-11f0-88c7-67d23f0ba311 | Merging memtable contents                                                | 10.0.4.63 |            989 |        7000 | ReadStage-2
 008c520e-3776-11f0-88c7-67d23f0ba311 | Partition index found for sstable 3go5_02qz_3ttfe2hrx6r82xf1x4, size = 0 | 10.0.4.63 |           1129 |        7000 | ReadStage-2
 008c7900-3776-11f0-88c7-67d23f0ba311 | Read 1 live rows and 0 tombstone cells                                   | 10.0.4.63 |           1417 |        7000 | ReadStage-2
The client is the same, with the same coordinator, the same prepared query, and
the same query parameters, yet only the first request was forwarded to another
replica.
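(For reference, the rows above are just the contents of the standard
system_traces tables; they can be re-queried with something like the following,
substituting the second session_id for the other trace.)

    -- Session summary and per-event rows for the first trace above.
    SELECT * FROM system_traces.sessions
     WHERE session_id = fea76550-3775-11f0-88c7-67d23f0ba311;

    SELECT event_id, activity, source, source_elapsed, source_port, thread
      FROM system_traces.events
     WHERE session_id = fea76550-3775-11f0-88c7-67d23f0ba311;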
Initially, I was worried that it might be a mismatch between prepared queries and
the host they were prepared on. I'm working on improving the Haskell Cassandra
driver and adding a token-aware routing policy. The library currently makes the
following assumption:

    The spec scopes the 'QueryId' to the node the query has been prepared
    with. The spec does not state anything about the format of the
    'QueryId'. However the official Java driver assumes that any given
    'QueryString' yields the same 'QueryId' on every node. This client
    makes the same assumption.
But if that broken assumption were causing the behavior I'm seeing, I would
expect it to be consistent based on the client IP, and I'm not seeing that.
Perhaps if the query were prepared transparently on the coordinator while the
request was being forwarded, that might explain what I'm seeing.
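One way I can think of to sanity-check that assumption: Cassandra 3.10 and later
persist the prepared-statement cache in the system.prepared_statements table, so
comparing it across nodes should show whether the same query string really maps
to the same ID everywhere. Something along these lines, run against each node in
turn:

    -- Compare across nodes: if the 'same QueryId on every node' assumption
    -- holds, the CmsPage query should show the same prepared_id everywhere.
    SELECT prepared_id, logged_keyspace, query_string
      FROM system.prepared_statements;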
I've tried running a full repair on all of the nodes in the cluster for this
keyspace, and it didn't seem to change anything.
I'm happy to gather more information; if there are details I've left out that
would help diagnose this issue, I'd be happy to provide them.
On a related note: I'm not sure whether I should be spreading read requests
across the replicas, or whether they should all go to the primary replica,
assuming it is up.
Hopefully someone can explain what I'm missing.
Thank you,
Kyle