Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection

Jeremiah D Jordan Fri, 19 Nov 2021 08:00:34 -0800

If it is per query, then I would think the protocol level might be easier to 
“test” a given application with.  Rather than having to append “WITH ADDITIONAL 
LATENCY” to all your queries, you just set some option on your query object or 
such.  We already have support at the protocol level for adding arbitrary query 
options, so if you are worried about some driver needing to add support, it 
could be done through those.  Most of the drivers I have looked at provide a 
method to put data into that metadata.

I guess if you want to be the most flexible you could do both?  I could see 
such a setting being done on multiple levels, implementing one or more of the 
following:

1. As a new entry in the STARTUP message during connection handshake introduced 
in a new native protocol version -> add latency to every response over this 
connection
2. A new CQL command that sets the connection-level latency -> all requests 
after this command on this connection get additional latency XYZ (this probably 
does still need some driver support, like USE, as the driver would need to 
know to run the command on every open connection it has)
3. A new CQL command that sets the latency for a given IP/user -> all requests 
after this command for a connection from the specified IP/user to the current 
node get additional latency XYZ (this could help get around the multiple 
connection issues, though unless it got propagated to all nodes in the cluster 
you would still need some driver support to send the command to every node the 
client was connected to)
4. As part of the request custom payload side channel -> just affects this query
5. As part of a new flag introduced in a new native protocol version -> just 
affects this query
6. As part of the CQL statements themselves -> just affects this query

I can see good uses for most of those.  A CQL command to enable it globally (2 
or 3), and then additional CQL for per-query use (6), is probably supported by 
most existing clients without needing any changes.  I do think just having a 
new per-statement CQL option is not a great choice.  Though the limitations of 
how 2/3 could be implemented make me think the “per request custom payload” 
(4) may actually be the option that is the most useful with the least 
driver/user code change needed to work with it.
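
To make the custom payload option (4) concrete, here is a minimal sketch of how a latency value could travel in that side channel. The native protocol's custom payload is a map of string keys to opaque byte values, so the client only has to encode a number and the server only has to decode it. The key name "additional_latency_ms", the 4-byte integer encoding, and both helper functions are assumptions made up for illustration; nothing in the ticket defines them.

```python
import struct
from typing import Dict, Optional

# Hypothetical payload key; not defined by Cassandra or by this thread.
LATENCY_KEY = "additional_latency_ms"


def build_latency_payload(latency_ms: int) -> Dict[str, bytes]:
    """Client side: encode the requested extra latency as one payload entry."""
    if latency_ms < 0:
        raise ValueError("latency must be non-negative")
    # Payload values are opaque bytes; a 4-byte big-endian int is one choice.
    return {LATENCY_KEY: struct.pack(">i", latency_ms)}


def read_latency_ms(payload: Optional[Dict[str, bytes]]) -> int:
    """Server side: return the requested latency, or 0 if none was sent."""
    if not payload or LATENCY_KEY not in payload:
        return 0
    (latency_ms,) = struct.unpack(">i", payload[LATENCY_KEY])
    # Treat a negative value as "no extra latency" rather than failing.
    return max(latency_ms, 0)


payload = build_latency_payload(4)
print(read_latency_ms(payload))  # 4
print(read_latency_ms(None))     # 0
```

A driver that exposes the custom payload could attach such a map to an individual request, so only the queries an application nominates would be delayed. The decoding side would in practice live in Cassandra itself (Java); it is shown in Python here only to keep the sketch self-contained.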

-Jeremiah

> On Nov 19, 2021, at 8:25 AM, [email protected] wrote:
> 
> To resurrect this discussion briefly, does anyone have a preference for 
> either CQL Grammar or Protocol support?
> 
> This originally felt to me like something we might want to support at the 
> native protocol level, however that creates a dependency on specific clients 
> and the feature might ultimately be less flexible. It’s not clear why we 
> wouldn’t prefer some kind of CQL change like:
> 
> SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY
> 
> With queries being able to supply specific latencies if they so choose:
> 
> SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY 4ms
> 
> That might even support some DC->DC map for additional latencies:
> 
> SELECT * FROM table WHERE pk = x WITH ADDITIONAL LATENCY '{dc1:{dc2: 4ms}}'
> 
> This would leave applications a great deal of flexibility for experimenting 
> with latency impacts, and greater ease for evolving this feature over time 
> than specifying query eligibility at the protocol level.
> 
> Does anyone have any thoughts about this?
> 
> From: [email protected]
> Date: Wednesday, 6 October 2021 at 14:48
> To: [email protected]
> Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> This is a very good point. I forget the reason we settled on consistency 
> levels; I assume it was due to the simplicity of the solution, as deploying 
> support for a new protocol-level change is more involved.
> 
> That’s probably not a good reason here, and I agree that overloading 
> consistency level feels wrong. I hope we will retire user-provided 
> consistency levels over the coming year or two, which is another good reason 
> not to begin enhancing it with new meanings.
> 
> I will rework the ticket and patches.
> 
> From: Paulo Motta <[email protected]>
> Date: Wednesday, 6 October 2021 at 14:37
> To: Cassandra DEV <[email protected]>
> Subject: Re: [DISCUSS] CASSANDRA-17024: Artificial Latency Injection
> This sounds like a great feature!
> 
> I wonder if ConsistencyLevel is the best way to expose this to users,
> though. Can't we implement this via another driver/protocol option? I.e. a
> "delay_enabled" flag that would be a modifier to an existing CL.
> 
> If we decide to go the CL route, I wonder if this isn't a good opportunity
> to introduce pluggable consistency levels (CASSANDRA-8119
> <https://issues.apache.org/jira/browse/CASSANDRA-8119>) so these would only
> become available when the feature is enabled.
> 
> My concern here is adding niche consistency levels to the default CL table,
> which may confuse non-power users.
> 
> On Wed, 6 Oct 2021 at 10:12, [email protected] <
> [email protected]> wrote:
> 
>> Hi Everyone,
>> 
>> This is a modest user-facing feature that I want to highlight in case
>> anyone has any input. In order to validate whether a real cluster may modify
>> its topology or consistency level (e.g. from local to global), this ticket
>> introduces a facility for injecting latency into internode messages. This is
>> particularly helpful for high-availability topologies, and especially
>> for LWTs (where performance may be unpredictable due to contention), so
>> that real traffic may be modified to experience gradually increasing
>> latency in order to validate a topology (or the impact of a global
>> consistency level) before any transition is undertaken.
>> 
>> The user-visible changes include new config parameters, new JMX endpoints
>> for modifying these parameters, and new consistency levels that may be
>> supplied to mark queries as suitable for latency injection (so that
>> applications may nominate queries for this mechanism)
