Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-04-10 Thread Jaydeep Chovatia
Just created an official CEP-41 incorporating the feedback from this
discussion. Feel free to let me know if I have missed any important
feedback from this thread that is not captured in CEP-41.

Jaydeep

On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> Thanks, Josh. I will file an official CEP with all the details in a few
> days and update this thread with that CEP number.
> Thanks a lot everyone for providing valuable insights!
>
> Jaydeep
>
> On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie 
> wrote:
>
>> Do folks think we should file an official CEP and take it there?
>>
>> +1 here.
>>
>> Synthesizing your gdoc, Caleb's work, and the feedback from this thread
>> into a draft seems like a solid next step.
>>
>> On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
>>
>> I see a lot of great ideas being discussed or proposed in the past to
>> cover the most common rate limiter candidate use cases. Do folks think we
>> should file an official CEP and take it there?
>>
>> Jaydeep
>>
>> On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe 
>> wrote:
>>
>> I just remembered the other day that I had done a quick writeup on the
>> state of compaction stress-related throttling in the project:
>>
>>
>> https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
>>
>> I'm sure most of it is old news to the people on this thread, but I
>> figured I'd post it just in case :)
>>
>> On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie 
>> wrote:
>>
>>
>> 2.) We should make sure the links between the "known" root causes of
>> cascading failures and the mechanisms we introduce to avoid them remain
>> very strong.
>>
>> Seems to me that our historical strategy was to address individual known
>> cases one-by-one rather than looking for a more holistic load-balancing and
>> load-shedding solution. While the engineer in me likes the elegance of a
>> broad, more-inclusive *actual SEDA-like* approach, the pragmatist in me
>> wonders how far we think we are today from a stable set-point.
>>
>> i.e. are we facing a handful of cases where nodes can still get pushed
>> over and then cascade that we can surgically address, or are we facing a
>> broader lack of back-pressure that rears its head in different domains
>> (client -> coordinator, coordinator -> replica, internode with other
>> operations, etc) at surprising times and should be considered more
>> holistically?
>>
>> On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
>>
>> I almost forgot CASSANDRA-15817, which introduced
>> reject_repair_compaction_threshold, which provides a mechanism to stop
>> repairs while compaction is underwater.
>>
>> On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe 
>> wrote:
>>
>> 
>> Hey all,
>>
>> I'm a bit late to the discussion. I see that we've already discussed
>> CASSANDRA-15013 and CASSANDRA-16663 at least in passing. Having written
>> the latter, I'd be the first to admit it's a crude tool, although it's
>> been useful here and there, and provides a couple primitives that may be
>> useful for future work. As Scott mentions, while it is configurable at
>> runtime, it is not adaptive, although we did make configuration easier in
>> CASSANDRA-17423. It also is global to the node, although we've lightly
>> discussed some ideas around making it more granular. (For example,
>> keyspace-based limiting, or limiting "domains" tagged by the client in
>> requests, could be interesting.) It also does not deal with inter-node
>> traffic, of course.
>>
>> Something we've not yet mentioned (that does address internode traffic)
>> is CASSANDRA-17324, which I proposed shortly after working on the native
>> request limiter (and have just not had much time to return to). The basic
>> idea is this:
>>
>> When a node is struggling under the weight of a compaction backlog and
>> becomes a cause of increased read latency for clients, we have two safety
>> valves:
>>
>>
>> 1.) Disabling the native protocol server, which stops the node from
>> coordinating reads and writes.
>> 2.) Jacking up the severity on the node, which tells the dynamic snitch
>> to avoid the node for reads from other coordinators.
>>
>>
>> These are useful, but we don’t appear to have any mechanism that would
>> allow us to temporarily reject internode hint, batch, and mutation messages
>> that could further delay resolution of the compaction backlog.
>>
>>
>> Whether it's done as part of a larger framework or on its own, it still
>> feels like a good idea.
>>
>> Thinking in terms of opportunity costs here (i.e. where we spend our
>> finite engineering time to holisti
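
The safety valve Caleb describes for CASSANDRA-17324 (temporarily rejecting
internode hint, batch, and mutation messages while compaction is behind) can
be illustrated with a minimal sketch. This is not the Cassandra
implementation: the class name, the verb strings, and the idea of a single
pending-compaction threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical verb labels. Only the three message kinds (hint, batch,
# mutation) come from the thread; the strings themselves are made up.
SHEDDABLE_VERBS = frozenset({"HINT", "BATCH", "MUTATION"})


@dataclass
class BacklogGate:
    """Rejects write-type internode messages while compaction is behind."""

    # Assumed knob: pending compactions above which the node sheds load.
    reject_threshold: int
    pending_compactions: int = 0

    def update(self, pending: int) -> None:
        """Record the latest pending-compaction count."""
        self.pending_compactions = pending

    def should_reject(self, verb: str) -> bool:
        """True only for sheddable verbs while the backlog exceeds the threshold."""
        overloaded = self.pending_compactions > self.reject_threshold
        return overloaded and verb in SHEDDABLE_VERBS
```

Reads are deliberately left out of the sheddable set in this sketch, since
the thread notes that the dynamic snitch already steers reads away from a
struggling node.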

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-10 Thread C. Scott Andreas
Oh, one note on this item:

> The operator can ensure that files in the destination matches with the
> source. In the first iteration of this feature, an API is introduced to
> calculate digest for the list of file names and their lengths to identify
> any mismatches. It does not validate the file contents at the binary
> level, but, such feature can be added at a later point of time.

When enabled for LCS, single sstable uplevel will mutate only the level of
an SSTable in its stats metadata component, which wouldn't alter the
filename and may not alter the length of the stats metadata component. A
change to the level of an SSTable on the source via single sstable uplevel
may not be caught by a digest based only on filename and length.

Including the file’s modification timestamp would address this without
requiring a deep hash of the data. This would be good to include to ensure
SSTables aren’t downleveled unexpectedly during migration.

- Scott

On Apr 8, 2024, at 2:15 PM, C. Scott Andreas wrote:

> Hi Jon,
>
> Thanks for taking the time to read and reply to this proposal. Would
> encourage you to approach it from an attitude of seeking understanding on
> the part of the first-time CEP author, as this reply casts it off pretty
> quickly as NIH.
>
> The proposal isn't mine, but I'll offer a few notes on where I see this as
> valuable:
>
> – It's valuable for Cassandra to have an ecosystem-native mechanism of
> migrating data between physical/virtual instances outside the standard
> streaming path. As Hari mentions, the current ecosystem-native approach of
> executing repairs, decommissions, and bootstraps is time-consuming and
> cumbersome.
>
> – An ecosystem-native solution is safer than a bunch of bash and rsync.
> Defining a safe protocol to migrate data between instances via rsync
> without downtime is surprisingly difficult - and even more so to do safely
> and repeatedly at scale. Enabling this process to be orchestrated by a
> control plane mechanizing official endpoints of the database and sidecar –
> rather than trying to move data around behind its back – is much safer
> than hoping one's cobbled together the right set of scripts to move data
> in a way that won't violate strong / transactional consistency guarantees.
> This complexity is kind of exemplified by the "Migrating One Instance"
> section of the doc and state machine diagram, which illustrates an
> approach to solving that problem.
>
> – An ecosystem-native approach poses fewer security concerns than rsync.
> mTLS-authenticated endpoints in the sidecar for data movement eliminate
> the requirement for orchestration to occur via (typically) high-privilege
> SSH, which often allows for code execution of some form or complex efforts
> to scope SSH privileges of particular users; and eliminates the need to
> manage and secure rsyncd processes on each instance if not via SSH.
>
> – An ecosystem-native approach is more instrumentable and measurable than
> rsync. Support for data migration endpoints in the sidecar would allow for
> metrics reporting, stats collection, and alerting via mature and modern
> mechanisms rather than monitoring the output of a shell script.
>
> I'll yield to Hari to share more, though today is a public holiday in
> India.
>
> I do see this CEP as solving an important problem.
>
> Thanks,
>
> – Scott
>
> On Apr 8, 2024, at 10:23 AM, Jon Haddad wrote:
>
>> This seems like a lot of work to create an rsync alternative. I can't
>> really say I see the point. I noticed your "rejected alternatives"
>> mentions it with this note:
>>
>> > However, it might not be permitted by the administrator or available
>> > in various environments such as Kubernetes or virtual instances like
>> > EC2. Enabling data transfer through a sidecar facilitates smooth
>> > instance migration.
>>
>> This feels more like NIH than solving a real problem, as what you've
>> listed is a hypothetical, and one that's easily addressed.
>>
>> Jon
>>
>> On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala wrote:
>>
>>> Hi all,
>>>
>>> I have filed CEP-40 [1] for live migrating Cassandra instances using
>>> the Cassandra Sidecar.
>>>
>>> When someone needs to move all or a portion of the Cassandra nodes
>>> belonging to a cluster to different hosts, the traditional approach of
>>> Cassandra node replacement can be time-consuming due to repairs and the
>>> bootstrapping of new nodes. Depending on the volume of the storage
>>> service load, replacements (repair + bootstrap) may take anywhere from
>>> a few hours to days.
>>>
>>> Proposing a Sidecar based solution to address these challenges. This
>>> solution proposes transferring data from the old host (source) to the
>>> new host (destination) and then bringing up the Cassandra process at
>>> the destination, to enable fast instance migration. This approach would
>>> help to minimise node downtime, as it is based on a Sidecar solution
>>> for data transfer and avoids repairs and bootstrap.
>>>
>>> Looking forward to the discussions.
>>>
>>> [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>>>
>>> Thanks!
>>> Hari
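
Scott's point about the digest can be made concrete with a small sketch. It
assumes a digest computed over a sorted listing of (file name, length) pairs,
optionally extended with the modification time. The `FileEntry` type, the
`listing_digest` function, the use of SHA-256, and the sample file name are
illustrative assumptions, not the API proposed in CEP-40.

```python
import hashlib
from typing import Iterable, NamedTuple


class FileEntry(NamedTuple):
    """One file in a listing (hypothetical shape, for illustration only)."""

    name: str
    length: int
    mtime_ns: int


def listing_digest(entries: Iterable[FileEntry], include_mtime: bool) -> str:
    """Hash a file listing by name and length, optionally adding mtime."""
    hasher = hashlib.sha256()
    # Sort so the digest does not depend on directory iteration order.
    for entry in sorted(entries):
        fields = [entry.name, str(entry.length)]
        if include_mtime:
            fields.append(str(entry.mtime_ns))
        hasher.update("\0".join(fields).encode("utf-8"))
        hasher.update(b"\n")
    return hasher.hexdigest()


# A stats component rewritten in place: same name, same length, newer mtime.
before = [FileEntry("nb-1-big-Statistics.db", 4096, 1_000)]
after = [FileEntry("nb-1-big-Statistics.db", 4096, 2_000)]

# Name + length alone cannot see the rewrite; adding mtime can.
unchanged_without_mtime = listing_digest(before, False) == listing_digest(after, False)
changed_with_mtime = listing_digest(before, True) != listing_digest(after, True)
```

In this sketch `unchanged_without_mtime` and `changed_with_mtime` are both
true, which is the gap described above: an in-place metadata change that
keeps the file length is invisible to a name-and-length digest.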

Re: discuss: add to_human_size function

2024-04-10 Thread Brad
It's a useful idea and something supported in other databases.

MySQL has a FORMAT function:

FORMAT(X,D[,locale])


Formats the number X to a format like '#,###,###.##', rounded to D decimal
places, and returns the result as a string. If D is 0, the result has no
decimal point or fractional part. If X or D is NULL, the function returns
NULL.



ex:


SELECT FORMAT(250500.5634, 2);

250,500.56


SELECT FORMAT(250500.5634,0);

250,501


https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_format


On Tue, Apr 9, 2024 at 8:10 AM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> Hi,
>
> I want to propose CASSANDRA-19546. It would be possible to convert raw
> numbers to something human-friendly.
> There are cases when we write just a number of bytes in our system tables
> but these numbers are just hard to parse visually. Users can indeed use
> this for their tables too if they find it useful.
>
> Also, a user can indeed write a UDF for this but I would prefer if we had
> something baked in.
>
> Does this make sense to people? Are there any other approaches to do this?
>
> https://issues.apache.org/jira/browse/CASSANDRA-19546
> https://github.com/apache/cassandra/pull/3239/files
>
> Regards
>
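
For readers who want a feel for what such a conversion does, here is a
minimal sketch in Python. It is not the CASSANDRA-19546 implementation: the
binary (1024-based) units, the two-decimal rounding, and the output format
are assumptions, since the thread does not specify them.

```python
# Assumed unit ladder; the thread does not say which units the function uses.
UNITS = ("B", "KiB", "MiB", "GiB", "TiB")


def to_human_size(num_bytes: int) -> str:
    """Render a byte count using the largest unit that keeps the value >= 1."""
    if num_bytes < 0:
        raise ValueError("size must be non-negative")
    value = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the unit or units run out.
    while value >= 1024 and index < len(UNITS) - 1:
        value /= 1024
        index += 1
    if index == 0:
        # Plain bytes are printed without a fractional part.
        return f"{num_bytes} B"
    return f"{value:.2f} {UNITS[index]}"
```

With these assumptions, `to_human_size(1536)` gives `1.50 KiB` and
`to_human_size(1073741824)` gives `1.00 GiB`.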