Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-07 Thread Jon Haddad
+1

On 2023/02/06 16:15:19 Sam Tunnicliffe wrote:
> Hi everyone,
> 
> I would like to start a vote on this CEP.
> 
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
> 
> Discussion:
> https://lists.apache.org/thread/h25skwkbdztz9hj2pxtgh39rnjfzckk7
> 
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding vetoes.
> 
> Thanks,
> Sam


Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-10 Thread Jon Haddad
Good suggestion Mike.  I'm +1 on the idea and agree the name KEYSPACE is 
confusing to new users.

Jon
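
Concretely, the proposal would make the following two statements equivalent (a sketch of 
the intended grammar rather than anything that exists today):

    CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    -- proposed alternative spelling, same effect
    CREATE DATABASE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};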

On 2023/04/04 15:48:26 Mike Adamson wrote:
> Hi,
> 
> I'd like to propose that we add DATABASE to the CQL grammar as an
> alternative to KEYSPACE.
> 
> Background: While TABLE was introduced as an alternative for COLUMNFAMILY
> in the grammar we have kept KEYSPACE for the container name for a group of
> tables. Nearly all traditional SQL databases use DATABASE as the container
> name for a group of tables so it would make sense for Cassandra to adopt
> this naming as well.
> 
> KEYSPACE would be kept in the grammar but we would update some logging and
> documentation to encourage use of the new name.
> 
> Mike Adamson
> 


Re: [DISCUSS] [PATCH] Enable Direct I/O For CommitLog Files

2023-04-21 Thread Jon Haddad
This sounds awesome.  Could you share the fio configuration you used to 
benchmark and what hardware you used?  
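
As a point of reference, a minimal fio job for this kind of comparison might look 
something like the following (parameters are purely illustrative, not the configuration 
Amit used; run it once with direct=0 and once with direct=1):

    ; sequential append workload, loosely mimicking commitlog writes
    [commitlog-sim]
    filename=/var/lib/cassandra/commitlog-bench/fio.dat
    rw=write
    bs=32k
    size=4g
    ioengine=libaio
    iodepth=16
    direct=1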


On 2023/04/18 18:10:24 "Pawar, Amit" wrote:
> [Public]
> 
> Hi,
> 
> I shared my investigation of the CommitLog I/O issue on large core count 
> systems in my previous email dated July-22; the link to the thread is given 
> below.
> https://lists.apache.org/thread/xc5ocog2qz2v2gnj4xlw5hbthfqytx2n
> 
> Basically, two solutions looked possible to improve the CommitLog I/O.
> 
>   1.  Multi-threaded syncing
>   2.  Using Direct-IO through JNA
> 
> I worked on the 2nd option, considering the following benefits compared to the 
> first one:
> 
>   1.  Direct I/O read/write throughput is very high compared to non-Direct 
> I/O (learnt through FIO benchmarking).
>   2.  Reduces kernel file cache usage, which in turn reduces kernel I/O 
> activity for Commitlog files only.
>   3.  Overall CPU usage is reduced for flush activity; JVisualvm shows CPU usage 
> < 30% for the Commitlog syncer thread with the Direct I/O feature.
>   4.  The Direct I/O implementation is simpler compared to the multi-threaded one.
> 
> As per the community suggestion, lower code complexity is preferable. Direct 
> I/O enablement looked promising, but there was one issue:
> Java version 8 has no native support for Direct I/O, so using the JNA 
> library is a must. The same implementation should also work across other 
> versions of Java (11 and beyond).
> 
> I have completed the Direct I/O implementation; a summary of the attached patch 
> changes is given below.
> 
>   1.  This implementation does not use Java file channels; the file is opened 
> through JNA to use the Direct I/O feature.
>   2.  New segment types are defined: "DirectIOSegment" for Direct I/O and 
> "NonDirectIOSegment" for non-direct I/O (NonDirectIOSegment is for test purposes 
> only).
>   3.  A JNA write call is used to flush the changes.
>   4.  New helper functions are defined in NativeLibrary.java and a 
> platform-specific file. Currently tested on Linux only.
>   5.  The patch allows the user to configure the optimum block size and alignment if 
> the default values are not suitable for the CommitLog disk.
>   6.  The following configuration options are provided in the cassandra.yaml file:
>  *   use_jna_for_commitlog_io: to use the JNA feature
>  *   use_direct_io_for_commitlog: to use the Direct I/O feature
>  *   direct_io_minimum_block_alignment: 512 (default)
>  *   nvme_disk_block_size: 32MiB (default; can be changed as per the 
> required size)
> 
> The test matrix is complex, so CommitLog-related test cases and the TPCx-IoT benchmark 
> were tested. It works with both Java 8 and 11. Compressed and 
> encrypted segments are not supported yet; they can be enabled later 
> based on community feedback.
> 
> The following improvements are seen with Direct I/O enabled:
> 
>   1.  32 cores >= ~15%
>   2.  64 cores >= ~80%
> 
> Also, another observation I would like to share here: reading Commitlog files 
> with Direct I/O might help reduce node bring-up time after a node 
> crash.
> 
> Tested with commit ID: 91f6a9aca8d3c22a03e68aa901a0b154d960ab07
> 
> The attached patch enables the Direct I/O feature for Commitlog files. Please 
> check and share your feedback.
> 
> Thanks,
> Amit
> 
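
As a concrete illustration of the options listed above, the cassandra.yaml fragment 
would look roughly like the following (option names as given in the patch description; 
values are the stated defaults, with booleans assumed for the two feature toggles):

    use_jna_for_commitlog_io: true
    use_direct_io_for_commitlog: true
    direct_io_minimum_block_alignment: 512
    nvme_disk_block_size: 32MiB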


Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-04 Thread Jon Haddad
+1.

Awesome work Doug!  Great to see this moving forward.  

On 2023/05/04 18:34:46 "C. Scott Andreas" wrote:
> +1 nb.
> 
> As someone familiar with this work, it's pretty hard to overstate the impact 
> it has on completing Cassandra's HTAP story. Eliminating the overhead of bulk 
> reads and writes on production OLTP clusters is transformative.
> 
> – Scott
> 
> On May 4, 2023, at 9:47 AM, Doug Rohrer wrote:
> 
> Hello all,
> 
> I’d like to put CEP-28 to a vote.
> 
> Proposal: 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
> 
> Jira: https://issues.apache.org/jira/browse/CASSANDRA-16222
> 
> Draft implementation:
> - Apache Cassandra Spark Analytics source code: https://github.com/frankgh/cassandra-analytics
> - Changes required for Sidecar: https://github.com/frankgh/cassandra-sidecar/tree/CEP-28-bulk-apis
> 
> Discussion: https://lists.apache.org/thread/lrww4d7cdxgtg8o3gt8b8foymzpvq7z3
> 
> The vote will be open for 72 hours. A vote passes if there are at least three 
> binding +1s and no binding vetoes.
> 
> Thanks,
> Doug Rohrer


Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-14 Thread Jon Haddad
+1

On 2023/06/13 14:14:35 Jeremy Hanna wrote:
> Calling for a vote on CEP-8 [1].
> 
> To clarify the intent, as Benjamin said in the discussion thread [2], the 
> goal of this vote is simply to ensure that the community is in favor of the 
> donation. Nothing more.
> The plan is to introduce the drivers, one by one. Each driver donation will 
> need to be accepted first by the PMC members, as it is the case for any 
> donation. Therefore the PMC should have full control on the pace at which new 
> drivers are accepted.
> 
> If this vote passes, we can start this process for the Java driver under the 
> direction of the PMC.
> 
> Jeremy
> 
> 1. 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp


Re: [Discuss] Repair inside C*

2023-07-26 Thread Jon Haddad
I'm 100% in favor of repair being part of the core DB, not the sidecar.  The 
current (and past) state of things where running the DB correctly *requires* 
running a separate process (either community maintained or official C* sidecar) 
is incredibly painful for folks.  The idea that your data integrity needs to be 
opt-in has never made sense to me from the perspective of either the product or 
the end user.

I've worked with way too many teams that have either configured this 
incorrectly or not at all.  

Ideally Cassandra would ship with repair built in and on by default.  Power 
users can disable if they want to continue to maintain their own repair tooling 
for some reason. 

Jon

On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> All,
> 
> We had a brief discussion in [2] about the Uber article [1] where they talk 
> about having integrated repair into Cassandra and how great that is. I 
> expressed my disappointment that they didn't work with the community on that 
> (Uber, if you are listening time to make amends 🙂) and it turns out Joey 
> already had the idea and wrote the code [3] - so I wanted to start a 
> discussion to gauge interest and maybe how to revive that effort.
> 
> Thanks,
> German
> 
> [1] 
> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
> 


Re: Status Update on CEP-7 Storage Attached Indexes (SAI)

2023-07-27 Thread Jon Haddad
Very nice!  I'll kick the tires a bit, and add a sai test to tlp-stress 
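
For anyone else kicking the tires, creating an SAI index is a one-line custom index 
declaration along these lines (keyspace, table, and column names are placeholders; the 
class name is the one used on the cep-7-sai branch):

    CREATE CUSTOM INDEX ON ks.tbl (val) USING 'StorageAttachedIndex';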

On 2023/07/26 18:56:29 Caleb Rackliffe wrote:
> Alright, the cep-7-sai branch is now merged to trunk!
> 
> Now we move to addressing the most urgent items from "Phase 2" (
> CASSANDRA-18473 )
> before (and in the case of some testing after) the 5.0 freeze...
> 
> On Wed, Jul 26, 2023 at 6:07 AM Jeremy Hanna 
> wrote:
> 
> > Thanks Caleb and Mike and Zhao and Andres and Piotr and everyone else
> > involved with the SAI implementation!
> >
> > On Jul 25, 2023, at 3:01 PM, Caleb Rackliffe 
> > wrote:
> >
> > 
> > Just a quick update...
> >
> > With CASSANDRA-18670
> >  complete, and all
> > remaining items in the category of performance optimizations and further
> > testing, the process of merging to trunk will likely start today, beginning
> > with a final rebase on the current trunk and J11 and J17 test runs.
> >
> > On Tue, Jul 18, 2023 at 3:47 PM Caleb Rackliffe 
> > wrote:
> >
> >> Hello there!
> >>
> >> After much toil, the first phase of CEP-7 is nearing completion (see
> >> CASSANDRA-16052 ).
> >> There are presently two issues to resolve before we'd like to merge the
> >> cep-7-sai feature branch and all its goodness to trunk:
> >>
> >> CASSANDRA-18670 
> >> - Importer should build SSTable indexes successfully before making new
> >> SSTables readable (in review)
> >>
> >> CASSANDRA-18673 
> >> - Reduce size of per-SSTable index components (in progress)
> >>
> >> (We've been getting clean CircleCI runs for a while now, and have been
> >> using the multiplexer to sniff out as much flakiness as possible up front.)
> >>
> >> Once merged to trunk, the next steps are:
> >>
> >> 1.) Finish a Harry model that we can use to further fuzz test SAI before
> >> 5.0 releases (see CASSANDRA-18275
> >> ). We've done a
> >> fair amount of fuzz/randomized testing at the component level, but I'd
> >> still consider Harry (at least around single-partition query use-cases) a
> >> critical item for us to have confidence before release.
> >>
> >> 2.) Start pursuing Phase 2 items as time and our needs allow. (see
> >> CASSANDRA-18473 )
> >>
> >> A reminder, SAI is a secondary index, and therefore is by definition an
> >> opt-in feature, and has no explicit "feature flag". However, its
> >> availability to users is still subject to the secondary_indexes_enabled
> >> guardrail, which currently defaults to allowing creation.
> >>
> >> Any thoughts, questions, or comments on the pre-merge plan here?
> >>
> >
> 


Re: Tokenization and SAI query syntax

2023-08-02 Thread Jon Haddad
Certain bits of functionality also already exist on the SASI side of things, 
but I'm not sure how much overlap there is.  Currently, there's a LIKE keyword 
that handles token matching, although it seems to have some differences from 
the feature set in SAI.  

That said, there seems to be enough of an overlap that it would make sense to 
consider using LIKE in the same manner, doesn't it?  I think it would be a 
little odd if we have different syntax for different indexes.  

https://github.com/apache/cassandra/blob/trunk/doc/SASI.md

I think one complication here is that there seems to be a desire, which I very 
much agree with, to expose as much of the underlying flexibility of Lucene as 
possible.  If it means we use Caleb's suggestion, I'd ask that the 
queries that SASI and SAI both support use the same syntax, even if it means 
there are two ways of writing the same query.  To use Caleb's example, this would 
mean supporting both LIKE and the `expr` column.  

Jon
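
To make the comparison concrete, the two styles being discussed look roughly like the 
following (table, column, and index names are made up, and the exact semantics are 
precisely what is under debate):

    -- SASI-style token matching, reusing the existing LIKE keyword
    SELECT * FROM docs WHERE body LIKE 'cassandra';

    -- Lucene-style query string routed through the expr() syntax on a custom index
    SELECT * FROM docs WHERE expr(docs_body_idx, 'body:cassandra');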

On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
> Here are some additional bits of prior art, if anyone finds them useful:
> 
> 
> The Stratio Lucene Index -
> https://github.com/Stratio/cassandra-lucene-index#examples
> 
> Stratio was the reason C* added the "expr" functionality. They embedded
> something similar to ElasticSearch JSON, which probably isn't my favorite
> choice, but it's there.
> 
> 
> The ElasticSearch match query syntax -
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> 
> Again, not my favorite. It's verbose, and probably too powerful for us.
> 
> 
> ElasticSearch's documentation for the basic Lucene query syntax -
> https://www.elastic.co/guide/en/elasticsearch/reference/8.9/query-dsl-query-string-query.html#query-string-syntax
> 
> One idea is to take the basic Lucene index, which it seems we already have
> some support for, and feed it to "expr". This is nice for two reasons:
> 
> 1.) People can just write Lucene queries if they already know how.
> 2.) No changes to the grammar.
> 
> Lucene has distinct concepts of filtering and querying, and this is kind of
> the latter. I'm not sure how, for example, we would want "expr" to interact
> w/ filters on other column indexes in vanilla CQL space...
> 
> 
> On Mon, Jul 24, 2023 at 9:37 AM Josh McKenzie  wrote:
> 
> > `column CONTAINS term`. Contains is used by both Java and Python for
> > substring searches, so at least some users will be surprised by term-based
> > behavior.
> >
> > I wonder whether users are in their "programming language" headspace or in
> > their "querying a database" headspace when interacting with CQL? i.e. this
> > would only present confusion if we expected users to be thinking in the
> > idioms of their respective programming languages. If they're thinking in
> > terms of SQL, MATCHES would probably end up confusing them a bit since it
> > doesn't match the general structure of the MATCH operator.
> >
> > That said, I also think CONTAINS loses something important that you allude
> > to here Jonathan:
> >
> > with corresponding query-time tokenization and analysis.  This means that
> > the query term is not always a substring of the original string!  Besides
> > obvious transformations like lowercasing, you have things like
> > PhoneticFilter available as well.
> >
> > So to me, neither MATCHES nor CONTAINS are particularly great candidates.
> >
> > So +1 to the "I don't actually hate it" sentiment on:
> >
> > column : term`. Inspired by Lucene’s syntax
> >
> >
> > On Mon, Jul 24, 2023, at 8:35 AM, Benedict wrote:
> >
> >
> > I have a strong preference not to use the name of an SQL operator, since
> > it precludes us later providing the SQL standard operator to users.
> >
> > What about CONTAINS TOKEN term? Or CONTAINS TERM term?
> >
> >
> > On 24 Jul 2023, at 13:34, Andrés de la Peña  wrote:
> >
> > 
> > `column = term` is definitively problematic because it creates an
> > ambiguity when the queried column belongs to the primary key. For some
> > queries we wouldn't know whether the user wants a primary key query using
> > regular equality or an index query using the analyzer.
> >
> > `term_matches(column, term)` seems quite clear and hard to misinterpret,
> > but it's quite long to write and its implementation will be challenging
> > since we would need a bunch of special casing around SelectStatement and
> > functions.
> >
> > LIKE, MATCHES and CONTAINS could be a bit misleading since they seem to
> > evoke different behaviours to what they would have.
> >
> > `column LIKE :term:` seems a bit redundant compared to just using `column
> > : term`, and we are still introducing a new symbol.
> >
> > I think I like `column : term` the most, because it's brief, it's similar
> > to the equivalent Lucene's syntax, and it doesn't seem to clash with other
> > different meanings that I can think of.
> >
> > On Mon, 24 Jul 2023 at 13:13, Jonathan Ellis  wrote:
> >
> > Hi all,
> >
> > With phase 1 of SAI w

Re: [Discuss] Repair inside C*

2023-08-02 Thread Jon Haddad
> That said I would happily support an effort to bring repair scheduling to the 
> sidecar immediately. This has nothing blocking it, and would potentially 
> enable the sidecar to provide an official repair scheduling solution that is 
> compatible with current or even previous versions of the database.

This is something I hadn't thought much about, and is a pretty good argument 
for using the sidecar initially.  There's a lot of deployments out there and 
having an official repair option would be a big win.  


On 2023/07/26 23:20:07 "C. Scott Andreas" wrote:
> I agree that it would be ideal for Cassandra to have a repair scheduler in-DB.
> 
> That said I would happily support an effort to bring repair scheduling to the 
> sidecar immediately. This has nothing blocking it, and would potentially 
> enable the sidecar to provide an official repair scheduling solution that is 
> compatible with current or even previous versions of the database.
> 
> Once TCM has landed, we’ll have much stronger primitives for repair 
> orchestration in the database itself. But I don’t think that should block 
> progress on a repair scheduling solution in the sidecar, and there is nothing 
> that would prevent someone from continuing to use a sidecar-based solution in 
> perpetuity if they preferred.
> 
> - Scott
> 
> > On Jul 26, 2023, at 3:25 PM, Jon Haddad  wrote:
> > 
> > I'm 100% in favor of repair being part of the core DB, not the sidecar.  
> > The current (and past) state of things where running the DB correctly 
> > *requires* running a separate process (either community maintained or 
> > official C* sidecar) is incredibly painful for folks.  The idea that your 
> > data integrity needs to be opt-in has never made sense to me from the 
> > perspective of either the product or the end user.
> > 
> > I've worked with way too many teams that have either configured this 
> > incorrectly or not at all.  
> > 
> > Ideally Cassandra would ship with repair built in and on by default.  Power 
> > users can disable if they want to continue to maintain their own repair 
> > tooling for some reason.
> > 
> > Jon
> > 
> >> On 2023/07/24 20:44:14 German Eichberger via dev wrote:
> >> All,
> >> We had a brief discussion in [2] about the Uber article [1] where they 
> >> talk about having integrated repair into Cassandra and how great that is. 
> >> I expressed my disappointment that they didn't work with the community on 
> >> that (Uber, if you are listening time to make amends 🙂) and it turns out 
> >> Joey already had the idea and wrote the code [3] - so I wanted to start a 
> >> discussion to gauge interest and maybe how to revive that effort.
> >> Thanks,
> >> German
> >> [1] 
> >> https://www.uber.com/blog/how-uber-optimized-cassandra-operations-at-scale/
> >> [2] https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619
> >> [3] https://issues.apache.org/jira/browse/CASSANDRA-14346
> 


Re: Tokenization and SAI query syntax

2023-08-03 Thread Jon Haddad
Yes, I understand that.  What I'm trying to point out is the potential 
confusion with having the same syntax behave differently for different index 
types.  

I'm not holding this view strongly, I'd just like folks to consider the impact 
to the end user, who in my experience is great with foot guns and bad with 
nuance.

On 2023/08/03 00:20:02 Jeremiah Jordan wrote:
> SASI just uses “=“ for the tokenized equality matching, which is the exact 
> thing this discussion is about changing/not liking.
> 
> > On Aug 2, 2023, at 7:18 PM, J. D. Jordan  wrote:
> > 
> > I do not think LIKE actually applies here. LIKE is used for prefix, 
> > contains, or suffix searches in SASI depending on the index type.
> > 
> > This is about exact matching of tokens.
> > 
> >> On Aug 2, 2023, at 5:53 PM, Jon Haddad  wrote:
> >> 
> >> Certain bits of functionality also already exist on the SASI side of 
> >> things, but I'm not sure how much overlap there is.  Currently, there's a 
> >> LIKE keyword that handles token matching, although it seems to have some 
> >> differences from the feature set in SAI.  
> >> 
> >> That said, there seems to be enough of an overlap that it would make sense 
> >> to consider using LIKE in the same manner, doesn't it?  I think it would 
> >> be a little odd if we have different syntax for different indexes.  
> >> 
> >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
> >> 
> >> I think one complication here is that there seems to be a desire, that I 
> >> very much agree with, to expose as much of the underlying flexibility of 
> >> Lucene as much as possible.  If it means we use Caleb's suggestion, I'd 
> >> ask that the queries that SASI and SAI both support use the same syntax, 
> >> even if it means there's two ways of writing the same query.  To use 
> >> Caleb's example, this would mean supporting both LIKE and the `expr` 
> >> column.  
> >> 
> >> Jon
> >> 
> >>>> On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
> >>> Here are some additional bits of prior art, if anyone finds them useful:
> >>> 
> >>> 
> >>> The Stratio Lucene Index -
> >>> https://github.com/Stratio/cassandra-lucene-index#examples
> >>> 
> >>> Stratio was the reason C* added the "expr" functionality. They embedded
> >>> something similar to ElasticSearch JSON, which probably isn't my favorite
> >>> choice, but it's there.
> >>> 
> >>> 
> >>> The ElasticSearch match query syntax -
> >>> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> >>> 
> >>> Again, not my favorite. It's verbose, and probably too powerful for us.
> >>> 
> >>> 
> >>> ElasticSearch's documentation for the basic Lucene query syntax -
> >>> https://www.elastic.co/guide/en/elasticsearch/reference/8.9/query-dsl-query-string-query.html#query-string-syntax
> >>> 
> >>> One idea is to take the basic Lucene index, which it seems we already have
> >>> some support for, and feed it to "expr". This is nice for two reasons:
> >>> 
> >>> 1.) People can just write Lucene queries if they already know how.
> >>> 2.) No changes to the grammar.
> >>> 
> >>> Lucene has distinct concepts of filtering and querying, and this is kind 
> >>> of
> >>> the latter. I'm not sure how, for example, we would want "expr" to 
> >>> interact
> >>> w/ filters on other column indexes in vanilla CQL space...
> >>> 
> >>> 
> >>>> On Mon, Jul 24, 2023 at 9:37 AM Josh McKenzie  
> >>>> wrote:
> >>>> 
> >>>> `column CONTAINS term`. Contains is used by both Java and Python for
> >>>> substring searches, so at least some users will be surprised by 
> >>>> term-based
> >>>> behavior.
> >>>> 
> >>>> I wonder whether users are in their "programming language" headspace or 
> >>>> in
> >>>> their 

Re: Tokenization and SAI query syntax

2023-08-03 Thread Jon Haddad
Assuming SAI is a superset of SASI, and we were to set up something so that 
SASI indexes auto convert to SAI, this gives even more weight to my point 
regarding how differing behavior for the same syntax can lead to issues.  Imo 
the best case scenario results in the user not even noticing their indexes have 
changed.

An (maybe better?) alternative is to add a flag to the index configuration for 
"compatibility mode", which might address the concerns around using an equality 
operator when it actually is a partial match.

For what it's worth, I'm in agreement that = should mean full equality and not 
token match.
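
As a sketch of what such a flag could look like (the option names here are hypothetical, 
not existing SAI options):

    CREATE CUSTOM INDEX ON ks.docs (body) USING 'StorageAttachedIndex'
        WITH OPTIONS = {'index_analyzer': 'standard', 'equality_behaviour': 'token_match'};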

On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
> For what it's worth, I'd very much like to completely remove SASI from the
> codebase for 6.0. The only remaining functionality gaps at the moment are
> LIKE (prefix/suffix) queries and its limited tokenization
> capabilities, both of which already have SAI Phase 2 Jiras.
> 
> On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
> wrote:
> 
> > SASI just uses “=“ for the tokenized equality matching, which is the exact
> > thing this discussion is about changing/not liking.
> >
> > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan 
> > wrote:
> > >
> > > I do not think LIKE actually applies here. LIKE is used for prefix,
> > contains, or suffix searches in SASI depending on the index type.
> > >
> > > This is about exact matching of tokens.
> > >
> > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad 
> > wrote:
> > >>
> > >> Certain bits of functionality also already exist on the SASI side of
> > things, but I'm not sure how much overlap there is.  Currently, there's a
> > LIKE keyword that handles token matching, although it seems to have some
> > differences from the feature set in SAI.
> > >>
> > >> That said, there seems to be enough of an overlap that it would make
> > sense to consider using LIKE in the same manner, doesn't it?  I think it
> > would be a little odd if we have different syntax for different indexes.
> > >>
> > >> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
> > >>
> > >> I think one complication here is that there seems to be a desire, that
> > I very much agree with, to expose as much of the underlying flexibility of
> > Lucene as much as possible.  If it means we use Caleb's suggestion, I'd ask
> > that the queries that SASI and SAI both support use the same syntax, even
> > if it means there's two ways of writing the same query.  To use Caleb's
> > example, this would mean supporting both LIKE and the `expr` column.
> > >>
> > >> Jon
> > >>
> > >>>> On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
> > >>> Here are some additional bits of prior art, if anyone finds them
> > useful:
> > >>>
> > >>>
> > >>> The Stratio Lucene Index -
> > >>> https://github.com/Stratio/cassandra-lucene-index#examples
> > >>>
> > >>> Stratio was the reason C* added the "expr" functionality. They embedded
> > >>> something similar to ElasticSearch JSON, which probably isn't my
> > favorite
> > >>> choice, but it's there.
> > >>>
> > >>>
> > >>> The ElasticSearch match query syntax -
> > >>>
> > >>> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> > >>>
> > >>> Again, not my favorite. It's verbose, and probably too powerful for us.
> > >>>
> > >>>
> > >>> ElasticSearch's documentation for the basic Lucene query syntax -
> > >>>
> > >>> https://www.elastic.co/guide/en/elasticsearch/reference/8.9/query-dsl-query-string-query.html#query-string-syntax
> > >>>
> > >>> One idea is to take the basic Lucene index, which it seems we already
> > have
> > >>> some support for, and feed it to "expr". This is nice for two reasons:
> > >>>
> > >>> 1.) People can just write Lucene queries if they already know how.
> > >>> 2.) No changes to the grammar.
> > >>>
> > >>> Lucene has distinct concepts

Re: Tokenization and SAI query syntax

2023-08-13 Thread Jon Haddad
dict  wrote:
> >>>> 
> >>>> 
> >>>> I’m strongly opposed to : 
> >>>> 
> >>>> It is very dissimilar to our current operators. CQL is already not the 
> >>>> prettiest language, but let’s not make it a total mish mash.
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>>> On 7 Aug 2023, at 10:59, Mike Adamson  wrote:
> >>>>> 
> >>>>> I am also in agreement with 'column : token' in that 'I don't hate it' 
> >>>>> but I'd like to offer an alternative to this in 'column HAS token'. HAS 
> >>>>> is currently not a keyword that we use so wouldn't cause any brain 
> >>>>> conflicts.
> >>>>> 
> >>>>> While I don't hate ':' I have a particular dislike of the lucene search 
> >>>>> syntax because of its terseness and lack of easy readability. 
> >>>>> 
> >>>>> Saying that, I'm happy to do with ':' if that is the decision. 
> >>>>> 
> >>>>> On Fri, 4 Aug 2023 at 00:23, Jon Haddad  
> >>>>> wrote:
> >>>>>> Assuming SAI is a superset of SASI, and we were to set up something so 
> >>>>>> that SASI indexes auto convert to SAI, this gives even more weight to 
> >>>>>> my point regarding how differing behavior for the same syntax can lead 
> >>>>>> to issues.  Imo the best case scenario results in the user not even 
> >>>>>> noticing their indexes have changed.
> >>>>>> 
> >>>>>> An (maybe better?) alternative is to add a flag to the index 
> >>>>>> configuration for "compatibility mod", which might address the 
> >>>>>> concerns around using an equality operator when it actually is a 
> >>>>>> partial match.
> >>>>>> 
> >>>>>> For what it's worth, I'm in agreement that = should mean full equality 
> >>>>>> and not token match.
> >>>>>> 
> >>>>>> On 2023/08/03 03:56:23 Caleb Rackliffe wrote:
> >>>>>> > For what it's worth, I'd very much like to completely remove SASI 
> >>>>>> > from the
> >>>>>> > codebase for 6.0. The only remaining functionality gaps at the 
> >>>>>> > moment are
> >>>>>> > LIKE (prefix/suffix) queries and its limited tokenization
> >>>>>> > capabilities, both of which already have SAI Phase 2 Jiras.
> >>>>>> >
> >>>>>> > On Wed, Aug 2, 2023 at 7:20 PM Jeremiah Jordan 
> >>>>>> > 
> >>>>>> > wrote:
> >>>>>> >
> >>>>>> > > SASI just uses “=“ for the tokenized equality matching, which is 
> >>>>>> > > the exact
> >>>>>> > > thing this discussion is about changing/not liking.
> >>>>>> > >
> >>>>>> > > > On Aug 2, 2023, at 7:18 PM, J. D. Jordan 
> >>>>>> > > > 
> >>>>>> > > wrote:
> >>>>>> > > >
> >>>>>> > > > I do not think LIKE actually applies here. LIKE is used for 
> >>>>>> > > > prefix,
> >>>>>> > > contains, or suffix searches in SASI depending on the index type.
> >>>>>> > > >
> >>>>>> > > > This is about exact matching of tokens.
> >>>>>> > > >
> >>>>>> > > >> On Aug 2, 2023, at 5:53 PM, Jon Haddad 
> >>>>>> > > >> 
> >>>>>> > > wrote:
> >>>>>> > > >>
> >>>>>> > > >> Certain bits of functionality also already exist on the SASI 
> >>>>>> > > >> side of
> >>>>>> > > things, but I'm not sure how much overlap there is.  Currently, 
> >>>>>> > > there's a
> >>>>>> > > LIKE keyword that handles token matching, although it seems to 
> >>>>>> > > have some
> >>>>>> > > differences from the feature set in SAI.
> >>&

Re: Tokenization and SAI query syntax

2023-08-14 Thread Jon Haddad
I was thinking a subproject like I’d normally use with Gradle. Is there an 
advantage to moving it out completely? 

On 2023/08/13 18:34:38 Caleb Rackliffe wrote:
> We’ve already started down the path of using a git sub-module for the Accord 
> library. That could be an option at some point.
> 
> > On Aug 13, 2023, at 12:53 PM, Jon Haddad  wrote:
> > 
> > Functions make sense to me too.  In addition to the reasons listed, I if 
> > we acknowledge that functions in predicates are inevitable, then it makes 
> > total sense to use them here.  I think this is the most forward thinking 
> > approach.
> > 
> > Assuming this happens, one thing that would be great down the line would be 
> > if the CQL parser was broken out into a subproject with an artifact 
> > published so the soon to be additional complexity of parsing CQL didn't 
> > have to be pushed to every single end user like it does today.  I'm not 
> > trying to expand the scope right now, just laying an idea down for the 
> > future.  
> > 
> > Jon
> > 
> >> On 2023/08/07 21:26:40 Josh McKenzie wrote:
> >> Been chatting a bit w/Caleb about this offline and poking around to better 
> >> educate myself.
> >> 
> >>> using functions (ignoring the implementation complexity) at least removes 
> >>> ambiguity. 
> >> This, plus using functions lets us kick the can down the road a bit in 
> >> terms of landing on an integrated grammar we agree on. It seems to me 
> >> there's a tension between:
> >> 1. "SQL-like" (i.e. postgres-like)
> >> 2. "Indexing and Search domain-specific-like" (i.e. lucene syntax which, 
> >> as Benedict points out, doesn't really jell w/what we have in CQL at this 
> >> point), and
> >> 3. ??? Some other YOLO CQL / C* specific thing where we go our own road
> >> I don't think we're really going to know what our feature-set in terms of 
> >> indexing is going to look like or the shape it's going to take for awhile, 
> >> so backing ourselves into any of the 3 corners above right now feels very 
> >> premature to me.
> >> 
> >> So I'm coming around to the expr / method call approach to preserve that 
> >> flexibility. It's maximally explicit and preserves optionality at the 
> >> expense of being clunky. For now.
> >> 
> >> On Mon, Aug 7, 2023, at 4:00 PM, Caleb Rackliffe wrote:
> >>>> I do not think we should start using lucene syntax for it, it will make 
> >>>> people think they can do everything else lucene allows.
> >>> 
> >>> I'm sure we won't be supporting everything Lucene allows, but this is 
> >>> going to evolve. Right off the bat, if you introduce support for 
> >>> tokenization and filtering, someone is, for example, going to ask for 
> >>> phrase queries. ("John Smith landed in Virginia" is tokenized, but 
> >>> someone wants to match exactly on "John Smith".) The whole point of the 
> >>> Vector project is to do relevance, right? Are we going to do term 
> >>> boosting? Do we need queries like "field: quick brown +fox -news" where 
> >>> fox must be present, news cannot be present, and quick and brown increase 
> >>> relevance?
> >>> 
> >>> SASI uses "=" and "LIKE" in a way that assumes the user understands the 
> >>> tokenization scheme in use on the target field. I understand that's a bit 
> >>> ambiguous.
> >>> 
> >>> If we object to allowing expr embedding of a subset of the Lucene syntax, 
> >>> I can't imagine we're okay w/ then jamming a subset of that syntax into 
> >>> the main CQL grammar.
> >>> 
> >>> If we want to do this in non-expr CQL space, I think using functions 
> >>> (ignoring the implementation complexity) at least removes ambiguity. 
> >>> "token_match", "phrase_match", "token_like", "=", and "LIKE" would all be 
> >>> pretty clear, although there may be other problems. For instance, what 
> >>> happens when I try to use "token_match" on an indexed field whose 
> >>> analyzer does not tokenize? We obviously can't use the index, so we'd be 
> >>> reduced to requiring a filtering query, but maybe that's fine. My point 
> >>> is that, if we're going to make write and read analyzers symmetrical, 
> >>

Re: [VOTE] Accept java-driver

2023-10-03 Thread Jon Haddad
+1


On 2023/10/03 04:52:47 Mick Semb Wever wrote:
> The donation of the java-driver is ready for its IP Clearance vote.
> https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
> 
> The SGA has been sent to the ASF.  This does not require acknowledgement
> before the vote.
> 
> Once the vote passes, and the SGA has been filed by the ASF Secretary, we
> will request ASF Infra to move the datastax/java-driver as-is to
> apache/java-driver
> 
> This means all branches and tags, with all their history, will be kept.  A
> cleaning effort has already cleaned up anything deemed not needed.
> 
> Background for the donation is found in CEP-8:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> 
> PMC members, please take note of (and check) the IP Clearance requirements
> when voting.
> 
> The vote will be open for 72 hours (or longer). Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s
> and no -1's.
> 
> regards,
> Mick
> 


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread Jon Haddad
I haven't looked at the patch, but at a high level, defaulting to direct I/O 
for commit logs makes a lot of sense to me.  

On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
> [Public]
> 
> Hi,
> 
> CommitLog uses mmap (memory-mapped) segments by default. The Direct-IO feature 
> is proposed through a new PR [1] to improve CommitLog IO speed. Enabling 
> this by default could be a useful feature to address the IO bottleneck seen during 
> peak load.
> 
> Need your input regarding changing this default. Please suggest.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-18464
> 
> thanks,
> Amit Pawar
> 
> [1] - https://github.com/apache/cassandra/pull/2777
> 


Re: [DISCUSS] CommitLog default disk access mode

2023-10-16 Thread Jon Haddad
Glad you brought up compaction here - I think there would be a significant 
benefit to moving compaction to direct i/o.


On 2023/10/16 16:14:28 Benedict wrote:
> I have some plans to (eventually) use the commit log as memtable payload 
> storage (ie memtables would reference the commit log entries directly, 
> storing only indexing info), and to back first level of sstables by reference 
> to commit log entries. This will permit us to deliver not only much bigger 
> memtables (cutting compaction throughput requirements by the ratio of size 
> increase - so pretty dramatically), and faster flushing (so better behaviour 
> during write bursts), but also a fairly cheap and simple way to support MVCC - 
> which will be helpful for transaction throughput.
> 
> There is also a new commit log (“journal”) coming with Accord, that the rest 
> of C* may or may not transition to.
> 
> I only say this because this makes the utility of direct IO for commit log 
> suspect, as we will be reading from the files as a matter of course should 
> this go ahead; and we may end up relying on a different commit log 
> implementation before long anyway.
> 
> This is obviously a big suggestion and is not guaranteed to transpire, and 
> probably won’t within the next year or so, but it should perhaps form some 
> minimal part of any calculus. If the patch is otherwise simple and beneficial 
> I don’t have anything against it, and the use of direct IO could well be of 
> benefit eg in compaction - and also in future if we manage to bring a page 
> management in process. So laying foundations there could be of benefit, even 
> if the commit log eventually does not use it.
> 
> > On 16 Oct 2023, at 17:00, Jon Haddad  wrote:
> > 
> > I haven't looked at the patch, but at a high level, defaulting to direct 
> > I/O for commit logs makes a lot of sense to me.  
> > 
> >> On 2023/10/16 06:34:05 "Pawar, Amit" wrote:
> >> [Public]
> >> 
> >> Hi,
> >> 
> >> CommitLog uses mmap (memory mapped ) segments by default. Direct-IO 
> >> feature is proposed through new PR[1] to improve the CommitLog IO speed. 
> >> Enabling this by default could be useful feature to address IO bottleneck 
> >> seen during peak load.
> >> 
> >> Need your input regarding changing this default. Please suggest.
> >> 
> >> https://issues.apache.org/jira/browse/CASSANDRA-18464
> >> 
> >> thanks,
> >> Amit Pawar
> >> 
> >> [1] - https://github.com/apache/cassandra/pull/2777
> >> 
> 


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Jon Haddad
From the folks I've been talking to - Accord is one of the biggest things to 
be excited about in 5.0.  Everyone giving a presentation about the 5.0 release 
has been hyping up Accord.  

Shipping a release to make a date (that means practically nothing to most 
people) by gutting one of the most useful features is a bad tradeoff.

Jon

On 2023/10/23 14:39:36 Patrick McFadin wrote:
> I'm really surprised to see this email. The last I heard everything was on
> track for getting into 5.0, and TBH Accord is what a majority of users
> are expecting in 5.0. And how could this be a .1 release?
> 
> What is it going to take to get it into 5.0? What is off track and how did
> we get here?
> 
> On Mon, Oct 23, 2023 at 6:51 AM Sam Tunnicliffe  wrote:
> 
> > +1 from me too.
> >
> > Regarding Benedict's point, backwards incompatibility should be minimal;
> > we modified snitch behaviour slightly, so that local snitch config only
> > relates to the local node, all peer info is fetched from cluster metadata.
> > There is also a minor change to the way failed bootstraps are handled, as
> > with TCM they require an explicit cancellation step (running a nodetool
> > command).
> >
> > Whether consensus decrees that this constitutes a major bump or not, I
> > think decoupling these major projects from 5.0 is the right move.
> >
> >
> > On 23 Oct 2023, at 12:57, Benedict  wrote:
> >
> > I’m cool with this.
> >
> > We may have to think about numbering as I think TCM will break some
> > backwards compatibility and we might technically expect the follow-up
> > release to be 6.0
> >
> > Maybe it’s not so bad to have such rapid releases either way.
> >
> > On 23 Oct 2023, at 12:52, Mick Semb Wever  wrote:
> >
> > 
> >
> > The TCM work (CEP-21) is in its review stage but being well past our
> > cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would
> > like to propose the following.
> >
> > We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut
> > an immediate 5.1-alpha1 release.
> >
> > I see this as a win-win scenario for us, considering our current
> > situation.  (Though it is unfortunate that Accord is included in this
> > scenario because we agreed it to be based upon TCM.)
> >
> > This will mean…
> >  - We get to focus on getting 5.0 to beta and GA, which already has a ton
> > of features users want.
> >  - We get an alpha release with TCM and Accord into users hands quickly
> > for broader testing and feedback.
> >  - We isolate GA efforts on TCM and Accord – giving oss and downstream
> > engineers time and patience reviewing and testing.  TCM will be the biggest
> > patch ever to land in C*.
> >  - Give users a choice for a more incremental upgrade approach, given just
> > how many new features we're putting on them in one year.
> >  - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all
> > 4.x versions, just as if it had landed in 5.0.
> >
> >
> > The risks/costs this introduces are
> >  - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch,
> > and at some point decide to undo this work, while we can throw away the
> > cassandra-5.1 branch we would need to do a bit of work reverting the
> > changes in trunk.  This is a _very_ edge case, as confidence levels on the
> > design and implementation of both are already tested and high.
> >  - We will have to maintain an additional branch.  I propose that we treat
> > the 5.1 branch in the same maintenance window as 5.0 (like we have with 3.0
> > and 3.11).  This also adds the merge path overhead.
> >  - Reviewing of TCM and Accord will continue to happen post-merge.  This
> > is not our normal practice, but this work will have already received its
> > two +1s from committers, and such ongoing review effort is akin to GA
> > stabilisation work on release branches.
> >
> >
> > I see no other ok solution in front of us that gets us at least both the
> > 5.0 beta and TCM+Accord alpha releases this year.  Keeping in mind users
> > demand to start experimenting with these features, and our Cassandra Summit
> > in December.
> >
> >
> > 1) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
> >
> >
> >
> >
> 


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-23 Thread Jon Haddad
I think this is great, and more generally useful than the two scenarios you've 
outlined.  I think it could / should be possible to use an object store as the 
primary storage for sstables and rely on local disk as a cache for reads.  

I don't know the roadmap for TCM, but imo if it allowed for more stable, 
pre-allocated ranges that compaction will always be aware of (plus a bunch of 
plumbing I'm deliberately avoiding the details on), then you could bootstrap a 
new node by copying s3 directories around rather than streaming data between 
nodes.  That's how we get to 20TB / node, easy scale up / down, etc, and 
always-ZCS for non-object store deployments.

Jon

On 2023/09/25 06:48:06 "Claude Warren, Jr via dev" wrote:
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of
> the standard storage space.
> 
> There are two desires  driving this change:
> 
>1. The ability to temporarily move some keyspaces/tables to storage
>outside the normal directory tree to other disk so that compaction can
>occur in situations where there is not enough disk space for compaction and
>the processing to the moved data can not be suspended.
>2. The ability to store infrequently used data on slower cheaper storage
>layers.
> 
> I have a working POC implementation [2] though there are some issues still
> to be solved and much logging to be reduced.
> 
> I look forward to productive discussions,
> Claude
> 
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> 


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Jon Haddad
I guess at the end of the day, shipping a release with a bunch of awesome 
features is better than holding it back.  If there's 2 big releases in 6 months 
the community isn't any worse off.  

We either ship something, or nothing, and something is probably better.

Jon


On 2023/10/24 16:27:04 Patrick McFadin wrote:
> +1 to what you are saying, Josh. Based on the last survey, yes, everyone
> was excited about Accord, but SAI and UCS were pretty high on the list.
> 
> Benedict and I had a good conversation last night, and now I understand
> more essential details for this conversation. TCM is taking far more work
> than initially scoped, and Accord depends on a stable TCM. TCM is months
> behind and that's a critical fact, and one I personally just learned of. I
> thought things were wrapping up this month, and we were in the testing
> phase. I get why that's a topic we are dancing around. Nobody wants to say
> ship dates are slipping because that's part of our culture. It's
> disappointing and, if new information, an unwelcome surprise, but none of
> us should be angry or in a blamey mood because I guarantee every one of us
> has shipped the code late. My reaction yesterday was based on an incorrect
> assumption. Now that I have a better picture, my point of view is changing.
> 
> Josh's point about what's best for users is crucial. Users deserve stable
> code with a regular cadence of features that make their lives easier. If we
> put 5.0 on hold for TCM + Accord, users will get neither for a very long
> time. And I mentioned a disaster yesterday. A bigger disaster would be
> shipping Accord with a major bug that causes data loss, eroding community
> trust. Accord has to be the most bulletproof of all bulletproof features.
> The pressure to ship is only going to increase and that's fertile ground
> for that sort of bug.
> 
> So, taking a step back and with a clearer picture, I support the 5.0 + 5.1
> plan mainly because I don't think 5.1 is (or should be) a fast follow.
> 
> For the user community, the communication should be straightforward. TCM +
> Accord are turning out to be much more complicated than was originally
> scoped, and for good reasons. Our first principle is to provide a stable
> and reliable system, so as a result, we'll be de-coupling TCM + Accord from
> 5.0 into a 5.1 branch, which is available in parallel to 5.0 while
> additional hardening and testing is done. We can communicate this in a blog 
> post.
> 
> To make this much more palatable to our use community, if we can get a
> build and docker image available ASAP with Accord, it will allow developers
> to start playing with the syntax. Up to this point, that hasn't been widely
> available unless you compile the code yourself. Developers need to
> understand how this will work in an application, and up to this point, the
> syntax is text they see in my slides. We need to get some hands-on and that
> will get our user community engaged on Accord this calendar year. The
> feedback may even uncover some critical changes we'll need to make. Lack of
> access to Accord by developers is a critical problem we can fix soon and
> there will be plenty of excitement there and start building use cases
> before the final code ships.
> 
> I'm bummed but realistic. It sucks that I won't have a pony for Christmas,
> but maybe one for my birthday?
> 
> Patrick
> 
> 
> 
> On Tue, Oct 24, 2023 at 7:23 AM Josh McKenzie  wrote:
> 
> > Maybe it won't be a glamorous release but shipping
> > 5.0 mitigates our worst case scenario.
> >
> > I disagree with this characterization of 5.0 personally. UCS, SAI, Trie
> > memtables and sstables, maybe vector ANN if the sub-tasks on C-18715 are
> > accurate, all combine to make 5.0 a pretty glamorous release IMO
> > independent of TCM and Accord. Accord is a true paradigm-shift game-changer
> > so it's easy to think of 5.0 as uneventful in comparison, and TCM helps
> > resolve one of the biggest pain-points in our system for over a decade, but
> > I think 5.0 is a very meaty release in its own right today.
> >
> > Anyway - I agree with you Brandon re: timelines. If things take longer
> > than we'd hope (which, if I think back, they do roughly 100% of the time on
> > this project), blocking on these features could both lead to a significant
> > delay in 5.0 going out as well as increasing pressure and risk of burnout
> > on the folks working on it. While I believe we all need some balanced
> > urgency to do our best work, being under the gun for something with a hard
> > deadline or having an entire project drag along blocked on you is not where
> > I want any of us to be.
> >
> > Part of why we talked about going to primarily annual calendar-based
> > releases was to avoid precisely this situation, where something that
> > *feels* right at the cusp of merging leads us to delay a release
> > repeatedly. We discussed this a couple times this year:
> > 1: https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3,

Re: Voice of Apache (Feathercast) at summit?

2023-12-08 Thread Jon Haddad
Count me in!

On 2023/12/05 14:34:48 Rich Bowen wrote:
> Hey, folks. I'll be at Cassandra Summit next week, and was wondering if any 
> of you who might be there would be interested in doing a podcast interview 
> with me for Voice Of Apache (the podcast formerly known as Feathercast - see 
> https://feathercast.apache.org for context). Topics might include something 
> about 5.0, retrospectives on the last 13 years, or whatever you think might 
> be of interest.
> 
> Let me know soon of anyone's interested/available, so I know to pack my gear.
> 
> Thanks!
> 
> --Rich
> 


Ext4 data corruption in stable kernels

2023-12-11 Thread Jon Haddad
Hey folks,

Just wanted to raise awareness about an I/O issue that seems to be affecting
some Linux kernel releases that were listed as STABLE, causing corruption
when using the ext4 filesystem with direct I/O.  I don't have time to get a
great understanding of the full scope of the issue, what versions are
affected, etc, I just want to get this in front of the project.  I am
disappointed that this might negatively affect our ability to leverage
direct I/O for both the commitlog (recently merged) and SSTables
(potentially a future use case), since users won't be able to discern
between a bug we ship and one that we hit as a result of our filesystem
choices.

I think it might be worth putting a note in our docs and in the config to
warn the user to ensure they're not affected, and we may even want to
consider hiding this feature if the blast radius is significant enough that
users would be affected.

https://lwn.net/Articles/954285/

Jon


Re: Ext4 data corruption in stable kernels

2023-12-11 Thread Jon Haddad
Like I said, I didn't have time to verify the full scope and what's
affected, just that some stable kernels are affected.  Adding to the
problem is that it might be vendor specific as well.  For example, RH might
backport an upstream patch in the kernel they ship that's non-standard.

Hopefully someone compiles a list.

Jon

On Mon, Dec 11, 2023 at 11:51 AM Jacek Lewandowski <
lewandowski.ja...@gmail.com> wrote:

> Aren't only specific kernels affected? If we can detect the kernel
> version, the feature can be force disabled with the problematic kernels
>
>
> pon., 11 gru 2023, 20:45 użytkownik Jon Haddad 
> napisał:
>
>> Hey folks,
>>
>> Just wanted to raise awareness about a I/O issue that seems to be
>> affecting some Linux Kernal releases that were listed as STABLE, causing
>> corruption when using the ext4 filesystem with direct I/O.  I don't have
>> time to get a great understanding of the full scope of the issue, what
>> versions are affected, etc, I just want to get this in front of the
>> project.  I am disappointed that this might negatively affect our ability
>> to leverage direct I/O for both the commitlog (recently merged) and
>> SSTables (potentially a future use case), since users won't be able to
>> discern between a bug we ship and one that we hit as a result of our
>> filesystem choices.
>>
>> I think it might be worth putting a note in our docs and in the config to
>> warn the user to ensure they're not affected, and we may even want to
>> consider hiding this feature if the blast radius is significant enough that
>> users would be affected.
>>
>> https://lwn.net/Articles/954285/
>>
>> Jon
>>
>


Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-12 Thread Jon Haddad
I think it makes sense to see what the actual overhead is of CBO before
making the assumption it'll be so high that we need to have two code
paths.  I'm happy to provide thorough benchmarking and analysis when it
reaches a testing phase.

I'm excited to see where this goes.  I think it sounds very forward looking
and opens up a lot of possibilities.

Jon

On Tue, Dec 12, 2023 at 4:25 PM guo Maxwell  wrote:

> Nothing expresses my thoughts better than +1
> ,It feels like it means a lot to Cassandra.
>
> I have a question. Is it easy to turn off cbo's optimizer or by pass in
> some way? Because some simple read and write requests will have better
> performance without cbo, which is also the advantage of Cassandra compared
> to some rdbms.
>
>
> On Wed, Dec 13, 2023 at 3:37 AM, David Capwell wrote:
>
>> Overall LGTM.
>>
>>
>> On Dec 12, 2023, at 5:29 AM, Benjamin Lerer  wrote:
>>
>> Hi everybody,
>>
>> I would like to open the discussion on the introduction of a cost based
>> optimizer to allow Cassandra to pick the best execution plan based on the
>> data distribution.Therefore, improving the overall query performance.
>>
>> This CEP should also lay the groundwork for the future addition of
>> features like joins, subqueries, OR/NOT and index ordering.
>>
>> The proposal is here:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-39%3A+Cost+Based+Optimizer
>>
>> Thank you in advance for your feedback.
>>
>>
>>


Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Jon Haddad
I think we should probably figure out how much value it actually provides
by getting some benchmarks around a few use cases along with some
profiling.  tlp-stress has a --rowcache flag that I added a while back to
be able to do this exact test.  I was looking for a use case to profile and
write up so this is actually kind of perfect for me.  I can take a look in
January when I'm back from the holidays.

Jon
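
For reference, the kind of run I have in mind is along the lines of the following 
(workload choice and any additional flags are illustrative; --rowcache is the flag 
mentioned above):

    tlp-stress run KeyValue --rowcache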

On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever  wrote:

>
>
>
> I would avoid taking away a feature even if it works in narrow set of
>> use-cases. I would instead suggest -
>>
>> 1. Leave it disabled by default.
>> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
>> it off. Cassandra should ideally detect this and do it automatically.
>> 3. Move to Caffeine instead of OHC.
>>
>> I would suggest having this as the middle ground.
>>
>
>
>
> Yes, I'm ok with this. (2) can also be a guardrail: soft value when to
> warn, hard value when to disable.
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-12-15 Thread Jon Haddad
At a high level I really like the idea of being able to better leverage
cheaper storage especially object stores like S3.

One important thing though - I feel pretty strongly that there's a big,
deal-breaking downside.  Backups, disk failure policies, snapshots, and
possibly repairs (none of which have been particularly great in the past)
would get more complicated, and of course there's the issue of failure recovery
being only partially possible if you're looking at a durable block store
paired with an ephemeral one, with some of your data not replicated to the
cold side.  That introduces a failure case that's unacceptable for most
teams, which results in needing to implement potentially 2 different backup
solutions.  This is operationally complex with a lot of surface area for
headaches.  I think a lot of teams would probably have an issue with the
big question mark around durability and I probably would avoid it myself.

On the other hand, I'm +1 if we approach it something slightly differently
- where _all_ the data is located on the cold storage, with the local hot
storage used as a cache.  This means we can use the cold directories for
the complete dataset, simplifying backups and node replacements.

For a little background, we had a ticket several years ago where I pointed
out it was possible to do this *today* at the operating system level as
long as you're using block devices (vs an object store) and LVM [1].  For
example, this works well with GP3 EBS w/ low IOPS provisioning + local NVMe
to get a nice balance of great read performance without going nuts on the
cost for IOPS.  I also wrote about this in a little more detail in my blog
[2].  There's also the new mount point tech in AWS which pretty much does
exactly what I've suggested above [3] that's probably worth evaluating just
to get a feel for it.

I'm not insisting we require LVM or the AWS S3 fs, since that would rule
out other cloud providers, but I am pretty confident that the entire
dataset should reside in the "cold" side of things for the practical and
technical reasons I listed above.  I don't think it massively changes the
proposal, and should simplify things for everyone.

Jon

[1] https://issues.apache.org/jira/browse/CASSANDRA-8460
[2] https://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/
[3] https://aws.amazon.com/about-aws/whats-new/2023/03/mountpoint-amazon-s3/


On Thu, Dec 14, 2023 at 1:56 AM Claude Warren  wrote:

> Is there still interest in this?  Can we get some points down on electrons
> so that we all understand the issues?
>
> While it is fairly simple to redirect the read/write to something other
> than the local system for a single node, this will not solve the problem for
> tiered storage.
>
> Tiered storage will require that on read/write the primary key be assessed
> to determine whether the read/write should be redirected.  My reasoning for
> this statement is that in a cluster with a replication factor greater than
> 1 the node will store data for the keys that would be allocated to it in a
> cluster with a replication factor = 1, as well as some keys from nodes
> earlier in the ring.
>
> Even if we can get the primary keys for all the data we want to write to
> "cold storage" to map to a single node a replication factor > 1 means that
> data will also be placed in "normal storage" on subsequent nodes.
>
> To overcome this, we have to explore ways to route data to different
> storage based on the keys and that different storage may have to be
> available on _all_  the nodes.
>
> Have any of the partial solutions mentioned in this email chain (or
> others) solved this problem?
>
> Claude
>


Re: Future direction for the row cache and OHC implementation

2023-12-18 Thread Jon Haddad
Sure, I’d love to work with you on this.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Mon, Dec 18, 2023 at 8:30 AM Ariel Weisberg  wrote:

> Hi,
>
> Thanks for the generous offer. Before you do that can you give me a chance
> to add back support for Caffeine for the row cache so you can test the
> option of switching back to an on-heap row cache?
>
> Ariel
>
> On Thu, Dec 14, 2023, at 9:28 PM, Jon Haddad wrote:
>
> I think we should probably figure out how much value it actually provides
> by getting some benchmarks around a few use cases along with some
> profiling.  tlp-stress has a --rowcache flag that I added a while back to
> be able to do this exact test.  I was looking for a use case to profile and
> write up so this is actually kind of perfect for me.  I can take a look in
> January when I'm back from the holidays.
>
> Jon
>
> On Thu, Dec 14, 2023 at 5:44 PM Mick Semb Wever  wrote:
>
>
>
>
> I would avoid taking away a feature even if it works in a narrow set of
> use-cases. I would instead suggest -
>
> 1. Leave it disabled by default.
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
> it off. Cassandra should ideally detect this and do it automatically.
> 3. Move to Caffeine instead of OHC.
>
> I would suggest having this as the middle ground.
>
>
>
>
> Yes, I'm ok with this. (2) can also be a guardrail: soft value when to
> warn, hard value when to disable.
>
>
>


Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-07 Thread Jon Haddad
I like the idea of the ability to execute certain commands via CQL, but I
think it only makes sense for the nodetool commands that cause an action to
take place, such as compact or repair.  We already have virtual tables, I
don't think we need another layer to run informational queries.  I see
little value in having the following (I'm using exec here for simplicity):

cqlsh> exec tpstats

which returns a string in addition to:

cqlsh> select * from system_views.thread_pools

which returns structured data.

I'd also rather see updatable configuration virtual tables instead of

cqlsh> exec setcompactionthroughput 128

Fundamentally, I think it's better for the project if administration is
fully done over CQL and we have a consistent, single way of doing things.
I'm not dead set on it, I just think less is more in a lot of situations,
this being one of them.

Jon


On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov  wrote:

> Happy New Year to everyone! I'd like to thank everyone for their
> questions, because answering them forces us to move towards the right
> solution, and I also like the ML discussions for the time they give to
> investigate the code :-)
>
> I'm deliberately trying to limit the scope of the initial solution
> (e.g. exclude the agent part) to keep the discussion short and clear,
> but it's also important to have a glimpse of what we can do next once
> we've finished with the topic.
>
> My view of the Command<> is that it is an abstraction in the broader
> sense of an operation that can be performed on the local node,
> involving one of a few internal components. This means that updating a
> property in the settings virtual table via an update statement, or
> executing e.g. the setconcurrentcompactors command are just aliases of
> the same internal command via different APIs. Another example is the
> netstats command, which simply aggregates the MessageService metrics
> and returns them in a human-readable format (just another way of
> looking at key-value metric pairs). More broadly, the command input is
> Map and String as the result (or List).
>
> As Abe mentioned, Command and CommandRegistry should be largely based
> on the nodetool command set at the beginning. We have a few options
> for how we can initially construct command metadata during the
> registry implementation (when moving command metadata from the
> nodetool to the core part), so I'm planning to consult with the
> command representations of the k8ssandra project so that any
> further registry adoptions have zero problems (by writing a test
> openapi registry exporter and comparing the representation results).
>
> So, the MVP is the following:
> - Command
> - CommandRegistry
> - CQLCommandExporter
> - JMXCommandExporter
> - the nodetool uses the JMXCommandExporter
>
>
> = Answers =
>
> > What do you have in mind specifically there? Do you plan on rewriting a
> brand new implementation which would be partially inspired by our agent? Or
> would the project integrate our agent code in-tree or as a dependency?
>
> Personally, I like the state of the k8ssandra project as it is now. My
> understanding is that the server part of a database always lags behind
> the client and sidecar parts in terms of the jdk version and the
> features it provides. In contrast, sidecars should always be on top of
> the market, so if we want to make an agent part in-tree, this should
> be carefully considered for the flexibility which we may lose, as we
> will not be able to change the agent part within the sidecar. The only
> closest change I can see is that we can remove the interceptor part
> once the CQL command interface is available. I suggest we move the
> agent part to phase 2 and research it. wdyt?
>
>
> > How are the results of the commands expressed to the CQL client? Since
> the command is being treated as CQL, I guess it will be rows, right? If
> yes, some of the nodetool commands output are a bit hierarchical in nature
> (e.g. cfstats, netstats etc...). How are these cases handled?
>
> I think the result of the execution should be a simple string (or set
> of strings), which by its nature matches the nodetool output. I would
> avoid building complex output or output schemas for now to simplify
> the initial changes.
>
>
> > Any changes expected at client/driver side?
>
> I'd like to keep the initial changes to a server part only, to avoid
> scope inflation. For the driver part, I have checked the ExecutionInfo
> interface provided by the java-driver, which should probably be used
> as a command execution status holder. We'd like to have a unique
> command execution id for each command that is executed on the node, so
> the ExecutionInfo should probably hold such an id. Currently it has
> the UUID getTracingId(), which is not well suited for our case and I
> think further changes and follow-ups will be required here (including
> the binary protocol, I think).
>
>
> > The term COMMAND is a bit abstract I feel (subjective)... And I al

Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-08 Thread Jon Haddad
ng backwards compat, especially for automated ops (i.e.
>nodetool, JMX, etc), is crucial. Painful, but crucial.
>2. We need something that's available for use before the node comes
>fully online; the point Jeff always brings up when we discuss moving away
>from JMX. So long as we have some kind of "out-of-band" access to nodes or
>accommodation for that, we should be good.
>
> For context on point 2, see slack:
> https://the-asf.slack.com/archives/CK23JSY2K/p1688745128122749?thread_ts=1688662169.018449&cid=CK23JSY2K
>
> I point out that JMX works before and after the native protocol is running
> (startup, shutdown, joining, leaving), and also it's semi-common for us to
> disable the native protocol in certain circumstances, so at the very least,
> we'd then need to implement a totally different cql protocol interface just
> for administration, which nobody has committed to building yet.
>
>
> I think this is a solvable problem, and I think the benefits of having a
> single, elegant way of interacting with a cluster and configuring it
> justifies the investment for us as a project. Assuming someone has the
> cycles to, you know, actually do the work. :D
>
> On Sun, Jan 7, 2024, at 10:41 PM, Jon Haddad wrote:
>
> I like the idea of the ability to execute certain commands via CQL, but I
> think it only makes sense for the nodetool commands that cause an action to
> take place, such as compact or repair.  We already have virtual tables, I
> don't think we need another layer to run informational queries.  I see
> little value in having the following (I'm using exec here for simplicity):
>
> cqlsh> exec tpstats
>
> which returns a string in addition to:
>
> cqlsh> select * from system_views.thread_pools
>
> which returns structured data.
>
> I'd also rather see updatable configuration virtual tables instead of
>
> cqlsh> exec setcompactionthroughput 128
>
> Fundamentally, I think it's better for the project if administration is
> fully done over CQL and we have a consistent, single way of doing things.
> I'm not dead set on it, I just think less is more in a lot of situations,
> this being one of them.
>
> Jon
>
>
> On Wed, Jan 3, 2024 at 2:56 PM Maxim Muzafarov  wrote:
>
> Happy New Year to everyone! I'd like to thank everyone for their
> questions, because answering them forces us to move towards the right
> solution, and I also like the ML discussions for the time they give to
> investigate the code :-)
>
> I'm deliberately trying to limit the scope of the initial solution
> (e.g. exclude the agent part) to keep the discussion short and clear,
> but it's also important to have a glimpse of what we can do next once
> we've finished with the topic.
>
> My view of the Command<> is that it is an abstraction in the broader
> sense of an operation that can be performed on the local node,
> involving one of a few internal components. This means that updating a
> property in the settings virtual table via an update statement, or
> executing e.g. the setconcurrentcompactors command are just aliases of
> the same internal command via different APIs. Another example is the
> netstats command, which simply aggregates the MessageService metrics
> and returns them in a human-readable format (just another way of
> looking at key-value metric pairs). More broadly, the command input is
> Map and String as the result (or List).
>
> As Abe mentioned, Command and CommandRegistry should be largely based
> on the nodetool command set at the beginning. We have a few options
> for how we can initially construct command metadata during the
> registry implementation (when moving command metadata from the
> nodetool to the core part), so I'm planning to consult with the
> command representations of the k8ssandra project so that any
> further registry adoptions have zero problems (by writing a test
> openapi registry exporter and comparing the representation results).
>
> So, the MVP is the following:
> - Command
> - CommandRegistry
> - CQLCommandExporter
> - JMXCommandExporter
> - the nodetool uses the JMXCommandExporter
>
>
> = Answers =
>
> > What do you have in mind specifically there? Do you plan on rewriting a
> brand new implementation which would be partially inspired by our agent? Or
> would the project integrate our agent code in-tree or as a dependency?
>
> Personally, I like the state of the k8ssandra project as it is now. My
> understanding is that the server part of a database always lags behind
> the client and sidecar parts in terms of the jdk version and the
> features it provides. In contrast, sidecars sho

Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-08 Thread Jon Haddad
Ugh, I moved some stuff around and 2 paragraphs got merged that shouldn't
have been.

I think there's no way we could rip out JMX, there's just too many benefits
to having it and effectively zero benefits to removing.

Regarding disablebinary, part of me wonders if this is a bit of a hammer,
and what we really want is "disable binary for non-admins".  I'm not sure
what the best path is to get there.  The local unix socket might be the
easiest path as it allows us to disable network binary easily and still
allow local admins, and allows the OS to reject the incoming connections vs
passing that work onto a connection handler which would have to evaluate
whether or not the user can connect.  If a node is already in a bad spot
requiring disable binary, it's probably not a good idea to have it get
DDOS'ed as part of the remediation.
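
To make the unix socket idea concrete - this is purely a hypothetical sketch,
not anything that exists in the codebase today - JDK 16+ would let us bind a
local-only admin listener whose access is gated by file permissions on the
socket path rather than by the network stack:

    import java.io.IOException;
    import java.net.StandardProtocolFamily;
    import java.net.UnixDomainSocketAddress;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.file.Path;

    // Hypothetical sketch: an admin-only listener on a local unix socket can stay
    // up while the network-facing native transport is disabled; the OS enforces
    // who may connect via the permissions on the socket file.
    static ServerSocketChannel openAdminSocket(Path socketPath) throws IOException
    {
        ServerSocketChannel admin = ServerSocketChannel.open(StandardProtocolFamily.UNIX);
        admin.bind(UnixDomainSocketAddress.of(socketPath));
        return admin;
    }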

Sorry for multiple emails.

Jon

On Mon, Jan 8, 2024 at 4:11 PM Jon Haddad  wrote:

> > Syntactically, if we’re updating settings like compaction throughput, I
> would prefer to simply update a virtual settings table
> > e.g. UPDATE system.settings SET compaction_throughput = 128
>
> I agree with this, sorry if that wasn't clear in my previous email.
>
> > Some operations will no doubt require a stored procedure syntax,
>
> The alternative to the stored procedure syntax is to have first class
> support for operations like REPAIR or COMPACT, which could be interesting.
> It might be a little nicer if the commands are first class citizens. I'm
> not sure what the downside would be besides adding complexity to the
> parser.  I think I like the idea as it would allow for intuitive tab
> completion (REPAIR ) and mentally fit in with the rest of the
> permission system, and be fairly obvious what permission relates to what
> action.
>
> cqlsh > GRANT INCREMENTAL REPAIR ON mykeyspace.mytable TO jon;
>
> I realize the ability to grant permissions could be done for the stored
> procedure syntax as well, but I think it's a bit more consistent to
> represent it the same way as DDL and probably better for the end user.
>
> Postgres seems to generally do admin stuff with SELECT function():
> https://www.postgresql.org/docs/9.3/functions-admin.html.  It feels a bit
> weird to me to use SELECT to do things like kill DB connections, but that
> might just be b/c it's not how I typically work with a database.  VACUUM is
> a standalone command though.
>
> Curious to hear what people's thoughts are on this.
>
> > I would like to see us move to decentralised structured settings
> management at the same time, so that we can set properties for the whole
> cluster, or data centres, or individual nodes via the same mechanism - all
> from any node in the cluster. I would be happy to help out with this work,
> if time permits.
>
> This would be nice.  Spinnaker has this feature and I found it to be very
> valuable at Netflix when making large changes.
>
> Regarding JMX - since it's about as close as we can get to "free", I don't
> really consider it to be additional overhead; it's a decent escape hatch, and
> I can't see us removing any functionality that most teams would
> consider critical.
>
> > We need something that's available for use before the node comes fully
> online
> > Supporting backwards compat, especially for automated ops (i.e.
> nodetool, JMX, etc), is crucial. Painful, but crucial.
>
> I think there's no way we could rip out JMX, there's just too many
> benefits to having it and effectively zero benefits to removing.  Part of
> me wonders if this is a bit of a hammer, and what we really want is
> "disable binary for non-admins".  I'm not sure what the best path is to get
> there.  The local unix socket might be the easiest path as it allows us to
> disable network binary easily and still allow local admins, and allows the
> OS to reject the incoming connections vs passing that work onto a
> connection handler which would have to evaluate whether or not the user can
> connect.  If a node is already in a bad spot requiring disable binary, it's
> probably not a good idea to have it get DDOS'ed as part of the remediation.
>
> I think it's safe to say there's no appetite to remove JMX, at least not
> for anyone that would have to rework their entire admin control plane, plus
> whatever is out there in OSS provisioning tools like puppet / chef / etc
> that rely on JMX.  I see no value whatsoever in removing it.
>
> I should probably have phrased my earlier email a bit differently.  Maybe
> this is better:
>
> Fundamentally, I think it's better for the project if administration is
> fully supported over CQL in addition

Re: [DISCUSSION] CEP-38: CQL Management API

2024-01-08 Thread Jon Haddad
It's great to see where this is going and thanks for the discussion on the
ML.

Personally, I think adding two new ways of accomplishing the same thing is
a net negative.  It means we need more documentation and creates
inconsistencies across tools and users.  The tradeoffs you've listed are
worth considering, but in my opinion adding 2 new ways to accomplish the
same thing hurts the project more than it helps.

> - I'd like to see a symmetry between the JMX and CQL APIs, so that users
will have a sense of the commands they are using and are less
likely to check the documentation;

I've worked with a couple hundred teams and I can only think of a few who
use JMX directly.  It's done very rarely.  After 10 years, I still have to
look up the JMX syntax to do anything useful, especially if there's any
quoting involved.  Power users might know a handful of JMX commands by
heart, but I suspect most have a handful of bash scripts they use instead,
or have a sidecar.  I also think very few users will migrate their
management code from JMX to CQL, nor do I imagine we'll move our own tools
until the `disablebinary` problem is solved.

> - It will be easier for us to move the nodetool from the jmx client that
is used under the hood to an implementation based on a java-driver and use
the CQL for the same;

I can't imagine this would make a material difference.  If someone's
rewriting a nodetool command, how much time will be spent replacing the JMX
call with a CQL one?  Looking up a virtual table isn't going to be what
consumes someone's time in this process.  Again, this won't be done without
solving `nodetool disablebinary`.

> if we have cassandra-15254 merged, it will cost almost nothing to support
the exec syntax for setting properties;

My concern is more about the weird user experience of having two ways of
doing the same thing, less about the technical overhead of adding a second
implementation.  I propose we start simple, see if any of the reasons
you've listed are actually a real problem, then if they are, address the
issue in a follow up.

If I'm wrong, it sounds like it's fairly easy to add `exec` for changing
configs.  If I'm right, we'll have two confusing syntaxes forever.  It's a
lot easier to add something later than take it away.

How does that sound?

Jon




On Mon, Jan 8, 2024 at 7:55 PM Maxim Muzafarov  wrote:

> > Some operations will no doubt require a stored procedure syntax, but
> perhaps it would be a good idea to split the work into two:
>
> These are exactly the first steps I have in mind:
>
> [Ready for review]
> Allow UPDATE on settings virtual table to change running configurations
> https://issues.apache.org/jira/browse/CASSANDRA-15254
>
> This issue is specifically aimed at changing the configuration
> properties we are talking about (value is in yaml format):
> e.g. UPDATE system_views.settings SET compaction_throughput = 128Mb/s;
>
> [Ready for review]
> Expose all table metrics in virtual table
> https://issues.apache.org/jira/browse/CASSANDRA-14572
>
> This is to observe the running configuration and all available metrics:
> e.g. select * from system_views.thread_pools;
>
>
> I hope both of the issues above will become part of the trunk branch
> before we move on to the CQL management commands. In this topic, I'd
> like to discuss the design of the CQL API, and gather feedback, so
> that I can prepare a draft of changes to look at without any
> surprises, and that's exactly what this discussion is about.
>
>
> cqlsh> UPDATE system.settings SET compaction_throughput = 128;
> cqlsh> exec setcompactionthroughput 128
>
> I don't mind removing the exec command from the CQL command API which
> is intended to change settings. Personally, I see the second option as
> just an alias for the first command, and in fact, they will have the
> same implementation under the hood, so please consider the rationale
> below:
>
> - I'd like to see a symmetry between the JMX and CQL APIs, so that
> users will have a sense of the commands they are using and are less
> likely to check the documentation;
> - It will be easier for us to move the nodetool from the jmx client
> that is used under the hood to an implementation based on a
> java-driver and use the CQL for the same;
> - if we have cassandra-15254 merged, it will cost almost nothing to
> support the exec syntax for setting properties;
>
> On Mon, 8 Jan 2024 at 20:13, Jon Haddad  wrote:
> >
> > Ugh, I moved some stuff around and 2 paragraphs got merged that
> shouldn't have been.
> >
> > I think there's no way we could rip out JMX, there's just too many
> benefits to having it and effectively zero benefits to removing.
> >
> &g

Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-16 Thread Jon Haddad
Server side rate limiting can be useful, but imo if we were to focus effort
into a single place, time would be much better spent adding adaptive rate
limiting to the drivers.

Rate limiting at the driver level can be done based on 2 simple feedback
mechanisms - error rate and latency.  When a node is throwing errors (or
exceeds the defined latency SLO), requests to that node can be rate
limited.  It does a better job of preventing issues than server side rate
limiting as we don't get the DDOS effect in addition to whatever issue the
node is dealing with at the time.  Netflix has a good post on it here [1],
and I've incorporated the latency version into my fork of tlp-stress [2]
for benchmarking.
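
To sketch what I mean by a latency feedback mechanism - hypothetical code, not
the Netflix library or anything that's in tlp-stress, just the AIMD idea in its
simplest form:

    import java.util.concurrent.atomic.AtomicInteger;

    // Hypothetical sketch of a per-node adaptive limiter: grow the allowed
    // in-flight requests slowly while latency stays under the SLO, cut them in
    // half when a request fails or blows past it (additive increase,
    // multiplicative decrease).
    final class AdaptiveLimiter
    {
        private final long latencySloNanos;
        private volatile int limit = 20;                  // current concurrency limit
        private final AtomicInteger inFlight = new AtomicInteger();

        AdaptiveLimiter(long latencySloNanos) { this.latencySloNanos = latencySloNanos; }

        // true if the caller may send another request to this node right now
        boolean tryAcquire()
        {
            while (true)
            {
                int current = inFlight.get();
                if (current >= limit) return false;       // back off / try another node
                if (inFlight.compareAndSet(current, current + 1)) return true;
            }
        }

        // feed back the observed latency (or failure) of a completed request
        void onResponse(long latencyNanos, boolean failed)
        {
            inFlight.decrementAndGet();
            if (failed || latencyNanos > latencySloNanos)
                limit = Math.max(1, limit / 2);           // multiplicative decrease
            else
                limit = Math.min(1024, limit + 1);        // additive increase
        }
    }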

Adding this to the driver means the DS Spark Connector can also take
advantage of it, which is nice because tuning it to get the
optimal throughput is a bit of a headache.

Regarding the server side, I think the proposal to use various system
metrics is overly complicated.  The only metrics that matter are latency
and error rate.  As long as you're within acceptable thresholds, you don't
need to rate limit.

Jon

[1] https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581
[2]
https://rustyrazorblade.com/post/2023/2023-10-31-tlp-stress-adaptive-scheduler/

On Tue, Jan 16, 2024 at 10:02 AM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> Hi Jaydeep,
>
> That seems quite interesting. Couple points though:
>
> 1) It would be nice if there is a way to "subscribe" to decisions your
> detection framework comes up with. Integration with e.g. diagnostics
> subsystem would be beneficial. This should be pluggable - just coding up an
> interface to dump / react on the decisions how I want. This might also act
> as a notifier to other systems, e-mail, slack channels ...
>
> 2) Have you tried to incorporate this with the Guardrails framework? I
> think that if something is detected to be throttled or rejected (e.g
> writing to a table), there might be a guardrail which would be triggered
> dynamically in runtime. Guardrails are useful as such but here we might
> reuse them so we do not need to code it twice.
>
> 3) I am curious how complex this detection framework would be, it can be
> complicated pretty fast I guess. What would be desirable is to act on it in
> such a way that you will not put that node under even more pressure. In
> other words, your detection system should work in such a way that there
> will not be any "doom loop" whereby mere throttling of various parts of
> Cassandra you make it even worse for other nodes in the cluster. For
> example, if a particular node starts to be overwhelmed and you detect this
> and requests start to be rejected, is it not possible that Java driver
> would start to see this node as "erroneous" with delayed response time etc
> and it would start to prefer other nodes in the cluster when deciding what
> node to contact for query coordination? So you would put more load on other
> nodes, making them more susceptible to be throttled as well ...
>
> Regards
>
> Stefan Miklosovic
>
> On Tue, Jan 16, 2024 at 6:41 PM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Hi,
>>
>> Happy New Year!
>>
>> I would like to discuss the following idea:
>>
>> Open-source Cassandra (CASSANDRA-15013) has an
>> elementary built-in memory rate limiter based on the incoming payload from
>> user requests. This rate limiter activates if any incoming user request’s
>> payload exceeds certain thresholds. However, the existing rate limiter only
>> solves limited-scope issues. Cassandra's server-side meltdown due to
>> overload is a known problem. Often we see that a couple of busy nodes take
>> down the entire Cassandra ring due to the ripple effect. The following
>> document proposes a generic purpose comprehensive rate limiter that works
>> considering system signals, such as CPU, and internal signals, such as
>> thread pools. The rate limiter will have knobs to filter out internal
>> traffic, system traffic, and replication traffic, and to filter further
>> based on the types of queries.
>>
>> More design details to this doc: [OSS] Cassandra Generic Purpose Rate
>> Limiter - Google Docs
>> 
>>
>> Please let me know your thoughts.
>>
>> Jaydeep
>>
>


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jon Haddad
I am definitely +1 on the ability to rate limit operations to tables and
keyspaces, and if we can do it at a granular level per user I'm +1 to that
as well.  I think this would need to be exposed to the operator regardless
of any automatic rate limiter.

Thinking about the bigger picture for a minute, I think there's a few
things we could throttle dynamically on the server before limiting the
client requests.  I've long wanted to see a dynamic rate limiter with
compaction and any streaming operation - using resources when they're
available but slowing down to allow an influx of requests.  Being able to
throttle background operations to free up resources to ensure the DB stays
online and healthy would be a big win.

> The major challenge with latency based rate limiters is that the latency
is subjective from one workload to another.

You're absolutely right.  This goes to my other suggestion that client-side
rate limiting would be a higher priority (on my list at least) as it is
perfectly suited for multiple varying workloads.  Of course, if you're not
interested in working on the drivers and only on C* itself, this is a moot
point.  You're free to work on whatever you want - I just think there's a
ton more value in the drivers being able to throttle requests to deal with
overload than doing it server side.

> And if these two are +ve then consider the server under pressure. And
once it is under the pressure, then shed the traffic from less aggressive
to more aggressive, etc. The idea is to prevent Cassandra server from
melting (by considering the above two signals to begin with and add any
more based on the learnings)

Yes, I agree using dropped metrics (errors) is useful, as well as queue
length.  I can't remember offhand all the details of the request queue and
how load shedding works there, I need to go back and look.  If we don't
already have load shedding based on queue depth that seems like an easy
thing to do immediately, and is a high quality signal.  Maybe someone can
remind me if we have that already?
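
Just to make that concrete, the kind of check I have in mind is trivial -
a hypothetical sketch, not the actual request path:

    import java.util.concurrent.ThreadPoolExecutor;

    // Hypothetical sketch: reject new work when a stage's queue is already deep.
    // Queue depth is a much higher quality overload signal than raw CPU usage.
    static void submitOrShed(ThreadPoolExecutor stage, Runnable request, int maxQueueDepth)
    {
        if (stage.getQueue().size() > maxQueueDepth)
            throw new RuntimeException("Overloaded: queue depth " + stage.getQueue().size());
        stage.execute(request);
    }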

My issue with using CPU to rate limit clients is that I think it's a very
low quality signal, and I suspect it'll trigger a ton of false positives.
For example, there's a big difference between performance being impacted by
repair vs large reads vs backing up a snapshot to an object store, but they
have similar effects on the CPU - high I/O, high CPU usage, both sustained
over time.  Imo it would be a pretty bad decision to throttle clients when
we should be throttling repair instead, and we should only do so if it's
actually causing an issue for the client, something CPU usage can't tell
us, only the response time and error rates can.

In the case of a backup, throttling might make sense, or might not, it
really depends on the environment and if backups are happening
concurrently.  If a backup's configured with nice +19 (as it should be),
I'd consider throttling user requests to be a false positive, potentially
one that does more harm than good to the cluster, since the OS should be
deprioritizing the backup for us rather than us deprioritizing C*.

In my ideal world, if C* detected problematic response times (possibly
violating a per-table, target latency time) or query timeouts, it would
start by throttling back compactions, repairs, and streaming to ensure
client requests can be serviced.  I think we'd need to define the latency
targets in order for this to work optimally, b/c you might not want to wait
for query timeouts before you throttle.  I think there's a lot of value in
dynamically adaptive compaction, repair, and streaming since it would
prioritize user requests, but again, if you're not willing to work on that,
it's your call.

Anyways - I like the idea of putting more safeguards in the database
itself, we're fundamentally in agreement there.  I see a ton of value in
having flexible rate limiters, whether it be per-table, keyspace, or
user+table combination.  I'd also like to ensure the feature doesn't cause
more disruptions than it solves, which I think would be the case from using
CPU usage as a signal.

Jon


On Wed, Jan 17, 2024 at 10:26 AM Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> Jon,
>
> The major challenge with latency based rate limiters is that the latency
> is subjective from one workload to another. As a result, in the proposal I
> have described, the idea is to make decision on the following combinations:
>
>1. System parameters (such as CPU usage, etc.)
>2. Cassandra thread pools health (are they dropping requests, etc.)
>
> And if these two are +ve then consider the server under pressure. And once
> it is under the pressure, then shed the traffic from less aggressive to
> more aggressive, etc. The idea is to prevent Cassandra server from melting
> (by considering the above two signals to begin with and add any more based
> on the learnings)
>
> Scott,
>
> Yes, I did look at some of the implementations, but they are all great
> systems and helping quite a lot. But they are s

Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jon Haddad
> The problem with generalizing things is if you’re behind on compaction,
reads get expensive, so you pause compaction completely, you’re SOL and
you’ll eventually have to throttle traffic to recover

Yeah - there's definitely quite a few ways this can go sideways, and this
is a good example that won't be consistent across deployments.  There's a
lot of variables to consider.  I agree that building the machinery for
operators to make adjustments is the right first step.  Assuming we get all
the rate limiting options available over CQL and stats available via
virtual tables, operators can make whatever decisions they feel is best,
and we'd hopefully get some good feedback about what works well and what
doesn't.

Jon




On Thu, Jan 18, 2024 at 4:16 PM Jeff Jirsa  wrote:

> The problem with generalizing things is if you’re behind on compaction,
> reads get expensive, so you pause compaction completely, you’re SOL and
> you’ll eventually have to throttle traffic to recover
>
> The SEDA model is bad at back pressure and deferred cost makes it
> non-obvious which resource to slow to ensure stability
>
> Just start by exposing it instead of pretending we can outsmart the very
> complex system
>
> On Jan 18, 2024, at 4:56 PM, Jon Haddad  wrote:
>
> 
> I am definitely +1 on the ability to rate limit operations to tables and
> keyspaces, and if we can do it at a granular level per user I'm +1 to that
> as well.  I think this would need to be exposed to the operator regardless
> of any automatic rate limiter.
>
> Thinking about the bigger picture for a minute, I think there's a few
> things we could throttle dynamically on the server before limiting the
> client requests.  I've long wanted to see a dynamic rate limiter with
> compaction and any streaming operation - using resources when they're
> available but slowing down to allow an influx of requests.  Being able to
> throttle background operations to free up resources to ensure the DB stays
> online and healthy would be a big win.
>
> > The major challenge with latency based rate limiters is that the latency
> is subjective from one workload to another.
>
> You're absolutely right.  This goes to my other suggestion that
> client-side rate limiting would be a higher priority (on my list at least)
> as it is perfectly suited for multiple varying workloads.  Of course, if
> you're not interested in working on the drivers and only on C* itself, this
> is a moot point.  You're free to work on whatever you want - I just think
> there's a ton more value in the drivers being able to throttle requests to
> deal with overload than doing it server side.
>
> > And if these two are +ve then consider the server under pressure. And
> once it is under the pressure, then shed the traffic from less aggressive
> to more aggressive, etc. The idea is to prevent Cassandra server from
> melting (by considering the above two signals to begin with and add any
> more based on the learnings)
>
> Yes, I agree using dropped metrics (errors) is useful, as well as queue
> length.  I can't remember offhand all the details of the request queue and
> how load shedding works there, I need to go back and look.  If we don't
> already have load shedding based on queue depth that seems like an easy
> thing to do immediately, and is a high quality signal.  Maybe someone can
> remind me if we have that already?
>
> My issue with using CPU to rate limit clients is that I think it's a very
> low quality signal, and I suspect it'll trigger a ton of false positives.
> For example, there's a big difference between performance being impacted by
> repair vs large reads vs backing up a snapshot to an object store, but they
> have similar effects on the CPU - high I/O, high CPU usage, both sustained
> over time.  Imo it would be a pretty bad decision to throttle clients when
> we should be throttling repair instead, and we should only do so if it's
> actually causing an issue for the client, something CPU usage can't tell
> us, only the response time and error rates can.
>
> In the case of a backup, throttling might make sense, or might not, it
> really depends on the environment and if backups are happening
> concurrently.  If a backup's configured with nice +19 (as it should be),
> I'd consider throttling user requests to be a false positive, potentially
> one that does more harm than good to the cluster, since the OS should be
> deprioritizing the backup for us rather than us deprioritizing C*.
>
> In my ideal world, if C* detected problematic response times (possibly
> violating a per-table, target latency time) or query timeouts, it would
> start by throttling back compactions, repairs, and streaming t

Re: [Discuss] CASSANDRA-16999 introduction of a column in system.peers_v2

2024-02-13 Thread Jon Haddad
+1 to deprecating dual ports and removing in 5.0

On Tue, Feb 13, 2024 at 4:29 AM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> Alright ...  so how I am interpreting this, even more so after Sam's and
> Brandon's mail, is that we should just get rid of that completely in trunk
> and deprecate in 5.0.
>
> There are already patches for 3.x and 4.x branches of the driver so the
> way I was looking at that was that we might resurrect this feature but if
> there is actually no need for this then the complete removal in trunk is
> probably unavoidable.
>
> On Tue, Feb 13, 2024 at 1:27 PM Brandon Williams  wrote:
>
>> On Tue, Feb 13, 2024 at 6:17 AM Sam Tunnicliffe  wrote:
>> > Also, if CASSANDRA-16999 is only going to trunk, why can't we just
>> deprecate dual ports in 5.0 (as it isn't at -rc stage yet) and remove it
>> from trunk? That seems preferable to shoehorning something into the new
>> system_views.peers table, which isn't going to help any existing drivers
>> anyway as none of them will be using it.
>>
>> I agree and I think it will be a mess having the port in 3.x, then not
>> in 4.0, 4.1, or 5.0, then resurrected again after that.
>>
>> Kind Regards,
>> Brandon
>>
>


Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-14 Thread Jon Haddad
Stefan, can you elaborate on what you are proposing?  It's not clear (at
least to me) what level of testing you're advocating for.  Dropping testing
both on dev branches, every commit, just on release?  In addition, can you
elaborate on what is a hassle about it?  It's been a long time since I
committed anything but I don't remember 2 JVMs (8 & 11) being a problem.

Jon



On Wed, Feb 14, 2024 at 2:35 PM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> I agree with Jacek, I don't quite understand why we are running the
> pipeline for j17 and j11 every time. I think this should be opt-in.
> Majority of the time, we are just refactoring and coding stuff for
> Cassandra where testing it for both jvms is just pointless and we _know_
> that it will be fine in 11 and 17 too because we do not do anything
> special. If we find some subsystems where testing that on both jvms is
> crucial, we might do that, but I just do not remember the last time
> that testing it in both j17 and j11 suddenly uncovered some bug. Seems more
> like a hassle.
>
> We might then test the whole pipeline with a different config basically
> for the same time as we currently do.
>
> On Wed, Feb 14, 2024 at 9:32 PM Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
>> On Wed, 14 Feb 2024 at 17:30, Josh McKenzie wrote:
>>
>>> When we have failing tests people do not spend the time to figure out if
>>> their logic caused a regression and merge, making things more unstable… so
>>> when we merge failing tests that leads to people merging even more failing
>>> tests...
>>>
>>> What's the counter position to this Jacek / Berenguer?
>>>
>>
>> For how long are we going to deceive ourselves? Are we shipping those
>> features or not? Perhaps it is also a good opportunity to distinguish
>> subsets of tests which make sense to run with a configuration matrix.
>>
>> If we don't add those tests to the pre-commit pipeline, "people do not
>> spend the time to figure out if their logic caused a regression and merge,
>> making things more unstable…"
>> I think it is much more valuable to test those various configurations
>> rather than test against j11 and j17 separately. I can see really little
>> value in doing that.
>>
>>
>>


Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-15 Thread Jon Haddad
Would it make sense to only block commits on the test strategy you've
listed, and shift the entire massive test suite to post-commit?  If there
really is only a small % of times the entire suite is useful this seems
like it could unblock the dev cycle but still have the benefit of the full
test suite.



On Thu, Feb 15, 2024 at 3:18 AM Berenguer Blasi 
wrote:

> On reducing circle ci usage during dev while iterating, not with the
> intention to replace the pre-commit CI (yet), we could get away with testing
> only dtests, jvm-dtests, units and cqlsh for a _single_ configuration imo.
> That would greatly reduce usage. I hacked it quickly here for illustration
> purposes:
> https://app.circleci.com/pipelines/github/bereng/cassandra/1164/workflows/3a47c9ef-6456-4190-b5a5-aea2aff641f1
> The good thing is that we have the tooling to dial in whatever we decide
> atm.
>
> Changing pre-commit is a different discussion, to which I agree btw. But
> the above could save time and $ big time during dev and be done and merged
> in a matter of days imo.
>
> I can open a DISCUSS thread if we feel it's worth it.
> On 15/2/24 10:24, Mick Semb Wever wrote:
>
>
>
>> Mick and Ekaterina (and everyone really) - any thoughts on what test
>> coverage, if any, we should commit to for this new configuration?
>> Acknowledging that we already have *a lot* of CI that we run.
>>
>
>
>
> Branimir in this patch has already done some basic cleanup of test
> variations, so this is not a duplication of the pipeline.  It's a
> significant improvement.
>
> I'm ok with cassandra_latest being committed and added to the pipeline,
> *if* the authors genuinely believe there's significant time and effort
> saved in doing so.
>
> How many broken tests are we talking about ?
> Are they consistently broken or flaky ?
> Are they ticketed up and 5.0-rc blockers ?
>
> Having to deal with flakies and broken tests is an unfortunate reality of
> having a pipeline of 170k tests.
>
> Despite real frustrations I don't believe the broken windows analogy is
> appropriate here – it's more of a leave the campground cleaner…   That
> being said, knowingly introducing a few broken tests is not that either,
> but still having to deal with a handful of consistently breaking tests
> for a short period of time is not the same cognitive burden as flakies.
> There are currently other broken tests in 5.0: VectorUpdateDeleteTest,
> upgrade_through_versions_test; are these compounding to the frustrations ?
>
> It's also been questioned about why we don't just enable settings we
> recommend.  These are settings we recommend for new clusters.  Our existing
> cassandra.yaml needs to be tailored for existing clusters being upgraded,
> where we are very conservative about changing defaults.
>
>


Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-15 Thread Jon Haddad
e-commit:*
>>
>>- Build on all supported jdks
>>- All test suites on highest supported jdk using recommended config
>>- Repeat testing on new or changed tests on highest supported JDK
>>w/recommended config
>>- JDK-based test suites on highest supported jdk using other config
>>
>> *Post-commit:*
>>
>>- Run everything. All suites, all supported JDK's, both config files.
>>
>> With Butler + the *jenkins-jira* integration script
>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>(need
>> to dust that off but it should remain good to go), we should have a pretty
>> clear view as to when any consistent regressions are introduced and why.
>> We'd remain exposed to JDK-specific flake introductions and flakes in
>> unchanged tests, but there's no getting around the 2nd one and I expect the
>> former to be rare enough to not warrant the compute to prevent it.
>>
>> On Thu, Feb 15, 2024, at 10:02 AM, Jon Haddad wrote:
>>
>> Would it make sense to only block commits on the test strategy you've
>> listed, and shift the entire massive test suite to post-commit?  If there
>> really is only a small % of times the entire suite is useful this seems
>> like it could unblock the dev cycle but still have the benefit of the full
>> test suite.
>>
>>
>>
>> On Thu, Feb 15, 2024 at 3:18 AM Berenguer Blasi 
>> wrote:
>>
>>
>> On reducing circle ci usage during dev while iterating, not with the
>> intention to replace the pre-commit CI (yet), we could get away with testing
>> only dtests, jvm-dtests, units and cqlsh for a _single_ configuration imo.
>> That would greatly reduce usage. I hacked it quickly here for illustration
>> purposes:
>> https://app.circleci.com/pipelines/github/bereng/cassandra/1164/workflows/3a47c9ef-6456-4190-b5a5-aea2aff641f1
>> The good thing is that we have the tooling to dial in whatever we decide
>> atm.
>>
>> Changing pre-commit is a different discussion, to which I agree btw. But
>> the above could save time and $ big time during dev and be done and merged
>> in a matter of days imo.
>>
>> I can open a DISCUSS thread if we feel it's worth it.
>> On 15/2/24 10:24, Mick Semb Wever wrote:
>>
>>
>>
>> Mick and Ekaterina (and everyone really) - any thoughts on what test
>> coverage, if any, we should commit to for this new configuration?
>> Acknowledging that we already have *a lot* of CI that we run.
>>
>>
>>
>>
>> Branimir in this patch has already done some basic cleanup of test
>> variations, so this is not a duplication of the pipeline.  It's a
>> significant improvement.
>>
>> I'm ok with cassandra_latest being committed and added to the pipeline,
>> *if* the authors genuinely believe there's significant time and effort
>> saved in doing so.
>>
>> How many broken tests are we talking about ?
>> Are they consistently broken or flaky ?
>> Are they ticketed up and 5.0-rc blockers ?
>>
>> Having to deal with flakies and broken tests is an unfortunate reality of
>> having a pipeline of 170k tests.
>>
>> Despite real frustrations I don't believe the broken windows analogy is
>> appropriate here – it's more of a leave the campground cleaner…   That
>> being said, knowingly introducing a few broken tests is not that either,
>> but still having to deal with a handful of consistently breaking tests
>> for a short period of time is not the same cognitive burden as flakies.
>> There are currently other broken tests in 5.0: VectorUpdateDeleteTest,
>> upgrade_through_versions_test; are these compounding to the frustrations ?
>>
>> It's also been questioned about why we don't just enable settings we
>> recommend.  These are settings we recommend for new clusters.  Our existing
>> cassandra.yaml needs to be tailored for existing clusters being upgraded,
>> where we are very conservative about changing defaults.
>>
>>
>>


Re: [DISCUSS] CQL handling of WHERE clause predicates

2024-03-26 Thread Jon Haddad
I like the idea of accepting more types of queries with fewer
restrictions.  I think we've been moving in the right direction, with SAI
opening up more types of query possibilities.

I think the long term path towards more flexibility requires paying off
some technical debt.  We have a ton of places where we over-allocate or
perform I/O sub-optimally.  I think in order to support more types of
queries we need to address this.  At the moment it's difficult for me to
envision a path towards complex queries with multiple predicates over
massive datasets without highly efficient distributed indexes and a serious
reduction in allocation amplification.

Here are the high level requirements as I see them in order to get there.

1. Optimized read path with minimal garbage.  This is a problem today, and
would become a bigger one as we read bigger datasets.  Relying on GC to
handle this for us isn't a great solution here; it takes up a massive
amount of CPU time and I've found it's fairly easy to lock up an entire
cluster using ZGC.  We need to reduce allocations on reads.  I don't see
any way around this.  Fortunately there's some nice stuff coming in future
JDKs that allow for better memory management that we can take advantage
of.  Stefan just pointed this out to me a couple days ago:
https://openjdk.org/jeps/454. I think arenas would work nicely for
allocations during reads as well as memtables.
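
As a rough sketch of how that could look - assuming a JDK where JEP 454 is
final (22+), and with hypothetical method names - every buffer needed to serve
one read could come out of a confined arena and be released deterministically
when the read completes instead of becoming garbage:

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;

    // Hypothetical sketch: per-read arena allocation, freed on close, so the
    // read path generates no garbage for the collector to chase.
    static void serveRead()
    {
        try (Arena readArena = Arena.ofConfined())
        {
            MemorySegment chunk = readArena.allocate(16 * 1024); // e.g. one decompressed chunk
            // ... fill the segment from disk and deserialize cells out of it ...
        } // everything allocated from readArena is released here
    }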

2. Optimized I/O for bulk reads. If we're going to start doing more random
I/O then we need to do better about treating it as a limited resource,
especially since so many teams are moving to IOPS limited volumes like EBS.
I've already filed CASSANDRA-15452 to address this from a compaction
standpoint, since it's massively wasteful right now.  If we're going to
support inequality searches we're going to need more efficient table scans as
well.  We currently read chunk by chunk which is generally only ~8KB or so,
assuming 50% compression w/ 16KB chunk length.  CASSANDRA-15452 should make
it fairly easy to address the I/O waste during compaction, and I've just
filed CASSANDRA-19494 which would help with table scans.
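
The gist of what I'd like to see for scans is coalescing those reads - a
hypothetical sketch, issuing one large positional read covering many chunks
rather than one ~8KB pread per chunk:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    // Hypothetical sketch: pull a 1 MiB run of compressed chunks in a single
    // pread() and decompress out of the buffer, instead of issuing a separate
    // small read per chunk during a table scan.
    static ByteBuffer readChunkRun(FileChannel data, long startOffset) throws IOException
    {
        ByteBuffer run = ByteBuffer.allocateDirect(1 << 20);
        data.read(run, startOffset);
        run.flip();
        return run;
    }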

3. Materialized views that have some guarantees of consistency that work
well with larger partitions.  The current state of them makes them unfit
for production since they can't be repaired and it's too easy to create MVs
that ruin a cluster.  I think we do need them for certain types of global
queries, but not all.  Anything that has very high cardinality with large
clusters would work better with MVs than round-robin of the entire cluster.

4. Secondary indexes that perform well for global searches, that scale
efficiently with SSTable counts.  I believe we still need node-centric 2i
that isn't SAI.  I'm testing SAI with global queries tomorrow but I can't
see how queries without partition restrictions that scale O(N) with sstable
counts will be performant.  We need a good indexing solution that is
node-centric.

5. Pagination.  Reevaluating entire queries in order to paginate is OK now
since there's not much to gain by materializing result sets up front.
There's a couple routes we could go down here, probably requiring multiple
strategies depending on the query type and cost.  This is probably most
impactful if we want to do any sorting, joins or aggregations.

I'm sure there's other things that would help, but these are the top 5 I
can think of right now.  I think at a bare minimum, to do != search we'd
want #1 and #2 as they would perform full cluster scans.  For other types
of inequality such as c > 5, we'd need #4.  #3 would make non-pk equality
searches friendly to large clusters, and #5 would help with the more
advanced types of SQL-ish queries.

Thanks for bringing this up, it's an interesting topic!
Jon


On Tue, Mar 26, 2024 at 8:35 AM Benjamin Lerer  wrote:

> Hi everybody,
>
> CQL appears to be inconsistent in how it handles predicates.
>
> One type of inconsistencies is that some operators can be used in some
> places but not in others or on some expressions but not others.
> For example:
>
>- != can be used in LWT conditions but not elsewhere
>- Token expressions (e.g. token(pk) = ?) support =, >, >=, =< and <
>but not IN
>- Map elements predicates (e.g m[?] = ?) only support =
>- ... (a long list)
>
> This type of inconsistencies can be easily fixed over time.
>
> The other type of inconsistencies that is more confusing is about how we
> deal with the combination of multiple predicates as we accept some and
> reject others.
> For example, we allow: "token(pk) > ? AND pk IN ?" but reject "c > ? AND
> c IN ?".
> For CQL, rejection seems to be the norm with only a few exceptions.
> Whereas SQL accepts all inputs in terms of predicates combination.
>
> For the IS NULL and IS NOT NULL predicates for partition key and
> clustering columns, which we know cannot be null, that leads us to 2 choices:
> either throwing an exception when any of them is specified on partition
> keys/clusterin

Re: [DISCUSS] Modeling JIRA fix version for subprojects

2024-04-04 Thread Jon Haddad
Imo it would be better to have standalone JIRA projects for each of the
subprojects we have, just like we do for the sidecar.

On Thu, Apr 4, 2024 at 10:47 AM Francisco Guerrero 
wrote:

> Hi Bret,
>
> Thanks for bringing up this issue. The Cassandra Analytics library will
> also need to have its own versioning. We should align on version naming
> for subprojects and start using it for both the Java Driver and the
> Analytics library.
>
> I propose the following versioning "java-driver-${version}" for the driver
> and "analytics-${version}" for Cassandra Analytics.
>
> Let me know what your thoughts are.
>
> Best,
> - Francisco
>
> On 2024/04/04 05:12:14 Bret McGuire wrote:
> >Greetings all!  For those I haven't met yet I'm Bret and I'm working
> > mainly on the newly-donated Java driver.  As part of that effort we've
> hit
> > upon an issue that we felt needed some additional discussion... and so
> you
> > now have an email from me. :)
> >
> >Our JIRA instance currently has a single field named "Fix Version/s"
> to
> > indicate the Cassandra version which will contain a fix for the
> > corresponding ticket; the field is populated with some (most? all?)
> > versions of the server.  The Java driver has a need for something
> similar,
> > but in our case we'd like for the options to correspond to Java driver
> > releases rather than Cassandra server releases.  To be clear there is no
> > explicit correlation between Java driver releases and any specific server
> > version or versions.
> >
> >How should we model this requirement?  We considered a few options:
> >
> > * Use the "Fix Version/s" field for both Cassandra and Java driver
> > versions; basically just add the Java driver versions to what we already
> > have.  There will be some overlap which could cause some confusion; the
> > most recent Java driver release was 4.18.0 which looks vaguely similar
> to,
> > say, 4.1.x.  Everybody can figure it out but the overlap might make that
> > more perplexing than we'd like.
> > * Add Java driver versions but use some sort of prefix specific to the
> > driver.  So instead of "4.18.0" we might have "java driver 4.18.0".
> > * Add a new field, perhaps "Java Driver Fix Version/s".  This field is
> only
> > used for Java driver tickets and is populated with known driver versions
> > (e.g. "4.18.0")
> >
> >Note that whatever choice is made here would presumably apply to *any*
> > subproject which maintains its own versioning scheme.
> >
> >The general consensus of the conversation was that the third option (a
> > "Java Driver Fix Version/s" field) was the cleanest option but it seemed
> > worthwhile raising this to the group as a whole.
> >
> >Thanks all!
> >
> >   - Bret -
> >
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-08 Thread Jon Haddad
This seems like a lot of work to create an rsync alternative.  I can't
really say I see the point.  I noticed your "rejected alternatives"
mentions it with this note:


   - However, it might not be permitted by the administrator or available
   in various environments such as Kubernetes or virtual instances like EC2.
   Enabling data transfer through a sidecar facilitates smooth instance
   migration.

This feels more like NIH than solving a real problem, as what you've listed
is a hypothetical, and one that's easily addressed.

Jon



On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <
n.v.harikrishna.apa...@gmail.com> wrote:

> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. Depending on the volume of the storage service
> load, replacements (repair + bootstrap) may take anywhere from a few hours
> to days.
>
> Proposing a Sidecar based solution to address these challenges. This
> solution proposes transferring data from the old host (source) to the new
> host (destination) and then bringing up the Cassandra process at the
> destination, to enable fast instance migration. This approach would help to
> minimise node downtime, as it is based on a Sidecar solution for data
> transfer and avoids repairs and bootstrap.
>
> Looking forward to the discussions.
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>
> Thanks!
> Hari
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-11 Thread Jon Haddad
 encourage you to approach it from an attitude of seeking understanding on
>> the part of the first-time CEP author, as this reply casts it off pretty
>> quickly as NIH.
>>
>> The proposal isn't mine, but I'll offer a few notes on where I see this
>> as valuable:
>>
>> – It's valuable for Cassandra to have an ecosystem-native mechanism of
>> migrating data between physical/virtual instances outside the standard
>> streaming path. As Hari mentions, the current ecosystem-native approach of
>> executing repairs, decommissions, and bootstraps is time-consuming and
>> cumbersome.
>>
>> – An ecosystem-native solution is safer than a bunch of bash and rsync.
>> Defining a safe protocol to migrate data between instances via rsync
>> without downtime is surprisingly difficult - and even moreso to do safely
>> and repeatedly at scale. Enabling this process to be orchestrated by a
>> control plane mechanizing official endpoints of the database and sidecar –
>> rather than trying to move data around behind its back – is much safer than
>> hoping one's cobbled together the right set of scripts to move data in a
>> way that won't violate strong / transactional consistency guarantees. This
>> complexity is kind of exemplified by the "Migrating One Instance" section
>> of the doc and state machine diagram, which illustrates an approach to
>> solving that problem.
>>
>> – An ecosystem-native approach poses fewer security concerns than rsync.
>> mTLS-authenticated endpoints in the sidecar for data movement eliminate the
>> requirement for orchestration to occur via (typically) high-privilege SSH,
>> which often allows for code execution of some form or complex efforts to
>> scope SSH privileges of particular users; and eliminates the need to manage
>> and secure rsyncd processes on each instance if not via SSH.
>>
>> – An ecosystem-native approach is more instrumentable and measurable than
>> rsync. Support for data migration endpoints in the sidecar would allow for
>> metrics reporting, stats collection, and alerting via mature and modern
>> mechanisms rather than monitoring the output of a shell script.
>>
>> I'll yield to Hari to share more, though today is a public holiday in
>> India.
>>
>> I do see this CEP as solving an important problem.
>>
>> Thanks,
>>
>> – Scott
>>
>> On Apr 8, 2024, at 10:23 AM, Jon Haddad  wrote:
>>
>>
>> This seems like a lot of work to create an rsync alternative.  I can't
>> really say I see the point.  I noticed your "rejected alternatives"
>> mentions it with this note:
>>
>>
>>- However, it might not be permitted by the administrator or
>>available in various environments such as Kubernetes or virtual instances
>>like EC2. Enabling data transfer through a sidecar facilitates smooth
>>instance migration.
>>
>> This feels more like NIH than solving a real problem, as what you've
>> listed is a hypothetical, and one that's easily addressed.
>>
>> Jon
>>
>>
>>
>> On Fri, Apr 5, 2024 at 3:47 AM Venkata Hari Krishna Nukala <
>> n.v.harikrishna.apa...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I have filed CEP-40 [1] for live migrating Cassandra instances using the
>>> Cassandra Sidecar.
>>>
>>> When someone needs to move all or a portion of the Cassandra nodes
>>> belonging to a cluster to different hosts, the traditional approach of
>>> Cassandra node replacement can be time-consuming due to repairs and the
>>> bootstrapping of new nodes. Depending on the volume of the storage service
>>> load, replacements (repair + bootstrap) may take anywhere from a few hours
>>> to days.
>>>
>>> Proposing a Sidecar based solution to address these challenges. This
>>> solution proposes transferring data from the old host (source) to the new
>>> host (destination) and then bringing up the Cassandra process at the
>>> destination, to enable fast instance migration. This approach would help to
>>> minimise node downtime, as it is based on a Sidecar solution for data
>>> transfer and avoids repairs and bootstrap.
>>>
>>> Looking forward to the discussions.
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>>>
>>> Thanks!
>>> Hari
>>>
>>
>>


Re: [VOTE] Release Apache Cassandra 3.0.30

2024-04-15 Thread Jon Haddad
+1

On Mon, Apr 15, 2024 at 7:49 AM Mick Semb Wever  wrote:

>
>
>
>> Proposing the test build of Cassandra 3.0.30 for release.
>>
>
>
> +1
>
>
> Checked
> - signing correct
> - checksums are correct
> - source artefact builds
> - binary artefact runs
> - debian package runs
> - debian repo installs and runs
>
>


Re: [VOTE] Release Apache Cassandra 3.11.17

2024-04-15 Thread Jon Haddad
+1

On Mon, Apr 15, 2024 at 8:03 AM Mick Semb Wever  wrote:

>
>
>
>> Proposing the test build of Cassandra 3.11.17 for release.
>
>
>
>
>
> +1
>
>
> Checked
> - signing correct
> - checksums are correct
> - source artefact builds
> - binary artefact runs
> - debian package runs
> - debian repo installs and runs
>
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Jon Haddad
Ariel, having it in C* process makes sense to me.

Please correct me if I'm wrong here, but shouldn't using ZCS to transfer
have no distinguishable difference in overhead from doing it using the
sidecar?  Since the underlying call is sendfile, never hitting userspace, I
can't see why we'd opt for the transfer in sidecar.  What's the
advantage of duplicating the work that's already been done?
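
For reference, the zero-copy path in question ultimately comes down to
FileChannel.transferTo, which on Linux can be serviced by sendfile so the bytes
never enter the JVM heap. A minimal sketch of that call, outside of Cassandra's
actual streaming implementation (the path, peer address, and port below are
purely illustrative):

```
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyTransferSketch
{
    public static void main(String[] args) throws Exception
    {
        // Illustrative SSTable component path and peer address.
        Path component = Path.of("/var/lib/cassandra/data/ks/tbl/nb-1-big-Data.db");
        try (FileChannel file = FileChannel.open(component, StandardOpenOption.READ);
             SocketChannel peer = SocketChannel.open(new InetSocketAddress("10.0.0.2", 7000)))
        {
            long position = 0;
            long size = file.size();
            while (position < size)
            {
                // transferTo can use sendfile on Linux: bytes move from the page cache
                // to the socket without being copied into userspace.
                position += file.transferTo(position, size - position, peer);
            }
        }
    }
}
```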

I can see using the sidecar for coordination to start and stop instances or
do things that require something out of process.

Jon


On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg  wrote:

> Hi,
>
> If there is a faster/better way to replace a node why not  have Cassandra
> support that natively without the sidecar so people who aren’t running the
> sidecar can benefit?
>
> Copying files over a network shouldn’t be slow in C* and it would also
> already have all the connectivity issues solved.
>
> Regards,
> Ariel
>
> On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
>
> Hi all,
>
> I have filed CEP-40 [1] for live migrating Cassandra instances using the
> Cassandra Sidecar.
>
> When someone needs to move all or a portion of the Cassandra nodes
> belonging to a cluster to different hosts, the traditional approach of
> Cassandra node replacement can be time-consuming due to repairs and the
> bootstrapping of new nodes. Depending on the volume of the storage service
> load, replacements (repair + bootstrap) may take anywhere from a few hours
> to days.
>
> Proposing a Sidecar based solution to address these challenges. This
> solution proposes transferring data from the old host (source) to the new
> host (destination) and then bringing up the Cassandra process at the
> destination, to enable fast instance migration. This approach would help to
> minimise node downtime, as it is based on a Sidecar solution for data
> transfer and avoids repairs and bootstrap.
>
> Looking forward to the discussions.
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>
> Thanks!
> Hari
>
>
>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-18 Thread Jon Haddad
Hmm... I guess if you're using encryption you can't use ZCS so there's that.

It probably makes sense to implement kernel TLS:
https://www.kernel.org/doc/html/v5.7/networking/tls.html

Then we can get ZCS all the time, for bootstrap & replacements.

Jon


On Thu, Apr 18, 2024 at 12:50 PM Jon Haddad  wrote:

> Ariel, having it in C* process makes sense to me.
>
> Please correct me if I'm wrong here, but shouldn't using ZCS to transfer
> have no distinguishable difference in overhead from doing it using the
> sidecar?  Since the underlying call is sendfile, never hitting userspace, I
> can't see why we'd opt for the transfer in sidecar.  What's the
> advantage of duplicating the work that's already been done?
>
> I can see using the sidecar for coordination to start and stop instances
> or do things that require something out of process.
>
> Jon
>
>
> On Thu, Apr 18, 2024 at 12:44 PM Ariel Weisberg  wrote:
>
>> Hi,
>>
>> If there is a faster/better way to replace a node why not  have Cassandra
>> support that natively without the sidecar so people who aren’t running the
>> sidecar can benefit?
>>
>> Copying files over a network shouldn’t be slow in C* and it would also
>> already have all the connectivity issues solved.
>>
>> Regards,
>> Ariel
>>
>> On Fri, Apr 5, 2024, at 6:46 AM, Venkata Hari Krishna Nukala wrote:
>>
>> Hi all,
>>
>> I have filed CEP-40 [1] for live migrating Cassandra instances using the
>> Cassandra Sidecar.
>>
>> When someone needs to move all or a portion of the Cassandra nodes
>> belonging to a cluster to different hosts, the traditional approach of
>> Cassandra node replacement can be time-consuming due to repairs and the
>> bootstrapping of new nodes. Depending on the volume of the storage service
>> load, replacements (repair + bootstrap) may take anywhere from a few hours
>> to days.
>>
>> Proposing a Sidecar based solution to address these challenges. This
>> solution proposes transferring data from the old host (source) to the new
>> host (destination) and then bringing up the Cassandra process at the
>> destination, to enable fast instance migration. This approach would help to
>> minimise node downtime, as it is based on a Sidecar solution for data
>> transfer and avoids repairs and bootstrap.
>>
>> Looking forward to the discussions.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-40%3A+Data+Transfer+Using+Cassandra+Sidecar+for+Live+Migrating+Instances
>>
>> Thanks!
>> Hari
>>
>>
>>


Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Jon Haddad
I haven't looked at streaming over TLS, so I might be way off base here,
but our own docs (
https://cassandra.apache.org/doc/latest/cassandra/architecture/streaming.html)
say ZCS is not available when using encryption, and if we have to bring the
data into the JVM then I'm not sure how it would even work.  sendfile is a
direct file descriptor to file descriptor copy.  How are we simultaneously
doing kernel-only operations while also performing encryption in the JVM?

I'm assuming you mean something other than ZCS when you say "ZCS with
TLS"?  Maybe "no serde" streaming?

Jon




On Fri, Apr 19, 2024 at 2:36 PM C. Scott Andreas 
wrote:

> These are the salient points here for me, yes:
>
> > My understanding from the proposal is that Sidecar would be able to
> migrate from a Cassandra instance that is already dead and cannot recover.
>
> > That’s one thing I like about having it as an external process — not that
> it’s bullet proof but it’s one less thing to worry about.
>
> The manual/rsync version of the state machine Hari describes in the CEP is
> one of the best escape hatches for migrating an instance that’s
> overstressed, limping on ailing hardware, or that has exhausted disk. If
> the system is functional but the C* process is in bad shape, it’s great to
> have a paved-path flow for migrating the instance and data to more capable
> hardware.
>
> I also agree in principle that “streaming should be just as fast via the
> C* process itself.” This hits a couple snags today:
>
> - This option isn’t available when the C* instance is struggling.
> - In the scenario of replacing an entire cluster’s hardware with new
> machines, applying this process to an entire cluster via host replacements
> of all instances (which also requires repairs) or by doubling then halving
> capacity is incredibly cumbersome and operationally-impacting to the
> database’s users - especially if the DB is already having a hard time.
> - The host replacement process also puts a lot of stress on gossip and is
> a great way to encounter all sorts of painful races if you perform it
> hundreds or thousands of times (but shouldn’t be a problem in TCM-world).
>
> So I think I agree with both points:
>
> - Cassandra should be able to do this itself.
> - It is also valuable to have a paved path implementation of a safe
> migration/forklift state machine when you’re in a bind, or need to do this
> hundreds or thousands of times.
>
> On zero copy: what really makes ZCS fast compared to legacy streaming is
> that the JVM is able to ship entire files around, rather than deserializing
> SSTables and reserializing them to stream each individual row. That’s the
> slow and expensive part. It’s true that TLS means you incur an extra memcpy
> as that stream is encrypted before it’s chunked into packets — but the cost
> of that memcpy for encryption pales in comparison to how slow
> deserializing/reserializing SSTables is/was.
>
> ZCS with TLS can push 20Gbps+ today on decent but not extravagant Xeon
> hardware. In-kernel TLS would also still encounter a memcpy in the
> encryption path; the kernel.org doc alludes to this via “the kernel will
> need to allocate a buffer for the encrypted data.” But it would allow using
> sendfile and cut a copy in userspace. If someone is interested in testing
> it out I’d love to learn what they find. It’s always a great surprise to
> learn there’s a more perf left on the table. This comparison looks
> promising: https://tinselcity.github.io/SSL_Sendfile/
>
> – Scott
>
> —
> Mobile
>
> On Apr 19, 2024, at 11:31 AM, Jordan West  wrote:
>
> 
> If we are considering the main process then we have to do some additional
> work to ensure that it doesn’t put pressure on the JVM and introduce
> latency. That’s one thing I like about having it as an external process — not
> that it’s bullet proof but it’s one less thing to worry about.
>
> Jordan
>
> On Thu, Apr 18, 2024 at 15:39 Francisco Guerrero 
> wrote:
>
>> My understanding from the proposal is that Sidecar would be able to migrate
>> from a Cassandra instance that is already dead and cannot recover. This is a
>> possible scenario where Sidecar should still be able to migrate to a new
>> instance.
>>
>> Alternatively, Cassandra itself could have some flag to start up with
>> limited
>> subsystems enabled to allow live migration.
>>
>> In any case, we'll need to weigh the pros and cons of each alternative
>> and
>> decide if the live migration process can be handled within the C* process
>> itself
>> or if we allow this functionality to be handled by Sidecar.
>>
>> I am looking forward to this 

Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Jon Haddad
Jeff, this is probably the best explanation and justification of the idea
that I've heard so far.

I like it because

1) we really should have something official for backups
2) backups / object store would be great for analytics
3) it solves a much bigger problem than the single goal of moving instances.

I'm a huge +1 in favor of this perspective, with live migration being one
use case for backup / restore.

Jon


On Fri, Apr 19, 2024 at 7:08 PM Jeff Jirsa  wrote:

> I think Jordan and German had an interesting insight, or at least their
> comment made me think about this slightly differently, and I’m going to
> repeat it so it’s not lost in the discussion about zerocopy / sendfile.
>
> The CEP treats this as “move a live instance from one machine to another”.
> I know why the author wants to do this.
>
> If you think of it instead as “change backup/restore mechanism to be able
> to safely restore from a running instance”, you may end up with a cleaner
> abstraction that’s easier to think about (and may also be easier to
> generalize in clouds where you have other tools available ).
>
> I’m not familiar enough with the sidecar to know the state of
> orchestration for backup/restore, but “ensure the original source node
> isn’t running” , “migrate the config”, “choose and copy a snapshot” , maybe
> “forcibly exclude the original instance from the cluster” are all things
> the restore code is going to need to do anyway, and if restore doesn’t do
> that today, it seems like we can solve it once.
>
> Backup probably needs to be generalized to support many sources, too.
> Object storage is obvious (s3 download). Block storage is obvious (snapshot
> and reattach). Reading sstables from another sidecar seems reasonable, too.
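>
> Purely as an illustration of the "many sources" idea, a restore source could be
> modeled behind one small interface, with object storage, block snapshots, and
> peer sidecars as implementations (names here are hypothetical, not an existing
> Sidecar API):
>
> ```
> // Illustrative only: one way to generalize "where SSTables come from" for restore.
> public interface SSTableSource
> {
>     // Copies the named SSTable component to a local destination path.
>     void fetch(String keyspace, String table, String component, java.nio.file.Path destination) throws Exception;
> }
> // Possible implementations: S3ObjectSource, BlockSnapshotSource, PeerSidecarSource.
> ```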
>
> It accomplishes the original goal, in largely the same fashion, it just
> makes the logic reusable for other purposes?
>
>
>
>
>
> On Apr 19, 2024, at 5:52 PM, Dinesh Joshi  wrote:
>
> 
> On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg  wrote:
>
>>
>> If there is a faster/better way to replace a node why not  have Cassandra
>> support that natively without the sidecar so people who aren’t running the
>> sidecar can benefit?
>>
>
> I am not the author of the CEP so take whatever I say with a pinch of
> salt. Scott and Jordan have pointed out some benefits of doing this in the
> Sidecar vs Cassandra.
>
> Today Cassandra is able to do fast node replacements. However, this CEP is
> addressing an important corner case when Cassandra is unable to start up
> due to old / ailing hardware. Can we fix it in Cassandra so it doesn't die
> on old hardware? Sure. However, you would still need operator intervention
> to start it up in some special mode both on the old and new node so the new
> node can peer with the old node, copy over its data and join the ring. This
> would still require some orchestration outside the database. The Sidecar
> can do that orchestration for the operator. The point I'm making here is
> that the CEP addresses a real issue. The way it is currently built can
> improve over time with improvements in Cassandra.
>
> Dinesh
>
>


Re: discuss: add to_human_size function

2024-04-25 Thread Jon Haddad
I can’t see a good reason not to support it. Seems like extra work to avoid
with no benefit.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Thu, Apr 25, 2024 at 7:16 AM Štefan Miklošovič <
stefan.mikloso...@gmail.com> wrote:

> Can you elaborate on intentionally not supporting some conversions? Are we
> safe to base these conversions on DataStorageUnit? We have a set of units
> from BYTES to GIBIBYTES and respective methods on them which convert from
> that unit to whatever else. Is this OK to be used for the purposes of this
> feature? I would expect that once we have units like these and methods on
> them to convert from one to another, they can be reused wherever else they are needed.
>
> On Thu, Apr 25, 2024 at 4:06 PM Ekaterina Dimitrova 
> wrote:
>
>> All I am saying is: be careful that those added conversions do not end up
>> being used while setting our configuration. Thanks 🙏
>>
>> On Thu, 25 Apr 2024 at 6:53, Štefan Miklošovič <
>> stefan.mikloso...@gmail.com> wrote:
>>
>>> Well, technically I do not need DataStorageSpec at all. All I need is
>>> DataStorageUnit for that matter. That can convert from one unit to another
>>> easily.
>>>
>>> We can omit tebibytes, that's just fine. People would need to live with
>>> gibibytes at most in cqlsh output. They would not get 5 TiB but 5120 GiB, I
>>> guess that is just enough to have a picture of what magnitude that value
>>> looks like.
>>>
>>> On Thu, Apr 25, 2024 at 3:36 PM Ekaterina Dimitrova <
>>> e.dimitr...@gmail.com> wrote:
>>>
>>>> Quick comment:
>>>>
>>>> DataRateSpec, DataStorageSpec, or DurationSpec
>>>> - we intentionally do not support converting from a smaller to a bigger unit in those
>>>> classes, which are specific to cassandra.yaml - precision issues. Please
>>>> keep it that way. That is why the notion of min unit was added in
>>>> cassandra.yaml for parameters that are internally represented in a bigger
>>>> unit.
>>>>
>>>> I am not sure that people want to add TiB. There was explicit agreement on
>>>> what units we will allow in cassandra.yaml. I suspect any new units should
>>>> be approved on the ML
>>>>
>>>> Hope this helps
>>>>
>>>>
>>>>
>>>> On Thu, 25 Apr 2024 at 5:55, Claude Warren, Jr via dev <
>>>> dev@cassandra.apache.org> wrote:
>>>>
>>>>> TiB is not yet in DataStorageSpec (perhaps we should add it).
>>>>>
>>>>> A quick review tells me that all the units are unique across the 3
>>>>> specs.  As long as we guarantee that in the future the method you propose
>>>>> should be easily expandable to the other specs.
>>>>>
>>>>> +1 to this idea.
>>>>>
>>>>> On Thu, Apr 25, 2024 at 12:26 PM Štefan Miklošovič <
>>>>> stefan.mikloso...@gmail.com> wrote:
>>>>>
>>>>>> That is a very interesting point, Claude. My so-far implementation is
>>>>>> using FileUtils.stringifyFileSize which is just dividing a value by a
>>>>>> respective divisor based on how big the value is. While this works, it will
>>>>>> prevent you from specifying what unit you want that value to be converted
>>>>>> to, and it will also prevent you from specifying what unit the value you
>>>>>> provided is in. So, for example, if a column is known to be in kibibytes
>>>>>> and we want that to be converted into gibibytes, that won't be possible
>>>>>> because that function will think that the value is in bytes.
>>>>>>
>>>>>> It would be more appropriate to have something like this:
>>>>>>
>>>>>> to_human_size(val) -> alias to FileUtils.stringifyFileSize; without
>>>>>> any source or target unit it will consider the value to be in bytes and
>>>>>> will convert it like FileUtils.stringifyFileSize
>>>>>>
>>>>>> to_human_size(val, 'MiB') -> alias for to_human_size(val, 'B', 'MiB')
>>>>>> to_human_size(val, 'GiB') -> alias for to_human_size(val, 'B', 'GiB')
>>>>>>
>>>>>> the first argument is the source unit, the second argument is the target
>>>>>> unit
>>>>>>
>>>>>> to_human_size(val, 'B', 'MiB')

[DISCUSS] Donating easy-cass-stress to the project

2024-04-25 Thread Jon Haddad
I've been asked by quite a few people, both in person and in JIRA [1] about
contributing easy-cass-stress [2] to the project.  I've been happy to
maintain the project myself over the years but given its widespread use I
think it makes sense to make it more widely available and under the
project's umbrella.

My goal with the project was always to provide something that's easy to
use.  Up and running in a couple minutes, using the parameters to shape the
workload rather than defining everything through configuration.  I was
happy to make this tradeoff since Cassandra doesn't have very many types of
queries and it's worked well for me over the years.

Obviously I would continue working on this project, and I hope this would
encourage others to contribute.  I've heard a lot of good ideas that other
teams have implemented in their forks.  I'd love to see those ideas make it
into the project, and it sounds like it would be a lot easier for teams to
get approval to contribute if it was under the project umbrella.

Would love to hear your thoughts.

Thanks,
Jon

[1] https://issues.apache.org/jira/browse/CASSANDRA-18661
[2] https://github.com/rustyrazorblade/easy-cass-stress


Re: [DISCUSS] Donating easy-cass-stress to the project

2024-04-25 Thread Jon Haddad
Yeah, I agree with your concerns.  I very firmly prefer a separate
subproject.  I've got zero interest in moving from a modern Gradle project
to an ant based one.  It would be a lot of work for not much benefit.

If we wanted to replace cassandra-stress, I'd say bring in the release
artifact as part of the build process instead of tying it all together, but
I'm OK if we keep it separate as well.

Jon




On Thu, Apr 25, 2024 at 2:43 PM Brandon Williams  wrote:

> I want to begin by saying I am generally +1 on this because I have
> become a fan of easy-cass-stress after using it, but I am curious if
> this is intended to be a subproject, or replace cassandra-stress?  If
> the latter, we are going to have to reconcile the build systems
> somehow.  I don't really want to drag ECS back to ant, but I also
> don't want two different build systems in-tree.
>
> Kind Regards,
> Brandon
>
> On Thu, Apr 25, 2024 at 9:38 AM Jon Haddad  wrote:
> >
> > I've been asked by quite a few people, both in person and in JIRA [1]
> about contributing easy-cass-stress [2] to the project.  I've been happy to
> maintain the project myself over the years but given its widespread use I
> think it makes sense to make it more widely available and under the
> project's umbrella.
> >
> > My goal with the project was always to provide something that's easy to
> use.  Up and running in a couple minutes, using the parameters to shape the
> workload rather than defining everything through configuration.  I was
> happy to make this tradeoff since Cassandra doesn't have very many types of
> queries and it's worked well for me over the years.
> >
> > Obviously I would continue working on this project, and I hope this
> would encourage others to contribute.  I've heard a lot of good ideas that
> other teams have implemented in their forks.  I'd love to see those ideas
> make it into the project, and it sounds like it would be a lot easier for
> teams to get approval to contribute if it was under the project umbrella.
> >
> > Would love to hear your thoughts.
> >
> > Thanks,
> > Jon
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-18661
> > [2] https://github.com/rustyrazorblade/easy-cass-stress
>


Re: [DISCUSS] Donating easy-cass-stress to the project

2024-04-25 Thread Jon Haddad
I should probably have noted - since TLP is no more, I renamed tlp-stress
to easy-cass-stress around half a year ago when I took it over again.

Jon

On Thu, Apr 25, 2024 at 3:05 PM Jeff Jirsa  wrote:

> Unless there’s 2-3 other people who expect to keep working on it, I don’t
> see how we justify creating a subproject
>
> And if there’s not 2-3 people expressing interest, even pulling it into
> the main project seems risky
>
> So: besides Jon, who in the community expects/desires to maintain this
> going forward?
>
> On Apr 25, 2024, at 5:55 PM, Jon Haddad  wrote:
>
> 
> Yeah, I agree with your concerns.  I very firmly prefer a separate
> subproject.  I've got zero interest in moving from a modern Gradle project
> to an ant based one.  It would be a lot of work for not much benefit.
>
> If we wanted to replace cassandra-stress, I'd say bring in the release
> artifact as part of the build process instead of tying it all together, but
> I'm OK if we keep it separate as well.
>
> Jon
>
>
>
>
> On Thu, Apr 25, 2024 at 2:43 PM Brandon Williams  wrote:
>
>> I want to begin by saying I am generally +1 on this because I have
>> become a fan of easy-cass-stress after using it, but I am curious if
>> this is intended to be a subproject, or replace cassandra-stress?  If
>> the latter, we are going to have to reconcile the build systems
>> somehow.  I don't really want to drag ECS back to ant, but I also
>> don't want two different build systems in-tree.
>>
>> Kind Regards,
>> Brandon
>>
>> On Thu, Apr 25, 2024 at 9:38 AM Jon Haddad  wrote:
>> >
>> > I've been asked by quite a few people, both in person and in JIRA [1]
>> about contributing easy-cass-stress [2] to the project.  I've been happy to
>> maintain the project myself over the years but given its widespread use I
>> think it makes sense to make it more widely available and under the
>> project's umbrella.
>> >
>> > My goal with the project was always to provide something that's easy to
>> use.  Up and running in a couple minutes, using the parameters to shape the
>> workload rather than defining everything through configuration.  I was
>> happy to make this tradeoff since Cassandra doesn't have very many types of
>> queries and it's worked well for me over the years.
>> >
>> > Obviously I would continue working on this project, and I hope this
>> would encourage others to contribute.  I've heard a lot of good ideas that
>> other teams have implemented in their forks.  I'd love to see those ideas
>> make it into the project, and it sounds like it would be a lot easier for
>> teams to get approval to contribute if it was under the project umbrella.
>> >
>> > Would love to hear your thoughts.
>> >
>> > Thanks,
>> > Jon
>> >
>> > [1] https://issues.apache.org/jira/browse/CASSANDRA-18661
>> > [2] https://github.com/rustyrazorblade/easy-cass-stress
>>
>


Re: [DISCUSS] Donating easy-cass-stress to the project

2024-04-26 Thread Jon Haddad
@mck I haven't done anything with IP clearance.  Not sure how to, and I
want to get a feel for whether we even want it in the project before I invest
time in it.  Jeff's question about people willing to maintain the project is a
good one and if people aren't willing to maintain it with me, it's only
going to make my life harder to move under the project umbrella.  I don't
want to go from my wild west style of committing whatever I want to waiting
around for days or weeks to get features committed.


Project rename happened here:

commit 6c9493254f7bed57f19aaf5bda19f0b7734b5333
Author: Jon Haddad 
Date:   Wed Feb 14 13:21:36 2024 -0800

Renamed the project





On Fri, Apr 26, 2024 at 12:50 AM Mick Semb Wever  wrote:

>
>
> On Fri, 26 Apr 2024 at 00:11, Jon Haddad  wrote:
>
>> I should probably have noted - since TLP is no more, I renamed tlp-stress
>> to easy-cass-stress around half a year ago when I took it over again.
>>
>
>
> Do we have the IP cleared for donation ?
> At what SHA did you take and rename tlp-stress, and who was the copyright
> holder til that point ?
> We can fix this I'm sure, but we will need the paperwork.
>
>
>


Re: [VOTE] Release Apache Cassandra 4.1.5

2024-05-02 Thread Jon Haddad
+1

On Thu, May 2, 2024 at 9:37 AM Brandon Williams 
wrote:

> Proposing the test build of Cassandra 4.1.5 for release.
>
> sha1: 6b134265620d6b39f9771d92edd29abdfd27de6a
> Git: https://github.com/apache/cassandra/tree/4.1.5-tentative
> Maven Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1329/org/apache/cassandra/cassandra-all/4.1.5/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/4.1.5/
>
> The vote will be open for 72 hours (longer if needed). Everyone who
> has tested the build is invited to vote. Votes by PMC members are
> considered binding. A vote passes if there are at least three binding
> +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/4.1.5-tentative/CHANGES.txt
> [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/4.1.5-tentative/NEWS.txt
>


Re: [RESULT] [VOTE] Release Apache Cassandra 3.0.30

2024-05-07 Thread Jon Haddad
Brandon, myself, and Mick are +1 PMC votes.

On Tue, May 7, 2024 at 4:46 PM Justin Mclean  wrote:

> Hi,
>
> In the vote thread, there are only two explicit +1 PMC votes. In the
> future, it would be best to wait for three +1 votes, or the release manager
> should also vote.
>
> Kind Regards,
> Justin
>


Re: [RESULT] [VOTE] Release Apache Cassandra 3.0.30

2024-05-07 Thread Jon Haddad
Justin, I just re-read what you wrote, and I think you're saying that
you're not counting Brandon's original email as a vote?  Is that correct?

Jon

On Tue, May 7, 2024 at 6:01 PM Jon Haddad  wrote:

> Brandon, myself, and Mick are +1 PMC votes.
>
> On Tue, May 7, 2024 at 4:46 PM Justin Mclean  wrote:
>
>> Hi,
>>
>> In the vote thread, there are only two explicit +1 PMC votes. In the
>> future, it would be best to wait for three +1 votes, or the release manager
>> should also vote.
>>
>> Kind Regards,
>> Justin
>>
>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-14 Thread Jon Haddad
Is there a technical limitation that would prevent a range write that
functions the same way as a range tombstone, other than probably needing a
version bump of the storage format?
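
For readers following the thread, the BETWEEN support being added in
CASSANDRA-19604 covers SELECT and DELETE. A sketch of how it could look from
the Java driver once released (table, column names, and the exact grammar here
are illustrative, not final):

```
import java.time.Instant;
import com.datastax.oss.driver.api.core.CqlSession;

public class BetweenSketch
{
    public static void main(String[] args)
    {
        try (CqlSession session = CqlSession.builder().build())
        {
            Instant start = Instant.parse("2024-05-01T00:00:00Z");
            Instant end = Instant.parse("2024-05-14T00:00:00Z");

            // Proposed: BETWEEN in a SELECT, an inclusive range over a clustering column.
            session.execute("SELECT * FROM ks.readings WHERE sensor_id = ? AND ts BETWEEN ? AND ?",
                            "sensor-42", start, end);

            // Proposed: BETWEEN in a DELETE, which the engine can turn into a range tombstone.
            session.execute("DELETE FROM ks.readings WHERE sensor_id = ? AND ts BETWEEN ? AND ?",
                            "sensor-42", start, end);

            // Not covered by the patch: a ranged UPDATE, which is what the question above asks about.
        }
    }
}
```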


On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer  wrote:

> Range restrictions (>, >=, =<, < and BETWEEN) do not work on UPDATEs. They
> do work on DELETE because under the hood C* they get translated into range
> tombstones.
>
> Le mar. 14 mai 2024 à 02:44, David Capwell  a écrit :
>
>> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this
>> work.
>>
>> On May 13, 2024, at 7:40 AM, Patrick McFadin  wrote:
>>
>> This is a great feature addition to CQL! I get asked about it from time
>> to time but then people figure out a workaround. It will be great to just
>> have it available.
>>
>> And right on Simon! I think the only project I had as a high school
>> senior was figuring out how many parties I could go to and still maintain a
>> passing grade. Thanks for your work here.
>>
>> Patrick
>>
>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer  wrote:
>>
>>> Hi everybody,
>>>
>>> Just raising awareness that Simon is working on adding support for the
>>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
>>> We plan to add support for it in conditions in a separate patch.
>>>
>>> The patch is available.
>>>
>>> As a side note, Simon chose to do his highschool senior project
>>> contributing to Apache Cassandra. This patch is his first contribution for
>>> his senior project (his second feature contribution to Apache Cassandra).
>>>
>>>
>>>
>>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-14 Thread Jon Haddad
Personally, I don't think that something being scary at first glance is a
good reason not to explore an idea.  The scenario you've described here is
tricky but I'm not expecting it to be any worse than say, SAI, which (the
last I checked) has O(N) complexity on returning result sets with regard to
rows returned.  We've also merged in Vector search which has O(N) overhead
with the number of SSTables.  We're still fundamentally looking at, in most
cases, a limited number of SSTables and some merging of values.

Write updates are essentially a timestamped mask, potentially overlapping,
and I suspect potentially resolvable during compaction by propagating the
values.  They could be eliminated or narrowed based on how they've
propagated by using the timestamp metadata on the SSTable.
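
To make the "timestamped mask" intuition concrete, here is a toy sketch of
reconciling a point read against overlapping range updates by timestamp, in the
same spirit as ordinary cell reconciliation. It is purely illustrative and says
nothing about how the storage engine would actually represent, merge, or
compact such updates:

```
import java.util.List;

public class RangeUpdateMaskSketch
{
    // A range update modeled as a timestamped mask over a span of keys.
    record RangeUpdate(int start, int end, long timestamp, String value) {}

    // The value with the highest timestamp wins, whether it came from the
    // point write or from an overlapping range update.
    static String resolve(int key, String pointValue, long pointTimestamp, List<RangeUpdate> ranges)
    {
        String winner = pointValue;
        long winnerTimestamp = pointTimestamp;
        for (RangeUpdate range : ranges)
        {
            if (key >= range.start() && key < range.end() && range.timestamp() > winnerTimestamp)
            {
                winner = range.value();
                winnerTimestamp = range.timestamp();
            }
        }
        return winner;
    }

    public static void main(String[] args)
    {
        List<RangeUpdate> ranges = List.of(new RangeUpdate(0, 100, 2L, "masked"),
                                           new RangeUpdate(50, 60, 3L, "narrow"));
        System.out.println(resolve(55, "original", 1L, ranges)); // narrow
        System.out.println(resolve(10, "original", 5L, ranges)); // original
    }
}
```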

It would be a lot more constructive to apply our brains towards solving an
interesting problem than pointing out all its potential flaws based on gut
feelings.  We haven't even moved this past an idea.

I think it would solve a massive problem for a lot of people and is 100%
worth considering.  Thanks Patrick and David for raising this.

Jon



On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev 
wrote:

> Ranged update sounds like a disaster for compaction and read performance.
>
> Imagine compacting or reading some SSTables in which a large number of
> overlapping but non-identical ranges were updated with different values. It
> gives me a headache by just thinking about it.
>
> Ranged delete is much simpler, because the "value" is the same tombstone
> marker, and it also is guaranteed to expire and disappear eventually, so
> the performance impact of dealing with them at read and compaction time
> doesn't suffer in the long term.
>
> On 14/05/2024 16:59, Benjamin Lerer wrote:
>
> It should be like range tombstones ... in much worse ;-). A tombstone is a
> simple marker (deleted). An update can be far more complex.
>
> Le mar. 14 mai 2024 à 15:52, Jon Haddad  a écrit :
>
>> Is there a technical limitation that would prevent a range write that
>> functions the same way as a range tombstone, other than probably needing a
>> version bump of the storage format?
>>
>>
>> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer 
>> wrote:
>>
>>> Range restrictions (>, >=, =<, < and BETWEEN) do not work on UPDATEs.
>>> They do work on DELETE because under the hood C* they get translated into
>>> range tombstones.
>>>
>>> Le mar. 14 mai 2024 à 02:44, David Capwell  a
>>> écrit :
>>>
>>>> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this
>>>> work.
>>>>
>>>> On May 13, 2024, at 7:40 AM, Patrick McFadin 
>>>> wrote:
>>>>
>>>> This is a great feature addition to CQL! I get asked about it from time
>>>> to time but then people figure out a workaround. It will be great to just
>>>> have it available.
>>>>
>>>> And right on Simon! I think the only project I had as a high school
>>>> senior was figuring out how many parties I could go to and still maintain a
>>>> passing grade. Thanks for your work here.
>>>>
>>>> Patrick
>>>>
>>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer 
>>>> wrote:
>>>>
>>>>> Hi everybody,
>>>>>
>>>>> Just raising awareness that Simon is working on adding support for the
>>>>> BETWEEN operator in WHERE clauses (SELECT and DELETE) in CASSANDRA-19604.
>>>>> We plan to add support for it in conditions in a separate patch.
>>>>>
>>>>> The patch is available.
>>>>>
>>>>> As a side note, Simon chose to do his highschool senior project
>>>>> contributing to Apache Cassandra. This patch is his first contribution for
>>>>> his senior project (his second feature contribution to Apache Cassandra).
>>>>>
>>>>>
>>>>>
>>>>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-15 Thread Jon Haddad
I was trying to have a discussion about a technical possibility, not a cost
benefit analysis.  More of a "how could we technically reach mars?"
discussion than a "how we get congress to authorize a budget to reach mars?"

Happy to talk about this privately with anyone interested as I enjoy a
technical discussion for the sake of a good technical discussion.

Thanks,
Jon

On Wed, May 15, 2024 at 7:18 AM Josh McKenzie  wrote:

> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
> The technical limitation would be cost/benefit due to how this intersects
> w/our architecture I think.
>
> Range tombstones have taught us that something that should be relatively
> simple (merge in deletion mask at read time) introduces a significant
> amount of complexity on all the paths Benjamin enumerated with a pretty
> long tail of bugs and data incorrectness issues and edge cases. The work to
> get there, at a high level glance, would be:
>
>1. Updates to CQL grammar, spec
>2. Updates to write path
>3. Updates to accord. And thinking about how this intersects
>w/accord's WAL / logic (I think? Consider me not well educated on details
>here)
>4. Updates to compaction w/consideration for edge cases on all the
>different compaction strategies
>5. Updates to iteration and merge logic
>6. Updates to paging logic
>7. Indexing
>8. repair, both full and incremental implications, support, etc
>9. the list probably goes on? There's always >= 1 thing we're not
>thinking of with a change like this. Usually more.
>
> For all of the above we also would need unit, integration, and fuzz
> testing extensively to ensure the introduction of this new spanning concept
> on a write doesn't introduce edge cases where incorrect data is returned on
> merge.
>
> All of which is to say: it's an interesting problem, but IMO given our
> architecture and what we know about the past of trying to introduce an
> architectural concept like this, the costs to getting something like this
> to production ready are pretty high.
>
> To me the cost/benefit don't really balance out. Just my .02 though.
>
> On Tue, May 14, 2024, at 2:50 PM, Benjamin Lerer wrote:
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.
>
>
> It is not simply a gut feeling, Jon. This change impacts read, write,
> indexing, storage, compaction, repair... The risk and cost associated with
> it are pretty significant and I am not convinced at this point of its
> benefit.
>
> Le mar. 14 mai 2024 à 19:05, Jon Haddad  a écrit :
>
> Personally, I don't think that something being scary at first glance is a
> good reason not to explore an idea.  The scenario you've described here is
> tricky but I'm not expecting it to be any worse than say, SAI, which (the
> last I checked) has O(N) complexity on returning result sets with regard to
> rows returned.  We've also merged in Vector search which has O(N) overhead
> with the number of SSTables.  We're still fundamentally looking at, in most
> cases, a limited number of SSTables and some merging of values.
>
> Write updates are essentially a timestamped mask, potentially overlapping,
> and I suspect potentially resolvable during compaction by propagating the
> values.  They could be eliminated or narrowed based on how they've
> propagated by using the timestamp metadata on the SSTable.
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.  We haven't even moved this past an idea.
>
> I think it would solve a massive problem for a lot of people and is 100%
> worth considering.  Thanks Patrick and David for raising this.
>
> Jon
>
>
>
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <
> dev@cassandra.apache.org> wrote:
>
>
> Ranged update sounds like a disaster for compaction and read performance.
>
> Imagine compacting or reading some SSTables in which a large number of
> overlapping but non-identical ranges were updated with different values. It
> gives me a headache by just thinking about it.
>
> Ranged delete is much simpler, because the "value" is the same tombstone
> marker, and it also is guaranteed to expire and disappear eventually, so
> the performance impact of dealing with them at read and compaction time
> doesn't suffer in the long term.
>
> O

Re: [DISCUSS] Gossip Protocol Change

2024-05-16 Thread Jon Haddad
I have also recently worked with a team that lost critical data as a result
of gossip issues combined with collision in our token allocation.  I
haven’t filed a jira yet as it slipped my mind but I’ve seen it in my own
testing as well. I’ll get a JIRA in describing it in detail.

It’s severe enough that it should probably block 5.0.

Jon

On Thu, May 16, 2024 at 10:37 AM Jordan West  wrote:

> I’m a big +1 on 18917 or more testing of gossip. While I appreciate that
> it makes TCM more complicated, gossip and schema propagation bugs have been
> the source of our two worst data loss events in the last 3 years. Data loss
> should immediately cause us to evaluate what we can do better.
>
> We will likely live with gossip for at least 1, maybe 2, more years.
> Otherwise outside of bug fixes (and to some degree even still) I think the
> only other solution is to not touch gossip *at all* until we are all
> TCM-only which I don’t think is practical or realistic. recent changes to
> gossip in 4.1 introduced several subtle bugs that had serious impact (from
> data loss to loss of ability to safely replace nodes in the cluster).
>
> I am happy to contribute some time to this if lack of folks is the issue.
>
> Jordan
>
> On Mon, May 13, 2024 at 17:05 David Capwell  wrote:
>
>> So, I created https://issues.apache.org/jira/browse/CASSANDRA-18917 which
>> lets you do deterministic gossip simulation testing across large clusters
>> within seconds… I stopped this work as it conflicted with TCM (they were
>> trying to merge that week) and it hit issues where some nodes never
>> converged… I didn’t have time to debug so I had to drop the patch…
>>
>> This type of change would be a good reason to resurrect that patch as
>> testing gossip is super dangerous right now… its behavior is only in a few
>> people's heads and even then it's just bits and pieces scattered across
>> multiple people (and likely missing pieces)…
>>
>> My brain is far too fried right now to say whether your idea is safe or not, but
>> I honestly feel that we would need to improve our tests (we have 0) before
>> making such a change…
>>
>> I do welcome the patch though...
>>
>>
>> On May 12, 2024, at 8:05 PM, Zemek, Cameron via dev <
>> dev@cassandra.apache.org> wrote:
>>
>> In looking into CASSANDRA-19580 I noticed something that raises a
>> question. With Gossip SYN it doesn't check for missing digests. If it's
>> empty for the shadow round it will add everything from endpointStateMap to the
>> reply. But why not include missing entries in normal replies? The
>> branching for reply handling of SYN requests could then be merged into a
>> single code path (though the shadow round handles empty state differently with
>> CASSANDRA-16213). A potential downside is the performance impact, as this requires doing a
>> set difference.
>>
>> For example, something along the lines of:
>>
>> ```
>> Set<InetAddressAndPort> missing = new HashSet<>(endpointStateMap.keySet());
>> missing.removeAll(gDigestList.stream().map(GossipDigest::getEndpoint).collect(Collectors.toSet()));
>> for (InetAddressAndPort endpoint : missing)
>> {
>>     gDigestList.add(new GossipDigest(endpoint, 0, 0));
>> }
>> ```
>>
>> It seems odd to me that after shadow round for a new node we have
>> endpointStateMap with only itself as an entry. Then the only way it gets
>> the gossip state is by another node choosing to send the new node a gossip
>> SYN. The choosing of this is random. Yeah this happens every second so
>> eventually it's going to receive one (outside the issue of CASSANDRA-19580
>> where it doesn't if it's in a dead state like hibernate), but doesn't this
>> open up bootstrapping to failures on very large clusters as it can take
>> longer before it's sent a SYN (as the odds of being chosen for SYN get
>> lower)? For years I've been seeing bootstrap failures with 'Unable to contact
>> any seeds' but they are infrequent and I've never been able to figure out how to
>> reproduce them in order to open a ticket, but I wonder if some of them have been
>> due to not receiving a SYN message before it does the seenAnySeed check.
>>
>>
>>


Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-16 Thread Jon Haddad
Benjamin, I’m +1 on adding BETWEEN, thanks for bringing this up.

To all, my intention wasn’t to suggest we add support for update between
via range writes at the same time, if it came across that way i apologize
for the confusion.

Josh, thanks for the suggestion. If I feel inspired to discuss with the dev
list any further I’ll be sure to start a new thread.

Jon


On Thu, May 16, 2024 at 7:57 AM Josh McKenzie  wrote:

> More of a "how could we technically reach mars?" discussion than a "how we
> get congress to authorize a budget to reach mars?"
>
> Wow - that is genuinely a great simile. Really good point.
>
> To Jeff's point - want to kick off a [DISCUSS] thread referencing this
> thread Jon so we can take the conversation there? Definitely think it's
> worth continuing from a technical perspective.
>
> On Wed, May 15, 2024, at 2:49 PM, Jeff Jirsa wrote:
>
> You can remove the shadowed values at compaction time, but you can’t ever
> fully propagate the range update to point updates, so you’d be propagating
> all of the range-update structures throughout everything forever. It’s JUST
> like a range tombstone - you don’t know what it’s shadowing (and can’t, in
> many cases, because the width of the range is uncountable for some types).
>
> Setting aside whether or not this construct is worth adding (I suspect a
> lot of binding votes would say it’s not), the thread focuses on BETWEEN
> operator, and there’s no reason we should pollute the conversation of “add
> a missing SQL operator that basically maps to existing functionality” with
> creation of a brand new form of update that definitely doesn’t map to any
> existing concepts.
>
>
>
>
>
> On May 14, 2024, at 10:05 AM, Jon Haddad  wrote:
>
> Personally, I don't think that something being scary at first glance is a
> good reason not to explore an idea.  The scenario you've described here is
> tricky but I'm not expecting it to be any worse than say, SAI, which (the
> last I checked) has O(N) complexity on returning result sets with regard to
> rows returned.  We've also merged in Vector search which has O(N) overhead
> with the number of SSTables.  We're still fundamentally looking at, in most
> cases, a limited number of SSTables and some merging of values.
>
> Write updates are essentially a timestamped mask, potentially overlapping,
> and I suspect potentially resolvable during compaction by propagating the
> values.  They could be eliminated or narrowed based on how they've
> propagated by using the timestamp metadata on the SSTable.
>
> It would be a lot more constructive to apply our brains towards solving an
> interesting problem than pointing out all its potential flaws based on gut
> feelings.  We haven't even moved this past an idea.
>
> I think it would solve a massive problem for a lot of people and is 100%
> worth considering.  Thanks Patrick and David for raising this.
>
> Jon
>
>
>
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev <
> dev@cassandra.apache.org> wrote:
>
>
> Ranged update sounds like a disaster for compaction and read performance.
>
> Imagine compacting or reading some SSTables in which a large number of
> overlapping but non-identical ranges were updated with different values. It
> gives me a headache by just thinking about it.
>
> Ranged delete is much simpler, because the "value" is the same tombstone
> marker, and it also is guaranteed to expire and disappear eventually, so
> the performance impact of dealing with them at read and compaction time
> doesn't suffer in the long term.
>
> On 14/05/2024 16:59, Benjamin Lerer wrote:
>
> It should be like range tombstones ... in much worse ;-). A tombstone is a
> simple marker (deleted). An update can be far more complex.
>
> Le mar. 14 mai 2024 à 15:52, Jon Haddad  a écrit :
>
> Is there a technical limitation that would prevent a range write that
> functions the same way as a range tombstone, other than probably needing a
> version bump of the storage format?
>
>
> On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer  wrote:
>
> Range restrictions (>, >=, =<, < and BETWEEN) do not work on UPDATEs. They
> do work on DELETE because under the hood C* they get translated into range
> tombstones.
>
> Le mar. 14 mai 2024 à 02:44, David Capwell  a écrit :
>
> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this work.
>
> On May 13, 2024, at 7:40 AM, Patrick McFadin  wrote:
>
> This is a great feature addition to CQL! I get asked about it from time to
> time but then people figure out a workaround. It will be great to just have
> it available.
>
> And right on Simon! I think

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-04 Thread Jon Haddad
The idea is interesting.  I think it would help to have more concrete
examples.  It's a bit sparse at the moment, and I have a hard time getting
on board with new features where the main selling point is Extensibility
over the value they provide on their own.

I think it would help a lot if we knew what types of constraints, besides
the size check, you were thinking of adding.

Jon

On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Yes, that is correct. This particular behavior will need CEP-24 in order
> to work reliably. But, if my understanding is correct, that statement holds
> true for the entirety of Guardrails, and not only for this particular
> feature.
>
> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
>
> That would work reliably if there were no way to misconfigure
> guardrails in the cluster. What if you set a guardrail on one node but you
> don’t set it (or set it differently) on the other? If it is configured
> differently and you want to check that constraints do not violate the
> guardrails, then your query might fail or not depending on which node is hit.
>
> I guess that guardrails would need to become transactional to be sure
> this is avoided and guardrails are indeed the same everywhere (CEP-24 thread
> sent recently here on the ML).
>
>
>
> *From: *Bernardo Botella 
> *Date: *Tuesday, 4 June 2024 at 00:31
> *To: *dev@cassandra.apache.org 
> *Cc: *Miklosovic, Stefan 
> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>
>
>
> Basically, I am trying to protect the limits set by the operator against
> misconfigured schemas from the customers.
>
> I see the guardrails as a safety limit added by the operator, setting the
> limits within which the customers owning the actual schema (and their
> constraints) can operate. With that vision, if a customer tries to “ignore”
> the actual limits set by the operator by adding more relaxed constraints,
> it gets a nice message saying that “that is not allowed for the cluster,
> please contact your admin".
>
>
>
>
> On Jun 3, 2024, at 2:51 PM, Miklosovic, Stefan via dev <
> dev@cassandra.apache.org> wrote:
>
> You wrote in the CEP:
>
> As we mentioned in the motivation section, we currently have some
> guardrails for columns size in place which can be extended for other data
> types.
> Those guardrails will take preference over the defined constraints in the
> schema, and a SCHEMA ALTER adding constraints that break the limits defined
> by the guardrails framework will fail.
> If the guardrails themselves are modified, operator should get a warning
> mentioning that there are schemas with offending constraints.
>
> I think that this should be the other way around. Guardrails should kick in
> when there are no constraints and they would be overridden by the table schema.
> That way, there is always a “default” in terms of guardrails (which one can
> turn off on demand / change) but you can override it by table alteration.
>
> Basically, what is in schema should win regardless of how guardrails are
> configured. They don’t matter when a constraint is explicitly specified in
> a schema. It should take the defaults in guardrails if there are any and no
> constraint is specified on schema level.
>
> What is your motivation to do it like you suggested?
>
>
> *From: *Bernardo Botella 
> *Date: *Friday, 31 May 2024 at 23:24
> *To: *dev@cassandra.apache.org 
> *Subject: *[DISCUSS] CEP-42: Constraints Framework
>
>
> Hello everyone,
>
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
> (cwiki.apache.org)
>
>
> And I’m looking for feedback from the community.
>
> Thanks a lot!
> Bernardo
>
>
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-05 Thread Jon Haddad
I think there's some promising ideas here, but the CEP needs to be
developed a bit more.

> Other types of constraints and functions can be added in the future to
provide even more flexibility, but are out of the scope of this CEP.

> For the third point, I didn’t want to be prescriptive on what those
validations should be, but the fact that the proposal is extensible to
those potential use cases is something concrete that, in my opinion, comes
as a benefit of the actual proposal. I’d be happy to develop the main
example of sizeOf a bit more if it helps alleviate your concerns on this
point.

I disagree, quite strongly, with this.  While I appreciate extensibility, I
think having a variety of actual constraints that ship with the feature
means it needs to be built to satisfy real world use cases.  Without going
through this process, it feels a bit too much like triggers, UDAs and UDFs
- incomplete, and too much left to the end user.

To me, punting on thinking through constraints kicks the most important can
down the road.

Jon


On Tue, Jun 4, 2024 at 5:37 PM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> In the CEP document there is another example (although not explicitly
> mentioned) adding a constraint to the max value of an int ->
> `number_of_items int CONSTRAINT number_of_items < 1000`
>
> This basic example can also be used to expand on how to extend this
> functionality with these two initial constraints (size and value), by
> composing them to create new data types with proper validation.
>
> For example, this could create an ipv4 with built in validation:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
>
> Or a color type:
> CREATE TYPE keyspace.color (
>   r int,
>   g int,
>   b int,
>   CONSTRAINT r >= 0,
>   CONSTRAINT r < 255,
>   CONSTRAINT g >= 0,
>   CONSTRAINT g < 255,
>   CONSTRAINT b >= 0,
>   CONSTRAINT b < 255,
> )
>
>
> Other types of constraints and functions can be added in the future to
> provide even more flexibility, but are out of the scope of this CEP.
>
> Bernardo
>
> On Jun 4, 2024, at 1:01 PM, Jon Haddad  wrote:
>
> The idea is interesting.  I think it would help to have more concrete
> examples.  It's a bit sparse at the moment, and I have a hard time getting
> on board with new features where the main selling point is Extensibility
> over the value they provide on their own.
>
> I think it would help a lot if we knew what types of constraints, besides
> the size check, you were thinking of adding.
>
> Jon
>
> On Mon, Jun 3, 2024 at 5:27 PM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Yes, that is correct. This particular behavior will need CEP-24 in order
>> to work reliably. But, if my understanding is correct, that statement holds
>> true for the entirety of Guardrails, and not only for this particular
>> feature.
>>
>> On Jun 3, 2024, at 3:54 PM, Miklosovic, Stefan <
>> stefan.mikloso...@netapp.com> wrote:
>>
>> That would work reliably only if there were no way to misconfigure
>> guardrails in the cluster. What if you set a guardrail on one node but you
>> don’t set it (or set it differently) on the other? If it is configured
>> differently and you want to check whether constraints violate the
>> guardrails, then your query might fail or not depending on which node is hit.
>>
>> I guess that guardrails would need to become transactional to be
>> sure this is avoided and guardrails are indeed the same everywhere (CEP-24
>> thread sent recently here in ML).
>>
>>
>>
>> *From: *Bernardo Botella 
>> *Date: *Tuesday, 4 June 2024 at 00:31
>> *To: *dev@cassandra.apache.org 
>> *Cc: *Miklosovic, Stefan 
>> *Subject: *Re: [DISCUSS] CEP-42: Constraints Framework
>>
>>
>>
>> Basically, I am trying to protect the limits set by the operator against
>> misconfigured schemas from the customers.
>>
>> I see the guardrails as a safety limit added by the operator, setting the
>> limits within which the customers owning the actual schema (and their
>> constraints) can operate. With that vision, if a customer tries to “ignore”
>> the actual limits set by the operator by adding more relaxed constraints,
>> it gets a nice message saying that “that is not allowed for the cluster,
&

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-06 Thread Jon Haddad
ion with size of 7
>> T4 - node is restarted and guardrail in cassandra.yaml is set to forbid
>> sizes bigger than 5
>> T5 - mutation with size of 7 is replayed from FQL and it will fail to
>> replay it because of "global guardrail" in yaml
>>
>> In general, the problem I see with this CEP is that I feel like we
>> clearly see that it is a little bit hairy around the configuration and it
>> _can_ be broken or misconfigured etc but the feedback I see is that "yeah
>> but ... it is possible to break it already, so what?"
>>
>> I do not follow this logic. If we see that it "leaks", why is the leakage
>> an excuse to put more features on top of that? Should not we fix the
>> leakage in the first place? Why is that an excuse? I don't get that ... It
>> is like "yeah it is broken so by putting more stuff on top of that it can't
>> be worse".
>>
>> What if we focused our effort to make configuration transactional etc or
>> at least tried to fix this problem so it does not happen? If we do not do
>> that before we introduce this, then we will have more work to do once we go
>> to address that but it might be probably too late because we will need to
>> live with all our decisions made earlier, whatever ineffective they might
>> be.
>>
>>
>>
>> On Thu, Jun 6, 2024 at 7:33 PM Yifan Cai  wrote:
>>
>>> Hi Stefan,
>>>
>>> Thanks for putting the FQL example! However, it seems to be incorrect.
>>> FQL only records the _successful_ queries. The query at T4 fails, and it
>>> will not be included in FQL log.
>>> I do agree that changing guardrails on the fly can cause confusion when
>>> FQL is enabled on the node. Operator should probably avoid doing so. But it
>>> seems unrelated to constraints. Besides, there are value size guardrails,
>>> i.e. columnValueSize and collectionSize, available in Cassandra already.
>>>
>>> On extensibility, I agree that the CEP should make it clear what
>>> constraints are included and how they work. My understanding is that it
>>> wants to have size check and value check, which are useful for most cases.
>>>
>>> - Yifan
>>>
>>> On Thu, Jun 6, 2024 at 9:25 AM Štefan Miklošovič <
>>> stefan.mikloso...@gmail.com> wrote:
>>>
>>>> Another problem with this constraints feature is that if it does not
>>>> solely rely on constraints in CQL, then it would be non-deterministic if we
>>>> want to replay all mutations from a fql log.
>>>>
>>>> Let's take this into consideration (T = time)
>>>>
>>>> T0 - a node is started with no guardrails set
>>>> T1 - guardrail is set via JMX to not allow anything bigger than size of
>>>> 10 (whatever size means)
>>>> T2 - a user creates a table with a constraint that anything bigger than
>>>> size of 8 is forbidden
>>>> T3 - a user inserts a mutation with size of 5
>>>> T4 - a user modifies a table to set the constraint in such a way that
>>>> anything bigger than size of 15 is forbidden - this will fail because we
>>>> have a guardrail that anything bigger than 10 is forbidden from T1.
>>>>
>>>> Then we gather FQL log and restart the node, as guardrails do not
>>>> survive restarts for now, when we replay, then T4 will be replayed too but
>>>> it should not be.
>>>>
>>>> Is this correct?
>>>>
>>>> On Thu, Jun 6, 2024 at 9:49 AM Štefan Miklošovič <
>>>> stefan.mikloso...@gmail.com> wrote:
>>>>
>>>>> I agree with Jon that a detailed description of all constraints to be
>>>>> introduced is necessary. Only to say that it will be extensible so we can
>>>>> add other constraints later is not enough. What other constraints?
>>>>>
>>>>> On Thu, Jun 6, 2024 at 6:24 AM Jon Haddad  wrote:
>>>>>
>>>>>> I think there's some promising ideas here, but the CEP needs to be
>>>>>> developed a bit more.
>>>>>>
>>>>>> > Other types of constraints and functions can be added in the
>>>>>> future to provide even more flexibility, but are out of the scope of this
>>>>>> CEP.
>>>>>>
>>>>>> > For the third point, I didn’t want to be prescriptive on what those
>>>>>> validations should be, but the fact that the proposal is extensi

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-12 Thread Jon Haddad
I think having JSON validation on existing text fields is a pretty
reasonable idea, regardless if we have a JSON type or not.  I could see
folks wanting to add a JSON constraint to an existing text field, for
example.

I like the idea of a postgres-style JSONB type, but I don't want to derail
this convo into a JSON one.  I'd be happy to see a JSONB added to Cassandra
along with all the functionality that is included in postgres, especially
searching / indexes on JSON fields, I think it should be its own CEP though.

DB Constraints vs Client side logic, I see both aspects here.  I've gone
back and forth over the years on what belongs in the DB vs not, and there's
good arguments to be made for both.  For example, supporting a regex
constraint on a field can be done, but from a cost and
scalability perspective it's way better to do it in the application logic.
However, putting a constraint in like this could make sense in some cases:

```
CREATE TABLE circles (
  key id primary key,
  radius double,
  diameter double,
  CONSTRAINT diameter = 2 * radius
)
```

which is also a (maybe contrived) example of an equality constraint.
There's a good argument to be made in this case that the constraint isn't
what we really need here - it's default values (`diameter double
default radius * 2`), and that's a whole read-before-write can of worms we
probably don't need to get into on this thread.

Jon




On Wed, Jun 12, 2024 at 8:46 AM Abe Ratnofsky  wrote:

> Hey Bernardo,
>
> Thanks for the proposal and putting together your summary of the
> discussion. A few thoughts:
>
> I'm not completely convinced of the value of CONSTRAINTS for a database
> like Cassandra, which doesn't support any referential integrity checks,
> doesn't do read-before-write for all queries, and doesn't have a wide
> library of built-in functions.
>
> I'd be a supporter of more BIFs, and that's a solvable problem. String
> size, collection size, timestamp conversions, etc. could all be useful,
> even though there's not much gained over doing them in the client.
>
> With constraints only being applied during write coordination, there's not
> much of an advantage over implementing the equivalent constraints in
> clients. Writes that don't include all columns could violate multi-column
> constraints, like your (a > b) example, for the same reason as
> CASSANDRA-19007 .
> Constraints could be limited to only apply to frozen columns, where it's
> known that the entire value will be updated at once.
>
> I don't think we should include any constraints where valid user action
> would lead to a violated constraint, like permitting multi-column
> constraints on regular columns or non-frozen types, since they would be too
> prone to mis-use.
>
> Regarding 19007, it could be useful to have a constraint that indicates
> that a subset of columns will always be updated together, since that would
> actually allow Cassandra to know which read queries are safe, and permit a
> fix for 19007 that minimizes the additional data replicas need to send to
> coordinators on ALLOW FILTERING queries. That's a very specific situation
> and shouldn't justify a new framework / API, but might be a useful
> consequence of it.
>
> > - isJson (is the text a json?)
>
> Wouldn't it be more compelling to have a new type, analogous to the
> Postgres JSONB type?
> https://www.postgresql.org/docs/current/datatype-json.html
>
> If we're going to parse the entire JSON blob for validation, we might as
> well store it in an optimized format, support better access patterns, etc.
>


Re: Suggestions for CASSANDRA-18078

2024-06-20 Thread Jon Haddad
Agreed. If we release it, we can’t remove it after. Option 2 is off the
table.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Thu, Jun 20, 2024 at 7:13 PM Jeff Jirsa  wrote:

> If we have a public-facing API that we’re contemplating releasing to the
> public, and we don’t think it’s needed, we should remove it before it’s
> launched and we’re stuck with it forever.
>
>
>
>
> On Jun 20, 2024, at 9:55 AM, Jeremiah Jordan 
> wrote:
>
> +1 from me for 1, just remove it now.
> I think this case is different from CASSANDRA-19556/CASSANDRA-17425.  The
> new guardrail from 19556 which would deprecate the 17425 has not been
> committed yet.  In the case of MAXWRITETIME the replacement is already in
> the code, we just didn’t remove MAXWRITETIME yet.
>
> Jeremiah Jordan
> e. jerem...@datastax.com
> w. www.datastax.com
>
>
>
> On Jun 20, 2024 at 11:46:08 AM, Štefan Miklošovič 
> wrote:
>
>> List,
>>
>> we need your opinions about CASSANDRA-18078.
>>
>> That ticket is about the removal of the MAXWRITETIME function which was added
>> in CASSANDRA-17425 and first introduced in 5.0-alpha1.
>>
>> This function was identified to be redundant in favor of CASSANDRA-8877
>> and CASSANDRA-18060.
>>
>> The idea of the removal was welcomed and the patch was prepared doing so
>> but it was never delivered and the question what to do with it, in
>> connection with 5.0.0, still remains.
>>
>> The options are:
>>
>> 1) since MAXWRITETIME was never released in GA, there is still time to remove it.
>> 2) it is too late for the removal hence we would keep it in 5.0.0 and we
>> would deprecate it in 5.0.1 and remove it in trunk.
>>
>> It is worth saying that there is a precedent for 2), in CASSANDRA-17495,
>> where it was the very same scenario. A guardrail was introduced in alpha1.
>> We decided to release and deprecate in 5.0.1 and remove in trunk. The same
>> might be applied here, however we would like to have it confirmed if this
>> is indeed the case or we prefer to just go with 1) and be done with it.
>>
>> Regards
>>
>
>


Re: Suggestions for CASSANDRA-18078

2024-06-21 Thread Jon Haddad
I’m on vacation, so I’ll keep this brief.  While its not the end of the
world, I think shipping a feature that’s immediately deprecated reflects
poorly on the project and our ability to manage it.

I don’t know how much work need to be done to merge that patch, so its hard
to say if we should wait for it or if we should ship 5.0 and make an
exception to add it in 5.0.1.  I’d prefer 5.0.1 but i won’t die on this
hill.

Jon


On Fri, Jun 21, 2024 at 11:35 AM Mick Semb Wever  wrote:

>
>
> On Fri, 21 Jun 2024 at 09:43, Sam Tunnicliffe  wrote:
>
>> 100% Option 1. Once it's out in GA release we're stuck with it so any
>> short term disruption to adopters of pre-release versions is a trivial
>> price to pay.
>
>
>
> Sam, Jeremiah, Jeff, Jon,
>
>  we need some clarity on this.
>
> To remove MAXWRITETIME (CASSANDRA-18078) we must now (as Yifan notes)
> first add CASSANDRA-18085.
>
> 18085 was slated for 5.x
> Are we really going to both a) remove an API that was already released in
> a beta, and b) add in a new improvement into an rc ?
>
> This is the only remaining issue blocking us from cutting a 5.0-rc1.
>
>
>


Re: [DISCUSS] Increments on non-existent rows in Accord

2024-06-21 Thread Jon Haddad
Seems to me that this should use the same behavior as a counter unless IF
EXISTS is supplied.

I can see a solid argument for failing though, if the argument is that only
counters behave that way, vs increment / decrement.
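
To make the two options concrete, here's a rough sketch against the
accord.accounts table from the example below. The IF EXISTS form is purely
illustrative - I'm not claiming that exact syntax is supported inside Accord
transactions today:

```
-- counter-like: the increment applies (and the row appears) even though
-- account_id 3 did not previously exist
BEGIN TRANSACTION
  UPDATE accord.accounts SET balance += 10
  WHERE partition = 'default' AND account_id = 3;
COMMIT TRANSACTION

-- guarded: the increment only applies when the row already exists,
-- otherwise it's a no-op (or an error, per Josh's suggestion)
BEGIN TRANSACTION
  UPDATE accord.accounts SET balance += 10
  WHERE partition = 'default' AND account_id = 3 IF EXISTS;
COMMIT TRANSACTION
```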



On Fri, Jun 21, 2024 at 4:32 PM Josh McKenzie  wrote:

> What about aborting the transaction / raising an exception if you try and
> do an incremental operator against a non-existent PK?
>
> The reasonable inferred intent of the user is to change a value that's
> expected to be there, so if it's not there it's an error case right?
> Otherwise you'd append "IF EXISTS".
>
> On Fri, Jun 21, 2024, at 1:56 AM, Caleb Rackliffe wrote:
>
> It does, but the primary reason it does is that it is setting a value, not
> incrementing one. When we’re setting a value, we don’t care what was there
> before. Incrementing a value is not possible in a non-transactional update,
> hence this thread…
>
> On Jun 20, 2024, at 5:17 PM, Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
> Doesn’t an UPDATE statement create a row if the partition key does not
> exist? That’s also confirmed by the official Cassandra documentation here
> 
> :
>
> ”Unlike in SQL, UPDATE does not check the prior existence of the row by
> default. The row is created if none existed before, and updated otherwise.
> Furthermore, there is no means of knowing which action occurred.”
>
> That being the case, I think the second option you mention is what keeps
> consistency with the UPDATEs out of the transaction.
>
> Kind regards,
> Bernardo
>
> On Jun 20, 2024, at 1:54 PM, Caleb Rackliffe 
> wrote:
>
> We had a bug report a while back from Luis E Fernandez and team in
> CASSANDRA-18988 
> around the behavior of increments/decrements on numeric fields for
> non-existent rows. Consider the following, wich can be run on the
> cep-15-accord branch:
>
> CREATE KEYSPACE accord WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'} AND durable_writes = true
>
>
> CREATE TABLE accord.accounts (
> partition text,
> account_id int,
> balance int,
> PRIMARY KEY (partition, account_id)
> ) WITH CLUSTERING ORDER BY (account_id ASC) AND transactional_mode='full'
>
>
> BEGIN TRANSACTION
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 0, 100);
> INSERT INTO accord.accounts (partition, account_id, balance) VALUES 
> ('default', 1, 100);
> COMMIT TRANSACTION
>
>
> BEGIN TRANSACTION
> UPDATE accord.accounts SET balance -= 10 WHERE partition = 'default' AND 
> account_id = 1;
> UPDATE accord.accounts SET balance += 10 WHERE partition = 'default' AND 
> account_id = 3;
> COMMIT TRANSACTION
>
>
> Reading the 'default' partition will produce the following result.
>
>
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
>
>
> As you will notice, we have not implicitly inserted a row for account_id 3, 
> which does not exist when we request that its balance be incremented by 10. 
> This is by design, as null + 10 == null.
>
>
> Before I close CASSANDRA-18988 
> , *I'd like to confirm 
> with everyone reading this that the behavior above is reasonable*. The only 
> other option I've seen proposed that would make sense is perhaps producing a 
> result like:
>
>
>  partition | account_id | balance
> -----------+------------+---------
>    default |          0 |     100
>    default |          1 |      90
>    default |          3 |    null
>
>
> Note however that this is exactly what we would produce if we had first 
> inserted a row w/ no value for balance:
>
>
> INSERT INTO accord.accounts (partition, account_id) VALUES ('default', 3);
>
>
>


Re: [DISCUSS] spark-cassandra-connector donation to Analytics subproject

2024-06-24 Thread Jon Haddad
I also think it would be a great contribution, especially since the bulk
analytics library can’t be used by the majority of teams, since it’s hard
coded to only work with single token clusters.



On Mon, Jun 24, 2024 at 9:51 AM Dinesh Joshi  wrote:

> This would be a great contribution to have for the Analytics subproject.
> The current bulk functionality in the Analytics subproject complements the
> spark-cassandra-connector so I see it as a good fit for donation.
>
> On Mon, Jun 24, 2024 at 12:32 AM Mick Semb Wever  wrote:
>
>>
>> What are folks thoughts on accepting a donation of
>> the spark-cassandra-connector project into the Analytics subproject ?
>>
>> A number of folks have requested this, stating that they cannot
>> contribute to the project while it is under DataStax.  The project has
>> largely been in maintenance mode the past few years.  Under ASF I believe
>> that it will attract more attention and contributions, and offline
>> discussions I have had indicate that the spark-cassandra-connector remains
>> an important complement to the bulk analytics component.
>>
>


Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Jon Haddad
I love where this is going. I have one question , however. I think it would
be more consistent if these were table level guardrails.  Is there anything
that prevents us from utilizing the same underlying system and terminology
for both the node level guardrails and the table ones?

If we can avoid duplicate concepts we should.

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Mon, Jun 24, 2024 at 4:19 PM Doug Rohrer  wrote:

> To your point about Guardrails vs. Constraints, I do think the distinct
> roles of “cluster operator” and “application developer” help show how these
> two frameworks are both valuable. I don’t think I’d expect a cluster
> operator to be involved in every table design decision, but being able to
> set warning and error-level guardrails allows an operator to set absolute
> limits on what the database itself accepts. Table-level constraints allow
> application developers (hopefully in concert with operators, where they are
> two distinct people/groups) to add *additional*, application-layer
> constraints that are likely to be app specific. To restate what I think you
> were getting at, your example of a production issue caused by the
> development team missing a key verbal agreement probably helps illustrate
> why both table-level constraints *and* guardrails are valuable.
>
> Imagine that, as an operator, you are *generally* comfortable with
> individual values in rows being, say, 256k, but because of the way in which
> this *particular* use case works, 64k chunks needed to be enforced. Your
> cluster-level *guardrails* could be set at 256k, but the table-level
> *constraints* could have enforced this 64k chunk size rule.
>
> Doug
>
> On Jun 23, 2024, at 5:38 PM, Jordan West  wrote:
>
> I am generally for this CEP, particularly the sizeOf guardrail. For
> example, we recently had an incident caused by a client who wrote outside
> of the contract we had verbally established. The constraint would have let
> us encode that contract into the database. In this case, clients are
> writing large blobs at the application layer and internally the client
> performs chunking.  We had established a chunk size of 64k, for example.
> However, the application team wanted to use a different programming
> language than the ones we provide clients for so they wrote their own. The
> new client had a bug that did not honor the agreed upon chunk size and
> wrote chunks that were MBs in size. This eventually led to a production
> incident and the issue was discovered as a result of a bunch of analysis
> (dumping sstables, etc). Had we had the sizeOf guardrail it would have
> turned a production incident with hours of investigation into a bug found
> immediately during development. Could this be done with a node-level
> guardrail? Likely. But config has the issues described above and its
> possible to have two tables with different constraints around similar
> fields (for example, two different chunk size configs due to data shape).
> Could it be done at the client layer? Yes that's what we are doing now, but
> this incident highlights the weakness with that approach (having to
> implement the contract everywhere and having disjoint features across
> clients).
>
> I also think there is benefit to application owners. Encoding constraints
> in the database ensures continuity as ownership and contributors change and
> reduces the need for comments or documentation as the means to enforce or
> share this knowledge.
>
> I think enforcing them at write time makes sense. Thinking about it in the
> scope of compaction for example reminds me of a data loss incident where
> someone ran a validation in an older version (like 2.0 or 2.1) and a bunch
> of 4 byte ints were thrown away because the field expected an 8 byte long.
>
> My primary concern would be ensuring that we don't implement constraints
> that require a read before write (not inList comes to mind as an example of
> one that could imply reading before writing and could confuse a user if it
> doesn't).
>
> Regarding the conflict with existing guardrails, I do think that is
> tougher. On one hand I find this feature to be more evolved than those
> guardrails and would be fine to see them be replaced by it. On the other,
> the guardrails provide sole control to the operator which is nice but adds
> some complexity that has been rightly called out.  But I don't see that as
> a reason not to go forward with this feature. We should pick a path and
> accept the tradeoffs.
>
> Jordan
>
>
> On Thu, Jun 13, 2024 at 2:39 PM Bernardo Botella <
> conta...@bernardobotella.com> wrote:
>
>> Thanks a lot for your comments Abe!
>>
>> I do agree that the Constraint clause should be as simpl

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-24 Thread Jon Haddad
I think my suggestion was unclear. I was referring to the name guardrail,
using the same infra as guardrails, rather than a separate concept. Not
applying it like we do table options.



On Tue, Jun 25, 2024 at 12:44 AM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> Hi Ariel and Jon,
>
> Let me address your question first. Yes, AND is supported in the proposal.
> Below you can find some examples of different constraints applied to the
> same column.
>
> As for the LENGTH name instead of sizeOf as in the proposal, I am also not
> opposed to it if it is more consistent with terminology in the databases
> universe.
>
> So, to recap, there seems to be general agreement on the usefulness of the
> Constraints Framework.
> Now, from the feedback that has arrived after the voting has been called,
> I see there are three different proposals for syntax:
>
> 1.-
> The syntax currently described in the CEP. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT subnet_mask > 0,
>   CONSTRAINT subnet_mask < 32
> )
>
> 2.-
> As Jon suggested, leaving this definitions to more specific Guardrails at
> table level. Example, something like:
> column_min_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 0
> column_max_int_value_size_threshold_keyspace_address_ipv4_ip_adress = 32
>
> 3.-
> As Ariel suggested, having the CHECK keyword added to align consistency
> with SQL. Example:
> CREATE TYPE keyspace.cidr_address_ipv4 (
>   ip_adress inet,
>   subnet_mask int,
>   CONSTRAINT CHECK subnet_mask > 0,
>   CONSTRAINT CHECK subnet_mask < 32
> )
>
> For the guardrails vs cql syntax, I think that keeping the conceptual
> separation that has been explored in this thread, and perfectly recapped by
> Doug, is closer to what we are trying to achieve with this framework. In my
> opinion, having them in the CQL schema definition provides those
> application level constraints that Doug mentions in a more accessible way
> than having to configure such specific guardrails.
>
> For the addition of the CHECK keyword, I'm definitely not opposed to it if
> it helps Cassandra users coming from other databases understand concepts
> that were already familiar to them.
>
> I hope this helps move the conversation forward,
> Bernardo
>
>
>
> On Jun 24, 2024, at 12:17 PM, Ariel Weisberg  wrote:
>
> Hi,
>
> I see a vote for this has been called. I should have provided this
> feedback sooner.
>
> I am a strong +1 on adding column level constraints being a good thing to
> add. I'm not too concerned about row/partition/table level constraints, but
> I would like to change the syntax before I would be +1 on this CEP.
>
> It would be good to align the syntax as closely as possible to our
> existing syntax, and if not that then MySQL/Postgres. For example it looks
> like we don't have a string length function so maybe add `LENGTH`
> (consistent with MySQL/Postgres) to also use with column level constraints.
>
> It looks like there are generally two forms of constraint syntax, one is
> expressed as part of the column definition, and the other is a named or
> anonymous constraint on the table.
> https://www.w3schools.com/sql/sql_check.asp
>
> Can we align with having these column level ones as `CHECK` constraints
> like in SQL, and `CONSTRAINT [constraint_name] CHECK` would be used if
> creating a named or multi-column constraint?
>
> Will column level check constraints support `AND` so that you can specify
> multiple constraints on the column? I am not sure if that is supported in
> other databases, but it would be good to align on that as well.
>
> RE some implementation things to keep in mind:
>
> If TCM is in use and the constraints are defined in the schema data
> structure this should work fine with Accord because all coordinators
> (regular, recovery) will deterministically agree on the constraints being
> enforced BUT... this also has to map to how/when constraints are enforced.
>
> Both Accord and Paxos work best when the constraints are enforced when the
> final mutation to be applied is created and not later when it is being
> applied to the CFS. This also reduces duplication of enforcement checking
> work to just the coordinator for the write.
>
> Ariel
>
> On Fri, May 31, 2024, at 5:23 PM, Bernardo Botella wrote:
>
> Hello everyone,
>
> I am proposing this CEP:
> CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
> 
> cwiki.apache.org
> 
> 
> 
>
>
> And I’m looking for feedback from the community.
>
> Thanks a lot!
> Bernardo
>
>
>


Re: [VOTE] Release Apache Cassandra 5.0-rc1

2024-06-25 Thread Jon Haddad
5.0 is a massive milestone.  A huge thank you to everyone that's invested
their time into the release.  I've done a lot of testing, benchmarking, and
tire kicking and it's truly mind blowing how much has gone into 5.0 and how
great it is for the community.

I am a bit concerned that CASSANDRA-19668, which I found in 4.1, will also
affect 5.0.  This is a pretty serious bug, where using Paxos v2 + off heap
memtables can cause a SIGSEGV process crash.  I've seen this happen about a
dozen times with a client over the last 3 months.  Since the new trie
memtables rely on off heap, and both Trie memtables & Paxos V2 is so
compelling (esp for multi-dc users), I think there's a good chance that
we'll be making an already bad problem even worse, for folks that use LWT.

Unfortunately, until next week I'm unable to put any time into this; I'm on
vacation with my family.  I wish I had been able to confirm and raise this
issue as a 5.0 blocker sooner, but I've deliberately tried to keep work
stuff out of my mind.   Since I'm not 100% sure if this affects 5.0, I'm
not blocking the RC, but I don't feel comfortable putting a +1 on a release
that I'm at least 80% certain contains a process-crashing bug.

I have a simple 4.1 patch in CASSANDRA-19668, but I haven't landed a commit
in several years and I have zero recollection of the entire process of
getting it in, nor have I spent any time writing unit or dtests in the C*
repo.  I ran a test of 160MM LWTs over several hours with my 4.1 branch and
didn't hit any issues, but my client ran for weeks without hitting it so
it's hard to say if I've actually addressed the problem, as it's a rare
race condition.  Fwiw, I don't need to be the one to
handle CASSANDRA-19668, so if someone wants to address it before me, please
feel free.  It will likely take me a lot longer to deal with than someone
more involved with the process, and I'd want 2 sets of eyes on it anyways
given what I already mentioned previously about committing and testing.

Jon


On Tue, Jun 25, 2024 at 2:53 PM Mick Semb Wever  wrote:

>
>
> .
>
> Proposing the test build of Cassandra 5.0-rc1 for release.
>>
>> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
>> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
>> Maven Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
>>
>
>
> The three green CI runs for this are
> -
> https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-2
> -
> https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-3
> -
> https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-4
>
>
>


Re: [VOTE][IP CLEARANCE] GoCQL driver

2024-06-25 Thread Jon Haddad
+1.

On Wed, Jun 26, 2024 at 1:50 AM J. D. Jordan 
wrote:

> +1 nb. Good to see this heavily used driver get continued development in
> the project.
>
> > On Jun 25, 2024, at 5:29 PM, Michael Shuler 
> wrote:
> >
> > +1
> >
> > Kind regards,
> > Michael
> >
> >> On 6/25/24 12:29, Mick Semb Wever wrote:
> >> Please vote on the acceptance of the GoCQL driver and its IP Clearance:
> >> https://incubator.apache.org/ip-clearance/cassandra-gocql-driver.html <
> https://incubator.apache.org/ip-clearance/cassandra-gocql-driver.html>
> >> All consent from original authors of the donation, and tracking of
> collected CLAs, is found in:
> >>  - https://github.com/gocql/gocql/issues/1751 <
> https://github.com/gocql/gocql/issues/1751>
> >>  -
> https://cwiki.apache.org/confluence/pages/worddav/preview.action?fileName=GoCQL+ASF+CLA+collection.xlsx&pageId=225152485
> <
> https://cwiki.apache.org/confluence/pages/worddav/preview.action?fileName=GoCQL+ASF+CLA+collection.xlsx&pageId=225152485
> >
> >> These do not require acknowledgement before the vote.
> >> The code is prepared for donation at https://github.com/gocql/gocql <
> https://github.com/gocql/gocql>
> >> Once this vote passes we will request ASF Infra to move the gocql/gocql
> as-is to apache/cassandra-gocql-driver  . The master branch and tags, with
> all their history, will be kept.  Because consent and CLAs were not
> received from all original authors the source files keep additional
> reference to their earlier copyright authors and license.
> >> This will become part of the Drivers subproject, ref CEP-8:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> <
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
> >
> >> PMC members, please check carefully the IP Clearance requirements
> before voting.
> >> The vote will be open for 72 hours (or longer). Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s and
> no -1's.
> >> regards,
> >> Mick
>


Re: [VOTE] Release Apache Cassandra 5.0-rc1

2024-06-27 Thread Jon Haddad
For those that want to go ahead, how do you plan to disclose to the community
that there’s a serious risk to availability?

Jon


On Thu, Jun 27, 2024 at 7:52 PM Jeremy Hanna 
wrote:

> It definitely looks like a good thing to investigate and fix.  However,
> it's not a regression and not new in 5.0.  I think we should push forward
> with 5.0 and fix/release it separately in a 4.1.x and 5.0.x release.
>
> > On Jun 27, 2024, at 12:46 PM, Brandon Williams  wrote:
> >
> > I don't know that we expect to fix anything if we don't know it is
> > affected yet. ¯\_(ツ)_/¯
> >
> > Kind Regards,
> > Brandon
> >
> > On Thu, Jun 27, 2024 at 12:37 PM Aleksey Yeshchenko 
> wrote:
> >>
> >> Not voting on this, however, if we expect to fix something specific
> between an RC and GA, then we shouldn’t be starting a vote on RC. In that
> case it should be another beta.
> >>
> >>> On 27 Jun 2024, at 18:30, Brandon Williams  wrote:
> >>>
> >>> The last time paxos v2 blocked us in CASSANDRA-19617 which also
> >>> affected 4.1, I didn't get a sense of strong usage from the community,
> >>> so I agree that RC shouldn't be blocked but this can get fixed before
> >>> GA.  +1 from me.
> >>>
> >>> Kind Regards,
> >>> Brandon
> >>>
> >>> On Tue, Jun 25, 2024 at 11:11 PM Jon Haddad  wrote:
> >>>>
> >>>> 5.0 is a massive milestone.  A huge thank you to everyone that's
> invested their time into the release.  I've done a lot of testing,
> benchmarking, and tire kicking and it's truly mind blowing how much has
> gone into 5.0 and how great it is for the community.
> >>>>
> >>>> I am a bit concerned that CASSANDRA-19668, which I found in 4.1, will
> also affect 5.0.  This is a pretty serious bug, where using Paxos v2 + off
> heap memtables can cause a SIGSEGV process crash.  I've seen this happen
> about a dozen times with a client over the last 3 months.  Since the new
> trie memtables rely on off heap, and both Trie memtables & Paxos V2 is so
> compelling (esp for multi-dc users), I think there's a good chance that
> we'll be making an already bad problem even worse, for folks that use LWT.
> >>>>
> >>>> Unfortunately, until next week I'm unable to put any time into this;
> I'm on vacation with my family.  I wish I had been able to confirm and
> raise this issue as a 5.0 blocker sooner, but I've deliberately tried to
> keep work stuff out of my mind.   Since I'm not 100% sure if this affects
> 5.0, I'm not blocking the RC, but I don't feel comfortable putting a +1 on
> a release that I'm at least 80% certain contains a process-crashing bug.
> >>>>
> >>>> I have a simple 4.1 patch in CASSANDRA-19668, but I haven't landed a
> commit in several years and I have zero recollection of the entire process
> of getting it in, nor have I spent any time writing unit or dtests in the
> C* repo.  I ran a test of 160MM LWTs over several hours with my 4.1 branch
> and didn't hit any issues, but my client ran for weeks without hitting it
> so it's hard to say if I've actually addressed the problem, as it's a rare
> race condition.  Fwiw, I don't need to be the one to handle
> CASSANDRA-19668, so if someone wants to address it before me, please feel
> free.  It will likely take me a lot longer to deal with than someone more
> involved with the process, and I'd want 2 sets of eyes on it anyways given
> what I already mentioned previously about committing and testing.
> >>>>
> >>>> Jon
> >>>>
> >>>>
> >>>> On Tue, Jun 25, 2024 at 2:53 PM Mick Semb Wever 
> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>>>> Proposing the test build of Cassandra 5.0-rc1 for release.
> >>>>>>
> >>>>>> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
> >>>>>> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
> >>>>>> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
> >>>>>
> >>>>>
> >>>>>
> >>>>> The three green CI runs for this are
> >>>>> -
> https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-2
> >>>>> -
> https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-3
> >>>>> -
> https://app.circleci.com/pipelines/github/driftx/cassandra?branch=5.0-rc1-4
> >>>>>
> >>>>>
> >>
>
>


Re: [VOTE] Release Apache Cassandra 5.0-rc1

2024-06-27 Thread Jon Haddad
Thanks for confirming this, Blake. I agree that we should not knowingly
ship new versions with severe bugs that cause the DB to crash, regression
or not.

-1 from me as well


On Fri, Jun 28, 2024 at 1:39 AM Blake Eggleston 
wrote:

> Looking at the ticket, I’d say Jon’s concern is legitimate. The segfaults
> Jon is seeing are probably caused by paxos V2 when combined with off heap
> memtables for the reason Benedict suggests in the JIRA. This problem will
> continue to exist in 5.0. Unfortunately, it looks like the patch posted is
> not enough to address the issue and will need to be a bit more involved to
> properly fix the problem.
>
> While this is not a regression, I think Jon’s point about trie memtables
> increasing usage of off heap memtables is a good one, and anyway we
> shouldn’t be doing major releases with known process crashing bugs.
>
> So I’m voting -1 on this release and will work with Jon and Benedict to
> get this fixed.
>
> Thanks,
>
> Blake
>
>
> On Jun 26, 2024, at 6:47 AM, Josh McKenzie  wrote:
>
> Blake or Benedict - can either of you speak to Jon's concerns around
> CASSANDRA-19668?
>
> On Wed, Jun 26, 2024, at 12:18 AM, Jeff Jirsa wrote:
>
>
> +1
>
>
>
> On Jun 25, 2024, at 5:04 AM, Mick Semb Wever  wrote:
>
> 
>
> Proposing the test build of Cassandra 5.0-rc1 for release.
>
> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/5.0-rc1/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/CHANGES.txt
> [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/NEWS.txt
>
>
>


Re: [VOTE] Release Apache Cassandra 5.0-rc1

2024-06-30 Thread Jon Haddad
This came in after our vote, but we might also have a problem with
performing schema changes after a full restart.  Appears to only be if the
entire cluster was shut down, according to the report.  If it's true, this
might affect anyone trying to restore from a backup.  This would also be a
blocker for me, if that's the case.

https://issues.apache.org/jira/browse/CASSANDRA-19735

Jon


On Thu, Jun 27, 2024 at 9:49 PM Jon Haddad  wrote:

> Thanks for confirming this, Blake. I agree that we should not knowingly
> ship new versions with severe bugs that cause the DB to crash, regression
> or not.
>
> -1 from me as well
>
>
> On Fri, Jun 28, 2024 at 1:39 AM Blake Eggleston 
> wrote:
>
>> Looking at the ticket, I’d say Jon’s concern is legitimate. The segfaults
>> Jon is seeing are probably caused by paxos V2 when combined with off heap
>> memtables for the reason Benedict suggests in the JIRA. This problem will
>> continue to exist in 5.0. Unfortunately, it looks like the patch posted is
>> not enough to address the issue and will need to be a bit more involved to
>> properly fix the problem.
>>
>> While this is not a regression, I think Jon’s point about trie memtables
>> increasing usage of off heap memtables is a good one, and anyway we
>> shouldn’t be doing major releases with known process crashing bugs.
>>
>> So I’m voting -1 on this release and will work with Jon and Benedict to
>> get this fixed.
>>
>> Thanks,
>>
>> Blake
>>
>>
>> On Jun 26, 2024, at 6:47 AM, Josh McKenzie  wrote:
>>
>> Blake or Benedict - can either of you speak to Jon's concerns around
>> CASSANDRA-19668?
>>
>> On Wed, Jun 26, 2024, at 12:18 AM, Jeff Jirsa wrote:
>>
>>
>> +1
>>
>>
>>
>> On Jun 25, 2024, at 5:04 AM, Mick Semb Wever  wrote:
>>
>> 
>>
>> Proposing the test build of Cassandra 5.0-rc1 for release.
>>
>> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
>> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
>> Maven Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
>>
>> The Source and Build Artifacts, and the Debian and RPM packages and
>> repositories, are available here:
>> https://dist.apache.org/repos/dist/dev/cassandra/5.0-rc1/
>>
>> The vote will be open for 72 hours (longer if needed). Everyone who has
>> tested the build is invited to vote. Votes by PMC members are considered
>> binding. A vote passes if there are at least three binding +1s and no -1's.
>>
>> [1]: CHANGES.txt:
>> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/CHANGES.txt
>> [2]: NEWS.txt:
>> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/NEWS.txt
>>
>>
>>


Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-01 Thread Jon Haddad
> I personally don't have anything against what you suggested, however I
think that this kind of work will put additional stress on us being sure
that the output of the commands will be exactly as it is now. We do have
nodetool tests which are covering the tests for the output which is very
handy in this kind of situation, but I think we do not test all of them. It
would be great to increase our test coverage where possible in this area
and I think it is actually going to be a requirement as only then we will
be sure that old and new code produces the same output.

I'm not familiar with picocli, only jcommander, however I don't think
keeping the output consistent should be a problem as long as it's a similar
programming model.

I also don't think keeping the output consistent needs to be a strict
long-term requirement.  We *should* have either a JSON output option for every
command, or a virtual table for structured data.  I don't remember us ever
making a promise that human readable tools would stay consistent across
major versions.
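
For reference, here's roughly what the picocli programming model looks like.
The command and option names below are invented for illustration; this isn't
code from the prototype branch:

```java
import picocli.CommandLine;
import picocli.CommandLine.Command;
import picocli.CommandLine.Option;

// Hypothetical nodetool-style command, annotated the picocli way.
@Command(name = "compactionstats", description = "Print statistics on compactions")
public class CompactionStats implements Runnable
{
    // JMX connection options could live in a shared mixin so the same command
    // can be reused by other APIs without them.
    @Option(names = { "-h", "--host" }, description = "Node hostname or ip address")
    private String host = "127.0.0.1";

    @Option(names = { "-p", "--port" }, description = "Remote jmx agent port number")
    private int port = 7199;

    @Option(names = { "-H", "--human-readable" }, description = "Display bytes in human readable form")
    private boolean humanReadable = false;

    @Override
    public void run()
    {
        // A real implementation would connect over JMX and print the stats;
        // this stub only echoes the parsed options.
        System.out.printf("connecting to %s:%d (human readable: %b)%n", host, port, humanReadable);
    }

    public static void main(String[] args)
    {
        System.exit(new CommandLine(new CompactionStats()).execute(args));
    }
}
```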


On Mon, Jul 1, 2024 at 6:44 AM Bernardo Botella <
conta...@bernardobotella.com> wrote:

> +1 on the feature branch allowing breaking the effort into smaller chunks
> that can be even worked in parallel.
>
>
>
> On Jul 1, 2024, at 3:13 AM, Štefan Miklošovič 
> wrote:
>
> Hi Maxim,
>
> thank you for doing this. I think that Picocli is a great choice,
> comparing it with airline v2 which is an attempt to resurrect the original
> airline, it seems to be way more active and popular.
>
> I personally don't have anything against what you suggested, however I
> think that this kind of work will put additional stress on us being sure
> that the output of the commands will be exactly as it is now. We do have
> nodetool tests which are covering the tests for the output which is very
> handy in this kind of situation, but I think we do not test all of them. It
> would be great to increase our test coverage where possible in this area
> and I think it is actually going to be a requirement as only then we will
> be sure that old and new code produces the same output.
>
> I think it is too soon to contemplate when we switch to this, we just need
> to be sure that it is the same so existing integrations will not be broken.
>
> Regards
>
> On Fri, Jun 28, 2024 at 3:48 PM Maxim Muzafarov  wrote:
>
>> Hello everyone,
>>
>>
>> The nodetool relies on the airlift/airline library to mark up the CLI
>> commands used to manage Cassandra, which are part of our public API.
>> This library is no longer maintained, so we need to update it anyway,
>> and the good news is that we already have several good alternatives:
>> airline-2 [3] or picocli [2].
>>
>> In this message, I'm mainly talking about CASSANDRA-17445 [4], which
>> refers to the problem and is a prerequisite for a larger CEP-38 CQL
>> Management API [5]. It doesn't make sense to use annotations from the
>> deprecated library to build a new API, so this is another reason to
>> update the library as soon as possible and do some inherently small
>> code refactoring required for the CEP-38.
>>
>> In addition to being widely used and well supported, the Picocli
>> library offers the following advantages for us:
>> - We can detach the jmx-specific parameters from the commands so that
>> they can be reused in other APIs (e.g. without host, port) while
>> remaining backwards compatible;
>> - We can set up nodetool's autocompletion after the migration with
>> minimal effort;
>> - There is a good Picocli ecosystem of tools that we can use to
>> simplify our codebase, e.g. the man page generation tool to make our CLIs
>> more Unix friendly [7];
>>
>>
>> = Prototype =
>>
>> I have a working prototype [8] that shows what the result will look
>> like. The prototype includes:
>> - Tests comparing the execution of commands via the nodetool and nodetoolv2;
>> - 5 out of 164 nodetool commands have been moved so far, to show the
>> refactoring we need to do to the command's body;
>> - The command help output for the nodetoolv2 is the same as it
>> is currently for the nodetool and this is the default, however a
>> "cassandra.cli.picocli.layout" property is added to switch to the Picocli
>> defaults;
>> - You can also see that the colour scheme is applied by the Picocli
>> out of the box, and this is how it looks [9];
>> - The nodetoolv2 is called first when the shell is triggered, and if
>> the nodetoolv2 doesn't contain the command it needs yet, it falls back
>> to the nodetool and the old argument parser;
>>
>>
>> Since the number of commands is quite large (164), I'd like to create
>> a feature branch and move all the commands one at a time, while
>> keeping the output backwards compatible by applying additional tests at the same
>> time and checking that the CI is always green. I think the "feature
>> branch" approach will be less stressful for us since it focuses on
>> requiring a review of only tedious changes to the feature branch,
>> rather than reviewing the 15k line patch.

Re: [VOTE] CEP-42: Constraints Framework

2024-07-02 Thread Jon Haddad
+1

On Tue, Jul 2, 2024 at 5:06 AM  wrote:

> +1
>
>
> On Jul 1, 2024, at 8:34 PM, Doug Rohrer  wrote:
>
> +1 (nb) - Thanks for all of the suggestions and Bernardo for wrangling the
> CEP into shape!
>
> Doug
>
> On Jul 1, 2024, at 3:06 PM, Dinesh Joshi  wrote:
>
> +1
>
> On Mon, Jul 1, 2024 at 11:58 AM Ariel Weisberg  wrote:
>
>> Hi,
>>
>> I am +1 on CEP-42 with the latest updates to the CEP to clarify syntax,
>> error messages, constraint naming and generated naming, alter/drop,
>> describe etc.
>>
>> I think this now tracks very closely to how other SQL databases define
>> constraints and the syntax is easily extensible to multi-column and
>> multi-table constraints.
>>
>> Ariel
>>
>> On Mon, Jul 1, 2024, at 9:48 AM, Bernardo Botella wrote:
>>
>> With all the feedback that came in the discussion thread after the call
>> for votes, I’d like to extend the period another 72 hours starting today.
>>
>> As before, a vote passes if there are at least 3 binding +1s and no
>> binding vetoes.
>>
>> Thanks,
>> Bernardo Botella
>>
>> On Jun 24, 2024, at 7:17 AM, Bernardo Botella <
>> conta...@bernardobotella.com> wrote:
>>
>> Hi everyone,
>>
>> I would like to start the voting for CEP-42.
>>
>> Proposal:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework
>> Discussion:
>> https://lists.apache.org/thread/xc2phmxgsc7t3y9b23079vbflrhyyywj
>>
>> The vote will be open for 72 hours. A vote passes if there are at least 3
>> binding +1s and no binding vetoes.
>>
>> Thanks,
>> Bernardo Botella
>>
>>
>>
>
>


Re: [VOTE] Release Apache Cassandra 5.0-rc1 (take2)

2024-07-03 Thread Jon Haddad
+1

Thanks Mick!

On Tue, Jul 2, 2024 at 4:20 AM Mick Semb Wever  wrote:

>
> Proposing the test build of Cassandra 5.0-rc1 for release.
>
> sha1: 01eea8a0d74deaede236edb25335fa470502106e
> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1337/org/apache/cassandra/cassandra-all/5.0-rc1/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/5.0-rc1/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/CHANGES.txt
> [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/NEWS.txt
>


Re: [DISCUSS] Feature branch to update a nodetool obsolete dependency (airline)

2024-07-08 Thread Jon Haddad
Without getting into the pros and cons of both libraries, I have to point
out there's something unsettling about making decisions about the libraries we
use based on arbitrary rules an employer has placed on its
employees.  The project isn't governed by Apple, it's governed by
individual contributors to open source.

We need to pick libraries based on their merits.  Apple's draconian rules
should not prevent us from using the best option available.

Jon


On Mon, Jul 8, 2024 at 12:57 PM Dinesh Joshi  wrote:

> I agree, having a DISCUSS thread with a specific subject line is less
> likely to be overlooked.
>
> One thing I'd like to note here is PicoCLI and Airline 2 are independent
> projects that are ALv2 licensed. A subset of the Cassandra contributors may
> have difficulty contributing to such projects due to preexisting policies
> that their employers may have in place.
>
> I am concerned about hostile licensing changes in the future which will
> necessitate another migration for us. That said, is there a specific reason
> to not consider Apache Commons CLI[1]?
>
> Dinesh
>
> [1] https://commons.apache.org/proper/commons-cli/
>
> On Mon, Jul 8, 2024 at 10:22 AM David Capwell  wrote:
>
>> I don't think that a separate thread would add extra visibility
>>
>>
>> Disagree.  This thread is about adding a feature branch, so many could
>> ignore if they don’t care.  The fact you are switching the library (and
>> which one) is something we have to hunt for.  By having a new DISCUSS
>> thread it makes it clear which library you wish to add, and people can sign
>> off if they care or not.
>>
>> I wouldn’t create this thread until you settle on which one you wish to
>> move forward with.
>>
>> Is adding the PicoCLI library as a project dependency getting any objections
>> from the Community?
>>
>>
>> Thats the point of the new DISCUSS thread.  By being very clear you wish
>> to add PicoCLI people can either validate we are allowed to, or raise any
>> objections.  I have not really seen any pushback so far outside of 1 case
>> that wasn’t legally allowed to be used.
>>
>> Take a look at previous threads about adding different libraries.
>>
>> On Jul 8, 2024, at 7:58 AM, Caleb Rackliffe 
>> wrote:
>>
>> +1 on picocli
>>
>> RE the feature branch, I would just maintain the feature branch in your
>> own fork to break out whatever "reviewable units" of code you want. When
>> all the incremental review is done (I have no problem going back and
>> forth), squash everything together, do whatever additional testing you
>> need, and commit.
>>
>> On Fri, Jul 5, 2024 at 10:40 AM Maxim Muzafarov 
>> wrote:
>>
>>> > Once you are happy with your chosen library, we need a DISCUSS thread
>>> to add this new library (current protocol).
>>>
>>> Thanks, David. This is a good point, do we need a separate DISCUSS
>>> thread or can we just use this one? I'm in favour of keeping the
>>> discussion in one place, especially when topics are closely related. I
>>> don't think that a separate thread would add extra visibility, but if
>>> that is the way the community has adopted - no problem at all, I'll
>>> repost.
>>>
>>>
>>> The reasons for replacing the Airlift/Airline [1] with the PicoCli [2]
>>> are as follows (in order of priority):
>>>
>>> 1. The library is under the Apache-2.0 License
>>> https://github.com/remkop/picocli?tab=Apache-2.0-1-ov-file#readme
>>>
>>> 2. The project is active and well-maintained (last release on 8 May 2024)
>>> https://github.com/remkop/picocli/releases
>>>
>>> 3. The library has ZERO dependencies, in some of the cases a single
>>> file can just be dropped into the sources (it's even pointed out in
>>> the documentation)
>>> https://picocli.info/#_add_as_source
>>>
>>> 4. Compared to the Airlift library, the PicoCLI uses the same markup
>>> design concepts, so we don't have to rewrite our commands or make
>>> complex changes, which in turn minimizes the migration.
>>>
>>>
>>> Is adding the PicoCLI library as a project dependency getting any
>>> objections from the Community? Please, share your thoughts.
>>>
>>> There are a few other alternatives (commons-cli, airline2, jcommander)
>>> but they are not as well known and/or not as elegantly suited to our
>>> needs based on what we have now.
>>>
>>>
>>> [1] https://github.com/airlift/airline
>>> [2] https://github.com/remkop/picocli
>>>
>>>
>>> On Wed, 3 Jul 2024 at 22:27, David Capwell  wrote:
>>> >
>>> > I don't personally think there is a strong need for a feature branch.
>>> If it makes it easy for you, please go ahead with a feature branch.
>>> >
>>> >
>>> > Agree, I don’t see the reason for a feature branch… feature branch
>>> just means the branch lives in apache domain rather than your own fork.
>>> You won’t be able to merge until you are done and you will need to keep
>>> rebasing over and over again. Even if multiple people are working on this
>>> you can work in your fork just fine (assuming you grant permissions).
>>> >
>>> > Another issue is 

Re: Audit logging to tables.

2019-04-03 Thread Jon Haddad
> On Thu, Feb 28, 2019 at 8:38 AM Dinesh Joshi  wrote:
>
>> I strongly echo Josh’s sentiment. Imagine losing audit entries because C*
>> is overloaded? It’s fine if you don’t care about losing audit entries.
>>
>> Dinesh
>>
>>> On Feb 28, 2019, at 6:41 AM, Joshua McKenzie 
>>> wrote:
>>>
>>> One of the things we've run into historically, on a *lot* of axes, is
>>> that "just put it in C*" for various functionality looks great from a
>>> user and usability perspective, and proves to be something of a
>>> nightmare from an admin / cluster behavior perspective.
>>>
>>> i.e. - cluster suffering so you're writing hints? Write them to C*
>>> tables and watch the cluster suffer more! :)
>>> Same thing probably holds true for audit logging - at a time frame when
>>> things are getting hairy w/a cluster, if you're writing that audit
>>> logging into C* proper (and dealing with ser/deser, compaction pressure,
>>> flushing

Re: Stabilising Internode Messaging in 4.0

2019-04-04 Thread Jon Haddad
Given the number of issues that are addressed, I definitely think it's
worth strongly considering merging this in.  I think it might be a
little unrealistic to cut the first alpha after the merge though.
Being realistic, any 20K+ LOC change is going to introduce its own
bugs, and we should be honest with ourselves about that.  It seems
likely the issues the patch addressed would have affected the 4.0
release in some form *anyways* so the question might be do we fix them
now or after someone's cluster burns down because there's no inbound /
outbound message load shedding.

Giving it a quick code review and going through the JIRA comments
(well written, thanks guys) there seem to be some pretty important bug
fixes in here as well as paying off a bit of technical debt.

Jon

On Thu, Apr 4, 2019 at 1:37 PM Pavel Yaskevich  wrote:
>
> Great to see such a significant progress made in the area!
>
> On Thu, Apr 4, 2019 at 1:13 PM Aleksey Yeschenko  wrote:
>
> > I would like to propose CASSANDRA-15066 [1] - an important set of bug fixes
> > and stability improvements to internode messaging code that Benedict, I,
> > and others have been working on for the past couple of months.
> >
> > First, some context.   This work started off as a review of CASSANDRA-14503
> > (Internode connection management is race-prone [2]), CASSANDRA-13630
> > (Support large internode messages with netty) [3], and a pre-4.0
> > confirmatory review of such a major new feature.
> >
> > However, as we dug in, we realized this was insufficient. With more than 50
> > bugs uncovered [4] - dozens of them critical to correctness and/or
> > stability of the system - a substantial rework was necessary to guarantee a
> > solid internode messaging subsystem for the 4.0 release.
> >
> > In addition to addressing all of the uncovered bugs [4] that were unique to
> > trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some
> > long-existing, pre-4.0 bugs and stability issues. For the complete list of
> > notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d like
> > to highlight a few.
> >
> > # Lack of message integrity checks
> >
> > It’s known that TCP checksums are too weak [5] and Ethernet CRC cannot be
> > relied upon [6] for integrity. With sufficient scale or time, you will hit
> > bit flips. Sadly, most of the time these go undetected.  Cassandra’s
> > replication model makes this issue much more serious, as the faulty data
> > can infect the cluster.
> >
> > We recognised this problem, and recently introduced a fix for server-client
> > messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to the
> > native protocol) [7].
> >
> > But until CASSANDRA-15066 [1] lands, this is also a critical flaw
> > internode. We have addressed it by ensuring that no matter what, whether
> > you use SSL or not, whether you use internode compression or not, a
> > protocol level CRC is always present, for every message frame. It’s our
> > deep and sincere belief that shipping a new implementation of the messaging
> > protocol without application-level data integrity checks would be
> > unacceptable for a modern database.
> >
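As a rough illustration of the per-frame guard being described here (a toy sketch only, not the actual CASSANDRA-15066 framing, which also covers headers, supports compression and large-message chunking, and uses different CRC variants), the idea reduces to appending a checksum to every frame and refusing to trust any frame that fails verification:

import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Toy sketch: append a CRC32 to each outbound frame and verify it on the inbound path.
final class FrameChecksums
{
    static ByteBuffer appendCrc(byte[] payload)
    {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer frame = ByteBuffer.allocate(payload.length + Integer.BYTES);
        frame.put(payload);
        frame.putInt((int) crc.getValue()); // trailing 4-byte CRC
        frame.flip();
        return frame;
    }

    static byte[] verifyAndStrip(ByteBuffer frame)
    {
        byte[] payload = new byte[frame.remaining() - Integer.BYTES];
        frame.get(payload);
        int expected = frame.getInt();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if ((int) crc.getValue() != expected)
            throw new IllegalStateException("frame failed CRC check");
        return payload;
    }
}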
>
> I'm all for introducing more correctness checks at all levels especially in
> communication.
> Having dealt with multiple data corruption bugs that could have been easily
> prevented by
> having a checksum, it's great to see that we are moving in this direction.
>
>
> > # Lack of back-pressure and memory usage constraints
> >
> > As it stands today, it’s far too easy for a single slow node to become
> > overwhelmed by messages from its peers.  Conversely, multiple coordinators
> > can be made unstable by the backlog of messages to deliver to just one
> > struggling node.
> >
> > To address this problem, we have introduced strict memory usage constraints
> > that apply TCP-level back-pressure, on both outbound and inbound.  It is
> > now impossible for a node to be swamped on inbound, and on outbound it is
> > made significantly harder to overcommit resources.  It’s a simple, reliable
> > mechanism that drastically improves cluster stability under load, and
> > especially overload.
> >
> > Cassandra is a mature system, and introducing an entirely new messaging
> > implementation without resolving this fundamental stability issue is
> > difficult to justify in our view.
> >
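Again as a toy sketch rather than anything resembling the real implementation: the essence of the inbound constraint is a fixed byte budget, with reads from the socket paused once the budget is exhausted, so TCP back-pressure reaches the sender instead of the node buffering without bound.

import java.util.concurrent.Semaphore;

// Toy sketch of a fixed inbound byte budget; when tryReserve() fails, the reader
// simply stops pulling from the socket until enough in-flight bytes are released.
final class InboundByteBudget
{
    private final Semaphore budget;

    InboundByteBudget(int maxInFlightBytes)
    {
        this.budget = new Semaphore(maxInFlightBytes);
    }

    boolean tryReserve(int frameSize)
    {
        return budget.tryAcquire(frameSize);
    }

    void release(int frameSize)
    {
        budget.release(frameSize);
    }

    public static void main(String[] args)
    {
        InboundByteBudget budget = new InboundByteBudget(1 << 20); // 1 MiB in flight
        System.out.println("read allowed: " + budget.tryReserve(64 * 1024));
        budget.release(64 * 1024);
    }
}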
>
> I'd say that this is required to be able to ship 4.0 as a release focused
> on stability.
> I personally have been waiting for this to happen for years. Significant
> step forward in our QoS story.
>
>
> >
> > # State of the patch, feature freeze and 4.0 timeline concerns
> >
> > The patch is essentially complete, with much improved unit tests all
> > passing, dtests green, and extensive fuzz testing underway - with initial
> > results all positive.  We intend to further improve in-code documentation
> > and test coverage in the next week or two, and do some minor additional
> > code review, but we believe it will be basically 

TLP tools for stress testing and building test clusters in AWS

2019-04-12 Thread Jon Haddad
I don't want to derail the discussion about Stabilizing Internode
Messaging, so I'm starting this as a separate thread.  There was a
comment that Josh made [1] about doing performance testing with real
clusters as well as a lot of microbenchmarks, and I'm 100% in support
of this.  We've been working on some tooling at TLP for the last
several months to make this a lot easier.  One of the goals has been
to help improve the 4.0 testing process.

The first tool we have is tlp-stress [2].  It's designed with a "get
started in 5 minutes" mindset.  My goal was to ship a stress tool that
ships with real workloads out of the box that can be easily tweaked,
similar to how fio allows you to design a disk workload and tweak it
with parameters.  Included are stress workloads that stress LWTs (two
different types), materialized views, counters, time series, and
key-value workloads.  Each workload can be modified easily to change
compaction strategies, concurrent operations, number of partitions.
We can run workloads for a set number of iterations or a custom
duration.  We've used this *extensively* at TLP to help our customers
and most of our blog posts that discuss performance use it as well.
It exports data to both a CSV format and auto sets up prometheus for
metrics collection / aggregation.  As an example, we were able to
determine that the compression length set on the paxos tables imposes
a significant overhead when using the Locking LWT workload, which
simulates locking and unlocking of rows.  See CASSANDRA-15080 for
details.

We have documentation [3] on the TLP website.

The second tool we've been working on is tlp-cluster [4].  This tool
is designed to help provision AWS instances for the purposes of
testing.  To be clear, I don't expect, or want, this tool to be used
for production environments.  It's designed to assist with the
Cassandra build process by generating deb packages or re-using the
ones that have already been uploaded.  Here's a short list of the
things you'll care about:

1. Create instances in AWS for Cassandra using any instance size and
number of nodes.  Also create tlp-stress instances and a box for
monitoring
2. Use any available build of Cassandra, with a quick option to change
YAML config.  For example: tlp-stress use 3.11.4 -c
concurrent_writes:256
3. Do custom builds just by pointing to a local Cassandra git repo.
They can be used the same way as #2.
4. tlp-stress is automatically installed on the stress box.
5. Everything's installed with pure bash.  I considered something more
complex, but since this is for development only, it turns out the
simplest tool possible works well and it means it's easily
configurable.  Just drop in your own bash script starting with a
number in a XX_script_name.sh format and it gets run.
6. The monitoring box is running Prometheus.  It auto scrapes
Cassandra using the Instaclustr metrics library.
7. Grafana is also installed automatically.  There's a couple sample
graphs there now.  We plan on having better default graphs soon.

For the moment it installs java 8 only but that should be easily
fixable to use java 11 to test ZGC (it's on my radar).

Documentation for tlp-cluster is here [5].

There's still some things to work out in the tool, and we've been
working hard to smooth out the rough edges.  I still haven't announced
anything WRT tlp-cluster on the TLP blog, because I don't think it's
quite ready for public consumption, but I think the folks on this list
are smart enough to see the value in it even if it has a few warts
still.

I don't consider myself familiar enough with the networking patch to
give it a full review, but I am qualified to build tools to help test
it and go through the testing process myself.  From what I can tell
the patch is moving the codebase in a positive direction and I'd like
to help build confidence in it so we can get it merged in.

We'll continue to build out and improve the tooling with the goal of
making it easier for people to jump into the QA side of things.

Jon

[1] 
https://lists.apache.org/thread.html/742009c8a77999f4b62062509f087b670275f827d0c1895bf839eece@%3Cdev.cassandra.apache.org%3E
[2] https://github.com/thelastpickle/tlp-stress
[3] http://thelastpickle.com/tlp-stress/
[4] https://github.com/thelastpickle/tlp-cluster
[5] http://thelastpickle.com/tlp-cluster/

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: TLP tools for stress testing and building test clusters in AWS

2019-04-12 Thread Jon Haddad
I'd be more than happy to hop on a call next week to give you both
(and anyone else interested) a tour of our dev tools.  Maybe something
early morning on my end, which should be your evening, could work?

I can set up a Zoom conference to get everyone acquainted.  We can
record and post it for any who can't make it.

I'm thinking Tuesday, Wednesday, or Thursday morning, 9AM Pacific (5pm
London)?  If anyone's interested please reply with what dates work.
I'll be sure to post the details back here with the zoom link in case
anyone wants to join that didn't get a chance to reply, as well as a
link to the recorded call.

Jon

On Fri, Apr 12, 2019 at 10:41 AM Benedict Elliott Smith
 wrote:
>
> +1
>
> I’m also just as excited to see some standardised workloads and test bed.  At 
> the moment we’re benefiting from some large contributors doing their own 
> proprietary performance testing, which is super valuable and something we’ve 
> lacked before.  But I’m also keen to see some more representative workloads 
> that are reproducible by anybody in the community take shape.
>
>
> > On 12 Apr 2019, at 18:09, Aleksey Yeshchenko  
> > wrote:
> >
> > Hey Jon,
> >
> > This sounds exciting and pretty useful, thanks.
> >
> > Looking forward to using tlp-stress for validating 15066 performance.
> >
> > We should touch base some time next week to pick a comprehensive set of 
> > workloads and versions, perhaps?
> >
> >
> >> On 12 Apr 2019, at 16:34, Jon Haddad  wrote:
> >>
> >> I don't want to derail the discussion about Stabilizing Internode
> >> Messaging, so I'm starting this as a separate thread.  There was a
> >> comment that Josh made [1] about doing performance testing with real
> >> clusters as well as a lot of microbenchmarks, and I'm 100% in support
> >> of this.  We've been working on some tooling at TLP for the last
> >> several months to make this a lot easier.  One of the goals has been
> >> to help improve the 4.0 testing process.
> >>
> >> The first tool we have is tlp-stress [2].  It's designed with a "get
> >> started in 5 minutes" mindset.  My goal was to ship a stress tool that
> >> ships with real workloads out of the box that can be easily tweaked,
> >> similar to how fio allows you to design a disk workload and tweak it
> >> with paramaters.  Included are stress workloads that stress LWTs (two
> >> different types), materialized views, counters, time series, and
> >> key-value workloads.  Each workload can be modified easily to change
> >> compaction strategies, concurrent operations, number of partitions.
> >> We can run workloads for a set number of iterations or a custom
> >> duration.  We've used this *extensively* at TLP to help our customers
> >> and most of our blog posts that discuss performance use it as well.
> >> It exports data to both a CSV format and auto sets up prometheus for
> >> metrics collection / aggregation.  As an example, we were able to
> >> determine that the compression length set on the paxos tables imposes
> >> a significant overhead when using the Locking LWT workload, which
> >> simulates locking and unlocking of rows.  See CASSANDRA-15080 for
> >> details.
> >>
> >> We have documentation [3] on the TLP website.
> >>
> >> The second tool we've been working on is tlp-cluster [4].  This tool
> >> is designed to help provision AWS instances for the purposes of
> >> testing.  To be clear, I don't expect, or want, this tool to be used
> >> for production environments.  It's designed to assist with the
> >> Cassandra build process by generating deb packages or re-using the
> >> ones that have already been uploaded.  Here's a short list of the
> >> things you'll care about:
> >>
> >> 1. Create instances in AWS for Cassandra using any instance size and
> >> number of nodes.  Also create tlp-stress instances and a box for
> >> monitoring
> >> 2. Use any available build of Cassandra, with a quick option to change
> >> YAML config.  For example: tlp-stress use 3.11.4 -c
> >> concurrent_writes:256
> >> 3. Do custom builds just by pointing to a local Cassandra git repo.
> >> They can be used the same way as #2.
> >> 4. tlp-stress is automatically installed on the stress box.
> >> 5. Everything's installed with pure bash.  I considered something more
> >> complex, but since this is for development only, it turns out the
> >> simplest to

Re: TLP tools for stress testing and building test clusters in AWS

2019-04-15 Thread Jon Haddad
Hey all,

I've set up a Zoom call for 9AM Pacific time.  Everyone's welcome to join.

https://zoom.us/j/189920888

Looking forward to a good discussion on how we can all pitch in on
getting 4.0 out the door.

Jon

On Sat, Apr 13, 2019 at 9:14 AM Jonathan Koppenhofer
 wrote:
>
> Wednesday would work for me.
>
> We use and (slightly) contribute to tlp tools. We are platform testing and
> beginning 4.0 testing ourselves, so an in person overview would be great!
>
> On Sat, Apr 13, 2019, 8:48 AM Aleksey Yeshchenko 
> wrote:
>
> > Wednesday and Thursday, either, at 9 AM pacific WFM.
> >
> > > On 13 Apr 2019, at 13:31, Stefan Miklosovic <
> > stefan.mikloso...@instaclustr.com> wrote:
> > >
> > > Hi Jon,
> > >
> > > I would like be on that call too but I am off on Thursday.
> > >
> > > I am from Australia so 5pm London time is ours 2am next day so your
> > > Wednesday morning is my Thursday night. Wednesday early morning so
> > > your Tuesday morning and London's afternoon would be the best.
> > >
> > > Recording the thing would be definitely helpful too.
> > >
> > > On Sat, 13 Apr 2019 at 07:45, Jon Haddad  wrote:
> > >>
> > >> I'd be more than happy to hop on a call next week to give you both
> > >> (and anyone else interested) a tour of our dev tools.  Maybe something
> > >> early morning on my end, which should be your evening, could work?
> > >>
> > >> I can set up a Zoom conference to get everyone acquainted.  We can
> > >> record and post it for any who can't make it.
> > >>
> > >> I'm thinking Tuesday, Wednesday, or Thursday morning, 9AM Pacific (5pm
> > >> London)?  If anyone's interested please reply with what dates work.
> > >> I'll be sure to post the details back here with the zoom link in case
> > >> anyone wants to join that didn't get a chance to reply, as well as a
> > >> link to the recorded call.
> > >>
> > >> Jon
> > >>
> > >> On Fri, Apr 12, 2019 at 10:41 AM Benedict Elliott Smith
> > >>  wrote:
> > >>>
> > >>> +1
> > >>>
> > >>> I’m also just as excited to see some standardised workloads and test
> > bed.  At the moment we’re benefiting from some large contributors doing
> > their own proprietary performance testing, which is super valuable and
> > something we’ve lacked before.  But I’m also keen to see some more
> > representative workloads that are reproducible by anybody in the community
> > take shape.
> > >>>
> > >>>
> > >>>> On 12 Apr 2019, at 18:09, Aleksey Yeshchenko
> >  wrote:
> > >>>>
> > >>>> Hey Jon,
> > >>>>
> > >>>> This sounds exciting and pretty useful, thanks.
> > >>>>
> > >>>> Looking forward to using tlp-stress for validating 15066 performance.
> > >>>>
> > >>>> We should touch base some time next week to pick a comprehensive set
> > of workloads and versions, perhaps?
> > >>>>
> > >>>>
> > >>>>> On 12 Apr 2019, at 16:34, Jon Haddad  wrote:
> > >>>>>
> > >>>>> I don't want to derail the discussion about Stabilizing Internode
> > >>>>> Messaging, so I'm starting this as a separate thread.  There was a
> > >>>>> comment that Josh made [1] about doing performance testing with real
> > >>>>> clusters as well as a lot of microbenchmarks, and I'm 100% in support
> > >>>>> of this.  We've been working on some tooling at TLP for the last
> > >>>>> several months to make this a lot easier.  One of the goals has been
> > >>>>> to help improve the 4.0 testing process.
> > >>>>>
> > >>>>> The first tool we have is tlp-stress [2].  It's designed with a "get
> > >>>>> started in 5 minutes" mindset.  My goal was to ship a stress tool
> > that
> > >>>>> ships with real workloads out of the box that can be easily tweaked,
> > >>>>> similar to how fio allows you to design a disk workload and tweak it
> > >>>>> with paramaters.  Included are stress workloads that stress LWTs (two
> > >>>>> different types), materialized views, counters, time series, and
> > >>>

Re: TLP tools for stress testing and building test clusters in AWS

2019-04-16 Thread Jon Haddad
Yes, sorry about that. Wednesday morning 9am PT

On Tue, Apr 16, 2019 at 3:26 AM Benedict Elliott Smith 
wrote:

> Just to confirm, this is on Wednesday?
>
> > On 15 Apr 2019, at 22:38, Jon Haddad  wrote:
> >
> > Hey all,
> >
> > I've set up a Zoom call for 9AM Pacific time.  Everyone's welcome to
> join.
> >
> > https://zoom.us/j/189920888
> >
> > Looking forward to a good discussion on how we can all pitch in on
> > getting 4.0 out the door.
> >
> > Jon
> >
> > On Sat, Apr 13, 2019 at 9:14 AM Jonathan Koppenhofer
> >  wrote:
> >>
> >> Wednesday would work for me.
> >>
> >> We use and (slightly) contribute to tlp tools. We are platform testing
> and
> >> beginning 4.0 testing ourselves, so an in person overview would be
> great!
> >>
> >> On Sat, Apr 13, 2019, 8:48 AM Aleksey Yeshchenko
> 
> >> wrote:
> >>
> >>> Wednesday and Thursday, either, at 9 AM pacific WFM.
> >>>
> >>>> On 13 Apr 2019, at 13:31, Stefan Miklosovic <
> >>> stefan.mikloso...@instaclustr.com> wrote:
> >>>>
> >>>> Hi Jon,
> >>>>
> >>>> I would like be on that call too but I am off on Thursday.
> >>>>
> >>>> I am from Australia so 5pm London time is ours 2am next day so your
> >>>> Wednesday morning is my Thursday night. Wednesday early morning so
> >>>> your Tuesday morning and London's afternoon would be the best.
> >>>>
> >>>> Recording the thing would be definitely helpful too.
> >>>>
> >>>> On Sat, 13 Apr 2019 at 07:45, Jon Haddad  wrote:
> >>>>>
> >>>>> I'd be more than happy to hop on a call next week to give you both
> >>>>> (and anyone else interested) a tour of our dev tools.  Maybe
> something
> >>>>> early morning on my end, which should be your evening, could work?
> >>>>>
> >>>>> I can set up a Zoom conference to get everyone acquainted.  We can
> >>>>> record and post it for any who can't make it.
> >>>>>
> >>>>> I'm thinking Tuesday, Wednesday, or Thursday morning, 9AM Pacific
> (5pm
> >>>>> London)?  If anyone's interested please reply with what dates work.
> >>>>> I'll be sure to post the details back here with the zoom link in case
> >>>>> anyone wants to join that didn't get a chance to reply, as well as a
> >>>>> link to the recorded call.
> >>>>>
> >>>>> Jon
> >>>>>
> >>>>> On Fri, Apr 12, 2019 at 10:41 AM Benedict Elliott Smith
> >>>>>  wrote:
> >>>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> I’m also just as excited to see some standardised workloads and test
> >>> bed.  At the moment we’re benefiting from some large contributors doing
> >>> their own proprietary performance testing, which is super valuable and
> >>> something we’ve lacked before.  But I’m also keen to see some more
> >>> representative workloads that are reproducible by anybody in the
> community
> >>> take shape.
> >>>>>>
> >>>>>>
> >>>>>>> On 12 Apr 2019, at 18:09, Aleksey Yeshchenko
> >>>  wrote:
> >>>>>>>
> >>>>>>> Hey Jon,
> >>>>>>>
> >>>>>>> This sounds exciting and pretty useful, thanks.
> >>>>>>>
> >>>>>>> Looking forward to using tlp-stress for validating 15066
> performance.
> >>>>>>>
> >>>>>>> We should touch base some time next week to pick a comprehensive
> set
> >>> of workloads and versions, perhaps?
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 12 Apr 2019, at 16:34, Jon Haddad  wrote:
> >>>>>>>>
> >>>>>>>> I don't want to derail the discussion about Stabilizing Internode
> >>>>>>>> Messaging, so I'm starting this as a separate thread.  There was a
> >>>>>>>> comment that Josh made [1] about doing performance testing with
> real
> >>>>>>>> clusters as well as a lot of microbenchmarks, and I'm 100% in
> support

Re: TLP tools for stress testing and building test clusters in AWS

2019-04-16 Thread Jon Haddad
The one I sent out is open, no separate invite required.

On Tue, Apr 16, 2019 at 3:47 PM Dinesh Joshi  wrote:
>
> I'm slightly confused. The zoom meeting mentioned in this thread is only open 
> to who have registered interest here? If so, can someone please add me?
>
> Dinesh
>
> > On Apr 16, 2019, at 3:29 PM, Anthony Grasso  
> > wrote:
> >
> > Hi Stefan,
> >
> > Thanks for sending the invite out!
> >
> > Just wondering what do you think of the idea of having a Zoom meeting that
> > anyone can join? This way anyone else interested can join us as well. I can
> > set that up if you like?
> >
> > Cheers,
> > Anthony
> >
> > On Tue, 16 Apr 2019 at 21:24, Stefan Miklosovic <
> > stefan.mikloso...@instaclustr.com> wrote:
> >
> >> Hi Anthony,
> >>
> >> sounds good. I ve sent you Hangouts meeting invitation privately.
> >>
> >> Regards
> >>
> >> On Tue, 16 Apr 2019 at 14:53, Anthony Grasso 
> >> wrote:
> >>>
> >>> Hi Stefan,
> >>>
> >>> I have been working with Jon on developing the tool set. I can do a Zoom
> >>> call tomorrow (Wednesday) at 11am AEST if that works for you? We can go
> >>> through all the same information that Jon is going to go through in his
> >>> call. Note that I am in the same timezone as you, so if tomorrow morning
> >> is
> >>> no good we can always do the afternoon.
> >>>
> >>> Cheers,
> >>> Anthony
> >>>
> >>>
> >>> On Sat, 13 Apr 2019 at 22:38, Stefan Miklosovic <
> >>> stefan.mikloso...@instaclustr.com> wrote:
> >>>
> >>>> Hi Jon,
> >>>>
> >>>> I would like be on that call too but I am off on Thursday.
> >>>>
> >>>> I am from Australia so 5pm London time is ours 2am next day so your
> >>>> Wednesday morning is my Thursday night. Wednesday early morning so
> >>>> your Tuesday morning and London's afternoon would be the best.
> >>>>
> >>>> Recording the thing would be definitely helpful too.
> >>>>
> >>>> On Sat, 13 Apr 2019 at 07:45, Jon Haddad  wrote:
> >>>>>
> >>>>> I'd be more than happy to hop on a call next week to give you both
> >>>>> (and anyone else interested) a tour of our dev tools.  Maybe
> >> something
> >>>>> early morning on my end, which should be your evening, could work?
> >>>>>
> >>>>> I can set up a Zoom conference to get everyone acquainted.  We can
> >>>>> record and post it for any who can't make it.
> >>>>>
> >>>>> I'm thinking Tuesday, Wednesday, or Thursday morning, 9AM Pacific
> >> (5pm
> >>>>> London)?  If anyone's interested please reply with what dates work.
> >>>>> I'll be sure to post the details back here with the zoom link in case
> >>>>> anyone wants to join that didn't get a chance to reply, as well as a
> >>>>> link to the recorded call.
> >>>>>
> >>>>> Jon
> >>>>>
> >>>>> On Fri, Apr 12, 2019 at 10:41 AM Benedict Elliott Smith
> >>>>>  wrote:
> >>>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> I’m also just as excited to see some standardised workloads and
> >> test
> >>>> bed.  At the moment we’re benefiting from some large contributors doing
> >>>> their own proprietary performance testing, which is super valuable and
> >>>> something we’ve lacked before.  But I’m also keen to see some more
> >>>> representative workloads that are reproducible by anybody in the
> >> community
> >>>> take shape.
> >>>>>>
> >>>>>>
> >>>>>>> On 12 Apr 2019, at 18:09, Aleksey Yeshchenko
> >>>>  wrote:
> >>>>>>>
> >>>>>>> Hey Jon,
> >>>>>>>
> >>>>>>> This sounds exciting and pretty useful, thanks.
> >>>>>>>
> >>>>>>> Looking forward to using tlp-stress for validating 15066
> >> performance.
> >>>>>>>
> >>>>>>> We should touch base some time next week to pick a comprehensive
> >>

Re: TLP tools for stress testing and building test clusters in AWS

2019-04-17 Thread Jon Haddad
Hey folks.  I've opened the 9am zoom session.

You can join here: https://zoom.us/j/189920888


On Tue, Apr 16, 2019 at 10:49 PM Stefan Miklosovic
 wrote:
>
> Thanks Anthony for going that proverbial extra mile to cover people in
> different time zones too.
>
> I believe other people will find your talk as helpful as we did.
>
> Regards
>
> On Wed, 17 Apr 2019 at 10:08, Anthony Grasso  wrote:
> >
> > Hi Stefan and devs,
> >
> > I have set up a zoom link for the TLP tool set intro that will be on in an
> > hours time (17 April 2019 @ 11:00AM AEST): https://zoom.us/j/272648772
> >
> > This link is open so if anyone else wishes to join they are welcome to do
> > so. I will be covering the same topics Jon is covering in his meeting
> > tomorrow.
> >
> > Regards,
> > Anthony
> >
> >
> > On Wed, 17 Apr 2019 at 08:29, Anthony Grasso 
> > wrote:
> >
> > > Hi Stefan,
> > >
> > > Thanks for sending the invite out!
> > >
> > > Just wondering what do you think of the idea of having a Zoom meeting that
> > > anyone can join? This way anyone else interested can join us as well. I 
> > > can
> > > set that up if you like?
> > >
> > > Cheers,
> > > Anthony
> > >
> > > On Tue, 16 Apr 2019 at 21:24, Stefan Miklosovic <
> > > stefan.mikloso...@instaclustr.com> wrote:
> > >
> > >> Hi Anthony,
> > >>
> > >> sounds good. I ve sent you Hangouts meeting invitation privately.
> > >>
> > >> Regards
> > >>
> > >> On Tue, 16 Apr 2019 at 14:53, Anthony Grasso 
> > >> wrote:
> > >> >
> > >> > Hi Stefan,
> > >> >
> > >> > I have been working with Jon on developing the tool set. I can do a 
> > >> > Zoom
> > >> > call tomorrow (Wednesday) at 11am AEST if that works for you? We can go
> > >> > through all the same information that Jon is going to go through in his
> > >> > call. Note that I am in the same timezone as you, so if tomorrow
> > >> morning is
> > >> > no good we can always do the afternoon.
> > >> >
> > >> > Cheers,
> > >> > Anthony
> > >> >
> > >> >
> > >> > On Sat, 13 Apr 2019 at 22:38, Stefan Miklosovic <
> > >> > stefan.mikloso...@instaclustr.com> wrote:
> > >> >
> > >> > > Hi Jon,
> > >> > >
> > >> > > I would like be on that call too but I am off on Thursday.
> > >> > >
> > >> > > I am from Australia so 5pm London time is ours 2am next day so your
> > >> > > Wednesday morning is my Thursday night. Wednesday early morning so
> > >> > > your Tuesday morning and London's afternoon would be the best.
> > >> > >
> > >> > > Recording the thing would be definitely helpful too.
> > >> > >
> > >> > > On Sat, 13 Apr 2019 at 07:45, Jon Haddad  wrote:
> > >> > > >
> > >> > > > I'd be more than happy to hop on a call next week to give you both
> > >> > > > (and anyone else interested) a tour of our dev tools.  Maybe
> > >> something
> > >> > > > early morning on my end, which should be your evening, could work?
> > >> > > >
> > >> > > > I can set up a Zoom conference to get everyone acquainted.  We can
> > >> > > > record and post it for any who can't make it.
> > >> > > >
> > >> > > > I'm thinking Tuesday, Wednesday, or Thursday morning, 9AM Pacific
> > >> (5pm
> > >> > > > London)?  If anyone's interested please reply with what dates work.
> > >> > > > I'll be sure to post the details back here with the zoom link in
> > >> case
> > >> > > > anyone wants to join that didn't get a chance to reply, as well as 
> > >> > > > a
> > >> > > > link to the recorded call.
> > >> > > >
> > >> > > > Jon
> > >> > > >
> > >> > > > On Fri, Apr 12, 2019 at 10:41 AM Benedict Elliott Smith
> > >> > > >  wrote:
> > >> > > > >
> > >> > > > > +1
> > >> > > > >
> > >> > > > 

Re: Jira Suggestion

2019-05-14 Thread Jon Haddad
Great idea. +1

On Tue, May 14, 2019, 12:10 PM Benedict Elliott Smith 
wrote:

> It will be possible to insert n/a.  It will simply be a text field - Jira
> doesn’t know anything about the concept of a SHA, and I don’t intend to
> introduce validation logic.  It’s just a logical and consistent place for
> it to live, and a strong reminder to include it.  My intention is for it to
> be a text field supporting Jira markup, like Test and Doc Plan, so that we
> can insert cleanly formatted links to GitHub just like we do now in
> comments.
>
>
>
> > On 14 May 2019, at 20:04, Dinesh Joshi  wrote:
> >
> > I am +0.5 on this. I think it is a good idea. I want to ensure that we
> capture use-cases such as Tasks that may not have a git commit associated
> with them. There might be tickets that may have multiple git commits across
> repos. SVN commits may also need to be handled.
> >
> > Dinesh
> >
> >> On May 14, 2019, at 11:34 AM, Jeff Jirsa  wrote:
> >>
> >> Please
> >>
> >> --
> >> Jeff Jirsa
> >>
> >>
> >>> On May 14, 2019, at 7:53 AM, Benedict Elliott Smith <
> bened...@apache.org> wrote:
> >>>
> >>> How would people feel about introducing a field for the (git) commit
> SHA, to be required on (Jira) commit?
> >>>
> >>> The norm is that we comment the SHA, but given this is the norm
> perhaps we should codify it instead, while we have the chance?  It would
> also make it easier to find.
> >>> -
> >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: "4.0: TBD" -> "4.0: Est. Q4 2019"?

2019-05-28 Thread Jon Haddad
Sept is a pretty long ways off.  I think the ideal case is we can announce
4.0 release at the summit.  I'm not putting this as a "do or die" date, and
I don't think we need to announce it or make promises.  Sticking with "when
it's ready" is the right approach, but we need a target, and this is imo a
good one.

This date also gives us a pretty good runway.  We could cut our first
alphas in mid June / early July, betas in August and release in Sept.
 There's a ton of work going into testing 4.0 already.
Landing CASSANDRA-15066 will put us in a pretty good spot.  We've developed
tooling at TLP that will make it a lot easier to spin up dev clusters in
AWS as well as stress test them.  I've written about this a few times in
the past, and I'll have a few blog posts coming up that will help show this
in more detail.

There's some other quality of life things we should try to hammer out
before then.  Updating our default JVM settings would be nice, for
example.  Improving documentation (the data modeling section in
particular), fixing the dynamic snitch issues [1], and some improvements to
virtual tables like exposing the sstable metadata [2], and exposing table
statistics [3] come to mind.  The dynamic snitch improvement will help
performance in a big way, and the virtual tables will go a long way to
helping with quality of life.  I showed a few folks virtual tables at the
Accelerate conference last week and the missing table statistics was a big
shock.  If we can get them in, it'll be a big help to operators.

[1] https://issues.apache.org/jira/browse/CASSANDRA-14459
[2] https://issues.apache.org/jira/browse/CASSANDRA-14630
[3] https://issues.apache.org/jira/browse/CASSANDRA-14572
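For anyone who hasn't played with them yet, virtual tables are consumed with plain CQL against the system_views keyspace, so the operator-facing cost is essentially zero. The sketch below (using the Java driver) shows the access pattern only; the table and column names are hypothetical placeholders for what the sstable metadata work in CASSANDRA-14630 might expose, not a final schema.

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;

public class VirtualTableSketch
{
    public static void main(String[] args)
    {
        try (CqlSession session = CqlSession.builder().build())
        {
            // hypothetical table and column names, for illustration only
            for (Row row : session.execute("SELECT keyspace_name, table_name, min_timestamp, max_timestamp " +
                                           "FROM system_views.sstable_metadata"))
            {
                System.out.printf("%s.%s min=%s max=%s%n",
                                  row.getString("keyspace_name"),
                                  row.getString("table_name"),
                                  row.getObject("min_timestamp"),
                                  row.getObject("max_timestamp"));
            }
        }
    }
}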




On Mon, May 27, 2019 at 2:36 PM Nate McCall  wrote:

> Hi Sumanth,
> Thank you so much for taking the time to put this together.
>
> Cheers,
> -Nate
>
> On Tue, May 28, 2019 at 3:27 AM Sumanth Pasupuleti <
> sumanth.pasupuleti...@gmail.com> wrote:
>
> > I have taken an initial stab at documenting release types and exit
> criteria
> > in a google doc, to get us started, and to collaborate on.
> >
> >
> https://docs.google.com/document/d/1bS6sr-HSrHFjZb0welife6Qx7u3ZDgRiAoENMLYlfz8/edit?usp=sharing
> >
> > Thanks,
> > Sumanth
> >
> > On Thu, May 23, 2019 at 12:04 PM Dinesh Joshi  wrote:
> >
> > > Sankalp,
> > >
> > > Great point. This is the page created for testing.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality%3A+Components+and+Test+Plans
> > >
> > > I think we need to define the various release types and the exit
> criteria
> > > for each type of release. Anybody want to take a stab at this or start
> a
> > > thread to discuss it?
> > >
> > > Thanks,
> > >
> > > Dinesh
> > >
> > >
> > > > On May 23, 2019, at 11:57 AM, sankalp kohli 
> > > wrote:
> > > >
> > > > Hi,
> > > >Is there a page where it is written what is expected from an
> alpha,
> > > > beta, rc and a 4.0 release?
> > > > Also how are we coming up with Q4 2019 timeline. Is this for alpha,
> > beta,
> > > > rc or 4.0 release?
> > > >
> > > > Thanks,
> > > > Sankalp
> > > >
> > > > On Thu, May 23, 2019 at 11:27 AM Attila Wind  >
> > > wrote:
> > > >
> > > >> +1+1+1 I read a blog post was talking about last sept(?) to freeze
> > > >> features and start extensive testing. Maybe its really time to hit
> it!
> > > :-)
> > > >>
> > > >> Attila Wind
> > > >>
> > > >> http://www.linkedin.com/in/attilaw
> > > >> Mobile: +36 31 7811355
> > > >>
> > > >>
> > > >> On 2019. 05. 23. 19:30, ajs6f wrote:
> > > >>> +1 in the fullest degree. A date that needs to be changed is still
> > > >> enormously more attractive than no date at all.
> > > >>>
> > > >>> Adam Soroka
> > > >>>
> > >  On May 23, 2019, at 12:01 PM, Sumanth Pasupuleti <
> > > >> spasupul...@netflix.com.INVALID> wrote:
> > > 
> > >  Having at least a ballpark target on the website will definitely
> > help.
> > > >> +1
> > >  on setting it to Q4 2019 for now.
> > > 
> > >  On Thu, May 23, 2019 at 8:52 AM Dinesh Joshi 
> > > wrote:
> > > 
> > > > +1 on setting a date.
> > > >
> > > > Dinesh
> > > >
> > > >> On May 23, 2019, at 11:07 AM, Michael Shuler <
> > > mich...@pbandjelly.org>
> > > > wrote:
> > > >> We've had 4.0 listed as TBD release date for a very long time.
> > > >>
> > > >> Yesterday, Alexander Dejanovski got a "when's 4.0 going to
> > release?"
> > > > question after his repair talk and he suggested possibly Q4 2019.
> > > This
> > > > morning Nate McCall hinted at possibly being close by ApacheCon
> Las
> > > >> Vegas
> > > > in September. These got me thinking..
> > > >> Think we can we shoot for having a 4.0 alpha/beta/rc ready to
> > > > announce/release at ApacheCon? At that time, we'll have been
> frozen
> > > >> for 1
> > > > year, and I think we can. We'll GA release when it's ready, but I
> > > >> think Q4
> > > > could be an realistic target.
> > > >> With th

Re: "4.0: TBD" -> "4.0: Est. Q4 2019"?

2019-05-28 Thread Jon Haddad
My thinking is I'd like to be able to recommend 4.0.0 as a production-ready
database for TLP customers' business-critical use cases.  If it's not ready
for prod, there's no way I'd vote to release it.  The TLP tooling I've
mentioned was developed over the last 6 months with the specific goal of
being able to test custom builds for the 4.0 release, and I've run several
clusters using it already.  The stress tool we built just got a --ttl
option so I should be able to start some longer running clusters that TTL
data out, so we can see the impact of running a cluster under heavy load
for several weeks.



On Tue, May 28, 2019 at 9:57 AM sankalp kohli 
wrote:

> Hi Jon,
>When you say 4.0 release, how do u match it with 3.0 minor
> releases. The unofficial rule is to not upgrade to prod till .10 is cut.
> Also due to heavy investment in testing, I dont think it will take as long
> as 3.0 but want to know what is your thinking with this.
>
> Thanks,
> Sankalp
>
> On Tue, May 28, 2019 at 9:40 AM Jon Haddad  wrote:
>
> > Sept is a pretty long ways off.  I think the ideal case is we can
> announce
> > 4.0 release at the summit.  I'm not putting this as a "do or die" date,
> and
> > I don't think we need to announce it or make promises.  Sticking with
> "when
> > it's ready" is the right approach, but we need a target, and this is imo
> a
> > good one.
> >
> > This date also gives us a pretty good runway.  We could cut our first
> > alphas in mid June / early July, betas in August and release in Sept.
> >  There's a ton of work going into testing 4.0 already.
> > Landing CASSANDRA-15066 will put us in a pretty good spot.  We've
> developed
> > tooling at TLP that will make it a lot easier to spin up dev clusters in
> > AWS as well as stress test them.  I've written about this a few times in
> > the past, and I'll have a few blog posts coming up that will help show
> this
> > in more details.
> >
> > There's some other quality of life things we should try to hammer out
> > before then.  Updating our default JVM settings would be nice, for
> > example.  Improving documentation (the data modeling section in
> > particular), fixing the dynamic snitch issues [1], and some improvements
> to
> > virtual tables like exposing the sstable metadata [2], and exposing table
> > statistics [3] come to mind.  The dynamic snitch improvement will help
> > performance in a big way, and the virtual tables will go a long way to
> > helping with quality of life.  I showed a few folks virtual tables at the
> > Accelerate conference last week and the missing table statistics was a
> big
> > shock.  If we can get them in, it'll be a big help to operators.
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-14459
> > [2] https://issues.apache.org/jira/browse/CASSANDRA-14630
> > [3] https://issues.apache.org/jira/browse/CASSANDRA-14572
> >
> >
> >
> >
> > On Mon, May 27, 2019 at 2:36 PM Nate McCall  wrote:
> >
> > > Hi Sumanth,
> > > Thank you so much for taking the time to put this together.
> > >
> > > Cheers,
> > > -Nate
> > >
> > > On Tue, May 28, 2019 at 3:27 AM Sumanth Pasupuleti <
> > > sumanth.pasupuleti...@gmail.com> wrote:
> > >
> > > > I have taken an initial stab at documenting release types and exit
> > > criteria
> > > > in a google doc, to get us started, and to collaborate on.
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1bS6sr-HSrHFjZb0welife6Qx7u3ZDgRiAoENMLYlfz8/edit?usp=sharing
> > > >
> > > > Thanks,
> > > > Sumanth
> > > >
> > > > On Thu, May 23, 2019 at 12:04 PM Dinesh Joshi 
> > wrote:
> > > >
> > > > > Sankalp,
> > > > >
> > > > > Great point. This is the page created for testing.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality%3A+Components+and+Test+Plans
> > > > >
> > > > > I think we need to define the various release types and the exit
> > > criteria
> > > > > for each type of release. Anybody want to take a stab at this or
> > start
> > > a
> > > > > thread to discuss it?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Dinesh
> > > > >

Re: [DISCUSS] Moving chats to ASF's Slack instance

2019-05-28 Thread Jon Haddad
+1

On Tue, May 28, 2019, 2:54 PM Joshua McKenzie  wrote:

> +1 to switching over. One less comms client + history + searchability is
> enough to get my vote easy.
>
> On Tue, May 28, 2019 at 5:52 PM Jonathan Ellis  wrote:
>
> > I agree.  This lowers the barrier to entry for new participants.  Slack
> is
> > probably two orders of magnitude more commonly used now than irc for sw
> > devs and three for everyone else.  And then you have the quality-of-life
> > features that you get out of the box with Slack and only with difficulty
> in
> > irc (history, search, file uploads...)
> >
> > On Tue, May 28, 2019 at 4:29 PM Nate McCall  wrote:
> >
> > > Hi Folks,
> > > While working on ApacheCon last week, I had to get setup on ASF's slack
> > > workspace. After poking around a bit, on a whim I created #cassandra
> and
> > > #cassandra-dev. I then invited a couple of people to come signup and
> test
> > > it out - primarily to make sure that the process was seamless for
> non-ASF
> > > account holders as well as committers, etc (it was).
> > >
> > > If you want to jump in, you can signup here:
> > > https://s.apache.org/slack-invite
> > >
> > > That said, I think it's time we transition from IRC to Slack. Now, I
> like
> > > CLI friendly, straight forward tools like IRC as much as anyone, but
> it's
> > > been more than once recently where a user I've talked to has said one
> of
> > > two things regarding our IRC channels: "What's IRC?" or "Yeah, I don't
> > > really do that anymore."
> > >
> > > In short, I think it's time to migrate. I think this will really just
> > > consist of some communications to our lists and updating the site
> > (anything
> > > I'm missing?). The archives of IRC should just kind of persist for
> > > posterity sake without any additional effort or maintenance. The
> > > ASF-requirements are all configured already on the Slack workspace, so
> I
> > > think we are good there.
> > >
> > > Thanks,
> > > -Nate
> > >
> >
> >
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
> >
>


[VOTE] remove the old wiki

2019-06-04 Thread Jon Haddad
I assume everyone here knows the old wiki hasn't been maintained, and is
years out of date.  I propose we sunset it completely and delete it forever
from the world.

I'm happy to file the INFRA ticket to delete it, I'd just like to give
everyone the opportunity to speak up in case there's something I'm not
aware of.

In favor of removing the wiki?  That's a +1.
-1 if you think we're better off migrating the entire thing to cwiki.

If you only need a couple of pages, feel free to move the content to the
documentation.  I'm sure we can also export the wiki in its entirety and
put it somewhere offline, if there's a concern about maybe needing some of
the content at some point in the future.

I think 72 hours is enough time to leave a vote open on this topic.

Jon


Re: [VOTE] remove the old wiki

2019-06-04 Thread Jon Haddad
I think we could port that page over and clean it up before deleting the
wiki.

On Tue, Jun 4, 2019 at 12:30 PM Joshua McKenzie 
wrote:

> Before I vote, do we have something analogous to this:
> https://wiki.apache.org/cassandra/ArchitectureInternals
> In the new wiki / docs? Looks like it's a stub:
> https://cassandra.apache.org/doc/latest/architecture/overview.html
>
> Having an architectural overview landing page would be critical before
> sunsetting the old one IMO. And yes, that ArchitectureInternals article
> is... very old. But very old > nothing in terms of establishing a framework
> in which to think about something. Maybe.
>
> On Tue, Jun 4, 2019 at 2:47 PM Jon Haddad  wrote:
>
> > I assume everyone here knows the old wiki hasn't been maintained, and is
> > years out of date.  I propose we sunset it completely and delete it
> forever
> > from the world.
> >
> > I'm happy to file the INFRA ticket to delete it, I'd just like to give
> > everyone the opportunity to speak up in case there's something I'm not
> > aware of.
> >
> > In favor of removing the wiki?  That's a +1.
> > -1 if you think we're better off migrating the entire thing to cwiki.
> >
> > If you only need couple pages, feel free to move the content to the
> > documentation.  I'm sure we can also export the wiki in its entirety and
> > put it somewhere offline, if there's a concern about maybe needing some
> of
> > the content at some point in the future.
> >
> > I think 72 hours is enough time to leave a vote open on this topic.
> >
> > Jon
> >
>


Re: Stability of MaterializedView in 3.11.x | 4.0

2019-08-28 Thread Jon Haddad
I've helped a lot of teams (a dozen to two dozen maybe) migrate away from
MVs due to inconsistencies, issues with streaming (have you added or
removed nodes yet?), and massive performance issues to the point of cluster
failure under (what I consider) trivial load.  I haven't gone too deep into
analyzing their issues, folks are usually fine with "move off them", vs
having me do a ton of analysis.

tlp-stress has a materialized view workload built in, and you can add
arbitrary CQL via the --cql flag to add a MV to any existing workload such
as KeyValue or BasicTimeSeries.

On Wed, Aug 28, 2019 at 8:11 AM Jeff Jirsa  wrote:

> There have been people who have had operational issues related to MVs (many
> of them around running repair), but the biggest concern is correctness.
>
> It probably ultimately depends on what type of database you're running. If
> you're running some sort of IOT / analytics workload and you just want
> another way to SELECT the data, but you won't notice one of a billion
> records going missing, using MVs may be fine. If you're a bank, and one of
> a billion records going missing means you lose someone's bank account, I
> would avoid using MVs.
>
> It's all just risk management.
>
> On Wed, Aug 28, 2019 at 7:18 AM Pankaj Gajjar <
> pankaj.gaj...@contentserv.com>
> wrote:
>
> > Hi Michael,
> >
> > Thanks for stating it very clearly: "Users of MVs *must* determine
> > for themselves, through thorough testing and understanding, if they
> > wish to use them." The conclusion being that if any issue occurs in
> > future, the only solution is to rebuild the MVs, since Cassandra is
> > not able to keep them consistently in sync.
> >
> > Also, we are practically using 10+ MVs and, as of now, we have not faced
> > any issue, so my question to all community members: has anyone faced any
> > critical issues? Do we need to start migrating from MVs to manually
> > maintained query base tables?
> >
> > Also, I understand now that it's experimental and not ready for
> > production, so if possible we should simply avoid it, right?
> >
> > Thanks
> > Pankaj
> >
> > On 27/08/19, 19:03, "Michael Shuler"  > of mich...@pbandjelly.org> wrote:
> >
> > It appears that you found the first message of the chain. I suggest
> > reading the linked JIRA and the complete dev@ thread that arrived at
> > this conclusion; there are loads of well formed opinions and
> > information. Users of MVs *must* determine for themselves, through
> > thorough testing and understanding, if they wish to use them.
> >
> > Linkage:
> > https://issues.apache.org/jira/browse/CASSANDRA-13959
> >   (sub-linkage..)
> >   https://issues.apache.org/jira/browse/CASSANDRA-13595
> >   https://issues.apache.org/jira/browse/CASSANDRA-13911
> >   https://issues.apache.org/jira/browse/CASSANDRA-13880
> >   https://issues.apache.org/jira/browse/CASSANDRA-12872
> >   https://issues.apache.org/jira/browse/CASSANDRA-13747
> >
> > Very much worth reading the complete thread:
> > part1:
> >
> >
> https://lists.apache.org/thread.html/d81a61da48e1b872d7599df4edfa8e244d34cbd591a18539f724796f@
> > 
> > part2:
> >
> >
> https://lists.apache.org/thread.html/19b7fcfd3b47f1526d6e993b3bb97f6c43e5ce204bc976ec0701cdd3@
> > 
> >
> > Quick JQL for open tickets with "mv":
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20text%20~%20mv%20AND%20status%20!%3D%20Resolved
> >
> > --
> > Michael
> >
> > On 8/27/19 5:01 AM, pankaj gajjar wrote:
> > > Hello,
> > >
> > >
> > >
> > > concern about Materialized Views (MVs) in Cassandra. Unfortunately
> > starting
> > > with version 3.11, MVs are officially considered experimental and
> > not ready
> > > for production use, as you can read here:
> > >
> > >
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201710.mbox/%3cetpan.59f24f38.438f4e99.7...@apple.com%3E
> > >
> > >
> > >
> > > Can you please someone give some productive feedback on this ? it
> > would
> > > help us to further implementation around the MVs in Cassandra.
> > >
> > >
> > >
> > > Does anyone facing some critical issue or data lose or
> > synchronization
> > > issue ?
> > >
> > >
> > >
> > > Regards
> > >
> > > Pankaj.
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
> >
> >
>


4.0 alpha before apachecon?

2019-08-28 Thread Jon Haddad
Hey folks,

I think it's time we cut a 4.0 alpha release.  Before I put up a vote
thread, is there a reason not to have a 4.0 alpha before ApacheCon /
Cassandra Summit?

There's a handful of small issues that should be done for 4.0 (client
list in virtual tables, dynamic snitch improvements, fixing token counts),
I'm not trying to suggest we don't include them, but they're small enough I
think it's OK to merge them in following the first alpha.

Jon


Re: 4.0 alpha before apachecon?

2019-08-28 Thread Jon Haddad
Regarding the dynamic snitch improvements, it's gone through several rounds
of review already and there's been significant testing of it.  Regarding
the token change, switching a number from 256 -> 16 isn't so invasive that
we shouldn't do it.  There's a little extra work that needs to be done
there ideally to ensure safety, but it's again small enough where it
shouldn't be too big of a problem imo.  Both current implementations (256
tokens + our insanely over memory allocating dynamic snitch) limit the
ability of people to run large clusters, harming both availability and
performance.  It's been extremely harmful for Cassandra's reputation and
I'd really like it if we could ship something where I don't have to
constantly apologize to people I'm trying to help for the land mine
defaults we put out there.

To your point, I agree as a community we're lacking in an open, well
documented and up to date plan, and it needs to be addressed.  I think the
virtual meetings idea, held at a regular cadence, might help a bit with that, and I
intend on participating there.


On Wed, Aug 28, 2019 at 9:52 AM Joshua McKenzie 
wrote:

> >
> > dynamic snitch improvements, fixing token counts
>
>
>
> > they're small enough
>
>
> By what axis of measurement out of curiosity? Risk to re-test and validate
> a final artifact? Do we have a more clear understanding of what testing has
> taken place across the community?
>
> The last I saw, our documented test plan
> <
> https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality%3A+Components+and+Test+Plans
> >
> hasn't
> been maintained or kept up to date
> <
> https://issues.apache.org/jira/browse/CASSANDRA-14862?jql=project%20%3D%20CASSANDRA%20AND%20%20labels%20%3D%204.0-QA
> >.
> Is there another artifact reflecting what testing people have in flight to
> better reflect what risk of needing to re-test we have from these (and
> other) post-freeze changes?
>
>
>
> On Wed, Aug 28, 2019 at 11:52 AM Jon Haddad  wrote:
>
> > Hey folks,
> >
> > I think it's time we cut a 4.0 alpha release.  Before I put up a vote
> > thread, is there a reason not to have a 4.0 alpha before ApacheCon /
> > Cassandra Summit?
> >
> > There's a handful of small issues that I should be done for 4.0 (client
> > list in virtual tables, dynamic snitch improvements, fixing token
> counts),
> > I'm not trying to suggest we don't include them, but they're small
> enough I
> > think it's OK to merge them in following the first alpha.
> >
> > Jon
> >
>


Re: Stability of MaterializedView in 3.11.x | 4.0

2019-08-28 Thread Jon Haddad
>  Arguably, the other alternative to server-side denormalization is to do
the denormalization client-side which comes with the same axes of costs and
complexity, just with more of each.

That's not completely true.  You can write to any number of tables without
doing a read, and the cost of reading data off disk is significantly
greater than an insert alone.  A write-heavy workload plus MVs can crush a
cluster that would otherwise be completely fine handling the writes alone.

The other issue with MVs is that you still need to understand the
fundamentals of data modeling; they don't magically solve the problem of
enormous partitions.  One of the reasons I've had to un-MV a lot of clusters is
because people have put an MV on a table with a low-cardinality field and
found themselves with a 10GB partition nightmare, so they need to go back
and remodel the view as something more complex anyways.  In this case, the
MV was extremely high cost since now they've not only pushed out a poor
implementation to begin with but now have the cost of a migration as well
as a rewrite.
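A contrived sketch of that trap (keyspace, table and column names are invented, and on newer versions you may also need to enable materialized views in cassandra.yaml before the last statement is accepted): partitioning the view by a low-cardinality column funnels every base row for a given value into a single view partition, which is exactly how the multi-gigabyte partitions show up.

import com.datastax.oss.driver.api.core.CqlSession;

public class MvPitfallSketch
{
    public static void main(String[] args)
    {
        try (CqlSession session = CqlSession.builder().build())
        {
            session.execute("CREATE KEYSPACE IF NOT EXISTS shop WITH replication = " +
                            "{'class': 'SimpleStrategy', 'replication_factor': 1}");

            session.execute("CREATE TABLE IF NOT EXISTS shop.orders (" +
                            "  order_id uuid PRIMARY KEY," +
                            "  status   text," +      // ~3 distinct values: OPEN / SHIPPED / CANCELLED
                            "  total    decimal)");

            // Looks convenient, but because 'status' is low-cardinality the view collapses
            // millions of orders into a handful of enormous partitions -- the
            // remodel-and-migrate scenario described above.
            session.execute("CREATE MATERIALIZED VIEW IF NOT EXISTS shop.orders_by_status AS " +
                            "  SELECT * FROM shop.orders " +
                            "  WHERE status IS NOT NULL AND order_id IS NOT NULL " +
                            "  PRIMARY KEY (status, order_id)");
        }
    }
}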



On Wed, Aug 28, 2019 at 9:58 AM Joshua McKenzie 
wrote:

> >
> > so we need to start migration from MVs to manual query base table ?
>
>  Arguably, the other alternative to server-side denormalization is to do
> the denormalization client-side which comes with the same axes of costs and
> complexity, just with more of each.
>
> Jeff's spot on when he discusses the risk appetite vs. mitigation aspect of
> it. There's a reason banks do end-of-day close-out validation analysis and
> have redundant systems for things like this.
>
> On Wed, Aug 28, 2019 at 11:49 AM Jon Haddad  wrote:
>
> > I've helped a lot of teams (a dozen to two dozen maybe) migrate away from
> > MVs due to inconsistencies, issues with streaming (have you added or
> > removed nodes yet?), and massive performance issues to the point of
> cluster
> > failure under (what I consider) trivial load.  I haven't gone too deep
> into
> > analyzing their issues, folks are usually fine with "move off them", vs
> > having me do a ton of analysis.
> >
> > tlp-stress has a materialized view workload built in, and you can add
> > arbitrary CQL via the --cql flag to add a MV to any existing workload
> such
> > as KeyValue or BasicTimeSeries.
> >
> > On Wed, Aug 28, 2019 at 8:11 AM Jeff Jirsa  wrote:
> >
> > > There have been people who have had operational issues related to MVs
> > (many
> > > of them around running repair), but the biggest concern is correctness.
> > >
> > > It probably ultimately depends on what type of database you're running.
> > If
> > > you're running some sort of IOT / analytics workload and you just want
> > > another way to SELECT the data, but you won't notice one of a billion
> > > records going missing, using MVs may be fine. If you're a bank, and one
> > of
> > > a billion records going missing means you lose someone's bank account,
> I
> > > would avoid using MVs.
> > >
> > > It's all just risk management.
> > >
> > > On Wed, Aug 28, 2019 at 7:18 AM Pankaj Gajjar <
> > > pankaj.gaj...@contentserv.com>
> > > wrote:
> > >
> > > > Hi Michael,
> > > >
> > > > Thanks for putting very clever information " Users of MVs *must*
> > > determine
> > > > for themselves, through
> > > > thorough testing and understanding, if they wish to use them."
> And
> > > > this concluded that if there is any issue occur in future then only
> > > > solution is to rebuild the MVs since Cassandra does not able to make
> > > > consistent synch well.
> > > >
> > > > Also, we practically using the 10+ MVs and as of now, we have not
> faced
> > > > any issue, so my question to all community member, does anyone face
> any
> > > > critical issues ? so we need to start migration from MVs to manual
> > query
> > > > base table ?
> > > >
> > > > Also, I can understand now, it's experimental and not ready for
> > > > production, so if possible, please ignore it only right ?
> > > >
> > > > Thanks
> > > > Pankaj
> > > >
> > > > On 27/08/19, 19:03, "Michael Shuler"  > behalf
> > > > of mich...@pbandjelly.org> wrote:
> > > >
> > > > It appears that you found the first message of the chain. I
> suggest
> > > > reading the linked JIRA and the complete dev@ th

Re: 4.0 alpha before apachecon?

2019-08-28 Thread Jon Haddad
Yes we do.  It's one of the reasons I've spent a lot of (thousands?)
hours working on tlp-stress and tlp-cluster in the last 2 years.  I shared
some progress on this a little ways back.  I'll send out a separate email
soon with updates, since we just merged in a *lot* of features that will
help with testing.

On Wed, Aug 28, 2019 at 10:52 AM Dinesh Joshi  wrote:

> +1 on cutting an alpha and having a clear, documented test plan[1] for
> alpha. We need volunteers to drive the test plan, though. :)
>
> Thanks,
>
> Dinesh
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality%3A+Components+and+Test+Plans
>
> > On Aug 28, 2019, at 10:27 AM, Jon Haddad  wrote:
> >
> > Regarding the dynamic snitch improvements, it's gone through several
> rounds
> > of review already and there's been significant testing of it.  Regarding
> > the token change, switching a number from 256 -> 16 isn't so invasive
> that
> > we shouldn't do it.  There's a little extra work that needs to be done
> > there ideally to ensure safety, but it's again small enough where it
> > shouldn't be too big of a problem imo.  Both current implementations (256
> > tokens + our insanely over memory allocating dynamic snitch) limit the
> > ability of people to run large clusters, harming both availability and
> > performance.  It's been extremely harmful for Cassandra's reputation and
> > I'd really like it if we could ship something where I don't have to
> > constantly apologize to people I'm trying to help for the land mine
> > defaults we put out there.
> >
> > To your point, I agree as a community we're lacking in an open, well
> > documented and up to date plan, and it needs to be addressed.  I think
> the
> > virtual meetings idea held at a regular cadence might help a bit with that, I
> > intend on participating there.
> >
> >
> > On Wed, Aug 28, 2019 at 9:52 AM Joshua McKenzie 
> > wrote:
> >
> >>>
> >>> dynamic snitch improvements, fixing token counts
> >>
> >>
> >>
> >>> they're small enough
> >>
> >>
> >> By what axis of measurement out of curiosity? Risk to re-test and
> validate
> >> a final artifact? Do we have a more clear understanding of what testing
> has
> >> taken place across the community?
> >>
> >> The last I saw, our documented test plan
> >> <
> >>
> https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality%3A+Components+and+Test+Plans
> >>>
> >> hasn't
> >> been maintained or kept up to date
> >> <
> >>
> https://issues.apache.org/jira/browse/CASSANDRA-14862?jql=project%20%3D%20CASSANDRA%20AND%20%20labels%20%3D%204.0-QA
> >>> .
> >> Is there another artifact reflecting what testing people have in flight
> to
> >> better reflect what risk of needing to re-test we have from these (and
> >> other) post-freeze changes?
> >>
> >>
> >>
> >> On Wed, Aug 28, 2019 at 11:52 AM Jon Haddad  wrote:
> >>
> >>> Hey folks,
> >>>
> >>> I think it's time we cut a 4.0 alpha release.  Before I put up a vote
> >>> thread, is there a reason not to have a 4.0 alpha before ApacheCon /
> >>> Cassandra Summit?
> >>>
> >>> There's a handful of small issues that I should be done for 4.0 (client
> >>> list in virtual tables, dynamic snitch improvements, fixing token
> >> counts),
> >>> I'm not trying to suggest we don't include them, but they're small
> >> enough I
> >>> think it's OK to merge them in following the first alpha.
> >>>
> >>> Jon
> >>>
> >>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: 4.0 alpha before apachecon?

2019-08-29 Thread Jon Haddad
Agreed. There's no point in a branch if we aren't committing new features
to trunk, and I don't think we should yet.

On Thu, Aug 29, 2019 at 3:50 PM Dinesh Joshi  wrote:

> We should not branch trunk at least until the RC is out.
>
> Dinesh
>
> > On Aug 29, 2019, at 3:32 PM, Sankalp Kohli 
> wrote:
> >
> > I do not think we should branch and am -1 on it. The reason we have
> trunk frozen was for our focus to be on 4.0. I think we still need that
> focus till a few more releases like these.
> >
> >> On Aug 30, 2019, at 12:24 AM, Nate McCall  wrote:
> >>
> >> On Fri, Aug 30, 2019 at 10:11 AM Benedict Elliott Smith <
> bened...@apache.org>
> >> wrote:
> >>
> >>>
> >>>   Seems to make sense to branch, right?
> >>>
> >>> Feels like a good line in the sand. +1
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Stability of MaterializedView in 3.11.x | 4.0

2019-08-30 Thread Jon Haddad
If you don't have any intent on running across multiple nodes, Cassandra is
probably the wrong DB for you.

Postgres will give you a better feature set for a single node.

On Fri, Aug 30, 2019 at 5:23 AM Pankaj Gajjar 
wrote:

> Understood. How about Cassandra running on a single node? We don't have a
> cluster setup (i.e. 3+ nodes).
>
> Do MVs perform well on a single-node machine?
>
> Note: I know about HA, so let's set that aside for now; it's only possible
> when we have a cluster setup.
>
> On 29/08/19, 06:21, "Dor Laor"  wrote:
>
> On Wed, Aug 28, 2019 at 5:43 PM Jon Haddad  wrote:
>
> > >  Arguably, the other alternative to server-side denormalization is
> to do
> > the denormalization client-side which comes with the same axes of
> costs and
> > complexity, just with more of each.
> >
> > That's not completely true.  You can write to any number of tables
> without
> > doing a read, and the cost of reading data off disk is significantly
> > greater than an insert alone.  You can crush a cluster with a write
> heavy
> > workload and MVs that would otherwise be completely fine to do all
> writes.
> >
> > The other issue with MVs is that you still need to understand
> fundamentals
> > of data modeling, that don't magically solve the problem of enormous
> > partitions.  One of the reasons I've had to un-MV a lot of clusters
> is
> > because people have put an MV on a table with a low-cardinality
> field and
> > found themselves with a 10GB partition nightmare, so they need to go
> back
> > and remodel the view as something more complex anyways.  In this
> case, the
> > MV was extremely high cost since now they've not only pushed out a
> poor
> > implementation to begin with but now have the cost of a migration as
> well
> > as a rewrite.
> >
>
> +1
>
> Moreover, the hard part is that an update to the base table means the
> original data needs to be read, and the database (or the poor developer
> who implements the denormalized model) needs to delete the old data in the
> view and then write the new data. All of this, of course, needs to be
> resilient to all types of errors and failures. Had it been simple, there
> would be no need for a database MV.
>
>
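To make the read-before-write cost concrete, here is a minimal sketch
(hypothetical table and view, invented purely for illustration). From the
client's point of view the update is a single statement, but correctness
still requires the read/delete/insert dance underneath, whether the server
does it for an MV or the developer does it by hand:

    CREATE TABLE users (
        id      uuid PRIMARY KEY,
        country text
    );

    CREATE MATERIALIZED VIEW users_by_country AS
        SELECT * FROM users
        WHERE country IS NOT NULL AND id IS NOT NULL
        PRIMARY KEY (country, id);

    -- To keep the view correct, the existing base row has to be read first to
    -- learn the old country, the old view row (old country, id) has to be
    -- deleted, and the new view row (new country, id) has to be written, all
    -- in a failure-tolerant way. Maintaining a second table by hand means
    -- implementing that same read/delete/insert sequence, plus the error
    -- handling, in the application.
    UPDATE users SET country = 'DE' WHERE id = 123e4567-e89b-12d3-a456-426614174000;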
> >
> >
> >
> > On Wed, Aug 28, 2019 at 9:58 AM Joshua McKenzie <
> jmcken...@apache.org>
> > wrote:
> >
> > > >
> > > > so we need to start migration from MVs to manual query base
> table ?
> > >
> > >  Arguably, the other alternative to server-side denormalization is
> to do
> > > the denormalization client-side which comes with the same axes of
> costs
> > and
> > > complexity, just with more of each.
> > >
> > > Jeff's spot on when he discusses the risk appetite vs. mitigation
> aspect
> > of
> > > it. There's a reason banks do end-of-day close-out validation
> analysis
> > and
> > > have redundant systems for things like this.
> > >
> > > On Wed, Aug 28, 2019 at 11:49 AM Jon Haddad 
> wrote:
> > >
> > > > I've helped a lot of teams (a dozen to two dozen maybe) migrate
> away
> > from
> > > > MVs due to inconsistencies, issues with streaming (have you
> added or
> > > > removed nodes yet?), and massive performance issues to the point
> of
> > > cluster
> > > > failure under (what I consider) trivial load.  I haven't gone
> too deep
> > > into
> > > > analyzing their issues, folks are usually fine with "move off
> them", vs
> > > > having me do a ton of analysis.
> > > >
> > > > tlp-stress has a materialized view workload built in, and you
> can add
> > > > arbitrary CQL via the --cql flag to add a MV to any existing
> workload
> > > such
> > > > as KeyValue or BasicTimeSeries.
> > > >
> > > > On Wed, Aug 28, 2019 at 8:11 AM Jeff Jirsa 
> wrote:
> > > >
> > > > > There have been people who have had operational issues related
> to MVs
> > > > (many
> > > > > of them around running repair), but the biggest concern is
> > correctness.
> > > > >
> > > > > It probably ultimately depends on what type of database you're running.
