Re: [VOTE] Release Apache Cassandra 4.1.0 (take2)

2022-12-11 Thread Jeff Jirsa
Concurrent shouldn't matter (they're non-overlapping in the repro). And I'd personally be a bit surprised if table count matters that much. It probably just requires high core count and enough data that the streams actually interact with the rate limiter.

On Dec 11, 2022, at 10:32 AM, Mick Semb Wever wrote:

> On Sat, 10 Dec 2022 at 23:09, Abe Ratnofsky wrote:
>
> > Sorry - responded on the take1 thread: Could we defer the close of this vote til Monday, December 12th after 6pm Pacific Time? Jon Meredith and I have been working through an issue blocking streaming on 4.1 for the last couple of months, and are now testing a promising fix. We're currently working on a write-up, and we'd like to hold the release until the community is able to review our findings.
>
> Update on behalf of Jon and Abe. The issue raised is CASSANDRA-18110. Concurrent host replacements, or host replacements on nodes with a high CPU count and a large number of tables, can fail.
>
> It is still unclear if this is applicable to OSS C*, and if so to what extent users might ever be impacted. More importantly, there's a simple workaround for anyone that hits the problem.
>
> Without further information on the table, I'm inclined to continue with 4.1.0 GA (closing the vote in 32 hours), but add a clear message to the release announcement of the issue and workaround. Interested in hearing others' positions; don't be afraid to veto if that's where you're at.


Re: Merging CEP-15 to trunk

2023-01-23 Thread Jeff Jirsa
But it's not merge-then-review, because they've already been reviewed,
before being merged to the feature branch, by committers (actually PMC
members)?

You want code that's been written by one PMC member and reviewed by 2 other
PMC members to be put up for review by some random 4th party? For how long?


On Mon, Jan 23, 2023 at 2:54 PM Mick Semb Wever  wrote:

> The sooner it’s in trunk, the more eyes it will draw, IMO, if you are
>> right about most contributors not having paid attention to a feature branch.
>>
>
>
> We all agree we want the feature branch incrementally merged sooner rather
> than later.
> IMHO any merge to trunk, and any rebase and squash of ninja-fix commits,
> deserves an invite to reviewers.
> Any notion of merge-then-review isn't our community precedent.
>
> I appreciate the desire to not "be left hanging" by creating a merge
> ticket that requires a reviewer when no reviewer shows. And the desire to
> move quickly on this.
>
> I don't object if you wish to use this thread as that review process. On
> the other hand, if you create the ticket I promise to be a reviewer of it,
> so as not to delay.
>
>
>


Re: [ANNOUNCE] Evolving governance in the Cassandra Ecosystem

2023-01-30 Thread Jeff Jirsa
Usually requires an offer to donate from the current owner, an acceptance
of that offer (PMC vote), and then the work to ensure that contributions
are acceptable from a legal standpoint (e.g. like the incubator -
https://incubator.apache.org/guides/transitioning_asf.html - "For
contributions composed of patches from individual contributors, it is safe
to import the code once the major contributors (by volume) have completed
ICLAs or SGAs.").



On Mon, Jan 30, 2023 at 10:53 AM German Eichberger via dev <
dev@cassandra.apache.org> wrote:

> Great news indeed. I am wondering what it would take to include projects
> everyone is using, like medusa, reaper, cassandra-ldap, etc., as subprojects.
>
> Thanks,
> German
> --
> *From:* Francisco Guerrero 
> *Sent:* Friday, January 27, 2023 9:46 AM
> *To:* dev@cassandra.apache.org 
> *Subject:* [EXTERNAL] Re: [ANNOUNCE] Evolving governance in the Cassandra
> Ecosystem
>
> Great news! I'm very happy to see these changes coming soon.
>
> Thanks to everyone involved in this work.
>
> On 2023/01/26 21:21:01 Josh McKenzie wrote:
> > The Cassandra PMC is pleased to announce that we're evolving our
> governance procedures to better foster subprojects under the Cassandra
> Ecosystem's umbrella. Astute observers among you may have noticed that the
> Cassandra Sidecar is already a subproject of Apache Cassandra as of CEP-1 (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224)
> and Cassandra-14395 (
> https://issues.apache.org/jira/browse/CASSANDRASC-24),
> however up until now we haven't had any structure to accommodate raising
> committers on specific subprojects or clarity on the addition or governance
> of future subprojects.
> >
> > Further, with the CEP for the driver donation in motion (
> https://docs.google.com/document/d/1e0SsZxjeTabzrMv99pCz9zIkkgWjUd4KL5Yp0GFzNnY/edit#heading=h.xhizycgqxoyo),
> the need for a structured and sustainable way to expand the Cassandra
> Ecosystem is pressing.
> >
> > We'll document these changes in the confluence wiki as well as the
> sidecar as our first formal subproject after any discussion on this email
> thread. The new governance process is as follows:
> > -
> >
> > Subproject Governance
> > 1. The Apache Cassandra PMC is responsible for governing the broad
> Cassandra Ecosystem.
> > 2. The PMC will vote on inclusion of new interested subprojects using
> the existing procedural change vote process documented in the confluence
> wiki (Super majority voting: 66% of votes must be in favor to pass.
> Requires 50% participation of roll call).
> > 3. New committers for these subprojects will be nominated and raised,
> both at inclusion as a subproject and over time. Nominations can be brought
> to priv...@cassandra.apache.org. Typically we're looking for a mix of
> commitment and contribution to the community and project, be it through
> code, documentation, presentations, or other significant engagement with
> the project.
> > 4. While the commit-bit is ecosystem wide, code modification rights and
> voting rights (technical contribution, binding -1, CEPs) are granted per
> subproject.
> >  4a. Individuals are trusted to exercise prudence and only commit or
> claim binding votes on approved subprojects. Repeated violations of this
> social contract will result in losing committer status.
> >  4b. Members of the PMC have commit and voting rights on all
> subprojects.
> > 5. For each subproject, the PMC will determine a trio of PMC members
> that will be responsible for all PMC specific functions (release votes,
> driving CVE response, marketing, branding, policing marks, etc) on the
> subproject.
> > -
> >
> > Curious to see what thoughts we have as a community!
> >
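The vote threshold in point 2 above can be expressed as a quick check. This is a sketch only; the exact rounding and abstention rules are not specified in the thread and are assumptions here.

```python
# Sketch of the procedural-change vote rule quoted above: super-majority
# (66% of cast votes in favor) plus at least 50% participation of the PMC
# roll call. Rounding behavior is an assumption, not project policy.

def procedural_vote_passes(in_favor, against, roll_call_size):
    cast = in_favor + against
    if roll_call_size <= 0 or cast == 0:
        return False
    participation_ok = cast >= 0.5 * roll_call_size
    majority_ok = in_favor >= 0.66 * cast
    return participation_ok and majority_ok

print(procedural_vote_passes(in_favor=8, against=2, roll_call_size=18))  # True
print(procedural_vote_passes(in_favor=8, against=2, roll_call_size=30))  # False: under 50% participation
```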

Re: [VOTE] CEP-21 Transactional Cluster Metadata

2023-02-06 Thread Jeff Jirsa
+1


On Mon, Feb 6, 2023 at 8:16 AM Sam Tunnicliffe  wrote:

> Hi everyone,
>
> I would like to start a vote on this CEP.
>
> Proposal:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata
>
> Discussion:
> https://lists.apache.org/thread/h25skwkbdztz9hj2pxtgh39rnjfzckk7
>
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding
> vetoes.
>
> Thanks,
> Sam
>


Re: Downgradability

2023-02-20 Thread Jeff Jirsa
I'm not even convinced 8110 addresses this - just writing sstables in
old versions won't help if we ever add things like new types, or new kinds
of collections, without other control abilities. Claude's other email in
another thread a few hours ago talks about some of these surprises -
"Specifically during the 3.1 -> 4.0 changes a column broadcast_port was
added to system/local. This means that a 3.1 system can not read the table
as it has no definition for it. I tried marking the column for deletion in
the metadata and in the serialization header. The latter got past the
column-not-found problem, but I suspect that it just means that data
columns after broadcast_port shifted and so were incorrectly read." - this is a
harder problem to solve than just versioning sstables and network
protocols.
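The column-shift failure in the quoted passage can be shown with a toy positional codec. This is an illustration only, not Cassandra's actual sstable serialization; the schemas and values are hypothetical.

```python
import struct

# Toy fixed-order row codec: fields are written back-to-back in schema order.
# A reader whose schema lacks a column (here, broadcast_port) misinterprets
# every field after the gap - the failure mode described above.

def write_row(schema, row):
    out = b""
    for name in schema:
        value = row[name]
        out += struct.pack(">i", len(value)) + value  # length-prefixed field
    return out

def read_row(schema, blob):
    row, off = {}, 0
    for name in schema:
        (length,) = struct.unpack_from(">i", blob, off)
        off += 4
        row[name] = blob[off:off + length]
        off += length
    return row

new_schema = ["listen_address", "broadcast_port", "tokens"]
old_schema = ["listen_address", "tokens"]  # 3.x schema: no broadcast_port

blob = write_row(new_schema, {
    "listen_address": b"10.0.0.1",
    "broadcast_port": b"7000",
    "tokens": b"-9223372036854775808",
})

ok = read_row(new_schema, blob)
shifted = read_row(old_schema, blob)  # old reader: columns shift after the gap
print(ok["tokens"])       # the real tokens value
print(shifted["tokens"])  # actually broadcast_port's bytes
```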

Stepping back a bit, we have downgrade ability listed as a goal, but it's
not (as far as I can tell) universally enforced, nor is it clear at which
point we will be able to concretely say "this release can be downgraded to
X".   Until we actually define and agree that this is a real goal with a
concrete version where downgrade-ability becomes real, it feels like things
are somewhat arbitrarily enforced, which is probably very frustrating for
people trying to commit work/tickets.

- Jeff



On Mon, Feb 20, 2023 at 11:48 AM Dinesh Joshi  wrote:

> I’m a big fan of maintaining backward compatibility. Downgradability
> implies that we could potentially roll back an upgrade at any time. While I
> don’t think we need to retain the ability to downgrade in perpetuity it
> would be a good objective to maintain strict backward compatibility and
> therefore downgradability until a certain point. This would imply
> versioning metadata and extending it in such a way that prior version(s)
> could continue functioning. This can certainly be expensive to implement
> and might bloat on-disk storage. However, we could always offer an option
> for the operator to optimize the on-disk structures for the current version
> then we can rewrite them in the latest version. This optimizes the storage
> and opens up new functionality. This means new features that can work with
> old on-disk structures will be available while others that strictly require
> new versions of the data structures will be unavailable until the operator
> migrates to the new version. This migration IMO should be irreversible.
> Beyond this point the operator will lose the ability to downgrade which is
> ok.
>
> Dinesh
>
> On Feb 20, 2023, at 10:40 AM, Jake Luciani  wrote:
>
> 
> There has been progress on
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-8928
>
> Which is similar to what datastax does for DSE. Would this be an
> acceptable solution?
>
> Jake
>
> On Mon, Feb 20, 2023 at 11:17 AM guo Maxwell  wrote:
>
>> It seems "An alternative solution is to implement/complete CASSANDRA-8110"
>> can give us more options if it is finished 😉
>>
>> Branimir Lambov 于2023年2月20日 周一下午11:03写道:
>>
>>> Hi everyone,
>>>
>>> There has been a discussion lately about changes to the sstable format
>>> in the context of being able to abort a cluster upgrade, and the fact that
>>> changes to sstables can prevent downgraded nodes from reading any data
>>> written during their temporary operation with the new version.
>>>
>>> Most of the discussion is in CASSANDRA-18134
>>> , and is
>>> spreading into CASSANDRA-14277
>>>  and
>>> CASSANDRA-17698 ,
>>> none of which is a good place to discuss the topic seriously.
>>>
>>> Downgradability is a worthy goal and is listed in the current roadmap. I
>>> would like to open a discussion here on how it would be achieved.
>>>
>>> My understanding of what has been suggested so far translates to:
>>> - avoid changes to sstable formats;
>>> - if there are changes, implement them in a way that is
>>> backwards-compatible, e.g. by duplicating data, so that a new version is
>>> presented in a component or portion of a component that legacy nodes will
>>> not try to read;
>>> - if the latter is not feasible, make sure the changes are only applied
>>> if a feature flag has been enabled.
>>>
>>> To me this approach introduces several risks:
>>> - it bloats file and parsing complexity;
>>> - it discourages improvement (e.g. CASSANDRA-17698 is no longer a LHF
>>> ticket once this requirement is in place);
>>> - it needs care to avoid risky solutions to address technical issues
>>> with the format versioning (e.g. staying on n-versions for 5.0 and needing
>>> a bump for a 4.1 bugfix might require porting over support for new
>>> features);
>>> - it requires separate and uncoordinated solutions to the problem and
>>> switching mechanisms for each individual change.
>>>
>>> An alternative solution is to implement/complete 

Re: Downgradability

2023-02-22 Thread Jeff Jirsa
When people are serious about this requirement, they'll build the downgrade equivalents of the upgrade tests and run them automatically, often, so people understand what the real gap is and when something new makes it break. Until those tests exist, I think collectively we should all stop pretending like this is dogma. Best effort is best effort.

On Feb 22, 2023, at 6:57 AM, Branimir Lambov wrote:

> > 1. Major SSTable changes should begin with forward-compatibility in a prior release.
>
> This requires "feature" changes, i.e. new non-trivial code for previous patch releases. It also entails porting over any further format modification.
>
> Instead of this, in combination with your second point, why not implement backwards write compatibility? The opt-in is then clearer to define (i.e. upgrades start with e.g. a "4.1-compatible" settings set that includes file format compatibility and disabling of new features; new nodes start with a "current" settings set). When the upgrade completes and the user is happy with the result, the settings set can be replaced.
>
> Doesn't this achieve what you want (and we all agree is a worthy goal) with much less effort for everyone? Supporting backwards-compatible writing is trivial, and we even have a proof-of-concept in the stats metadata serializer. It also simplifies by a serious margin the amount of work and thinking one has to do when a format improvement is implemented -- e.g. the TTL patch can just address this in exactly the way the problem was addressed in earlier versions of the format, by capping to 2038, without any need to specify, obey or test any configuration flags.
>
> > > It's a commitment, and it requires every contributor to consider it as part of work they produce.
> > But it shouldn't be a burden. Ability to downgrade is a testable problem, so I see this work as a function of the suite of tests the project is willing to agree on supporting.
>
> I fully agree with this sentiment, and I feel that the current "try to not introduce breaking changes" approach is adding the burden, but not the benefits -- because the latter cannot be proven, and are most likely already broken.
>
> Regards,
> Branimir
>
> On Wed, Feb 22, 2023 at 1:01 AM Abe Ratnofsky wrote:
>
> > Some interesting existing work on this subject is "Understanding and Detecting Software Upgrade Failures in Distributed Systems" - https://dl.acm.org/doi/10.1145/3477132.3483577, also summarized by Andrey Satarin here: https://asatarin.github.io/talks/2022-09-upgrade-failures-in-distributed-systems/
> >
> > They specifically tested Cassandra upgrades, and have a solid list of defects that they found. They also describe their testing mechanism DUPTester, which includes a component that confirms that the leftover state from one version can start up on the next version. There is a wider scope of upgrade defects highlighted in the paper, beyond SSTable version support. I believe the project would benefit from expanding our test suite similarly, by parametrizing more tests on upgrade version pairs.
> >
> > Also, per Benedict's comment:
> >
> > > It's a commitment, and it requires every contributor to consider it as part of work they produce.
> >
> > But it shouldn't be a burden. Ability to downgrade is a testable problem, so I see this work as a function of the suite of tests the project is willing to agree on supporting.
> >
> > Specifically - I agree with Scott's proposal to emulate the HDFS upgrade-then-finalize approach. I would also support automatic finalization based on a time threshold or similar, to balance the priorities of safe and straightforward upgrades. Users need to be aware of the range of SSTable formats supported by a given version, and how to handle when their SSTables wouldn't be supported by an upcoming upgrade.
> >
> > --Abe
-- 
Branimir Lambov
e. branimir.lam...@datastax.com
w. www.datastax.com
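The "capping to 2038" approach Branimir mentions for backwards-compatible writing can be sketched as follows, assuming (as in older sstable formats) a signed 32-bit local expiration time in seconds since the epoch. Names here are illustrative, not Cassandra's actual fields.

```python
# Sketch of backwards-compatible TTL handling: legacy sstable formats store
# the local expiration time as a signed 32-bit count of seconds, which
# overflows in January 2038. A legacy-compatible writer caps the value
# instead of writing an overflowed one. (Illustrative names only.)

MAX_INT32_SECONDS = 2**31 - 1  # 2038-01-19T03:14:07Z

def local_expiration_time(now_seconds, ttl_seconds, legacy_format):
    expiration = now_seconds + ttl_seconds
    if legacy_format:
        return min(expiration, MAX_INT32_SECONDS)  # cap for old readers
    return expiration

# A 20-year TTL written today overflows int32; the legacy writer caps it.
now = 1_700_000_000
ttl = 20 * 365 * 24 * 3600
print(local_expiration_time(now, ttl, legacy_format=True))   # capped at 2038
print(local_expiration_time(now, ttl, legacy_format=False))  # full value
```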



Re: Degradation of availability when using NTS and RF > number of racks

2023-03-06 Thread Jeff Jirsa
A huge number of people use this legal and unsafe combination - like anyone
running RF=3 in AWS us-west-1 (or any other region with only 2 accessible
AZs), and no patch is going to suddenly make that safe, and banning it
hurts users a lot.

If we're really going to ship a less-bad version of this, then that
less-bad version probably wants to reject invalid configs (like RF=3 with 2
racks), but again, it'll be approximately impossible for anyone to document
what it takes to move from the maybe-unsafe version to the definitely-safe
version without rewriting all of the data into the cluster, so most people
won't be able to use it anyway?
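The RF > racks failure mode discussed in this thread can be demonstrated with a toy simulation. This is a simplified approximation of NTS placement (real NTS back-fills skipped nodes differently, so the rate printed here is not the ~6% reported below in the thread), but the failure mode is the same: with RF=5 over 3 racks, some keys end up with 3 replicas in one rack, and losing that rack loses QUORUM.

```python
import random
from bisect import bisect_right

# Toy approximation of NetworkTopologyStrategy placement: walk the ring from
# the key's token and accept a node if its rack is not yet represented, or
# if every rack already holds a replica.

def build_ring(racks, nodes_per_rack, vnodes, rng):
    ring = []  # (token, (rack, node)) pairs
    for rack in range(racks):
        for node in range(nodes_per_rack):
            for _ in range(vnodes):
                ring.append((rng.random(), (rack, node)))
    ring.sort()
    return ring

def place(ring, key, rf, racks):
    chosen, seen_racks = [], set()
    i = bisect_right(ring, (key,))
    while len(chosen) < rf:
        _, (rack, node) = ring[i % len(ring)]
        if (rack, node) not in chosen and (rack not in seen_racks or len(seen_racks) == racks):
            chosen.append((rack, node))
            seen_racks.add(rack)
        i += 1
    return chosen

rng = random.Random(42)
ring = build_ring(racks=3, nodes_per_rack=3, vnodes=256, rng=rng)
rf, trials, quorum_busting = 5, 10_000, 0
for _ in range(trials):
    counts = {}
    for rack, _ in place(ring, rng.random(), rf, racks=3):
        counts[rack] = counts.get(rack, 0) + 1
    if max(counts.values()) >= 3:  # losing that rack drops QUORUM (3 of 5)
        quorum_busting += 1
print(f"quorum-busting placements: {quorum_busting / trials:.1%}")
```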





On Mon, Mar 6, 2023 at 8:31 AM Derek Chen-Becker 
wrote:

> 1) It does seem like a big footgun. I think it violates the principle of
> least surprise if someone has configured NTS thinking that they are
> improving availability
> 2) I don't know that we want to ban it outright, since maybe there's a
> case for someone to be using a different CL that would be OK with the loss
> of a majority of replicas (e.g. ONE). For example, we don't fail if someone
> uses ALL or EACH_QUORUM with a problematic setup, do we? Would we warn on
> keyspace creation with RF > racks or are you suggesting that the warning
> would be at query time?
> 3) agreed, this doesn't seem like an enhancement as much as it is
> identifying legal but likely incorrect configuration
>
> Cheers,
>
> Derek
>
> On Mon, Mar 6, 2023 at 3:52 AM Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
>
>> Hi all,
>>
>> some time ago we identified an issue with NetworkTopologyStrategy. The
>> problem is that when RF > number of racks, it may happen that NTS places
>> replicas in such a way that when whole rack is lost, we lose QUORUM and
>> data are not available anymore if QUORUM CL is used.
>>
>> To illustrate this problem, let's have this setup:
>>
>> 9 nodes in 1 DC, 3 racks, 3 nodes per rack. RF = 5. Then, NTS could place
>> replicas like this: 3 replicas in rack1, 1 replica in rack2, 1 replica in
>> rack3. Hence, when rack1 is lost, we do not have QUORUM.
>>
>> It seems to us that there is already some logic around this scenario (1)
>> but the implementation is not entirely correct. This solution is not
>> computing the replica placement correctly so the above problem would be
>> addressed.
>>
>> We created a draft here (2, 3) which fixes it.
>>
>> There is also a test which simulates this scenario. When I assign 256
>> tokens to each node randomly (by the same means as the generatetokens
>> command uses) and I try to compute natural replicas for 1 billion random
>> tokens, and I compute how many cases there are where 3 replicas out of 5
>> land in the same rack (so by losing it we would lose quorum), for the
>> above setup I get around 6%.
>>
>> For 12 nodes, 3 racks, 4 nodes per rack, rf = 5, this happens in 10%
>> cases.
>>
>> To interpret this number, it basically means that with such topology, RF
>> and CL, when a random rack fails completely, when doing a random read,
>> there is 6% chance that data will not be available (or 10%, respectively).
>>
>> One caveat here is that NTS is not compatible with this new strategy
>> anymore because it will place replicas differently. So I guess that fixing
>> this in NTS will not be possible because of upgrades. I think people would
>> need to setup completely new keyspace and somehow migrate data if they wish
>> or they just start from scratch with this strategy.
>>
>> Questions:
>>
>> 1) do you think this is meaningful to fix and it might end up in trunk?
>>
>> 2) should not we just ban this scenario entirely? It might be possible to
>> check the configuration upon keyspace creation (rf > num of racks) and if
>> we see this is problematic we would just fail that query? Guardrail maybe?
>>
>> 3) people in the ticket mention writing a "CEP" for this but I do not see
>> any reason to do so. It is just a strategy like any other. What would that
>> CEP even be about? Is this necessary?
>>
>> Regards
>>
>> (1)
>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L126-L128
>> (2) https://github.com/apache/cassandra/pull/2191
>> (3) https://issues.apache.org/jira/browse/CASSANDRA-16203
>
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
>
>


Re: Degradation of availability when using NTS and RF > number of racks

2023-03-07 Thread Jeff Jirsa
Anyone have stats on how many people use RF > 3 per DC? (I know what it looks like in my day job, but I don't want to pretend it's representative of the larger community.) I'm a fan of fixing this, but I do wonder how common this is in the wild.

On Mar 7, 2023, at 9:12 AM, Derek Chen-Becker wrote:

I think that the warning would only be thrown in the case where a potentially QUORUM-busting configuration is used. I think it would be a worse experience to not warn and let the user discover later when they can't write at QUORUM.

Cheers,
Derek

On Tue, Mar 7, 2023 at 9:32 AM Jeremiah D Jordan wrote:

I agree with Paulo; it would be nice if we could figure out some way to make the new NTS work correctly, with a parameter to fall back to the "bad" behavior, so that people restoring backups to a new cluster can get the right behavior to match their backups.

The problem with only fixing this in a new strategy is we have a ton of tutorials and docs out there which tell people to use NTS, so it would be great if we could keep "use NTS" as the recommendation. Throwing a warning when someone uses NTS is kind of user hostile. If someone just read some tutorial or doc which told them "make your keyspace this way" and then, when they do that, the database yells at them telling them they did it wrong, it is not a great experience.

-Jeremiah

> On Mar 7, 2023, at 10:16 AM, Benedict  wrote:
> 
> My view is that this is a pretty serious bug. I wonder if transactional metadata will make it possible to safely fix this for users without rebuilding (only via opt-in, of course).
> 
>> On 7 Mar 2023, at 15:54, Miklosovic, Stefan  wrote:
>> 
>> Thanks everybody for the feedback.
>> 
>> I think that emitting a warning upon keyspace creation (and alteration) should be enough for starters. If somebody cannot live without a 100% bulletproof solution, over time we might choose one of the offered approaches. As the saying goes, there is no silver bullet. If we decide to implement that new strategy, we would probably still emit the warnings on NTS; that part would already be done, so the new strategy would just be provided in addition.
>> 
>> 
>> From: Paulo Motta 
>> Sent: Monday, March 6, 2023 17:48
>> To: dev@cassandra.apache.org
>> Subject: Re: Degradation of availability when using NTS and RF > number of racks
>> 
>> It's a bit unfortunate that NTS does not maintain the ability to lose a rack without loss of quorum for RF > #racks > 2, since this can be easily achieved by evenly placing replicas across all racks.
>> 
>> Since RackAwareTopologyStrategy is a superset of NetworkTopologyStrategy, can't we just use the new correct placement logic for newly created keyspaces instead of having a new strategy?
>> 
>> The placement logic would be backwards-compatible for RF <= #racks. On upgrade, we could mark existing keyspaces with RF > #racks with use_legacy_replica_placement=true to maintain backwards compatibility and log a warning that the rack loss guarantee is not maintained for keyspaces created before the fix. Old keyspaces with RF <=#racks would still work with the new replica placement. The downside is that we would need to keep the old NTS logic around, or we could eventually deprecate it and require users to migrate keyspaces using the legacy placement strategy.
>> 
>> Alternatively we could have RackAwareTopologyStrategy and fail NTS keyspace creation for RF > #racks and indicate users to use RackAwareTopologyStrategy to maintain the quorum guarantee on rack loss or set an override flag "support_quorum_on_rack_loss=false". This feels a bit iffy though since it could potentially confuse users about when to use each strategy.
>> 
>> On Mon, Mar 6, 2023 at 5:51 AM Miklosovic, Stefan wrote:
>> Hi all,
>> 
>> some time ago we identified an issue with NetworkTopologyStrategy. The problem is that when RF > number of racks, it may happen that NTS places replicas in such a way that when whole rack is lost, we lose QUORUM and data are not available anymore if QUORUM CL is used.
>> 
>> To illustrate this problem, let's have this setup:
>> 
>> 9 nodes in 1 DC, 3 racks, 3 nodes per rack. RF = 5. Then, NTS could place replicas like this: 3 replicas in rack1, 1 replica in rack2, 1 replica in rack3. Hence, when rack1 is lost, we do not have QUORUM.
>> 
>> It seems to us that there is already some logic around this scenario (1) but the implementation is not entirely correct. This solution is not computing the replica placement correctly so the above problem would be addressed.
>> 
>> We created a draft here (2, 3) which fixes it.
>> 
>> There is also a test which simulates this scenario. When I assign 256 
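The fix discussed in this thread (whether in a new RackAwareTopologyStrategy or in corrected NTS placement logic) amounts to never letting one rack hold more replicas than its even share. A toy sketch of that invariant, under the same simplified ring model as above (this is illustrative only, not the actual CASSANDRA-16203 patch):

```python
import math
import random
from bisect import bisect_right

# Sketch of rack-even placement: walk the ring from the key's token, but cap
# each rack at ceil(RF / #racks) replicas. With RF=5 over 3 racks each rack
# holds at most 2 replicas, so losing a whole rack leaves >= 3 of 5 and
# QUORUM survives.

def build_ring(racks, nodes_per_rack, vnodes, rng):
    ring = []  # (token, (rack, node)) pairs
    for rack in range(racks):
        for node in range(nodes_per_rack):
            for _ in range(vnodes):
                ring.append((rng.random(), (rack, node)))
    ring.sort()
    return ring

def place_rack_even(ring, key, rf, racks):
    cap = math.ceil(rf / racks)
    chosen, per_rack = [], {}
    i = bisect_right(ring, (key,))
    while len(chosen) < rf:
        _, (rack, node) = ring[i % len(ring)]
        if (rack, node) not in chosen and per_rack.get(rack, 0) < cap:
            chosen.append((rack, node))
            per_rack[rack] = per_rack.get(rack, 0) + 1
        i += 1
    return chosen

rng = random.Random(7)
ring = build_ring(racks=3, nodes_per_rack=3, vnodes=256, rng=rng)
worst = 0
for _ in range(1000):
    counts = {}
    for rack, _ in place_rack_even(ring, rng.random(), rf=5, racks=3):
        counts[rack] = counts.get(rack, 0) + 1
    worst = max(worst, max(counts.values()))
print(worst)  # never exceeds ceil(5/3) = 2
```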

Re: [DISCUSS] Enhanced Disk Error Handling

2023-03-08 Thread Jeff Jirsa
On Wed, Mar 8, 2023 at 5:25 AM Bowen Song via dev 
wrote:

> At the moment, when a read error, such as unrecoverable bit error or data
> corruption, occurs in the SSTable data files, regardless of the
> disk_failure_policy configuration, manual (or to be precise, external)
> intervention is required to recover from the error.
>
> Commonly, there's two approach to recover from such error:
>
>1. The safer, but slower recover strategy: replace the entire node.
>2. The less safe, but faster recover strategy: shut down the node,
>delete the affected SSTable file(s), and then bring the node back online
>and run repair.
>
> Based on my understanding of Cassandra, it should be possible to recover
> from such error by marking the affected token range in the existing SSTable
> as "corrupted" and stop reading from them (e.g. creating a "bad block" file
> or in memory), and then streaming the affected token range from the healthy
> replicas. The corrupted SSTable file can then be removed upon the next
> successful compaction involving it, or alternatively an anti-compaction is
> performed on it to remove the corrupted data.
>
> The advantage of this strategy is:
>
>- Reduced node down time - node restart or replacement is not needed
>- Less data streaming is required - only the affected token range
>- Faster recovery time - less streaming and delayed compaction or
>anti-compaction
>- No less safe than replacing the entire node
>- This process can be automated internally, removing the need for
>operator inputs
>
> The disadvantage is added complexity on the SSTable read path and it may
> mask disk failures from the operator who is not paying attention to it.
>
> What do you think about this?
>

In a database with a shard manager rather than token range, you'd mark that
shard as dead and re-replicate it.

With the token range, if you COULD mark that token range as dead, and
re-bootstrap it (which today means a repair of the range followed by a
stream from a source that's not corrupt), you could do the same thing, but
there's no current way to mark that range as offline on a specific replica.

When transactional metadata (CEP-21) is done, a future version of it
should make sure we eventually allow us to move past the token ring to
something that approximates this for cassandra (because doing so would also
allow us to expand/shrink faster, and handle other types of hot-ranges
better as well).
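Bowen's proposal above amounts to a per-sstable "bad block" registry: record the corrupted token ranges, stop serving local reads from them, and report what needs re-streaming. A toy sketch of that bookkeeping (all names are hypothetical; Cassandra has no such registry today):

```python
# Toy sketch of the "bad block" idea: mark corrupted token ranges per
# sstable, refuse local reads inside them, and expose the union of ranges
# that must be repaired/streamed from healthy replicas.

class CorruptedRanges:
    def __init__(self):
        self._ranges = {}  # sstable name -> list of (start_token, end_token)

    def mark(self, sstable, start, end):
        self._ranges.setdefault(sstable, []).append((start, end))

    def is_readable(self, sstable, token):
        # A token is readable unless it falls in a marked range.
        return not any(s <= token <= e for s, e in self._ranges.get(sstable, []))

    def ranges_to_restream(self):
        # All corrupted ranges, to be re-bootstrapped from other replicas.
        return sorted(r for rs in self._ranges.values() for r in rs)

reg = CorruptedRanges()
reg.mark("nb-42-big-Data.db", start=100, end=200)  # e.g. after a checksum failure
print(reg.is_readable("nb-42-big-Data.db", 150))   # False: serve from peers
print(reg.is_readable("nb-42-big-Data.db", 250))   # True: unaffected range
print(reg.ranges_to_restream())                    # [(100, 200)]
```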


Re: [DISCUSS] CEP-26: Unified Compaction Strategy

2023-03-17 Thread Jeff Jirsa



> On Mar 17, 2023, at 1:46 PM, Jeremy Hanna  wrote:
> 
> 
> 
> One much more graceful element of UCS is that instead of what was previously 
> done with compaction strategies where the server just shuts down when running 
> out of space - forcing system administrators to be paranoid about headroom.  
> Instead UCS has a target overhead (default 20%).  First since the ranges are 
> sharded, it makes it less likely that there will be large sstables that need 
> to get compacted to require as much headroom, but  if it detects that there 
> is a compaction that will violate the target overhead, it will log that and 
> skip the compaction - a much more graceful way of handling it.

Skipping doesn't really handle it though?

If you have a newly flushed sstable full of tombstones, and it somehow
triggers you to exceed that target overhead, you never free that space?
Usually LCS would try to reduce the scope of the compaction, and I assume
UCS will too?




Re: [DISCUSS] CEP-26: Unified Compaction Strategy

2023-03-17 Thread Jeff Jirsa
I'm without laptop this week, but it looks like
CompactionTask#reduceScopeForLimitedSpace.

So maybe it just comes for free with UCS.


> On Mar 17, 2023, at 6:21 PM, Jeremy Hanna  wrote:
> 
> You're right that it doesn't handle it in the sense that it doesn't resolve 
> it the problem, but it also doesn't do what STCS does.  From what I've seen, 
> STCS blindly tries to compact and then the disk will fill up triggering the 
> disk failure policy.  With UCS it's much less likely and if it does happen, 
> my understanding is that it will skip the compaction.  I didn't realize that 
> LCS would try to reduce the scope of the compaction.  I can't find in the 
> branch where it handles that.
> 
> Branimir, can you point to where it handles the scenario?
> 
> Thanks,
> 
> Jeremy
> 
>>> On Mar 17, 2023, at 4:52 PM, Jeff Jirsa  wrote:
>>> 
>>> 
>>> 
>>>> On Mar 17, 2023, at 1:46 PM, Jeremy Hanna  
>>>> wrote:
>>> 
>>> 
>>> 
>>> One much more graceful element of UCS is that instead of what was 
>>> previously done with compaction strategies where the server just shuts down 
>>> when running out of space - forcing system administrators to be paranoid 
>>> about headroom.  Instead UCS has a target overhead (default 20%).  First 
>>> since the ranges are sharded, it makes it less likely that there will be 
>>> large sstables that need to get compacted to require as much headroom, but  
>>> if it detects that there is a compaction that will violate the target 
>>> overhead, it will log that and skip the compaction - a much more graceful 
>>> way of handling it.
>> 
>> Skipping doesn’t really handle it though? 
>> 
>> If you have a newly flushed sstable full of tombstones and it naturally 
>> somehow triggers you to exceed that target overhead you never free that 
>> space? Usually LCS would try to reduce the scope of the compaction, and I 
>> assume UCS will too? 
>> 
>> 
> 
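The scope-reduction behavior referenced above can be sketched roughly like this. It is a simplification of what I understand Cassandra's `CompactionTask#reduceScopeForLimitedSpace` to do (repeatedly shrink the compaction rather than fail when the estimated output will not fit on disk); sizes and the "output roughly equals sum of inputs" estimate are assumptions for illustration.

```python
# Simplified sketch of the reduce-scope idea: if the estimated output of a
# compaction does not fit in the remaining disk space, drop the largest
# input sstable and retry, instead of failing or filling the disk.

def reduce_scope(sstable_sizes, free_space):
    inputs = sorted(sstable_sizes, reverse=True)  # largest first
    while inputs and sum(inputs) > free_space:
        inputs.pop(0)  # shed the largest remaining input and re-estimate
    return inputs

# Four sstables, but only room for ~150 units of output:
print(reduce_scope([100, 80, 40, 20], free_space=150))  # [80, 40, 20]
```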


Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-28 Thread Jeff Jirsa
On Tue, Mar 28, 2023 at 7:30 AM Jeremiah D Jordan 
wrote:

> - Resources isolation. Having the said service running within the same JVM
> may negatively impact Cassandra storage's performance. It could be more
> beneficial to have them in Sidecar, which offers strong resource isolation
> guarantees.
>
>
> How does having this in a side car change the impact on “storage
> performance”?  The side car reading sstables will have the same impact on
> storage IO as the main process reading sstables.
>

This is true.


>  Given the sidecar is running on the same node as the main C* process, the
> only real resource isolation you have is in heap/GC?  CPU/Memory/IO are all
> still shared between the main C* process and the side car, and coordinating
> those across processes is harder than coordinating them within a single
> process. For example if we wanted to have the compaction throughput,
> streaming throughput, and analytics read throughput all tied back to a
> single disk IO cap, that is harder with an external process.
>

Relatively trivial, for CPU and memory, to run them in different
containers/cgroups/etc, so you can put an exact cpu/memory limit on the
sidecar. That's different from a jmx rate limiter/throttle, but (arguably)
more precise, because it actually limits the underlying physical resource
instead of a proxy for it in a config setting.
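As a concrete illustration of that point, capping the sidecar at the OS level means writing limits into its cgroup (or passing them to the container runtime) rather than tuning a throttle inside the JVM. The helper below is an invented sketch assuming a Linux host with cgroup v2 — it is not Sidecar configuration — showing how human-friendly caps translate into the values cgroup v2 expects.

```python
def cgroup_v2_limits(cpus: float, memory_gib: float, period_us: int = 100_000):
    """Translate human-friendly caps into cgroup v2 file values.

    cpu.max takes "<quota> <period>" in microseconds; memory.max takes
    bytes.  Writing these into the sidecar process's cgroup caps the
    physical resource itself, rather than a proxy for it in a config
    setting.
    """
    quota_us = int(cpus * period_us)
    return {
        "cpu.max": f"{quota_us} {period_us}",
        "memory.max": str(int(memory_gib * 2**30)),
    }

# Cap a hypothetical sidecar at 2 cores and 4 GiB:
limits = cgroup_v2_limits(cpus=2, memory_gib=4)
print(limits["cpu.max"])     # 200000 100000
print(limits["memory.max"])  # 4294967296
```

The same numbers can equally be handed to `docker run --cpus 2 --memory 4g` or a systemd slice; the point is that the cap lives outside the JVM.
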



>
> - Complexity. Considering the existence of the Sidecar project, it would
> be less complex to avoid adding another (http?) service in Cassandra.
>
>
> Not sure that is really very complex, running an http service is a pretty
> easy?  We already have netty in use to instantiate one from.
> I worry more about the complexity of having the matching schema for a set
> of sstables being read.  The complexity of new sstable versions/formats
> being introduced.  The complexity of having up to date data from memtables
> being considered by this API without having to flush before every query of
> it.  The complexity of dealing with the new memtable API introduced in
> CEP-11.  The complexity of coordinating compaction/streaming adding and
> removing files with these APIs reading them.  There are a lot of edge cases
> to consider for this external access to sstables that the main process
> considers itself the “owner” of.
>
> All of this is not to say that I think separating things out into other
> processes/services is bad.  But I think we need to be very careful with how
> we do it, or end users will end up running into all the sharp edges and the
> feature will fail.
>
> -Jeremiah
>
> On Mar 24, 2023, at 8:15 PM, Yifan Cai  wrote:
>
> Hi Jeremiah,
>
> There are good reasons to not have these inside Cassandra. Consider the
> following.
> - Resources isolation. Having the said service running within the same JVM
> may negatively impact Cassandra storage's performance. It could be more
> beneficial to have them in Sidecar, which offers strong resource isolation
> guarantees.
> - Availability. If the Cassandra cluster is being bounced, using sidecar
> would not affect the SBR/SBW functionality, e.g. SBR can still read
> SSTables via sidecar endpoints.
> - Compatibility. Sidecar provides stable REST-based APIs, such as
> uploading SSTables endpoint, which would remain compatible with different
> versions of Cassandra. The current implementation supports versions 3.0 and
> 4.0.
> - Complexity. Considering the existence of the Sidecar project, it would
> be less complex to avoid adding another (http?) service in Cassandra.
> - Release velocity. Sidecar, as an independent project, can have a quicker
> release cycle from Cassandra.
> - The features in sidecar are mostly implemented based on various existing
> tools/APIs exposed from Cassandra, e.g. ring, commit sstable, snapshot, etc.
>
> Regarding authentication and authorization
> - We will add it as a follow-on CEP in Sidecar, but we don't want to hold
> up this CEP. It would be a feature that benefits all Sidecar endpoints.
>
> - Yifan
>
> On Fri, Mar 24, 2023 at 2:43 PM Doug Rohrer  wrote:
>
>> I agree that the analytics library will need to support vnodes. To be
>> clear, there’s nothing preventing the solution from working with vnodes
>> right now, and no assumptions about a 1:1 topology between a token and a
>> node. However, we don’t, today, have the ability to test vnode support
>> end-to-end. We are working towards that, however, and should be able to
>> remove the caveat from the released analytics library once we can properly
>> test vnode support.
>> If it helps, I can update the CEP to say something more like “Caveat:
>> Currently untested with vnodes - work is ongoing to remove this limitation”
>> if that helps?
>>
>> Doug
>>
>> > On Mar 24, 2023, at 11:43 AM, Brandon Williams 
>> wrote:
>> >
>> > On Fri, Mar 24, 2023 at 10:39 AM Jeremiah D Jordan
>> >  wrote:
>> >>
>> >> I have concerns with the majority of this being in the sidecar and not
>> in the database itself.  I think it would make sense for the serv

Re: [DISCUSS] Introduce DATABASE as an alternative to KEYSPACE

2023-04-04 Thread Jeff Jirsa
KEYSPACE at least makes sense in the context that it is the unit that
defines how those partition keys are going to be treated/replicated

DATABASE may be ambiguous, but it's ambiguity shared across the industry.

Creating a new name like TABLESPACE or TABLEGROUP sounds horrible because
it'll be unique to us in the world, and therefore unintuitive for new users.



On Tue, Apr 4, 2023 at 9:36 AM Josh McKenzie  wrote:

> I think there's competing dynamics here.
>
> 1) KEYSPACE isn't that great of a name; it's not a space in which keys are
> necessarily unique, and you can't address things just by key w/out their
> respective tables
> 2) DATABASE isn't that great of a name either due to the aforementioned
> ambiguity.
>
> Something like "TABLESPACE" or 'TABLEGROUP" would *theoretically* better
> satisfy point 1 and 2 above but subjectively I kind of recoil at both
> equally. So there's that.
>
> On Tue, Apr 4, 2023, at 12:30 PM, Abe Ratnofsky wrote:
>
> I agree with Bowen - I find Keyspace easier to communicate with. There are
> plenty of situations where the use of "database" is ambiguous (like "Could
> you help me connect to database x?"), but Keyspace refers to a single
> thing. I think more software is moving towards calling these things
> "namespaces" (like Kubernetes), and while "Keyspaces" is not a term used in
> this way elsewhere, I still find it leads to clearer communication.
>
> --
> Abe
>
>
> On Apr 4, 2023, at 9:24 AM, Andrés de la Peña 
> wrote:
>
> I think supporting DATABASE is a great idea.
>
> It's better aligned with SQL databases, and can save new users one of the
> first troubles they find.
>
> Probably anyone starting to use Cassandra for the first time is going to
> face the what is a keyspace? question in the first minutes. Saving that to
> users with a more common name would be a victory for usability IMO.
>
> On Tue, 4 Apr 2023 at 16:48, Mike Adamson  wrote:
>
> Hi,
>
> I'd like to propose that we add DATABASE to the CQL grammar as an
> alternative to KEYSPACE.
>
> Background: While TABLE was introduced as an alternative for COLUMNFAMILY
> in the grammar we have kept KEYSPACE for the container name for a group of
> tables. Nearly all traditional SQL databases use DATABASE as the container
> name for a group of tables so it would make sense for Cassandra to adopt
> this naming as well.
>
> KEYSPACE would be kept in the grammar but we would update some logging and
> documentation to encourage use of the new name.
>
> Mike Adamson
>
> --
> *Mike Adamson*
> Engineering, DataStax (datastax.com)
>
>
>


Re: [DISCUSS] New data type for vector search

2023-04-27 Thread Jeff Jirsa
On Thu, Apr 27, 2023 at 7:39 AM Jonathan Ellis  wrote:

> It's been a while, so I may be missing something, but do we already have
> fixed-size lists?  If not, I don't see why we'd try to make this fit into a
> List-shaped problem.
>

We do not. The proposal got closed as wont-fix
https://issues.apache.org/jira/browse/CASSANDRA-9110


Re: [DISCUSS] The future of CREATE INDEX

2023-05-10 Thread Jeff Jirsa
Changes like this always scare me, but the benefits probably outweigh the
risks. Probably obvious to whoever implements it, but please make sure that
if this happens it is super visible in NEWS, and that it simultaneously
updates the to-string / to-cql representation of the schema in cqlsh /
drivers / snapshots.

On Wed, May 10, 2023 at 8:27 PM Patrick McFadin  wrote:

> Having pulled a lot of developers out of the 2i fire, I would love it if
> defaults got a bit more sane. Adding USING...WITH... on CREATE INDEX
> seems like the right move for most developers that don't read docs and
> assume behavior.
>
> As much as I hate that 2i would be the configured default, I get it. New
> feature and this is the right thing for users. Would there be any way to
> switch 2i to SAI for the same index declaration? That would make for a nice
> upgrade for users moving to 5 without having to re-create indexes.
>
> Patrick
>
> On Wed, May 10, 2023 at 9:28 AM David Capwell  wrote:
>
>> Having to revert to CREATE CUSTOM INDEX sounds pretty awful, so I'd
>> prefer allowing USING...WITH... for CREATE INDEX
>>
>>
>> I have 0 issues with a new syntax to make this more clear
>>
>> just deprecating CREATE CUSTOM INDEX (at least after 5.0), but that's
>> more or less what my original proposal was above (modulo the configurable
>> default).
>>
>>
>> I have 0 issues deprecating and producing a ClientWarning recommending
>> the new syntax, but I would be against removing this syntax later on… it
>> should be low effort to keep, so breaking a user would not be desirable for
>> me.
>>
>> change only the fact that CREATE INDEX retains a configurable default
>>
>>
>> This option allows users to control this behavior, and allows us to
>> change the default over time.  For 5.0 I am strongly against SAI being the
>> default (new features disabled by default), but I wouldn’t have issues in
>> later versions changing the default once its been out for awhile.
>>
>> I’m not convinced by the changing defaults argument here. The
>> characteristics of the two index types are very different, and users with
>> scripts that make indexes today shouldn’t have their behaviour change.
>>
>>
>> In my mind this is no different from defaulting to BTI in a follow up
>> release, but if this concern is that the legacy index leaked details such
>> as index tables, so changing the default would have side effects in the
>> public domain that users might not expect, then I get it… are there other
>> concerns?
>>
>> On May 10, 2023, at 9:03 AM, Caleb Rackliffe 
>> wrote:
>>
>> tl;dr If you take my original proposal and change only the fact that CREATE
>> INDEX retains a configurable default, I think we get to the same place?
>>
>> (Then it's just a matter of what we do in 5.0 vs. after 5.0...)
>>
>> On Wed, May 10, 2023 at 11:00 AM Caleb Rackliffe <
>> calebrackli...@gmail.com> wrote:
>>
>>> I see a broad desire here to have a configurable (YAML) default
>>> implementation for CREATE INDEX. I'm not strongly opposed to that, as
>>> the concept of a default index implementation is pretty standard for most
>>> DBMS (see Postgres, etc.). However, keep in mind that if we do that, we
>>> still need to either revert to CREATE CUSTOM INDEX or add the
>>> USING...WITH... extensions to CREATE INDEX to override the default or
>>> specify parameters, which will be in play once SAI supports basic text
>>> tokenization/filtering. Having to revert to CREATE CUSTOM INDEX sounds
>>> pretty awful, so I'd prefer allowing USING...WITH... for CREATE INDEX
>>> and just deprecating CREATE CUSTOM INDEX (at least after 5.0), but
>>> that's more or less what my original proposal was above (modulo the
>>> configurable default).
>>>
>>> Thoughts?
>>>
>>> On Wed, May 10, 2023 at 2:59 AM Benedict  wrote:
>>>
 I’m not convinced by the changing defaults argument here. The
 characteristics of the two index types are very different, and users with
 scripts that make indexes today shouldn’t have their behaviour change.

 We could introduce new syntax that properly appreciates there’s no
 default index, perhaps CREATE LOCAL [type] INDEX? To also make clear that
 these indexes involve a partition key or scatter gather

 On 10 May 2023, at 06:26, guo Maxwell  wrote:

 
 +1, as we must improve the image of our own default indexing ability.

 And as for *CREATE CUSTOM INDEX*, should we just leave it as it is, and
 disable the ability to create SAI through *CREATE CUSTOM INDEX* in some
 version after 5.0?

 For as far as I know there may be users using this as a plugin-index
 interface, like https://github.com/Stratio/cassandra-lucene-index
 (though these projects may be inactive; but if someone wants to do something
 similar in the future, we don't have to stop them).



 Jonathan Ellis  于2023年5月10日周三 10:01写道:

> +1 for this, especially in the long term.  CREATE INDEX should do the
> right thing for most people w
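The proposal being converged on reduces to a small resolution rule: CREATE INDEX without a USING clause picks whatever implementation the yaml default names, while an explicit USING (or the legacy CREATE CUSTOM INDEX) overrides it. A rough sketch of that resolution logic — the function, map keys, and default value here are invented for illustration and are not the actual schema code or yaml setting:

```python
def resolve_index_class(using_clause=None, yaml_default="legacy_2i"):
    """CREATE INDEX ... [USING <impl>]: an explicit USING clause wins;
    otherwise fall back to the cluster-wide configured default."""
    known = {"legacy_2i": "CassandraIndex", "sai": "StorageAttachedIndex"}
    choice = using_clause or yaml_default
    if choice not in known:
        raise ValueError(f"unknown index implementation: {choice}")
    return known[choice]

print(resolve_index_class())        # CassandraIndex (2i stays the default)
print(resolve_index_class("sai"))   # StorageAttachedIndex
# An operator can later flip the default without changing user DDL:
print(resolve_index_class(yaml_default="sai"))  # StorageAttachedIndex
```
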

Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jeff Jirsa
On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer  wrote:

> If you have rows that vary significantly in their size, your latencies
>> could end up being pretty unpredictable using a LIMIT BY . Being
>> able to specify a limit by bytes at the driver / API level would allow app
>> devs to get more deterministic results out of their interaction w/the DB if
>> they're looking to respond back to a client within a certain time frame and
>> / or determine next steps in the app (continue paging, stop, etc) based on
>> how long it took to get results back.
>
>
> Are you talking about the page size or the LIMIT. Once the LIMIT is
> reached there is no "continue paging". LIMIT is also at the CQL level not
> at the driver level.
> I can totally understand the need for a page size in bytes not for a LIMIT.
>

I would only ever expect to see a page size in bytes, never a LIMIT
specifying bytes.

I know the C-11745 ticket says LIMIT, too, but that feels very odd to me.
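The reason a byte-based page size is the natural fit: rows vary wildly in size, so cutting pages by bytes rather than by row count bounds the work per round trip. An illustrative sketch of how a pager might cut pages by bytes — not the actual paging code in Cassandra or the drivers:

```python
def paginate_by_bytes(rows, page_size_bytes):
    """Yield pages of rows, cutting each page once its serialized size
    reaches the byte budget (always including at least one row, so a
    single oversized row still makes progress)."""
    page, page_bytes = [], 0
    for row in rows:
        page.append(row)
        page_bytes += len(row)
        if page_bytes >= page_size_bytes:
            yield page
            page, page_bytes = [], 0
    if page:
        yield page

# Mixed row sizes: a 10-byte budget cuts pages by size, not row count.
rows = [b"x" * 8, b"x" * 1, b"x" * 1, b"x" * 100, b"x" * 2]
pages = list(paginate_by_bytes(rows, page_size_bytes=10))
print([len(p) for p in pages])  # [3, 1, 1]
```

A LIMIT, by contrast, is a logical cap on total rows returned by the query; once it is reached there is nothing left to page.
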


Re: [DISCUSSION] Adding sonar report analysis to the Cassandra project

2023-06-12 Thread Jeff Jirsa
On Mon, Jun 12, 2023 at 10:18 AM Mick Semb Wever  wrote:

>
>
> On Mon, 12 Jun 2023 at 15:02, Maxim Muzafarov  wrote:
>
>> Hello everyone,
>>
>> I would like to make the source code of the Cassandra project more
>> visible to people outside of the Cassandra Community and highlight the
>> typical known issues in new contributions in the GitHub pull-request
>> interface as well. This makes it easier for those who are unfamiliar
>> with the accepted code style and just want to be part of a large and
>> friendly community to add new contributions.
>>
>> The ASF provides [1] the SonarClound facilities for the open source
>> project, which are free to use, and we can also easily add the process
>> of building and uploading reports to the build using GitHub actions
>> with almost no maintenance costs for us. Of course, as a
>> recommendation quality tool, it doesn't reject any changes/pull
>> requests, so nothing will change from that perspective.
>>
>> I've prepared everything we need to do this here (we also need to
>> modify the default Sonar Way profile to suit our needs, which I can't
>> do as I don't have sufficient privileges):
>> https://issues.apache.org/jira/browse/CASSANDRA-18390
>>
>> I look forward to hearing your thoughts on this.
>>
>
>
> Looks good.  Agree with the use of GHA, but it's worth noting that this
> cannot be a pre-commit gate – as PRs are not required.  And if it came as
> part of pre-commit CI, how would the feedback then work (as it's the jira
> ticket that is our point-of-contact pre-commit) ?
>
> I say go for it.  Especially with the post-commit trends it will be
> valuable for us to see it before further adoption and adjustment.
>

I'd also say the same - Go for it, at worst people can ignore it, at best
someone sees the data and decides to take action.

If we eventually try to define a POLICY based on the feedback, I suspect
it'll be a longer  conversation, but I don't see any harm in setting it up.


Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-13 Thread Jeff Jirsa
+1


On Tue, Jun 13, 2023 at 7:15 AM Jeremy Hanna 
wrote:

> Calling for a vote on CEP-8 [1].
>
> To clarify the intent, as Benjamin said in the discussion thread [2], the
> goal of this vote is simply to ensure that the community is in favor of
> the donation. Nothing more.
> The plan is to introduce the drivers, one by one. Each driver donation
> will need to be accepted first by the PMC members, as it is the case for
> any donation. Therefore the PMC should have full control on the pace at
> which new drivers are accepted.
>
> If this vote passes, we can start this process for the Java driver under
> the direction of the PMC.
>
> Jeremy
>
> 1.
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>


Re: [DISCUSS] Using ACCP or tc-native by default

2023-06-22 Thread Jeff Jirsa
Either would be better than today.

On Thu, Jun 22, 2023 at 1:57 PM Jordan West  wrote:

> Hi,
>
> I’m wondering if there is appetite to change the default SSL provider for
> Cassandra going forward to either ACCP [1] or tc-native in Netty? Our
> deployment as well as others I’m aware of make this change in their fork
> and it can lead to significant performance improvement. When recently
> qualifying 4.1 without using ACCP (by accident) we noticed p99 latencies
> were 2x higher than 3.0 w/ ACCP. Wiring up ACCP can be a bit of a pain and
> also requires some amount of customization. I think it could be great for
> the wider community to adopt it.
>
> The biggest hurdle I foresee is licensing but ACCP is Apache 2.0 licensed.
> Anything else I am missing before opening a JIRA and submitting a patch?
>
> Jordan
>
>
> [1]
> https://github.com/corretto/amazon-corretto-crypto-provider
>


Re: Improved DeletionTime serialization to reduce disk size

2023-06-29 Thread Jeff Jirsa
On Thu, Jun 22, 2023 at 11:23 PM Berenguer Blasi 
wrote:

> Hi all,
>
> Given we're already introducing a new sstable format (OA) in 5.0 I would
> like to try to get this in before the freeze. The point being that
> sstables with lots of small partitions would benefit from a smaller DT
> at partition level. My tests show a 3%-4% size reduction on disk.
>


3-4% reduction on disk ... for what exactly?

It seems exceptionally uncommon to have 3% of your data SIZE be tombstones.

Is this enhancement driven by a pathological data model that's like "mostly
tiny records OR tombstones" ?
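For context on where a few percent could come from: a partition-level DeletionTime is a fixed 12 bytes on disk today (an 8-byte markedForDeleteAt plus a 4-byte localDeletionTime), even when the partition is live and both fields hold sentinel values. The sketch below illustrates the general shape of such an optimisation — collapse the common live case to a single flag byte — with invented framing; it is not the proposed patch. With millions of tiny partitions per sstable, saving ~11 bytes per partition adds up.

```python
import struct

LIVE_MARKED_FOR_DELETE_AT = -(2**63)  # Long.MIN_VALUE sentinel: no tombstone
LIVE_LOCAL_DELETION_TIME = 2**31 - 1  # Integer.MAX_VALUE sentinel

def serialize_deletion_time(marked_for_delete_at, local_deletion_time):
    """Live partitions (the common case) collapse to a single flag byte;
    real tombstones pay the flag plus the original 12 bytes."""
    if (marked_for_delete_at == LIVE_MARKED_FOR_DELETE_AT
            and local_deletion_time == LIVE_LOCAL_DELETION_TIME):
        return b"\x00"
    return b"\x01" + struct.pack(">qi", marked_for_delete_at, local_deletion_time)

live = serialize_deletion_time(LIVE_MARKED_FOR_DELETE_AT, LIVE_LOCAL_DELETION_TIME)
dead = serialize_deletion_time(1_687_000_000_000, 1_687_000_000)
print(len(live), len(dead))  # 1 13
```

So the saving is not "3% of the data is tombstones"; it is fixed per-partition overhead that dominates when partitions are tiny.
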


Re: Removal of CloudstackSnitch

2023-07-10 Thread Jeff Jirsa
+1


On Mon, Jul 10, 2023 at 8:42 AM Josh McKenzie  wrote:

> 2) keep it there in 5.0 but mark it @Deprecated
>
> I'd say Deprecate, log warnings that it's not supported nor maintained and
> people to use it at their own risk, and that it's going to be removed.
>
> That is, assuming the maintenance burden of it isn't high. I assume not
> since, as Brandon said, they're quite pluggable and well modularized.
>
> On Mon, Jul 10, 2023, at 9:57 AM, Brandon Williams wrote:
>
> I agree with Ekaterina, but also want to point out that snitches are
> pluggable, so whatever we do should be pretty safe.  If someone
> discovers after the removal that they need it, they can just plug it
> back in.
>
> Kind Regards,
> Brandon
>
> On Mon, Jul 10, 2023 at 8:54 AM Ekaterina Dimitrova
>  wrote:
> >
> > Hi Stefan,
> >
> > I think we should follow our deprecation rules and deprecate it in 5.0,
> potentially remove in 6.0. (Deprecate in one major, remove in the next
> major)
> > Maybe the deprecation can come with a note on your findings for the
> users, just in case someone somewhere uses it and did not follow the user
> mailing list?
> >
> > Thank you
> > Ekaterina
> >
> > On Mon, 10 Jul 2023 at 9:47, Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
> >>
> >> Hi list,
> >>
> >> I want to ask about the future of CloudstackSnitch.
> >>
> >> This snitch was added 9 years ago (1). I contacted the original author
> of that snitch, Pierre-Yves Ritschard, who is currently CEO of a company he
> coded that snitch for.
> >>
> >> In a nutshell, Pierre answered that he does not think this snitch is
> relevant anymore and the company is using different way how to fetch
> metadata from a node, rendering CloudstackSnitch, as is, irrelevant for
> them.
> >>
> >> I also wrote an email to user ML list (2) about two weeks ago and
> nobody answered that they are using it either.
> >>
> >> The current implementation is using this approach (3) but I think that
> it is already obsolete in the snitch, because the snitch is adding a path
> to the parsed metadata service IP which is probably not there at all in
> the default implementation of the Cloudstack data server.
> >>
> >> What also bothers me is that we, as a community, seem to not be able to
> test the functionality of this snitch as I do not know anybody with a
> Cloudstack deployment who would be able to test this reliably.
> >>
> >> For completeness, in (1), Brandon expressed his opinion that unless
> users come forward for this snitch, he thinks the retiring it is the best
> option.
> >>
> >> For all cloud-based snitches, we did the refactoring of the code in
> CASSANDRA-16555, and we are working on improvements in CASSANDRA-18438,
> which introduces a generic way in which metadata services are called;
> plugging in custom logic or reusing a default implementation of a cloud
> connector is very easy, further making this snitch less relevant.
> >>
> >> This being said, should we:
> >>
> >> 1) remove it in 5.0
> >> 2) keep it there in 5.0 but mark it @Deprecated
> >> 3) keep it there
> >>
> >> Regards
> >>
> >> (1) https://issues.apache.org/jira/browse/CASSANDRA-7147
> >> (2) https://lists.apache.org/thread/k4woljlk23m2oylvrbnod6wocno2dlm3
> >> (3)
> https://docs.cloudstack.apache.org/en/latest/adminguide/virtual_machines/user-data.html#determining-the-virtual-router-address-without-dns
>
>
>


Re: [VOTE] Release Apache Cassandra 5.0-alpha1

2023-08-25 Thread Jeff Jirsa
Given the disclaimer, can you just confirm why we'd cut an alpha now -
we're trying to lock protocols and give other people an integration target,
presumably?


On Fri, Aug 25, 2023 at 8:14 AM Mick Semb Wever  wrote:

>
> Proposing the test build of Cassandra 5.0-alpha1 for release.
>
> DISCLAIMER, this alpha release does not contain the expected 5.0
> features: Vector Search (CEP-30), Transactional Cluster Metadata
> (CEP-21) and Accord Transactions (CEP-15).  These features will land
> in a later alpha release.
>
> Please also note that this is an alpha release and what that means,
> further info at
> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>
> sha1: 62cb03cc7311384db6619a102d1da6a024653fa6
> Git: https://github.com/apache/cassandra/tree/5.0-alpha1-tentative
> Maven Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1314/org/apache/cassandra/cassandra-all/5.0-alpha1/
>
> The Source and Build Artifacts, and the Debian and RPM packages and
> repositories, are available here:
> https://dist.apache.org/repos/dist/dev/cassandra/5.0-alpha1/
>
> The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no -1's.
>
> [1]: CHANGES.txt:
> https://github.com/apache/cassandra/blob/5.0-alpha1-tentative/CHANGES.txt
> [2]: NEWS.txt:
> https://github.com/apache/cassandra/blob/5.0-alpha1-tentative/NEWS.txt
>


Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Jeff Jirsa
Just to be clear - this repo?
https://github.com/haifengl/smile/blob/master/LICENSE

That shows GPL + Commercial?



On Wed, Sep 13, 2023 at 9:10 AM Brandon Williams  wrote:

> I don't see any problem with this, +1.
>
> Kind Regards,
> Brandon
>
>
> On Wed, Sep 13, 2023 at 11:09 AM Mike Adamson 
> wrote:
>
>> CEP-30: [Approximate Nearest Neighbor(ANN) Vector Search via
>> Storage-Attached Indexes] uses the smile-nlp library
>> (com.github.haifengl.smile-nlp) in its testing to allow the creation of
>> word2vec embeddings for valid input into the HNSW graph index.
>>
>> The reason for this library is that we found that using random vectors in
>> testing produced very inconsistent results. Using the smile-nlp word2vec
>> implementation with the glove.3k.50d library produces repeatable results.
>>
>> Does anyone have any objections to the use of this library as a test only
>> dependency?
>> --
>> *Mike Adamson*
>> Engineering, DataStax (datastax.com)
>>
>>


Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Jeff Jirsa
You can open a legal JIRA to confirm, but based on my understanding (and
re-confirming reading https://www.apache.org/legal/resolved.html#category-a
):

- The restriction is on what can be in A PROJECT. tests are part of the
project, and not distinguished from the compiled product (especially since
the PROJECT ships SOURCE to build the product, if the SOURCE to build
requires the test, the test is clearly non-optional)
- GPL is cat X https://www.apache.org/legal/resolved.html#category-x

Cat X mixes "project" and "product" a few times, but again, the product is
still the source distribution, which would include the test, which means
it's excluded.

"Apache projects may not distribute Category X licensed components, in
source or binary form" doesn't seem ambiguous to me, but if someone wants to
ask ASF legal if I'm wrong, that's totally ok.



On Wed, Sep 13, 2023 at 9:25 AM Ekaterina Dimitrova 
wrote:

> Jeff, isn’t this ok as long as it is used only in tests? If we are not
> sure we can open a Jira to legal?
>
> On Wed, 13 Sep 2023 at 12:23, Jeff Jirsa  wrote:
>
>> Just to be clear - this repo?
>> https://github.com/haifengl/smile/blob/master/LICENSE
>>
>> That shows GPL + Commercial?
>>
>>
>>
>> On Wed, Sep 13, 2023 at 9:10 AM Brandon Williams 
>> wrote:
>>
>>> I don't see any problem with this, +1.
>>>
>>> Kind Regards,
>>> Brandon
>>>
>>>
>>> On Wed, Sep 13, 2023 at 11:09 AM Mike Adamson 
>>> wrote:
>>>
>>>> CEP-30: [Approximate Nearest Neighbor(ANN) Vector Search via
>>>> Storage-Attached Indexes] uses the smile-nlp library
>>>> (com.github.haifengl.smile-nlp) in its testing to allow the creation of
>>>> word2vec embeddings for valid input into the HNSW graph index.
>>>>
>>>> The reason for this library is that we found that using random vectors
>>>> in testing produced very inconsistent results. Using the smile-nlp word2vec
>>>> implementation with the glove.3k.50d library produces repeatable results.
>>>>
>>>> Does anyone have any objections to the use of this library as a test
>>>> only dependency?
>>>> --
>>>> *Mike Adamson*
>>>> Engineering, DataStax (datastax.com)
>>>>
>>>>


Re: [DISCUSS] Add JVector as a dependency for CEP-30

2023-09-22 Thread Jeff Jirsa
To do that, the cassandra PMC can open a legal JIRA and ask for a (durable,
concrete) opinion.


On Fri, Sep 22, 2023 at 5:59 AM Benedict  wrote:

>
>1. my understanding is that with the former the liability rests on the
>provider of the lib to ensure it's in compliance with their claims to
>copyright
>
> I highly doubt liability works like that in all jurisdictions, even if it
> might in some. I can even think of some historic cases related to Linux
> where patent trolls went after users of Linux, though I’m not sure where
> that got to and I don’t remember all the details.
>
> But anyway, none of us are lawyers and we shouldn’t be depending on this
> kind of analysis. At minimum we should invite legal to proffer an opinion
> on whether dependencies are a valid loophole to the policy.
>
>
> On 22 Sep 2023, at 13:48, J. D. Jordan  wrote:
>
> 
> This Gen AI generated code use thread should probably be its own mailing
> list DISCUSS thread?  It applies to all source code we take in, and accept
> copyright assignment of, not to jars we depend on and not only to vector
> related code contributions.
>
> On Sep 22, 2023, at 7:29 AM, Josh McKenzie  wrote:
>
> 
> So if we're going to chat about GenAI on this thread here, 2 things:
>
>1. A dependency we pull in != a code contribution (I am not a lawyer
>but my understanding is that with the former the liability rests on the
>provider of the lib to ensure it's in compliance with their claims to
>copyright and it's not sticky). Easier to transition to a different dep if
>there's something API compatible or similar.
>2. With code contributions we take in, we take on some exposure in
>terms of copyright and infringement. git revert can be painful.
>
> For this thread, here's an excerpt from the ASF policy:
>
> a recommended practice when using generative AI tooling is to use tools
> with features that identify any included content that is similar to parts
> of the tool’s training data, as well as the license of that content.
>
> Given the above, code generated in whole or in part using AI can be
> contributed if the contributor ensures that:
>
>1. The terms and conditions of the generative AI tool do not place any
>restrictions on use of the output that would be inconsistent with the Open
>Source Definition (e.g., ChatGPT’s terms are inconsistent).
>    2. At least one of the following conditions is met:
>       1. The output is not copyrightable subject matter (and would not be
>          even if produced by a human)
>       2. No third party materials are included in the output
>       3. Any third party materials that are included in the output are
>          being used with permission (e.g., under a compatible open source
>          license) of the third party copyright holders and in compliance
>          with the applicable license terms
>    3. A contributor obtains reasonable certainty that conditions 2.2 or
>       2.3 are met if the AI tool itself provides sufficient information
>       about materials that may have been copied, or from code scanning
>       results
>       1. E.g. AWS CodeWhisperer recently added a feature that provides
>          notice and attribution
>
> When providing contributions authored using generative AI tooling, a
> recommended practice is for contributors to indicate the tooling used to
> create the contribution. This should be included as a token in the source
> control commit message, for example including the phrase “Generated-by
>
>
> I think the real challenge right now is ensuring that the output from an
> LLM doesn't include a string of tokens that's identical to something in its
> input training dataset if it's trained on non-permissively licensed inputs.
> That plus the risk of, at least in the US, the courts landing on the side
> of saying that not only is the output of generative AI not copyrightable,
> but that there's legal liability on either the users of the tools or the
> creators of the models for some kind of copyright infringement. That can be
> sticky; if we take PR's that end up with that liability exposure, we end up
> in a place where either the foundation could be legally exposed and/or we'd
> need to revert some pretty invasive code / changes.
>
> For example, Microsoft and OpenAI have publicly committed to paying legal
> fees for people sued for copyright infringement for using their tools:
> https://www.verdict.co.uk/microsoft-to-pay-legal-fees-for-customers-sued-while-using-its-ai-products/?cf-view
> .
> Pretty interesting, and not a step a provider would take in an environment
> where things were legally clear and settled.
>
> So while the usage of these things is apparently incredibly pervasive
> right now, "everybody is doing it" is a pr

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Jeff Jirsa
- I think this is a great step forward.
- Being able to move sstables around between tiers of storage is a feature
Cassandra desperately needs, especially if one of those tiers is some sort
of object storage
- This looks like it's a foundational piece that enables that. Perhaps by a
team that's already implemented this end to end?
- Rather than building this piece by piece, I think it'd be awesome if
someone drew up an end-to-end plan to implement tiered storage, so we can
make sure we're discussing the whole final state, and not an implementation
detail of one part of the final state?






On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
dev@cassandra.apache.org> wrote:

> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
>
> There are two desires driving this change:
>
>1. The ability to temporarily move some keyspaces/tables to storage
>outside the normal directory tree to another disk so that compaction can
>occur in situations where there is not enough disk space for compaction and
>the processing of the moved data cannot be suspended.
>2. The ability to store infrequently used data on slower cheaper
>storage layers.
>
> I have a working POC implementation [2] though there are some issues still
> to be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-27 Thread Jeff Jirsa
Claude, if you’re still at the POC phase, does it make sense for you to perhaps help validate / qualify the work that Henrik seems willing to rebase, rather than reinventing the wheel?

On Sep 26, 2023, at 11:23 PM, Claude Warren, Jr via dev wrote:

I spent a little (very little) time building an S3 implementation using an Apache licensed S3 filesystem package. I have not yet tested it, but if anyone is interested it is at https://github.com/Aiven-Labs/S3-Cassandra-ChannelProxy

In looking at some of the code, I think the Cassandra File class needs to be modified to ask the ChannelProxy for the default file system for the file in question. This should resolve some of the issues my original demo has with some files being created in the data tree. It may also handle many of the cases for offline tools as well.

On Tue, Sep 26, 2023 at 7:33 PM Miklosovic, Stefan wrote:

Would it be possible to make Jimfs integration production-ready then? I see we are using it in the tests already.

It might be one of the reference implementations of this CEP. If there is a type of workload / type of nodes with plenty of RAM but no disk, some kind of compute nodes, it would just hold it all in memory and we might "flush" it to a cloud-based storage if rendered to be not necessary anymore (whatever that means).

We could then completely bypass the memtables, as fetching data from an SSTable held in memory would be basically the same?

On the other hand, that might be achieved by creating a ramdisk so I am not sure what exactly we would gain here. However, if it was eventually storing these SSTables in a cloud storage, we might "compact" "TWCS tables" automatically after so-and-so period by moving them there.


From: Jake Luciani 
Sent: Tuesday, September 26, 2023 19:03
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.




We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a pretty high value piece of functionality. I am happy to see there is interest in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to quickly turn into a dead end as we get into really using multiple storage backends. We need to be able to list files and really the full range of filesystem interactions that Java supports should work with any backend to make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends where appropriate, but that work is probably necessary even with `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple Filesystems). There will probably also be backend specific behaviors that show up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and their individual configuration that can be used, as well as configuration and support for a "backend file router" for file creation (and opening) that can be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of the standard storage space.
>
> There are two desires  driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the normal directory tree to another disk so that compaction can occur in situations where there is not enough disk space for compaction and the processing of the moved data cannot be suspended.
> The ability to store infrequently used data on slower cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


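The JDK-level FileSystem abstraction that Ariel and Benedict point to above can be illustrated with stdlib pieces alone. The sketch below is purely hypothetical (the `chooseBackend` router and the "archive" keyspace are invented names, not Cassandra or CEP-36 APIs): it routes one keyspace's files into a ZIP-backed `java.nio` FileSystem while everything else stays on the default filesystem, which is the general shape of the "backend file router" idea.

```java
import java.net.URI;
import java.nio.file.*;
import java.util.Map;

public class BackendRouterSketch {
    static FileSystem archiveFs; // alternate backend, opened in main()

    // Hypothetical router: send the "archive" keyspace to the alternate
    // backend, everything else to the default filesystem.
    static Path chooseBackend(String keyspace, String file) {
        if ("archive".equals(keyspace)) {
            return archiveFs.getPath("/" + keyspace + "/" + file);
        }
        return Paths.get(System.getProperty("java.io.tmpdir"), keyspace, file);
    }

    public static void main(String[] args) throws Exception {
        Path zip = Files.createTempFile("backend", ".zip");
        Files.delete(zip); // let the zipfs provider create it fresh
        archiveFs = FileSystems.newFileSystem(
                URI.create("jar:" + zip.toUri()), Map.of("create", "true"));

        // Ordinary Files.* calls work unchanged against the routed path.
        Path p = chooseBackend("archive", "nb-1-big-Data.db");
        Files.createDirectories(p.getParent());
        Files.writeString(p, "sstable bytes");
        System.out.println(Files.readString(p)); // prints: sstable bytes

        archiveFs.close();
        Files.deleteIfExists(zip);
    }
}
```

Because the routing happens at the `Path`/`FileSystem` level, all existing code that takes a `Path` keeps working, which is the argument made above for preferring this integration point over a channel-only proxy.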

Re: [VOTE] Accept java-driver

2023-10-03 Thread Jeff Jirsa
+1


On Mon, Oct 2, 2023 at 9:53 PM Mick Semb Wever  wrote:

> The donation of the java-driver is ready for its IP Clearance vote.
> https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
>
> The SGA has been sent to the ASF.  This does not require acknowledgement
> before the vote.
>
> Once the vote passes, and the SGA has been filed by the ASF Secretary, we
> will request ASF Infra to move the datastax/java-driver as-is to
> apache/java-driver
>
> This means all branches and tags, with all their history, will be kept.  A
> cleaning effort has already cleaned up anything deemed not needed.
>
> Background for the donation is found in CEP-8:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>
> PMC members, please take note of (and check) the IP Clearance requirements
> when voting.
>
> The vote will be open for 72 hours (or longer). Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s
> and no -1's.
>
> regards,
> Mick
>


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Jeff Jirsa
On Mon, Oct 23, 2023 at 4:52 AM Mick Semb Wever  wrote:

>
> The TCM work (CEP-21) is in its review stage but being well past our
> cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would
> like to propose the following.
>
>

I think this presumes that 5.0 GA is date driven instead of feature driven.

I'm sure there's a conversation elsewhere, but why isn't this date movable?


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Jeff Jirsa
Why ship a ghost release we don't really expect people to use? Why not just
move the date so all the PR content highlighting TCM+Accord isn't a mess?

I get it, nobody wants to move dates. Isn't that the least-bad option?

On Mon, Oct 23, 2023 at 11:28 AM Aleksey Yeshchenko 
wrote:

> I’m not so sure that many folks will choose to go 4.0->5.0->5.1 path
> instead of just waiting longer for TCM+Accord to be in, and go 4.0->5.1 in
> one hop.
>
> Nobody likes going through these upgrades. So I personally expect 5.0 to
> be a largely ghost release if we go this route, adopted by few, just a
> permanent burden on the merge path to trunk.
>
> Not to say that there isn’t valuable stuff in 5.0 without TCM and Accord -
> there most certainly is - but with the expectation that 5.1 will follow up
> reasonably shortly after with all that *and* two highly anticipated
> features on top, I just don’t see the point. It will be another 2.2 release.
>
>
> On 23 Oct 2023, at 17:43, Josh McKenzie  wrote:
>
> We discussed that at length in various other mailing threads Jeff - kind
> of settled on "we're committing to cutting a major (semver MAJOR or MINOR)
> every 12 months but want to remain flexible for exceptions when
> appropriate".
>
> And then we discussed our timeline for 5.0 this year and settled on the
> "let's try and get it out this calendar year so it's 12 months after 4.1,
> but we'll grandfather in TCM and Accord past freeze date if they can make
> it by October".
>
> So that's the history for how we landed here.
>
> 2) Do we drop the support of 3.0 and 3.11 after 5.0.0 is out or after
> 5.1.0 is?
>
> This is my understanding, yes. Deprecation and support drop is predicated
> on the 5.0 release, not any specific features or anything.
>
> On Mon, Oct 23, 2023, at 12:29 PM, Jeff Jirsa wrote:
>
>
>
> On Mon, Oct 23, 2023 at 4:52 AM Mick Semb Wever  wrote:
>
>
> The TCM work (CEP-21) is in its review stage but being well past our
> cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would
> like to propose the following.
>
>
>
> I think this presumes that 5.0 GA is date driven instead of feature driven.
>
> I'm sure there's a conversation elsewhere, but why isn't this date movable?
>
>
>


Re: [VOTE] Release Apache Cassandra 5.0-beta1

2023-11-28 Thread Jeff Jirsa
-0 to cutting a beta we know has a very obvious correctness flaw with a fix
already understood



On Tue, Nov 28, 2023 at 10:40 AM Patrick McFadin  wrote:

> JD, that wasn't my point. It feels like we are treating a beta like an RC,
> which it isn't. Ship Beta 1 now and Beta 2 later. We need people looking
> today because they will find new bugs and the signal is lost on alpha. It's
> too yolo for most people.
>
> On Tue, Nov 28, 2023 at 10:36 AM Benjamin Lerer  wrote:
>
>> -1 based on the problems raised by Caleb.
>>
>> I would be fine with releasing that version as an alpha as Jeremiah
>> proposed.
>>
>> As of this time, I'm also not aware of a user of the project operating a
>>> build from the 5.0 branch at substantial scale to suss out the operational
>>> side of what can be expected. If someone is running a build supporting
>>> non-perf-test traffic derived from the 5.0 branch and has an experience
>>> report to share it would be great to read.
>>
>>
>> Some people at Datastax are working on such testing. It will take a bit
>> of time before we get the final results though.
>>
>> Le mar. 28 nov. 2023 à 19:27, J. D. Jordan  a
>> écrit :
>>
>>> That said. This is clearly better than and with many fixes from the
>>> alpha. Would people be more comfortable if this cut was released as another
>>> alpha and we do beta1 once the known fixes land?
>>>
>>> On Nov 28, 2023, at 12:21 PM, J. D. Jordan 
>>> wrote:
>>>
>>> 
>>> -0 (NB) on this cut. Given the concerns expressed so far in the thread I
>>> would think we should re-cut beta1 at the end of the week.
>>>
>>> On Nov 28, 2023, at 12:06 PM, Patrick McFadin 
>>> wrote:
>>>
>>> 
>>> I'm a +1 on a beta now vs maybe later. Beta doesn't imply perfect
>>> especially if there are declared known issues. We need people outside of
>>> this tight group using it and finding issues. I know how this rolls. Very
>>> few people touch an Alpha release. Beta is when the engine starts and we
>>> need to get it started asap. Otherwise we are telling ourselves we have the
>>> perfect testing apparatus and don't need more users testing. I don't think
>>> that is the case.
>>>
>>> Scott, Ekaterina, and I are going to be on stage in 2 weeks talking
>>> about Cassandra 5 in the keynotes. In that time, our call to action is
>>> going to be to test the beta.
>>>
>>> Patrick
>>>
>>> On Tue, Nov 28, 2023 at 9:41 AM Mick Semb Wever  wrote:
>>>
 The vote will be open for 72 hours (longer if needed). Everyone who has
> tested the build is invited to vote. Votes by PMC members are considered
> binding. A vote passes if there are at least three binding +1s and no 
> -1's.
>


 +1

 Checked
 - signing correct
 - checksums are correct
 - source artefact builds (JDK 11+17)
 - binary artefact runs (JDK 11+17)
 - debian package runs (JDK 11+17)
 - debian repo runs (JDK 11+17)
 - redhat* package runs (JDK11+17)
 - redhat* repo runs (JDK 11+17)


 With the disclaimer:  There's a few known bugs in SAI, e.g. 19011, with
 fixes to be available soon in 5.0-beta2.





Re: [DISCUSS] CEP-39: Cost Based Optimizer

2023-12-14 Thread Jeff Jirsa
I'm also torn on the CEP as presented. I think some of it is my negative
emotional response to the examples - e.g. I've literally never seen a real
use case where unfolding constants matters, and I'm trying to convince
myself to read past that.

I also cant tell what exactly you mean when you say "In order to ensure
that the execution plans on each node are the same, the cardinality
estimator should provide the same global statistics on every node as well
as some notification mechanism that can be used to trigger
re-optimization." In my experience, you'll see variable cost on each host,
where a machine that went offline temporarily got a spike in sstables from
repair and has a compaction backlog, causing a higher cost per read on that
host due to extra sstables/duplicate rows/merges. Is the cost based
optimizer in your model going to understand the different cost per replica
and also use that in choosing the appropriate replicas to query?

Finally: ALLOW FILTERING should not be deprecated. It doesn't matter if the
CBO may be able to help improve queries that have filtering. That guard
exists because most people who are new to cassandra don't understand the
difference and it prevents far more self-inflicted failures than anyone can
count. Please do not remove this. You will instantly create a world where
most new users to the database tip over as soon as their adoption picks up.



On Thu, Dec 14, 2023 at 7:49 AM Chris Lohfink  wrote:

> I don't wanna be a blocker for this CEP or anything but did want to put my
> 2 cents in. This CEP is horrifying to me.
>
> I have seen thousands of clusters across multiple companies and helped
> them get working successfully. A vast majority of that involved blocking
> the use of MVs, GROUP BY, secondary indexes, and even just simple _range
> queries_. The "unncessary restrictions of cql" are not only necessary IMHO,
> more restrictions are necessary to be successful at scale. The idea of just
> opening up CQL to general purpose relational queries and lines like 
> "supporting
> queries with joins in an efficient way" ... I would really like us to
> make secondary indexes be a viable option before we start opening up
> floodgates on stuff like this.
>
> Chris
>
> On Thu, Dec 14, 2023 at 9:37 AM Benedict  wrote:
>
>> > So yes, this physical plan is the structure that you have in mind but
>> the idea of sharing it is not part of the CEP.
>>
>>
>> I think it should be. This should form a major part of the API on which
>> any CBO is built.
>>
>>
>> > It seems that there is a difference between the goal of your proposal
>> and the one of the CEP. The goal of the CEP is first to ensure optimal
>> performance. It is ok to change the execution plan for one that delivers
>> better performance. What we want to minimize is having a node performing
>> queries in an inefficient way for a long period of time.
>>
>>
>> You have made a goal of the CEP synchronising summary statistics across
>> the whole cluster in order to achieve some degree of uniformity of query
>> plan. So this is explicitly a goal of the CEP, and synchronising summary
>> statistics is a hard problem and won’t provide strong guarantees.
>>
>>
>> > The client side proposal targets consistency for a given query on a
>> given driver instance. In practice, it would be possible to have 2 similar
>> queries with 2 different execution plans on the same driver
>>
>>
>> This would only be possible if the driver permitted it. A driver could
>> (and should) enforce that it only permits one query plan per query.
>>
>>
>> The opposite is true for your proposal: some queries may begin degrading
>> because they touch specific replicas that optimise the query differently,
>> and this will be hard to debug.
>>
>>
>> On 14 Dec 2023, at 15:30, Benjamin Lerer  wrote:
>>
>> 
>> The binding of the parser output to the schema (what is today the
>> Raw.prepare call) will create the logical plan, expressed as a tree of
>> relational operators. Simplification and normalization will happen on that
>> tree to produce a new equivalent logical plan. That logical plan will be
>> used as input to the optimizer. The output will be a physical plan
>> producing the output specified by the logical plan. A tree of physical
>> operators specifying how the operations should be performed.
>>
>> That physical plan will be stored as part of the statements
>> (SelectStatement, ModificationStatement, ...) in the prepared statement
>> cache. Upon execution, variables will be bound and the
>> RangeCommands/Mutations will be created based on the physical plan.
>>
>> The string representation of a physical plan will effectively represent
>> the output of an EXPLAIN statement but outside of that the physical plan
>> will stay encapsulated within the statement classes.
>> Hints will be parameters provided to the optimizer to enforce some
>> specific choices. Like always using an Index Scan instead of a Table Scan,
>> ignoring the cost comparison.
>>
>> So yes, this 
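As a toy illustration of the logical-plan-to-physical-plan step Benjamin describes above, the sketch below builds two candidate physical operators for one logical filter and picks the cheaper one from (made-up) cardinality statistics. Every name and cost formula here is invented for illustration; a real optimizer's cost model is far richer.

```java
public class CboSketch {
    interface PhysicalOp {
        double cost();
        String name();
    }

    // Table scan: cost proportional to reading every row.
    record TableScan(long totalRows) implements PhysicalOp {
        public double cost() { return totalRows; }
        public String name() { return "TableScan"; }
    }

    // Index scan: fewer rows, but each carries a per-row seek penalty.
    record IndexScan(long matchingRows, double perRowPenalty) implements PhysicalOp {
        public double cost() { return matchingRows * perRowPenalty; }
        public String name() { return "IndexScan"; }
    }

    // The optimizer's job in miniature: compare estimated costs.
    static PhysicalOp choose(PhysicalOp a, PhysicalOp b) {
        return a.cost() <= b.cost() ? a : b;
    }

    public static void main(String[] args) {
        // Selective predicate: the index wins despite its seek penalty.
        System.out.println(choose(new TableScan(1_000_000), new IndexScan(500, 10)).name());
        // Non-selective predicate: scanning the whole table is cheaper.
        System.out.println(choose(new TableScan(1_000), new IndexScan(900, 10)).name());
    }
}
```

This also makes concrete why the statistics matter: the same query flips between plans purely on the cardinality estimates, which is the synchronization concern Benedict raises earlier in the thread.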

Re: Future direction for the row cache and OHC implementation

2023-12-14 Thread Jeff Jirsa



> On Dec 14, 2023, at 1:51 PM, Dinesh Joshi  wrote:
> 
> 
>> 
>> On Dec 14, 2023, at 10:32 AM, Ariel Weisberg  wrote:
>> 
>> 1. Fork OHC and start publishing under a new package name and continue to 
>> use it
> 
> Who would fork it? Where would you fork it? My first instinct is that this 
> would not be a viable path forward. 
> 
>> 2. Replace OHC with a different cache implementation like Caffeine which 
>> would move it on heap
> 
> Doesn’t seem optimal but given the advent of newer garbage collectors, we 
> might be able to run Cassandra with larger heap sizes and moving this to heap 
> may be a non-issue. Someone needs to try it out and measure the performance 
> impact with Zgc or Shenandoah.
> 
>> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in a 
>> later release
> 
> In my experience, Row cache has historically helped in narrow workloads where 
> you have really hot rows but in other workloads it can hurt performance. So 
> keeping it around may be fine as long as people can disable it.

It works especially well with tiny partitions. Once you start slicing / paging, the 
benefit usually disappears. 


> 
> Moving it on-heap using Caffeine may be the easiest option here.

That’s what I’d do.


> 
> 
> Dinesh
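For context on the Caffeine option discussed above: the basic shape of a bounded on-heap row cache is an access-ordered (LRU) map, which the stdlib can sketch on its own. Caffeine (the library actually proposed) layers TinyLFU admission, entry weighing, and expiry on top of this; the class below is only an illustration of the eviction behavior, not Cassandra's row cache API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OnHeapRowCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public OnHeapRowCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true -> LRU ordering
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once the bound is exceeded.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        OnHeapRowCacheSketch<String, String> cache = new OnHeapRowCacheSketch<>(2);
        cache.put("row1", "a");
        cache.put("row2", "b");
        cache.get("row1");      // touch row1, making row2 the eldest
        cache.put("row3", "c"); // evicts row2
        System.out.println(cache.keySet()); // prints: [row1, row3]
    }
}
```

With a modern collector (ZGC/Shenandoah, as Dinesh notes), the GC cost of keeping such entries on heap is the open question to measure.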


Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra

2024-01-18 Thread Jeff Jirsa
The problem with generalizing things is if you’re behind on compaction, reads get expensive, so if you pause compaction completely, you’re SOL and you’ll eventually have to throttle traffic to recover. The SEDA model is bad at back pressure, and deferred cost makes it non-obvious which resource to slow to ensure stability.

Just start by exposing it instead of pretending we can outsmart the very complex system.

On Jan 18, 2024, at 4:56 PM, Jon Haddad wrote:

I am definitely +1 on the ability to rate limit operations to tables and keyspaces, and if we can do it at a granular level per user I'm +1 to that as well. I think this would need to be exposed to the operator regardless of any automatic rate limiter.

Thinking about the bigger picture for a minute, I think there's a few things we could throttle dynamically on the server before limiting the client requests. I've long wanted to see a dynamic rate limiter with compaction and any streaming operation - using resources when they're available but slowing down to allow an influx of requests. Being able to throttle background operations to free up resources to ensure the DB stays online and healthy would be a big win.

> The major challenge with latency based rate limiters is that the latency is subjective from one workload to another.

You're absolutely right. This goes to my other suggestion that client-side rate limiting would be a higher priority (on my list at least) as it is perfectly suited for multiple varying workloads. Of course, if you're not interested in working on the drivers and only on C* itself, this is a moot point. You're free to work on whatever you want - I just think there's a ton more value in the drivers being able to throttle requests than doing it server side.

> And if these two are +ve then consider the server under pressure. And once it is under the pressure, then shed the traffic from less aggressive to more aggressive, etc. The idea is to prevent Cassandra server from melting (by considering the above two signals to begin with and add any more based on the learnings)

Yes, I agree using dropped metrics (errors) is useful, as well as queue length. I can't remember offhand all the details of the request queue and how load shedding works there, I need to go back and look. If we don't already have load shedding based on queue depth, that seems like an easy thing to do immediately, and is a high quality signal. Maybe someone can remind me if we have that already?

My issue with using CPU to rate limit clients is that I think it's a very low quality signal, and I suspect it'll trigger a ton of false positives. For example, there's a big difference from performance being impacted by repair vs large reads vs backing up a snapshot to an object store, but they have similar effects on the CPU - high I/O, high CPU usage, both sustained over time. Imo it would be a pretty bad decision to throttle clients when we should be throttling repair instead, and we should only do so if it's actually causing an issue for the client, something CPU usage can't tell us, only the response time and error rates can. In the case of a backup, throttling might make sense, or might not, it really depends on the environment and if backups are happening concurrently. If a backup's configured with nice +19 (as it should be), I'd consider throttling user requests to be a false positive, potentially one that does more harm than good to the cluster, since the OS should be deprioritizing the backup for us rather than us deprioritizing C*.

In my ideal world, if C* detected problematic response times (possibly violating a per-table target latency) or query timeouts, it would start by throttling back compactions, repairs, and streaming to ensure client requests can be serviced. I think we'd need to define the latency targets in order for this to work optimally, b/c you might not want to wait for query timeouts before you throttle. I think there's a lot of value in dynamically adaptive compaction, repair, and streaming since it would prioritize user requests, but again, if you're not willing to work on that, it's your call.

Anyways - I like the idea of putting more safeguards in the database itself, we're fundamentally in agreement there. I see a ton of value in having flexible rate limiters, whether it be per-table, keyspace, or user+table combination. I'd also like to ensure the feature doesn't cause more disruptions than it solves, which I think would be the case from using CPU usage as a signal.

Jon

On Wed, Jan 17, 2024 at 10:26 AM Jaydeep Chovatia wrote:

Jon,

The major challenge with latency based rate limiters is that the latency is subjective from one workload to another. As a result, in the proposal I have described, the idea is to make a decision on the following combinations:

System parameters (such as CPU usage, etc.)
Cassandra thread pools health (are they dropping requests, etc.)

And if these two are +ve then consider the server under pr
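For reference, the simplest building block for the server-side limiting being debated here is a token bucket. The sketch below is a generic illustration, not the design from the proposal: capacity and refill rate are made-up knobs, and a real implementation would drive them from whichever signals (queue depth, latency, dropped messages) the thread above is arguing over.

```java
public class TokenBucketSketch {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefill;

    TokenBucketSketch(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Top the bucket up based on elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false; // shed this request instead of queueing it
    }

    public static void main(String[] args) {
        // Capacity 3, zero refill: only the first 3 requests are admitted.
        TokenBucketSketch limiter = new TokenBucketSketch(3, 0.0);
        int admitted = 0;
        for (int i = 0; i < 10; i++) {
            if (limiter.tryAcquire()) admitted++;
        }
        System.out.println(admitted); // prints: 3
    }
}
```

Rejecting (shedding) rather than queueing when the bucket is empty is the relevant design choice here, given the back-pressure weaknesses of the SEDA model mentioned above.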

Re: [Discuss] Introducing Flexible Authentication in Cassandra via Feature Flag

2024-02-12 Thread Jeff Jirsa
Auth is one of those things that needs to be a bit more concrete.

In the scenario you describe, you already have the option to deploy the auth change piecemeal during the rollout (pause halfway through) in the cluster and look for asymmetric connections, and the option to drop in a new Authenticator jar in the class path that does the flexible auth you describe.

I fear that the extra flexibility this allows for 1% of operations exposes people to long term problems.

Have you considered just implementing the feature flag you describe using the existing plugin infrastructure?

On Feb 12, 2024, at 9:47 PM, Gaurav Agarwal wrote:

Dear Dinesh and Abe,

Thank you for reviewing the document on enabling Cassandra authentication. I apologize that I didn't initially include the following failure scenarios where this feature could be particularly beneficial (I've included them now). Below are the failure scenarios:

Incorrect credentials: If a client accidentally uses the wrong username/password combination during the rollout, then while servers are being restarted to enable authentication, they will refuse connections with incorrect credentials. This can temporarily interrupt the service until correct credentials are sent.

Missed service auth updates: In a large-scale system, a service "X" might miss the credential update during rollout. After some server nodes restart, service "X" might finally realize it needs correct credentials, but it's too late. Nodes are already expecting authorized requests, and this mismatch causes "X" to stop working on auth-enabled, restarted nodes.

Infrequent traffic: Suppose one of the services only interacts with the server once a week, and it starts sending requests with incorrect credentials after authentication is enabled. Since the entire cluster is now running on authentication, the service's outdated credentials cause it to be denied access, resulting in a service-wide outage.

The overall aim of the proposed feature flag would be to allow clients to connect momentarily without authentication during the rollout, mitigating these risks and ensuring a smoother transition.

Thanks in advance for your continued review of the proposal.

On Mon, Feb 12, 2024 at 2:24 PM Abe Ratnofsky wrote:

Hey Gaurav,

Thanks for your proposal.

> disruptive, full-cluster restart, posing significant risks in live environments

For configuration that isn't hot-reloadable, like providing a new IAuthenticator implementation, a rolling restart is required. But rolling restarts are zero-downtime and safe in production, as long as you pace them accordingly.

In general, changing authenticators is a risky thing because it requires coordination with clients. To mitigate this risk and support clients while they transition between authenticators, I like the approach taken by MutualTlsWithPasswordFallbackAuthenticator:
https://github.com/apache/cassandra/blob/bec6bfde1f3b6a782f123f9f9ff18072a97e379f/src/java/org/apache/cassandra/auth/MutualTlsWithPasswordFallbackAuthenticator.java#L34

If client certificates are available, then use those; otherwise use the existing PasswordAuthenticator that clients are already using. The existing IAuthenticator interface supports this transitional behavior well.

Your proposal to include a new configuration for auth_enforcement_flag doesn't clearly cover how to transition from one authenticator to another. It says:

> Soft: Operates in a monitoring mode without enforcing authentication

Most users use authentication today, so auth_enforcement_flag=Soft would allow unauthenticated clients to connect to the database.

--Abe

On Feb 12, 2024, at 2:44 PM, Gaurav Agarwal wrote:

Dear Cassandra Community,

I'm excited to share a proposal for a new feature that I believe would significantly enhance the platform's security and operational flexibility: a flexible authentication mechanism implemented through a feature flag.

Currently, enforcing authentication in Cassandra requires a disruptive, full-cluster restart, posing significant risks in live environments. My proposal, the auth_enforcement_flag, addresses this challenge by offering three modes:

Hard: Enforces strict authentication with detailed logging.
Soft: Monitors connection attempts (valid and invalid) without enforcing authentication.
None: Maintains the current Cassandra behavior.

This flag enables:

Minimized downtime: Seamless authentication rollout without service interruptions.
Enhanced security: Detailed logs for improved threat detection and troubleshooting.
Gradual adoption: Phased implementation with real-world feedback integration.

I believe this feature provides substantial benefits for both users and administrators. Please see the detailed proposal here: Introducing flexible authentication mechanism

I warmly invite the community to review this proposal and share your valuable feedback. I'm eager to discuss its potential impact and collaborate on making Cassandra even better.

Thank you for your time and consideration.

Sincerely,
Gaurav Agarw
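Jeff's suggestion of reusing the existing plugin infrastructure could look roughly like the following. Note this is a conceptual sketch: the `Authenticator` interface and class names below are simplified stand-ins invented for illustration, not Cassandra's actual IAuthenticator API, and the hard-coded credential set exists only to make the example runnable.

```java
import java.util.Set;

public class SoftEnforcementSketch {
    interface Authenticator {
        boolean authenticate(String user, String pass);
    }

    // Stand-in for the real credential check.
    static class PasswordAuthenticator implements Authenticator {
        private final Set<String> valid = Set.of("app:s3cret");
        public boolean authenticate(String u, String p) {
            return valid.contains(u + ":" + p);
        }
    }

    // "Soft" mode as a pluggable wrapper: log what would fail, admit everyone.
    static class SoftAuthenticator implements Authenticator {
        private final Authenticator delegate;
        SoftAuthenticator(Authenticator delegate) { this.delegate = delegate; }
        public boolean authenticate(String u, String p) {
            if (!delegate.authenticate(u, p)) {
                System.out.println("WARN: would reject " + u + " once enforcement is on");
            }
            return true; // monitoring only, never rejects
        }
    }

    public static void main(String[] args) {
        Authenticator soft = new SoftAuthenticator(new PasswordAuthenticator());
        System.out.println(soft.authenticate("app", "wrong")); // logs a WARN line, then prints: true
    }
}
```

Finishing the rollout then means swapping the wrapper for the bare delegate via configuration, rather than flipping a global `auth_enforcement_flag`, which is the distinction Jeff and Abe draw above.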

Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)

2024-02-14 Thread Jeff Jirsa
1) If there’s an “old compatible default” and “latest recommended settings”, 
when does the value in “old compatible default” get updated? Never? 
2) If there are test failures with the new values, it seems REALLY IMPORTANT to 
make sure those test failures are discovered + fixed IN THE FUTURE TOO. If 
pushing new yaml into a different file makes us less likely to catch the 
failures in the future, it seems like we’re hurting ourselves. Branimir 
mentions this, but how do we ensure that we don’t let this pattern disguise 
future bugs? 





> On Feb 13, 2024, at 8:41 AM, Branimir Lambov  wrote:
> 
> Hi All,
> 
> CASSANDRA-18753 introduces a second set of defaults (in a separate 
> "cassandra_latest.yaml") that enable new features of Cassandra. The objective 
> is two-fold: to be able to test the database in this configuration, and to 
> point potential users that are evaluating the technology to an optimized set 
> of defaults that give a clearer picture of the expected performance of the 
> database for a new user. The objective is to get this configuration into 5.0 
> to have the extra bit of confidence that we are not releasing (and 
> recommending) options that have not gone through thorough CI.
> 
> The implementation has already gone through review, but I'd like to get 
> people's opinion on two things:
> - There are currently a number of test failures when the new options are 
> selected, some of which appear to be genuine problems. Is the community okay 
> with committing the patch before all of these are addressed? This should 
> prevent the introduction of new failures and make sure we don't release 
> before clearing the existing ones.
> - I'd like to get an opinion on what's suitable wording and documentation for 
> the new defaults set. Currently, the patch proposes adding the following text 
> to the yaml (see https://github.com/apache/cassandra/pull/2896/files):
> # NOTE:
> #   This file is provided in two versions:
> # - cassandra.yaml: Contains configuration defaults for a "compatible"
> #   configuration that operates using settings that are 
> backwards-compatible
> #   and interoperable with machines running older versions of Cassandra.
> #   This version is provided to facilitate pain-free upgrades for existing
> #   users of Cassandra running in production who want to gradually and
> #   carefully introduce new features.
> # - cassandra_latest.yaml: Contains configuration defaults that enable
> #   the latest features of Cassandra, including improved functionality as
> #   well as higher performance. This version is provided for new users of
> #   Cassandra who want to get the most out of their cluster, and for users
> #   evaluating the technology.
> #   To use this version, simply copy this file over cassandra.yaml, or 
> specify
> #   it using the -Dcassandra.config system property, e.g. by running
> # cassandra 
> -Dcassandra.config=file:/$CASSANDRA_HOME/conf/cassandra_latest.yaml
> # /NOTE
> Does this sound sensible? Should we add a pointer to this defaults set 
> elsewhere in the documentation?
> 
> Regards,
> Branimir



Re: Which cassandra client for Python should we use (in the context of Python 3.12) ?

2024-02-21 Thread Jeff Jirsa



On 2024/02/21 09:26:53 Jarek Potiuk wrote:
> Hello dear Cassandra community,
> 
> I am a fellow PMC member of Apache Airflow and recently we started to look
> at the Cassandra provider of ours in the context of Python 3.12 migration
> and the integration raised my interest.
> 
> TL;DR: I am quite confused about which client we should use to be
> future-proof, and I would appreciate the community's advice on it. I would
> also like to understand why there is no community-managed client, as it
> seems that with the current approach, any Python project (including ASF
> ones) is pretty much forced to use a 3rd-party-managed way to use
> Cassandra, which I find rather strange.
> 
> Context:
> 
> So far in Apache Airflow we were using
> https://github.com/datastax/python-driver/ to connect to Cassandra, but
> that came into question when we worked on Python 3.12 compatibility.
> While looking at it, I discovered something strange
> 

The drivers are mid-donation to the foundation: 

CEP: 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation

[Private@]: https://lists.apache.org/thread/gor4b5l1hc4yokmcmpnhkfvg52w7rpp0

Status in board report: 
https://apache.org/foundation/records/minutes/2023/board_minutes_2023_08_16.txt

The Scylla version is a fork WITH ADDITIONS that work with implementation 
details of Scylladb not present in Apache Cassandra.

My preference: use the "Datastax" driver under donation if at all possible, and get it 
fixed as rapidly as is practical, but given that Scylla has already fixed the 
issue in theirs and it's an apache licensed fork of the same code, if you have 
to ship something to remain functional, that seems like a reasonable fallback.
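For readers landing here with the same question: the driver under donation is published on PyPI as cassandra-driver, and the canonical connection pattern looks roughly like the sketch below. This is a hedged example, not Airflow's provider code, and the make_session helper is invented here; it needs a reachable cluster to return anything, so the import is deferred.

```python
def make_session(contact_points=("127.0.0.1",), keyspace=None):
    """Open a session with the DataStax python-driver (pip install cassandra-driver).

    The import is deferred so this module loads even where the driver
    (or a running cluster) is unavailable.
    """
    from cassandra.cluster import Cluster
    cluster = Cluster(list(contact_points))
    return cluster.connect(keyspace) if keyspace else cluster.connect()
```

The Scylla fork ships on PyPI as scylla-driver but, being a fork of the same codebase, keeps the same cassandra.* import paths, which is why swapping between the two is mostly a packaging decision.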







Re: [DISCUSS] CEP-40: Data Transfer Using Cassandra Sidecar for Live Migrating Instances

2024-04-19 Thread Jeff Jirsa
I think Jordan and German had an interesting insight, or at least their comment made me think about this slightly differently, and I’m going to repeat it so it’s not lost in the discussion about zerocopy / sendfile.

The CEP treats this as “move a live instance from one machine to another”. I know why the author wants to do this.

If you think of it instead as “change backup/restore mechanism to be able to safely restore from a running instance”, you may end up with a cleaner abstraction that’s easier to think about (and may also be easier to generalize in clouds where you have other tools available).

I’m not familiar enough with the sidecar to know the state of orchestration for backup/restore, but “ensure the original source node isn’t running”, “migrate the config”, “choose and copy a snapshot”, and maybe “forcibly exclude the original instance from the cluster” are all things the restore code is going to need to do anyway, and if restore doesn’t do that today, it seems like we can solve it once. Backup probably needs to be generalized to support many sources, too. Object storage is obvious (S3 download). Block storage is obvious (snapshot and reattach). Reading sstables from another sidecar seems reasonable, too.

It accomplishes the original goal, in largely the same fashion; it just makes the logic reusable for other purposes?

On Apr 19, 2024, at 5:52 PM, Dinesh Joshi  wrote:

On Thu, Apr 18, 2024 at 12:46 PM Ariel Weisberg  wrote:

> If there is a faster/better way to replace a node why not have Cassandra
> support that natively without the sidecar so people who aren’t running the
> sidecar can benefit?

I am not the author of the CEP so take whatever I say with a pinch of salt. Scott and Jordan have pointed out some benefits of doing this in the Sidecar vs Cassandra. Today Cassandra is able to do fast node replacements. However, this CEP is addressing an important corner case when Cassandra is unable to start up due to old / ailing hardware.

Can we fix it in Cassandra so it doesn't die on old hardware? Sure. However, you would still need operator intervention to start it up in some special mode, both on the old and new node, so the new node can peer with the old node, copy over its data, and join the ring. This would still require some orchestration outside the database. The Sidecar can do that orchestration for the operator. The point I'm making here is that the CEP addresses a real issue. The way it is currently built can improve over time with improvements in Cassandra.

Dinesh


Re: [DISCUSS] Donating easy-cass-stress to the project

2024-04-25 Thread Jeff Jirsa
Unless there’s 2-3 other people who expect to keep working on it, I don’t see how we justify creating a subproject.

And if there’s not 2-3 people expressing interest, even pulling it into the main project seems risky.

So: besides Jon, who in the community expects/desires to maintain this going forward?

On Apr 25, 2024, at 5:55 PM, Jon Haddad  wrote:

Yeah, I agree with your concerns.  I very firmly prefer a separate subproject.  I've got zero interest in moving from a modern Gradle project to an ant based one.  It would be a lot of work for not much benefit.

If we wanted to replace cassandra-stress, I'd say bring in the release artifact as part of the build process instead of tying it all together, but I'm OK if we keep it separate as well.

Jon

On Thu, Apr 25, 2024 at 2:43 PM Brandon Williams  wrote:

I want to begin by saying I am generally +1 on this because I have
become a fan of easy-cass-stress after using it, but I am curious if
this is intended to be a subproject, or replace cassandra-stress?  If
the latter, we are going to have to reconcile the build systems
somehow.  I don't really want to drag ECS back to ant, but I also
don't want two different build systems in-tree.

Kind Regards,
Brandon

On Thu, Apr 25, 2024 at 9:38 AM Jon Haddad  wrote:
>
> I've been asked by quite a few people, both in person and in JIRA [1] about contributing easy-cass-stress [2] to the project.  I've been happy to maintain the project myself over the years but given its widespread use I think it makes sense to make it more widely available and under the project's umbrella.
>
> My goal with the project was always to provide something that's easy to use.  Up and running in a couple minutes, using the parameters to shape the workload rather than defining everything through configuration.  I was happy to make this tradeoff since Cassandra doesn't have very many types of queries and it's worked well for me over the years.
>
> Obviously I would continue working on this project, and I hope this would encourage others to contribute.  I've heard a lot of good ideas that other teams have implemented in their forks.  I'd love to see those ideas make it into the project, and it sounds like it would be a lot easier for teams to get approval to contribute if it was under the project umbrella.
>
> Would love to hear your thoughts.
>
> Thanks,
> Jon
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-18661
> [2] https://github.com/rustyrazorblade/easy-cass-stress



Re: [DISCUSS] Adding support for BETWEEN operator

2024-05-15 Thread Jeff Jirsa
You can remove the shadowed values at compaction time, but you can’t ever fully 
propagate the range update to point updates, so you’d be propagating all of the 
range-update structures throughout everything forever. It’s JUST like a range 
tombstone - you don’t know what it’s shadowing (and can’t, in many cases, 
because the width of the range is uncountable for some types). 

Setting aside whether or not this construct is worth adding (I suspect a lot of 
binding votes would say it’s not), the thread focuses on BETWEEN operator, and 
there’s no reason we should pollute the conversation of “add a missing SQL 
operator that basically maps to existing functionality” with creation of a 
brand new form of update that definitely doesn’t map to any existing concepts. 





> On May 14, 2024, at 10:05 AM, Jon Haddad  wrote:
> 
> Personally, I don't think that something being scary at first glance is a 
> good reason not to explore an idea.  The scenario you've described here is 
> tricky but I'm not expecting it to be any worse than say, SAI, which (the 
> last I checked) has O(N) complexity on returning result sets with regard to 
> rows returned.  We've also merged in Vector search which has O(N) overhead 
> with the number of SSTables.  We're still fundamentally looking at, in most 
> cases, a limited number of SSTables and some merging of values.
> 
> Write updates are essentially a timestamped mask, potentially overlapping, 
> and I suspect potentially resolvable during compaction by propagating the 
> values.  They could be eliminated or narrowed based on how they've propagated 
> by using the timestamp metadata on the SSTable.
> 
> It would be a lot more constructive to apply our brains towards solving an 
> interesting problem than pointing out all its potential flaws based on gut 
> feelings.  We haven't even moved this past an idea.  
> 
> I think it would solve a massive problem for a lot of people and is 100% 
> worth considering.  Thanks Patrick and David for raising this.
> 
> Jon
> 
> 
> 
> On Tue, May 14, 2024 at 9:48 AM Bowen Song via dev  > wrote:
>> Ranged update sounds like a disaster for compaction and read performance.
>> 
>> Imagine compacting or reading some SSTables in which a large number of 
>> overlapping but non-identical ranges were updated with different values. It 
>> gives me a headache by just thinking about it.
>> 
>> Ranged delete is much simpler, because the "value" is the same tombstone 
>> marker, and it also is guaranteed to expire and disappear eventually, so the 
>> performance impact of dealing with them at read and compaction time doesn't 
>> suffer in the long term.
>> 
>> 
>> On 14/05/2024 16:59, Benjamin Lerer wrote:
>>> It should be like range tombstones ... only much worse ;-). A tombstone is a 
>>> simple marker (deleted). An update can be far more complex.  
>>> 
>>> On Tue, May 14, 2024 at 3:52 PM, Jon Haddad wrote:
 Is there a technical limitation that would prevent a range write that 
 functions the same way as a range tombstone, other than probably needing a 
 version bump of the storage format?
 
 
 On Tue, May 14, 2024 at 12:03 AM Benjamin Lerer >>> > wrote:
> Range restrictions (>, >=, <=, < and BETWEEN) do not work on UPDATEs. 
> They do work on DELETE because under the hood C* translates them into 
> range tombstones. 
> 
> On Tue, May 14, 2024 at 2:44 AM, David Capwell wrote:
>> I would also include in UPDATE… but yeah, <3 BETWEEN and welcome this 
>> work.
>> 
>>> On May 13, 2024, at 7:40 AM, Patrick McFadin >> > wrote:
>>> 
>>> This is a great feature addition to CQL! I get asked about it from time 
>>> to time but then people figure out a workaround. It will be great to 
>>> just have it available. 
>>> 
>>> And right on Simon! I think the only project I had as a high school 
>>> senior was figuring out how many parties I could go to and still 
>>> maintain a passing grade. Thanks for your work here. 
>>> 
>>> Patrick 
>>> 
>>> On Mon, May 13, 2024 at 1:35 AM Benjamin Lerer >> > wrote:
 Hi everybody,
 
 Just raising awareness that Simon is working on adding support for the 
 BETWEEN operator in WHERE clauses (SELECT and DELETE) in 
 CASSANDRA-19604. We plan to add support for it in conditions in a 
 separate patch.
 
 The patch is available.
 
As a side note, Simon chose to do his high school senior project 
 contributing to Apache Cassandra. This patch is his first contribution 
 for his senior project (his second feature contribution to Apache 
 Cassandra).
 
 
>> 
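Jeff's shadowing argument can be made concrete with a toy merge model: point writes and ranged writes each carry a timestamp, and a read must pick, per key, the covering write with the highest timestamp. This is a hedged illustration, not Cassandra's storage engine; all names here are invented.

```python
def read(key, point_writes, range_writes):
    """Resolve a key against point writes {key: (value, ts)} and ranged
    writes [(lo, hi, value, ts)]; the highest timestamp wins, as with
    tombstone reconciliation."""
    candidates = []
    if key in point_writes:
        candidates.append(point_writes[key])
    for lo, hi, value, ts in range_writes:
        if lo <= key <= hi:
            candidates.append((value, ts))
    if not candidates:
        return None
    return max(candidates, key=lambda vt: vt[1])[0]

points = {5: ("point", 10)}
ranges = [(0, 9, "range-a", 20), (3, 7, "range-b", 15)]

assert read(5, points, ranges) == "range-a"   # newest covering write wins
assert read(8, points, ranges) == "range-a"
assert read(12, points, ranges) is None

# The catch Jeff raises: the ranged entries can never be fully compacted
# away into point writes, because (e.g. for text keys) a range covers keys
# that were never materialized, so every read forever pays the merge cost.
```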



Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-02 Thread Jeff Jirsa
Would this be implemented solely in the write path? Or would you also try to enforce it in the read and sstable/compaction/repair paths as well?

On May 31, 2024, at 23:24, Bernardo Botella  wrote:

Hello everyone,

I am proposing this CEP:

CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
cwiki.apache.org

And I’m looking for feedback from the community.

Thanks a lot!
Bernardo

Re: [DISCUSS] CEP-42: Constraints Framework

2024-06-02 Thread Jeff Jirsa
Separately, when we discuss benefits of a proposal in a CEP, we should talk about what’s concrete and ignore the stuff that’s idealistic. Of these four points:

> This brings to the table several benefits and flexibility. Some examples:
> - Cassandra operators have more control to reason about your data and appropriately tune for performance.
> - Potential reduction on maintenance overhead, being able to better predict partition sizes.
> - Extensibility to more complex validations in the future.
> - Potential value in storage engine making decisions based on data size.

The second is just the first, restated, and the fourth seems incredibly unlikely. The third seems maybe possible, but why not spec out the full range with the CEP instead of assuming iterative implementation?

On Jun 2, 2024, at 20:59, Jeff Jirsa  wrote:

Would this be implemented solely in the write path? Or would you also try to enforce it in the read and sstable/compaction/repair paths as well?

On May 31, 2024, at 23:24, Bernardo Botella  wrote:

Hello everyone,

I am proposing this CEP:

CEP-42: Constraints Framework - CASSANDRA - Apache Software Foundation
cwiki.apache.org

And I’m looking for feedback from the community.

Thanks a lot!
Bernardo
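For readers unfamiliar with the CEP, the simplest form of the feature is a per-column predicate applied to incoming mutations. The sketch below is purely hypothetical (names and shapes are invented, not taken from the CEP or Cassandra), and it enforces only the write path, which is exactly why the question above about read/compaction/repair enforcement matters: data written or streamed in before a constraint was added is never re-checked.

```python
# Hypothetical illustration of write-path-only constraint checks; the
# registry keys and validate_mutation signature are invented for this sketch.
constraints = {
    "users.name": lambda v: isinstance(v, str) and len(v) <= 64,
    "users.age": lambda v: isinstance(v, int) and 0 <= v < 150,
}

def validate_mutation(table, row):
    """Reject a write whose columns violate a registered constraint."""
    for column, value in row.items():
        check = constraints.get(f"{table}.{column}")
        if check is not None and not check(value):
            raise ValueError(f"constraint violated on {table}.{column}")

validate_mutation("users", {"name": "alice", "age": 30})  # passes silently
try:
    validate_mutation("users", {"age": -1})
    raise AssertionError("expected a constraint violation")
except ValueError:
    pass  # rejected on the write path, as expected
```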

Re: Suggestions for CASSANDRA-18078

2024-06-20 Thread Jeff Jirsa
If we have a public-facing API that we’re contemplating releasing to the 
public, and we don’t think it’s needed, we should remove it before it’s 
launched and we’re stuck with it forever. 




> On Jun 20, 2024, at 9:55 AM, Jeremiah Jordan  wrote:
> 
> +1 from me for 1, just remove it now.
> I think this case is different from CASSANDRA-19556/CASSANDRA-17425.  The new 
> guardrail from 19556, which would deprecate the one from 17425, has not been committed 
> yet.  In the case of MAXWRITETIME the replacement is already in the code, we 
> just didn’t remove MAXWRITETIME yet.
> 
> Jeremiah Jordan
> e. jerem...@datastax.com 
> w. www.datastax.com 
> 
> 
> 
> On Jun 20, 2024 at 11:46:08 AM, Štefan Miklošovič  > wrote:
>> List,
>> 
>> we need your opinions about CASSANDRA-18078.
>> 
>> That ticket is about the removal of the MAXWRITETIME function, which was added 
>> in CASSANDRA-17425 and first introduced in 5.0-alpha1.
>> 
>> This function was identified to be redundant in favor of CASSANDRA-8877 and 
>> CASSANDRA-18060.
>> 
>> The idea of the removal was welcomed and a patch was prepared, but it was 
>> never delivered, and the question of what to do with it, in connection 
>> with 5.0.0, still remains.
>> 
>> The options are:
>> 
>> 1) since MAXWRITETIME was never released in GA, there is still time to remove it.
>> 2) it is too late for the removal hence we would keep it in 5.0.0 and we 
>> would deprecate it in 5.0.1 and remove it in trunk.
>> 
>> It is worth saying that there is a precedent in 2), in CASSANDRA-17495, 
>> where it was the very same scenario. A guardrail was introduced in alpha1. 
>> We decided to release and deprecate in 5.0.1 and remove in trunk. The same 
>> might be applied here, however we would like to have it confirmed if this is 
>> indeed the case or we prefer to just go with 1) and be done with it.
>> 
>> Regards



Re: [VOTE][IP CLEARANCE] GoCQL driver

2024-06-25 Thread Jeff Jirsa
+1

Thank you for being explicit about which authors of gocql have signed the ICLA

> Where The Gocql Authors for copyright purposes are below. Those marked with
> asterisk have agreed to donate (copyright assign) their contributions to the
> Apache Software Foundation, signing CLAs when appropriate.

> On Jun 25, 2024, at 10:32 AM, Mick Semb Wever  wrote:
> 
>   .
>  
>> The vote will be open for 72 hours (or longer). Votes by PMC members are 
>> considered binding. A vote passes if there are at least three binding +1s 
>> and no -1's.
> 
> 
> +1
> 
> 



Re: [VOTE] Release Apache Cassandra 5.0-rc1

2024-06-25 Thread Jeff Jirsa
+1



> On Jun 25, 2024, at 5:04 AM, Mick Semb Wever  wrote:
> 
> 
> 
> Proposing the test build of Cassandra 5.0-rc1 for release.
> 
> sha1: b43f0b2e9f4cb5105764ef9cf4ece404a740539a
> Git: https://github.com/apache/cassandra/tree/5.0-rc1-tentative
> Maven Artifacts: 
> https://repository.apache.org/content/repositories/orgapachecassandra-1336/org/apache/cassandra/cassandra-all/5.0-rc1/
> 
> The Source and Build Artifacts, and the Debian and RPM packages and 
> repositories, are available here: 
> https://dist.apache.org/repos/dist/dev/cassandra/5.0-rc1/
> 
> The vote will be open for 72 hours (longer if needed). Everyone who has 
> tested the build is invited to vote. Votes by PMC members are considered 
> binding. A vote passes if there are at least three binding +1s and no -1's.
> 
> [1]: CHANGES.txt: 
> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/CHANGES.txt
> [2]: NEWS.txt: 
> https://github.com/apache/cassandra/blob/5.0-rc1-tentative/NEWS.txt


Re: Evolving the client protocol

2018-04-22 Thread Jeff Jirsa



On Apr 20, 2018, at 5:03 AM, Sylvain Lebresne  wrote:

>> 
>> 
>> Those were just given as examples. Each would be discussed on its own,
>> assuming we are able to find a way to cooperate.
>> 
>> 
>> These are relatively simple and it wouldn't be hard for us to patch
>> Cassandra. But I want to find a way to make more complicated protocol
>> changes where it wouldn't be realistic for us to modify Cassandra.
>> 
> 
> That's where I'm confused with what you are truly asking.
> 
> The native protocol is the protocol of the Apache Cassandra project and was
> never meant to be a standard protocol. If the ask is to move towards more
> of handling the protocol as a standard that would evolve independently of
> whether Cassandra implements it (would the project commit to implement it
> eventually?), then let's be clear on what the concrete suggestion is and
> have this discussion (but to be upfront, the short version of my personal
> opinion is that this would likely be a big distraction with relatively low
> merits for the project, so I'm very unconvinced).
> 
> But if that's not the ask, what is it exactly? That we agree to commit
> changes
> to the protocol spec before we have actually implemented them? If so, I just
> don't get it. The downsides are clear (we risk the feature is either never
> implemeted due to lack of contributions/loss of interest, or that the
> protocol
> changes committed are not fully suitable to the final implementation) but
> what
> benefit to the project can that ever have?

Agree with everything here 

> 
> Don't get me wrong, protocol-impacting changes/additions are very much
> welcome
> if reasonable for Cassandra, and both CASSANDRA-14311 and CASSANDRA-2848 are
> certainly worthy. Both the definition of done of those ticket certainly
> include the server implementation imo,

Also agree here - any changes to protocol on the Apache Cassandra side have to 
come with the implementation, otherwise you should consider using the optional 
arbitrary k/v map that zipkin tracing leverages for arbitrary payloads.


> not just changing the protocol spec
> file. As for the shard notion, it makes no sense for Cassandra at this point
> in time, so unless an additional contribution makes it so that it start to
> make
> sense, I'm not sure why we'd add anything related to it to the protocol.
> 
> --
> Sylvain
> 
> 
> 
>> 
>>> RE #3,
>>> 
>>> It's hard to be +1 on this because we don't benefit by boxing ourselves
>> in by defining a spec we haven't implemented, tested, and decided we are
>> satisfied with. Having it in ScyllaDB de-risks it to a certain extent, but
>> what if Cassandra decides to go a different direction in some way?
>> 
>> Such a proposal would include negotiation about the sharding algorithm
>> used to prevent Cassandra being boxed in. Of course it's impossible to
>> guarantee that a new idea won't come up that requires more changes.
>> 
>>> I don't think there is much discussion to be had without an example of
>> the the changes to the CQL specification to look at, but even then if it
>> looks risky I am not likely to be in favor of it.
>>> 
>>> Regards,
>>> Ariel
>>> 
 On Thu, Apr 19, 2018, at 9:33 AM, glom...@scylladb.com wrote:
 
 On 2018/04/19 07:19:27, kurt greaves  wrote:
>> 1. The protocol change is developed using the Cassandra process in
>>a JIRA ticket, culminating in a patch to
>>doc/native_protocol*.spec when consensus is achieved.
> I don't think forking would be desirable (for anyone) so this seems
> the most reasonable to me. For 1 and 2 it certainly makes sense but
> can't say I know enough about sharding to comment on 3 - seems to me
> like it could be locking in a design before anyone truly knows what
> sharding in C* looks like. But hopefully I'm wrong and there are
> devs out there that have already thought that through.
 Thanks. That is our view and is great to hear.
 
 About our proposal number 3: In my view, good protocol designs are
 future proof and flexible. We certainly don't want to propose a design
 that works just for Scylla, but would support reasonable
 implementations regardless of how they may look like.
 
> Do we have driver authors who wish to support both projects?
> 
> Surely, but I imagine it would be a minority. ​
> 
 -
 To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For
 additional commands, e-mail: dev-h...@cassandra.apache.org
 
>>> 
>> 
>> 

Re: Evolving the client protocol

2018-04-23 Thread Jeff Jirsa
Respectfully, there’s pretty much already apparent consensus among those with a 
vote (unless I missed some dissenting opinion while I was on vacation).

It’s been expressed multiple times by committers and members of the PMC that 
it’s Cassandra’s native protocol, and a change belongs in the protocol when it’s 
implemented. I haven’t seen ANY committers or members of the PMC make an 
argument that we should alter the spec without a matching implementation. 

Unless a committer wants to make an argument that we should change the spec 
without changing the implementation, this conversation can end. 

The spec is what the server implements. Anything we don’t implement can use the 
arbitrary payload from the zipkin tracing ticket or fork.

-- 
Jeff Jirsa


> On Apr 23, 2018, at 6:18 PM, Nate McCall  wrote:
> 
> Folks,
> Before this goes much further, let's take a step back for a second.
> 
> I am hearing the following: Folks are fine with CASSANDRA-14311 and
> CASSANDRA-2848 *BUT* they don't make much sense from the project's
> perspective without a reference implementation. I think the shard
> concept is too abstract for the project right now, so we should
> probably set that one aside.
> 
> Dor and Avi, I appreciate you both engaging directly on this. Where
> can we find common ground on this?
> 
> 




Re: Evolving the client protocol

2018-04-24 Thread Jeff Jirsa
They aren't even remotely similar, they're VERY different. Here's a few
starting points:

1) Most of Datastax's work for the first 5, 6, 8 years of existence focused
on driving users to cassandra from other DBs (see all of the "Cassandra
Summits" that eventually created trademark friction) ; Scylla's marketing
is squarely Scylla v  Cassandra. Ultimately they're both companies out to
make money, but one has a history of driving users to Cassandra, and the
other is trying to siphon users away from Cassandra.
2) Datastax may not be actively contributing as much as they used to, but
some ridiculous number of engineering hours got paid out of their budget -
maybe 80% of total lines of code? Maybe higher (though it's decreasing day
by day). By contrast, Scylla has exactly zero meaningful concrete code
contributions to the project, uses a license that makes even sharing
concepts prohibitive, only has a handful or so JIRAs opened (which is
better than zero), but has effectively no goodwill in the eyes of many of
the longer-term community members (in large part because of #1, and also
because of the way they positioned their talk-turned-product announcement
at the competitor-funded 2016 summit).
3) Datastax apparently respects the project enough that they'd NEVER come
in and ask for a protocol spec change without providing a reference
implementation.
4) To that end, native protocol changes aren't something anyone is anxious
to shove in without good reason. Even with a reference implementation, and
a REALLY GOOD REASON (namely data correctness / protection from
corruption), https://issues.apache.org/jira/browse/CASSANDRA-13304 has been
sitting patch available for OVER A YEAR.

So again: we have a Cassandra native protocol, and we have a process for
changing it, and that process is contributor agnostic.  Anyone who wants a
change can submit a patch, and it'll get reviewed, and maybe if it's a good
idea, it'll get committed, but the chances of a review leading to a commit
without an implementation is nearly zero.

Would be happy to see this thread die now. There's nothing new coming out
of it.

- Jeff


On Tue, Apr 24, 2018 at 8:30 AM, Eric Stevens  wrote:

> Let me just say that as an observer to this conversation -- and someone
> who believes that compatibility, extensibility, and frankly competition
> bring out the best in products -- I'm fairly surprised and disappointed
> with the apparent hostility many community members have shown toward a
> sincere attempt by another open source product to find common ground here.
>
> Yes, Scylla has a competing OSS project (albeit under a different
> license).  They also have a business built around it.  It's hard for me to
> see that as dramatically different than the DataStax relationship to this
> community.  Though I would love to be shown why.
>


Re: Evolving the client protocol

2018-04-28 Thread Jeff Jirsa
On Sat, Apr 28, 2018 at 4:49 AM, mck  wrote:


> We should, as open source contributors, put business concerns to the side
> and welcome opportunities to work across company and product lines.
>


I resent the fact that you're calling this a business concern. This isn't a
business concern, and as a committer and ASF member you should be able to
discern the difference.

Sylvain said:

> The native protocol is the protocol of the Apache Cassandra project and
was
> never meant to be a standard protocol.

and

> Don't get me wrong, protocol-impacting changes/additions are very much
> welcome if reasonable for Cassandra, and both CASSANDRA-14311 and
CASSANDRA-2848 are
> certainly worthy. Both the definition of done of those ticket certainly
> include the server implementation imo,

I said:

> So again: we have a Cassandra native protocol, and we have a process for
> changing it, and that process is contributor agnostic. Anyone who wants a
> change can submit a patch, and it'll get reviewed, and maybe if it's a
good
> idea, it'll get committed, but the chances of a review leading to a commit
> without an implementation is nearly zero.

The only reason business names came into it is that someone drew a false
equivalence between two businesses. They're not equivalent, and the lack of
equivalence likely explains why this thread keeps bouncing around -
Datastax would have written a patch and contributed it to the project, and
Scylla didn't. But again, the lack of protocol changes so far ISN'T because
the project somehow favors one company more than the other (it doesn't),
the protocol changes havent happened because nobody's submitted a patch.

You're a committer Mick, if you think it belongs in the database, write the
patches and get them reviewed.  Until then, the project isn't going to be
bullied into changing the protocol without an implementation.

- Jeff


Spring 2018 Cassandra Dev Wrap-up

2018-05-12 Thread Jeff Jirsa
Here's what's going on in the Cassandra world this spring:

Mailing list:
- Kurt sent out a call for reviewers:
https://lists.apache.org/thread.html/f1f7926d685b7f734edb180aeddc3014d79dc6e5f89e68b751b9eb5e@%3Cdev.cassandra.apache.org%3E
- Dinesh proposed a management sidecar:
https://lists.apache.org/thread.html/a098341efd8f344494bcd2761dba5125e971b59b1dd54f282ffda253@%3Cdev.cassandra.apache.org%3E
- Joey sent some math about the impact of vnodes on availability:
https://lists.apache.org/thread.html/54a9cb1d3eeed57cbe55f14aff2fb0030bce22b59d04b32d592da6b3@%3Cdev.cassandra.apache.org%3E
- We spent some time talking about feature freeze dates for 4.0, and seem
to have landed around Sept 1:
https://lists.apache.org/thread.html/eb9f5080fbab4f4e38266c7444b467ca1c54af787568321af56e8e4b@%3Cdev.cassandra.apache.org%3E

Activity:
- Some really crude git log | grep | cut | sort nonsense suggests 58
different patch authors / contributors so far in 2018. This may be
undercounted by a bit.
- Blake Eggleston, Sam Tunnicliffe, and Stefan Podkowinski were added to
the PMC (congrats!)
- We're up to about 45 changes pending in 3.11.3 and 30'ish in 3.0.17,
nearing time for some new release votes

Notable Commits to 4.0 since February

- CASSANDRA-12151 landed, bringing audit logs
- CASSANDRA-13910 landed, removing probabilistic read repair chance
- Pluggable storage interfaces are being added incrementally, with the
write path, repair, and streaming interfaces already committed

If you're bored on this weekend and want something to do, here's Kurt's
list of patches that need reviews:

Bugs:
https://issues.apache.org/jira/browse/CASSANDRA-14365
https://issues.apache.org/jira/browse/CASSANDRA-14204
https://issues.apache.org/jira/browse/CASSANDRA-14162
https://issues.apache.org/jira/browse/CASSANDRA-14126
https://issues.apache.org/jira/browse/CASSANDRA-14099
https://issues.apache.org/jira/browse/CASSANDRA-14073
https://issues.apache.org/jira/browse/CASSANDRA-14063
https://issues.apache.org/jira/browse/CASSANDRA-14056
https://issues.apache.org/jira/browse/CASSANDRA-14054
https://issues.apache.org/jira/browse/CASSANDRA-14013
https://issues.apache.org/jira/browse/CASSANDRA-13841
https://issues.apache.org/jira/browse/CASSANDRA-13698

Improvements:
https://issues.apache.org/jira/browse/CASSANDRA-14309
https://issues.apache.org/jira/browse/CASSANDRA-10789
https://issues.apache.org/jira/browse/CASSANDRA-14443
https://issues.apache.org/jira/browse/CASSANDRA-13010
https://issues.apache.org/jira/browse/CASSANDRA-11559
https://issues.apache.org/jira/browse/CASSANDRA-10023
https://issues.apache.org/jira/browse/CASSANDRA-8460

And Josh's similar JIRA query:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20AND%20resolution%20%3D%20unresolved%20AND%20status%20in%20(%22Patch%20Available%22%2C%20%22Awaiting%20Feedback%22)%20AND%20reviewer%20is%20EMPTY%20ORDER%20BY%20updated%20DESC


Re: Academic paper about Cassandra database compaction

2018-05-14 Thread Jeff Jirsa
Interesting!

I suspect I know what causes the increased disk usage in TWCS, and it's a
solvable problem. The problem is roughly something like this:
- Window 1 has sstables 1, 2, 3, 4, 5, 6
- We start compacting 1, 2, 3, 4 (using STCS-in-TWCS first window)
- The TWCS window rolls over
- We flush (sstable 7), and trigger the TWCS window major compaction, which
starts compacting 5, 6, 7 + any other sstable from that window
- If the first compaction (1, 2, 3, 4) has finished by the time sstable 7 is
flushed, we'll include its result in that compaction; if it hasn't, we'll
have to do the major compaction twice to guarantee we have exactly one
sstable per window, which will temporarily increase disk space

We can likely fix this by not scheduling the major compaction until we know
all of the sstables in the window are available to be compacted.
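The race above can be illustrated with a toy model (a hypothetical helper, not Cassandra's actual compaction code) showing why the window-major compaction has to run twice, and how much data temporarily exists in duplicate, when the in-window STCS pass hasn't finished at rollover:

```python
def simulate_window_close(window_sstables, stcs_batch, stcs_finished):
    """Toy model of the first-window TWCS race described above.

    window_sstables: sizes of all sstables flushed into the window
    stcs_batch:      indices being compacted by the in-window STCS pass
    stcs_finished:   whether that STCS pass completed before rollover

    Returns (major_passes, peak_extra_bytes). With the STCS result
    available, one major pass suffices; otherwise the major compaction
    runs without it and must be redone once the STCS output lands,
    temporarily holding both copies of the first pass's data on disk.
    """
    if stcs_finished:
        return 1, 0
    # First major pass covers everything except the in-flight batch;
    # the second pass rewrites the first pass's output together with the
    # STCS result, so those bytes exist twice until old copies are freed.
    first_pass = sum(size for i, size in enumerate(window_sstables)
                     if i not in stcs_batch)
    return 2, first_pass

# Sstables 1-4 (sizes 10) still compacting via STCS when 5, 6, 7 (sizes 5)
# get window-majored:
passes, extra = simulate_window_close([10, 10, 10, 10, 5, 5, 5],
                                      stcs_batch={0, 1, 2, 3},
                                      stcs_finished=False)
```

Deferring the major compaction until the STCS pass completes makes `stcs_finished` always true in this model, which is the proposed fix.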

Also your data model is probably typical, but not well suited for time
series cases - if you find my 2016 Cassandra Summit TWCS talk (it's on
YouTube), I mention aligning partition keys to TWCS windows, which involves
adding a second component to the partition key. This is hugely important in
terms of making sure TWCS data expires quickly and avoiding having to read
from more than one TWCS window at a time.
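A minimal sketch of that second partition-key component, assuming a hypothetical 24-hour window and made-up names (`sensor_id`, `bucket`, `event_time`):

```python
from datetime import datetime, timezone

WINDOW_HOURS = 24  # assumption: matches the table's TWCS window size


def time_bucket(ts: datetime, window_hours: int = WINDOW_HOURS) -> int:
    """Bucket component added to the partition key so each partition
    falls entirely inside one TWCS window. A schema using it might be
    (hypothetical column names):

        PRIMARY KEY ((sensor_id, bucket), event_time)

    with every write and read computing bucket = time_bucket(event_time).
    """
    epoch_seconds = int(ts.replace(tzinfo=timezone.utc).timestamp())
    return epoch_seconds // 3600 // window_hours

# Timestamps in the same 24h window share a bucket (and a partition);
# the next window starts a fresh partition that can expire on its own.
same_day = time_bucket(datetime(2018, 5, 14, 7, 0))
later_same_day = time_bucket(datetime(2018, 5, 14, 23, 0))
next_day = time_bucket(datetime(2018, 5, 15, 1, 0))
```

Because a whole partition then lives and dies inside one window, reads touch a single window's sstable and expired data drops cleanly.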


- Jeff



On Mon, May 14, 2018 at 7:12 AM, Lucas Benevides <
lu...@maurobenevides.com.br> wrote:

> Dear community,
>
> I want to tell you about my paper published in a conference in March. The
> title is " NoSQL Database Performance Tuning for IoT Data - Cassandra
> Case Study"  and it is available (not for free) in
> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=
> 10.5220/0006782702770284 .
>
> TWCS is used and compared with DTCS.
>
> I hope you can download it, unfortunately I cannot send copies as the
> publisher has its copyright.
>
> Lucas B. Dias
>
>
>


Re: secondary index table - tombstones surviving compactions

2018-05-18 Thread Jeff Jirsa
This would matter for the base table, but would be less likely for the 
secondary index, where the partition key is the indexed value from the base row

Roman: there’s a config option related to only purging repaired tombstones - do 
you have that enabled? If so, are you running repairs?

-- 
Jeff Jirsa


> On May 18, 2018, at 6:41 AM, Eric Stevens  wrote:
> 
> The answer to Question 3 is "yes."  One of the more subtle points about
> tombstones is that Cassandra won't remove them during compaction if there
> is a bloom filter on any SSTable on that replica indicating that it
> contains the same partition (not primary) key.  Even if it is older than
> gc_grace, and would otherwise be a candidate for cleanup.
> 
> If you're recycling partition keys, your tombstones may never be able to be
> cleaned up, because in this scenario there is a high probability that an
> SSTable not involved in that compaction also contains the same partition
> key, and so compaction cannot have confidence that it's safe to remove the
> tombstone (it would have to fully materialize every record in the
> compaction, which is too expensive).
> 
> In general it is an antipattern in Cassandra to write to a given partition
> indefinitely for this and other reasons.
> 
> On Fri, May 18, 2018 at 2:37 AM Roman Bielik <
> roman.bie...@openmindnetworks.com> wrote:
> 
>> Hi,
>> 
>> I have a Cassandra 3.11 table (with compact storage) and using secondary
>> indices with rather unique data stored in the indexed columns. There are
>> many inserts and deletes, so in order to avoid tombstones piling up I'm
>> re-using primary keys from a pool (which works fine).
>> I'm aware that this design pattern is not ideal, but for now I can not
>> change it easily.
>> 
>> The problem is, the size of 2nd index tables keeps growing (filled with
>> tombstones) no matter what.
>> 
>> I tried some aggressive configuration (just for testing) in order to
>> expedite the tombstone removal but with little-to-zero effect:
>> COMPACTION = { 'class':
>> 'LeveledCompactionStrategy', 'unchecked_tombstone_compaction': 'true',
>> 'tombstone_compaction_interval': 600 }
>> gc_grace_seconds = 600
>> 
>> I'm aware that perhaps Materialized views could provide a solution to this,
>> but I'm bind to the Thrift interface, so can not use them.
>> 
>> Questions:
>> 1. Is there something I'm missing? How come compaction does not remove the
>> obsolete indices/tombstones from 2nd index tables? Can I trigger the
>> cleanup manually somehow?
>> I have tried nodetool flush, compact, rebuild_index on both data table and
>> internal Index table, but with no result.
>> 
>> 2. When deleting a record I'm deleting the whole row at once - which would
>> create one tombstone for the whole record if I'm correct. Would it help to
>> delete the indexed columns separately creating extra tombstone for each
>> cell?
>> As I understand the underlying mechanism, the indexed column value must be
>> read in order a proper tombstone for the index is created for it.
>> 
>> 3. Could the fact that I'm reusing the primary key of a deleted record
>> shortly for a new insert interact with the secondary index tombstone
>> removal?
>> 
>> Will be grateful for any advice.
>> 
>> Regards,
>> Roman
>> 
>> --
>> <http://www.openmindnetworks.com>
>> <http://www.openmindnetworks.com/>
>> <https://www.linkedin.com/company/openmind-networks>
>> <https://twitter.com/Openmind_Ntwks>  <http://www.openmindnetworks.com/>
>> 
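The purge condition Eric describes in his answer to question 3 can be sketched as a toy check (exact key sets stand in for bloom filters, and the names and signature are made up, not Cassandra's implementation):

```python
import time


def tombstone_purgeable(tombstone_ts, gc_grace_seconds,
                        overlapping_sstable_keys, partition_key,
                        now=None):
    """Toy model of when compaction may drop a tombstone.

    A tombstone is only droppable if (a) it is older than
    gc_grace_seconds and (b) no sstable outside the compaction might
    still hold the same partition key. Real bloom filters can only say
    "maybe present", never "definitely absent", so recycling partition
    keys makes condition (b) fail almost permanently.
    """
    now = time.time() if now is None else now
    if now - tombstone_ts < gc_grace_seconds:
        return False  # still within gc_grace
    return all(partition_key not in keys
               for keys in overlapping_sstable_keys)
```

Under this model, a reused key that appears in any other sstable blocks purging even long after gc_grace, which is exactly the growth Roman observes.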

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Tombstone passed GC period causes un-repairable inconsistent data

2018-06-21 Thread Jeff Jirsa
Think he's talking about
https://issues.apache.org/jira/browse/CASSANDRA-6434

Doesn't solve every problem if you don't run repair at all, but if you're
not running repairs, you're nearly guaranteed problems with resurrection
after gcgs anyway.



On Thu, Jun 21, 2018 at 11:33 AM, Jay Zhuang  wrote:

> Yes, I also agree that the user should run (incremental) repair within GCGS
> to prevent it from happening.
>
> @Sankalp, would you please point us the patch you mentioned from Marcus?
> The problem is basically the same as
> https://issues.apache.org/jira/browse/CASSANDRA-14145
>
> CASSANDRA-11427  is
> actually the opposite of this problem. As purgeable tombstone is repaired,
> this un-repairable problem cannot be reproduced. I tried 2.2.5 (before the
> fix), it's able to repair the purgeable tombstone from node1 to node2, so
> the data is deleted as expected. But that doesn't mean it's the right
> behavior, as it will also cause purgeable tombstones to keep bouncing around
> the nodes.
> I think https://issues.apache.org/jira/browse/CASSANDRA-14145 will fix the
> problem by detecting the repaired/un-repaired data.
>
> How about having hints dispatch to deliver/replay purgeable (not live)
> tombstones? It will reduce the chance to have this issue, especially when
> GCGS < hinted handoff window.
>
> On Wed, Jun 20, 2018 at 9:36 AM sankalp kohli 
> wrote:
>
> > I agree with Stefan that we should use incremental repair and use patches
> > from Marcus to drop tombstones only from repaired data.
> > Regarding deep repair, you can bump the read repair and run the repair.
> The
> > issue will be that you will stream lot of data and also your blocking
> read
> > repair will go up when you bump the gc grace to higher value.
> >
> > On Wed, Jun 20, 2018 at 1:10 AM Stefan Podkowinski 
> > wrote:
> >
> > > Sounds like an older issue that I tried to address two years ago:
> > > https://issues.apache.org/jira/browse/CASSANDRA-11427
> > >
> > > As you can see, the result hasn't been as expected and we got some
> > > unintended side effects based on the patch. I'm not sure I'd be willing
> > > to give this another try, considering the behaviour we like to fix in
> > > the first place is rather harmless and the read repairs shouldn't
> happen
> > > at all to any users who regularly run repairs within gc_grace.
> > >
> > > What I'd suggest is to think more into the direction of a
> > > post-full-repair-world and to fully embrace incremental repairs, as
> > > fixed by Blake in 4.0. In that case, we should stop doing read repairs
> > > at all for repaired data, as described in
> > > https://issues.apache.org/jira/browse/CASSANDRA-13912. RRs are
> certainly
> > > useful, but can be very risky if not very very carefully implemented.
> So
> > > I'm wondering if we shouldn't disable RRs for everything but unrepaired
> > > data. I'd btw also be interested to hear any opinions on this in
> context
> > > of transient replicas.
> > >
> > >
> > > On 20.06.2018 03:07, Jay Zhuang wrote:
> > > > Hi,
> > > >
> > > > We know that the deleted data may re-appear if repair is not run
> within
> > > > gc_grace_seconds. When the tombstone is not propagated to all nodes,
> > the
> > > > data will re-appear. But it's also causing following 2 issues before
> > the
> > > > tombstone is compacted away:
> > > > a. inconsistent query result
> > > >
> > > > With consistency level ONE or QUORUM, it may or may not return the
> > value.
> > > > b. lots of read repairs, but doesn't repair anything
> > > >
> > > > With consistency level ALL, it always triggers a read repair.
> > > > With consistency level QUORUM, it also very likely (2/3) causes a
> read
> > > > repair. But it doesn't repair the data, so it's causing repair every
> > > time.
> > > >
> > > >
> > > > Here are the reproducing steps:
> > > >
> > > > 1. Create a 3 nodes cluster
> > > > 2. Create a table (with small gc_grace_seconds):
> > > >
> > > > CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy',
> > > > 'replication_factor': 3};
> > > > CREATE TABLE foo.bar (
> > > > id int PRIMARY KEY,
> > > > name text
> > > > ) WITH gc_grace_seconds=30;
> > > >
> > > > 3. Insert data with consistency all:
> > > >
> > > > INSERT INTO foo.bar (id, name) VALUES(1, 'cstar');
> > > >
> > > > 4. stop 1 node
> > > >
> > > > $ ccm node2 stop
> > > >
> > > > 5. Delete the data with consistency quorum:
> > > >
> > > > DELETE FROM foo.bar WHERE id=1;
> > > >
> > > > 6. Wait 30 seconds and then start node2:
> > > >
> > > > $ ccm node2 start
> > > >
> > > > Now the tombstone is on node1 and node3 but not on node2.
> > > >
> > > > With quorum read, it may or may not return value, and read repair
> will
> > > send
> > > > the data from node2 to node1 and node3, but it doesn't repair
> anything.
> > > >
> > > > I'd like to discuss a few potential solutions and workarounds:
> > > >
> > > > 1. Can hints replay sends GCed tombstone?
> > > >
> > > > 2

Re: [VOTE] Release Apache Cassandra 3.11.3

2018-07-02 Thread Jeff Jirsa
+1

On Mon, Jul 2, 2018 at 1:11 PM, Michael Shuler 
wrote:

> I propose the following artifacts for release as 3.11.3.
>
> sha1: aed1b5fdf1e953d19bdd021ba603618772208cdd
> Git: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=sho
> rtlog;h=refs/tags/3.11.3-tentative
> Artifacts: https://repository.apache.org/content/repositories/orgapache
> cassandra-1161/org/apache/cassandra/apache-cassandra/3.11.3/
> Staging repository: https://repository.apache.org/
> content/repositories/orgapachecassandra-1161/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler/
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) http://git-wip-us.apache.org/r
> epos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/
> tags/3.11.3-tentative
> [2]: (NEWS.txt) http://git-wip-us.apache.org/r
> epos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tag
> s/3.11.3-tentative
>
> --
> Michael
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: [VOTE] Release Apache Cassandra 2.2.13

2018-07-02 Thread Jeff Jirsa
+1

On Mon, Jul 2, 2018 at 1:10 PM, Michael Shuler 
wrote:

> I propose the following artifacts for release as 2.2.13.
>
> sha1: 9ff78249a0a5e87bd04bf9804ef1a3b29b5e1645
> Git: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=sho
> rtlog;h=refs/tags/2.2.13-tentative
> Artifacts: https://repository.apache.org/content/repositories/orgapache
> cassandra-1159/org/apache/cassandra/apache-cassandra/2.2.13/
> Staging repository: https://repository.apache.org/
> content/repositories/orgapachecassandra-1159/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler/
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) http://git-wip-us.apache.org/r
> epos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/
> tags/2.2.13-tentative
> [2]: (NEWS.txt) http://git-wip-us.apache.org/r
> epos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tag
> s/2.2.13-tentative
>
> --
> Michael
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: [VOTE] Release Apache Cassandra 3.0.17

2018-07-02 Thread Jeff Jirsa
+1

On Mon, Jul 2, 2018 at 1:10 PM, Michael Shuler 
wrote:

> I propose the following artifacts for release as 3.0.17.
>
> sha1: c4e6cd2a1aca84a88983192368bbcd4c8887c8b2
> Git: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=sho
> rtlog;h=refs/tags/3.0.17-tentative
> Artifacts: https://repository.apache.org/content/repositories/orgapache
> cassandra-1160/org/apache/cassandra/apache-cassandra/3.0.17/
> Staging repository: https://repository.apache.org/
> content/repositories/orgapachecassandra-1160/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler/
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: (CHANGES.txt) http://git-wip-us.apache.org/r
> epos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/
> tags/3.0.17-tentative
> [2]: (NEWS.txt) http://git-wip-us.apache.org/r
> epos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tag
> s/3.0.17-tentative
>
> --
> Michael
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Testing 4.0 Post-Freeze

2018-07-03 Thread Jeff Jirsa
I think there's value in the psychological commitment that if someone has
time to contribute, their contributions should be focused on validating a
release, not pushing future features.


On Tue, Jul 3, 2018 at 1:03 PM, Jonathan Haddad  wrote:

> I agree with Josh. I don’t see how changing the convention around trunk
> will improve the process, seems like it’ll only introduce a handful of
> rollback commits when people forget.
>
> Other than that, it all makes sense to me.
>
> I’ve been working on a workload centric stress tool on and off for a little
> bit in an effort to create something that will help with wider adoption in
> stress testing. It differs from the stress we ship by including fully
> functional stress workloads as well as a validation process. The idea being
> to be flexible enough to test both performance and correctness in LWT and
> MVs as well as other arbitrary workloads.
>
> https://github.com/thelastpickle/tlp-stress
>
> Jon
>
>
> On Tue, Jul 3, 2018 at 12:28 PM Josh McKenzie 
> wrote:
>
> > Why not just branch a 4.0-rel and bugfix there and merge up while still
> > accepting new features or improvements on trunk?
> >
> > I don't think the potential extra engagement in testing will balance out
> > the atrophy and discouraging contributions / community engagement we'd
> get
> > by deferring all improvements and new features in an open-ended way.
> >
> > On Tue, Jul 3, 2018 at 1:33 PM sankalp kohli 
> > wrote:
> >
> > > Hi cassandra-dev@,
> > >
> > > With the goal of making Cassandra's 4.0 the most stable major release
> to
> > > date, we would like all committers of the project to consider joining
> us
> > in
> > > dedicating their time and attention to testing, running, and fixing
> > issues
> > > in 4.0 between the September freeze and the 4.0 beta release. This
> would
> > > result in a freeze of new feature development on trunk or branches
> during
> > > this period, instead focusing on writing, improving, and running tests
> or
> > > fixing and reviewing bugs or performance regressions found in 4.0 or
> > > earlier.
> > >
> > > How would this work?
> > >
> > > We propose that between the September freeze date and beta, a new
> branch
> > > would not be created and trunk would only have bug fixes and
> performance
> > > improvements committed to it. At the same time we do not want to
> > discourage
> > > community contributions. Not all contributors can be expected to be
> aware
> > > of such a decision or may be new to the project. In cases where new
> > > features are contributed during this time, the contributor can be
> > informed
> > > of the current status of the release process, be encouraged to
> contribute
> > > to testing or bug fixing, and have their feature reviewed after the
> beta
> > is
> > > reached.
> > >
> > >
> > > What happens when beta is reached?
> > >
> > > Ideally, contributors who have made significant contributions to the
> > > release will stick around to continue testing between beta and final
> > > release. Any additional folks who continue this focus would also be
> > greatly
> > > appreciated.
> > >
> > > What about before the freeze?
> > >
> > > Testing new features is of course important. This isn't meant to
> > discourage
> > > development – only to enable us to focus on testing and hardening 4.0
> to
> > > deliver Cassandra's most stable major release. We would like to see
> > > adoption of 4.0 happen much more quickly than its predecessor.
> > >
> > > Thanks for considering this proposal,
> > > Sankalp Kohli
> >
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


Re: Testing 4.0 Post-Freeze

2018-07-03 Thread Jeff Jirsa
Yes?

-- 
Jeff Jirsa


> On Jul 3, 2018, at 2:29 PM, Jonathan Ellis  wrote:
> 
> Is that worth the risk of demotivating new contributors who might have
> other priorities?
> 
>> On Tue, Jul 3, 2018 at 4:22 PM, Jeff Jirsa  wrote:
>> 
>> I think there's value in the psychological commitment that if someone has
>> time to contribute, their contributions should be focused on validating a
>> release, not pushing future features.
>> 
>> 
>>> On Tue, Jul 3, 2018 at 1:03 PM, Jonathan Haddad  wrote:
>>> 
>>> I agree with Josh. I don’t see how changing the convention around trunk
>>> will improve the process, seems like it’ll only introduce a handful of
>>> rollback commits when people forget.
>>> 
>>> Other than that, it all makes sense to me.
>>> 
>>> I’ve been working on a workload centric stress tool on and off for a
>> little
>>> bit in an effort to create something that will help with wider adoption
>> in
>>> stress testing. It differs from the stress we ship by including fully
>>> functional stress workloads as well as a validation process. The idea
>> being
>>> to be flexible enough to test both performance and correctness in LWT and
>>> MVs as well as other arbitrary workloads.
>>> 
>>> https://github.com/thelastpickle/tlp-stress
>>> 
>>> Jon
>>> 
>>> 
>>> On Tue, Jul 3, 2018 at 12:28 PM Josh McKenzie 
>>> wrote:
>>> 
>>>> Why not just branch a 4.0-rel and bugfix there and merge up while still
>>>> accepting new features or improvements on trunk?
>>>> 
>>>> I don't think the potential extra engagement in testing will balance
>> out
>>>> the atrophy and discouraging contributions / community engagement we'd
>>> get
>>>> by deferring all improvements and new features in an open-ended way.
>>>> 
>>>> On Tue, Jul 3, 2018 at 1:33 PM sankalp kohli 
>>>> wrote:
>>>> 
>>>>> Hi cassandra-dev@,
>>>>> 
>>>>> With the goal of making Cassandra's 4.0 the most stable major release
>>> to
>>>>> date, we would like all committers of the project to consider joining
>>> us
>>>> in
>>>>> dedicating their time and attention to testing, running, and fixing
>>>> issues
>>>>> in 4.0 between the September freeze and the 4.0 beta release. This
>>> would
>>>>> result in a freeze of new feature development on trunk or branches
>>> during
>>>>> this period, instead focusing on writing, improving, and running
>> tests
>>> or
>>>>> fixing and reviewing bugs or performance regressions found in 4.0 or
>>>>> earlier.
>>>>> 
>>>>> How would this work?
>>>>> 
>>>>> We propose that between the September freeze date and beta, a new
>>> branch
>>>>> would not be created and trunk would only have bug fixes and
>>> performance
>>>>> improvements committed to it. At the same time we do not want to
>>>> discourage
>>>>> community contributions. Not all contributors can be expected to be
>>> aware
>>>>> of such a decision or may be new to the project. In cases where new
>>>>> features are contributed during this time, the contributor can be
>>>> informed
>>>>> of the current status of the release process, be encouraged to
>>> contribute
>>>>> to testing or bug fixing, and have their feature reviewed after the
>>> beta
>>>> is
>>>>> reached.
>>>>> 
>>>>> 
>>>>> What happens when beta is reached?
>>>>> 
>>>>> Ideally, contributors who have made significant contributions to the
>>>>> release will stick around to continue testing between beta and final
>>>>> release. Any additional folks who continue this focus would also be
>>>> greatly
>>>>> appreciated.
>>>>> 
>>>>> What about before the freeze?
>>>>> 
>>>>> Testing new features is of course important. This isn't meant to
>>>> discourage
>>>>> development – only to enable us to focus on testing and hardening 4.0
>>> to
>>>>> deliver Cassandra's most stable major release. We would like to see
>>>>> adoption of 4.0 happen much more quickly than its predecessor.
>>>>> 
>>>>> Thanks for considering this proposal,
>>>>> Sankalp Kohli
>>>> 
>>> --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> twitter: rustyrazorblade
>>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Testing 4.0 Post-Freeze

2018-07-10 Thread Jeff Jirsa
Ultimately, we have consensus-driven development. If Jonathan or Dave
strongly disagrees with this, they can share their strong disagreement.

Jonathan shared his concern about dissuading contributors.

What's absurd is trying the same thing we've tried for 10 years and
expecting things to magically change. We know that a lot of folks are
lining up to test the 4.0 release. If people who have contributed enough to
be able to commit have time to work on features, the proposal is that the
project make it known that we'd rather have them work on testing than
commit their patch, or hold their patch until testing is done. That doesn't
mean they're suddenly not allowed to commit, it's that we'd prefer they use
their time and attention in a more constructive manner.

- Jeff



On Tue, Jul 10, 2018 at 10:18 AM, Jonathan Haddad  wrote:

> I guess I look at the initial voting in of committers as the process
> by which people are trusted to merge things in.  This proposed process
> revokes that trust. If Jonathan Ellis or Dave Brosius (arbitrarily
> picked) wants to merge a new feature into trunk during the freeze, now
> they're not allowed?  That's absurd.  People have already met the bar
> and have been voted in by merit, they should not have their privilege
> revoked.
> On Tue, Jul 10, 2018 at 10:14 AM Ben Bromhead  wrote:
> >
> > Well put Mick
> >
> > +1
> >
> > On Tue, Jul 10, 2018 at 1:06 PM Aleksey Yeshchenko 
> > wrote:
> >
> > > +1 from me too.
> > >
> > > —
> > > AY
> > >
> > > On 10 July 2018 at 04:17:26, Mick Semb Wever (m...@apache.org) wrote:
> > >
> > >
> > > > We have done all this for previous releases and we know it has not
> > > worked
> > > > well. So how giving it one more try is going to help here. Can
> someone
> > > > outline what will change for 4.0 which will make it more successful?
> > >
> > >
> > > I (again) agree with you Sankalp :-)
> > >
> > > Why not try something new?
> > > It's easier to discuss these things more genuinely after trying it out.
> > >
> > > One of the differences in the branching approaches: to feature-freeze
> on a
> > > 4.0 branch or on trunk; is who it is that has to then merge and work
> with
> > > multiple branches.
> > >
> > > Where that small but additional effort is placed I think becomes a
> signal
> > > to what the community values most: new features or stability.
> > >
> > > I think most folk would vote for stability, so why not give this
> approach
> > > a go and to learn from it.
> > > It also creates an incentive to make the feature-freeze period as
> short as
> > > possible, moving us towards an eventual goal of not needing to
> > > feature-freeze at all.
> > >
> > > regards,
> > > Mick
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > > --
> > Ben Bromhead
> > CTO | Instaclustr 
> > +1 650 284 9692
> > Reliability at Scale
> > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: [VOTE] Branching Change for 4.0 Freeze

2018-07-11 Thread Jeff Jirsa
+1


-- 
Jeff Jirsa


> On Jul 11, 2018, at 2:46 PM, sankalp kohli  wrote:
> 
> Hi,
>As discussed in the thread[1], we are proposing that we will not branch
> on 1st September but will only allow following merges into trunk.
> 
> a. Bug and Perf fixes to 4.0.
> b. Critical bugs in any version of C*.
> c. Testing changes to help test 4.0
> 
> If someone has a change which does not fall under these three, we can
> always discuss it and have an exception.
> 
> Vote will be open for 72 hours.
> 
> Thanks,
> Sankalp
> 
> [1]
> https://lists.apache.org/thread.html/494c3ced9e83ceeb53fa127e44eec6e2588a01b769896b25867fd59f@%3Cdev.cassandra.apache.org%3E

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: Scratch an itch

2018-07-12 Thread Jeff Jirsa
On Thu, Jul 12, 2018 at 10:54 AM, Michael Burman 
wrote:

> On 07/12/2018 07:38 PM, Stefan Podkowinski wrote:
>
>> this point? Also, if we tell someone that their contribution will be
>> reviewed and committed later after 4.0-beta, how is that actually making
>> a difference for that person, compared to committing it now for a 4.x
>> version. It may be satisfying to get a patch committed, but what matters
>> more is when the code will actually be released and deferring committing
>> contributions after 4.0-beta doesn't necessarily mean that there's any
>> disadvantage when it comes to that.
>>
>> Deferring a huge amount of commits creates rebase/redo hell. That's the
> biggest impact, and the order in which these deferred commits are eventually
> committed can make it more or less painful depending on
> the commit. That in turn means waiting for each contributor to
> rebase/redo their commit, and those timings might create more rebase issues -
> if those committers still want to rebase something after n months, or even
> have time at that point.
>
>
This is true, but it's also part of the point - if the people fixing bugs
for 4.0 proper have to spend a bunch of time rebasing around 4.next
features, then that rebase hell gets in the way of fixing bugs for a
release (because we wouldn't commit just to 4.0 without also rebasing for
trunk).


> That's a problem for all Cassandra patches that take a long time to commit,
> and if this block takes a lot of time, then that will for sure be even more
> painful. I know products such as Kubernetes do the same (I guess that's
> where this idea might have come from) - "trunk patches only" - but their block
> is quite short.
>
> My wish is that this freeze does not last so long that it kills enthusiasm
> towards committing to Cassandra. There are (I assume) many hobbyists who do
> this as a side-project instead of their daily work and might not have the
> capabilities to test 4.0 in a way that will trigger bugs (easy bugs are
> fixed quite quickly I hope). And if they feel like it's not worth it
> at this point to invest time in Cassandra (because nothing they do will get
> merged) - they might move to another project. And there's no guarantee they
> will return. Getting stuff to the product is part of the satisfaction and
> without satisfaction there's no interest in continuing.
>

I wish for this too.


Re: [VOTE] Release Apache Cassandra 3.0.17 (Take 2)

2018-07-25 Thread Jeff Jirsa
+1

On Wed, Jul 25, 2018 at 12:17 AM, Michael Shuler 
wrote:

> I propose the following artifacts for release as 3.0.17.
>
> sha1: d52c7b8c595cc0d06fc3607bf16e3f595f016bb6
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> shortlog;h=refs/tags/3.0.17-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1165/org/apache/cassandra/apache-cassandra/3.0.17/
> Staging repository:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1165/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: CHANGES.txt:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.17-tentative
> [2]: NEWS.txt:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> blob_plain;f=CHANGES.txt;hb=refs/tags/3.0.17-tentative
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: [VOTE] Release Apache Cassandra 3.11.3 (Take 2)

2018-07-25 Thread Jeff Jirsa
+1

On Wed, Jul 25, 2018 at 12:16 AM, Michael Shuler 
wrote:

> I propose the following artifacts for release as 3.11.3.
>
> sha1: 31d5d870f9f5b56391db46ba6cdf9e0882d8a5c0
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> shortlog;h=refs/tags/3.11.3-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1164/org/apache/cassandra/apache-cassandra/3.11.3/
> Staging repository:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1164/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: CHANGES.txt:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.3-tentative
> [2]: NEWS.txt:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> blob_plain;f=CHANGES.txt;hb=refs/tags/3.11.3-tentative
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: [VOTE] Release Apache Cassandra 2.2.13

2018-07-25 Thread Jeff Jirsa
+1

On Wed, Jul 25, 2018 at 12:17 AM, Michael Shuler 
wrote:

> I propose the following artifacts for release as 2.2.13.
>
> sha1: 3482370df5672c9337a16a8a52baba53b70a4fe8
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> shortlog;h=refs/tags/2.2.13-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1167/org/apache/cassandra/apache-cassandra/2.2.13/
> Staging repository:
> https://repository.apache.org/content/repositories/
> orgapachecassandra-1167/
>
> The Debian and RPM packages are available here:
> http://people.apache.org/~mshuler
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: CHANGES.txt:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.13-tentative
> [2]: NEWS.txt:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> blob_plain;f=CHANGES.txt;hb=refs/tags/2.2.13-tentative
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: NGCC 2018?

2018-07-26 Thread Jeff Jirsa
Bay area event is interesting to me, in any format.


On Thu, Jul 26, 2018 at 9:03 PM, Ben Bromhead  wrote:

> It sounds like there may be an appetite for something, but the NGCC in its
> current format is likely to not be that useful?
>
> Is a bay area event focused on C* developers something that is interesting
> for the broader dev community? In whatever format that may be?
>
> On Tue, Jul 24, 2018 at 5:02 PM Nate McCall  wrote:
>
> > This was discussed amongst the PMC recently. We did not come to a
> > conclusion and there were not terribly strong feelings either way.
> >
> > I don't feel like we need to hustle to get "NGCC" in place,
> > particularly given our decided focus on 4.0. However, that should not
> > stop us from doing an additional 'c* developer' event in sept. to
> > coincide with distributed data summit.
> >
> > On Wed, Jul 25, 2018 at 5:03 AM, Patrick McFadin 
> > wrote:
> > > Ben,
> > >
> > > Lynn Bender had offered a space the day before Distributed Data Summit
> in
> > > September (http://distributeddatasummit.com/) since we are both
> platinum
> > > sponsors. I thought he and Nate had talked about that being a good
> place
> > > for NGCC since many of us will be in town already.
> > >
> > > Nate, now that I've spoken for you, you can clarify, :D
> > >
> > > Patrick
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> > --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Reliability at Scale
> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
>


Re: Proposing an Apache Cassandra Management process

2018-08-20 Thread Jeff Jirsa
On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala
 wrote:

> contributions should be evaluated based on the merit of code and their
> value add to the whole offering. I  hope it does not matter whether that
> contribution comes from PMC member or a person who is not a committer.


I hope this goes without saying.


Re: Side Car New Repo vs not

2018-08-23 Thread Jeff Jirsa
+1 for separate repo


-- 
Jeff Jirsa


> On Aug 23, 2018, at 1:00 PM, sankalp kohli  wrote:
> 
> Separate repo is in a majority so far. Please reply to this thread with
> your responses.
> 
> On Tue, Aug 21, 2018 at 4:34 PM Rahul Singh 
> wrote:
> 
>> +1 for separate repo. Especially on git. Maybe make it a submodule.
>> 
>> Rahul
>> On Aug 21, 2018, 3:33 PM -0500, Stefan Podkowinski ,
>> wrote:
>>> I'm also currently -1 on the in-tree option.
>>> 
>>> Additionally to what Aleksey mentioned, I also don't see how we could
>>> make this work with the current build and release process. Our scripts
>>> [0] for creating releases (tarballs and native packages), would need
>>> significant work to add support for an independent side-car. Our ant
>>> based build process is also not a great start for adding new tasks, let
>>> alone integrating other tool chains for web components for a potential
>> UI.
>>> 
>>> [0] https://git-wip-us.apache.org/repos/asf?p=cassandra-builds.git
>>> 
>>> 
>>>> On 21.08.18 19:20, Aleksey Yeshchenko wrote:
>>>> Sure, allow me to elaborate - at least a little bit. But before I do,
>> just let me note that this wasn’t a veto -1, just a shorthand for “I don’t
>> like this option”.
>>>> 
>>>> It would be nice to have sidecar and C* version and release cycles
>> fully decoupled. I know it *can* be done when in-tree, but the way we vote
>> on releases with tags off current branches would have to change somehow.
>> Probably painfully. It would be nice to be able to easily enforce freezes,
>> like the upcoming one, on the whole C* repo, while allowing feature
>> development on the sidecar. It would be nice to not have sidecar commits in
>> emails from commits@ mailing list. It would be nice to not have C* CI
>> trigger necessarily on sidecar commits. Groups of people working on the two
>> repos will mostly be different too, so what’s the point in sharing the repo?
>>>> 
>>>> Having an extra repo with its own set of branches is cheap and easy -
>> we already do that with dtests. I like cleanly separated things when
>> coupling is avoidable. As such I would prefer the sidecar to live in a
>> separate new repo, while still being part of the C* project.
>>>> 
>>>> —
>>>> AY
>>>> 
>>>> On 21 August 2018 at 17:06:39, sankalp kohli (kohlisank...@gmail.com)
>> wrote:
>>>> 
>>>> Hi Aleksey,
>>>> Can you please elaborate on the reasons for your -1? This
>>>> way we can make progress towards any one approach.
>>>> Thanks,
>>>> Sankalp
>>>> 
>>>> On Tue, Aug 21, 2018 at 8:39 AM Aleksey Yeshchenko 
>>>> wrote:
>>>> 
>>>>> FWIW I’m strongly -1 on in-tree approach, and would much prefer a
>> separate
>>>>> repo, dtest-style.
>>>>> 
>>>>> —
>>>>> AY
>>>>> 
>>>>> On 21 August 2018 at 16:36:02, Jeremiah D Jordan (
>>>>> jeremiah.jor...@gmail.com) wrote:
>>>>> 
>>>>> I think the following is a very big plus of it being in tree:
>>>>>>> * Faster iteration speed in general. For example when we need to
>> add a
>>>>>>> new
>>>>>>> JMX endpoint that the sidecar needs, or change something from
>> JMX to a
>>>>>>> virtual table (e.g. for repair, or monitoring) we can do all
>> changes
>>>>>>> including tests as one commit within the main repository and
>> don't
>>>>> have
>>>>>>> to
>>>>>>> commit to main repo, sidecar repo,
>>>>> 
>>>>> I also don’t see a reason why the sidecar being in tree means it
>> would not
>>>>> work in a mixed version cluster. The nodes themselves must work in a
>> mixed
>>>>> version cluster during a rolling upgrade, I would expect any
>> management
>>>>> side car to operate in the same manner, in tree or not.
>>>>> 
>>>>> This tool will be pretty tightly coupled with the server, and as
>> someone
>>>>> with experience developing such tightly coupled tools, it is *much*
>> easier
>>>>> to make sure you don’t accidentally break them if they are in tree.
>> How
>>>>> many times has someone updated some JMX interface, updated nodetool,
>> and
>>>>>

Re: Reaper as cassandra-admin

2018-08-27 Thread Jeff Jirsa
Can you get all of the contributors cleared?
What’s the architecture? Is it centralized? Is there a sidecar?


> On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:
> 
> Hey folks,
> 
> Mick brought this up in the sidecar thread, but I wanted to have a clear /
> separate discussion about what we're thinking with regard to contributing
> Reaper to the C* project.  In my mind, starting with Reaper is a great way
> of having an admin right now, that we know works well at the kind of scale
> we need.  We've worked with a lot of companies putting Reaper in prod (at
> least 50), running on several hundred clusters.  The codebase has evolved
> as a direct result of production usage, and we feel it would be great to
> pair it with the 4.0 release.  There was a LOT of work done on the repair
> logic to make things work across every supported version of Cassandra, with
> a great deal of documentation as well.
> 
> In case folks aren't aware, in addition to one off and scheduled repairs,
> Reaper also does cluster wide snapshots, exposes thread pool stats, and
> visualizes streaming (in trunk).
> 
> We're hoping to get some feedback on our side if that's something people
> are interested in.  We've gone back and forth privately on our own
> preferences, hopes, dreams, etc, but I feel like a public discussion would
> be healthy at this point.  Does anyone share the view of using Reaper as a
> starting point?  What concerns to people have?
> -- 
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade




Re: Reaper as cassandra-admin

2018-08-27 Thread Jeff Jirsa
As an aside, it’s frustrating that y’all would sit on this for months (first 
e-mail was April); you folks have enough people that know the process to know 
that communicating early and often helps avoid duplicating (expensive) work. 

The best tech needs to go in and we need to leave ourselves with the ability to 
meet the goals of the original proposal (and then some). The Reaper UI is nice; 
I wish you had talked to the other group of folks to combine efforts in 
April, as we’d be much further ahead. 

-- 
Jeff Jirsa


> On Aug 27, 2018, at 6:02 PM, Jeff Jirsa  wrote:
> 
> Can you get all of the contributors cleared?
> What’s the architecture? Is it centralized? Is there a sidecar?
> 
> 
>> On Aug 27, 2018, at 5:36 PM, Jonathan Haddad  wrote:
>> 
>> Hey folks,
>> 
>> Mick brought this up in the sidecar thread, but I wanted to have a clear /
>> separate discussion about what we're thinking with regard to contributing
>> Reaper to the C* project.  In my mind, starting with Reaper is a great way
>> of having an admin right now, that we know works well at the kind of scale
>> we need.  We've worked with a lot of companies putting Reaper in prod (at
>> least 50), running on several hundred clusters.  The codebase has evolved
>> as a direct result of production usage, and we feel it would be great to
>> pair it with the 4.0 release.  There was a LOT of work done on the repair
>> logic to make things work across every supported version of Cassandra, with
>> a great deal of documentation as well.
>> 
>> In case folks aren't aware, in addition to one off and scheduled repairs,
>> Reaper also does cluster wide snapshots, exposes thread pool stats, and
>> visualizes streaming (in trunk).
>> 
>> We're hoping to get some feedback on our side if that's something people
>> are interested in.  We've gone back and forth privately on our own
>> preferences, hopes, dreams, etc, but I feel like a public discussion would
>> be healthy at this point.  Does anyone share the view of using Reaper as a
>> starting point?  What concerns to people have?
>> -- 
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade




Re: Supporting multiple JDKs

2018-08-28 Thread Jeff Jirsa
+1 from me on both points below 

-- 
Jeff Jirsa


> On Aug 28, 2018, at 1:40 PM, Sumanth Pasupuleti 
>  wrote:
> 
> Correct me if I am wrong, but I see the following consensus so far on the
> proposal.
> 
> C* 2.2
> AnimalSniffer
> Use AnimalSniffer for compile-time feedback on JDK 1.7 compatibility -
> complete consensus so far
> Circle CI Builds
> In addition to existing JDK 1.8 support, build against JDK 1.7, and
> [optionally] run unit tests and DTests against JDK 1.7 - Dinesh and
> Sumanth +1 so far. Mick - I am not sure if you are +0 or -1 on this.
> 
> C* 4.0
> Circle CI Builds
> In addition to existing JDK 1.8 support, build against JDK 11 and
> [optionally] run unit tests and DTests against JDK 11. - complete consensus
> so far
> 
> If anyone has any further feedback, please comment.
> 
> Thanks,
> Sumanth
> 
> On Fri, Aug 24, 2018 at 7:27 AM Sumanth Pasupuleti
>  wrote:
> 
>>> I'm still a bit confused as to what's the benefit in compiling with
>> jdk1.7 and then testing with jdk1.7 or jdk1.8
>> I meant two separate workflows for each JDK i.e.
>> Workflow1: Build against jdk1.7, and optionally run UTs and Dtests against
>> 1.7
>> Workflow2: Build against jdk1.8, and run UTs and DTests against 1.8.
>> 
>>> If you find breakages here that otherwise don't exist when it's compiled
>> with jdk1.8 then that's just false-positives. As well as generally wasting
>> CI resources.
>> If we find breakages in workflow 1 and not in workflow 2, how would they be
>> false positives? We would then need to look into what's causing the breakages
>> with 1.7, wouldn't we?
>> 
>> Thanks,
>> Sumanth
>> 
>> On Thu, Aug 23, 2018 at 7:59 PM, Mick Semb Wever  wrote:
>> 
>>>> However, in addition to using such a
>>>> tool, I believe, when we make a release, we should build against the
>>> actual
>>>> JDKs we support (that way, we are not making a release just based on
>> the
>>>> result of an external tool), and we should be able to optionally run
>> UTs
>>>> and DTests against the JDK  (i.e. Java7 and Java8 for C* 2.2).
>>> 
>>> 
>>> I'm still a bit confused as to what's the benefit in compiling with
>> jdk1.7
>>> and then testing with jdk1.7 or jdk1.8
>>> 
>>> If you find breakages here that otherwise don't exist when it's compiled
>>> with jdk1.8 then that's just false-positives. As well as generally
>> wasting
>>> CI resources.
>>> 
>>> Either way, there's not much point discussing this as Cassandra-2.1 is
>>> about EOL, and Cassandra-4.0 is stuck with a very specific compile.
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
>> 


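As an editorial aside: the two-workflow setup Sumanth describes above (one build-and-test pass per supported JDK) can be sketched as a CircleCI 2.0 config. This is only an illustrative sketch, not the project's actual .circleci/config.yml; the job names, Docker images, and ant targets are assumptions:

```yaml
version: 2
jobs:
  build-jdk8:
    docker:
      - image: openjdk:8-jdk
    steps:
      - checkout
      - run: ant build   # build against JDK 1.8
      - run: ant test    # run unit tests against JDK 1.8
  build-jdk7:
    docker:
      - image: openjdk:7-jdk
    steps:
      - checkout
      - run: ant build   # catch JDK 1.7 incompatibilities at compile time
workflows:
  version: 2
  build-all-jdks:
    jobs:
      - build-jdk8
      - build-jdk7
```

Running the JDK 1.7 tests as a separate, optional job keeps the primary 1.8 workflow fast while still surfacing 1.7-only breakages.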


Re: Reaper as cassandra-admin

2018-08-29 Thread Jeff Jirsa
Agreed here - combining effort and making things pluggable seems like a good 
solution


-- 
Jeff Jirsa


On Aug 28, 2018, at 11:44 PM, Vinay Chella  wrote:

>> I haven’t settled on a position yet (will have more time think about
> things after the 9/1 freeze), but I wanted to point out that the argument
> that something new should be written because an existing project has tech
> debt, and we'll do it the right way this time, is a pretty common software
> engineering mistake. The thing you’re replacing usually needs to have some
> really serious problems to make it worth replacing.
> 
> Agreed. I don’t think we should write everything from scratch, but carrying
> forward tech debt (if any) and design decisions that make new features
> difficult to develop in the future is something that we need to
> consider. I second Dinesh’s thought on taking the best parts from the
> available projects to move forward with the right solution, one that works
> well and is easily pluggable.
> 
> -
> Vinay Chella
> 
> 
>> On Tue, Aug 28, 2018 at 10:03 PM Mick Semb Wever  wrote:
>> 
>> 
>>> the argument that something new should be written because an existing
>> project has tech debt, and we'll do it the right way this time, is a pretty
>> common software engineering mistake. The thing you’re replacing usually
>> needs to have some really serious problems to make it worth replacing.
>> 
>> 
>> Thanks for writing this Blake. I'm no fan of writing from scratch. Working
>> with other people's code is the joy of open-source, imho.
>> 
>> Reaper is not a big project. None of its java files are large or
>> complicated.
>> This is not the C* codebase we're talking about.
>> 
>> It comes with strict code style in place (which the build enforces), unit
>> and integration tests. The tech debt that I think of first is removing
>> stuff that we would no longer want to support if it were inside the
>> Cassandra project. A number of recent refactorings have proved it's an
>> easy codebase to work with.
>> 
>> It's also worth noting that Cassandra-4.x adoption is still some way off, in
>> which time Reaper will only continue to grow and gain users.
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 




Re: Java 11 Z garbage collector

2018-08-31 Thread Jeff Jirsa
Read-heavy workloads with wider partitions (like 1-2GB) and the key cache 
disabled will be the worst case for GC.




-- 
Jeff Jirsa


> On Aug 31, 2018, at 10:51 AM, Carl Mueller 
>  wrote:
> 
> I'm assuming the p99 that Rocksandra tries to target is caused by GC
> pauses, does anyone have data patterns or datasets that will generate GC
> pauses in Cassandra to highlight the abilities of Rocksandra (and...
> Scylla?) and perhaps this GC approach?
> 
> On Thu, Aug 30, 2018 at 8:11 PM Carl Mueller 
> wrote:
> 
>> Oh nice, I'll check that out.
>> 
>> On Thu, Aug 30, 2018 at 11:07 AM Jonathan Haddad 
>> wrote:
>> 
>>> Advertised, yes, but so far I haven't found it to be any better than
>>> ParNew + CMS or G1 in the performance tests I did when writing
>>> http://thelastpickle.com/blog/2018/08/16/java11.html.
>>> 
>>> That said, I didn't try it with a huge heap (i think it was 16 or 24GB),
>>> so
>>> maybe it'll do better if I throw 50 GB RAM at it.
>>> 
>>> 
>>> 
>>> On Thu, Aug 30, 2018 at 8:42 AM Carl Mueller
>>>  wrote:
>>> 
>>>> https://www.opsian.com/blog/javas-new-zgc-is-very-exciting/
>>>> 
>>>> .. max of 4ms for stop the world, large terabyte heaps, seems promising.
>>>> 
>>>> Will this be a major boon to cassandra p99 times? Anyone know the
>>> aspects
>>>> of cassandra that cause the most churn and lead to StopTheWorld GC? I
>>> was
>>>> under the impression that bloom filters, caches, etc are statically
>>>> allocated at startup.
>>>> 
>>> 
>>> 
>>> --
>>> Jon Haddad
>>> http://www.rustyrazorblade.com
>>> twitter: rustyrazorblade
>>> 
>> 


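For anyone wanting to experiment with the ZGC discussion above: on JDK 11, ZGC is still experimental and must be unlocked explicitly. A minimal sketch of the required flags, assuming Cassandra 4.0's per-JDK options file layout (the log path is an assumption; remove or comment out the default CMS/G1 flags first):

```
# conf/jvm11-server.options -- illustrative sketch, not the shipped defaults

# Unlock and enable the experimental Z garbage collector (JDK 11)
-XX:+UnlockExperimentalVMOptions
-XX:+UseZGC

# Optional: unified GC logging, to verify pause times against the claims
-Xlog:gc*:file=/var/log/cassandra/gc.log
```

Note that ZGC does not use compressed object pointers, so heap-size tuning advice written for CMS/G1 may not carry over directly.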


Re: Request for post-freeze merge exception

2018-09-04 Thread Jeff Jirsa
Seems like a reasonable thing to merge to me. Nothing else has been
committed, it was approved pre-freeze, and the rush to merge was bound to
have some number of rebase casualties.

On Tue, Sep 4, 2018 at 11:15 AM Sam Tunnicliffe  wrote:

> Hey all,
>
> On 2018-08-31 CASSANDRA-14145 had been +1'd by two reviewers and CI was
> green, and so it was marked Ready To Commit. This was before the 4.0
> feature freeze, but before 14145 landed, CASSANDRA-14408, which touched a few
> common areas of the code, was merged. I didn't have a chance to finish the
> rebase over the weekend, but in the end it turned out that most of the
> conflicts were in test code and were straightforward to resolve. I'd like
> to commit this now; the rebase is done (& has been re-reviewed), and the CI
> is still green so I suspect most of the community would probably be ok with
> that. We did vote for a freeze though and I don't want to subvert or
> undermine that decision, so I wanted to check and give a chance for anyone
> to raise objections before I did.
>
> I'll wait 24 hours, and if nobody objects before then I'll merge to trunk.
>
> Thanks,
> Sam
>


Re: Proposing an Apache Cassandra Management process

2018-09-07 Thread Jeff Jirsa
How can we continue moving this forward?

Mick/Jon/TLP folks, is there a path here where we commit the
Netflix-provided management process, and you augment Reaper to work with it?
Is there a way we can make a larger umbrella that's modular that can
support either/both?
Does anyone believe there's a clear, objective argument that one is
strictly better than the other? I haven't seen one.



On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala
 wrote:

> +1 to everything that Joey articulated with emphasis on the fact that
> contributions should be evaluated based on the merit of code and their
> value add to the whole offering. I  hope it does not matter whether that
> contribution comes from PMC member or a person who is not a committer. I
> would like the process to be such that it encourages the new members to be
> a part of the community and not shy away from contributing to the code
> assuming their contributions are valued differently than committers or PMC
> members. It would be sad to see the contributions decrease if we go down
> that path.
>
> *Regards,*
>
> *Roopa Tangirala*
>
> Engineering Manager CDE
>
> *(408) 438-3156 - mobile*
>
>
>
>
>
>
> On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch 
> wrote:
>
> > > We are looking to contribute Reaper to the Cassandra project.
> > >
> > Just to clarify are you proposing contributing Reaper as a project via
> > donation or you are planning on contributing the features of Reaper as
> > patches to Cassandra? If the former how far along are you on the donation
> > process? If the latter, when do you think you would have patches ready
> for
> > consideration / review?
> >
> >
> > > Looking at the patch it's very similar in its base design already, but
> > > Reaper does have a lot more to offer. We have all been working hard to
> > move
> > > it to also being a side-car so it can be contributed. This raises a
> > number
> > > of relevant questions to this thread: would we then accept both works
> in
> > > the Cassandra project, and what burden would it put on the current PMC
> to
> > > maintain both works.
> > >
> > I would hope that we would collaborate on merging the best parts of all
> > into the official Cassandra sidecar, taking the always on, shared
> nothing,
> > highly available system that we've contributed a patchset for and adding
> in
> > many of the repair features (e.g. schedules, a nice web UI) that Reaper
> > has.
> >
> >
> > > I share Stefan's concern that consensus had not been met around a
> > > side-car, and that it was somehow default accepted before a patch
> landed.
> >
> >
> > I feel this is not correct or fair. The sidecar and repair discussions
> have
> > been anything _but_ "default accepted". The timeline of consensus
> building
> > involving the management sidecar and repair scheduling plans:
> >
> > Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper
> to
> > come up with design goals for a repair scheduler that could work at
> Netflix
> > scale.
> >
> > ~Feb 2017: Netflix believes that the fundamental design gaps prevented us
> > from using Reaper as it relies heavily on remote JMX connections and
> > central coordination.
> >
> > Sep. 2017: Vinay gives a lightning talk at NGCC about a highly available
> > and distributed repair scheduling sidecar/tool. He is encouraged by
> > multiple committers to build repair scheduling into the daemon itself and
> > not as a sidecar so the database is truly eventually consistent.
> >
> > ~Jun. 2017 - Feb. 2018: Based on internal need and the positive feedback
> at
> > NGCC, Vinay and myself prototype the distributed repair scheduler within
> > Priam and roll it out at Netflix scale.
> >
> > Mar. 2018: I open a Jira (CASSANDRA-14346) along with a detailed 20 page
> > design document for adding repair scheduling to the daemon itself and
> open
> > the design up for feedback from the community. We get feedback from Alex,
> > Blake, Nate, Stefan, and Mick. As far as I know there were zero proposals
> > to contribute Reaper at this point. We hear the consensus that the
> > community would prefer repair scheduling in a separate distributed
> sidecar
> > rather than in the daemon itself and we re-work the design to match this
> > consensus, re-aligning with our original proposal at NGCC.
> >
> > Apr 2018: Blake brings the discussion of repair scheduling to the dev
> list
> > (
> >
> >
> https://lists.apache.org/thread.html/760fbef677f27aa5c2ab4c375c7efeb81304fea428deff986ba1c2eb@%3Cdev.cassandra.apache.org%3E
> > ).
> > Many community members give positive feedback that we should solve it as
> > part of Cassandra and there is still no mention of contributing Reaper at
> > this point. The last message is my attempted summary giving context on
> how
> > we want to take the best of all the sidecars (OpsCenter, Priam, Reaper)
> and
> > ship them with Cassandra.
> >
> > Apr. 2018: Dinesh opens CASSANDRA-14395 along with a public design
> document
> > for gathering feedback on a general mana

Re: Proposing an Apache Cassandra Management process

2018-09-07 Thread Jeff Jirsa
I’d also like to see the end state you describe: reaper UI wrapping the Netflix 
management process with pluggable scheduling (either as is with reaper now, or 
using the Netflix scheduler), but I don’t think that means we need to start 
with reaper - I'd personally prefer the opposite direction, starting with 
something small and isolated and layering on top. 

-- 
Jeff Jirsa


> On Sep 7, 2018, at 5:42 PM, Blake Eggleston  wrote:
> 
> I think we should accept the reaper project as is and make that cassandra 
> management process 1.0, then integrate the netflix scheduler (and other new 
> features) into that.
> 
> The ultimate goal would be for the netflix scheduler to become the default 
> repair scheduler, but I think using reaper as the starting point makes it 
> easier to get there. 
> 
> Reaper would bring a prod user base that would realistically take 2-3 years 
> to build up with a new project. As an operator, switching to a cassandra 
> management process that’s basically a re-brand of an existing and commonly 
> used management process isn’t super risky. Asking operators to switch to a 
> new process is a much harder sell. 
> 
> On September 7, 2018 at 4:17:10 PM, Jeff Jirsa (jji...@gmail.com) wrote:
> 
> How can we continue moving this forward?  
> 
> Mick/Jon/TLP folks, is there a path here where we commit the  
> Netflix-provided management process, and you augment Reaper to work with it?  
> Is there a way we can make a larger umbrella that's modular that can  
> support either/both?  
> Does anyone believe there's a clear, objective argument that one is  
> strictly better than the other? I haven't seen one.  
> 
> 
> 
> On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala  
>  wrote:  
> 
>> +1 to everything that Joey articulated with emphasis on the fact that  
>> contributions should be evaluated based on the merit of code and their  
>> value add to the whole offering. I hope it does not matter whether that  
>> contribution comes from PMC member or a person who is not a committer. I  
>> would like the process to be such that it encourages the new members to be  
>> a part of the community and not shy away from contributing to the code  
>> assuming their contributions are valued differently than committers or PMC  
>> members. It would be sad to see the contributions decrease if we go down  
>> that path.  
>> 
>> *Regards,*  
>> 
>> *Roopa Tangirala*  
>> 
>> Engineering Manager CDE  
>> 
>> *(408) 438-3156 - mobile*  
>> 
>> 
>> 
>> 
>> 
>> 
>> On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch   
>> wrote:  
>> 
>>>> We are looking to contribute Reaper to the Cassandra project.  
>>>> 
>>> Just to clarify are you proposing contributing Reaper as a project via  
>>> donation or you are planning on contributing the features of Reaper as  
>>> patches to Cassandra? If the former how far along are you on the donation  
>>> process? If the latter, when do you think you would have patches ready  
>> for  
>>> consideration / review?  
>>> 
>>> 
>>>> Looking at the patch it's very similar in its base design already, but  
>>>> Reaper does have a lot more to offer. We have all been working hard to  
>>> move  
>>>> it to also being a side-car so it can be contributed. This raises a  
>>> number  
>>>> of relevant questions to this thread: would we then accept both works  
>> in  
>>>> the Cassandra project, and what burden would it put on the current PMC  
>> to  
>>>> maintain both works.  
>>>> 
>>> I would hope that we would collaborate on merging the best parts of all  
>>> into the official Cassandra sidecar, taking the always on, shared  
>> nothing,  
>>> highly available system that we've contributed a patchset for and adding  
>> in  
>>> many of the repair features (e.g. schedules, a nice web UI) that Reaper  
>>> has.  
>>> 
>>> 
>>>> I share Stefan's concern that consensus had not been met around a  
>>>> side-car, and that it was somehow default accepted before a patch  
>> landed.  
>>> 
>>> 
>>> I feel this is not correct or fair. The sidecar and repair discussions  
>> have  
>>> been anything _but_ "default accepted". The timeline of consensus  
>> building  
>>> involving the management sidecar and repair scheduling plans:  
>>> 
>>> Dec 2016: Vinay worked with Jon and Alex to try to collaborate on Reaper  
>>

Re: Proposing an Apache Cassandra Management process

2018-09-07 Thread Jeff Jirsa
The benefit is that it more closely matches the design doc from 5 months ago, 
which is decidedly not about coordinating repair - it’s about a general-purpose 
management tool, where repair is one of many proposed tasks:

https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit


By starting with a tool that is built to run repair, you’re sacrificing 
generality and accepting something purpose-built for one subtask. It’s an 
important subtask, and it’s a nice tool, but it’s not an implementation of the 
proposal, it’s an alternative that happens to do some of what was proposed.

-- 
Jeff Jirsa


> On Sep 7, 2018, at 6:53 PM, Blake Eggleston  wrote:
> 
> What’s the benefit of doing it that way vs starting with reaper and 
> integrating the netflix scheduler? If reaper was just a really inappropriate 
> choice for the cassandra management process, I could see that being a better 
> approach, but I don’t think that’s the case.
> 
> If our management process isn’t a drop in replacement for reaper, then reaper 
> will continue to exist, which will split the user and developer base between 
> the two projects. That won't be good for either project.
> 
> On September 7, 2018 at 6:12:01 PM, Jeff Jirsa (jji...@gmail.com) wrote:
> 
> I’d also like to see the end state you describe: reaper UI wrapping the 
> Netflix management process with pluggable scheduling (either as is with 
> reaper now, or using the Netflix scheduler), but I don’t think that means we 
> need to start with reaper - if personally prefer the opposite direction, 
> starting with something small and isolated and layering on top.  
> 
> --  
> Jeff Jirsa  
> 
> 
>> On Sep 7, 2018, at 5:42 PM, Blake Eggleston  wrote:  
>> 
>> I think we should accept the reaper project as is and make that cassandra 
>> management process 1.0, then integrate the netflix scheduler (and other new 
>> features) into that.  
>> 
>> The ultimate goal would be for the netflix scheduler to become the default 
>> repair scheduler, but I think using reaper as the starting point makes it 
>> easier to get there.  
>> 
>> Reaper would bring a prod user base that would realistically take 2-3 years 
>> to build up with a new project. As an operator, switching to a cassandra 
>> management process that’s basically a re-brand of an existing and commonly 
>> used management process isn’t super risky. Asking operators to switch to a 
>> new process is a much harder sell.  
>> 
>> On September 7, 2018 at 4:17:10 PM, Jeff Jirsa (jji...@gmail.com) wrote:  
>> 
>> How can we continue moving this forward?  
>> 
>> Mick/Jon/TLP folks, is there a path here where we commit the  
>> Netflix-provided management process, and you augment Reaper to work with it? 
>>  
>> Is there a way we can make a larger umbrella that's modular that can  
>> support either/both?  
>> Does anyone believe there's a clear, objective argument that one is  
>> strictly better than the other? I haven't seen one.  
>> 
>> 
>> 
>> On Mon, Aug 20, 2018 at 4:14 PM Roopa Tangirala  
>>  wrote:  
>> 
>>> +1 to everything that Joey articulated with emphasis on the fact that  
>>> contributions should be evaluated based on the merit of code and their  
>>> value add to the whole offering. I hope it does not matter whether that  
>>> contribution comes from PMC member or a person who is not a committer. I  
>>> would like the process to be such that it encourages the new members to be  
>>> a part of the community and not shy away from contributing to the code  
>>> assuming their contributions are valued differently than committers or PMC  
>>> members. It would be sad to see the contributions decrease if we go down  
>>> that path.  
>>> 
>>> *Regards,*  
>>> 
>>> *Roopa Tangirala*  
>>> 
>>> Engineering Manager CDE  
>>> 
>>> *(408) 438-3156 - mobile*  
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Aug 20, 2018 at 2:58 PM Joseph Lynch   
>>> wrote:  
>>> 
>>>>> We are looking to contribute Reaper to the Cassandra project.  
>>>>> 
>>>> Just to clarify are you proposing contributing Reaper as a project via  
>>>> donation or you are planning on contributing the features of Reaper as  
>>>> patches to Cassandra? If the former how far along are you on the donation  
>>>> process? If the latter, when do you think you would have patches ready for  
>>>> consideration / review?  
>>>

Re: UDF

2018-09-11 Thread Jeff Jirsa
+1 as well.

On Tue, Sep 11, 2018 at 10:27 AM Aleksey Yeschenko 
wrote:

> If this is about inclusion in 4.0, then I support it.
>
> Technically this is *mostly* just a move+donation of some code from
> java-driver to Cassandra. Given how important this seemingly is to the
> board and PMC for us to not have the dependency on the driver, the sooner
> it’s gone, the better.
>
> I’d be +1 for committing to trunk.
>
> —
> AY
>
> On 11 September 2018 at 14:43:29, Robert Stupp (sn...@snazy.de) wrote:
>
> The patch is technically complete - i.e. it works and does its thing.
>
> It's not strictly a bug fix but targets trunk. That's why I started the
> discussion.
>
>
> On 09/11/2018 02:53 PM, Jason Brown wrote:
> > Hi Robert,
> >
> > Thanks for taking on this work. Is this message a heads-up that a patch is
> > coming/complete, or to spawn a discussion about including this in 4.0?
> >
> > Thanks,
> >
> > -Jason
> >
> > On Tue, Sep 11, 2018 at 2:32 AM, Robert Stupp  wrote:
> >
> >> In an effort to clean up our hygiene and limit the dependencies used
> by
> >> UDFs/UDAs, I think we should refactor the UDF code parts and remove
> the
> >> dependency to the Java Driver in that area without breaking existing
> >> UDFs/UDAs.
> >>
> >> A working prototype is in this branch:
> >> https://github.com/snazy/cassandra/tree/feature/remove-udf-driver-dep-trunk
> >> The changes are rather trivial and provide 100%
> >> backwards compatibility for existing UDFs.
> >>
> >> The prototype copies the necessary parts from the Java Driver into the C*
> >> source tree to org.apache.cassandra.cql3.functions.types and adopts its
> >> usages - i.e. UDF/UDA code plus CQLSSTableWriter + StressCQLSSTableWriter.
> >> The latter two classes have a reference to UDF’s UDHelper and had to be
> >> changed as well.
> >>
> >> Some functionality, like type parsing & handling, is duplicated in the
> >> code base with this prototype - once in the “current” source tree and once
> >> for UDFs. However, unifying the code paths is not trivial, since the UDF
> >> sandbox prohibits the use of internal classes (direct and likely indirect
> >> dependencies).
> >>
> >> Robert
> >>
> >> —
> >> Robert Stupp
> >> @snazy
> >>
> >>
>
> --
> Robert Stupp
> @snazy
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: [VOTE] Accept GoCQL driver donation and begin incubation process

2018-09-12 Thread Jeff Jirsa
+1

(Incubation looks like it may be challenging to get acceptance from all 
existing contributors, though)

-- 
Jeff Jirsa


> On Sep 12, 2018, at 8:12 AM, Nate McCall  wrote:
> 
> This will be the same process used for dtest. We will need to walk
> this through the incubator per the process outlined here:
> 
> https://incubator.apache.org/guides/ip_clearance.html
> 
> Pending the outcome of this vote, we will create the JIRA issues for
> tracking and after we go through the process, and discuss adding
> committers in a separate thread (we need to do this atomically anyway
> per general ASF committer adding processes).
> 
> Thanks,
> -Nate
> 
> 




Re: [VOTE] Development Approach for Apache Cassandra Management process

2018-09-12 Thread Jeff Jirsa
d - good with either option, but would probably slightly prefer b, as it
can be built toward the design doc.



On Wed, Sep 12, 2018 at 8:19 AM sankalp kohli 
wrote:

> Hi,
> Community has been discussing about Apache Cassandra Management process
> since April and we had lot of discussion about which approach to take to
> get started. Several contributors have been interested in doing this and we
> need to make a decision on which approach to take.
>
> The current approaches being evaluated are
> a. Donate an existing project to Apache Cassandra like Reaper. If this
> option is selected, we will evaluate various projects and see which one
> fits best.
> b. Take a piecemeal approach and use the features from different OSS
> projects and build a new project.
>
> Available options to vote
> a. +1 to use existing project.
> b. +1 to take piecemeal approach
> c  -1 to both
> d +0 I dont mind either option
>
> You can also just type a,b,c,d as well to chose an option.
>
> Dev threads with discussions
>
>
> https://lists.apache.org/thread.html/4eace8cb258aab83fc3a220ff2203a281ea59f4d6557ebeb1af7b7f1@%3Cdev.cassandra.apache.org%3E
>
>
> https://lists.apache.org/thread.html/4a7e608c46aa2256e8bcb696104a4e6d6aaa1f302834d211018ec96e@%3Cdev.cassandra.apache.org%3E
>


Re: [VOTE] Development Approach for Apache Cassandra Management process

2018-09-12 Thread Jeff Jirsa
On Wed, Sep 12, 2018 at 12:41 PM Sylvain Lebresne 
wrote:

> That's probably a stupid question, and excuse me if it is, but what do
> those votes on the dev mailing list even mean?
>
> How do you count votes at the end? Just by counting all votes cast,
> regardless of who cast it? Or are we intending to only count PMC
> members, or maybe committers votes?
>

I believe the intent is to try to see if there exists consensus.
Ultimately, PMC is going to matter more than random email addresses from
people nobody recognizes. This should be in public, though, not private, so
seeing what feedback is beyond the PMC is useful (primarily because it will
matter when it comes time to extend and maintain it - if people strongly
prefer one or the other, then maintenance is going to be a problem).

If there are 100 random non-contributor votes for one option and 20 PMC votes
for another option, I think the real answer will be "we don't have
consensus, and either we don't do it, or we do it the way the PMC thinks is
best", for all of the reasons you describe in the paragraphs below.


> If the former, that is a bit weird to me because we simply don't know who
> votes. And I don't mean to be rude towards anyone, but 1) someone could
> easily create 10 email addresses to vote 10 times (and sure, you could
> invoke trust, and I'm not entirely against trust in general, but it's the
> internet...) and 2) this kind of decision will have non-trivial
> consequences for the project, particularly on those that maintain it, so I
> admit I'm not entirely comfortable with "anyone's voice has the same
> weight".
> If the latter, then this makes more sense to me (why are we even bothering
> voting PMC members in if it's not to handle these kinds of decisions, which
> are very "project management" related), but we should be very clear about
> this from the get go (we could still use the dev list for transparency
> sake, that I don't mind)? We should probably also have some deadline to the
> vote, one that isn't too short.
>

Like releases, I think PMC votes count


>
> Anyway, fwiw, my opinion on this vote is not far from the one on the golang
> driver acceptance vote (for which my remark above also applies btw): not yet
> 100% convinced adding more pieces and scope to the project is what the
> project needs just right now, but not strongly opposed if people really
> want this (and this one makes more sense to me than the golang driver
> actually). But if I'm to pick between a) and b), I'm leaning b).
>

FWIW, two of the main reasons I'm in favor is as a way to lower barrier to
entry to both using the software AND contributing to the project, so I
think your points are valid (both on gocql thread and on this note above),
but I think that's also part of why we should be encouraging both.

- Jeff


Re: [DISCUSS] changing default token behavior for 4.0

2018-09-21 Thread Jeff Jirsa
Also agree it should be lowered, but definitely not to 1, and probably 
something closer to 32 than 4.

-- 
Jeff Jirsa


> On Sep 21, 2018, at 8:24 PM, Jeremy Hanna  wrote:
> 
> I agree that it should be lowered. What I’ve seen debated a bit in the past 
> is the number but I don’t think anyone thinks that it should remain 256.
> 
>> On Sep 21, 2018, at 7:05 PM, Jonathan Haddad  wrote:
>> 
>> One thing that's really, really bothered me for a while is how we default
>> to 256 tokens still.  There's no experienced operator that leaves it as is
>> at this point, meaning the only people using 256 are the poor folks that
>> just got started using C*.  I've worked with over a hundred clusters in the
>> last couple years, and I think I only worked with one that had lowered it
>> to something else.
>> 
>> I think it's time we changed the default to 4 (or 8, up for debate).
>> 
>> To improve the behavior, we need to change a couple other things.  The
>> allocate_tokens_for_keyspace setting is... odd.  It requires you have a
>> keyspace already created, which doesn't help on new clusters.  What I'd
>> like to do is add a new setting, allocate_tokens_for_rf, and set it to 3 by
>> default.
>> 
>> To handle clusters that are already using 256 tokens, we could prevent the
>> new node from joining unless a -D flag is set to explicitly allow
>> imbalanced tokens.
>> 
>> We've agreed to a trunk freeze, but I feel like this is important enough
>> (and pretty trivial) to do now.  I'd also personally characterize this as a
>> bug fix since 256 is horribly broken when the cluster gets to any
>> reasonable size, but maybe I'm alone there.
>> 
>> I honestly can't think of a use case where random tokens is a good choice
>> anymore, so I'd be fine / ecstatic with removing it completely and
>> requiring either allocate_tokens_for_keyspace (for existing clusters)
>> or allocate_tokens_for_rf
>> to be set.
>> 
>> Thoughts?  Objections?
>> -- 
>> Jon Haddad
>> http://www.rustyrazorblade.com
>> twitter: rustyrazorblade
> 
> 

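The balance problem behind this thread is easy to demonstrate. Below is a hedged sketch (not Cassandra's actual token allocator) that places purely random tokens on a unit ring and measures per-node ownership imbalance — showing why random allocation needed 256 vnodes for balance, and why lowering the default requires the allocation algorithm discussed above:

```python
import random

def ownership_spread(num_nodes: int, vnodes: int, seed: int = 42) -> float:
    """Place `vnodes` random tokens per node on a unit ring and return
    max/min per-node ownership (1.0 would be perfectly balanced)."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.random(), node)
        for node in range(num_nodes)
        for _ in range(vnodes)
    )
    owned = [0.0] * num_nodes
    for i, (token, node) in enumerate(ring):
        # A node owns the range (previous token, its token]; wrap at index 0.
        prev = ring[i - 1][0] if i > 0 else ring[-1][0] - 1.0
        owned[node] += token - prev
    return max(owned) / min(owned)

for v in (4, 16, 256):
    print(v, "vnodes -> imbalance", round(ownership_spread(12, v), 2))
```

With random placement the imbalance shrinks as vnode count grows, which is the trade-off the `allocate_tokens_for_*` settings are meant to eliminate.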



Re: MD5 in the read path

2018-09-26 Thread Jeff Jirsa
In some installations, it's used for hashing the partition key to find the
host ( RandomPartitioner )
It's used for prepared statement IDs
It's used for hashing the data for reads to know if the data matches on all
different replicas.

We don't use CRC because conflicts would be really bad. There's probably
something in the middle that's slightly faster than md5 without the
drawbacks of crc32


On Wed, Sep 26, 2018 at 3:47 PM Tyagi, Preetika 
wrote:

> Hi all,
>
> I have a question about MD5 being used in the read path in Cassandra.
> I wanted to understand what exactly it is being used for and why not
> something like CRC is used which is less complex in comparison to MD5.
>
> Thanks,
> Preetika
>
>
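The two main uses Jeff describes can be sketched in a few lines of Python — illustrative only; the function names and the exact token math are assumptions, not Cassandra's code:

```python
import hashlib

def md5_token(partition_key: bytes) -> int:
    """RandomPartitioner-style token: the 128-bit MD5 digest of the key,
    taken as a non-negative integer."""
    digest = hashlib.md5(partition_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

def digests_match(replica_payloads) -> bool:
    """Digest read: replicas agree iff the hashes of their result sets agree,
    so digest replicas ship only a hash, not the data itself."""
    return len({hashlib.md5(p).digest() for p in replica_payloads}) == 1
```

The second function is why hash collisions matter here: if two different result sets collided, a real mismatch would go undetected — the reason a plain CRC is too weak for this path.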


Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-11 Thread Jeff Jirsa



I think 16k is a better default, but it should only affect new tables. Whoever 
changes it, please make sure you think about the upgrade path. 


> On Oct 12, 2018, at 2:31 AM, Ben Bromhead  wrote:
> 
> This is something that's bugged me for ages, tbh the performance gain for
> most use cases far outweighs the increase in memory usage and I would even
> be in favor of changing the default now, optimizing the storage cost later
> (if it's found to be worth it).
> 
> For some anecdotal evidence:
> 4kb is usually what we end up setting it to, 16kb feels more reasonable given
> the memory impact, but what would be the point if practically, most folks
> set it to 4kb anyway?
> 
> Note that chunk_length will largely be dependent on your read sizes, but 4k
> is the floor for most physical devices in terms of their block size.
> 
> +1 for making this change in 4.0 given the small size and the large
> improvement to new users experience (as long as we are explicit in the
> documentation about memory consumption).
> 
> 
>> On Thu, Oct 11, 2018 at 7:11 PM Ariel Weisberg  wrote:
>> 
>> Hi,
>> 
>> This is regarding https://issues.apache.org/jira/browse/CASSANDRA-13241
>> 
>> This ticket has languished for a while. IMO it's too late in 4.0 to
>> implement a more memory efficient representation for compressed chunk
>> offsets. However I don't think we should put out another release with the
>> current 64k default as it's pretty unreasonable.
>> 
>> I propose that we lower the value to 16kb. 4k might never be the correct
>> default anyways as there is a cost to compression and 16k will still be a
>> large improvement.
>> 
>> Benedict and Jon Haddad are both +1 on making this change for 4.0. In the
>> past there has been some consensus about reducing this value although maybe
>> with more memory efficiency.
>> 
>> The napkin math for what this costs is:
>> "If you have 1TB of uncompressed data, with 64k chunks that's 16M chunks
>> at 8 bytes each (128MB).
>> With 16k chunks, that's 512MB.
>> With 4k chunks, it's 2G.
>> Per terabyte of data (pre-compression)."
>> 
>> https://issues.apache.org/jira/browse/CASSANDRA-13241?focusedCommentId=15886621&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15886621
>> 
>> By way of comparison memory mapping the files has a similar cost per 4k
>> page of 8 bytes. Multiple mappings makes this more expensive. With a
>> default of 16kb this would be 4x less expensive than memory mapping a file.
>> I only mention this to give a sense of the costs we are already paying. I
>> am not saying they are directly related.
>> 
>> I'll wait a week for discussion and if there is consensus make the change.
>> 
>> Regards,
>> Ariel
>> 
>> 
>> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Reliability at Scale
> Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer

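The napkin math quoted above checks out; a few lines of Python reproduce it (one 8-byte offset per compressed chunk):

```python
def chunk_offset_overhead(data_bytes: int, chunk_kb: int,
                          bytes_per_offset: int = 8) -> int:
    """Memory needed to hold one offset per compressed chunk."""
    num_chunks = data_bytes // (chunk_kb * 1024)
    return num_chunks * bytes_per_offset

TB = 1 << 40
MB = 1 << 20
for kb in (64, 16, 4):
    print(f"{kb}k chunks: {chunk_offset_overhead(TB, kb) // MB} MB per TB")
```

So moving the default from 64k to 16k quadruples the offset overhead (128MB to 512MB per uncompressed TB), while 4k would cost 2GB — the middle ground the proposal settles on.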



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-12 Thread Jeff Jirsa




> On Oct 12, 2018, at 6:46 AM, Pavel Yaskevich  wrote:
> 
>> On Thu, Oct 11, 2018 at 4:31 PM Ben Bromhead  wrote:
>> 
>> This is something that's bugged me for ages, tbh the performance gain for
>> most use cases far outweighs the increase in memory usage and I would even
>> be in favor of changing the default now, optimizing the storage cost later
>> (if it's found to be worth it).
>> 
>> For some anecdotal evidence:
>> 4kb is usually what we end up setting it to, 16kb feels more reasonable given
>> the memory impact, but what would be the point if practically, most folks
>> set it to 4kb anyway?
>> 
>> Note that chunk_length will largely be dependent on your read sizes, but 4k
>> is the floor for most physical devices in terms of their block size.
>> 
> 
> It might be worth while to investigate how splitting chunk size into data,
> index and compaction sizes would affect performance.
> 

Data chunk and index chunk are already different (though one is table level and 
one is per instance), but I’m not parsing the compaction comment? 



Re: Deprecating/removing PropertyFileSnitch?

2018-10-16 Thread Jeff Jirsa
We should, but the 4.0 features that log/reject verbs to invalid replicas 
solve a lot of the concerns here 

-- 
Jeff Jirsa


> On Oct 16, 2018, at 4:10 PM, Jeremy Hanna  wrote:
> 
> We have had PropertyFileSnitch for a long time even though 
> GossipingPropertyFileSnitch is effectively a superset of what it offers and 
> is much less error prone.  There are some unexpected behaviors when things 
> aren’t configured correctly with PFS.  For example, if you replace nodes in 
> one DC and add those nodes to that DC's property files but not the other DCs' 
> property files - the resulting problems aren’t very straightforward to 
> troubleshoot.
> 
> We could try to improve the resilience and fail fast error checking and error 
> reporting of PFS, but honestly, why wouldn’t we deprecate and remove 
> PropertyFileSnitch?  Are there reasons why GPFS wouldn’t be sufficient to 
> replace it?
> 

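For readers comparing the two snitches: PFS requires every node to carry (and keep in sync) a map of the whole cluster, while GPFS needs only each node's own location. A sketch of the two file formats, with placeholder addresses:

```properties
# cassandra-topology.properties (PropertyFileSnitch):
# every node must list every other node, and the files must
# stay identical cluster-wide - the error-prone part
10.0.1.10=DC1:RAC1
10.0.2.10=DC2:RAC1
default=DC1:RAC1

# cassandra-rackdc.properties (GossipingPropertyFileSnitch):
# each node declares only its own DC and rack; gossip propagates the rest
dc=DC1
rack=RAC1
```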



Re: Using Cassandra as local db without cluster

2018-10-18 Thread Jeff Jirsa
I can’t think of a situation where I’d choose Cassandra as a database in a 
single-host use case (if you’re sure it’ll never be more than one machine).

-- 
Jeff Jirsa


> On Oct 18, 2018, at 12:31 PM, Abdelkrim Fitouri  wrote:
> 
> Hello,
> 
> I am wondering whether using Cassandra as a single local database, without
> the cluster capabilities, makes sense (I cannot run a multi-node cluster due
> to a technical constraint).
> 
> I have an application that needs to store a dynamic number of columns
> in each row (something I cannot do with a classical relational database),
> and I don't want to use a document-based NoSQL database, to avoid JSON
> marshalling and unmarshalling...
> 
> Can Cassandra with only one node, and with a well-designed model based on
> queries and partition keys, deliver better performance than PostgreSQL?
> 
> Does Cassandra have any limitations on data size, or on the
> number of partitions on a node?
> 
> Thanks for any details or help.
> 
> --
> 
> Best Regards.




Re: Built in trigger: double-write for app migration

2018-10-18 Thread Jeff Jirsa
The write sampling is adding an extra instance with the same schema to test 
things like yaml params or compaction without impacting reads or correctness - 
it’s different from what you describe



-- 
Jeff Jirsa


> On Oct 18, 2018, at 5:57 PM, Carl Mueller 
>  wrote:
> 
> I guess there is also write-survey-mode from cass 1.1:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-3452
> 
> Were triggers intended to supersede this capability? I can't find a lot of
> "user level" info on it.
> 
> 
> On Thu, Oct 18, 2018 at 10:53 AM Carl Mueller 
> wrote:
> 
>> tl;dr: a generic trigger on TABLES that will mirror all writes to
>> facilitate data migrations between clusters or systems. What is necessary
>> to ensure full write mirroring/coherency?
>> 
>> When cassandra clusters have several "apps" aka keyspaces serving
>> applications colocated on them, but the app/keyspace bandwidth and size
>> demands begin impacting other keyspaces/apps, then one strategy is to
>> migrate the keyspace to its own dedicated cluster.
>> 
>> With backups/sstableloading, this will entail a delay and therefore a
>> "coherency" shortfall between the clusters. So typically one would employ a
>> "double write, read once":
>> 
>> - all updates are mirrored to both clusters
>> - writes come from the current most coherent.
>> 
>> Often two sstable loads are done:
>> 
>> 1) first load
>> 2) turn on double writes/write mirroring
>> 3) a second load is done to finalize coherency
>> 4) switch the app to point to the new cluster now that it is coherent
>> 
>> The double writes and read is the sticking point. We could do it at the
>> app layer, but if the app wasn't written with that, it is a lot of testing
>> and customization specific to the framework.
>> 
>> We could theoretically do some sort of proxying of the java-driver
>> somehow, but all the async structures and complex interfaces/apis would be
>> difficult to proxy. Maybe there is a lower level in the java-driver that is
>> possible. This also would only apply to the java-driver, and not
>> python/go/javascript/other drivers.
>> 
>> Finally, I suppose we could do a trigger on the tables. It would be really
>> nice if we could add to the cassandra toolbox the basics of a write
>> mirroring trigger that could be activated "fairly easily"... now I know
>> there are the complexities of inter-cluster access, and if we are even
>> using cassandra as the target mirror system (for example there is an
>> article on triggers write-mirroring to kafka:
>> https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
>> 
>> And this starts to get into the complexities of hinted handoff as well.
>> But fundamentally this seems something that would be a very nice feature
>> (especially when you NEED it) to have in the core of cassandra.
>> 
>> Finally, is the mutation hook in triggers sufficient to track all incoming
>> mutations (outside of "shudder" other triggers generating data)
>> 
>> 
>> 
>> 




Re: Built in trigger: double-write for app migration

2018-10-18 Thread Jeff Jirsa
Could be done with CDC
Could be done with triggers
(Could be done with vtables — double writes or double reads — if they were 
extended to be user facing)

Would be very hard to generalize properly, especially handling failure cases 
(write succeeds in one cluster/table but not the other) which are often app 
specific


-- 
Jeff Jirsa


> On Oct 18, 2018, at 6:47 PM, Jonathan Ellis  wrote:
> 
> Isn't this what CDC was designed for?
> 
> https://issues.apache.org/jira/browse/CASSANDRA-8844
> 
> On Thu, Oct 18, 2018 at 10:54 AM Carl Mueller
>  wrote:
> 
>> tl;dr: a generic trigger on TABLES that will mirror all writes to
>> facilitate data migrations between clusters or systems. What is necessary
>> to ensure full write mirroring/coherency?
>> 
>> When cassandra clusters have several "apps" aka keyspaces serving
>> applications colocated on them, but the app/keyspace bandwidth and size
>> demands begin impacting other keyspaces/apps, then one strategy is to
>> migrate the keyspace to its own dedicated cluster.
>> 
>> With backups/sstableloading, this will entail a delay and therefore a
>> "coherency" shortfall between the clusters. So typically one would employ a
>> "double write, read once":
>> 
>> - all updates are mirrored to both clusters
>> - writes come from the current most coherent.
>> 
>> Often two sstable loads are done:
>> 
>> 1) first load
>> 2) turn on double writes/write mirroring
>> 3) a second load is done to finalize coherency
>> 4) switch the app to point to the new cluster now that it is coherent
>> 
>> The double writes and read is the sticking point. We could do it at the app
>> layer, but if the app wasn't written with that, it is a lot of testing and
>> customization specific to the framework.
>> 
>> We could theoretically do some sort of proxying of the java-driver somehow,
>> but all the async structures and complex interfaces/apis would be difficult
>> to proxy. Maybe there is a lower level in the java-driver that is possible.
>> This also would only apply to the java-driver, and not
>> python/go/javascript/other drivers.
>> 
>> Finally, I suppose we could do a trigger on the tables. It would be really
>> nice if we could add to the cassandra toolbox the basics of a write
>> mirroring trigger that could be activated "fairly easily"... now I know
>> there are the complexities of inter-cluster access, and if we are even
>> using cassandra as the target mirror system (for example there is an
>> article on triggers write-mirroring to kafka:
>> https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
>> 
>> And this starts to get into the complexities of hinted handoff as well. But
>> fundamentally this seems something that would be a very nice feature
>> (especially when you NEED it) to have in the core of cassandra.
>> 
>> Finally, is the mutation hook in triggers sufficient to track all incoming
>> mutations (outside of "shudder" other triggers generating data)
>> 
> 
> 
> -- 
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced

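To make the failure-case concern concrete, here is a minimal app-layer sketch. The class and policy are assumptions for illustration — the point is that any generalized double-write must pick a policy for when one cluster accepts the write and the other doesn't:

```python
class WriteMirror:
    """Double-write: the primary write must succeed; mirror failures are
    queued for later reconciliation instead of failing the client request."""

    def __init__(self, primary, mirror):
        self.primary = primary
        self.mirror = mirror
        self.failed_mirror_writes = []  # replay these before cutover

    def execute(self, statement):
        self.primary.execute(statement)   # propagate primary failures to caller
        try:
            self.mirror.execute(statement)
        except Exception as exc:          # app-specific: log, queue, or alert
            self.failed_mirror_writes.append((statement, exc))
```

Other equally valid policies — fail the whole request, or write to the mirror first — change the coherency guarantees, which is why this is hard to generalize inside Cassandra itself.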



Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-19 Thread Jeff Jirsa
Agree with Sylvain (and I think Benedict) - there’s no compelling reason to 
violate the freeze here. We’ve had the wrong default for years - add a note to 
the docs that we’ll be changing it in the future, but let’s not violate the 
freeze now.

-- 
Jeff Jirsa


> On Oct 19, 2018, at 10:06 AM, Sylvain Lebresne  wrote:
> 
> Fwiw, as much as I agree this is a change worth doing in general, I am
> -0 for 4.0. Both the "compact sequencing" and the change of default, really.
> We're closing on 2 months into the freeze, and for me a freeze does include
> not changing defaults, because changing a default ideally implies a decent
> amount of analysis/benchmarking of the consequences of that change[1], and that
> doesn't fit my definition of a freeze.
> 
> [1]: to be extra clear, I'm not saying we've always done this, far from it.
> But I hope we can all agree we were wrong not to do it when we didn't, and
> should strive to improve, not repeat past mistakes.
> --
> Sylvain
> 
> 
>> On Thu, Oct 18, 2018 at 8:55 PM Ariel Weisberg  wrote:
>> 
>> Hi,
>> 
>> For those who were asking about the performance impact of block size on
>> compression I wrote a microbenchmark.
>> 
>> https://pastebin.com/RHDNLGdC
>> 
>> [java] Benchmark                                               Mode  Cnt          Score          Error  Units
>> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast16k    thrpt   15  331190055.685 ±  8079758.044  ops/s
>> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast32k    thrpt   15  353024925.655 ±  7980400.003  ops/s
>> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast64k    thrpt   15  365664477.654 ± 10083336.038  ops/s
>> [java] CompactIntegerSequenceBench.benchCompressLZ4Fast8k     thrpt   15  305518114.172 ± 11043705.883  ops/s
>> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast16k  thrpt   15  688369529.911 ± 25620873.933  ops/s
>> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast32k  thrpt   15  703635848.895 ±  5296941.704  ops/s
>> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast64k  thrpt   15  695537044.676 ± 17400763.731  ops/s
>> [java] CompactIntegerSequenceBench.benchDecompressLZ4Fast8k   thrpt   15  727725713.128 ±  4252436.331  ops/s
>> 
>> To summarize, compression is 8.5% slower and decompression is 1% faster.
>> This is measuring the impact on compression/decompression not the huge
>> impact that would occur if we decompressed data we don't need less often.
>> 
>> I didn't test decompression of Snappy and LZ4 high, but I did test
>> compression.
>> 
>> Snappy:
>> [java] CompactIntegerSequenceBench.benchCompressSnappy16k  thrpt    2  196574766.116  ops/s
>> [java] CompactIntegerSequenceBench.benchCompressSnappy32k  thrpt    2  198538643.844  ops/s
>> [java] CompactIntegerSequenceBench.benchCompressSnappy64k  thrpt    2  194600497.613  ops/s
>> [java] CompactIntegerSequenceBench.benchCompressSnappy8k   thrpt    2  186040175.059  ops/s
>> 
>> LZ4 high compressor:
>> [java] CompactIntegerSequenceBench.bench16k  thrpt    2  20822947.578  ops/s
>> [java] CompactIntegerSequenceBench.bench32k  thrpt    2  12037342.253  ops/s
>> [java] CompactIntegerSequenceBench.bench64k  thrpt    2   6782534.469  ops/s
>> [java] CompactIntegerSequenceBench.bench8k   thrpt    2  32254619.594  ops/s
>> 
>> LZ4 high is the one instance where block size mattered a lot. It's a bit
>> suspicious really when you look at the ratio of performance to block size
>> being close to 1:1. I couldn't spot a bug in the benchmark though.
>> 
>> Compression ratios with LZ4 fast for the text of Alice in Wonderland was:
>> 
>> Chunk size 8192, ratio 0.709473
>> Chunk size 16384, ratio 0.667236
>> Chunk size 32768, ratio 0.634735
>> Chunk size 65536, ratio 0.607208
>> 
>> By way of comparison I also ran deflate with maximum compression:
>> 
>> Chunk size 8192, ratio 0.426434
>> Chunk size 16384, ratio 0.402423
>> Chunk size 32768, ratio 0.381627
>> Chunk size 65536, ratio 0.364865
>> 
>> Ariel
>> 
>>> On Thu, Oct 18, 2018, at 5:32 AM, Benedict Elliott Smith wrote:
>>> FWIW, I’m not -0, just think that long after the freeze date a change
>>> like this needs a strong mandate from the community.  I think the change
>>> i

Re: Deprecating/removing PropertyFileSnitch?

2018-10-22 Thread Jeff Jirsa
On Mon, Oct 22, 2018 at 7:09 PM J. D. Jordan 
wrote:

> Do you have a specific gossip bug that you have seen recently which caused
> a problem that would make this happen?  Do you have a specific JIRA in mind?


Sankalp linked a few others, but also
https://issues.apache.org/jira/browse/CASSANDRA-13700


>   “We can’t remove this because what if there is a bug” doesn’t seem like
> a good enough reason to me. If that was a reason we would never make any
> changes to anything.
>

How about "we know that certain fields that are gossiped go missing even
after all of the known races are fixed, so removing an existing
low-maintenance feature and forcing users to rely on gossip for topology
may be worth some discussion".


> I think many people have seen PFS actually cause real problems, where with
> GPFS the issue being talked about is predicated on some theoretical gossip
> bug happening.
>

How many of those were actually caused by incorrect fallback from GPFS to
PFS, rather than PFS itself?


> In the past year at DataStax we have done a lot of testing on 3.0 and 3.11
> around adding nodes, adding DC’s, replacing nodes, replacing racks, and
> replacing DC’s, all while using GPFS, and as far as I know we have not seen
> any “lost” rack/DC information during such testing.
>

I've also run very large GPFS clusters in the past without much gossip
pain, and I'm in the "we should deprecate PFS" camp, but it is also true
that PFS is low maintenance and mostly works. Perhaps the first step is
breaking the GPFS->PFS fallback that people don't know about, maybe that'll
help?
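For anyone weighing the migration, the GPFS side of this is just a small per-node file; a sketch (the DC/rack names here are placeholders):

```properties
# cassandra-rackdc.properties — read by GossipingPropertyFileSnitch (GPFS).
# Each node declares only its own location; gossip propagates it to the
# rest of the cluster, so there is no cluster-wide file to keep in sync.
dc=DC1
rack=RAC1
```

If a legacy cassandra-topology.properties is left on disk, GPFS will fall back to it as discussed above — removing that file on GPFS clusters sidesteps the silent-fallback problem.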


Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-23 Thread Jeff Jirsa
My objection (-0.5) is based on freeze not in code complexity



-- 
Jeff Jirsa


> On Oct 23, 2018, at 8:59 AM, Benedict Elliott Smith  
> wrote:
> 
> To discuss the concerns about the patch for a more efficient representation:
> 
> The risk from such a patch is very low.  It’s a very simple in-memory data 
> structure, that we can introduce thorough fuzz tests for.  The reason to 
> exclude it would be for reasons of wanting to begin strictly enforcing the 
> freeze only.  This is a good enough reason in my book, which is why I’m 
> neutral on its addition.  I just wanted to provide some context for everyone 
> else's voting intention.
> 
> 
>> On 23 Oct 2018, at 16:51, Ariel Weisberg  wrote:
>> 
>> Hi,
>> 
>> I just asked Jeff. He is -0 and -0.5 respectively.
>> 
>> Ariel
>> 
>>> On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
>>> I’m +1 change of default.  I think Jeff was -1 on that though.
>>> 
>>> 
>>>> On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> To summarize who we have heard from so far
>>>> 
>>>> WRT to changing just the default:
>>>> 
>>>> +1:
>>>> Jon Haddad
>>>> Ben Bromhead
>>>> Alain Rodriguez
>>>> Sankalp Kohli (not explicit)
>>>> 
>>>> -0:
>>>> Sylvain Lebresne
>>>> Jeff Jirsa
>>>> 
>>>> Not sure:
>>>> Kurt Greaves
>>>> Joshua Mckenzie
>>>> Benedict Elliott Smith
>>>> 
>>>> WRT to change the representation:
>>>> 
>>>> +1:
>>>> There are only conditional +1s at this point
>>>> 
>>>> -0:
>>>> Sylvain Lebresne
>>>> 
>>>> -.5:
>>>> Jeff Jirsa
>>>> 
>>>> This 
>>>> (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
>>>>  is a rough cut of the change for the representation. It needs better 
>>>> naming, unit tests, javadoc etc. but it does implement the change.
>>>> 
>>>> Ariel
>>>>> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
>>>>> Sorry, to be clear - I'm +1 on changing the configuration default, but I
>>>>> think changing the compression in memory representations warrants further
>>>>> discussion and investigation before making a case for or against it yet.
>>>>> An optimization that reduces in memory cost by over 50% sounds pretty good
>>>>> and we never were really explicit that those sort of optimizations would 
>>>>> be
>>>>> excluded after our feature freeze.  I don't think they should necessarily
>>>>> be excluded at this time, but it depends on the size and risk of the 
>>>>> patch.
>>>>> 
>>>>>> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad  
>>>>>> wrote:
>>>>>> 
>>>>>> I think we should try to do the right thing for the most people that we
>>>>>> can.  The number of folks impacted by 64KB is huge.  I've worked on a lot
>>>>>> of clusters created by a lot of different teams, going from brand new to
>>>>>> pretty damn knowledgeable.  I can't think of a single time over the last 
>>>>>> 2
>>>>>> years that I've seen a cluster use non-default settings for compression.
>>>>>> With only a handful of exceptions, I've lowered the chunk size 
>>>>>> considerably
>>>>>> (usually to 4 or 8K) and the impact has always been very noticeable,
>>>>>> frequently resulting in hardware reduction and cost savings.  Of all the
>>>>>> poorly chosen defaults we have, this is one of the biggest offenders 
>>>>>> that I
>>>>>> see.  There's a good reason ScyllaDB claims they're so much faster than
>>>>>> Cassandra - we ship a DB that performs poorly for 90+% of teams because 
>>>>>> we
>>>>>> ship for a specific use case, not a general one (time series on memory
>>>>>> constrained boxes being the specific use case)
>>>>>> 
>>>>>> This doesn't impact existing tables, just new ones.  More and more teams
>>>>>> are using Cassandra as a general purpose database, we should acknowledge
>>>
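For reference, the default being debated is a per-table setting; a hedged CQL sketch (keyspace, table, and column names are placeholders) of opting into a smaller chunk size at creation time:

```sql
-- Illustrative only: overriding the compression chunk size per table.
-- 'chunk_length_in_kb' is the parameter name in the 3.x-era compression map.
CREATE TABLE ks.events (
    id uuid PRIMARY KEY,
    payload text
) WITH compression = {
    'class': 'LZ4Compressor',
    'chunk_length_in_kb': '4'
};
```

This only affects newly written sstables for that table, consistent with Jon's note that the change doesn't impact existing tables.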

Re: Deprecating/removing PropertyFileSnitch?

2018-10-29 Thread Jeff Jirsa
> >>>>>>>> Em seg, 22 de out de 2018 às 16:58, sankalp kohli <
> >>>>> kohlisank...@gmail.com>
> >>>>>>>> escreveu:
> >>>>>>>>
> >>>>>>>>> Yes it will happen. I am worried that same way DC or rack info
> can go
> >>>>>>>>> missing.
> >>>>>>>>>
> >>>>>>>>> On Mon, Oct 22, 2018 at 12:52 PM Paulo Motta <
> >>>>> pauloricard...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>> the new host won’t learn about the host whose status is
> missing and
> >>>>>>>> the
> >>>>>>>>>> view of this host will be wrong.
> >>>>>>>>>>
> >>>>>>>>>> Won't this happen even with PropertyFileSnitch as the token(s)
> for
> >>>>> this
> >>>>>>>>>> host will be missing from gossip/system.peers?
> >>>>>>>>>>
> >>>>>>>>>> Em sáb, 20 de out de 2018 às 00:34, Sankalp Kohli <
> >>>>>>>>> kohlisank...@gmail.com>
> >>>>>>>>>> escreveu:
> >>>>>>>>>>
> >>>>>>>>>>> Say you restarted all instances in the cluster and status for
> some
> >>>>>>>> host
> >>>>>>>>>>> goes missing. Now when you start a host replacement, the new
> host
> >>>>>>>> won’t
> >>>>>>>>>>> learn about the host whose status is missing and the view of
> this
> >>>>>>>> host
> >>>>>>>>>> will
> >>>>>>>>>>> be wrong.
> >>>>>>>>>>>
> >>>>>>>>>>> PS: I will be happy to be proved wrong as I can also start
> using
> >>>>>>>> Gossip
> >>>>>>>>>>> snitch :)
> >>>>>>>>>>>
> >>>>>>>>>>>> On Oct 19, 2018, at 2:41 PM, Jeremy Hanna <
> >>>>>>>>> jeremy.hanna1...@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Do you mean to say that during host replacement there may be
> a time
> >>>>>>>>>> when
> >>>>>>>>>>> the old->new host isn’t fully propagated and therefore
> wouldn’t yet
> >>>>>>>> be
> >>>>>>>>> in
> >>>>>>>>>>> all system tables?
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Oct 17, 2018, at 4:20 PM, sankalp kohli <
> >>>>>>>> kohlisank...@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This is not the case during host replacement correct?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Oct 16, 2018 at 10:04 AM Jeremiah D Jordan <
> >>>>>>>>>>>>> jeremiah.jor...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> As long as we are correctly storing such things in the
> system
> >>>>>>>>> tables
> >>>>>>>>>>> and
> >>>>>>>>>>>>>> reading them out of the system tables when we do not have
> the
> >>>>>>>>>>> information
> >>>>>>>>>>>>>> from gossip yet, it should not be a problem. (As far as I
> know
> >>>>>>>> GPFS
> >>>>>>>>>>> does
> >>>>>>>>>>>>>> this, but I have not done extensive code diving or testing
> to
> >>>>>>>> make
> >>>>>>>>>>> sure all
> >>>>>>>>>>>>>> edge cases are covered there)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Jeremiah
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Oct 16, 2018, at 11:56 AM, sankalp kohli <
> >>>>>>>>> kohlisank...@gmail.com
> >>>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Will GossipingPropertyFileSnitch not be vulnerable to
> Gossip
> >>>>>>>> bugs
> >>>>>>>>>>> where
> >>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>> lose hostId or some other fields when we restart C* for
> large
> >>>>>>>>>>>>>>> clusters(~1000 instances)?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Oct 16, 2018 at 7:59 AM Jeff Jirsa <
> jji...@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We should, but the 4.0 features that log/reject verbs to
> >>>>>>>> invalid
> >>>>>>>>>>>>>> replicas
> >>>>>>>>>>>>>>>> solves a lot of the concerns here
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> Jeff Jirsa
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Oct 16, 2018, at 4:10 PM, Jeremy Hanna <
> >>>>>>>>>>> jeremy.hanna1...@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> We have had PropertyFileSnitch for a long time even
> though
> >>>>>>>>>>>>>>>> GossipingPropertyFileSnitch is effectively a superset of
> what
> >>>>>>>> it
> >>>>>>>>>>> offers
> >>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> is much less error prone.  There are some unexpected
> behaviors
> >>>>>>>>> when
> >>>>>>>>>>>>>> things
> >>>>>>>>>>>>>>>> aren’t configured correctly with PFS.  For example, if you
> >>>>>>>>> replace
> >>>>>>>>>>>>>> nodes in
> >>>>>>>>>>>>>>>> one DC and add those nodes to that DCs property files and
> not
> >>>>>>>> the
> >>>>>>>>>>> other
> >>>>>>>>>>>>>> DCs
> >>>>>>>>>>>>>>>> property files - the resulting problems aren’t very
> >>>>>>>>> straightforward
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>> troubleshoot.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> We could try to improve the resilience and fail fast
> error
> >>>>>>>>>> checking
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>>> error reporting of PFS, but honestly, why wouldn’t we
> deprecate
> >>>>>>>>> and
> >>>>>>>>>>>>>> remove
> >>>>>>>>>>>>>>>> PropertyFileSnitch?  Are there reasons why GPFS wouldn’t
> be
> >>>>>>>>>>> sufficient
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> replace it?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>> -
> >>>>>>>>>>>>>>>>> To unsubscribe, e-mail:
> dev-unsubscr...@cassandra.apache.org
> >>>>>>>>>>>>>>>>> For additional commands, e-mail:
> >>>>>>>> dev-h...@cassandra.apache.org
> >>>>>>>>>>>>>>>>>
>
>


Re: Request for reviewer: CASSANDRA-14829

2018-11-16 Thread Jeff Jirsa
The assignment is just so you get “credit” for the patch - asking for a 
reviewer is good but not strictly necessary. 

(Some of the committers will try to review it when we can, usually waiting for 
someone who’s comfortable with that code to come along)

-- 
Jeff Jirsa


> On Nov 16, 2018, at 11:33 AM, Georg Dietrich  wrote:
> 
> Hi here,
> 
> I've posted https://issues.apache.org/jira/browse/CASSANDRA-14829 together 
> with a pull request, now I've been assigned the task... I assume that means I 
> should go look for a reviewer?
> 
> Regards
> Georg
> 
> --
> 
> Georg Dietrich
> Senior System Developer
> imbus TestBench
> Tel. +49 9131 7518-944
> E-Mail: georg.dietr...@imbus.de
> 
> Tel. +49 9131 7518-0, Fax +49 9131 7518-50
> i...@imbus.de www.imbus.de
> 
> imbus AG, Kleinseebacher Str. 9,  91096 Möhrendorf, DEUTSCHLAND
> Vorsitzender des Aufsichtsrates: Wolfgang Wieser
> Vorstand: Tilo Linz, Bernd Nossem, Thomas Roßner
> Sitz der Gesellschaft: Möhrendorf; Registergericht: Fürth/Bay, HRB 8365
> 
> Post/Besuchsadresse: imbus AG, Hauptstraße 8a, 91096 Möhrendorf, Deutschland
> =
> 




Re: Inter-node messaging latency

2018-11-28 Thread Jeff Jirsa
Are you sure you’re blocked on internode and not commitlog? Batch is typically 
not what people expect (group commitlog in 4.0 is probably closer to what you 
think batch does).
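A sketch of the distinction, assuming the 4.0-era option names (verify against your version's cassandra.yaml):

```yaml
# batch: every write waits for its commit-log segment to be fsynced before
# being acked — simple, but latency-heavy under contention.
commitlog_sync: batch

# group (new in 4.0): writes are acked after the next group fsync, so many
# writes share one sync within the window below.
# commitlog_sync: group
# commitlog_sync_group_window_in_ms: 15
```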

-- 
Jeff Jirsa


> On Nov 27, 2018, at 10:55 PM, Yuji Ito  wrote:
> 
> Hi,
> 
> Thank you for the reply.
> I've measured LWT throughput in 4.0.
> 
> I used the cassandra-stress tool to insert rows with LWT for 3 minutes on 
> i3.xlarge and i3.4xlarge
> For 3.11, I modified the tool to support LWT.
> Before each measurement, I cleaned up all Cassandra data.
> 
> The throughput in 4.0 is 5% higher than in 3.11.
> The CPU load of i3.4xlarge (16 vCPUs) only reaches 75% in both versions,
> and the i3.4xlarge throughput was less than 4x that of i3.xlarge.
> So I think the throughput wasn't CPU-bound in 4.0 either.
> 
> The CPU load of i3.4xlarge is up to 80 % with non-LWT write.
> 
> I wonder what is the bottleneck for writes on a many-core machine if the 
> issue about messaging has been resolved in 4.0.
> Can I use up CPU to insert rows by changing any parameter?
> 
> # LWT insert
> * Cassandra 3.11.3
> | instance type | # of threads | concurrent_writes | Throughput [op/s] |
> | i3.xlarge     |           64 |                32 |              2815 |
> | i3.4xlarge    |          256 |               128 |              9506 |
> | i3.4xlarge    |          512 |               256 |             10540 |
> 
> * Cassandra 4.0 (trunk)
> | instance type | # of threads | concurrent_writes | Throughput [op/s] |
> | i3.xlarge     |           64 |                32 |              2951 |
> | i3.4xlarge    |          256 |               128 |              9816 |
> | i3.4xlarge    |          512 |               256 |             11055 |
> 
> * Environment
> - 3 node cluster
> - Replication factor: 3
> - Node instance: AWS EC2 i3.xlarge / i3.4xlarge
> 
> * C* configuration
> - Apache Cassandra 3.11.3 / 4.0 (trunk)
> - commitlog_sync: batch
> - concurrent_writes: 32, 256
> - native_transport_max_threads: 128(default), 256 (when concurrent_writes is 
> 256)
> 
> Thanks,
> Yuji
> 
> 
> On Mon, Nov 26, 2018 at 17:27 sankalp kohli  wrote:
>> Inter-node messaging is rewritten using Netty in 4.0. It will be better to 
>> test it using that as potential changes will mostly land on top of that. 
>> 
>>> On Mon, Nov 26, 2018 at 7:39 AM Yuji Ito  wrote:
>>> Hi,
>>> 
>>> I'm investigating LWT performance with C* 3.11.3.
>>> It looks like the performance is bounded by messaging latency when many
>>> requests are issued concurrently.
>>> 
>>> According to the source code, the number of messaging threads per node is 
>>> only 1 thread for incoming and 1 thread for outbound "small" message to 
>>> another node.
>>> 
>>> I guess these threads are frequently interrupted because many threads are 
>>> executed when many requests are issued.
>>> Especially, I think it affects the LWT performance when many LWT requests 
>>> which need lots of inter-node messaging are issued.
>>> 
>>> I measured that latency. It took 2.5 ms in average to enqueue a message at 
>>> a node and to receive the message at the **same** node with 96 concurrent 
>>> LWT writes.
>>> Is it normal? I think it is too big latency, though a message was sent to 
>>> the same node.
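The shape of that measurement can be sketched generically — this is not Cassandra's messaging code, just a stdlib model of many producer threads funneling into a single consumer thread, timing enqueue-to-dequeue latency:

```python
import queue
import threading
import time

def measure_queue_latency(n_producers: int = 8, msgs_per_producer: int = 100) -> float:
    """Average enqueue->dequeue latency (ms) through a single-consumer queue,
    mimicking one inbound messaging thread serving many request threads."""
    q: "queue.Queue[float]" = queue.Queue()
    latencies = []
    total = n_producers * msgs_per_producer

    def consumer():
        # Single consumer, like the one incoming messaging thread per peer.
        for _ in range(total):
            enqueued_at = q.get()
            latencies.append(time.perf_counter() - enqueued_at)

    def producer():
        # Each producer stamps the enqueue time into the message itself.
        for _ in range(msgs_per_producer):
            q.put(time.perf_counter())

    c = threading.Thread(target=consumer)
    c.start()
    producers = [threading.Thread(target=producer) for _ in range(n_producers)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    c.join()
    return 1000 * sum(latencies) / len(latencies)

print(f"avg enqueue->dequeue latency: {measure_queue_latency():.3f} ms")
```

With enough producers contending, the single consumer's scheduling delays dominate — the same effect Yuji is describing for the one-thread-per-direction messaging design.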
>>> 
>>> Decreasing numbers of other threads like `concurrent_counter_writes`, 
>>> `concurrent_materialized_view_writes` reduced a bit the latency.
>>> Can I change any other parameter to reduce the latency?
>>> I've tried using message coalescing, but they didn't reduce that.
>>> 
>>> * Environment
>>> - 3 node cluster
>>> - Replication factor: 3
>>> - Node instance: AWS EC2 i3.xlarge
>>> 
>>> * C* configuration
>>> - Apache Cassandra 3.11.3
>>> - commitlog_sync: batch
>>> - concurrent_reads: 32 (default)
>>> - concurrent_writes: 32 (default)
>>> 
>>> Thanks,
>>> Yuji
>>> 
>>> 


Re: [VOTE] Change Jira Workflow

2018-12-17 Thread Jeff Jirsa
+1

-- 
Jeff Jirsa


> On Dec 17, 2018, at 7:19 AM, Benedict Elliott Smith  
> wrote:
> 
> I propose these changes 
> <https://cwiki.apache.org/confluence/display/CASSANDRA/JIRA+Workflow+Proposals>*
>  to the Jira Workflow for the project.  The vote will be open for 72 hours**.
> 
> I am, of course, +1.
> 
> * With the addendum of the mailing list discussion 
> <https://lists.apache.org/thread.html/e4668093169aa4ef52f2bea779333f04a0afde8640c9a79a8c86ee74@%3Cdev.cassandra.apache.org%3E>;
>  in case of any conflict arising from a mistake on my part in the wiki, the 
> consensus reached by polling the mailing list will take precedence.
> ** I won’t be around to close the vote, as I will be on vacation.  Everyone 
> is welcome to ignore the result until I get back in a couple of weeks, or if 
> anybody is eager feel free to close the vote and take some steps towards 
> implementation.




Re: Question about PartitionUpdate.singleRowUpdate()

2018-12-19 Thread Jeff Jirsa
Definitely worth a JIRA. Suspect it may be slow to get a response this
close to the holidays, but a JIRA will be a bit more durable than the
mailing list post.


On Wed, Dec 19, 2018 at 1:58 PM Sam Klock  wrote:

> Cassandra devs,
>
> I have a question about the implementation of
> PartitionUpdate.singleRowUpdate(), in particular the choice to use
> EncodingStats.NO_STATS when building the resulting PartitionUpdate.  Is
> there a functional reason for that -- i.e., is it safe to modify it to
> use an EncodingStats built from deletionInfo, row, and staticRow?
>
> Context: under 3.0.17, we have a table using TWCS and a secondary index.
> We've been having a problem with the sstables for the index lingering
> essentially forever, despite the correlated sstables for the parent
> table being removed pretty much when we expect them to.  We traced the
> problem to the use of EncodingStats.NO_STATS in singleRowUpdate(), which
> is being used to create the index updates when we write to the parent
> table.  It appears that NO_STATS is making Cassandra think the memtables
> for the index have data from September 2015 in them, which in turn
> prevents it from dropping expired sstables (all of which are much more
> recent than that) for the index.
>
> Experimentally, modifying singleRowUpdate() to build an EncodingStats
> from its inputs (plus the MutableDeletionInfo it creates) seems to fix
> the problem.  We don't have any insight into why the existing logic uses
> NO_STATS, however, so we don't know if this change is really safe.  Does
> it sound like we're on the right track?  (Also: I'm sure we'd be happy
> to open an issue and submit a patch if this sounds like it would be
> useful generally.)
>
> Thanks,
> SK
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Git Repo Migration

2019-01-04 Thread Jeff Jirsa
+1



-- 
Jeff Jirsa


> On Jan 4, 2019, at 2:49 AM, Sam Tunnicliffe  wrote:
> 
> As per the announcement on 7th December 2018[1], ASF infra are planning to 
> shutdown the service behind git-wip-us.apache.org and migrate all existing 
> repos to gitbox.apache.org 
> 
> There are further details in the original mail, but apparently one of the 
> benefits of the migration is that we'll have full write access via Github, 
> including the ability finally to close PRs.

Fwiw we can sorta close PRs now (on commit via commit msg and through infra 
ticket)

> This affects the cassandra, cassandra-dtest and cassandra-build repos (but 
> not the new cassandra-sidecar repo).
> 
> A pre-requisite of the migration is to demonstrate consensus within the 
> community, so to satisfy that formality I'm starting this thread to gather 
> any objections or specific requests regarding the timing of the move.
> 
> I'll collate responses in a week or so and file the necessary INFRA Jira.
> 
> Thanks,
> Sam
> 
> [1] 
> https://lists.apache.org/thread.html/667772efdabf49a0a23d585539c127f335477e033f1f9b6f5079aced@%3Cdev.cassandra.apache.org%3E
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 




Re: Who should be in our distribution KEYS file?

2019-01-07 Thread Jeff Jirsa
I don't think it's awkward, I think a lot of us know there are serious bugs
and we need a release, but we keep finding other bugs and it's super
tempting to say "one more fix"

We should probably just cut next 3.0.x and 3.11.x though, because there are
some nasty bugs hiding in there that the testing for 4.0 has uncovered.


On Mon, Jan 7, 2019 at 2:14 PM Jonathan Haddad  wrote:

> > I don't understand how adding keys changes release frequency. Did
> someone request a release to be made or are we on some assumed date
> interval?
>
> I don't know if it would (especially by itself), I just know that if more
> people are able to do releases that's more opportunity to do so.
>
> I think getting more folks involved in the release process is a good idea
> for other reasons.  People take vacations, there's job conflicts, there's
> life stuff (kids usually take priority), etc.
>
> The last release of 3.11 was almost half a year ago, and there's 30+ bug
> fixes in the 3.11 branch.
>
> > Did someone request a release to be made or are we on some assumed date
> interval?
>
> I can't recall (and a search didn't find) anyone asking for a 3.11.4
> release, but I think part of the point is that requesting a release from a
> static release manager is a sign of a flaw in the release process.
>
> On a human, note, it feels a little awkward asking for a release.  I might
> be alone on this though.
>
> Jon
>
>
> On Mon, Jan 7, 2019 at 1:16 PM Michael Shuler 
> wrote:
>
> > Mick and I have discussed this previously, but I don't recall if it was
> > email or irc. Apologies if I was unable to describe the problem to a
> > point of general understanding.
> >
> > To reiterate the problem, changing gpg signature keys screws our debian
> > and redhat package repositories for all users. Tarballs are not
> > installed with a client that checks signatures in a known trust
> > database. When gpg key signer changes, users need to modify their trust
> > on every node, importing new key(s), in order for packages to
> > install/upgrade with apt or yum.
> >
> > I don't understand how adding keys changes release frequency. Did
> > someone request a release to be made or are we on some assumed date
> > interval?
> >
> > Michael
> >
> > On 1/7/19 2:30 PM, Jonathan Haddad wrote:
> > > That's a good point.  Looking at the ASF docs I had assumed the release
> > > manager was per-project, but on closer inspection it appears to be
> > > per-release.  You're right, it does say that it can be any committer.
> > >
> > > http://www.apache.org/dev/release-publishing.html#release_manager
> > >
> > > We definitely need more frequent releases, if this is the first step
> > > towards that goal, I think it's worth it.
> > >
> > > Glad you brought this up!
> > > Jon
> > >
> > >
> > > On Mon, Jan 7, 2019 at 11:58 AM Mick Semb Wever 
> wrote:
> > >
> > >>
> > >>
> > >>> I don't see any reason to have any keys in there, except from release
> > >>> managers who are signing releases.
> > >>
> > >>
> > >> Shouldn't any PMC (or committer) should be able to be a release
> manager?
> > >>
> > >> The release process should be reliable and reproducible enough to be
> > safe
> > >> for rotating release managers every release. I would have thought
> > security
> > >> concerns were better addressed by a more tested process? And AFAIK no
> > other
> > >> asf projects are as restrictive on who can be the release manager role
> > (but
> > >> i've only checked a few projects).
> > >>
> > >>
> > >>
> > >> -
> > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > >> For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >>
> > >>
> > >
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


Re: Warn about SASI usage and allow to disable them

2019-01-14 Thread Jeff Jirsa
When we say disable, do you mean disable creation of new SASI indices, or
disable using existing ones? I assume it's just creation of new?

On Mon, Jan 14, 2019 at 11:19 AM Andrés de la Peña 
wrote:

> Hello all,
>
> It is my understanding that SASI is still to be considered an
> experimental/beta feature, and they apparently are not being very actively
> developed. Some highlighted problems in SASI are:
>
> - OOMs during flush, as it is described in CASSANDRA-12662
> - General secondary index consistency problems described in CASSANDRA-8272.
> There is a pending-review patch addressing the problem for regular 2i.
> However, the proposed solution is based on indexing tombstones. SASI
> doesn't index tombstones, so it wouldn't be entirely trivial to extend the
> approach to SASI.
> - Probably insufficient testing. As far as I know, we don't have a single
> dtest for SASI nor tests dealing with large SSTables.
>
> Similarly to what CASSANDRA-13959 did with materialized views,
> CASSANDRA-14866 aims to throw a native protocol warning about SASI
> experimental state, and to add a config property to disable them. Perhaps
> this property could be disabled by default in trunk. This should raise
> awareness about SASI maturity until we let them in a more stable state.
>
> The purpose for this thread is discussing whether we want to add this
> warning, the config property and, more controversially, if we want to set
> SASI as disabled by default in trunk.
>
> WDYT?
>


Re: Warn about SASI usage and allow to disable them

2019-01-14 Thread Jeff Jirsa
+1 on config
-0 on warning 
-0 on disabling by default


-- 
Jeff Jirsa


> On Jan 14, 2019, at 9:22 PM, Taylor Cressy  wrote:
> 
> +1 on config. +1 on disabling. 
> 
> +1 on applying it to materialized views as well. 
> 
>> On Jan 14, 2019, at 17:29, Joshua McKenzie  wrote:
>> 
>> +1 on config change, +1 on disabling, and so long as the comments make the
>> limitations and risks extremely clear, I'm fine w/out the client warning.
>> 
>> On Mon, Jan 14, 2019 at 12:28 PM Andrés de la Peña 
>> wrote:
>> 
>>> I mean disabling the creation of new SASI indices with CREATE INDEX
>>> statement, the existing indexes would continue working. The CQL client
>>> warning will be thrown with that creation statement as well (if they are
>>> enabled).
>>> 
>>>> On Mon, 14 Jan 2019 at 20:18, Jeff Jirsa  wrote:
>>>> 
>>>> When we say disable, do you mean disable creation of new SASI indices, or
>>>> disable using existing ones? I assume it's just creation of new?
>>>> 
>>>> On Mon, Jan 14, 2019 at 11:19 AM Andrés de la Peña <
>>>> a.penya.gar...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hello all,
>>>>> 
>>>>> It is my understanding that SASI is still to be considered an
>>>>> experimental/beta feature, and they apparently are not being very
>>>> actively
>>>>> developed. Some highlighted problems in SASI are:
>>>>> 
>>>>> - OOMs during flush, as it is described in CASSANDRA-12662
>>>>> - General secondary index consistency problems described in
>>>> CASSANDRA-8272.
>>>>> There is a pending-review patch addressing the problem for regular 2i.
>>>>> However, the proposed solution is based on indexing tombstones. SASI
>>>>> doesn't index tombstones, so it wouldn't be entirely trivial to extend
>>>> the
>>>>> approach to SASI.
>>>>> - Probably insufficient testing. As far as I know, we don't have a
>>> single
>>>>> dtest for SASI nor tests dealing with large SSTables.
>>>>> 
>>>>> Similarly to what CASSANDRA-13959 did with materialized views,
>>>>> CASSANDRA-14866 aims to throw a native protocol warning about SASI
>>>>> experimental state, and to add a config property to disable them.
>>> Perhaps
>>>>> this property could be disabled by default in trunk. This should raise
>>>>> awareness about SASI maturity until we let them in a more stable state.
>>>>> 
>>>>> The purpose for this thread is discussing whether we want to add this
>>>>> warning, the config property and, more controversially, if we want to
>>> set
>>>>> SASI as disabled by default in trunk.
>>>>> 
>>>>> WDYT?
>>>>> 
>>>> 
>>> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 



