Re: Single slow node dramatically reduces cluster write throughput regardless of CL

2022-12-14 Thread Jeremiah Jordan
I have seen this same behavior in the past as well and came to the same 
conclusions of where the issue is.  It would be good to write this up in a 
ticket.  Giving people the option of using the DynamicEndpointSnitch to order 
batchlog replica selection could mitigate this exact issue, but it may have other 
tradeoffs for batchlog guarantees.

> On Dec 14, 2022, at 11:19 AM, Sarisky, Dan  wrote:
> 
> We issue writes to Cassandra as logged batches (RF=3; consistency level TWO, 
> QUORUM, or LOCAL_QUORUM).
> 
> On clusters of any size - a single extremely slow node causes a ~90% loss of 
> cluster-wide throughput using batched writes.  We can replicate this in the 
> lab via CPU or disk throttling.  I observe this in 3.11, 4.0, and 4.1.
> 
> It appears the mechanism in play is:
> Those logged batches are immediately written to two replica nodes and the 
> actual mutations aren't processed until those two nodes acknowledge the batch 
> statements.  Those replica nodes are selected randomly from all nodes in the 
> local data center currently up in gossip.  If a single node is slow, but 
> still thought to be up in gossip, this eventually causes every other node to 
> have all of its MutationStages to be waiting while the slow replica accepts 
> batch writes.
> 
> The code in play appears to be:
> See 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L245.
>   In the method filterBatchlogEndpoints() there is a Collections.shuffle() to 
> order the endpoints and a FailureDetector.isEndpointAlive() to test if the 
> endpoint is acceptable.
> 
> This behavior causes Cassandra to move from a multi-node fault-tolerant 
> system to a collection of single points of failure.
> 
> We try to take administrator actions to kill off the extremely slow nodes, 
> but it would be great to have some notion of "what node is a bad choice" when 
> writing log batches to replica nodes.
> 
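The selection logic described above can be sketched as follows (a simplified, illustrative version of what `filterBatchlogEndpoints()` does; the class and method names here are not the exact Cassandra code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class BatchlogEndpointSelector {
    // Simplified, illustrative version of batchlog replica selection:
    // shuffle all locally "up" endpoints, then take the first two that the
    // failure detector still considers alive. There is no latency awareness,
    // so a slow-but-gossip-up node is picked with probability ~2/N per batch,
    // and every batch that picks it blocks on its acknowledgement.
    public static List<String> pickBatchlogReplicas(List<String> endpoints,
                                                    Predicate<String> isAlive) {
        List<String> shuffled = new ArrayList<>(endpoints);
        Collections.shuffle(shuffled);            // random order, latency-blind
        List<String> chosen = new ArrayList<>(2);
        for (String ep : shuffled) {
            if (isAlive.test(ep)) {               // gossip/failure-detector check only
                chosen.add(ep);
                if (chosen.size() == 2) break;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> dc = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4");
        System.out.println(pickBatchlogReplicas(dc, ep -> true).size()); // 2
    }
}
```

Because the shuffle is latency-blind and the aliveness check consults only gossip, nothing steers selection away from a node that is up but extremely slow.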



Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

2023-06-12 Thread Jeremiah Jordan
 As long as it is valid in the paging protocol to return a short page but
still say “there are more pages”, I think it is fine to do that.  For an
actual LIMIT that is part of the user query, I think the server must always
have returned all data that fits into the LIMIT once all pages have been
returned.

-Jeremiah

On Jun 12, 2023 at 12:56:14 PM, Josh McKenzie  wrote:

> Yeah, my bad. I have paging on the brain. Seriously.
>
> I can't think of a use-case in which a LIMIT based on # bytes makes sense
> from a user perspective.
>
> On Mon, Jun 12, 2023, at 1:35 PM, Jeff Jirsa wrote:
>
>
>
> On Mon, Jun 12, 2023 at 9:50 AM Benjamin Lerer  wrote:
>
> If you have rows that vary significantly in their size, your latencies
> could end up being pretty unpredictable using a LIMIT BY . Being
> able to specify a limit by bytes at the driver / API level would allow app
> devs to get more deterministic results out of their interaction w/the DB if
> they're looking to respond back to a client within a certain time frame and
> / or determine next steps in the app (continue paging, stop, etc) based on
> how long it took to get results back.
>
>
> Are you talking about the page size or the LIMIT? Once the LIMIT is
> reached there is no "continue paging". LIMIT is also at the CQL level, not
> at the driver level.
> I can totally understand the need for a page size in bytes not for a LIMIT.
>
>
> I would only ever expect to see a page size in bytes, never a LIMIT
> specifying bytes.
>
> I know the C-11745 ticket says LIMIT, too, but that feels very odd to me.
>
>
>
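The short-page behavior discussed above can be sketched as follows (a hypothetical server-side loop, not the actual Cassandra paging implementation):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BytePagedQuery {
    static final class Page {
        final List<String> rows = new ArrayList<>();
        boolean hasMorePages;
    }

    // Stop filling once the next row would push the page past its byte
    // budget; the page comes back "short" but hasMorePages stays true, so
    // the client keeps paging. (A real implementation would carry the
    // overflowing row into the next page rather than re-reading it.)
    static Page fillPage(Iterator<String> source, int maxPageBytes) {
        Page page = new Page();
        int bytes = 0;
        while (source.hasNext()) {
            String row = source.next();
            int rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            if (!page.rows.isEmpty() && bytes + rowBytes > maxPageBytes) {
                page.hasMorePages = true;  // short page, but not the last one
                return page;
            }
            page.rows.add(row);
            bytes += rowBytes;
        }
        return page;  // source exhausted: this is the final page
    }

    public static void main(String[] args) {
        Page p = fillPage(List.of("aaaa", "bbbb", "cccc", "dddd").iterator(), 10);
        System.out.println(p.rows.size() + " " + p.hasMorePages); // 2 true
    }
}
```

This is why a byte bound fits naturally at the page level: row sizes vary, so the row count per page varies, but the "more pages" flag keeps the contract intact, whereas a user-level LIMIT is a fixed row count that must be honored exactly across all pages.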


Re: [VOTE] CEP-8 Datastax Drivers Donation

2023-06-13 Thread Jeremiah Jordan
 +1 nb

On Jun 13, 2023 at 9:14:35 AM, Jeremy Hanna 
wrote:

> Calling for a vote on CEP-8 [1].
>
> To clarify the intent, as Benjamin said in the discussion thread [2], the
> goal of this vote is simply to ensure that the community is in favor of
> the donation. Nothing more.
> The plan is to introduce the drivers, one by one. Each driver donation
> will need to be accepted first by the PMC members, as it is the case for
> any donation. Therefore the PMC should have full control on the pace at
> which new drivers are accepted.
>
> If this vote passes, we can start this process for the Java driver under
> the direction of the PMC.
>
> Jeremy
>
> 1.
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+Datastax+Drivers+Donation
> 2. https://lists.apache.org/thread/opt630do09phh7hlt28odztxdv6g58dp
>


Re: [DISCUSSIONS] Replace ant eclipse-warnings with CheckerFramework

2023-06-16 Thread Jeremiah Jordan
 +1 from me.

On Jun 15, 2023 at 1:01:01 PM, Ekaterina Dimitrova 
wrote:

> Hi everyone,
> Happy Thursday!
> Some time ago, Jacek raised the point that ant eclipse-warnings is generating 
> too many false positives and not really working as expected. (CASSANDRA-18239)
>
> Reminder: ant eclipse-warnings is a task we run to statically analyze 
> Cassandra code and warn on unsafe use of AutoCloseable instances; it checks 
> against two related compiler options.
>
> While trying to upgrade ECJ compiler that we use for this task 
> (CASSANDRA-18190) so we can switch the task from running it with JDK8 to 
> JDK11 in preparation for dropping JDK8, I hit the following issues:
> - the latest version of ECJ is throwing more than 300 Potential Resource Leak 
> warnings. I looked at 10-15, and they were all false positives.
> - Even if we file a bug report with the Eclipse community, JDK11 support is 
> about to be removed in the next version of the compiler
>
> So I shared this information with Jacek. He came up with a different solution:
> It seems we already pull in CheckerFramework through Guava, with an MIT license, 
> which appears to be acceptable according to this link -  
> https://www.apache.org/legal/resolved.html#category-a
> He already has an initial integration with Cassandra which shows the 
> following:
> - CheckerFramework does not understand the @SuppressWarnings("resource") 
> (there is a different annotation to be used), so it is immediately visible that it 
> does not report all those false positives that eclipse-warnings does. On the 
> flip side, I got the feedback that what it has found so far is something 
> we should investigate.
> - Also, there are additional annotations like @Owning that let you fix many 
> problems at once because the tool understands that the ownership of the 
> resources was passed to another entity; It also enables you to do something 
> impossible with eclipse-warnings - you can tell the tool that there is 
> another method that needs to be called to release the resources, like 
> release, free, disconnect, etc.
> - the tool works with JDK8, JDK11, JDK17, and JDK20, so we can backport it 
> even to older branches (while at the same time keeping eclipse-warnings there)
> - though it runs for 8 minutes, we should not run it with every test; some 
> reorganization around ant tasks will be covered, as even for eclipse-warnings 
> it was weird to call it on every single test run locally by default
>
>
> If there are no concerns, we will continue replacing ant eclipse-warnings 
> with the CheckerFramework as part of CASSANDRA-18239 and CASSANDRA-18190 in 
> trunk.
>
> Best regards,
>
> Ekaterina
>
>
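The `@Owning` ownership-transfer idea mentioned above can be illustrated as follows. The real annotation lives in CheckerFramework's `org.checkerframework.checker.mustcall.qual` package; a local stand-in annotation is defined here only so the sketch compiles without the checker-qual jar:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class OwningDemo {
    // Local stand-in for org.checkerframework.checker.mustcall.qual.Owning.
    // The real annotation tells the Resource Leak Checker that the callee
    // takes over responsibility for releasing the annotated parameter, so
    // the caller is no longer flagged for not closing it.
    @Retention(RetentionPolicy.CLASS)
    @Target(ElementType.PARAMETER)
    @interface Owning {}

    static class Resource implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    // Ownership transfers here: this method promises to close `r`, which the
    // checker can verify; eclipse-warnings had no way to express this.
    static void consume(@Owning Resource r) {
        try {
            // ... use the resource ...
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        consume(r);  // no leak warning: ownership handed off
        System.out.println(r.closed); // true
    }
}
```

The same mechanism is what lets you point the tool at a custom release method (`release`, `free`, `disconnect`, etc.) instead of only `close()`.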


Re: [DISCUSS] Being specific about JDK versions and python lib versions in CI

2023-06-22 Thread Jeremiah Jordan
 Yes.  -1 on forcing the patch release version, and possibly the minor version,
for anything where it is not absolutely necessary.  Especially for
things like Java or Python version, I would hope we just install the latest
Java 8, Java 11, or Java 17 JDK for the platform the image is built from
and run under them.  Otherwise we don’t find out until it’s too late when
some JDK update breaks things.  I know I always tell users to run the
latest JDK patch release, so we should do the same.

If we want to pin the major version of something, then I have no problem
there.

-Jeremiah

On Jun 22, 2023 at 5:00:36 PM, Ekaterina Dimitrova 
wrote:

> Wouldn’t we recommend people use the test images the project CI uses,
> thus testing with the versions we use? I would assume the repeatable CI
> will still expect test images the way we have now?
> (I hope I did not misunderstand the question)
>
> I also share similar worries with Brandon
>
> On Thu, 22 Jun 2023 at 17:48, Brandon Williams  wrote:
>
>> On Thu, Jun 22, 2023 at 4:23 PM Josh McKenzie 
>> wrote:
>> >
>> > 2. Anyone concerned about us being specific about versions in
>> requirements.txt in the dtest repo?
>>
>> My concern with this is atrophy; we'll set the version once and when
>> finally forced to update, find that a lot of other things must also be
>> updated in order to do so.  I think our current approach of only
>> setting them on things we require at a certain version like thrift has
>> been successful thus far, and I don't think having different versions
>> would be very common, but also not really affect repeatability if
>> encountered.  You can see what versions are used from the logs though
>> and could adjust them to be the same if necessary.
>>
>


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-29 Thread Jeremiah Jordan
I like the idea of extending CREATE ROLE rather than adding a brand new ADD
IDENTITY syntax.  Not sure how that can line up with one to many
relationships for an identity, but maybe that can just be done through role
hierarchy?

In either case, I don’t think IDENTITY related operations should be tied to
the super user flag. They should be tied to either existing role
permissions, or a brand new permissions about IDENTITY.  We should not
require that end users give the account allowed to make IDENTITY changes
super user permission to do what ever they want across the whole database.

On Jun 28, 2023 at 11:48:02 PM, Yuki Morishita  wrote:

> Thinking more about "CREATE ROLE" permission, if we can extend CREATE
> ROLE/ALTER ROLE statements, it may look streamlined:
>
> I don't have the good example, but something like:
> ```
> CREATE ROLE dev WITH LOGIN = true AND IDENTITIES = {'spiffe://xxx'};
> ALTER ROLE dev ADD IDENTITY 'xxx';
> LIST ROLES;
> ```
>
> This requires a role to identities table as well as the current identity
> to role table though.
>
> On Thu, Jun 29, 2023 at 12:34 PM Yuki Morishita 
> wrote:
>
>> Hi Jyothsna,
>>
>> I think for the *initial* commit, the description looks fine to me.
>> I'd like to see/contribute to the future improvement though:
>>
>> * ADD IDENTITY requires SUPERUSER, this means that the brand new cluster
>> needs to start with PasswordAuthenticator/CassandraAuthorizer first, and
>> then change to mTLS one.
>> * For this, I'd really like to see Cassandra use password authn and
>> authz by default.
>> * Cassandra allows the user with "CREATE ROLE" permission to create
>> roles without superuser privilege. Maybe it is natural to allow them to add
>> identities also?
>>
>>
>> On Thu, Jun 29, 2023 at 7:35 AM Jyothsna Konisa 
>> wrote:
>>
>>> Hi Yuki,
>>>
>>> I have added cassandra docs for CQL syntax that we are adding and how to
>>> get started with using mTLS authenticators along with the migration plan.
>>> Please review it and let me know if it looks good.
>>>
>>> Thanks,
>>> Jyothsna Konisa.
>>>
>>> On Wed, Jun 21, 2023 at 10:46 AM Jyothsna Konisa 
>>> wrote:
>>>
 Hi Yuki!

 Thanks for the questions.

 Here are the steps for the initial setup.

 1. Since only super users can add/remove identities from the
 `identity_to_roles` table, operators should use that role to add authorized
 identities to the table. Note that the authenticator is not an mTLS
 authenticator yet.
 EX: ADD IDENTITY 'spiffe://testdomain.com/testIdentifier/testValue'
 TO ROLE 'read_only_user'

 2. Change the authenticator configuration in cassandra.yaml to use the mTLS
 authenticator
 EX:
 authenticator:
   class_name: org.apache.cassandra.auth.MutualTlsAuthenticator
   parameters:
     validator_class_name: org.apache.cassandra.auth.SpiffeCertificateValidator
 3. Restart the cluster so that the newly configured mTLS authenticator is
 used

 What will be the op's first step to set up the roles and identities?
 -> Yes, the op should set up roles & identities first.

 Is default cassandra / cassandra superuser login still required to set
 up other roles and identities?
 -> When transitioning from a password based to an mTLS based
 authenticator, yes, superuser login is required to add identities, as only
 super users can add them. However when a cluster is using mTLS based
 authenticator, the super user will be associated with some certificate
 identity and hence we don't need password based cassandra super user login.

 If initial cassandra super user login is required, does that mean super
 users and the "cassandra" superuser bypass the mTLS check?
 -> No, while adding identities to the roles table in step1 the
 authenticator will not be an mTLS authenticator. Once the identities are
 added and the authenticator is configured, even super users have to go
 through an mTLS check during connection.


 Regarding migration

 I *think* you need to first use
 MutualTlsWithPasswordFallbackAuthenticator so the current roles can login
 with their password,
 and eventually the admin sets up identity and then can switch to mTLS
 auth.
 Is this the expected way for migration?
 -> Yes you can do that or else we can add identities with password
 based login and then change the authenticator to be mTLS authenticator.

 I think a thorough documentation for this new feature including new CQL
 syntax, setting up and migration would be greatly appreciated.
 -> I have added documentation for the authenticators, cqlsh commands in
 the Javadocs in the source code. Maybe I will add the setup process &
 migration process in the Javadocs, d
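The validator plug-in point configured in the steps above could look roughly like this. The interface and names below are hypothetical simplifications for illustration, not the actual Cassandra API:

```java
import java.util.Map;
import java.util.Optional;

public class MtlsIdentitySketch {
    // Illustrative stand-in for the validator_class_name plug-in point: a
    // validator extracts an identity from the client certificate (here, a
    // SPIFFE URI from a SAN entry), and the authenticator then maps that
    // identity to a role, mirroring the identity_to_roles table.
    interface CertificateValidator {
        Optional<String> identityOf(String sanUri);
    }

    static final CertificateValidator SPIFFE = sanUri ->
        sanUri.startsWith("spiffe://") ? Optional.of(sanUri) : Optional.empty();

    public static void main(String[] args) {
        // Toy identity-to-role mapping, as populated by ADD IDENTITY ... TO ROLE
        Map<String, String> identityToRole =
            Map.of("spiffe://testdomain.com/testIdentifier/testValue", "read_only_user");

        String role = SPIFFE.identityOf("spiffe://testdomain.com/testIdentifier/testValue")
                            .map(identityToRole::get)
                            .orElse(null);
        System.out.println(role); // read_only_user
    }
}
```

This separation is also why certificate rotation needs no database update: only the extracted identity is stored, not the certificate itself.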

Re: [DISCUSS] When to run CheckStyle and other verifications

2023-06-29 Thread Jeremiah Jordan
 +100 I support making generate-idea-files auto setup everything in
IntelliJ for you.  If you post a diff, I will test it.

On this proposal, I don’t really have an opinion one way or the other about
what the default is for local "ant jar”, if its slow I will figure out how
to turn it off, if its fast I will leave it on.
I do care that CI runs checks, and complains loudly if something is wrong
such that it is very easy to tell during review.

-Jeremiah

On Jun 29, 2023 at 1:44:09 PM, Josh McKenzie  wrote:

> In accord I added an opt-out for each hook, and will require such here as
> well
>
> On for main branches, off for feature branches seems like it might blanket
> satisfy this concern? Doesn't fix the "--atomic across 5 branches means
> style checks and build on hook across those branches" which isn't ideal. I
> don't think style check failures after push upstream are frequent enough to
> make the cost/benefit there make sense overall are they?
>
> Related to this - I have sonarlint, spotbugs, and checkstyle all running
> inside idea; since pulling those in and tuning the configs a bit I haven't
> run into a single issue w/our checkstyle build target (go figure). Having
> the required style checks reflected realtime inside your work environment
> goes a long way towards making it a more intuitive part of your workflow
> rather than being an annoying last minute block of your ability to progress
> that requires circling back into the code.
>
> From a technical perspective, it looks like adding a reference
> "externalDependencies.xml" to our ide/idea directory which we copied over
> during "generate-idea-files" would be sufficient to get idea to pop up
> prompts to install those extensions if you don't have them when opening the
> project (theory; haven't tested).
>
> We'd need to make sure the configuration for each of those was calibrated
> to our project out of the box of course, but making style considerations a
> first-class citizen in that way seems a more intuitive and human-centered
> approach to all this rather than debating nuance of our command-line
> targets, hooks, and how we present things to people. To Berenguer's point -
> better to have these be completely invisible to people with their workflows
> and Just Work (except for when your IDE scolds you for bad behavior w/build
> errors immediately).
>
> I still think Flags Are Bad. :)
>
> On Thu, Jun 29, 2023, at 1:38 PM, Ekaterina Dimitrova wrote:
>
> Should we just keep a consolidated no-check flag for all kinds of checks
> and get rid of the no-checkstyle one?
>
> Trading one for one with Josh :-)
>
> Best regards,
> Ekaterina
>
> On Thu, 29 Jun 2023 at 10:52, Josh McKenzie  wrote:
>
>
> I really prefer separate tasks than flags. Flags are not listed in the
> help message like "ant -p" and are not auto-completed in the terminal. That
> makes them almost undiscoverable for newcomers.
>
> Please, no more flags. We are *more* than flaggy enough right now.
>
> Having to dig through build.xml to determine how to change things or do
> things is painful; the more we can avoid this (for oldtimers and newcomers
> alike!) the better.
>
> On Thu, Jun 29, 2023, at 8:34 AM, Mick Semb Wever wrote:
>
>
>
> On Thu, 29 Jun 2023 at 13:30, Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
> There is another target called "build", which retrieves dependencies, and
> then calls "build-project".
>
>
>
> Is it intended to be called by a user ?
>
> If not, please follow the ant style prefixing the target name with an
> underscore (so that it does not appear in the `ant -projecthelp` list).
>
> If possible, I agree with Brandon, `build` is the better name to expose to
> the user.
>
>
>
>
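The `externalDependencies.xml` idea above might look like this. The file and its `ExternalDependencies` component are IntelliJ's real mechanism for prompting plugin installation when a project is opened, but the plugin IDs below are guesses that would need to be verified against the JetBrains marketplace:

```xml
<!-- Sketch of ide/idea/externalDependencies.xml, copied to .idea/ by
     generate-idea-files. Plugin IDs are illustrative and unverified. -->
<project version="4">
  <component name="ExternalDependencies">
    <plugin id="CheckStyle-IDEA" />
    <plugin id="org.sonarlint.idea" />
  </component>
</project>
```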


Re: CASSANDRA-18554 - mTLS based client and internode authenticators

2023-06-30 Thread Jeremiah Jordan
 I don’t think users necessarily need to be able to update their own
identities.  I just don’t want to have to use the super user role.  The
super user role has all power over all things in the database.  I don’t
want to have to give that much power to the person who manages identities,
I just want to give them the power to manage identities.

Jeremiah Jordan
e. jerem...@datastax.com
w. www.datastax.com



On Jun 30, 2023 at 1:35:41 PM, Dinesh Joshi  wrote:

> Yuki, Jeremiah both are fair points. The mental model we're using for
> mTLS authentication is slightly different.
>
> In your model you're treating the TLS identity itself to be similar to
> the password. The password is the 'shared secret' that currently needs
> to be rotated by the user that owns the account therefore necessitating
> the permission to update their password. But that is not the case with
> TLS certificates and mTLS identities.
>
> The model we're going for is different. The identity is provisioned for
> an account by a super user. This is more locked down and the user can
> still rotate their own certificates but not change the identity
> associated with their account without a super user.
>
> Once provisioned, a user does not need rotate the identity itself. They
> only need to obtain fresh certificates as their certificates near
> expiry. This requires no updates on the database unlike passwords.
>
> We could extend this functionality in the future to allow users to
> change their own identity. Nothing here prevents that.
>
> thanks,
>
> Dinesh

Re: [ANNOUNCEMENT] Expect failures today. Dropping JDK 8 and adding JDK 11

2023-07-25 Thread Jeremiah Jordan
 Yes.  Great to get this work merged.  Thanks to everyone who worked on it
and to Ekaterina for leading the charge!

-Jeremiah

On Jul 24, 2023 at 9:27:10 PM, C. Scott Andreas 
wrote:

> Ekaterina, thank you for spearheading JDK17 support for Apache Cassandra!
> Exciting to get to this point.
>
> - Scott
>
> On Jul 24, 2023, at 7:11 PM, Ekaterina Dimitrova 
> wrote:
>
> 
> Good news!
> After run #1638-39 you should not see anything else failing than
> SSLFactory test class. This known issue will be fixed, potentially by adding
> Bouncy Castle. More info in CASSANDRA-17992 and this netty issue:
> https://github.com/netty/netty/issues/10317
> We can probably mark the test class with @Ignore, but knowing how easily
> those are forgotten and 17992 being already in review, I prefer not to do
> it.
>
> The only new failure I found in #1636 is a rare flaky test we never saw in
> CircleCI before. (Unit tests were running only there; they were not enabled
> in Jenkins until we cleaned them up.) Ticket already opened -
> CASSANDRA-18685 
>
> Last but not least, eclipse-warnings is already removed (it doesn't work
> with post JDK8 versions), but the new static analysis from Checker
> Framework is already in review and soon to land in trunk - CASSANDRA-18239
>
> As usual - if you have any questions or concerns, please do let me know.
> Last but not least - thank you to everyone who helped in one way or
> another with this effort!!
>
> On Mon, 24 Jul 2023 at 16:37, Ekaterina Dimitrova 
> wrote:
>
>> Ninja fix was required for Jenkins, new build started #1636
>>
>> On Mon, 24 Jul 2023 at 15:42, Ekaterina Dimitrova 
>> wrote:
>>
>>> Done!
>>>
>>> All commits from 18255 are in.
>>> The first run to monitor will be in Jenkins #1635
>>>
>>> There will be still fixes to be applied for some unit and in-jvm tests
>>> that were pending on the drop but I will do it when I see Jenkins kicking
>>> in this run properly.  (Which are those can be seen in CASSANDRA-16895,
>>> there is a table in its description)
>>>
>>> I will keep you posted on any new developments.
>>>
>>>
>>> On Mon, 24 Jul 2023 at 14:52, Ekaterina Dimitrova 
>>> wrote:
>>>
 Starting commits for 18255. Please put on hold any trunk commits. I
 will let you know when it is done. Thank you

 On Mon, 24 Jul 2023 at 11:29, Ekaterina Dimitrova <
 e.dimitr...@gmail.com> wrote:

> Hi everyone,
>
> Happy Monday!
>
> I am working on dropping JDK 8 and adding JDK17 on trunk in both CI
> systems today.
> This requires numerous patches in a few repos so you will be seeing
> more failures in CI throughout the day today, but it shouldn’t be anything
> more 🤞 than what we have listed in the table of failures in
> CASSANDRA-16895’s description. I will be applying the fixes one by one
> today.
> I will keep you posted with updates. Also, please, do let me know if
> you have any questions or concerns.
>
> Best regards,
> Ekaterina
>
>
>


Re: [Discuss] Repair inside C*

2023-07-25 Thread Jeremiah Jordan
 +1 for the side car being the right location.

-Jeremiah

On Jul 25, 2023 at 1:16:14 PM, Chris Lohfink  wrote:

> I think a CEP is the next step. Considering the number of companies
> involved, this might necessitate several drafts and rounds of discussions.
> I appreciate your initiative in starting this process, and I'm eager to
> contribute to the ensuing discussions. Maybe in a google docs or something
> initially for more interactive feedback?
>
> In regards to https://issues.apache.org/jira/browse/CASSANDRA-14346 we at
> Netflix are currently putting effort into moving this into the sidecar,
> as the idea was to start moving non-read/write-path things into different
> processes and JVMs so they do not impact each other.
>
> I think the sidecar/in process discussion might be a bit contentious as I
> know even things like compaction some feel should be moved out of process
> in future. On a personal note, my primary interest lies in seeing the
> implementation realized, so I am willing to support whatever consensus
> emerges. Whichever direction these go we will help with the implementation.
>
> Chris
>
> On Tue, Jul 25, 2023 at 1:09 PM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Sounds good, German. Feel free to let me know if you need my help
>> in filing CEP, adding supporting content to the CEP, etc.
>> As I mentioned previously, I have already been working (going through an
>> internal review) on creating a one-pager doc, code, etc., that has been
>> working for us for the last six years at an immense scale, and I will share
>> it soon on a private fork.
>>
>> Thanks,
>> Jaydeep
>>
>> On Tue, Jul 25, 2023 at 9:48 AM German Eichberger via dev <
>> dev@cassandra.apache.org> wrote:
>>
>>> In [2] we suggested that the next step should be a CEP.
>>>
>>> I am happy to lend a hand to this effort as well.
>>>
>>> Thanks Jaydeep and David - really appreciated.
>>>
>>> German
>>>
>>> --
>>> *From:* David Capwell 
>>> *Sent:* Tuesday, July 25, 2023 8:32 AM
>>> *To:* dev 
>>> *Cc:* German Eichberger 
>>> *Subject:* [EXTERNAL] Re: [Discuss] Repair inside C*
>>>
>>> As someone who has done a lot of work trying to make repair stable, I
>>> approve of this message ^_^
>>>
>>> More than glad to help mentor this work
>>>
>>> On Jul 24, 2023, at 6:29 PM, Jaydeep Chovatia <
>>> chovatia.jayd...@gmail.com> wrote:
>>>
>>> To clarify the repair solution timing, the one we have listed in the
>>> article is not the recently developed one. We were hitting some
>>> high-priority production challenges back in early 2018, and to address
>>> that, we developed and rolled out the solution in production in just a few
>>> months. The timing-wise, the solution was developed and productized by Q3
>>> 2018, of course, continued to evolve thereafter. Usually, we explore the
>>> existing solutions we can leverage, but when we started our journey in
>>> early 2018, most of the solutions were based on sidecar solutions. There is
>>> nothing against the sidecar solution; it was just a pure business decision,
>>> and in that, we wanted to avoid the sidecar to avoid a dependency on the
>>> control plane. Every solution developed has its deep context, merits, and
>>> pros and cons; they are all great solutions!
>>>
>>> An appeal to the community members is to think one more time about
>>> having repairs in the Open Source Cassandra itself. As mentioned in my
>>> previous email, any solution getting adopted is fine; the important aspect
>>> is to have a repair solution in the OSS Cassandra itself!
>>>
>>> Yours Faithfully,
>>> Jaydeep
>>>
>>> On Mon, Jul 24, 2023 at 3:46 PM Jaydeep Chovatia <
>>> chovatia.jayd...@gmail.com> wrote:
>>>
>>> Hi German,
>>>
>>> The goal is always to backport our learnings back to the community. For
>>> example, I have already successfully backported the following two
>>> enhancements/bug fixes back to the Open Source Cassandra, which are
>>> described in the article. I am already currently working on open-source a
>>> few more enhancements mentioned in the article back to the open-source.
>>>
>>>1. https://issues.apache.org/jira/browse/CASSANDRA-18555
>>>2. https://issues.apache.org/jira/browse/CASSANDRA-13740
>>>
>>> There is definitely heavy interest in having the repair solution inside
>>> the Open Source Cassandra itself, very much like Compaction. As I write
>>> this email, we are internally working on a one-pager proposal doc to all
>>> the community members on having a repair inside the OSS Apache Cassandra
>>> along with our private fork - I will share it soon.
>>>
>>> Generally, we are ok with any solution getting adopted (either Joey's
>>> solution or our repair solution or any other solution). The primary
>>> motivation is to have the repair embedded inside the open-source Cassandra
>>> itself, so we can retire all various privately developed solutions
>>> eventually :)
>>>
>>> I am also happy to help (drive conversation, discussion, etc.) in any
>>> way

Re: [DISCUSS] Using ACCP or tc-native by default

2023-07-26 Thread Jeremiah Jordan
I had a discussion with Mick on Slack.  His concern is not with enabling
ACCP.  His concern is with the testing of the new C* yaml config code
included in the patch, which is used to decide whether ACCP should be
enabled and whether startup should fail if it cannot be.

I agree.  We should make sure that the new C* yaml config code is solid
before we commit this patch, especially since it can deliberately cause
node startup to fail.  But that should be a discussion for the ticket, I
think, not for this thread.

So I think we are back to the original question: should ACCP be used by
default in trunk?  From what I have seen, no one is against that.

-Jeremiah
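For context on what enabling ACCP by default means mechanically: ACCP registers itself as the highest-priority JCE provider (via `AmazonCorrettoCryptoProvider.install()` in the ACCP library), after which lookups such as `Cipher.getInstance(...)` resolve to it first. The sketch below uses only JDK APIs to show the provider-order lookup; ACCP itself is not on the classpath here:

```java
import java.security.Provider;
import java.security.Security;
import javax.crypto.Cipher;

public class ProviderOrderDemo {
    public static void main(String[] args) throws Exception {
        // With ACCP installed, AmazonCorrettoCryptoProvider would sit at
        // position 1 and service the Cipher lookup below. Without it, the
        // stock JDK providers answer in registration order.
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName());
        }
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        System.out.println(c.getProvider().getName()); // SunJCE on a stock JDK
    }
}
```

Because this is purely a provider-resolution change, upgrade risk is about configuration and failure handling rather than wire or sstable compatibility, which is the distinction the thread is circling.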


On Jul 26, 2023 at 2:53:02 PM, Jordan West  wrote:

> +1 Scott. And agreed all involved are looking out for the best interests
> of C* users. And I appreciate those with concerns contributing to
> addressing them.
>
> I’m all for making upgrades smooth bc I do them so often. A huge portion
> of our 4.1 qualification is “will it break on upgrade”? Because of that I’m
> confident in this patch and concerned about many other areas. I think it’s
> commendable to want to reach a point where teams have the trust in the
> community to have done that for them, but that starts with better test
> coverage and concrete evidence.
>
> Given all that, I think we should move forward w Ayushi’s proposal to make
> it on by default.
>
> Jordan
>
> On Wed, Jul 26, 2023 at 12:14 C. Scott Andreas 
> wrote:
>
>> I think these concerns are well-intended, but they feel rooted in
>> uncertainty rather than in factual examples of areas where risk is present.
>> I would appreciate elaboration on the specific areas of risk that folks
>> imagine.
>>
>> I would encourage those who express skepticism to try the patch, and I
>> endorse Ayushi's proposal to enable it by default.
>>
>>
>> – Scott
>>
>> On Jul 26, 2023, at 12:03 PM, "Miklosovic, Stefan" <
>> stefan.mikloso...@netapp.com> wrote:
>>
>>
>> We can make it opt-in, wait one major release to see what bugs pop up, and
>> we might make it opt-out eventually. We do not need to hurry with this. I
>> understand everybody's expectations and excitement, but it really boils down
>> to a one-line change in yaml. People who are after performance will
>> definitely be aware of this knob to turn on to squeeze out even more perf
>> ...
>>
>> I will look at the dtests Jeremiah mentioned, but I would just move on and
>> make it opt-in if we are not 100% persuaded about it _yet_.
>>
>> 
>> From: Mick Semb Wever 
>> Sent: Wednesday, July 26, 2023 20:48
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>>
>> NetApp Security WARNING: This is an external email. Do not click links or
>> open attachments unless you recognize the sender and know the content is
>> safe.
>>
>>
>>
>>
>> What comes to mind is how we brought down people clusters and made
>> sstables unreadable with the introduction of the chunk_length configuration
>> in 1.0. It wasn't about how tested the compression libraries were, but
>> about the new configuration itself. Introducing silent defaults has more
>> surface area for bugs than introducing explicit defaults that only apply to
>> new clusters and are so opt-in for existing clusters.
>>
>>
>>
>> On Wed, 26 Jul 2023 at 20:13, J. D. Jordan wrote:
>> Enabling ssl for the upgrade dtests would cover this use case. If those
>> don’t currently exist I see no reason it won’t work so I would be fine for
>> someone to figure it out post merge if there is a concern. What JCE
>> provider you use should have no upgrade concerns.
>>
>> -Jeremiah
>>
>> On Jul 26, 2023, at 1:07 PM, Miklosovic, Stefan <
>> stefan.mikloso...@netapp.com> wrote:
>>
>> Am I understanding it correctly that the tests you are talking about are
>> only required in case we make ACCP the default provider?
>>
>> I can live with not making it default and still deliver it if tests are
>> not required. I do not think that these kinds of tests were required a
>> couple of mails ago when opt-in was on the table.
>>
>> While I tend to agree with people here who seem to consider testing this
>> scenario an unnecessary exercise, I am afraid that I will not be able to
>> deliver that, as testing something like this is quite a complicated matter.
>> There are a lot of aspects which could be tested that I can not even
>> enumerate right now ... so I try to meet you somewhere in the middle.
>>
>> 
>> From: Mick Semb Wever 
>> Sent: Wednesday, July 26, 2023 17:34
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSS] Using ACCP or tc-native by default
>>

Re: Tokenization and SAI query syntax

2023-08-02 Thread Jeremiah Jordan
SASI just uses “=“ for the tokenized equality matching, which is the exact 
thing this discussion is about changing/not liking.

> On Aug 2, 2023, at 7:18 PM, J. D. Jordan  wrote:
> 
> I do not think LIKE actually applies here. LIKE is used for prefix, 
> contains, or suffix searches in SASI depending on the index type.
> 
> This is about exact matching of tokens.
> 
>> On Aug 2, 2023, at 5:53 PM, Jon Haddad  wrote:
>> 
>> Certain bits of functionality also already exist on the SASI side of 
>> things, but I'm not sure how much overlap there is.  Currently, there's a 
>> LIKE keyword that handles token matching, although it seems to have some 
>> differences from the feature set in SAI.  
>> 
>> That said, there seems to be enough of an overlap that it would make sense 
>> to consider using LIKE in the same manner, doesn't it?  I think it would be 
>> a little odd if we have different syntax for different indexes.  
>> 
>> https://github.com/apache/cassandra/blob/trunk/doc/SASI.md
>> 
>> I think one complication here is that there seems to be a desire, that I 
>> very much agree with, to expose as much of the underlying flexibility of 
>> Lucene as much as possible.  If it means we use Caleb's suggestion, I'd ask 
>> that the queries that SASI and SAI both support use the same syntax, even if 
>> it means there's two ways of writing the same query.  To use Caleb's 
>> example, this would mean supporting both LIKE and the `expr` column.  
>> 
>> Jon
>> 
 On 2023/08/01 19:17:11 Caleb Rackliffe wrote:
>>> Here are some additional bits of prior art, if anyone finds them useful:
>>> 
>>> 
>>> The Stratio Lucene Index -
>>> https://github.com/Stratio/cassandra-lucene-index#examples
>>> 
>>> Stratio was the reason C* added the "expr" functionality. They embedded
>>> something similar to ElasticSearch JSON, which probably isn't my favorite
>>> choice, but it's there.
>>> 
>>> 
>>> The ElasticSearch match query syntax -
>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
>>> 
>>> Again, not my favorite. It's verbose, and probably too powerful for us.
>>> 
>>> 
>>> ElasticSearch's documentation for the basic Lucene query syntax -
>>> https://www.elastic.co/guide/en/elasticsearch/reference/8.9/query-dsl-query-string-query.html#query-string-syntax
>>> 
>>> One idea is to take the basic Lucene index, which it seems we already have
>>> some support for, and feed it to "expr". This is nice for two reasons:
>>> 
>>> 1.) People can just write Lucene queries if they already know how.
>>> 2.) No changes to the grammar.
>>> 
>>> Lucene has distinct concepts of filtering and querying, and this is kind of
>>> the latter. I'm not sure how, for example, we would want "expr" to interact
>>> w/ filters on other column indexes in vanilla CQL space...
>>> 
>>> 
 On Mon, Jul 24, 2023 at 9:37 AM Josh McKenzie  wrote:
 
 `column CONTAINS term`. Contains is used by both Java and Python for
 substring searches, so at least some users will be surprised by term-based
 behavior.
 
 I wonder whether users are in their "programming language" headspace or in
 their "querying a database" headspace when interacting with CQL? i.e. this
 would only present confusion if we expected users to be thinking in the
 idioms of their respective programming languages. If they're thinking in
 terms of SQL, MATCHES would probably end up confusing them a bit since it
 doesn't match the general structure of the MATCH operator.
 
 That said, I also think CONTAINS loses something important that you allude
 to here Jonathan:
 
 with corresponding query-time tokenization and analysis.  This means that
 the query term is not always a substring of the original string!  Besides
 obvious transformations like lowercasing, you have things like
 PhoneticFilter available as well.
 
 So to me, neither MATCHES nor CONTAINS are particularly great candidates.
 
 So +1 to the "I don't actually hate it" sentiment on:
 
 `column : term`. Inspired by Lucene’s syntax
 
 
> On Mon, Jul 24, 2023, at 8:35 AM, Benedict wrote:
 
 
 I have a strong preference not to use the name of an SQL operator, since
 it precludes us later providing the SQL standard operator to users.
 
 What about CONTAINS TOKEN term? Or CONTAINS TERM term?
 
 
> On 24 Jul 2023, at 13:34, Andrés de la Peña  wrote:
 
 
 `column = term` is definitively problematic because it creates an
 ambiguity when the queried column belongs to the primary key. For some
 queries we wouldn't know whether the user wants a primary key q
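For anyone skimming the thread, the syntax candidates discussed above would look roughly like this — the table, column, and index names are made up, and none of this is committed grammar:

```cql
-- 1. Overloaded equality (what SASI does today for analyzed columns,
--    and the ambiguity being debated for primary key columns):
SELECT * FROM posts WHERE body = 'lucene';

-- 2. The Lucene-inspired colon operator:
SELECT * FROM posts WHERE body : 'lucene';

-- 3. Benedict's keyword variant:
SELECT * FROM posts WHERE body CONTAINS TOKEN 'lucene';

-- 4. Caleb's suggestion: pass a raw Lucene query string through expr(),
--    as the Stratio index did:
SELECT * FROM posts WHERE expr(body_idx, 'body:lucene');
```

Option 4 is the only one that needs no grammar change, at the cost of putting a second query language inside a string literal.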

Re: [DISCUSSION] Shall we remove ant javadoc task?

2023-08-03 Thread Jeremiah Jordan
 I don’t think anyone wants to remove the javadocs.  This thread is about
removing the broken ant task which generates html files from them.

+1 from me on removing the ant task.  If someone feels the task is useful
they can always implement one that does not crash and add it back.

-Jeremiah

On Aug 3, 2023 at 9:59:55 AM, "Claude Warren, Jr via dev" <
dev@cassandra.apache.org> wrote:

> I think that we can get more developers interested if there are available
> javadocs.  While many of the core classes are not going to be touched by
> someone just starting, being able to understand what the external touch
> points are and how they interact with other bits of the system can be
> invaluable, particularly when you don't have the entire code base in front
> of you.
>
> For example, I just wrote a tool that explores the distribution of keys
> across multiple sstables, I needed some of the tools classes but not much
> more.  Javadocs would have made that easy if I did not have the source code
> in front of me.
>
> I am -1 on removing the javadocs.
>
> On Thu, Aug 3, 2023 at 4:35 AM Josh McKenzie  wrote:
>
>> If anything, the codebase could use a little more package/class/method
>> markup in some places
>>
>> I am impressed with how diplomatic and generous you're being here Derek.
>> :D
>>
>> On Wed, Aug 2, 2023, at 5:46 PM, Miklosovic, Stefan wrote:
>>
>> That is a good idea. I would like to have Javadocs valid when going
>> through them in IDE. To enforce it, we would have to fix it first. If we
>> find a way how to validate Javadocs without actually rendering them, that
>> would be cool.
>>
>> There is a lot of legacy and rewriting of some custom-crafted formatting
>> of some comments might be quite a tedious task to do if it is required to
>> have them valid. I am in general for valid documentation and even enforcing
>> it but what to do with what is already there ...
>>
>> 
>> From: Jacek Lewandowski 
>> Sent: Wednesday, August 2, 2023 23:38
>> To: dev@cassandra.apache.org
>> Subject: Re: [DISCUSSION] Shall we remove ant javadoc task?
>>
>>
>>
>>
>> With or without outputting JavaDoc to HTML, there are some errors which
>> we should maybe fix. We want to keep the documentation, but there can be
>> syntax errors which may prevent IDE generating a proper preview. So, the
>> question is - should we validate the JavaDoc comments as a precommit task?
>> Can it be done without actually generating HTML output?
>>
>> Thanks,
>> Jacek
>>
>> śr., 2 sie 2023, 22:24 użytkownik Derek Chen-Becker <
>> de...@chen-becker.org> napisał:
>> Oh, whoops, I guess I'm the only one that thinks Javadoc is just the tool
>> and/or it's output (not the markup itself) :P If anything, the codebase
>> could use a little more package/class/method markup in some places, so I'm
>> definitely only in favor of getting rid of the ant task. I should amend my
>> statement to be "...I suspect most people are not opening their browsers
>> and looking at Javadoc..." :)
>>
>> Cheers,
>>
>> Derek
>>
>>
>>
>> On Wed, Aug 2, 2023, 1:30 PM Josh McKenzie  wrote:
>> most people are not looking at Javadoc when working on the codebase.
>> I definitely use it extensively inside the IDE. But never as a compiled
>> set of external docs.
>>
>> Which is to say, I'm +1 on removing the target and I'd ask everyone to
>> keep javadoccing your classes and methods where things are non-obvious or
>> there's a logical coupling with something else in the system. :)
>>
>> On Wed, Aug 2, 2023, at 2:08 PM, Derek Chen-Becker wrote:
>> +1. If a need comes up for Javadoc we can fix it at that point, but I
>> suspect most people are not looking at Javadoc when working on the codebase.
>>
>> Cheers,
>>
>> Derek
>>
>> On Wed, Aug 2, 2023 at 11:11 AM Brandon Williams  wrote:
>> I don't think even if it works anyone is going to use the output, so
>> I'm good with removal.
>>
>> Kind Regards,
>> Brandon
>>
>> On Wed, Aug 2, 2023 at 11:50 AM Ekaterina Dimitrova  wrote:
>> >
>> > Hi everyone,
>> > We were looking into a user report around our ant javadoc task recently.
>> > That made us realize it is not run in CI; it finishes successfully even
>> if there are hundreds of errors, some potentially breaking doc pages.
>> >
>> > There was a ticket discussion where a few community members mentioned
>> that this task was probably unnecessary. Can we remove it, or shall we fix
>> it?
>> >
>> > Best regards,
>> > Ekaterina
>>
>>
>> --
>> +---+
>> | Derek Chen-Becker |
>> | GPG Key available at https://keybase.io/dchenbecker
>> 
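On Jacek's question about validating Javadoc without rendering HTML: javac's `-Xdoclint` option runs the doclint checks as part of compilation, so nothing is ever written out as HTML. A rough, self-contained sketch via the in-JDK compiler API — the `@returns` tag in the sample source is deliberately invalid, and the class/field names are illustrative only:

```java
import javax.tools.*;
import java.net.URI;
import java.util.List;

public class DoclintCheck {
    // Deliberately malformed javadoc: @returns is not a valid tag
    // (the correct tag is @return), so doclint should flag it.
    static final String SRC =
        "/** Adds two ints.\n * @returns the sum\n */\n" +
        "public class Demo { int add(int a, int b) { return a + b; } }\n";

    public static void main(String[] args) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diags = new DiagnosticCollector<>();
        JavaFileObject src = new SimpleJavaFileObject(
                URI.create("string:///Demo.java"), JavaFileObject.Kind.SOURCE) {
            @Override public CharSequence getCharContent(boolean ignoreErrors) {
                return SRC;
            }
        };
        // -Xdoclint:all surfaces javadoc problems during compilation;
        // no HTML is generated anywhere in this process.
        javac.getTask(null, null, diags,
                List.of("-Xdoclint:all", "-d", System.getProperty("java.io.tmpdir")),
                null, List.of(src)).call();
        for (Diagnostic<? extends JavaFileObject> d : diags.getDiagnostics()) {
            System.out.println(d.getKind() + ": " + d.getMessage(null));
        }
    }
}
```

Something along these lines could back a precommit check without resurrecting the broken ant target.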

Re: [DISCUSS] Addition of smile-nlp test dependency for CEP-30

2023-09-13 Thread Jeremiah Jordan
 I wonder if it can easily be replaced with Apache OpenNLP?  It also
provides an implementation of GloVe.

https://opennlp.apache.org/docs/2.3.0/apidocs/opennlp-tools/opennlp/tools/util/wordvector/Glove.html
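I'm not certain of the exact shape of the OpenNLP API, so treat that as an open question — but for test fixtures the GloVe text format itself is trivial (one token per line, followed by its space-separated vector components), and a hand-rolled loader is only a few lines. A sketch, not tied to either library:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class GloveLoader {
    // Parses the standard GloVe text format: one token per line,
    // followed by its vector components, space-separated.
    static Map<String, float[]> load(Reader in) throws IOException {
        Map<String, float[]> vectors = new HashMap<>();
        BufferedReader r = new BufferedReader(in);
        for (String line; (line = r.readLine()) != null; ) {
            String[] parts = line.split(" ");
            float[] v = new float[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                v[i - 1] = Float.parseFloat(parts[i]);
            }
            vectors.put(parts[0], v);
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        Map<String, float[]> m =
                load(new StringReader("the 0.1 0.2\ncat 0.3 0.4\n"));
        System.out.println(m.get("cat")[1]); // prints 0.4
    }
}
```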


On Sep 13, 2023 at 1:17:46 PM, Benedict  wrote:

> There’s a distinction for spotbugs and other build related tools where
> they can be downloaded and used during the build so long as they’re not
> critical to the build process.
>
> They have to be downloaded dynamically in binary form I believe though,
> they cannot be included in the release.
>
> So it’s not really in conflict with what Jeff is saying, and my
> recollection accords with Jeff’s
>
> On 13 Sep 2023, at 17:42, Brandon Williams  wrote:
>
> 
>
> On Wed, Sep 13, 2023 at 11:37 AM Jeff Jirsa  wrote:
>
>> You can open a legal JIRA to confirm, but based on my understanding (and
>> re-confirming reading
>> https://www.apache.org/legal/resolved.html#category-a ):
>>
>>
> We should probably get clarification here regardless, iirc this came up
> when we were considering SpotBugs too.
>
>


Re: [VOTE] Accept java-driver

2023-10-03 Thread Jeremiah Jordan
 +1 nb.

Thanks to everyone who has made this happen.

On Oct 2, 2023 at 11:52:47 PM, Mick Semb Wever  wrote:

> The donation of the java-driver is ready for its IP Clearance vote.
> https://incubator.apache.org/ip-clearance/cassandra-java-driver.html
>
> The SGA has been sent to the ASF.  This does not require acknowledgement
> before the vote.
>
> Once the vote passes, and the SGA has been filed by the ASF Secretary, we
> will request ASF Infra to move the datastax/java-driver as-is to
> apache/java-driver
>
> This means all branches and tags, with all their history, will be kept.  A
> cleaning effort has already cleaned up anything deemed not needed.
>
> Background for the donation is found in CEP-8:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-8%3A+DataStax+Drivers+Donation
>
> PMC members, please take note of (and check) the IP Clearance requirements
> when voting.
>
> The vote will be open for 72 hours (or longer). Votes by PMC members are
> considered binding. A vote passes if there are at least three binding +1s
> and no -1's.
>
> regards,
> Mick
>


Re: [VOTE] Accept java-driver

2023-10-05 Thread Jeremiah Jordan
 I think this is covered by the grant agreement?

https://www.apache.org/licenses/software-grant-template.pdf

2. Licensor represents that, to Licensor's knowledge, Licensor is
legally entitled to grant the above license. Licensor agrees to notify
the Foundation of any facts or circumstances of which Licensor becomes
aware and which makes or would make Licensor's representations in this
License Agreement inaccurate in any respect.



On Oct 5, 2023 at 4:35:08 AM, Benedict  wrote:

> Surely it needs to be shared with the foundation and the PMC so we can
> verify? Or at least have ASF legal confirm they have received and are
> satisfied with the tarball? It certainly can’t be kept private to DS,
> AFAICT.
>
> Of course it shouldn’t be shared publicly but not sure how PMC can fulfil
> its verification function here without it.
>
> On 5 Oct 2023, at 10:23, Mick Semb Wever  wrote:
>
> 
>
>.
>
> On Tue, 3 Oct 2023 at 13:25, Josh McKenzie  wrote:
>
>> I see now this will likely be instead apache/cassandra-java-driver
>>
>> I was wondering about that. apache/java-driver seemed pretty broad. :)
>>
>> From the linked page:
>> Check that all active committers have a signed CLA on record. TODO –
>> attach list
>> I've been part of these discussions and work so am familiar with the
>> status of it (as well as guidance and clearance from the foundation re:
>> folks we couldn't reach) - but might be worthwhile to link to the sheet or
>> perhaps instead provide a summary of the 49 java contributors, their CLA
>> signing status, attempts to reach out, etc for other PMC members that
>> weren't actively involved back when we were working through it.
>>
>
>
> We have a spreadsheet with this information, and the tarball of all the
> signed CLAs.
> The tarball we should keep private to DS, but know that we have it for
> governance's sake.
>
> I've attached the spreadsheet to the CEP confluence page.
>
>


Re: CASSANDRA-18775 (Cassandra supported OSs)

2023-10-20 Thread Jeremiah Jordan
 Agreed.  -1 on selectively removing any of the libs.  But +1 for removing
the whole thing if it is no longer used.

-Jeremiah

On Oct 20, 2023 at 9:28:55 AM, Mick Semb Wever  wrote:

> Does anyone see any reason _not_ to do this?
>>
>
>
> Thanks for bring this to dev@
>
> I see reason not to do it, folk do submit patches for other archs despite
> us not formally maintaining and testing the code for those archs.  Some
> examples are PPC64 Big Endian (CASSANDRA-7476), s390x (CASSANDRA-17723),
> PPC64 Little Endian (CASSANDRA-7381), sparcv9 (CASSANDRA-6628).  Wrote this
> on the ticket too.
>
> +1 for removing sigar altogether (as Brandon points out).
>
>


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-23 Thread Jeremiah Jordan
 +1 from me assuming we have tickets and two committer +1’s on them for
everything being committed to trunk, and CI is working/passing before it
merges.  The usual things, but I want to make sure we do not compromise on
any of them as we try to “move fast” here.

-Jeremiah Jordan

On Oct 23, 2023 at 8:50:46 AM, Sam Tunnicliffe  wrote:

> +1 from me too.
>
> Regarding Benedict's point, backwards incompatibility should be minimal;
> we modified snitch behaviour slightly, so that local snitch config only
> relates to the local node, all peer info is fetched from cluster metadata.
> There is also a minor change to the way failed bootstraps are handled, as
> with TCM they require an explicit cancellation step (running a nodetool
> command).
>
> Whether consensus decrees that this constitutes a major bump or not, I
> think decoupling these major projects from 5.0 is the right move.
>
>
> On 23 Oct 2023, at 12:57, Benedict  wrote:
>
> I’m cool with this.
>
> We may have to think about numbering as I think TCM will break some
> backwards compatibility and we might technically expect the follow-up
> release to be 6.0
>
> Maybe it’s not so bad to have such rapid releases either way.
>
> On 23 Oct 2023, at 12:52, Mick Semb Wever  wrote:
>
> 
>
> The TCM work (CEP-21) is in its review stage but being well past our
> cut-off date¹ for merging, and now jeopardising 5.0 GA efforts, I would
> like to propose the following.
>
> We merge TCM and Accord only to trunk.  Then branch cassandra-5.1 and cut
> an immediate 5.1-alpha1 release.
>
> I see this as a win-win scenario for us, considering our current
> situation.  (Though it is unfortunate that Accord is included in this
> scenario because we agreed it to be based upon TCM.)
>
> This will mean…
>  - We get to focus on getting 5.0 to beta and GA, which already has a ton
> of features users want.
>  - We get an alpha release with TCM and Accord into users hands quickly
> for broader testing and feedback.
>  - We isolate GA efforts on TCM and Accord – giving oss and downstream
> engineers time and patience reviewing and testing.  TCM will be the biggest
> patch ever to land in C*.
>  - Give users a choice for a more incremental upgrade approach, given just
> how many new features we're putting on them in one year.
>  - 5.1 w/ TCM and Accord will maintain its upgrade compatibility with all
> 4.x versions, just as if it had landed in 5.0.
>
>
> The risks/costs this introduces are
>  - If we cannot stabilise TCM and/or Accord on the cassandra-5.1 branch,
> and at some point decide to undo this work, while we can throw away the
> cassandra-5.1 branch we would need to do a bit of work reverting the
> changes in trunk.  This is a _very_ edge case, as confidence levels on the
> design and implementation of both are already tested and high.
>  - We will have to maintain an additional branch.  I propose that we treat
> the 5.1 branch in the same maintenance window as 5.0 (like we have with 3.0
> and 3.11).  This also adds the merge path overhead.
>  - Reviewing of TCM and Accord will continue to happen post-merge.  This
> is not our normal practice, but this work will have already received its
> two +1s from committers, and such ongoing review effort is akin to GA
> stabilisation work on release branches.
>
>
> I see no other ok solution in front of us that gets us at least both the
> 5.0 beta and TCM+Accord alpha releases this year.  Keeping in mind users
> demand to start experimenting with these features, and our Cassandra Summit
> in December.
>
>
> 1) https://lists.apache.org/thread/9c5cnn57c7oqw8wzo3zs0dkrm4f17lm3
>
>
>
>


Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Jeremiah Jordan
 If we decide to go the route of not merging TCM to the 5.0 branch.  Do we
actually need to immediately cut a 5.1 branch?  Can we work on stabilizing
things while it is in trunk and cut the 5.1 branch when we actually think
we are near releasing?  I don’t see any reason we can not cut “preview”
artifacts from trunk?

-Jeremiah

On Oct 24, 2023 at 11:54:25 AM, Jon Haddad 
wrote:

> I guess at the end of the day, shipping a release with a bunch of awesome
> features is better than holding it back.  If there's 2 big releases in 6
> months the community isn't any worse off.
>
> We either ship something, or nothing, and something is probably better.
>
> Jon
>
>
> On 2023/10/24 16:27:04 Patrick McFadin wrote:
>
> +1 to what you are saying, Josh. Based on the last survey, yes, everyone
> was excited about Accord, but SAI and UCS were pretty high on the list.
>
> Benedict and I had a good conversation last night, and now I understand
> more essential details for this conversation. TCM is taking far more work
> than initially scoped, and Accord depends on a stable TCM. TCM is months
> behind and that's a critical fact, and one I personally just learned of. I
> thought things were wrapping up this month, and we were in the testing
> phase. I get why that's a topic we are dancing around. Nobody wants to say
> ship dates are slipping because that's part of our culture. It's
> disappointing and, if new information, an unwelcome surprise, but none of
> us should be angry or in a blamey mood because I guarantee every one of us
> has shipped the code late. My reaction yesterday was based on an incorrect
> assumption. Now that I have a better picture, my point of view is changing.
>
> Josh's point about what's best for users is crucial. Users deserve stable
> code with a regular cadence of features that make their lives easier. If we
> put 5.0 on hold for TCM + Accord, users will get neither for a very long
> time. And I mentioned a disaster yesterday. A bigger disaster would be
> shipping Accord with a major bug that causes data loss, eroding community
> trust. Accord has to be the most bulletproof of all bulletproof features.
> The pressure to ship is only going to increase and that's fertile ground
> for that sort of bug.
>
> So, taking a step back and with a clearer picture, I support the 5.0 + 5.1
> plan mainly because I don't think 5.1 is (or should be) a fast follow.
>
> For the user community, the communication should be straightforward. TCM +
> Accord are turning out to be much more complicated than was originally
> scoped, and for good reasons. Our first principle is to provide a stable
> and reliable system, so as a result, we'll be de-coupling TCM + Accord from
> 5.0 into a 5.1 branch, which is available in parallel to 5.0 while
> additional hardening and testing is done. We can communicate this in a blog
> post.
>
> To make this much more palatable to our user community, if we can get a
> build and docker image available ASAP with Accord, it will allow developers
> to start playing with the syntax. Up to this point, that hasn't been widely
> available unless you compile the code yourself. Developers need to
> understand how this will work in an application, and up to this point, the
> syntax is text they see in my slides. We need to get some hands-on and that
> will get our user community engaged on Accord this calendar year. The
> feedback may even uncover some critical changes we'll need to make. Lack of
> access to Accord by developers is a critical problem we can fix soon and
> there will be plenty of excitement there and start building use cases
> before the final code ships.
>
> I'm bummed but realistic. It sucks that I won't have a pony for Christmas,
> but maybe one for my birthday?
>
> Patrick
>
> On Tue, Oct 24, 2023 at 7:23 AM Josh McKenzie 
> wrote:
>
>
> > Maybe it won't be a glamorous release but shipping
> > 5.0 mitigates our worst case scenario.
> >
> > I disagree with this characterization of 5.0 personally. UCS, SAI, Trie
> > memtables and sstables, maybe vector ANN if the sub-tasks on C-18715 are
> > accurate, all combine to make 5.0 a pretty glamorous release IMO
> > independent of TCM and Accord. Accord is a true paradigm-shift
> > game-changer so it's easy to think of 5.0 as uneventful in comparison,
> > and TCM helps resolve one of the biggest pain-points in our system for
> > over a decade, but I think 5.0 is a very meaty release in its own right
> > today.
> >
> > Anyway - I agree with you Brandon re: timelines. If things take longer
> > than we'd hope (which, if I think back, they do roughly 100% of the time
> > on this project), blocking on these features could both lead to a
> > significant delay in 5.0 going out as well as increasing pressure and
> > risk of burnout on the folks working on it. While I believe we all need
> > some balanced urgency to do our

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-24 Thread Jeremiah Jordan
 In order for the project to advertise the release outside the dev@ list it
needs to be a formal release.  That just means that there was a release
vote and at least 3 PMC members +1’ed it, and there are more +1 than there
are -1, and we follow all the normal release rules.  The ASF release
process doesn’t care what branch you cut the artifacts from or what version
you call it.

So the project can cut artifacts for and release a 5.1-alpha1,
5.1-dev-preview1, what ever we want to version this thing, from trunk or
any other branch name we want.

-Jeremiah

On Oct 24, 2023 at 2:03:41 PM, Patrick McFadin  wrote:

> I would like to have something for developers to use ASAP to try the
> Accord syntax. Very few people have seen it, and I think there's a learning
> curve we can start earlier.
>
> It's my understanding that ASF policy is that it needs to be a project
> release to create a docker image.
>
> On Tue, Oct 24, 2023 at 11:54 AM Jeremiah Jordan <
> jeremiah.jor...@gmail.com> wrote:
>
>> If we decide to go the route of not merging TCM to the 5.0 branch.  Do we
>> actually need to immediately cut a 5.1 branch?  Can we work on stabilizing
>> things while it is in trunk and cut the 5.1 branch when we actually think
>> we are near releasing?  I don’t see any reason we can not cut “preview”
>> artifacts from trunk?
>>
>> -Jeremiah
>>
>> On Oct 24, 2023 at 11:54:25 AM, Jon Haddad 
>> wrote:
>>
>>> I guess at the end of the day, shipping a release with a bunch of
>>> awesome features is better than holding it back.  If there's 2 big releases
>>> in 6 months the community isn't any worse off.
>>>
>>> We either ship something, or nothing, and something is probably better.
>>>
>>> Jon
>>>
>>>
>>> On 2023/10/24 16:27:04 Patrick McFadin wrote:
>>>
>>> +1 to what you are saying, Josh. Based on the last survey, yes, everyone
>>> was excited about Accord, but SAI and UCS were pretty high on the list.
>>>
>>> Benedict and I had a good conversation last night, and now I understand
>>> more essential details for this conversation. TCM is taking far more work
>>> than initially scoped, and Accord depends on a stable TCM. TCM is months
>>> behind and that's a critical fact, and one I personally just learned of. I
>>> thought things were wrapping up this month, and we were in the testing
>>> phase. I get why that's a topic we are dancing around. Nobody wants to say
>>> ship dates are slipping because that's part of our culture. It's
>>> disappointing and, if new information, an unwelcome surprise, but none of
>>> us should be angry or in a blamey mood because I guarantee every one of us
>>> has shipped the code late. My reaction yesterday was based on an incorrect
>>> assumption. Now that I have a better picture, my point of view is changing.
>>>
>>> Josh's point about what's best for users is crucial. Users deserve stable
>>> code with a regular cadence of features that make their lives easier. If
>>> we put 5.0 on hold for TCM + Accord, users will get neither for a very
>>> long time. And I mentioned a disaster yesterday. A bigger disaster would
>>> be shipping Accord with a major bug that causes data loss, eroding
>>> community trust. Accord has to be the most bulletproof of all bulletproof
>>> features. The pressure to ship is only going to increase and that's
>>> fertile ground for that sort of bug.
>>>
>>> So, taking a step back and with a clearer picture, I support the 5.0 +
>>> 5.1 plan mainly because I don't think 5.1 is (or should be) a fast follow.
>>>
>>> For the user community, the communication should be straightforward. TCM
>>> + Accord are turning out to be much more complicated than was originally
>>> scoped, and for good reasons. Our first principle is to provide a stable
>>> and reliable system, so as a result, we'll be de-coupling TCM + Accord
>>> from 5.0 into a 5.1 branch, which is available in parallel to 5.0 while
>>> additional hardening and testing is done. We can communic

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-25 Thread Jeremiah Jordan
>
> If we do a 5.1 release why not take it as an opportunity to release more
> things. I am not saying that we will. Just that we should let that door
> open.
>

Agreed.  This is the reason I brought up the possibility of not branching
off 5.1 immediately.


On Oct 25, 2023 at 3:17:13 AM, Benjamin Lerer  wrote:

> The proposal includes 3 things:
> 1. Do not include TCM and Accord in 5.0 to avoid delaying 5.0
> 2. The next release will be 5.1 and will include only Accord and TCM
> 3. Merge TCM and Accord right now in 5.1 (making an initial release)
>
> I am fine with question 1 and do not have a strong opinion on which way to
> go.
> 2. Means that every new feature will have to wait for post 5.1 even if it
> is ready before 5.1 is stabilized and shipped. If we do a 5.1 release why
> not take it as an opportunity to release more things. I am not saying that
> we will. Just that we should let that door open.
> 3. There is a need to merge TCM and Accord as maintaining those separate
> branches is costly in terms of time and energy. I fully understand that. On
> the other hand merging TCM and Accord will make the TCM review harder and I
> do believe that this second round of review is valuable as it already
> uncovered a valid issue. Nevertheless, I am fine with merging TCM as soon
> as it passes CI and continuing the review after the merge. If we cannot
> meet at least that quality level (Green CI) we should not merge just for
> creating a 5.1-alpha release for the summit.
>
> Now, I am totally fine with a preview release without numbering and with
> big warnings that will only serve as a preview for the summit.
>
> Le mer. 25 oct. 2023 à 06:33, Berenguer Blasi 
> a écrit :
>
>> I also think there's many good new features in 5.0 already they'd make a
>> good release even on their own. My 2 cts.
>>
>> On 24/10/23 23:20, Brandon Williams wrote:
>> > The catch here is that we don't publish docker images currently.  The
>> > C* docker images available are not made by us.
>> >
>> > Kind Regards,
>> > Brandon
>> >
>> > On Tue, Oct 24, 2023 at 3:31 PM Patrick McFadin 
>> wrote:
>> >> Let me make that really easy. Hell yes
>> >>
>> >> Not everybody runs CCM, I've tried but I've met resistance.
>> >>
>> >> Compiling your own version usually involves me saying the words "Yes,
>> ant realclean exists. I'm not trolling you"
>> >>
>> >> docker pull  works on every OS and curates a single node
>> experience.
>> >>
>> >>
>> >>
>> >> On Tue, Oct 24, 2023 at 12:37 PM Josh McKenzie 
>> wrote:
>> >>> In order for the project to advertise the release outside the dev@
>> list it needs to be a formal release.
>> >>>
>> >>> That's my reading as well:
>> >>> https://www.apache.org/legal/release-policy.html#release-definition
>> >>>
>> >>> I wonder if there'd be value in us having a cronned job that'd do
>> nightly docker container builds on trunk + feature branches, archived for N
>> days, and we make that generally known to the dev@ list here so folks
>> that want to poke at the current state of trunk or other branches could do
>> so with very low friction. We'd probably see more engagement on feature
>> branches if it was turn-key easy for other C* devs to spin them up and check
>> them out.
>> >>>
>> >>> For what you're talking about here Patrick (a docker image for folks
>> outside the dev@ audience and more user-facing), we'd want to vote on it
>> and go through the formal process.
>> >>>
>> >>> On Tue, Oct 24, 2023, at 3:10 PM, Jeremiah Jordan wrote:
>> >>>
>> >>> In order for the project to advertise the release outside the dev@
>> list it needs to be a formal release.  That just means that there was a
>> release vote and at least 3 PMC members +1’ed it, and there are more +1
>> than there are -1, and we follow all the normal release rules.  The ASF
>> release process doesn’t care what branch you cut the artifacts from or what
>> version you call it.
>> >>>
>> >>> So the project can cut artifacts for and release a 5.1-alpha1,
>> 5.1-dev-preview1, what ever we want to version this thing, from trunk or
>> any other branch name we want.
>> >>>
>> >>> -Jeremiah
>> >>>
>> >>> On Oct 24, 2023 at 2:03:41 PM, Patrick McFadin 
>> wrote:
>> >>>
>> >

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-31 Thread Jeremiah Jordan
 work is ready for broader input and
>>>> review. In this case, more than six months ago.
>>>
>>>
>>> It is unfortunately more complicated than that, because six months ago
>>> Ekaterina and I were working on supporting Java 17 and dropping Java 8,
>>> which several other ongoing efforts depended on. We both missed the
>>> announcement that TCM was ready for review and anyway would not have been
>>> available at that time. Maxim asked me more than 6 months ago for a review
>>> of CASSANDRA-15254 <https://issues.apache.org/jira/browse/CASSANDRA-15254>
>>> and I have not been able to help him so far. We all
>>> have a limited bandwidth and can miss some announcements.
>>>
>>> The project has grown and a lot of things are going on in parallel.
>>> There are also more interdependencies between the different projects. In my
>>> opinion what we are lacking is a global overview of the different things
>>> going on in the project and some rough ideas of the status of the different
>>> significant pieces. It would allow us to better organize ourselves.
>>>
>>> Le jeu. 26 oct. 2023 à 00:26, Benedict  a écrit :
>>>
>>>> I have spoken privately with Ekaterina, and to clear up some possible
>>>> ambiguity: I realise nobody has demanded a delay to this work to conduct
>>>> additional reviews; a couple of folk have however said they would prefer
>>>> one.
>>>>
>>>>
>>>> My point is that, as a community, we need to work on ensuring folk that
>>>> care about a CEP participate at an appropriate time. If they aren’t able
>>>> to, the consequences of that are for them to bear.
>>>>
>>>>
>>>> We should be working to avoid surprises as CEP start to land. To this
>>>> end, I think we should work on some additional paragraphs for the
>>>> governance doc covering expectations around the landing of CEPs.
>>>>
>>>> On 25 Oct 2023, at 21:55, Benedict  wrote:
>>>>
>>>> 
>>>>
>>>> I am surprised this needs to be said, but - especially for long-running
>>>> CEPs - you must involve yourself early, and certainly within some
>>>> reasonable time of being notified the work is ready for broader input and
>>>> review. In this case, more than six months ago.
>>>>
>>>>
>>>> This isn’t the first time this has happened, and it is disappointing to
>>>> see it again. Clearly we need to make this explicit in the guidance docs.
>>>>
>>>>
>>>> Regarding the release of 5.1, I understood the proposal to be that we
>>>> cut an actual alpha, thereby sealing the 5.1 release from new features.
>>>> Only features merged before we cut the alpha would be permitted, and the
>>>> alpha should be cut as soon as practicable. What exactly would we be
>>>> waiting for?
>>>>
>>>>
>>>> If we don’t have a clear and near-term trigger for branching 5.1 for
>>>> its own release, shortly after Accord and TCM merge, then I am in favour of
>>>> instead delaying 5.0.
>>>>
>>>> On 25 Oct 2023, at 19:40, Mick Semb Wever  wrote:
>>>>
>>>> 
>>>> I'm open to the suggestions of not branching cassandra-5.1 and/or
>>>> naming a preview release something other than 5.1-alpha1.
>>>>
>>>> But… the codebases and release process (and upgrade tests) do not
>>>> currently support releases with qualifiers that are not alpha, beta, or
>>>> rc.  We can add a preview qualifier to this list, but I would not want to
>>>> block getting a preview release out only because this wasn't yet in place.
>>>>
>>>> Hence the proposal used 5.1-alpha1 simply because that's what we know
>>>> we can do today.  An alpha release also means (typically) the branch.
>>>>
>>>> Is anyone up for looking into adding a "preview" qualifier to our
>>>> release process?
>>>> This may also solve our previous discussions and desire to have
>>>> quarterly releases that folk can use through the trunk dev cycle.
>>>>
>>>> Personally, with my understanding of timelines in front of us to fully
>>>> review and stabilise tcm, it makes sense to branch it as soon as it's
>>>> merged.  It's easiest to stabilise it on a branch, and there's definitely
>>>> the de

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-10-31 Thread Jeremiah Jordan
 You are free to argue validity.  I am just stating what I see on the
mailing list and in the wiki.  We had a vote which was called passing and
was not contested at that time.  The vote was on a process which includes
as #3 in the list:


   3. Before a merge, a committer needs either a non-regressing (i.e. no
      new failures) run of circleci with the required test suites (TBD; see
      below) or of ci-cassandra.
      a. Non-regressing is defined here as "Doesn't introduce any new test
         failures; any new failures in CI are clearly not attributable to
         this diff"
      b. (NEW) After merging tickets, ci-cassandra runs against the SHA and
         the author gets an advisory update on the related JIRA for any new
         errors on CI. The author of the ticket will take point on triaging
         this new failure and either fixing (if clearly reproducible or
         related to their work) or opening a JIRA for the intermittent
         failure and linking it in butler
         (https://butler.cassandra.apache.org/#/)


Which clearly says that before merge we ensure there are no known new
regressions to CI.

The allowance for releases without CI being green, and merges without the
CI being completely green are from the fact that our trunk CI has rarely
been completely green, so we allow merging things which do not introduce
NEW regressions, and we allow releases with known regressions that are
deemed acceptable.

We can indeed always vote to override it, and if it comes to that we can
consider that as an option.

-Jeremiah

On Oct 31, 2023 at 11:41:29 AM, Benedict  wrote:

> That vote thread also did not reach the threshold; it was incorrectly
> counted, as committer votes are not binding for procedural changes. I
> counted at most 8 PMC +1 votes.
>
> The focus of that thread was also clearly GA releases and merges on such
> branches, since there was a focus on releases being failure-free. But this
> predates the more general release lifecycle vote that allows for alphas to
> have failing tests - which logically would be impossible if nothing were
> merged with failing or flaky tests.
>
> Either way, the vote and discussion specifically allow for this to be
> overridden.
>
> 🤷‍♀️
>
> On 31 Oct 2023, at 16:29, Jeremiah Jordan 
> wrote:
>
> 
> I never said there was a need for green CI for alpha.  We do have a
> requirement for not merging things to trunk that have known regressions in
> CI.
> Vote here:
> https://lists.apache.org/thread/j34mrgcy9wrtn04nwwymgm6893h0xwo9
>
>
>
> On Oct 31, 2023 at 3:23:48 AM, Benedict  wrote:
>
>> There is no requirement for green CI on alpha. We voted last year to
>> require running all tests before commit and to require green CI for beta
>> releases. This vote was invalid because it didn’t reach the vote floor for
>> a procedural change but anyway is not inconsistent with knowingly and
>> selectively merging work without green CI.
>>
>> If we reach the summit we should take a look at the state of the PRs and
>> make a decision about if they are alpha quality; if so, and we want a
>> release, we should simply merge it and release. Making up a new release
>> type when the work meets alpha standard to avoid an arbitrary and not
>> mandated commit bar seems the definition of silly.
>>
>> On 31 Oct 2023, at 04:34, J. D. Jordan  wrote:
>>
>> 
>> That is my understanding as well. If the TCM and Accord based on TCM
>> branches are ready to commit by ~12/1 we can cut a 5.1 branch and then a
>> 5.1-alpha release.
>> Where “ready to commit” means our usual things of two committer +1 and
>> green CI etc.
>>
>> If we are not ready to commit then I propose that as long as everything
>> in the accord+tcm Apache repo branch has had two committer +1’s, but maybe
>> people are still working on fixes for getting CI green or similar, we cut a
>> 5.1-preview  build from the feature branch to vote on with known issues
>> documented.  This would not be the preferred path, but would be a way to
>> have a voted on release for summit.
>>
>> -Jeremiah
>>
>> On Oct 30, 2023, at 5:59 PM, Mick Semb Wever  wrote:
>>
>> 
>>
>> Hoping we can get clarity on this.
>>
>> The proposal was, once TCM and Accord merges to trunk,  then immediately
>> branch cassandra-5.1 and cut an immediate 5.1-alpha1 release.
>>
>> This was to focus on stabilising TCM and Accord as soon as it lands,
>> hence the immediate branching.
>>
>> And the alpha release as that is what our Release Lifecycle states it to
>> be.
>> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>>
>> My understanding is that there was no squeezing in extra features into
&

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-11-02 Thread Jeremiah Jordan
>
> My reading of ASF policy is that directing users to CEP preview releases
> that are not formally voted upon is not acceptable. The policy you quote
> indicates they should be intended only for active participants on dev@,
> whereas our explicit intention is to enable them to be advertised to users
> at the summit.
>

Yes this is my read as well.  If we want to announce anything at summit and
have a wider audience getting involved in using it, then we need to make a
“formal” preview release available, aka one that has been voted on and
approved by the PMC.  There is nothing stopping us from building and voting
on a release that comes from a feature branch, it just needs to go through
the normal artifact build and verification that any release goes through.

So as Mick asked earlier in the thread:
>
> Is anyone up for looking into adding a "preview" qualifier to our release
> process?
>
>
> I'm in favor of this. If we cut preview snapshots from trunk and all
> feature branches periodically (nightly? weekly?), preferably as docker
> images, this satisfies the desire to get these features into the hands of
> the dev and user community to test them out and provide feedback to the dev
> process while also allowing us to keep a high bar for merge to trunk.
>
>
I am also in favor of this, and if we can make it work there is nothing
stopping us from having someone use the capability to make a preview
release that can go to a formal vote.

My view is that we wait and see what the CI looks like at that time.
>

I also agree we should see what CI looks like at the time and make the
“go”/"no go" on merge decision based on the state then.  But I think we
need to have a plan for what happens if we end up with “no go”.  We don’t
want to be scrambling at the last minute in that case.  Again “no go” is
not my preferred outcome, but I want to make sure we have plans in place
should it occur.

-Jeremiah

On Nov 2, 2023 at 9:16:54 AM, Benedict  wrote:

> My view is that we wait and see what the CI looks like at that time.
>
>
> My reading of ASF policy is that directing users to CEP preview releases
> that are not formally voted upon is not acceptable. The policy you quote
> indicates they should be intended only for active participants on dev@,
> whereas our explicit intention is to enable them to be advertised to users
> at the summit.
>
>
>
>
> On 2 Nov 2023, at 13:27, Josh McKenzie  wrote:
>
> 
>
> I’m not sure we need any additional mechanisms beyond DISCUSS threads,
> polls and lazy consensus?
> ...
> This likely means at least another DISCUSS thread and lazy consensus if
> you want to knowingly go against it, or want to modify or clarify what’s
> meant.
> ...
> It can be chucked out or rewoven at zero cost, but if the norms have taken
> hold and are broadly understood in the same way, it won’t change much or at
> all, because the actual glue is the norm, not the words, which only serve
> to broadcast some formulation of the norm.
>
> 100% agree on all counts. Hopefully this discussion is useful for other
> folks as well.
>
> So - with the clarification that our agreement on green CI represents a
> polled majority consensus of the folks participating on the discussion at
> the time but not some kind of hard unbendable obligation, is this something
> we want to consider relaxing for TCM and Accord?
>
> This thread ran long (and got detoured - mea culpa) - the tradeoffs seem
> like:
>
>1. We merge them without green CI and cut a cassandra-5.1 branch so we
>can release an alpha-1 snapshot from that branch. This likely leaves
>cassandra-5.1 and trunk in an unstable place w/regards to CI. TCM/Accord
>devs can be expected to be pulled into fixing core issues / finalizing the
>features and the burden for test stabilization "leaking out" across others
>in the community who don't have context on their breakage (see:
>CASSANDRA-8099, cassandra-4.0 release, cassandra-4.1 release, now push for
>cassandra-5.0 QA stabilization).
>2. Push for green CI on Accord / TCM before merge and alpha
>availability, almost certainly delaying their availability to the 
> community.
>3. Cut a preview / snapshot release from the accord feature branch,
>made available to the dev community. We could automate creation / update of
>docker images with snapshot releases of all HEAD for trunk and feature
>branches.
>4. Some other approach I'm not thinking of / missed
>
> So as Mick asked earlier in the thread:
>
> Is anyone up for looking into adding a "preview" qualifier to our release
> process?
>
>
> I'm in favor of this. If we cut preview snapshots from trunk and all
> feature branches periodically (nightly? weekly?), preferably as docker
> images, this satisfies the desire to get these features into the hands of
> the dev and user community to test them out and provide feedback to the dev
> process while also allowing us to keep a high bar for merge to trunk.
>
> Referencing the ASF Relea

Re: Push TCM (CEP-21) and Accord (CEP-15) to 5.1 (and cut an immediate 5.1-alpha1)

2023-11-02 Thread Jeremiah Jordan
st wasted cycles.
>
>
>
>
> On 1 Nov 2023, at 15:31, Josh McKenzie  wrote:
>
> 
>
> That vote thread also did not reach the threshold; it was incorrectly
> counted, as committer votes are not binding for procedural changes. I
> counted at most 8 PMC +1 votes.
>
> This piqued my curiosity.
>
> Link to how we vote:
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance
> *STATUS: Ratified 2020/06/25*
> Relevant bits here:
>
> On dev@:
>
>1. Discussion / binding votes on releases (Consensus: min 3 PMC +1, no
>-1)
>2. Discussion / binding votes on project structure and governance
>changes (adopting subprojects, how we vote and govern, etc). (super
>majority)
>
>
> The thread where we voted on the CI bar Jeremiah referenced:
> https://lists.apache.org/thread/2shht9rb0l8fh2gfqx6sz9pxobo6sr60
> Particularly relevant bit:
>
> Committer / pmc votes binding. Simple majority passes.
>
> I think you're arguing that voting to change our bar for merging when it
> comes to CI falls under "votes on project structure"? I think when I called
> that vote I was conceptualizing it as a technical discussion about a shared
> norm on how we as committers deal with code contributions, where the
> "committer votes are binding, simple majority" applies.
>
> I can see credible arguments in either direction, though I'd have expected
> those concerns or counter-arguments to have come up back in Jan of 2022
> when we voted on the CI changes, not almost 2 years later after us
> operating under this new shared norm. The sentiments expressed on the
> discuss and vote thread were consistently positive and uncontentious; this
> feels to me like it falls squarely under the spirit of lazy consensus only
> at a much larger buy-in level than usual:
> https://community.apache.org/committers/decisionMaking.html#lazy-consensus
>
> We've had plenty of time to call this vote and merge bar into question
> (i.e. every ticket we merge we're facing the "no regressions" bar), and the
> only reason I'd see us treating TCM or Accord differently would be because
> they're much larger bodies of work at merge so it's going to be a bigger
> lift to get to non-regression CI, and/or we would want a release cut from a
> formal branch rather than a feature branch for preview.
>
> An alternative approach to keep this merge and CI burden lower would have
> been more incremental work merged into trunk periodically, an argument many
> folks in the community have made in the past. I personally have mixed
> feelings about it; there's pros and cons to both approaches.
>
> All that said, I'm in favor of us continuing with this as a valid and
> ratified vote (technical norms == committer binding + simple majority). If
> we want to open a formal discussion about instead considering that a
> procedural change and rolling things back based on those grounds I'm fine
> with that, but we'll need to discuss that and think about the broader
> implications since things like changing import ordering, tooling, or other
> ecosystem-wide impacting changes (CI systems we all share, etc) would
> similarly potentially run afoul of needing supermajority pmc participation
> of we categorize that type of work as "project structure" as per the
> governance rules.
>
> On Tue, Oct 31, 2023, at 1:25 PM, Jeremy Hanna wrote:
>
> I think the goal is to say "how could we get some working version of
> TCM/Accord into people's hands to try out at/by Summit?"  That's all.
> People are eager to see it and try it out.
>
> On Oct 31, 2023, at 12:16 PM, Benedict  wrote:
>
>
> No, if I understand it correctly we’re in weird hypothetical land where
> people are inventing new release types (“preview”) to avoid merging TCM[1]
> in the event we want to cut a 5.1 release from the PR prior to the summit
> if there’s some handful of failing tests in the PR.
>
> This may or may not be a waste of everyone’s time.
>
> Jeremiah, I’m not questioning: it was procedurally invalid. How we handle
> that is, as always, a matter for community decision making.
>
> [1] how this helps isn’t entirely clear
>
>
> On 31 Oct 2023, at 17:08, Paulo Motta  wrote:
>
> 
> Even if it was not formally prescribed as far as I understand, we have
> been following the "only merge on Green CI" custom as much as possible for
> the past several years. Is the proposal to relax this rule for 5.0?
>
> On Tue, Oct 31, 2023 at 1:02 PM Jeremiah Jordan 
> wrote:
>
> You are free to argue validity.  I am just stating what I see on the
> mailing list and in the wiki.  We 

Re: [DISCUSS] CASSANDRA-18940 SAI post-filtering reads don't update local table latency metrics

2023-12-01 Thread Jeremiah Jordan
 Again I am coming at this from the operator/end user perspective.
Creating a metrics dashboard, and then I am looking at those metrics to
understand what my queries are doing.  We have coordinator query level
metrics, and then we have lower level table metrics on the replicas.  I
want to be able to draw a line from this set of coordinator query metrics,
to that set of table metrics, and be able to understand how they are
affecting each other for a given query.

The best would be for SAI / Indexes to have their very own sets of all the
metrics to understand how many rows are read by a given SAI query, and how
that turns into the over all time for the query, and how long those
individual reads were taking, etc.

But at the very least I want all of that separate from the metrics for my
regular point reads.

And yes putting the individual point read metrics into the range metrics
would be strange.  But rolling up the time to get all the rows and rolling
that into the Range metrics could possibly make sense.  Still strange.  So
again SAI specific metrics seem the best to me, rather than shoe horning
them into the existing metrics.
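
The separate-count accounting debated in this thread (option 1 from Mike
Adamson's message quoted further down: the global count includes SAI reads,
plus an SAI-only count, so non-SAI reads = global count - SAI count) can be
sketched with plain JDK counters. This is a hypothetical illustration, not
Cassandra's actual TableMetrics code; the class and method names here are
invented for the example.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of "option 1" accounting: every local read bumps a
// global counter, SAI post-filtering reads additionally bump an SAI-only
// counter, and dashboards derive non-SAI reads by subtraction.
class TableReadCounts {
    final LongAdder allReads = new LongAdder(); // every local read, SAI or not
    final LongAdder saiReads = new LongAdder(); // SAI post-filtering reads only

    void recordRead(boolean saiPostFilteringRead) {
        allReads.increment();
        if (saiPostFilteringRead)
            saiReads.increment();
    }

    // "non-SAI reads = global count - SAI count"
    long nonSaiReads() {
        return allReads.sum() - saiReads.sum();
    }
}

public class SaiMetricsDemo {
    public static void main(String[] args) {
        TableReadCounts counts = new TableReadCounts();
        counts.recordRead(false); // regular point read
        counts.recordRead(true);  // SAI post-filtering read
        counts.recordRead(true);  // another SAI post-filtering read
        // global = 3, SAI = 2, so non-SAI point reads = 1
        System.out.println(counts.allReads.sum() + " "
                + counts.saiReads.sum() + " " + counts.nonSaiReads());
    }
}
```

The subtraction in nonSaiReads() is the "global count - SAI count" dashboard
computation option 1 implies; option 2 would instead keep the two counts
disjoint and require a rollup whenever a total is wanted.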

-Jeremiah

On Dec 1, 2023 at 1:04:47 PM, Caleb Rackliffe 
wrote:

> Right. SAI queries are distributed range queries that produce local
> single-partition reads. They should absolutely not be recorded in the local
> range read latency metric. I'm fine ultimately with a new metric or the
> existing local single-partition read metric.
>
> On Fri, Dec 1, 2023 at 1:02 PM J. D. Jordan 
> wrote:
>
>> At the coordinator level SAI queries fall under Range metrics. I would
>> either put them under the same at the lower level or in a new SAI metric.
>>
>> It would be confusing to have the top level coordinator query metrics in
>> Range and the lower level in Read.
>>
>> On Dec 1, 2023, at 12:50 PM, Caleb Rackliffe 
>> wrote:
>>
>> 
>> So the plan would be to have local "Read" and "Range" remain unchanged in
>> TableMetrics, but have a third "SAIRead" (?) just for SAI post-filtering
>> read SinglePartitionReadCommands? I won't complain too much if that's what
>> we settle on, but it just depends on how much this is a metric for
>> ReadCommand subclasses operating at the node-local level versus something
>> we think we should link conceptually to a user query. SAI queries will
>> produce a SinglePartitionReadCommand per matching primary key, so that
>> definitely won't work for the latter.
>>
>> @Mike On a related note, we now have "PartitionReads" and "RowsFiltered"
>> in TableQueryMetrics. Should the former just be removed, given a.) it
>> actually is rows now not partitions and b.) "RowsFiltered" seems like it'll
>> be almost the same thing now? (I guess if we ever try batching row reads
>> per partition, it would come in handy again...)
>>
>> On Fri, Dec 1, 2023 at 12:30 PM J. D. Jordan 
>> wrote:
>>
>>> I prefer option 2. It is much easier to understand and roll up two
>>> metrics than to do subtractive dashboards.
>>>
>>> SAI reads are already “range reads” for the client level metrics, not
>>> regular reads. So grouping them into the regular read metrics at the lower
>>> level seems confusing to me in that sense as well.
>>>
>>> As an operator I want to know how my SAI reads and normal reads are
>>> performing latency wise separately.
>>>
>>> -Jeremiah
>>>
>>> On Dec 1, 2023, at 11:15 AM, Caleb Rackliffe 
>>> wrote:
>>>
>>> 
>>> Option 1 would be my preference. Seems both useful to have a single
>>> metric for read load against the table and a way to break out SAI reads
>>> specifically.
>>>
>>> On Fri, Dec 1, 2023 at 11:00 AM Mike Adamson 
>>> wrote:
>>>
 Hi,

 We are looking at adding SAI post-filtering reads to the local table
 metrics and would like some feedback on the best approach.

 We don't think that SAI reads are that special so they can be included
 in the table latencies, but how do we handle the global counts and the SAI
 counts? Do we need to maintain a separate count of SAI reads? We feel the
 answer to this is yes so how do we do the counting? There are two options
 (others welcome):

 1. All reads go into the current global count and we have a separate
 count for SAI specific reads. So non-SAI reads = global count - SAI count
 2. We exclude the SAI reads from the current global count, so
 total reads = global count + SAI count

 Our preference is for option 1 above. Does anyone have any strong views
 / opinions on this?



 --
 Mike Adamson
 Engineering, DataStax

Re: Welcome Mike Adamson as Cassandra committer

2023-12-08 Thread Jeremiah Jordan
 Congrats Mike!  Thanks for all your work on SAI and Vector index.  Well
deserved!

On Dec 8, 2023 at 8:52:07 AM, Brandon Williams  wrote:

> Congratulations Mike!
>
> Kind Regards,
> Brandon
>
> On Fri, Dec 8, 2023 at 8:41 AM Benjamin Lerer  wrote:
>
>
> The PMC members are pleased to announce that Mike Adamson has accepted
>
> the invitation to become committer.
>
>
> Thanks a lot, Mike, for everything you have done for the project.
>
>
> Congratulations and welcome
>
>
> The Apache Cassandra PMC members
>
>


Re: Future direction for the row cache and OHC implementation

2023-12-15 Thread Jeremiah Jordan
>
> from a maintenance and
> integration testing perspective I think it would be better to keep the
> ohc in-tree, so we will be aware of any issues immediately after the
> full CI run.


From the original email, bringing OHC in-tree is not an option because the
current maintainer is not interested in donating it to the ASF.  Hence
option 1: some set of people fork it to their own GitHub org and maintain a
version outside of the ASF C* project.
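
As background for the Caffeine discussion quoted below, here is a minimal
on-heap sketch of what a capacity-bounded row cache does: keyed lookup with
least-recently-used eviction. This is purely illustrative and uses only the
JDK; it is not the OHC implementation, and a real replacement would likely
build on Caffeine's weight-based eviction and statistics instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical on-heap LRU cache: a size-bounded map from partition key to
// cached row data. LinkedHashMap in access-order mode gives LRU behavior;
// removeEldestEntry evicts the least-recently-used entry past capacity.
class LruRowCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruRowCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true -> LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict least-recently-used entry
    }
}

public class RowCacheDemo {
    public static void main(String[] args) {
        LruRowCache<String, String> cache = new LruRowCache<>(2);
        cache.put("k1", "row1");
        cache.put("k2", "row2");
        cache.get("k1");         // touch k1, so k2 becomes eldest
        cache.put("k3", "row3"); // exceeds capacity: evicts k2
        System.out.println(cache.keySet()); // prints [k1, k3]
    }
}
```

A production cache would also need per-row weights (rows vary wildly in
size), hit/miss metrics, and concurrency, which is exactly what Caffeine
provides out of the box.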

-Jeremiah

On Dec 15, 2023 at 5:57:31 AM, Maxim Muzafarov  wrote:

> Ariel,
> thank you for bringing this topic to the ML.
>
> I may be missing something, so correct me if I'm wrong somewhere in
> the management of the Cassandra ecosystem.  As I see it, the problem
> right now is that if we fork the ohc and put it under its own root,
> the use of that row cache is still not well tested (the same as it is
> now). I am particularly emphasising the dependency management side, as
> any version change/upgrade in Cassandra and, as a result of that
> change a new set of libraries in the classpath should be tested
> against this integration.
>
> So, unless it is being widely used by someone else outside of the
> community (which it doesn't seem to be), from a maintenance and
> integration testing perspective I think it would be better to keep the
> ohc in-tree, so we will be aware of any issues immediately after the
> full CI run.
>
> I'm also +1 for not deprecating it, even if it is used in narrow
> cases, while the cost of maintaining its source code remains quite low
> and it brings some benefits.
>
> On Fri, 15 Dec 2023 at 05:39, Ariel Weisberg  wrote:
>
>
> Hi,
>
>
> To add some additional context.
>
>
> The row cache is disabled by default and it is already pluggable, but
> there isn’t a Caffeine implementation present. I think one used to exist
> and could be resurrected.
>
>
> I personally also think that people should be able to scratch their own
> itch row cache wise so removing it entirely just because it isn’t commonly
> used isn’t the right move unless the feature is very far out of scope for
> Cassandra.
>
>
> Auto enabling/disabling the cache is a can of worms that could result in
> performance and reliability inconsistency as the DB enables/disables the
> cache based on heuristics when you don’t want it to. It being off by
> default seems good enough to me.
>
>
> RE forking, we could create a GitHub org for OHC and then add people to
> it. There are some examples of dependencies that haven’t been contributed
> to the project that live outside like CCM and JAMM.
>
>
> Ariel
>
>
> On Thu, Dec 14, 2023, at 5:07 PM, Dinesh Joshi wrote:
>
>
> I would avoid taking away a feature even if it works in narrow set of
> use-cases. I would instead suggest -
>
>
> 1. Leave it disabled by default.
>
> 2. Detect when Row Cache has a low hit rate and warn the operator to turn
> it off. Cassandra should ideally detect this and do it automatically.
>
> 3. Move to Caffeine instead of OHC.
>
>
> I would suggest having this as the middle ground.
>
>
> On Dec 14, 2023, at 4:41 PM, Mick Semb Wever  wrote:
>
>
>
>
>
> 3. Deprecate the row cache entirely in either 5.0 or 5.1 and remove it in
> a later release
>
>
>
>
>
> I'm for deprecating and removing it.
>
> It constantly trips users up and just causes pain.
>
>
> Yes it works in some very narrow situations, but those situations often
> change over time and again just bites the user.  Without the row-cache I
> believe users would quickly find other, more suitable and lasting,
> solutions.
>
>
>
>


Re: Welcome Maxim Muzafarov as Cassandra Committer

2024-01-08 Thread Jeremiah Jordan
 Congrats Maxim!  Thanks for all of your contributions!

On Jan 8, 2024 at 12:19:04 PM, Josh McKenzie  wrote:

> The Apache Cassandra PMC is pleased to announce that Maxim Muzafarov has
> accepted
> the invitation to become a committer.
>
> Thanks for all the hard work and collaboration on the project thus far,
> and we're all looking forward to working more with you in the future.
> Congratulations and welcome!
>
> The Apache Cassandra PMC members
>
>
>


Re: [DISCUSS] Add subscription mangement instructions to user@, dev@ message footers

2024-01-22 Thread Jeremiah Jordan
Here was the thread where it was removed: lists.apache.org

On Jan 22, 2024, at 12:37 PM, J. D. Jordan  wrote:

I think we used to have this and removed them because it was breaking the
encryption signature on messages or something, which meant they were very
likely to be treated as spam? Not saying we can’t put it back on, but it
was removed for good reasons from what I recall.

On Jan 22, 2024, at 12:19 PM, Brandon Williams  wrote:

+1

Kind Regards,
Brandon

On Mon, Jan 22, 2024 at 12:10 PM C. Scott Andreas  wrote:

Hi all,

I'd like to propose appending the following two footers to messages sent to
the user@ and dev@ lists. The proposed postscript including line breaks is
between the "X" blocks below.

User List Footer:
X
---
Unsubscribe: Send a blank email to user-unsubscr...@cassandra.apache.org.
Do not reply to this message.
Cassandra Community: Follow other mailing lists or join us in Slack:
https://cassandra.apache.org/_/community.html
X

Dev List Footer:
X
---
Unsubscribe: Send a blank email to dev-unsubscr...@cassandra.apache.org.
Do not reply to this message.
Cassandra Community: Follow other mailing lists or join us in Slack:
https://cassandra.apache.org/_/community.html
X

Offering this proposal for three reasons:

– Many users are sending "Unsubscribe" messages to the full mailing list,
which prompts others to wish to unsubscribe – a negative cascade that
affects the size of our user community.
– Many users don't know where to go to figure out how to unsubscribe,
especially if they'd joined many years ago.
– Nearly all mailing lists provide a one-click mechanism for unsubscribing
or built-in mail client integration to do so via message headers. Including
compact instructions on how to leave is valuable to subscribers.

#asfinfra indicates that such footers can be appended given project
consensus and an INFRA- ticket:
https://the-asf.slack.com/archives/CBX4TSBQ8/p1705939868631079

If we reach consensus on adding a message footer, I'll file an INFRA ticket
with a link to this thread.

Thanks,

– Scott

Re: Welcome Brad Schoening as Cassandra Committer

2024-02-21 Thread Jeremiah Jordan
Congrats!

On Feb 21, 2024 at 2:46:14 PM, Josh McKenzie  wrote:

> The Apache Cassandra PMC is pleased to announce that Brad Schoening has
> accepted
> the invitation to become a committer.
>
> Your work on the integrated python driver, launch script environment, and
> tests
> has been a big help to many. Congratulations and welcome!
>
> The Apache Cassandra PMC members
>


Re: Welcome Alexandre Dutra, Andrew Tolbert, Bret McGuire, Olivier Michallat as Cassandra Committers

2024-04-17 Thread Jeremiah Jordan
Congrats all!


On Apr 17, 2024 at 12:10:11 PM, Benjamin Lerer  wrote:

> The Apache Cassandra PMC is pleased to announce that Alexandre Dutra,
> Andrew Tolbert, Bret McGuire and Olivier Michallat have accepted the
> invitation to become committers on the java driver sub-project.
>
> Thanks for your contributions to the Java driver during all those years!
> Congratulations and welcome!
>
> The Apache Cassandra PMC members
>


Re: Cassandra PMC Chair Rotation, 2024 Edition

2024-06-20 Thread Jeremiah Jordan
 Welcome to the Chair role Dinesh!  Congrats!

On Jun 20, 2024 at 10:50:37 AM, Josh McKenzie  wrote:

> Another PMC Chair baton pass incoming! On behalf of the Apache Cassandra
> Project Management Committee (PMC) I would like to welcome and congratulate
> our next PMC Chair Dinesh Joshi (djoshi).
>
> Dinesh has been a member of the PMC for a few years now and many of you
> likely know him from his thoughtful, measured presence on many of our
> collective discussions as we've grown and evolved over the past few years.
>
> I appreciate the project trusting me as liaison with the board over the
> past year and look forward to supporting Dinesh in the role in the future.
>
> Repeating Mick (repeating Paulo's) words from last year: The chair is an
> administrative position that interfaces with the Apache Software Foundation
> Board, by submitting regular reports about project status and health. Read
> more about the PMC chair role on Apache projects:
> - https://www.apache.org/foundation/how-it-works.html#pmc
> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
>
> The PMC as a whole is the entity that oversees and leads the project and
> any PMC member can be approached as a representative of the committee. A
> list of Apache Cassandra PMC members can be found on:
> https://cassandra.apache.org/_/community.html
>


Re: Suggestions for CASSANDRA-18078

2024-06-20 Thread Jeremiah Jordan
 +1 from me for 1, just remove it now.
I think this case is different from CASSANDRA-19556/CASSANDRA-17425.  The
new guardrail from 19556, which would deprecate the one from 17425, has not
been committed yet.  In the case of MAXWRITETIME the replacement is already
in the code; we just didn’t remove MAXWRITETIME yet.

Jeremiah Jordan
e. jerem...@datastax.com
w. www.datastax.com



On Jun 20, 2024 at 11:46:08 AM, Štefan Miklošovič 
wrote:

> List,
>
> we need your opinions about CASSANDRA-18078.
>
> That ticket is about the removal of MAXWRITETIME function which was added
> in CASSANDRA-17425 and firstly introduced in 5.0-alpha1.
>
> This function was identified to be redundant in favor of CASSANDRA-8877
> and CASSANDRA-18060.
>
> The idea of the removal was welcomed and a patch doing so was prepared,
> but it was never delivered, and the question of what to do with it, in
> connection with 5.0.0, still remains.
>
> The options are:
>
> 1) since 18078 was never released in GA, there is still time to remove it.
> 2) it is too late for the removal hence we would keep it in 5.0.0 and we
> would deprecate it in 5.0.1 and remove it in trunk.
>
> It is worth saying that there is a precedent for 2), in CASSANDRA-17495,
> where it was the very same scenario. A guardrail was introduced in alpha1.
> We decided to release and deprecate in 5.0.1 and remove in trunk. The same
> might be applied here; however, we would like to have it confirmed whether
> this is indeed the case, or we prefer to just go with 1) and be done with it.
>
> Regards
>


Re: fixing paging state for 4.0

2019-09-24 Thread Jeremiah Jordan
Clients do negotiate the protocol version they use when connecting. If the 
server bumped the protocol version then this larger paging state could be part 
of the new protocol version. But that doesn’t solve the problem for existing 
versions.

The special treatment of Integer.MAX_VALUE can be done back to 3.x and fix the 
bug in all versions, letting user requests receive all of their data, which 
realistically is probably what someone who sets the protocol-level query limit 
to Integer.MAX_VALUE is trying to do.

-Jeremiah

> On Sep 24, 2019, at 4:09 PM, Blake Eggleston  
> wrote:
> 
> Right, mixed version clusters. The opaque blob isn't versioned, and there 
> isn't an opportunity for min version negotiation that you have with the 
> messaging service. The result is situations where a client begins a read on 
> one node, and attempts to read the next page from a different node over a 
> protocol version where the paging state serialization format has changed. 
> This causes an exception deserializing the paging state and the read fails.
> 
> There are ways around this, but they're not comprehensive (I think), and 
> they're much more involved than just interpreting Integer.MAX_VALUE as 
> unlimited. The "right" solution would be for the paging state to be 
> deserialized/serialized on the client side, but that won't happen in 4.0.
> 
>> On Sep 24, 2019, at 1:12 PM, Jon Haddad  wrote:
>> 
>> What's the pain point?  Is it because of mixed version clusters or is there
>> something else that makes it a problem?
>> 
>>> On Tue, Sep 24, 2019 at 11:03 AM Blake Eggleston
>>>  wrote:
>>> 
>>> Changing paging state format is kind of a pain since the driver treats it
>>> as an opaque blob. I'd prefer we went with Sylvain's suggestion to just
>>> interpret Integer.MAX_VALUE as "no limit", which would be a lot simpler to
>>> implement.
>>> 
 On Sep 24, 2019, at 10:44 AM, Jon Haddad  wrote:
 
 I'm working with a team who just ran into CASSANDRA-14683 [1], which I
 didn't realize was an issue till now.
 
 Anyone have an interest in fixing full table pagination?  I'm not sure of
 the full implications of changing the int to a long in the paging stage.
 
 https://issues.apache.org/jira/browse/CASSANDRA-14683
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>> 
>>> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 
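For reference, the special case discussed in this thread can be sketched as follows. This is a hypothetical illustration of the "Integer.MAX_VALUE means no limit" idea, not the actual Cassandra limit-tracking code; the class and method names are invented.

```java
// Hypothetical sketch: treat a client-requested limit of Integer.MAX_VALUE
// as "no limit", so a full-table page-through never exhausts a 32-bit
// counter (the bug described in CASSANDRA-14683).
final class PageLimitSketch {
    static final int NO_LIMIT = Integer.MAX_VALUE;

    /** Remaining row budget after 'fetched' rows of a 'requested' limit. */
    static int remaining(int requested, long fetched) {
        if (requested == NO_LIMIT)
            return NO_LIMIT;                     // unlimited: never decremented
        return (int) Math.max(0L, requested - fetched);
    }

    public static void main(String[] args) {
        // An "unlimited" scan stays unlimited even past 2^31 rows:
        check(remaining(NO_LIMIT, 5_000_000_000L) == NO_LIMIT);
        // A normal limit is decremented as usual, never going negative:
        check(remaining(100, 40) == 60);
        check(remaining(100, 150) == 0);
        System.out.println("ok");
    }

    private static void check(boolean cond) {
        if (!cond) throw new AssertionError();
    }
}
```

The attraction of this approach, per the thread, is that it needs no change to the opaque paging-state blob or the native protocol version.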


Re: [DISCUSS] Switch to using GitHub pull requests?

2020-01-23 Thread Jeremiah Jordan
Can’t you currently open a PR with the right commit message, do the review 
there with all comments posted back to JIRA, run CI on it, and then merge it, 
closing the PR?  This is the basic workflow you are proposing, yes?

It is the reviewer's and author's job to make sure CI ran and didn’t introduce 
new failing tests; it doesn’t matter how they were run. It is just as easy to 
let something through when “PR triggered” tests have a failure as it is with 
tests manually linked from a JIRA comment, if the author and reviewer think the 
failures are not new.

If someone want to setup some extra niceties, like auto triggered builds or 
something, to happen if people use the PR workflow, then I see no problem 
there. But I don’t think we need to force use of PRs.

This is why I don’t think we need to “switch” to using PR’s. There is no need 
to switch. People can “also” use PRs. If someone who likes the PR workflow sets 
up some more nice stuff to happen when it is used, that would probably 
encourage more people to do things that way. But it doesn’t need to be forced.


> On Jan 22, 2020, at 9:53 PM, David Capwell  wrote:
> 
> Sorry Jeremiah, I don't understand your comment, would it be possible to
> elaborate more?
> 
> About the point on not forbidding as long as the review and testing needs
> are met, could you define what that means to you?
> 
> There are a few questions I ask myself
> 
> "Does the current process stop code which breaks the build from merging?"
> And
> "How long does it take for regressions to get noticed"
> 
> If I take myself as a example, I added a test which always failed in
> CircleCI (I assume Jenkins as well), this got merged, and the jira to fix
> it was around 3 months later.  I am personally trying to find ways to
> detect issues faster, but also see that test fail frequently (unit, jvm
> dtest, python dtest, etc.) so it's easy for this to slip through.
> 
> My mind set is that by switching to PRs (even if all the conversations are
> in JIRA) we can setup automation which helps detect issues before merging.
> 
>> On Wed, Jan 22, 2020, 7:00 PM J. D. Jordan 
>> wrote:
>> 
>> Doesn’t this github review workflow as described work right now?  It’s
>> just not the “only” way people do things?
>> 
>> I don’t think we need to forbid other methods of contribution as long as
>> the review and testing needs are met.
>> 
>> -Jeremiah
>> 
 On Jan 22, 2020, at 6:35 PM, Yifan Cai  wrote:
>>> 
>>> +1 nb to the PR approach for reviewing.
>>> 
>>> 
>>> And thanks David for initiating the discussion. I would like to put my 2
>>> cents in it.
>>> 
>>> 
>>> IMO, review comments are better associated with the changes, precisely to
>>> the line level, if they are put in the PR rather than in the JIRA comments.
>>> Discussions regarding each review comment are naturally organized into
>>> its own dedicated thread. I agree that JIRA comments are more suitable for
>>> high-level discussion regarding the design. But review comments in a PR can
>>> do a better job at code-level discussion.
>>> 
>>> 
>>> Another benefit is to relieve reviewers’ work. In the PR approach, we can
>>> leverage the PR build step to perform an initial qualification. The actual
>>> review can be deferred until the PR build passes. So reviewers are sure
>>> that the change is good at a certain level, i.e. it builds and the tests can
>>> pass. Right now, contributors volunteer to provide the link to a CI test
>>> (however, one still needs to open the link to see the result).
>>> 
 On Wed, Jan 22, 2020 at 3:16 PM David Capwell 
>> wrote:
 
 Thanks for the links Benedict!
 
 Been reading the links and see the following points being made
 
 *) enabling the Spark process would lower the barrier to entering the
>> project
 *) high level discussions should be in JIRA [1]
 *) not desirable to annotate JIRA and GitHub; should only annotate
>> JIRA
 (reviewer, labels, etc.)
 *) given the multi-branch nature, pull requests are not intuitive [2]
 *) merging is problematic and should keep the current merge process
 *) commits@ is not usable with PRs
 *) commits@ is better because of PRs
 *) people are more willing to nit-pick with PRs, less likely with
>> current
 process [3]
 *) opens potential to "prevent commits that don't pass the tests" [4]
 *) prefer the current process
 http://cassandra.apache.org/doc/latest/development/patches.html [5]
 *) current process is annoying since you have to take the link in github
 and attach to JIRA for each comment in review
 *) missed notifications, more trust in commits@
 *) if someone rewrites history, comments could be hard to see
 *) its better to leave comments in the source

Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-19 Thread Jeremiah Jordan
If you don’t know what you are doing you will have one rack, which will also be 
safe. If you are setting up racks then you most likely read something about 
doing that, and should also be fine.
This discussion has gone off the rails 100 times with what-ifs that are 
“letting perfect be the enemy of good”. The setting doesn’t need to be perfect. 
It just needs to be “good enough”.

> On Feb 19, 2020, at 1:44 AM, Mick Semb Wever  wrote:
> 
> Why do we have to assume random assignment?
> 
> 
> 
> Because token allocation only works once you have a node in RF racks. If
> you don't bootstrap nodes in alternating racks, or just never have RF racks
> setup (but more than one rack) it's going to be random.
> 
> Whatever default we choose should be a safe choice, not the best for
> experts. Making it safe (4 as the default would be great) shouldn't be
> difficult, and I thought Joey was building a  list of related issues?
> 
> Seeing these issues put together summarised would really help build the
> consensus IMHO.
> 
>> 




Re: Proposal: release 2.2 (based on current trunk) before 3.0 (based on 8099)

2015-05-11 Thread Jeremiah Jordan
Cassandra-jdbc can do cql3 as well as cql2. The rub (and why I would never 
recommend it) is that it does cql3 over thrift. So you lose out on all the 
native protocol features.



> On May 11, 2015, at 2:53 PM, Brian Hess  wrote:
> 
> One thing that does jump out at me, though, is about CQL2.  As much as we
> have advised against using cassandra-jdbc, I have encountered a few that
> actually have used that as an integration point.  I believe that
> cassandra-jdbc is CQL2-based, which is the main reason we have been
> advising folks against it.
> 
> Can we just confirm that there isn't in fact widespread use of CQL2-based
> cassandra-jdbc?  That just jumps out at me.
> 
> On Mon, May 11, 2015 at 2:59 PM, Aleksey Yeschenko 
> wrote:
> 
>>> So I think EOLing 2.0.x when 2.2 comes
>>> out is reasonable, especially considering that 2.2 is realistically a
>> month
>>> or two away even if we can get a beta out this week.
>> 
>> Given how long 2.0.x has been alive now, and the stability of 2.1.x at the
>> moment, I’d say it’s fair enough to EOL 2.0 as soon as 2.2 gets out. Can’t
>> argue here.
>> 
>>> If push comes to shove I'm okay being ambiguous here, but can we just
>> say
>>> "when 3.0 is released we EOL 2.1?"
>> 
>> Under our current projections, that’ll be exactly “a few months after 2.2
>> is released”, so I’m again fine with it.
>> 
>>> P.S. The area I'm most concerned about introducing destabilizing changes
>> in
>>> 2.2 is commitlog
>> 
>> So long as you don’t use compressed CL, you should be solid. You are
>> probably solid even if you do use compressed CL.
>> 
>> Here are my only concerns:
>> 
>> 1. New authz are not opt-in. If a user implements their own custom
>> authenticator or authorizer, they’d have to upgrade them sooner. The test
>> coverage for new authnz, however, is better than the coverage we used to
>> have before.
>> 
>> 2. CQL2 is gone from 2.2. Might force those who use it to migrate faster.
>> In practice, however, I highly doubt that anybody using CQL2 is also
>> someone who’d have already switched to 2.1.x or 2.2.x.
>> 
>> 
>> --
>> AY
>> 
>> On May 11, 2015 at 21:12:26, Jonathan Ellis (jbel...@gmail.com) wrote:
>> 
>> On Sun, May 10, 2015 at 2:42 PM, Aleksey Yeschenko 
>> wrote:
>> 
>>> 3.0, however, will require a stabilisation period, just by the nature of
>>> it. It might seem like 2.2 and 3.0 are closer to each other than 2.1 and
>>> 2.2 are, if you go purely by the feature list, but in fact the opposite
>> is
>>> true.
>> 
>> You are probably right. But let me push back on some of the extra work
>> you're proposing just a little:
>> 
>> 1) 2.0.x branch goes EOL when 3.0 is out, as planned
>> 
>> 3.0 was, however unrealistically, planned for April. And it's moving the
>> goalposts to say the plan was always to keep 2.0.x for three major
>> releases; the plan was to EOL with "the next major release after 2.1"
>> whether that was called 3.0 or not. So I think EOLing 2.0.x when 2.2 comes
>> out is reasonable, especially considering that 2.2 is realistically a month
>> or two away even if we can get a beta out this week.
>> 
>> 2) 3.0.x LTS branch stays, as planned, and helps us stabilise the new
>>> storage engine
>> 
>> Yes.
>> 
>> 
>>> 3) in a few months after 2.2 gets released, we EOL 2.1. Users upgrade to
>>> 2.2, get the same stability as with 2.1.7, plus a few new features
>> 
>> If push comes to shove I'm okay being ambiguous here, but can we just say
>> "when 3.0 is released we EOL 2.1?"
>> 
>> P.S. The area I'm most concerned about introducing destabilizing changes in
>> 2.2 is commitlog; I will follow up to make sure we have a solid QA plan
>> there.
>> 
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>> 


Re: COMPACT STORAGE in 4.0?

2016-04-11 Thread Jeremiah Jordan
As I understand it "COMPACT STORAGE" only has meaning in the CQL parser for 
backwards compatibility as of 3.0. The on disk storage is not affected by its 
usage.

> On Apr 11, 2016, at 3:33 PM, Benedict Elliott Smith  
> wrote:
> 
> Compact storage should really have been named "not wasteful storage" - now
> everything is "not wasteful storage" so it's void of meaning. This is true
> without constraint. You do not need to limit yourself to a single non-PK
> column; you can have many and it will remain as or more efficient than
> "compact storage"
> 
> On Mon, 11 Apr 2016 at 15:04, Jack Krupansky 
> wrote:
> 
>> My understanding is Thrift is being removed from Cassandra in 4.0, but will
>> COMPACT STORAGE be removed as well? Clearly the two are related, but
>> COMPACT STORAGE had a performance advantage in addition to Thrift
>> compatibility, so its status is ambiguous.
>> 
>> I recall vague chatter, but no explicit deprecation notice or 4.0 plan for
>> removal of COMPACT STORAGE. Actually, I don't even see a deprecation notice
>> for Thrift itself in CHANGES.txt.
>> 
>> Will a table with only a single non-PK column automatically be implemented
>> at a comparable level of efficiency compared to the old/current Compact
>> STORAGE? That will still leave the question of how to migrate a non-Thrift
>> COMPACT STORAGE table (i.e., used for performance by a CQL-oriented
>> developer rather than Thrift compatibility per se) to pure CQL.
>> 
>> -- Jack Krupansky
>> 


Re: Proposal - 3.5.1

2016-10-20 Thread Jeremiah Jordan
In the original tick tock plan we would not have kept 4.0.x around.  So I am 
proposing a change for that: we label the 3.x and 4.x releases as "development 
releases" or some other thing, and have "yearly" LTS releases with .0.x.
Those are similar to the previous 1.2/2.0/2.1/2.2, and we are adding semi-stable 
development releases as well, which give people an easier way to try out new 
stuff than "build it yourself", which was the only way to do that in between 
the previous Big Bang releases.



> On Oct 20, 2016, at 3:59 PM, Jeff Jirsa  wrote:
> 
> 
> 
>> On 2016-10-20 13:26 (-0700), "J. D. Jordan"  
>> wrote: 
>> If you think of the tick tock releases as interim development releases I 
>> actually think they have been working pretty well. What if we continue with 
>> the same process and do 4.0.x as LTS like we have 3.0.x LTS.
>> 
>> So you get 4.x releases that are trickling out new features which will 
>> eventually be in the 5.0.x LTS and you get 4.0.x as an LTS release of all 
>> the 3.x built up features.
>> 
>> This seems like a fairly straight forward process to me.  It gives people 
>> monthly releases that they can test new features with, but it also provides 
>> a stable line for those that want one.
>> 
> 
> So just tick/tock with new labels? How do we stop users from getting into the 
> situation where they're running 4.5, there's a critical flaw in 4.5, and 
> there's no 4.5.1 ever going to be released? Real users still won't want to 
> jump to 4.7, because there's added risk from stuff that went into 4.6 and 4.7 
> ? Or is it simply "if you want to run bleeding edge, you better be willing to 
> stay on that bleeding edge for up to a year"? 
> 
> 
> 
> 


Re: [VOTE] self-assignment of jira tickets

2017-03-29 Thread Jeremiah Jordan
+1 non-binding. That requirement always seemed silly to me.

> On Mar 29, 2017, at 8:21 AM, Jason Brown  wrote:
> 
> Hey all,
> 
> Following up my thread from a week or two ago (
> https://lists.apache.org/thread.html/0665f40c7213654e99817141972c003a2131aba7a1c63d6765db75c5@%3Cdev.cassandra.apache.org%3E),
> I'd like to propose a vote to change to allow any potential contributor to
> assign a jira to themselves without needing to be added to the contributors
> group first.
> 
> https://issues.apache.org/jira/browse/INFRA-11950 is an example of how to
> get this done with INFRA.
> 
> Vote will be open for 72 hours.
> 
> Thanks,
> 
> -Jason Brown


Re: Getting partition min/max timestamp

2018-01-14 Thread Jeremiah Jordan
Don’t forget about deleted and missing data. The bane of all on-replica 
aggregation optimizations. 

> On Jan 14, 2018, at 12:07 AM, Jeff Jirsa  wrote:
> 
> 
> You’re right it’s not stored in metadata now. Adding this to metadata isn’t 
> hard, it’s just hard to do it right where it’s useful to people with other 
> data models (besides yours) so it can make it upstream (if that’s your goal). 
> In particular the worst possible case is a table with no clustering key and a 
> single non-partition key column. In that case storing these extra two long 
> time stamps may be 2-3x more storage than without, which would be a huge 
> regression, so you’d have to have a way to turn that feature off.
> 
> 
> Worth mentioning that there are ways to do this without altering Cassandra -  
> consider using static columns that represent the min timestamp and max 
> timestamp. Create them both as ints or longs and write them on all 
> inserts/updates (as part of a batch, if needed). The only thing you’ll have 
> to do is find a way for “min timestamp” to work - you can set the min time 
> stamp column with an explicit  “using timestamp” timestamp = 2^31-NOW, so 
> that future writes won’t overwrite those values. That gives you a first write 
> win behavior for that column, which gives you an effective min timestamp for 
> the partition as a whole.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Jan 13, 2018, at 4:58 AM, Arthur Kushka  wrote:
>> 
>> Hi folks,
>> 
>> Currently, I'm working on a custom CQL operator that should return the max
>> timestamp for some partition.
>> 
>> I don't think that scanning the partition for that kind of data is a nice
>> idea. Instead of it, I'm thinking about adding metadata to the partition. I
>> want to store minTimestamp and maxTimestamp for every partition, as is
>> already done in Memtables. Those timestamps will be updated on each
>> mutation operation, which is quite cheap in comparison to a full scan.
>> 
>> I'm quite new to the Cassandra codebase and want to get some critiques and
>> ideas; maybe that kind of data is already stored somewhere, or you have
>> better ideas. Is my assumption right?
>> 
>> Best,
>> Artur
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
> 

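The "first write wins" trick Jeff describes above can be sketched as follows. This is an assumption-laden plain-Java model of Cassandra's last-write-wins cell reconciliation, not driver or server code; the class and helper names are invented for illustration.

```java
// Sketch of the static-column min-timestamp trick: writing the column
// USING TIMESTAMP (2^31 - now) means earlier wall-clock writes carry a
// HIGHER write timestamp, so later writes never replace the cell. First
// write wins, yielding an effective minimum timestamp for the partition.
final class FirstWriteWins {
    /** Explicit write timestamp to use for a write at wall clock 'nowSeconds'. */
    static long writeTimestampFor(long nowSeconds) {
        return (1L << 31) - nowSeconds;  // earlier 'now' => larger write timestamp
    }

    /** Cassandra keeps the cell whose write timestamp is larger (LWW). */
    static long reconcile(long existingTs, long existingValue, long newTs, long newValue) {
        return newTs > existingTs ? newValue : existingValue;
    }

    public static void main(String[] args) {
        long first = 1_000, second = 2_000;               // 'first' happened earlier
        long kept = reconcile(writeTimestampFor(first), first,
                              writeTimestampFor(second), second);
        if (kept != first) throw new AssertionError();    // the later write loses
        System.out.println("min-timestamp column holds: " + kept);
    }
}
```

In CQL this would correspond to something like an `UPDATE ... USING TIMESTAMP ?` with the computed value; the exact schema is whatever fits your data model.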



Re: Getting partition min/max timestamp

2018-01-14 Thread Jeremiah Jordan
Finding the max timestamp of a partition is an aggregation.  Doing that 
calculation purely on the replica (whether pre-calculated or not) is problematic 
for any CL > 1 in the face of deletions or updates that are missing, as the 
contents of the partition on a given replica are different from what they would 
be when merged on the coordinator.

> On Jan 14, 2018, at 3:33 PM, "arhel...@gmail.com" wrote:
> 
> First of all, thx for all the ideas. 
> 
> Benedict Elliott Smith, in code comments I found a notice that the data in 
> EncodingStats can be wrong; I'm not sure it's a good idea to use it for 
> accurate results. As I understand it, incorrect data is not a problem for 
> its current use case, but it would be for mine. Currently, I have added 
> fields for every AtomicBTreePartition. Those fields I update in the 
> addAllWithSizeDelta call, but I also now see that I should think about the 
> case of data removal.
> 
> I currently don't really care about TTLs, but it's a case I should think 
> about, thx.
> 
> Jeremiah Jordan, thx for the notice, but I don't really get what you mean 
> about replica aggregation optimizations. Can you please explain it for me?
> 
>> On 2018-01-14 17:16, Benedict Elliott Smith  wrote: 
>> (Obviously, not to detract from the points that Jon and Jeremiah make, i.e.
>> that if TTLs or tombstones are involved the metadata we have, or can add,
>> is going to be worthless in most cases anyway)
>> 
>> On 14 January 2018 at 16:11, Benedict Elliott Smith 
>> wrote:
>> 
>>> We already store the minimum timestamp in the EncodingStats of each
>>> partition, to support more efficient encoding of atom timestamps.  This
>>> just isn't exposed beyond UnfilteredRowIterator, though it probably could
>>> be.
>>> 
>>> Storing the max alongside would still require justification, though its
>>> cost would actually be fairly nominal (probably only a few bytes; it
>>> depends on how far apart min/max are).
>>> 
>>> I'm not sure (IMO) that even a fairly nominal cost could be justified
>>> unless there were widespread benefit though, which I'm not sure this would
>>> provide.  Maintaining a patched variant of your own that stores this
>>> probably wouldn't be too hard, though.
>>> 
>>> In the meantime, exposing and utilising the minimum timestamp from
>>> EncodingStats is probably a good place to start to explore the viability of
>>> the approach.
>>> 
>>> On 14 January 2018 at 15:34, Jeremiah Jordan 
>>> wrote:
>>> 
>>>> Don’t forget about deleted and missing data. The bane of all on-replica
>>>> aggregation optimizations.
>>>> 
>>>>> On Jan 14, 2018, at 12:07 AM, Jeff Jirsa  wrote:
>>>>> 
>>>>> 
>>>>> You’re right it’s not stored in metadata now. Adding this to metadata
>>>> isn’t hard, it’s just hard to do it right where it’s useful to people with
>>>> other data models (besides yours) so it can make it upstream (if that’s
>>>> your goal). In particular the worst possible case is a table with no
>>>> clustering key and a single non-partition key column. In that case storing
>>>> these extra two long time stamps may be 2-3x more storage than without,
>>>> which would be a huge regression, so you’d have to have a way to turn that
>>>> feature off.
>>>>> 
>>>>> 
>>>>> Worth mentioning that there are ways to do this without altering
>>>> Cassandra -  consider using static columns that represent the min timestamp
>>>> and max timestamp. Create them both as ints or longs and write them on all
>>>> inserts/updates (as part of a batch, if needed). The only thing you’ll have
>>>> to do is find a way for “min timestamp” to work - you can set the min time
>>>> stamp column with an explicit  “using timestamp” timestamp = 2^31-NOW, so
>>>> that future writes won’t overwrite those values. That gives you a first
>>>> write win behavior for that column, which gives you an effective min
>>>> timestamp for the partition as a whole.
>>>>> 
>>>>> --
>>>>> Jeff Jirsa
>>>>> 
>>>>> 
>>>>>> On Jan 13, 2018, at 4:58 AM, Arthur Kushka  wrote:
>>>>>> 
>>>>>> Hi folks,
>>>>>> 
>>>>>> Currently, I working on custom CQL operator that should return the max
>>>>>> timestamp for some partition.
>>
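To answer Arthur's question above, the on-replica aggregation problem can be illustrated with a minimal sketch. This is a plain-Java simulation under invented data (the replica maps and timestamps are assumptions), not Cassandra code.

```java
// A replica that missed a deletion precomputes a partition "max timestamp"
// that is wrong once the coordinator merges data and tombstones from all
// replicas at CL > ONE.
import java.util.Map;

final class ReplicaMaxTimestamp {
    /** Max write timestamp over the live cells one replica knows about. */
    static long localMax(Map<String, Long> liveCells) {
        return liveCells.values().stream()
                        .mapToLong(Long::longValue)
                        .max()
                        .orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Replica A applied the deletion of cell "y"; replica B missed it.
        Map<String, Long> replicaA = Map.of("x", 10L);            // y tombstoned
        Map<String, Long> replicaB = Map.of("x", 10L, "y", 20L);  // y still live

        // After the coordinator merges A's tombstone with B's data, cell "y"
        // is shadowed, so the true live max timestamp is 10; replica B's
        // locally precomputed answer of 20 is wrong.
        long mergedMax = 10L;
        if (localMax(replicaA) != mergedMax) throw new AssertionError();
        if (localMax(replicaB) == mergedMax) throw new AssertionError();
        System.out.println("replica B over-reports max = " + localMax(replicaB));
    }
}
```

This is why the thread concludes that any pre-computed per-replica min/max is only trustworthy once tombstones and missed updates are reconciled on the coordinator.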

Re: [DISCUSS] java 9 and the future of cassandra on the jdk

2018-03-20 Thread Jeremiah Jordan
My suggestion would be to keep trunk on the latest LTS by default, but with 
compatibility with the latest release if possible.  Since Oracle LTS releases 
are every 3 years, I would not want to tie us to that release cycle.
So until Java 11 is out that would mean trunk should work under Java 8, with 
the option of being compiled/run under Java 9 or 10.  Once Java 11 is out we 
could then switch to 11 only.

-Jeremiah

On Mar 20, 2018, at 10:48 AM, Jason Brown  wrote:

>>> Wouldn't that potentially leave us in a situation where we're ready for
> a C* release but blocked waiting on a new LTS cut?
> 
> Agreed, and perhaps if we're close enough to a LTS release (say three
> months or less), we could choose to delay (probably with community
> input/vote). If we're a year or two out, then, no, we should not wait. I
> think this is what I meant to communicate by "Perhaps we can evaluate this
> over time." (poorly stated, in hindsight)
> 
>> On Tue, Mar 20, 2018 at 7:22 AM, Josh McKenzie  wrote:
>> 
>> Need a little clarification on something:
>> 
>>> 2) always release cassandra on a LTS version
>> combined with:
>>> 3) keep trunk on the latest JDK version, assuming we release a major
>>> cassandra version close enough to a LTS release.
>> 
>> Wouldn't that potentially leave us in a situation where we're ready
>> for a C* release but blocked waiting on a new LTS cut? For example, if
>> JDK 9 were the currently supported LTS and trunk was on JDK 11, we'd
>> either have to get trunk to work with 9 or wait for 11 to resolve
>> that.
>> 
>>> On Tue, Mar 20, 2018 at 9:32 AM, Jason Brown  wrote:
>>> Hi all,
>>> 
>>> 
>>> TL;DR Oracle has started revving the JDK version much faster, and we need
>>> an agreed upon plan.
>>> 
>>> Well, we probably should have had this discussion already by now, but
>> here
>>> we are. Oracle announced plans to release updated JDK version every six
>>> months, and each new version immediately supersedes the previous in all
>> ways:
>>> no updates/security fixes to previous versions is the main thing, and
>>> previous versions are EOL'd immediately. In addition, Oracle has planned
>>> parallel LTS versions that will live for three years, and then be superseded
>>> by the next LTS; but not immediately EOL'd from what I can tell. Please
>> see
>>> [1, 2] for Oracle's official comments about this change ([3] was
>>> particularly useful, imo), [4] and many other postings on the internet
>> for
>>> discussion/commentary.
>>> 
>>> We have a jira [5] where Robert Stupp did most of the work to get us onto
>>> Java 9 (thanks, Robert), but then the announcement of the JDK version
>>> changes happened last fall after Robert had done much of the work on the
>>> ticket.
>>> 
>>> Here's an initial proposal of how to move forward. I don't suspect it's
>>> complete, but a decent place to start a conversation.
>>> 
>>> 1) recommend OracleJDK over OpenJDK. IIUC from [3], the OpenJDK will
>>> release every six months, and the OracleJDK will release every three
>> years.
>>> Thus, the OracleJDK is the LTS version, and it just comes from a snapshot
>>> of one of those OpenJDK builds.
>>> 
>>> 2) always release cassandra on a LTS version. I don't think we can
>>> reasonably expect operators to update the JDK every six months, on time.
>>> Further, if there are breaking changes to the JDK, we don't want to have
>> to
>>> update established c* versions due to those changes, every six months.
>>> 
>>> 3) keep trunk on the latest JDK version, assuming we release a major
>>> cassandra version close enough to a LTS release. Currently that seems
>>> reasonable for cassandra 4.0 to be released with java 11 (18.9 LTS)
>>> support. Perhaps we can evaluate this over time.
>>> 
>>> 
>>> Once we agree on a path forward, *it is imperative that we publish the
>>> decision to the docs* so we can point contributors and operators there,
>>> instead of rehashing the same conversation.
>>> 
>>> I look forward to a lively discussion. Thanks!
>>> 
>>> -Jason
>>> 
>>> [1] http://www.oracle.com/technetwork/java/eol-135779.html
>>> [2]
>>> https://blogs.oracle.com/java-platform-group/faster-and-easier-use-and-redistribution-of-java-se
>>> [3]
>>> https://www.oracle.com/java/java9-screencasts.html?bcid=
>> 5582439790001&playerType=single-social&size=events
>>> [4]
>>> http://blog.joda.org/2018/02/java-9-has-six-weeks-to-live.
>> html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+
>> StephenColebournesBlog+%28Stephen+Colebourne%27s+blog%29
>>> [5] https://issues.apache.org/jira/browse/CASSANDRA-9608
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 
>> 




Re: Welcome Joey Lynch as Cassandra PMC member

2024-07-24 Thread Jeremiah Jordan
Congrats Joey!

On Jul 24, 2024 at 10:12:21 AM, Abhijeet Dubey 
wrote:

> Congratulations Joey :)
>
> On Wed, Jul 24, 2024 at 8:38 PM Josh McKenzie 
> wrote:
>
>> Congrats Joey!
>>
>> On Wed, Jul 24, 2024, at 10:55 AM, Abe Ratnofsky wrote:
>>
>> Congratulations!
>>
>>
>>
>
> --
> *Abhijeet*
>


Re: [DISCUSS] CASSANDRA-13704 Safer handling of out of range tokens

2024-09-12 Thread Jeremiah Jordan
>
> JD we know it had nothing to do with range movements and could/should have
> been prevented far simpler with operational correctness/checks.
>
“Be better” is not the answer.  Also, I think you are confusing our
incidents; the out-of-range token issue we saw was not because of an
operational “oops” that could have been avoided.

In the extreme, when no writes have gone to any of the replicas, what
> happened ? Either this was CL.*ONE, or it was an operational failure (not
> C* at fault).  If it's an operational fault, both the coordinator and the
> node can be wrong.  With CL.ONE, just the coordinator can be wrong and the
> problem still exists (and with rejection enabled the operator is now more
> likely to ignore it).
>

If some node has a bad ring state, it can easily send no writes to the
correct place; no need for CL ONE. With the current system behavior, CL ALL
will be successful, with all the nodes sent a mutation happily accepting
and acking data they do not own.

Yes, even with this patch if you are using CL ONE, if the coordinator has a
faulty ring state where no replica is “real” and it also decides that it is
one of the replicas, then you will have a successful write, even though no
correct node got the data.  If you are using CL ONE you already know you
are taking on a risk.  Not great, but there should be evidence in other
nodes of the bad thing occurring at the least.  Also for this same ring
state, for any CL > ONE with the patch the write would fail (assuming only
a single node has the bad ring state).
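A toy model of the replica-side check (the integer ring and all names here are illustrative sketches, not Cassandra's actual code) shows what enabling rejection changes:

```python
# Toy model of out-of-range write rejection. The integer ring, range
# representation, and function names are mine, not Cassandra's API.

def owns(ranges, token):
    """True if token falls in any (start, end] range, handling wrap-around."""
    for start, end in ranges:
        if start < end:
            if start < token <= end:
                return True
        elif token > start or token <= end:  # range wraps past token 0
            return True
    return False

def apply_mutation(local_ranges, token, reject_out_of_range):
    """Replica-side check: returns True if the write is applied and acked."""
    if not owns(local_ranges, token):
        if reject_out_of_range:
            return False  # fail the write so the coordinator sees a rejection
        # legacy behavior: accept (and at best log) the mislocated write
    return True

# A node believing it owns (10, 40] plus a wrapping range (90, 5]:
ranges = [(10, 40), (90, 5)]
```

With rejection off, a mutation for token 50 is happily acked even though the node does not own it; with rejection on, the same mutation fails and the misrouting becomes visible at the coordinator.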

Even when the fix is only partial, so really it's more about more
> forcefully alerting the operator to the problem via over-eager
> unavailability …?
>

Not sure why you are calling this “over-eager unavailability”.  If the data
is going to the wrong nodes then the nodes may as well be down.  Unless the
end user is writing at CL ANY, they have requested to be acked only once CL
nodes which own the data have acked getting it.

-Jeremiah

On Sep 12, 2024 at 2:35:01 PM, Mick Semb Wever  wrote:

> Great that the discussion explores the issue as well.
>
> So far we've heard three* companies being impacted, and four times in
> total…?  Info is helpful here.
>
> *) Jordan, you say you've been hit by _other_ bugs _like_ it.  Jon i'm
> assuming the company you refer to doesn't overlap. JD we know it had
> nothing to do with range movements and could/should have been prevented far
> simpler with operational correctness/checks.
>
> In the extreme, when no writes have gone to any of the replicas, what
> happened ? Either this was CL.*ONE, or it was an operational failure (not
> C* at fault).  If it's an operational fault, both the coordinator and the
> node can be wrong.  With CL.ONE, just the coordinator can be wrong and the
> problem still exists (and with rejection enabled the operator is now more
> likely to ignore it).
>
> WRT to the remedy, is it not to either run repair (when 1+ replica has
> it), or to load flushed and recompacted sstables (from the period in
> question) to their correct nodes.  This is not difficult, but
> understandably lost-sleep and time-intensive.
>
> Neither of the above two points I feel are that material to the outcome,
> but I think it helps keep the discussion on track and informative.   We
> also know there are many competent operators out there that do detect data
> loss.
>
>
>
> On Thu, 12 Sept 2024 at 20:07, Caleb Rackliffe 
> wrote:
>
>> If we don’t reject by default, but log by default, my fear is that we’ll
>> simply be alerting the operator to something that has already gone very
>> wrong that they may not be in any position to ever address.
>>
>> On Sep 12, 2024, at 12:44 PM, Jordan West  wrote:
>>
>> 
>> I’m +1 on enabling rejection by default on all branches. We have been bit
>> by silent data loss (due to other bugs like the schema issues in 4.1) from
>> lack of rejection on several occasions and short of writing extremely
>> specialized tooling its unrecoverable. While both lack of availability and
>> data loss are critical, I will always pick lack of availability over data
>> loss. Its better to fail a write that will be lost than silently lose it.
>>
>> Of course, a change like this requires very good communication in
>> NEWS.txt and elsewhere but I think its well worth it. While it may surprise
>> some users I think they would be more surprised that they were silently
>> losing data.
>>
>> Jordan
>>
>> On Thu, Sep 12, 2024 at 10:22 Mick Semb Wever  wrote:
>>
>>> Thanks for starting the thread Caleb, it is a big and impacting patch.
>>>
>>> Appreciate the criticality, in a new major release rejection by default
>>> is obvious.   Otherwise the logging and metrics are an important addition to
>>> help users validate the existence and degree of any problem.
>>>
>>> Also worth mentioning that rejecting writes can cause degraded
>>> availability in situations that pose no problem.  This is a coordination
>>> problem on a probabilistic design, it's choose your evil.

Re: [DISCUSS] CASSANDRA-13704 Safer handling of out of range tokens

2024-09-12 Thread Jeremiah Jordan
>
> 1. Rejecting writes does not prevent data loss in this situation.  It only
> reduces it.  The investigation and remediation of possible mislocated data
> is still required.
>

All nodes which reject a write prevent mislocated data.  There is still the
possibility of some node having the same wrong view of the ring as the
coordinator (including if they are the same node) accepting data.  Unless
there are multiple nodes with the same wrong view of the ring, data loss is
prevented for CL > ONE.

2. Rejecting writes is a louder form of alerting for users unaware of the
> scenario, those not already monitoring logs or metrics.
>

Without this patch no one is aware of any issues at all.  Maybe you are
referring to a situation where the patch is applied, but the default
behavior is to still accept the “bad” data?  In that case yes, turning on
rejection makes it “louder” in that your queries can fail if too many nodes
are wrong.

3. Rejecting writes does not capture all places where the problem is
> occurring.  Only logging/metrics fully captures everywhere the problem is
> occurring.
>

Not sure what you are saying here.

nodes can be rejecting writes when they are in fact correct hence
causing “over-eager
> unavailability”.
>

When would this occur?  I guess when the node with the bad ring information
is a replica that is sent data from a coordinator with the correct ring
state?  There would be no “unavailability” here unless there were multiple
nodes in such a state.  I also again would not call this over-eager, because
the node with the bad ring state is f’ed up and needs to be fixed.  So it
being considered unavailable doesn’t seem over-eager to me.

Given the fact that a user can read NEWS.txt and turn off this rejection of
writes, I see no reason not to err on the side of “the setting which gives
better protection even if it is not perfect”.  We should not let the desire
to solve everything prevent incremental improvements, especially when we
actually do have the solution coming in TCM.

-Jeremiah

On Sep 12, 2024 at 5:25:25 PM, Mick Semb Wever  wrote:

>
> I'm less concerned with what the defaults are in each branch, and more the
> accuracy of what we say, e.g. in NEWS.txt
>
> This is my understanding so far, and where I hoped to be corrected.
>
> 1. Rejecting writes does not prevent data loss in this situation.  It only
> reduces it.  The investigation and remediation of possible mislocated data
> is still required.
>
> 2. Rejecting writes is a louder form of alerting for users unaware of the
> scenario, those not already monitoring logs or metrics.
>
> 3. Rejecting writes does not capture all places where the problem is
> occurring.  Only logging/metrics fully captures everywhere the problem is
> occurring.
>
> 4. This situation can be a consequence of other problems (C* or
> operational), not only range movements and the nature of gossip.
>
>
> (2) is the primary argument I see for setting rejection to default.  We
> need to inform the user that data mislocation can still be happening, and
> the only way to fully capture it is via monitoring of enabled
> logging/metrics.  We can also provide information about when range
> movements can cause this, and that nodes can be rejecting writes when they
> are in fact correct hence causing “over-eager unavailability”.  And
> furthermore, point people to TCM.
>
>
>
> On Thu, 12 Sept 2024 at 23:36, Jeremiah Jordan 
> wrote:
>
>> JD we know it had nothing to do with range movements and could/should
>>> have been prevented far simpler with operational correctness/checks.
>>>
>> “Be better” is not the answer.  Also I think you are confusing our
>> incidents, the out of range token issue we saw was not because of an
>> operational “oops” that could have been avoided.
>>
>> In the extreme, when no writes have gone to any of the replicas, what
>>> happened ? Either this was CL.*ONE, or it was an operational failure (not
>>> C* at fault).  If it's an operational fault, both the coordinator and the
>>> node can be wrong.  With CL.ONE, just the coordinator can be wrong and the
>>> problem still exists (and with rejection enabled the operator is now more
>>> likely to ignore it).
>>>
>>
>> If some node has a bad ring state it can easily send no writes to the
>> correct place, no need for CL ONE, with the current system behavior CL ALL
>> will be successful, with all the nodes sent a mutation happily accepting
>> and acking data they do not own.
>>
>> Yes, even with this patch if you are using CL ONE, if the coordinator has
>> a faulty ring state where no replica is “real” and it also decides that it
>> is one of the replicas, then you will have a successful write, even though
>> no correct node got the data.

RE: persistent connection among cluster nodes

2012-10-02 Thread Jeremiah Jordan
Cluster nodes don't talk on 9160.  Pretty sure they talk on "storage_port: 
7000" from the yaml file.
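For reference, the relevant settings in a cassandra.yaml of that era look roughly like this (defaults shown; the comments are mine, and exact wording varies by version):

```yaml
# Inter-node traffic (gossip and internode messaging):
storage_port: 7000
# Encrypted inter-node traffic, when internode encryption is enabled:
ssl_storage_port: 7001
# Client (Thrift RPC) traffic, which is what clients connect to:
rpc_port: 9160
```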

-Jeremiah


From: Niteesh kumar [nitees...@directi.com]
Sent: Tuesday, October 02, 2012 4:52 AM
To: dev@cassandra.apache.org
Subject: persistent connection among cluster nodes

While looking at the netstat table I observed that my cluster nodes are not
using persistent connections to talk among themselves on port 9160 to
redirect requests. I also observed that local write latency is around 30-40
microseconds, while it takes around 0.5 milliseconds at 50K QPS if the
chosen node is not the node responsible for the key. I think this is
attributable to connection setup time among servers, as my servers are on
the same rack.


Does Cassandra support maintaining persistent connections among cluster
nodes?


Re: Write Timestamps

2012-10-24 Thread Jeremiah Jordan
How are you doing the write?  CQL or Thrift?  In thrift, the client specifies 
the timestamp, and you should always be seeing that as the timestamp.  In CQL, 
the CQL layer on the server adds the timestamp.  I am less familiar with the 
CQL code, maybe something screwy is going on there.  1.1.6 is out, do you see 
the same behavior there?

-Jeremiah

On Oct 24, 2012, at 3:57 PM, William Katsak  wrote:

> Here is what I am seeing on each replica node. This is after a write with 
> consistencylevel=ALL.
> 
> DEBUG [MutationStage:48] 2012-10-24 16:56:01,050 RowMutationVerbHandler.java 
> (line 56) RowMutation(keyspace='normal', key='746573746b65793337', 
> modifications=[ColumnFamily(data [636f6c:false:3@1351112161048000,])]) 
> applied.  Sending response to 770151@/172.16.18.112
> 
> DEBUG [MutationStage:59] 2012-10-24 16:56:02,889 RowMutationVerbHandler.java 
> (line 56) RowMutation(keyspace='normal', key='746573746b65793337', 
> modifications=[ColumnFamily(data [636f6c:false:3@1351112162785000,])]) 
> applied.  Sending response to 770152@/172.16.18.112
> 
> DEBUG [MutationStage:46] 2012-10-24 16:55:59,129 RowMutationVerbHandler.java 
> (line 56) RowMutation(keyspace='normal', key='746573746b65793337', 
> modifications=[ColumnFamily(data [636f6c:false:3@1351112159127000,])]) 
> applied.  Sending response to 770153@/172.16.18.112
> 
> Now, if I do a read of this data, I will always see a digest failure the 
> first time.
> 
> Thanks,
> Bill
> 
> 
> On 10/24/2012 04:09 PM, Jonathan Ellis wrote:
>> Timestamps are part of the ColumnFamily objects and their Columns,
>> contained in the RowMutation.
>> 
>> On Wed, Oct 24, 2012 at 2:57 PM, William Katsak  
>> wrote:
>>> Hello,
>>> 
>>> I sent this message a few days ago, but it seems to have gotten lost (I
>>> don't see it on the archive), so I am trying again.
>>> 
>>> -
>>> 
>>> I am using Cassandra for some academic-type work that involves some hacking
>>> of replica placement, etc. and I am observing a strange behavior (well,
>>> strange to me).
>>> 
>>> Using the stock 1.1.5 snapshot, when you do a write (even with
>>> consistencylevel = ALL), it seems that all nodes will get the data with a
>>> slightly different timestamp, and any read (even at ALL) with always have a
>>> digest failure on the first read (and subsequent reads until read repair
>>> catches up).
>>> 
>>> It would make sense to me that timestamps should be distributed with the
>>> RowMutation, not set on each node independently.
>>> 
>>> Is this the intended behavior? Is there a design reason for this that I
>>> should be aware of?
>>> 
>>> Thanks,
>>> Bill Katsak
>> 
>> 



RE: Write Timestamps

2012-10-26 Thread Jeremiah Jordan
Sorry, should have said, "If you do not provide one, the CQL layer on the 
server adds the timestamp", unlike thrift where the timestamp is always client 
side.

Bill,
Glad 1.1.6 fixed your issue.

-Jeremiah


From: Eric Evans [eev...@acunu.com]
Sent: Thursday, October 25, 2012 4:09 PM
To: dev@cassandra.apache.org
Subject: Re: Write Timestamps

On Wed, Oct 24, 2012 at 9:13 PM, Jeremiah Jordan
 wrote:
> How are you doing the write?  CQL or Thrift?  In thrift, the client specifies 
> the timestamp, and you should always be seeing that as the timestamp.  In 
> CQL, the CQL layer on the server adds the timestamp.

For the record, you can supply a timestamp with CQL, same as you can
with Thrift.  For example:

INSERT INTO somedb.sometable (id, given, surname) VALUES ('pgriffith',
'Peter', 'Griffith') USING TIMESTAMP 42;


--
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: [VOTE CLOSED] Release Apache Cassandra 2.0.10

2014-08-13 Thread Jeremiah Jordan
Everything should be good now. Thanks!

> On Aug 11, 2014, at 9:34 AM, Sylvain Lebresne  wrote:
> 
> Ok, ok, closing this vote for now. We'll re-roll as soon as the pig stuff
> are fixed.
> 
> 
> On Fri, Aug 8, 2014 at 10:07 PM, Jeremiah D Jordan 
> wrote:
> 
>> I'm -1 on this until we get CqlRecordReader fixed (which will also fix the
>> newly added in 2.0.10 Pig CqlNativeStoarge):
>> https://issues.apache.org/jira/browse/CASSANDRA-7725
>> https://issues.apache.org/jira/browse/CASSANDRA-7726
>> 
>> Without those two things anyone using CqlStorage previously (which removed
>> with the removal of CPRR) who updates to using CqlNativeStoarge will have
>> broken scripts unless they are very very careful.
>> 
>> 
>> -Jeremiah
>> 
>>> On Aug 8, 2014, at 5:03 AM, Sylvain Lebresne  wrote:
>>> 
>>> I propose the following artifacts for release as 2.0.10.
>>> 
>>> sha1: cd37d07baf5394d9bac6763de4556249e9837bb0
>>> Git:
>> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/2.0.10-tentative
>>> Artifacts:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1023/org/apache/cassandra/apache-cassandra/2.0.10/
>>> Staging repository:
>> https://repository.apache.org/content/repositories/orgapachecassandra-1023/
>>> 
>>> The artifacts as well as the debian package are also available here:
>>> http://people.apache.org/~slebresne/
>>> 
>>> The vote will be open for 72 hours (longer if needed).
>>> 
>>> [1]: http://goo.gl/xzb9ky (CHANGES.txt)
>>> [2]: http://goo.gl/nBI37B (NEWS.txt)
>> 
>> 


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

2021-10-22 Thread Jeremiah Jordan
Hi Stefan,
That idea is not related to this CEP which is about the file formats of the
sstables, not file system access.  But you should take a look at the work
recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926
to switch to using java.nio.file.Path for file access.  This should allow
the use of a file system provider to access files which could be the basis
for work to load the files from S3.

-Jeremiah

On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
stefan.mikloso...@instaclustr.com> wrote:

> One point I would like to add to this; I was already looking into how
> to extend this but what I saw in SSTableReader was that it is very
> much "file system oriented". There was not any possibility to actually
> hook something like that there. I think what importing does is that it
> will use SSTableReader / Writer stuff so I think that the modification
> of these classes to accommodate this idea would be necessary.
>
> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>  wrote:
> >
> > Hi Jacek,
> >
> > Thanks for taking the lead on this.
> >
> > There was importing of SSTables introduced in 4.0 via
> > StorageService#importNewSSTables. The "problem" with this is that
> > SSTables need to be physically located at disk so Cassandra can read
> > them. If a backup is taken and SSTables are uploaded to, for example,
> > S3 bucket, then upon restore, all these SSTables need to be downloaded
> > first and then imported. What about downloading them / importing them
> > directly from S3? Or any custom source for that matter? Importing of
> > SSTables is a very nice feature in 4.0, we do not need to copy / hard
> > link / refresh, it is all handled internally.
> >
> > I am not sure if your work is related to this idea but I would
> > appreciate it if this is pluggable as well for the sake of simplicity
> > and effectiveness as we would not have to download all sstables before
> > importing them.
> >
> > If it is not related, feel free to skip that completely and I guess I
> > would have to try to push that forward myself.
> >
> > Regards
> >
> >
> > On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
> >  wrote:
> > >
> > > I'd like to start a discussion about SSTable format API proposal
> (CEP-17)
> > >
> > > Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
> > > CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
> > >
> > > Thanks,
> > > Jacek
> > >
> > >
>
>
>


[DISCUSS] CEP-18: Improving Modularity

2021-10-22 Thread Jeremiah Jordan
Hi All,
As has been seen with the work already started in CEP-10, increasing the
modularity of our subsystems can improve their testability, and also the
ability to try new implementations without breaking things.

Our team has been working on doing this and CEP-18 has been created to
propose adding more modularity to a few different subsystems.
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-18%3A+Improving+Modularity

CASSANDRA-17044 has already been created for Schema Storage changes related
to this work and more JIRAs and PRs are to follow for the other subsystems
proposed in the CEP.

Thanks,
-Jeremiah Jordan


Re: Thanks to Nate for his service as PMC Chair

2022-07-11 Thread Jeremiah Jordan
Thanks Nate! And welcome to the role Mick!

> On Jul 11, 2022, at 7:54 AM, Paulo Motta  wrote:
> 
> 
> Hi,
> 
> I wanted to announce on behalf of the Apache Cassandra Project Management 
> Committee (PMC) that Nate McCall (zznate) has stepped down from the PMC chair 
> role. Thank you Nate for all the work you did as the PMC chair!
> 
> The Apache Cassandra PMC has nominated Mick Semb Wever (mck) as the new PMC 
> chair. Congratulations and good luck on the new role Mick!
> 
> The chair is an administrative position that interfaces with the Apache 
> Software Foundation Board, by submitting regular reports about project status 
> and health. Read more about the PMC chair role on Apache projects:
> - https://www.apache.org/foundation/how-it-works.html#pmc
> - https://www.apache.org/foundation/how-it-works.html#pmc-chair
> - https://www.apache.org/foundation/faq.html#why-are-PMC-chairs-officers
> 
> The PMC as a whole is the entity that oversees and leads the project and any 
> PMC member can be approached as a representative of the committee. A list of 
> Apache Cassandra PMC members can be found on: 
> https://cassandra.apache.org/_/community.html
> 
> Kind regards,
> 
> Paulo


Create then Delete KS without putting anything in it causes exception

2010-08-10 Thread Jeremiah Jordan
The following from python causes an exception on
apache-cassandra-2010-08-10_13-08-19-bin.tar.gz and a bunch of earlier
builds in the 0.7 line:
socket = TSocket.TSocket(host, 9160)
transport = TTransport.TFramedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
client = Cassandra.Client(protocol)
transport.open()
try:
    client.describe_keyspace(dbName)
except NotFoundException, e:
    keyspaceDef = KsDef(name=dbName,
                        strategy_class='org.apache.cassandra.locator.RackUnawareStrategy',
                        replication_factor=replicationFactor,
                        cf_defs=[])
    client.set_keyspace('system')
    client.system_add_keyspace(keyspaceDef)

try:
    client.describe_keyspace(dbName)
    client.set_keyspace('system')
    client.system_drop_keyspace(dbName)
except NotFoundException, e:
    pass

The system_drop_keyspace throws:
InvalidRequestException(why='java.util.concurrent.ExecutionException:
java.lang.NullPointerException')

If I put a system_add_column_family in the middle it doesn't crash.
This broke sometime after apache-cassandra-2010-07-06_13-27-21

-Jeremiah

____
Jeremiah Jordan
Application Developer
Morningstar, Inc.

Morningstar. Illuminating investing worldwide.

+1 312 696-6128 voice
jeremiah.jor...@morningstar.com

www.morningstar.com

This e-mail contains privileged and confidential information and is
intended only for the use of the person(s) named above. Any
dissemination, distribution, or duplication of this communication
without prior written consent from Morningstar is strictly prohibited.
If you have received this message in error, please contact the sender
immediately and delete the materials from any computer.



Trying to use the new column index feature

2010-08-27 Thread Jeremiah Jordan
I am trying to use the new column index feature.  I am using the nightly
from: apache-cassandra-2010-08-23_13-57-40-bin.tar.gz.
I created a column family:
colFam = CfDef('Activity',
               'Activity',
               'Standard',
               'Timestamp',
               'BytesType')
colFam.column_metadata = ColumnDef(name='Time',
                                   validation_class='LongType',
                                   index_type=IndexType.KEYS,
                                   index_name='TIME_INDEX')

self._client.system_add_column_family(colFam)

When I try to use batch_mutate to insert data I get:
Traceback (most recent call last):
  File "C:\GitStuff\olympus_beta\Python\Olympus\Common\DataHelper.py",
line 404, in InsertData
self._client.batch_mutate(mutation_map=dataToInsert,
consistency_level=self._DATA_CONSISTENCY_WRITE)
  File
"C:\GitStuff\olympus_beta\Python\Olympus\Common\cassandra\Cassandra.py",
line 786, in batch_mutate
self.recv_batch_mutate()
  File
"C:\GitStuff\olympus_beta\Python\Olympus\Common\cassandra\Cassandra.py",
line 803, in recv_batch_mutate
raise x
TApplicationException: Required field 'cf_def' was not present! Struct:
system_add_column_family_args(cf_def:null)

The system.log has this in it:
10/08/27 10:09:05 ERROR thrift.CustomTThreadPoolServer: Thrift error
occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in
readMessageBegin, old client?
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProto
col.java:211)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2
487)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(Cu
stomTThreadPoolServer.java:167)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
r.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
va:908)
at java.lang.Thread.run(Thread.java:619)

My code and the server are both using API version 11.1.0, so that wasn't
the problem.

[defa...@unknown] connect localhost/9160
Connected to: "Test Cluster" on localhost/9160
[defa...@unknown] show api version
11.1.0

>>> import cassandra.constants
>>> print cassandra.constants.VERSION
11.1.0
>>>

Am I doing something wrong or is this a bug?
I looked at the code in test_thrift_server.py, but it uses insert not
batch_mutate to put data into the indexed column.


Jeremiah Jordan



RE: Trying to use the new column index feature

2010-08-27 Thread Jeremiah Jordan
If I remove the "colFam.column_metadata =" so that I don't have an indexed
column, everything works fine.

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Friday, August 27, 2010 11:29 AM
To: dev@cassandra.apache.org
Subject: Re: Trying to use the new column index feature

the TProtocolException means you're either (most likely) mixing
framed/unframed modes between client/server, or (less likely) creating
an obsolete TBinaryProtocol.
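A minimal model of that first failure mode (pure Python, no Thrift library; the version constants follow TBinaryProtocol, but the functions are my sketch): a framed client prepends a 4-byte length to every message, so an unframed reader interprets that length as the protocol version word and raises exactly the "Missing version in readMessageBegin" error seen in the server log.

```python
import struct

VERSION_MASK = 0xffff0000
VERSION_1 = 0x80010000  # TBinaryProtocol's strict-version marker

def frame(payload):
    """What TFramedTransport sends: a 4-byte big-endian length, then data."""
    return struct.pack('>I', len(payload)) + payload

def read_message_begin(stream):
    """An unframed strict TBinaryProtocol reader checking the version word."""
    (word,) = struct.unpack('>I', stream[:4])
    if word & VERSION_MASK != VERSION_1:
        raise ValueError('Missing version in readMessageBegin, old client?')
    return word

# An unframed message starts with the version word and parses fine;
# the framed copy starts with its length prefix and trips the version check.
message = struct.pack('>I', VERSION_1 | 1) + b'batch_mutate'
```

The fix is simply to make both sides agree: either the client wraps its socket in a framed transport to match the server, or vice versa.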

On Fri, Aug 27, 2010 at 10:13 AM, Jeremiah Jordan
 wrote:
> I am trying to use the new column index feature.  I am using the nightly
> from: apache-cassandra-2010-08-23_13-57-40-bin.tar.gz.
> I created a column family:
> colFam = CfDef('Activity',
>                       'Activity',
>                       'Standard',
>                       'Timestamp',
>                       'BytesType')
> colFam.column_metadata = ColumnDef(name='Time',
>
> validation_class='LongType',
>
> index_type=IndexType.KEYS,
>                                               index_name='TIME_INDEX')
>
> self._client.system_add_column_family(colFam)
>
> When I try to use batch_mutate to insert data I get:
> Traceback (most recent call last):
>  File "C:\GitStuff\olympus_beta\Python\Olympus\Common\DataHelper.py",
> line 404, in InsertData
>    self._client.batch_mutate(mutation_map=dataToInsert,
> consistency_level=self._DATA_CONSISTENCY_WRITE)
>  File
> "C:\GitStuff\olympus_beta\Python\Olympus\Common\cassandra\Cassandra.py",
> line 786, in batch_mutate
>    self.recv_batch_mutate()
>  File
> "C:\GitStuff\olympus_beta\Python\Olympus\Common\cassandra\Cassandra.py",
> line 803, in recv_batch_mutate
>    raise x
> TApplicationException: Required field 'cf_def' was not present! Struct:
> system_add_column_family_args(cf_def:null)
>
> The system.log has this in it:
> 10/08/27 10:09:05 ERROR thrift.CustomTThreadPoolServer: Thrift error
> occurred during processing of message.
> org.apache.thrift.protocol.TProtocolException: Missing version in
> readMessageBegin, old client?
>        at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProto
> col.java:211)
>        at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2
> 487)
>        at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(Cu
> stomTThreadPoolServer.java:167)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
> r.java:886)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:908)
>        at java.lang.Thread.run(Thread.java:619)
>
> My code and the server are both using API version 11.1.0, so that wasn't
> the problem.
>
> [defa...@unknown] connect localhost/9160
> Connected to: "Test Cluster" on localhost/9160
> [defa...@unknown] show api version
> 11.1.0
>
>>>> import cassandra.constants
>>>> print cassandra.constants.VERSION
> 11.1.0
>>>>
>
> Am I doing something wrong or is this a bug?
> I looked at the code in test_thrift_server.py, but it uses insert not
> batch_mutate to put data into the indexed column.
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


RE: [VOTE] 0.7.1 (attempt #2)

2011-02-02 Thread Jeremiah Jordan
So are 0.6.10 and 0.6.11 broken too or did only the 0.7.0 version of
https://issues.apache.org/jira/browse/CASSANDRA-1959 break stuff?

-Original Message-
From: Stu Hood [mailto:stuh...@gmail.com] 
Sent: Tuesday, February 01, 2011 7:19 PM
To: dev@cassandra.apache.org
Subject: Re: [VOTE] 0.7.1 (attempt #2)

-1
Kelvin was kind enough to confirm that ALL is broken in this release and
trunk. See https://issues.apache.org/jira/browse/CASSANDRA-2094

On Sun, Jan 30, 2011 at 5:24 PM, Stu Hood  wrote:

> -0
> Upgrading from 0.7.0 to these artifacts was fine, but the write ONE
read
> ALL distributed test times out in an unexpected location, with no
error
> messages on the server. The test looks valid, but is also failing in
> 0.8/trunk.
>
> I'll try and bisect it tomorrow from CASSANDRA-1964 (which passed
> consistently) to the breakage.
>
>
> On Sun, Jan 30, 2011 at 1:14 AM, Stephen Connolly <
> stephen.alan.conno...@gmail.com> wrote:
>
>> I'm getting
>>
>> Bad Gateway
>>
>> The proxy server received an invalid response from an upstream
server.
>>
>> From repository.apache.org.
>>
>> So the Maven central artifacts will probably be staged tomorrow AM
(as
>> my wife will kill me if I "waste" Sunday working on this! and she'd
be
>> right too!) ;-)
>>
>> -Stephen
>>
>> On 28 January 2011 20:34, Stephen Connolly
>>  wrote:
>> > I'll drop and restage the artifacts for maven central when I get a
>> chance
>> >
>> > - Stephen
>> >
>> > ---
>> > Sent from my Android phone, so random spelling mistakes, random nonsense
>> > words and other nonsense are a direct result of using swype to type on
>> > the screen
>> >
>> > On 28 Jan 2011 20:30, "Eric Evans"  wrote:
>> >>
>> >> CASSANDRA-2058[1] has landed in 0.7, so let's give this another shot. I
>> >> propose the following for release as 0.7.1.
>> >>
>> >> SVN:
>> >>
>> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7@r1064845
>> >> 0.7.1 artifacts: http://people.apache.org/~eevans
>> >>
>> >> The vote will be open for 72 hours.
>> >>
>> >>
>> >> [1]: https://issues.apache.org/jira/browse/CASSANDRA-2058
>> >> [2]: http://goo.gl/5Tafg (CHANGES.txt)
>> >> [3]: http://goo.gl/PkreZ (NEWS.txt)
>> >>
>> >> --
>> >> Eric Evans
>> >> eev...@rackspace.com
>> >>
>> >>
>> >>
>> >
>>
>
>


RE: [VOTE] 0.7.3

2011-02-28 Thread Jeremiah Jordan
So do we need to scrub if we are updating from 0.6.8 to this?

-Jeremiah Jordan

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Friday, February 25, 2011 3:22 PM
To: dev@cassandra.apache.org
Subject: Re: [VOTE] 0.7.3

Please test nodetool scrub -- see
https://issues.apache.org/jira/browse/CASSANDRA-2217.  Anyone who
upgraded from a 0.6 or 0.7 version before 0.7.1 should run this to fix
bloom filter de/serialization.

Appropriate note of caution: we do have one open bug against scrubbing
(https://issues.apache.org/jira/browse/CASSANDRA-2240) that we have
been unable to reproduce so far.  However, nodetool scrub snapshots
before doing anything else, so testing it should be quite safe: you
can always revert to the pre-scrub snapshot.

On Fri, Feb 25, 2011 at 3:09 PM, Eric Evans  wrote:
>
> Shall we?  I propose the following for release as 0.7.3.
>
> SVN:
> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7@r1074693
> 0.7.3 artifacts: http://people.apache.org/~eevans
>
> The vote will be open for 72 hours.
>
> Thanks.
>
>
> [1]: http://goo.gl/0CykW (CHANGES.txt)
> [2]: http://goo.gl/9NNKv (NEWS.txt)
>
> --
> Eric Evans
> eev...@rackspace.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Can we get a 0.7.7 release?

2011-07-07 Thread Jeremiah Jordan
Been a while since 0.7.6 and a bunch of stuff has been fixed in JIRA,
including https://issues.apache.org/jira/browse/CASSANDRA-2654 which I
think may be affecting some of our servers.




Jeremiah Jordan
Application Developer
Morningstar, Inc.

Morningstar. Illuminating investing worldwide.

+1 312 696-6128 voice
jeremiah.jor...@morningstar.com

www.morningstar.com



Re: Data retrieval inconsistent

2011-11-10 Thread Jeremiah Jordan
I am pretty sure that the way you have K1 configured, it will be placed across 
both DCs as if you had one large ring.  If you want it only in DC1 you need 
to say DC1:1, DC2:0.
If you are writing and reading at ONE you are not guaranteed to get the 
data if RF > 1.  If RF = 2 and you write with ONE, your data could be 
written to server 1, and then read from server 2 before it gets over there.
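The arithmetic behind that race can be written down as a one-line overlap check (a sketch with an invented class name, not Cassandra code): a read is only guaranteed to observe the latest write when the write and read replica sets must intersect, i.e. W + R > RF.

```java
// Sketch (not Cassandra source): the replica-overlap rule behind the
// stale-read race described above.
public class ConsistencyCheck {
    public static boolean readSeesLatestWrite(int rf, int writeCl, int readCl) {
        // The write set and read set must share at least one replica.
        return writeCl + readCl > rf;
    }

    public static void main(String[] args) {
        // RF=2, write ONE (1), read ONE (1): 1 + 1 = 2, not > 2 -> stale reads possible
        System.out.println(readSeesLatestWrite(2, 1, 1));
        // RF=3, write QUORUM (2), read QUORUM (2): 2 + 2 = 4 > 3 -> overlap guaranteed
        System.out.println(readSeesLatestWrite(3, 2, 2));
    }
}
```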


The differing server times will only really matter for TTLs.  Most 
everything else works off comparing user-supplied times.


-Jeremiah

On 11/10/2011 02:27 PM, Subrahmanya Harve wrote:


I am facing an issue in 0.8.7 cluster -

- I have two clusters in two DCs (rather one cross dc cluster) and two 
keyspaces. But i have only configured one keyspace to replicate data 
to the other DC and the other keyspace to not replicate over to the 
other DC. Basically this is the way i ran the keyspace creation  -
create keyspace K1 with 
placement_strategy='org.apache.cassandra.locator.SimpleStrategy' and 
strategy_options = [{replication_factor:1}];
create keyspace K2 with 
placement_strategy='org.apache.cassandra.locator.NetworkTopologyStrategy' 
and strategy_options = [{DC1:2, DC2:2}];


I had to do this because i expect that K1 will get a large volume of 
data and i do not want this wired over to the other DC.


I am writing the data at CL=ONE and reading the data at CL=ONE. I am 
seeing an issue where sometimes i get the data and other times i do 
not see the data. Does anyone know what could be going on here?


A second larger question is  - i am migrating from 0.7.4 to 0.8.7 , i 
can see that there are large changes in the yaml file, but a specific 
question i had was - how do i configure disk_access_mode like it used 
to be in 0.7.4?


One observation i have made is that some nodes of the cross dc cluster 
are at different system times. This is something to fix but could this 
be why data is sometimes retrieved and other times not? Or is there 
some other thing to it?


Would appreciate a quick response.


Re: [VOTE] Release Apache Cassandra 1.0.3

2011-11-12 Thread Jeremiah Jordan
Someone should add it to pypi


On Nov 12, 2011, at 10:25 AM, Jonathan Ellis wrote:

> http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/issues/detail?id=6
> and 
> http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/issues/detail?id=7
> will address the dependency problem adequately for me.
> 
> On Sat, Nov 12, 2011 at 1:34 AM, Eric Evans  wrote:
>> On Fri, Nov 11, 2011 at 1:50 PM, Sylvain Lebresne  
>> wrote:
>>> I know we're releasing like crazy these days, but CASSANDRA-3446 and
>>> CASSANDRA-3482 are pretty bad and warrant a release. And then there stuffs
>>> like CASSANDRA-3481 and a few other improvements that makes it even more 
>>> worth
>>> it.
>>> 
>>> SVN: 
>>> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-1.0@1201039
>>> Artifacts: 
>>> https://repository.apache.org/content/repositories/orgapachecassandra-177/org/apache/cassandra/apache-cassandra/1.0.3/
>>> Staging repository:
>>> https://repository.apache.org/content/repositories/orgapachecassandra-177/
>>> 
>>> The artifacts as well as the debian package are also available here:
>>> http://people.apache.org/~slebresne/
>> 
>> Since this appears to be the first release that distributes the new
>> cqlsh, let me first say, it's pretty kick-ass.  I'm a fan.
>> 
>> That said, I have to ask, am I the only one that finds the
>> distribution of it a little strange?  By that I mean that it's
>> distributed with Cassandra and not the Python driver (which is needed
>> to run it).
>> 
>> I realize that there is an implicit checklist somewhere with a box
>> that says "comes with an interactive client", but if this succeeds in
>> ticking that box then it seems like a technicality.
>> 
>> +1 otherwise.
>> 
>> --
>> Eric Evans
>> Acunu | http://www.acunu.com | @acunu
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com



Re: [VOTE] Release Apache Cassandra 1.0.3

2011-11-12 Thread Jeremiah Jordan
Ah =).  Searching "cassandra-dbapi2" on pypi doesn't find it, guess they don't 
index the homepage links...


On Nov 12, 2011, at 10:45 AM, Jonathan Ellis wrote:

> http://pypi.python.org/pypi/cql/
> 
> On Sat, Nov 12, 2011 at 10:29 AM, Jeremiah Jordan
>  wrote:
>> Someone should add it to pypi
>> 
>> 
>> On Nov 12, 2011, at 10:25 AM, Jonathan Ellis wrote:
>> 
>>> http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/issues/detail?id=6
>>> and 
>>> http://code.google.com/a/apache-extras.org/p/cassandra-dbapi2/issues/detail?id=7
>>> will address the dependency problem adequately for me.
>>> 
>>> On Sat, Nov 12, 2011 at 1:34 AM, Eric Evans  wrote:
>>>> On Fri, Nov 11, 2011 at 1:50 PM, Sylvain Lebresne  
>>>> wrote:
>>>>> I know we're releasing like crazy these days, but CASSANDRA-3446 and
>>>>> CASSANDRA-3482 are pretty bad and warrant a release. And then there stuffs
>>>>> like CASSANDRA-3481 and a few other improvements that makes it even more 
>>>>> worth
>>>>> it.
>>>>> 
>>>>> SVN: 
>>>>> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-1.0@1201039
>>>>> Artifacts: 
>>>>> https://repository.apache.org/content/repositories/orgapachecassandra-177/org/apache/cassandra/apache-cassandra/1.0.3/
>>>>> Staging repository:
>>>>> https://repository.apache.org/content/repositories/orgapachecassandra-177/
>>>>> 
>>>>> The artifacts as well as the debian package are also available here:
>>>>> http://people.apache.org/~slebresne/
>>>> 
>>>> Since this appears to be the first release that distributes the new
>>>> cqlsh, let me first say, it's pretty kick-ass.  I'm a fan.
>>>> 
>>>> That said, I have to ask, am I the only one that finds the
>>>> distribution of it a little strange?  By that I mean that it's
>>>> distributed with Cassandra and not the Python driver (which is needed
>>>> to run it).
>>>> 
>>>> I realize that there is an implicit checklist somewhere with a box
>>>> that says "comes with an interactive client", but if this succeeds in
>>>> ticking that box then it seems like a technicality.
>>>> 
>>>> +1 otherwise.
>>>> 
>>>> --
>>>> Eric Evans
>>>> Acunu | http://www.acunu.com | @acunu
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>> 
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com



Re: How is Cassandra being used?

2011-11-16 Thread Jeremiah Jordan
+1 for a separate jar (and a second download link that doesn't include this 
jar, though I would make the primary link include it with BIG BOLD PRINT saying 
it is in there)
+1 for a config option to turn off auto-post (defaulted on in the download that 
has the jar)
+1 for a nodetool command to dump it to a file for manual posting

I think this could be a good debugging tool as well.  A command to dump 
"here is what my cluster looks like" to a file, which could then be sent through 
email for others to use when helping resolve issues, would be nice.  The 
current nodetool information commands have too much stuff that needs to be 
sanitized out before you can send it outside the firewall.

- Jeremiah

On Nov 16, 2011, at 7:16 PM, Jeremy Hanna wrote:

> Sounds like it would be best if it were in a separate jar for people?
> 
> On Nov 16, 2011, at 4:58 PM, Bill wrote:
> 
>>> Thoughts?
>>> 
>> 
>> We'll turn this off, and would possibly patch it out of the code. That's not 
>> to say it wouldn't be useful to others.
>> 
>> Bill
>> 
>> 
>> On 15/11/11 23:23, Jonathan Ellis wrote:
>>> I started a "users survey" thread over on the users list (replies are
>>> still trickling in), but as useful as that is, I'd like to get
>>> feedback that is more quantitative and with a broader base.  This will
>>> let us prioritize our development efforts to better address what
>>> people are actually using it for, with less guesswork.  For instance:
>>> we put a lot of effort into compression for 1.0.0; if it turned out
>>> that only 1% of 1.0.x users actually enable compression, then it means
>>> that we should spend less effort fine-tuning that moving forward, and
>>> use the energy elsewhere.
>>> 
>>> (Of course it could also mean that we did a terrible job getting the
>>> word out about new features and explaining how to use them, but either
>>> way, it would be good to know!)
>>> 
>>> I propose adding a basic cluster reporting feature to cassandra.yaml,
>>> enabled by default.  It would send anonymous information about your
>>> cluster to an apache.org VM.  Information like, number (but not names)
>>> of keyspaces and columnfamilies, ks-level options like compression, cf
>>> options like compaction strategy, data types (again, not names) of
>>> columns, average row size (or better: the histogram data), and average
>>> sstables per read.
>>> 
>>> Thoughts?
>>> 
>> 
>> 
> 



Re: Ticket CASSANDRA-3578 - Multithreaded CommitLog

2011-12-07 Thread Jeremiah Jordan
Another option is to have multiple threads reading from the queue and 
writing to their own commit log files.  If you have multiple commit log 
directories, each with its own task writing to it, you can keep 
the "only sequential writes" optimization.  Multiple writers to one disk 
only make sense if you are using an SSD for storage; otherwise you no 
longer have only sequential writes, which would slow down the writing.
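A minimal sketch of that layout (invented names, not an actual patch): one single-threaded writer per log, with mutations partitioned round-robin across the writers, so each log still sees strictly sequential appends. A StringBuilder stands in for the memory-mapped log file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: one dedicated writer thread per commit log "file", so each
// log receives only sequential appends even with many producers.
public class MultiLogWriter {
    private final List<ExecutorService> writers = new ArrayList<>();
    private final List<StringBuilder> logs = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger();

    public MultiLogWriter(int nLogs) {
        for (int i = 0; i < nLogs; i++) {
            writers.add(Executors.newSingleThreadExecutor());
            logs.add(new StringBuilder());
        }
    }

    // Round-robin a mutation onto one of the logs; returns a Future the
    // caller can block on, mirroring CommitLog#add.
    public Future<?> add(String mutation) {
        int i = next.getAndIncrement() % logs.size();
        StringBuilder log = logs.get(i);
        return writers.get(i).submit(() -> log.append(mutation).append('\n'));
    }

    // Convenience wrapper that waits for the append to land.
    public void addSync(String mutation) {
        try {
            add(mutation).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public String dump(int i) {
        return logs.get(i).toString();
    }

    public void shutdown() {
        for (ExecutorService e : writers) {
            e.shutdown();
        }
    }
}
```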


On 12/07/2011 10:56 AM, Piotr Kołaczkowski wrote:


Hello everyone,

As an interview task I've got to make CommitLog multithreaded. I'm new 
to Cassandra project and therefore, before I start modifying code, I 
have to make sure I understand what is going on there correctly.

Feel free to correct anything I got wrong or partially wrong.

1. The CommitLog singleton object is responsible for receiving 
RowMutation objects by its add method. The add method is thread-safe 
and is aimed to be called by many threads adding their RowMutations 
independently.


2. Each invocation of CommitLog#add  puts a new task onto the queue. 
This task is represented by LogRecordAdder callable object, which is 
responsible for actually calling the CommitLogSegment#write method for 
doing all the "hard work" of serializing the RowMutation object, 
calculating CRC and writing that to the memory mapped CommitLogSegment 
file buffer. The add method immediately returns a Future object, which 
can be waited for (if needed) - it will block until the row mutation 
is saved to the log file and (optionally) synced.


3. The queued tasks are processed one-by-one, sequentially by the 
appropriate ICommitLogExecutorService. This service also controls 
syncing the active memory mapped segments. There are two sync 
strategies available: periodic and batched. The periodic simply calls 
sync periodically by asynchronously putting appropriate sync task into 
the queue, inbetween the LogRecordAdder tasks. The LogRecordAdder 
tasks are "done" as soon as they are written to the log, so the caller 
*won't wait* for the sync. On the other hand, the batched strategy 
(BatchCommitLogExecutorService), performs the tasks in batches, each 
batch finished with a sync operation. The tasks are marked as done 
*after* the sync operation is finished. This deferred task marking is 
achieved thanks to the CheaterFutureTask class - allowing the task to run 
without immediately marking the FutureTask as done. Nice. :)


4. The serialized size of the RowMutation object is calculated twice: 
once before submitting to the ExecutorService - to detect if it is not 
larger than the segment size, and then after being taken from the 
queue for execution - to check if it fits into the active 
CommitLogSegment, and if it doesn't, to activate a new 
CommitLogSegment. Looks to me like a point needing optimisation. I 
couldn't find any code for caching the serialized size to avoid doing 
it twice.


5. The serialization, CRC calculation and actual commit log writes are 
happening sequentially. The aim of this ticket is to make it parallel.
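The double size calculation flagged in point 4 could be avoided by memoizing the size on the mutation the first time it is requested; a sketch with invented names, not the actual RowMutation/CommitLog classes:

```java
// Sketch: cache the serialized size so the pre-queue bounds check and the
// post-queue segment-fit check both reuse one computation.
public class SizedMutation {
    private final byte[] payload;  // stands in for the RowMutation contents
    private long cachedSize = -1;  // -1 means "not computed yet"

    public SizedMutation(byte[] payload) {
        this.payload = payload;
    }

    long computeSerializedSize() {
        // Expensive in the real code; trivial here: payload + header + CRC.
        return payload.length + 8 + 4;
    }

    public long serializedSize() {
        if (cachedSize < 0) {
            cachedSize = computeSerializedSize();
        }
        return cachedSize;  // subsequent callers pay nothing
    }
}
```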


Questions:
1. What happens to the recovery, if the power goes off before the log 
has been synced, and it has been written partially (e.g. it is 
truncated in the middle of the RowMutation data)? Are incomplete 
RowMutation writes detected only by means of CRC (CommitLog around 
lines 237-240), or is there some other mechanism for it?


2. Is the CommitLog#add method allowed to do some heavier 
computations? What is the contract for it? Does it have to return 
immediately or can I move some code into it?


Solutions I consider (please comment):

1. Moving the serialized size calculation, serialization and CRC 
calculation totally before the executor service queue, so that these 
operations would be parallel, and performed once per RowMutation 
object. The calculated size / data array / CRC value would be appended 
to the task and put into the queue. Then copying that into the commit 
log would proceed sequentially - the task would contain only code for 
log writing. This is the safest and easiest solution, but also the 
least performant, because copying is still sequential and still might 
be a bottleneck. The logic of allocating new commit log segments and 
syncing remains unchanged.


2. Moving the serialized size calculation, serialization, CRC 
calculation *and commit log writing* before the executor service 
queue. This raises immediately some problems / questions:
a) The code for segment allocation needs to be changed, as it becomes 
multithreaded. It can be done using AtomicInteger.compareAndSet, so 
that each RowMutation gets its own, non-overlapping piece of commit 
log to write into.
b) What happens if there is not enough free space in the current 
active segment? Do we allow more active segments at once? Or do we 
restrict the parallelism to writing just into a single active segment 
(I don't like it, as it would certainly be less performant, because 
we would have to wait for finishing the current active
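The compareAndSet allocation described in point (a) above could look like this (illustrative only, not Cassandra source): writers race to advance a shared position, and each winner owns an exclusive, non-overlapping slice of the segment to serialize into.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of lock-free allocation of non-overlapping regions inside one
// commit log segment.
public class SegmentAllocator {
    private final int capacity;
    private final AtomicInteger position = new AtomicInteger();

    public SegmentAllocator(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of an exclusive [offset, offset + size)
     *  region, or -1 if the mutation does not fit and a new active
     *  segment must be opened. */
    public int allocate(int size) {
        while (true) {
            int current = position.get();
            int next = current + size;
            if (next > capacity) {
                return -1;                           // doesn't fit
            }
            if (position.compareAndSet(current, next)) {
                return current;                      // won the race
            }
            // Another writer moved the position; retry with the new value.
        }
    }
}
```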

Re: 1.1 freeze approaching

2011-12-20 Thread Jeremiah Jordan
Unless you need the new features, you don't need to upgrade.  And the current 
version won't stop getting updates.  As mentioned in the thread where the 
project moved to a 4-month major version cycle, smaller changes between major 
versions mean they will be more stable.  Even with the shorter cycle it takes 
a bit to get all the kinks out of new features (see all the 1.0.x releases); if 
the release cycle is longer, you have even longer to wait for the new release 
to really be stable, like you did with 0.7, where it took 5-6 months.

-Jeremiah

On Dec 20, 2011, at 12:45 AM, Radim Kolar wrote:

> 
>> Just a reminder that for us to meet our four-month major release
>> schedule (i.e., 1.1 = Feb 18),
> can you make the release cycle slower? It's better to have more new features and
> do major upgrades less often. It saves time needed for testing and migrations.



Re: FYI Cassandra Git Repo on Apache servers

2012-01-11 Thread Jeremiah Jordan

Bring it up here: https://issues.apache.org/jira/browse/INFRA

On 01/10/2012 11:40 PM, Dave Brosius wrote:

Greetings,

   I wanted to mention this to folks who may be running into this 
issue. A user on IRC reported that cloning the cassandra repo on the 
apache servers


http://git-wip-us.apache.org/repos/asf/cassandra.git

fails with error

error: RPC failed; result=22, HTTP code = 417

Obviously most folks do not have this issue.

417 is a server's response to a request that includes an expectation 
header specifying level of service that the server can not fulfill.


After some discovery it looks as if the user in question was affected 
by a squid proxy, and that 
was injecting the Expect header, based on the output from curl -v 
http://git-wip-us.apache.org/repos/asf/cassandra.git


Clones from github for him do not fail, which I infer to mean that the 
github servers are more modern than the apache ones.


Not sure if there is anything that can be done in the apache 
infrastructure to address this.


dave





RE: Queries on AuthN and AuthZ for multi tenant Cassandra

2012-02-02 Thread Jeremiah Jordan
10-15 KS should be fine.  The issue is when you want to have hundreds or
thousands of KS/CF.

-Jeremiah

-Original Message-
From: Subrahmanya Harve [mailto:subrahmanyaha...@gmail.com] 
Sent: Thursday, February 02, 2012 1:43 AM
To: dev@cassandra.apache.org
Subject: Re: Queries on AuthN and AuthZ for multi tenant Cassandra

Thanks for the response Aaron.

We do not anticipate more than 10-15 tenants on the cluster. Even if one
does decide to create one KS per tenant, there is the problem of variable
loads on the KS's. I went through this link
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
which does promise better memory management. I did have two more questions -
- Was the new memory management written taking into account a situation of
many KS's? (In other words, did multi-tenancy influence the re-design of
memory management?)
- I know that users trying out multi-tenancy generally recommend not
to create many KS's/CF's, but I am wondering if there is any documentation
for why this happens, or details on the negative impact on
memory/performance? And are there any performance benchmarks available
for Cassandra 1.0 clusters with many KS's?


On Wed, Feb 1, 2012 at 12:11 PM, aaron morton
wrote:

> The existing authentication plug-in does not support row level
> authorization.
>
> You will need to add authentication to your API layer to ensure that a
> request from client X always has the client X key prefix. Or modify
> cassandra to provide row level authentication.
>
> The 1.x Memtable memory management is awesome, but I would still be
> hesitant about creating KS's and CF's at the request of an API client.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/02/2012, at 8:52 AM, Subrahmanya Harve wrote:
>
> > We are using Cassandra 0.8.7 and building a multi-tenant cassandra
> platform
> > where we have a common KS and common CFs for all tenants. By using
> Hector's
> > virtual keyspaces, we are able to modify rowkeys to have a tenant
> > specific id. (Note that we do not allow tenants to modify/create KS/CF.
> > We just allow tenants to write and read data.) However we are in the
> > process of adding authentication and authorization on top of this
> > platform such that no tenant should be able to retrieve data belonging
> > to any other tenant.
> >
> > By configuring Cassandra for security using the documentation here -
> > http://www.datastax.com/docs/0.8/configuration/authentication , we were
> > able to apply the security constraints on the common keyspace and common
> > CFs. However this does not prevent a tenant from retrieving data
> > belonging to another tenant. For this to happen, we would need to have
> > separate CFs and/or keyspaces for each tenant.
> > Looking for more information on the topic here
> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Re-Multi-tenancy-and-authentication-and-authorization-td5935230.html
> > and other places, it looks like the recommendation is "not" to create
> > separate CFs and KSs for every tenant as this would have impacts on
> > Memtables and other memory issues. Does this recommendation still hold
> > good?
> > With jiras like
> > https://issues.apache.org/jira/browse/CASSANDRA-2006 resolved, does it
> > mean we can now create multiple (but limited) CFs and KSs?
> > More generally, how do we prevent a tenant from intentional/accidental
> > data manipulation of data owned by another tenant? (given that all
> > tenants will provide the right credentials)
>
>


RE: RFC: Cassandra Virtual Nodes

2012-03-20 Thread Jeremiah Jordan
So taking a step back, if we want "vnodes" why can't we just give every node 
100 tokens instead of only one?  Seems to me this would have less impact on the 
rest of the code.  It would just look like you had a 500 node cluster, instead 
of a 5 node cluster.  Your replication strategy would have to know about the 
physical machines so that data gets replicated right, but there is already some 
concept of this with the data center aware and rack aware stuff.

From what I see I think you could get most of the benefits of vnodes by 
implementing a new Placement Strategy that did something like this, and you 
wouldn't have to touch (and maybe break) code in other places.

Am I crazy? Naive?

Once you had this setup, you could start to implement the vnode-like stuff on top 
of it, like bootstrapping nodes one token at a time, and taking tokens on 
from the whole cluster, not just your neighbor, etc.
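The lookup side of that idea can be sketched in a few lines (illustrative names, not a real placement strategy): each physical node claims many tokens on one ring, so placement sees a (nodes x tokens-per-node) cluster, and a key's primary replica falls out of an ordinary successor search.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: a ring where each physical node owns many tokens; the
// partitioner just sees a much larger "virtual" cluster.
public class MultiTokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node, long[] tokens) {
        for (long t : tokens) {
            ring.put(t, node);
        }
    }

    // Successor lookup: first token >= the key's token, wrapping around.
    public String primaryReplica(long keyToken) {
        Map.Entry<Long, String> e = ring.ceilingEntry(keyToken);
        return e != null ? e.getValue() : ring.firstEntry().getValue();
    }
}
```

A real replication strategy would additionally have to skip tokens belonging to the same physical machine when choosing the remaining RF-1 replicas, much as the rack/DC-aware strategies already do.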

-Jeremiah Jordan


From: Rick Branson [rbran...@datastax.com]
Sent: Monday, March 19, 2012 5:16 PM
To: dev@cassandra.apache.org
Subject: Re: RFC: Cassandra Virtual Nodes

I think if we could go back and rebuild Cassandra from scratch, vnodes
would likely be implemented from the beginning. However, I'm concerned that
implementing them now could be a big distraction from more productive uses
of all of our time and introduce major potential stability issues into what
is becoming a business critical piece of infrastructure for many people.
However, instead of just complaining and pedantry, I'd like to offer a
feasible alternative:

Has there been consideration given to the idea of a supporting a single
token range for a node?

While not theoretically as capable as vnodes, it seems to me to be more
practical as it would have a significantly lower impact on the codebase and
provides a much clearer migration path. It also seems to solve a majority
of complaints regarding operational issues with Cassandra clusters.

Each node would have a lower and an upper token, which would form a range
that would be actively distributed via gossip. Read and replication
requests would only be routed to a replica when the key of these operations
matched the replica's token range in the gossip tables. Each node would
locally store its own current active token range as well as a target token
range it is "moving" towards.

As a new node undergoes bootstrap, the bounds would be gradually expanded
to allow it to handle requests for a wider range of the keyspace as it
moves towards its target token range. This idea boils down to a move from
hard cutovers to smoother operations by gradually adjusting active token
ranges over a period of time. It would apply to token change operations
(nodetool 'move' and 'removetoken') as well.

Failure during streaming could be recovered at the bounds instead of
restarting the whole process as the active bounds would effectively track
the progress for bootstrap & target token changes. Implicitly these
operations would be throttled to some degree. Node repair (AES) could also
be modified using the same overall ideas to provide a more gradual impact on
the cluster overall, similar to the ideas given in CASSANDRA-3721.

While this doesn't spread the load over the cluster for these operations
evenly like vnodes does, this is likely an issue that could be worked
around by performing concurrent (throttled) bootstrap & node repair (AES)
operations. It does allow some kind of "active" load balancing, but clearly
this is not as flexible or as useful as vnodes, but you should be using
RandomPartitioner or sort-of-randomized keys with OPP right? ;)

As a side note: vnodes fail to provide solutions to node-based limitations
that seem to me to cause a substantial portion of operational issues such
as impact of node restarts / upgrades, GC and compaction induced latency. I
think some progress could be made here by allowing a "pack" of independent
Cassandra nodes to be ran on a single host; somewhat (but nowhere near
entirely) similar to a pre-fork model used by some UNIX-based servers.

Input?

--
Rick Branson
DataStax


Re: Document storage

2012-03-28 Thread Jeremiah Jordan
Sounds interesting to me.  I looked into adding protocol buffer support at one 
point, and it didn't look like it would be too much work.  The tricky part was 
I also wanted to add indexing support for attributes of the inserted protocol 
buffers.  That looked a little trickier, but still not impossible.  Though 
other stuff came up and I never got around to actually writing any code.
JSON support would be nice, especially if you figured out how to get built in 
indexing of the attributes inside the JSON to work =).

-Jeremiah

On Mar 28, 2012, at 10:58 AM, Ben McCann wrote:

> Hi,
> 
> I was wondering if it would be interesting to add some type of
> document-oriented data type.
> 
> I've found it somewhat awkward to store document-oriented data in Cassandra
> today.  I can make a JSON/Protobuf/Thrift, serialize it, and store it, but
> Cassandra cannot differentiate it from any other string or byte array.
> However, if my column validation_class could be a JsonType that would
> allow tools to potentially do more interesting introspection on the column
> value.  E.g. bug 3647
> calls for
> supporting arbitrarily nested "documents" in CQL.  Running a
> query against the JSON column in Pig is possible as well, but again in this
> use case it would be helpful to be able to encode in column metadata that
> the column is stored as JSON.  For debugging, running nightly reports, etc.
> it would be quite useful compared to the opaque string and byte array types
> we have today.  JSON is appealing because it would be easy to implement.
> Something like Thrift or Protocol Buffers would actually be interesting
> since they would be more space efficient.  However, they would also be a
> bit more difficult to implement because of the extra typing information
> they provide.  I'm hoping with Cassandra 1.0's addition of compression that
> storing JSON is not too inefficient.
> 
> Would there be interest in adding a JsonType?  I could look at putting a
> patch together.
> 
> Thanks,
> Ben



RE: Document storage

2012-03-29 Thread Jeremiah Jordan
It's not clear what 3647 actually is; there is no code attached, and no real 
example in it.

Aside from that, the reason this would be useful to me (if we could get 
indexing of attributes working) is that I already have my data in 
JSON/Thrift/ProtoBuf. Depending on how large the data is, it isn't trivial to 
break it up into columns to insert, and re-assemble from columns to read.  
Also, until we get multiple slice range reads, I can't read two different 
structures out of one row without getting all the other stuff between them, 
unless there are only two columns and I read them using column names, not slices.

As it is right now I have to maintain custom indexes on all my attributes to be 
able to put ProtoBufs into columns and get some searching on them.  It would 
be nice if I could drop all my custom indexing code and just tell Cassandra: 
hey, index column.attr1.subattr2.
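Indexing a dotted path like that amounts to flattening the nested document into path-keyed attributes that a secondary index could key on; a sketch of the flattening step (invented names, not an existing Cassandra API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: flatten a nested document into dotted attribute paths, the
// shape an index on "column.attr1.subattr2" would key on.
public class AttributeFlattener {
    @SuppressWarnings("unchecked")
    public static Map<String, Object> flatten(String prefix, Map<String, ?> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : doc.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Object v = e.getValue();
            if (v instanceof Map) {
                out.putAll(flatten(path, (Map<String, ?>) v)); // recurse into sub-documents
            } else {
                out.put(path, v); // leaf attribute becomes an indexable key
            }
        }
        return out;
    }
}
```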

-Jeremiah

From: Jake Luciani [jak...@gmail.com]
Sent: Thursday, March 29, 2012 7:44 AM
To: dev@cassandra.apache.org
Subject: Re: Document storage

Is there a reason you would prefer a JSONType over CASSANDRA-3647?  It
would seem the only thing a JSON type offers you is validation.  3647 takes
it much further by deconstructing a JSON document using composite columns
to flatten the document out, with the ability to access and update portions
of the document (as well as reconstruct it).

On Wed, Mar 28, 2012 at 11:58 AM, Ben McCann  wrote:

> Hi,
>
> I was wondering if it would be interesting to add some type of
> document-oriented data type.
>
> I've found it somewhat awkward to store document-oriented data in Cassandra
> today.  I can make a JSON/Protobuf/Thrift, serialize it, and store it, but
> Cassandra cannot differentiate it from any other string or byte array.
>  However, if my column validation_class could be a JsonType that would
> allow tools to potentially do more interesting introspection on the column
> value.  E.g. bug 3647
> calls for
> supporting arbitrarily nested "documents" in CQL.  Running a
> query against the JSON column in Pig is possible as well, but again in this
> use case it would be helpful to be able to encode in column metadata that
> the column is stored as JSON.  For debugging, running nightly reports, etc.
> it would be quite useful compared to the opaque string and byte array types
> we have today.  JSON is appealing because it would be easy to implement.
>  Something like Thrift or Protocol Buffers would actually be interesting
> since they would be more space efficient.  However, they would also be a
> bit more difficult to implement because of the extra typing information
> they provide.  I'm hoping with Cassandra 1.0's addition of compression that
> storing JSON is not too inefficient.
>
> Would there be interest in adding a JsonType?  I could look at putting a
> patch together.
>
> Thanks,
> Ben
>



--
http://twitter.com/tjake


RE: Document storage

2012-03-29 Thread Jeremiah Jordan
It's not clear what 3647 actually is; there is no code attached and no real
example in it.

Aside from that, the reason this would be useful to me (if we could get
indexing of attributes working) is that I already have my data in
JSON/Thrift/ProtoBuf, and depending on how large the data is, it isn't trivial
to break it up into columns to insert and re-assemble from columns on read.
Also, until we get multiple slice range reads, I can't read two different
structures out of one row without getting all the other stuff between them,
unless there are only two columns and I read them using column names, not
slices.

As it is right now I have to maintain custom indexes on all my attributes to be 
able to put ProtoBuff into



From: Jake Luciani [jak...@gmail.com]
Sent: Thursday, March 29, 2012 7:44 AM
To: dev@cassandra.apache.org
Subject: Re: Document storage

Is there a reason you would prefer a JSONType over CASSANDRA-3647?  It
would seem the only thing a JSON type offers you is validation.  3647 takes
it much further by deconstructing a JSON document using composite columns
to flatten the document out, with the ability to access and update portions
of the document (as well as reconstruct it).

On Wed, Mar 28, 2012 at 11:58 AM, Ben McCann  wrote:

> Hi,
>
> I was wondering if it would be interesting to add some type of
> document-oriented data type.
>
> I've found it somewhat awkward to store document-oriented data in Cassandra
> today.  I can make a JSON/Protobuf/Thrift, serialize it, and store it, but
> Cassandra cannot differentiate it from any other string or byte array.
>  However, if my column validation_class could be a JsonType that would
> allow tools to potentially do more interesting introspection on the column
> value.  E.g. bug 3647 calls for supporting arbitrarily nested "documents"
> in CQL.  Running a
> query against the JSON column in Pig is possible as well, but again in this
> use case it would be helpful to be able to encode in column metadata that
> the column is stored as JSON.  For debugging, running nightly reports, etc.
> it would be quite useful compared to the opaque string and byte array types
> we have today.  JSON is appealing because it would be easy to implement.
>  Something like Thrift or Protocol Buffers would actually be interesting
> since they would be more space efficient.  However, they would also be a
> bit more difficult to implement because of the extra typing information
> they provide.  I'm hoping with Cassandra 1.0's addition of compression that
> storing JSON is not too inefficient.
>
> Would there be interest in adding a JsonType?  I could look at putting a
> patch together.
>
> Thanks,
> Ben
>



--
http://twitter.com/tjake


RE: Document storage

2012-03-29 Thread Jeremiah Jordan
But it isn't special case logic.  The current AbstractType and Indexing of 
Abstract types for the most part would already support this.  Someone just has 
to write the code for JSONType or ProtoBuffType.

The problem isn't writing the code to break objects up; the problem is
encode/decode time.  Encode/decode to Thrift is already a significant portion
of the write timeline, and adding object-to-column encode/decode on top of
that makes it even longer.  For a read-heavy load that wants the JSON/Proto as
the thing to be served to clients, an increase in the write timeline to
parse/index the blob is probably acceptable, so that you don't have to pay the
re-assembly penalty every time you hit the database for that object.

But once we get multi-range slicing, I think the break-it-up-into-columns
approach will be best for the average case.  That is the other problem I have
with doing the break-into-columns thing right now: I either have to use Super
Columns and not be able to index (so why did I break them up?), or I can't get
multiple objects at once without pulling a huge slice from o1 start to o5 end
and then throwing away the majority of the data I pulled back that doesn't
belong to o1 or o5.
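The slice-waste problem described here can be made concrete with a toy model of a row (sorted column names, and only a single contiguous slice available — names invented for illustration):

```python
from bisect import bisect_left, bisect_right

# A toy row: column names sorted lexically, as Cassandra stores them.
# Five objects o1..o5, each broken into three attribute columns.
row = sorted(f"o{i}.attr{j}" for i in range(1, 6) for j in range(3))

def slice_range(row, start, end):
    """A single contiguous slice — the only read shape available
    without multi-range slicing."""
    return row[bisect_left(row, start):bisect_right(row, end + "~")]

# Reading objects o1 and o5 with one slice drags along o2..o4 as well:
fetched = slice_range(row, "o1", "o5")
wanted = [c for c in fetched if c.startswith(("o1.", "o5."))]
assert len(fetched) == 15
assert len(wanted) == 6   # 9 of the 15 columns fetched are thrown away
```

With multi-range slicing, the same read would be two narrow slices (o1.* and o5.*) and nothing in between would cross the wire.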

-Jeremiah


From: Jonathan Ellis [jbel...@gmail.com]
Sent: Thursday, March 29, 2012 11:23 AM
To: dev@cassandra.apache.org
Subject: Re: Document storage

On Thu, Mar 29, 2012 at 9:57 AM, Jeremiah Jordan
 wrote:
> It's not clear what 3647 actually is, there is no code attached, and no real
> example in it.
>
> Aside from that, the reason this would be useful to me (if we could get
> indexing of attributes working), is that I already have my data in
> JSON/Thrift/ProtoBuf; depending on how large the data is, it isn't trivial to
> break it up into columns to insert, and re-assemble into columns to read.

I don't understand the problem.  Assuming Cassandra support for maps
and lists, I could write a Python module that takes json (or thrift,
or protobuf) objects and splits them into Cassandra rows by fields in
a couple hours.  I'm pretty sure this is essentially what Brian's REST
api for Cassandra does now.

I think this is a much better approach because that gives you the
ability to update or retrieve just parts of objects efficiently,
rather than making column values just blobs with a bunch of special
case logic to introspect them.  Which feels like a big step backwards
to me.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: [Discuss] Repair inside C*

2024-10-21 Thread Jeremiah Jordan
 I love the idea of a repair service being there by default for an install
of C*.  My main concern here is that it is putting more services into the
main database process.  I actually think we should be looking at how we can
move things out of the database process.  The C* process being a giant
monolith has always been a pain point.  Is there any way it makes sense for
this to be an external process rather than a new thread pool inside the C*
process?

-Jeremiah Jordan

On Oct 18, 2024 at 2:58:15 PM, Mick Semb Wever  wrote:

>
> This is looking strong, thanks Jaydeep.
>
> I would suggest folk take a look at the design doc and the PR in the CEP.
> A lot is there (that I have completely missed).
>
> I would especially ask all authors of prior art (Reaper, DSE nodesync,
> ecchronos)  to take a final review of the proposal
>
> Jaydeep, can we ask for a two week window while we reach out to these
> people ?  There's a lot of prior art in this space, and it feels like we're
> in a good place now where it's clear this has legs and we can use that to
> bring folk in and make sure there's no remaining blindspots.
>
>
> On Fri, 18 Oct 2024 at 01:40, Jaydeep Chovatia 
> wrote:
>
>> Sorry, there is a typo in the CEP-37 link; here is the correct link
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution>
>>
>>
>> On Thu, Oct 17, 2024 at 4:36 PM Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> First, thank you for your patience while we strengthened the CEP-37.
>>>
>>>
>>> Over the last eight months, Chris Lohfink, Andy Tolbert, Josh McKenzie,
>>> Dinesh Joshi, Kristijonas Zalys, and I have done tons of work (online
>>> discussions/a dedicated Slack channel #cassandra-repair-scheduling-cep37)
>>> to come up with the best possible design that not only significantly
>>> simplifies repair operations but also includes the most common features
>>> that everyone will benefit from running at Scale.
>>>
>>> For example,
>>>
>>>-
>>>
>>>Apache Cassandra must be capable of running multiple repair types,
>>>such as Full, Incremental, Paxos, and Preview - so the framework
>>>should be easily extendable with no additional overhead from the
>>>operator’s point of view.
>>>-
>>>
>>>An easy way to extend the token-split calculation algorithm with a
>>>default implementation should exist.
>>>-
>>>
>>>Running incremental repair reliably at Scale is pretty challenging,
>>>so we need to place safeguards, such as migration/rollback w/o restart
>>>and stopping incremental repair automatically if the disk is about to
>>>get full.
>>>
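>>> For illustration, a default token-split implementation of the kind
>>> mentioned above could simply divide a numeric token range into equal
>>> contiguous subranges — a toy sketch with invented names, ignoring
>>> Murmur3 wraparound:

```python
def split_range(start, end, parts):
    """Split (start, end] into `parts` contiguous subranges of roughly
    equal token span (Murmur3 wraparound ignored for brevity)."""
    span = end - start
    bounds = [start + span * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

splits = split_range(0, 100, 4)
assert splits == [(0, 25), (25, 50), (50, 75), (75, 100)]
assert splits[0][1] == splits[1][0]  # subranges are contiguous
```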
>>> We are glad to inform you that CEP-37 (i.e., Repair inside Cassandra) is
>>> now officially ready for review after multiple rounds of design, testing,
>>> code reviews, documentation reviews, and, more importantly, validation that
>>> it runs at Scale!
>>>
>>>
>>> Some facts about CEP-37.
>>>
>>>-
>>>
>>>Multiple members have verified all aspects of CEP-37 numerous times.
>>>-
>>>
>>>The design proposed in CEP-37 has been thoroughly tried and tested
>>>on an immense scale (hundreds of unique Cassandra clusters, tens of
>>>thousands of Cassandra nodes, with tens of millions of QPS) on top of 4.1
>>>open-source for more than five years; please see more details here
>>>
>>> <https://www.uber.com/en-US/blog/how-uber-optimized-cassandra-operations-at-scale/>
>>>.
>>>-
>>>
>>>The following presentation
>>>
>>> <https://docs.google.com/presentation/d/1Zilww9c7LihHULk_ckErI2s4XbObxjWknKqRtbvHyZc/edit#slide=id.g30a4fd4fcf7_0_13>
>>>highlights the rigor applied to CEP-37 and was given during last
>>>week’s Apache Cassandra Bay Area Meetup
>>><https://www.meetup.com/apache-cassandra-bay-area/events/303469006/>,
>>>
>>>
>>> Since things are massively overhauled, we believe it is almost ready for
>>> a final pass pre-VOTE. We would like you to please review the CEP-37
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution>
>>> and the associated detailed design doc
>>> <https://docs.google.com/document/d/1CJWxjEi-m

Re: [DISCUSS] Chronicle Queue's development model and a hypothetical replacement of the library

2024-09-18 Thread Jeremiah Jordan
> When it comes to alternatives, what about logback + slf4j? It has
> appenders where we want, it is sync / async, we can code some nio appender
> too I guess, it logs it as text into a file so we do not need any special
> tooling to review that. For tailing which Chronicle also offers, I guess
> "tail -f that.log" just does the job? logback even rolls the files after
> they are big enough so it rolls the files the same way after some
> configured period / size as Chronicle does (It even compresses the logs).
>

Yes it was considered.  The whole point was to have a binary log because
serialization to/from (remember replay is part off this) text explodes the
size on disk and in memory as well as the processing time required and does
not meet the timing requirements of fqltool.

-Jeremiah


Re: Status of CEP-1

2024-10-01 Thread Jeremiah Jordan
 I don’t really have an opinion on re-writing the existing one vs closing
that and making a new one.
But I do think we should have some CEP describing the "1.0 shippable
version" of the Sidecar that is being proposed; then it can have a VOTE
thread, and there will be no issues voting the release meets the CEP once
it is ready.

-Jeremiah

On Oct 1, 2024 at 7:58:41 AM, Josh McKenzie  wrote:

> CEP-1 is still completely relevant and we could send an update
>
> CEP-1 feels really fat compared to all our other CEP's. When you need a
> table to enumerate all the subsets of things you're going to do with
> something so you can keep track of progress... it might be too large. :D
>
> If we think we can navigate that, I definitely won't stand in the way. But
> given that the people actively working on it aren't the original authors
> and the shepherd's inactive, ISTM a reboot would be cleaner.
>
> On Mon, Sep 30, 2024, at 8:36 PM, Dinesh Joshi wrote:
>
> CEP-1 is still completely relevant and we could send an update but as it
> stands right now we’ve made a ton of progress and would like to focus on
> getting to a release so it’s real for the community.
>
> On Mon, Sep 30, 2024 at 5:31 PM Patrick McFadin 
> wrote:
>
> There are two easy choices.
>
> 1 - Re-furbish CEP-1 and start a [DISCUSS] thread
> 2 - Close out CEP-1 and Propose something fresh and start a [DISCUSS]
> Thread on that.
>
> Do you think there is enough in CEP-1 to keep moving with or is it
> completely wrong?
>
> Patrick
>
> On Mon, Sep 30, 2024 at 4:53 PM Francisco Guerrero 
> wrote:
>
> Hi folks,
>
> I feel I need to update the status of CEP-1 as it currently stands.
> For context, the Cassandra Sidecar project has had a steady flow of
> contributions in the past couple of years. And there is a steady stream
> of upcoming contributions, i.e live migration (CEP-40), CDC (CEP-44),
> and many others. However, I believe we need to address one issue
> with CEP-1; and that is its scope.
> The scope of CEP-1 is too broad, and I would like to propose either
> closing on CEP-1 or rescoping it. We have a Sidecar now, it's part of
> the foundation, and AFAIK we've pretty much satisfied the 2 goals of
> CEP-1 which are listed as "extensible and passes the curl test" and
> "provides basic but essential and useful functionality".
> CEP-1 was discussed and consensus was achieved in 2018 after
> a lot of discussion[4]. CEP-1 contributed to the foundation of the CEP
> process. Several JIRAs have been opened and active contribution is
> happening in the subproject.
> We are getting close to proposing the first release of Sidecar, pending
> some trivial fixes needed in the configuration and build
> processes[1][2][3];
> as well as CASSANDRASC-141[5] which will bring authn/authz into Sidecar.
> Once
> we close on CASSANDRASC-141, Sidecar will be ready for the 1.0 release.
> Any new major feature to Sidecar would go through the regular CEP process.
> Cassandra’s Sidecar usage is not restricted to the Analytics library,
> however
> it does support this use case at the moment. I will not touch on vnode
> support in Cassandra Analytics as it deserves its own separate discussion.
> We're excited to invite you to a talk on Cassandra Sidecar at the Community
> Over Code next week. Join us as we explore the current features and share
> what’s on the horizon for Sidecar.
>
> Looking forward to hearing your thoughts on this proposal.
> Best,
> ⁃ Francisco
> [1] https://issues.apache.org/jira/browse/CASSANDRASC-120
> [2] https://issues.apache.org/jira/browse/CASSANDRASC-121
> [3] https://issues.apache.org/jira/browse/CASSANDRASC-122
> [4] https://lists.apache.org/thread/xyg8n5hkt7xrfqv48k91tx1jwp0pvcpw
> [5] https://issues.apache.org/jira/browse/CASSANDRASC-141
>
>
>
>


Re: [DISCUSS] Secondary Indexes and Single-Partition Reads

2024-10-01 Thread Jeremiah Jordan
 Did we add new metrics for index queries?  The only issue I see is that
this change will mix index queries into the regular read metrics, where
before they were in the range metrics, so maybe some changes to metrics
should go with it.  But I think this is a good change over all.

On Oct 1, 2024 at 1:51:10 PM, Jon Haddad  wrote:

> This seems like it's strictly a win.  Doesn't sound to me like a flag is
> needed.
>
> On Tue, Oct 1, 2024 at 2:44 PM Caleb Rackliffe 
> wrote:
>
>> > (Higher rate of mismatches requiring a second full read? Why would 2i
>> be more likely?)
>>
>> Right, I don't see any reason they should be more likely to actuate
>> read-repair than slice queries are today...
>>
>> Didn't mention this above, but I'd obviously be open to having a system
>> property that switches this behavior.
>>
>> On Tue, Oct 1, 2024 at 12:43 PM Jeff Jirsa  wrote:
>>
>>>
>>>
>>> > On Oct 1, 2024, at 10:28 AM, Caleb Rackliffe 
>>> wrote:
>>> >
>>> > Hello fellow secondary index enjoyers!
>>> >
>>> > If you're familiar with index queries, you probably know that they are
>>> treated as range reads no matter what. This is true even if the user query
>>> restricts results to a single partition. This means that they bypass the
>>> digest read process that normal single-partition reads do.
>>>
>>> TIL.
>>>
>>> >
>>> > While I don't think this is something that we need to consider for
>>> 5.0, I would be very interested in the next major release being able to use
>>> proper single-partition reads for partition-restricted index queries,
>>> allowing them to take advantage of digest reads. (If single partition slice
>>> queries do it, why not index queries?)
>>>
>>> This seems like an obvious yes, so reverse the question - is there any
>>> reason why we WOULDNT want to do this?
>>>
>>> (Higher rate of mismatches requiring a second full read? Why would 2i be
>>> more likely?)
>>>
>>>


Re: [DISCUSS] CEP-31 negotiated authentication

2024-12-04 Thread Jeremiah Jordan
I think you are talking about the SASL handshake?  The authenticator can 
override the SASL handler for the connections. The negotiating authenticator 
just needs to implement that override?  You can then implement the flow you 
mentioned.

At DataStax we do basically exactly that to support negotiated authentication. 
As you suggest the “default” mechanism is also specified in the config for the 
negotiating authenticator such that you can have it fall back to the “current” 
mechanism for existing clients that don’t try to negotiate.  Which means it can 
be seamlessly enabled.
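The negotiating-authenticator shape described here can be sketched abstractly (invented names, not DataStax's or Cassandra's actual classes): the authenticator keeps a registry of mechanism-specific negotiators and falls back to a configured default for clients that never negotiate.

```python
class NegotiatingAuthenticator:
    """Toy sketch of a negotiating authenticator: dispatch to a
    mechanism-specific negotiator, falling back to a configured default
    for legacy clients that don't negotiate.  Names are illustrative."""

    def __init__(self, mechanisms, default):
        self.mechanisms = mechanisms  # mechanism name -> negotiator factory
        self.default = default

    def new_sasl_negotiator(self, requested=None):
        # Unknown or absent requests get the "current" default mechanism,
        # which is what makes enabling this seamless for existing clients.
        name = requested if requested in self.mechanisms else self.default
        return self.mechanisms[name]()

auth = NegotiatingAuthenticator(
    {"PASSWORD": lambda: "password-negotiator",
     "KERBEROS": lambda: "kerberos-negotiator"},
    default="PASSWORD")

assert auth.new_sasl_negotiator("KERBEROS") == "kerberos-negotiator"
# A legacy client that never negotiates gets the default mechanism:
assert auth.new_sasl_negotiator() == "password-negotiator"
```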

-Jeremiah

> On Dec 4, 2024, at 5:27 PM, Joel Shepherd  wrote:
> 
> A negotiating authenticator is appealing, but I'm concerned that it doesn't 
> have a good migration story. If a client has not been configured with a 
> "negotiating provider" before it attempts to connect to a node with a 
> negotiating authenticator, the results will be unpredictable. Today, the 
> AUTHENTICATE message names the node-selected authenticator for the client, 
> but there is no requirement that the client validate that it can work with 
> the authenticator before its own auth provider/authenticator generates its 
> initial AUTH_RESPONSE. From what I can tell, most don't validate. There is 
> not a way today at the protocol level for the client to tell the node "I 
> can't work with the authenticator you've specified: let's try a different 
> one." The node can't communicate that to the client either. Once a node 
> switches over to a negotiating authenticator, many clients will assume that 
> they can authenticate using the same mechanism that they always have. This 
> would require the node's authenticator to heuristically determine how the 
> client is trying to authenticate in order to continue the handshake. That 
> seems unreliable, but I believe without that then switching nodes to a 
> negotiating authenticator will either require downtime (to switch the clients 
> as well) or result in client authentication failures.
> 
> If the negotiation is done up-front via the OPTIONS, SUPPORTED and START 
> messages, I believe it can be done in a way that enables clients and/or nodes 
> to authenticate as they do today if needed. For example, clients that do not 
> initiate connections with an OPTIONS message and that do not select an auth 
> method via their START message can be assumed to not support negotiation, and 
> the node can use today's existing authentication mechanism with the client. 
> Similarly, nodes that do not specify available authenticators through their 
> SUPPORTED response can be assumed to not support negotiation and the client 
> can use today's authentication mechanism without negotiation.
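The up-front negotiation described in the quoted message amounts to intersecting the client's preference list with the node's advertised mechanisms — a rough sketch (hypothetical helper, not actual protocol code):

```python
def negotiate(client_prefs, server_supported):
    """Pick the first mutually supported mechanism, honoring client
    preference order; None signals both sides should fall back to
    today's non-negotiated handshake."""
    for mech in client_prefs:
        if mech in server_supported:
            return mech
    return None

assert negotiate(["KERBEROS", "PASSWORD"], {"PASSWORD", "MTLS"}) == "PASSWORD"
# A client or node that advertises nothing falls back to the status quo:
assert negotiate([], {"PASSWORD"}) is None
```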
> 
> Given that, I don't believe introducing a new negotiating authenticator is 
> the best path forward.
> 
> If it would help, I can provide documentation on the proposed protocol-based 
> mechanism: it may be unclear here.
> 
> Thanks -- Joel.
> 
>> On 12/3/2024 5:34 PM, J. D. Jordan wrote:
>> I think you can implement this as a single authenticator that has separate 
>> configuration of the supported mechanisms. So the single authenticator 
>> maintained is the “negotiating authenticator” which can proxy off to which 
>> ever other mechanisms you want.
>> 
>> On Dec 3, 2024, at 6:37 PM, Joel Shepherd  wrote:
>>> 
>>> I'm interested, at least in a more narrowly-scoped subset of CEP-31: 
>>> authentication negotiation only, configured via YAML (not dynamically), 
>>> with CQL integration, proxy authorization, multiple role managers and new 
>>> authn mechanisms out of scope.
>>> 
>>> I've started working through Derek's proposal in 
>>> https://issues.apache.org/jira/browse/CASSANDRA-11471 , to use the 
>>> OPTIONS/SUPPORTED exchange to start the negotiation, and continue it by 
>>> extending STARTUP to optionally include the client's preferred 
>>> authentication mechanism. I believe this can be done in a way that is 
>>> compatible (i.e., maintains the status quo) for clients and/or nodes that 
>>> aren't negotiation-aware. Having such a mechanism in place would make it 
>>> much safer to roll out new authenticators, which is something else I'm 
>>> interested in.
>>> 
>>> This is looking like a more invasive change on the Cassandra core side, 
>>> however. If I'm reading things correctly, the DatabaseDescriptor maintains 
>>> a single authenticator across all clients. Negotiation would be much more 
>>> useful if different clients could use different node-supported 
>>> authentication mechanisms: e.g., automated clients could use mTLS and apps 
>>> for humans could use Kerberos, both against a single node. This means that 
>>> authenticator needs to be pushed down to connection- or session-level, 
>>> which will affect everything from the daemon startup code to the 
>>> authentication workflow. That's not a reason not to do it, but it is a 
>>> little invasive. Maybe I'm overlooking a better way.
>>> 
>>> If time allows, I'll put to

Re: [DISCUSS] NOT_NULL constraint vs STRICTLY_NOT_NULL constraint

2025-02-10 Thread Jeremiah Jordan
 Having thought about this in the past, some options that have come up in
those discussions were:

   1. Constraints forcing users to always specify a value for a given
   column or all columns.  Only allow NOT NULL for columns with such a
   constraint applied.
   2. Similar to the above but only requiring that for INSERT, letting
   UPDATE be “user beware”.
   3. Forcing a read before write for all cases where it is not specified.
  1. You have to consider some problem cases here with optimizing
  this.  If you want to only do the check on the replica, you need to
  correctly handle the case where the value only exists on some
  replicas and not others.
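Option 1 could be sketched as a write-time check along these lines (a toy validator with invented names, not the actual constraints framework): an INSERT must both include and give a non-null value for every NOT NULL column, which closes the unspecified-column hole.

```python
class ConstraintViolation(Exception):
    pass

def validate_insert(row, not_null_columns):
    """Toy sketch of option 1: reject an INSERT that either omits a
    NOT NULL column or explicitly sets it to null."""
    for col in not_null_columns:
        if col not in row:
            raise ConstraintViolation(f"column '{col}' must be specified")
        if row[col] is None:
            raise ConstraintViolation(f"column '{col}' must not be null")

validate_insert({"id": 1, "val": "text"}, {"val"})  # passes
# Both the explicit-null and the omitted-column writes are rejected:
for bad in ({"id": 1, "val": None}, {"id": 1}):
    try:
        validate_insert(bad, {"val"})
        raised = False
    except ConstraintViolation:
        raised = True
    assert raised
```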


I do think any implementation of NOT NULL that has a way to let NULL in is
bad.  So I would be -1 on the proposal here that lets through INSERTs that
don’t specify the column (also I would be -1 on the option 2 above, but I
included it as something I have discussed with others in the past).

-Jeremiah

On Feb 10, 2025 at 9:27:52 AM, Bernardo Botella <
conta...@bernardobotella.com> wrote:

> I will create a Jira to keep track of that “NO VERIFY” suggestion. For
> this thread, I’d like to stick to the actual proposal for both NOT_NULL and
> STRICTLY_NOT_NULL constraints Stefan and I are adding on the patch.
>
>
> On Feb 10, 2025, at 7:18 AM, Benedict  wrote:
>
> Thanks. While I agree we shouldn’t be applying these constraints post hoc
> on read or compaction, I think we need to make clear to the user whether we
> are validating a new constraint before accepting it for alter table. Which
> is to say I think alter table should require something like “NO VERIFY” or
> some other additional keywords to make clear we aren’t checking the
> constraint applies to existing data.
>
>
> On 10 Feb 2025, at 15:10, Bernardo Botella 
> wrote:
>
> Hi. This was a topic we discussed during the ML thread:
> lists.apache.org
>
> Here was one of my answers on that:
> lists.apache.org
>
> It was also specified in the CEP (
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-42%3A+Constraints+Framework#CEP42:ConstraintsFramework-Constraintexecutionatwritetime
> ):
> "Note: This constraints are only enforced at write time. So, an ALTER
> CONSTRAINT with more restrictive constraints shouldn’t affect preexisting
> data.”
>
> Long story short, constraints are only checked at write time. If a
> constraint is added to a table with preexisting offending data, that data
> stays untouched.
>
> I hope this helps,
> Bernardo
>
> On Feb 10, 2025, at 7:00 AM, Benedict  wrote:
>
> This is counterintuitive to me. The constraint should be applied to the
> table, not to the update. NOT NULL should imply a value is always specified.
>
> How are you handling this for tables that already exist? Can we alter
> table to add constraints, and if so what are the semantics?
>
> On 10 Feb 2025, at 14:50, Bernardo Botella 
> wrote:
>
> Hi everyone,
>
> Stefan Miklosovic and I have been working on a NOT_NULL (
> https://github.com/apache/cassandra/pull/3867) constraint to be added to
> the constraints tool belt, and a really interesting conversation came up.
>
> First, as a problem statement, let's consider this:
>
> -
> CREATE TABLE ks.tb2 (
>   id int,
>   cl1 int,
>   cl2 int,
>   val text CHECK NOT_NULL(val),
>   PRIMARY KEY (id, cl1, cl2)
> )
>
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3,
> null);
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Column value does not satisfy value constraint for column 'val' as
> it is null."
>
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3,
> “text");
> cassandra@cqlsh> select * from ks.tb2;
>
> id | cl1 | cl2 | val
> ----+-----+-----+------
> 1 |   2 |   3 | text
>
> (1 rows)
> cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
> cassandra@cqlsh> select * from ks.tb2;
>
> id | cl1 | cl2 | val
> ----+-----+-----+------
> 1 |   2 |   3 | text
> 1 |   2 |   4 | null
>
> -
>
> As you see, we have a hole in which a 'null' value is getting written on
> column val even if we have a NOT_NULL on that particular column whenever
> the column is NOT specified on the write. That raises the question on how
> this particular constraint should behave.
>
> If we consider the other constraints (scalar constraint and length
> constraint so far), this particular behavior is fine. But, if the
> constraint is NOT_NULL, then it becomes a little bit trickier.
>
> The conc

Re: [DISCUSS] Default Selection of 2i

2025-02-06 Thread Jeremiah Jordan
 Rather than changing the default, I would be +1 to making a system
property so that an operator who knows what they are doing could change
it.  I am a little hesitant about changing it outright in a patch release.



On Feb 6, 2025 at 1:10:28 PM, Caleb Rackliffe 
wrote:

> Hey everyone!
>
> I'll keep this short. SASI and later SAI, in lieu of anything resembling a
> query planner, have always just greedily returned a min long from
> Index#getEstimatedResultRows(), thereby stealing the right to be used to
> execute the query even when a legacy 2i is present on the relevant columns.
> If we have a user that needs to migrate away from a legacy 2i, this seems
> like the exact opposite of what we want to do by default. I want to propose
> that we invert this behavior, and have legacy 2i continue to serve queries
> instead of new SAI indexes by default until they are dropped.
>
> Note that we'll have more options around this when CASSANDRA-18112
>  lands, but for
> now the change in default seems valuable.
>
> Thoughts?
>


Re: Meaningless emptiness and filtering

2025-02-11 Thread Jeremiah Jordan
 AFAIK this EMPTY stuff goes back to Thrift days.  We let people insert
these zero-length values back then, so we have to support those zero-length
values existing forever :/.

How useful is such a distinction?  I don’t know.  Is anybody actually doing
this?  Well Andres brought up
https://issues.apache.org/jira/browse/CASSANDRA-20313 as a problem because
we had an end user create an SAI index on a column which contained EMPTY
values in it.  So people are inserting these into the database.  Would they
expect to be able to query by EMPTY?  I do not know.

This is the first I have heard of the “isEmptyValueMeaningless” setting.
The meaning of EMPTY to me has always been the same for an Integer or a
String, “this column has a value of no value” vs NULL which means "this
column is not set/has no value”.  If we truly want to follow the spirit of
that setting, then maybe we should be converting such values into a
tombstone / NULL up front when deserializing them, rather than storing the
EMPTY byte buffer in the DB?
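That up-front conversion could look roughly like this (a toy sketch with an illustrative per-type flag; the real AbstractType hierarchy and storage engine are more involved):

```python
EMPTY = b""

# Per-type flag mirroring the spirit of AbstractType#isEmptyValueMeaningless;
# illustrative values only, not Cassandra's full type system.
EMPTY_MEANINGLESS = {"int": True, "text": False}

def normalize(cql_type, value):
    """Turn an EMPTY buffer into None (i.e. a tombstone) for types where
    empty means nothing, while preserving it for types like text where
    the empty string is a real value."""
    if value == EMPTY and EMPTY_MEANINGLESS[cql_type]:
        return None
    return value

assert normalize("int", EMPTY) is None   # empty int behaves like null
assert normalize("text", EMPTY) == b""   # empty string stays a value
```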

Anyway, I am kind of rambling here.  I am of two minds.
I can see that this does seem like a silly distinction to have for some
types, so maybe we should just decide that in a CQL world, EMPTY means NULL
for some types, and actually just make that a tombstone.  Maybe 6.0 would
be a good major version change to make such a “breaking” behavior change in.

I can also see the “don’t screw up the legacy apps” use case.  Everything
besides SAI, including the table based 2i and ALLOW FILTERING, treat EMPTY
as a distinct value which can be inserted and queried on.  We have supported
it in the past, so we should continue to support it into the future, even
if it is painful to do.

Flip a coin and I can argue either side.  So I would love to hear others’
thoughts to convince me one way or the other.

-Jeremiah



On Feb 11, 2025 at 12:55:35 PM, Caleb Rackliffe 
wrote:

> The case where allowsEmpty == true AND is meaningless == true is
> especially confusing. If I could design this from scratch, I would reject
> writes and filtering on EMPTY values for int and the other types where
> meaningless == true. (In other words, if we allow EMPTY, it is meaningful
> and queryable. If we don't, it isn't.) That avoids problems that can't have
> anything other than an arbitrary solution, like what we do with < and > for
> EMPTY for int. When we add IS [NOT] NULL support, that would preferably NOT
> match EMPTY values for the types where empty means something, like strings.
> For everything else, EMPTY could be equivalent to null and match IS NULL.
>
> The only real way to make SAI compatible with the current behavior is to
> add something like a special postings list to its data structures that
> corresponds to the rows where the indexed column value is EMPTY.
>
> On Tue, Feb 11, 2025 at 12:21 PM David Capwell  wrote:
>
>> Bringing this discussion to dev@ rather than Slack as we try to figure
>> out CASSANDRA-20313 and CASSANDRA-19461.
>>
>> In the type system, we have 2 different (but related) methods:
>>
>> AbstractType#allowsEmpty- if the user gives empty
>> bytes (new byte[0]) will the type reject it
>> AbstractType#isEmptyValueMeaningless  - if the user gives empty bytes,
>> should this be handled like null?
>>
>> In practice, there are 2 cases that matter:
>>
>> allowsEmpty = true AND is meaningless = false - stuff like text and bytes
>> allowsEmpty = true AND is meaningless = true  - many types, example "int"
>>
>> What this means is that users are able to use empty bytes when writing to
>> these types, but this leads to complexity in the filter path, and is
>> something we are trying to flesh out the “correct” semantics for SAI.
>>
>> Simple example:
>>
>> {code}
>>
>> @Test
>> public void test() throws IOException
>> {
>>     try (Cluster cluster = Cluster.build(1).start())
>>     {
>>         init(cluster);
>>         cluster.schemaChange(withKeyspace("CREATE TABLE %s.tbl (pk int primary key, v int)"));
>>         IInvokableInstance node = cluster.get(1);
>>         for (int i = 0; i < 10; i++)
>>             node.executeInternal(withKeyspace("INSERT INTO %s.tbl (pk, v) VALUES (?, ?)"), i, ByteBufferUtil.EMPTY_BYTE_BUFFER);
>>
>>         var qr = node.executeInternalWithResult(withKeyspace("SELECT * FROM %s.tbl WHERE v=? ALLOW FILTERING"), ByteBufferUtil.EMPTY_BYTE_BUFFER);
>>         StringBuilder sb = new StringBuilder();
>>         sb.append(qr.names());
>>         while (qr.hasNext())
>>         {
>>             var next = qr.next();
>>             sb.append('\n').append(next);
>>         }
>>         System.out.println(sb);
>>     }
>> }
>>
>> {code}
>>
>> “Should” this return 10 rows or 0?  In this case the type is int, and
>> int defines empty as meaningless, which means it should act as null; yet
>> this query returns 10 rows, which violates CQL semantics, as foo = null
>> evaluates to false.
>>
>> Right now there really isn’t a way to query for NULL (CASSANDRA-10715 is
>> still open), but 
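Under the semantics argued for above — empty bytes acting as null for types where empty is meaningless — the query in the example would return 0 rows. A minimal standalone sketch of that equality rule (this is illustrative only; the method name and logic are assumptions, not Cassandra's actual filter code):

```java
import java.nio.ByteBuffer;

// Illustrative sketch: how an equality filter could honor
// isEmptyValueMeaningless. Names and logic are assumptions,
// not Cassandra's actual implementation.
public class EmptySemantics
{
    static final ByteBuffer EMPTY = ByteBuffer.allocate(0);

    // For a type where empty is meaningless (e.g. int), empty bytes act
    // as null, and in CQL `v = null` evaluates to false for every row.
    static boolean equalsUnderType(ByteBuffer stored, ByteBuffer queried, boolean emptyValueMeaningless)
    {
        boolean storedIsNull = emptyValueMeaningless && !stored.hasRemaining();
        boolean queriedIsNull = emptyValueMeaningless && !queried.hasRemaining();
        if (storedIsNull || queriedIsNull)
            return false; // null compares equal to nothing, not even null
        return stored.equals(queried);
    }

    public static void main(String[] args)
    {
        // int-like type: the SELECT in the example should match 0 of the 10 rows
        System.out.println(equalsUnderType(EMPTY, EMPTY, true));  // false
        // text/bytes-like type: empty is a legitimate value, so it matches
        System.out.println(equalsUnderType(EMPTY, EMPTY, false)); // true
    }
}
```

This also shows why text and bytes (allowsEmpty = true, meaningless = false) keep their current behavior: empty there is a real value and participates in equality normally.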

Re: [DISCUSS] Experimental flagging (fork from Re-evaluate compaction defaults in 5.1/trunk)

2024-12-10 Thread Jeremiah Jordan
 I agree with Aleksey and Patrick.  We should define terminology and then
stick to it.  My preferred list would be:


   1. Preview - Ready to be tried by end users but has caveats and most
   likely is not api stable.
   2. Beta - Feature complete/API stable but has not had enough testing to
   be considered rock solid.
   3. GA - Ready for use, no known issue, PMC is satisfied with the testing
   that has been done


Whether or not something is enabled by default or is the default
implementation is a separate question from its readiness.  Though if we are
replacing an existing thing with a new default I would hope we apply extra
rigor to allowing that to happen.

-Jeremiah

On Dec 10, 2024 at 11:15:37 AM, Patrick McFadin  wrote:

> I'm going to try to pull this back from the inevitable bikeshedding
> and airing of grievances that happen. Rewind all the way back to
> Josh's  original point, which is a defined process. Why I really love
> this being brought up is our maturing process of communicating to the
> larger user base. The dev list has very few participants. Less than
> 1000 last I looked. Most users I talk to just want to know what they
> are getting. Well-formed, clear communication is how the PMC can let
> end users know that a new feature is one of three states:
>
> 1. Beta
> 2. Generally Available
> 3. Default (where appropriate)
>
> Yes! The work is just sorting out what each level means and then
> codifying that in confluence. Then, we look at any features that are
> under question, assign a level, and determine what it takes to go from
> one state to another.
>
> The CEPs need to reflect this change. What makes a Beta, GA, Default
> for new feature X. It makes it clear for implementers and end users,
> which is an important feature of project maturity.
>
> Patrick
>


On Dec 10, 2024 at 5:46:38 AM, Aleksey Yeshchenko  wrote:

> What we’ve done is we’ve overloaded the term ‘experimental’ to mean too
> many related but different ideas. We need additional, more specific
> terminology to disambiguate.
>
> 1. Labelling released features that were known to be unstable at release
> as ‘experimental’  retroactively shouldn’t happen and AFAIK only happened
> once, with MVs, and ‘experimental’ there was just a euphemism for ‘broken’.
> Our practices are more mature now, I like to think, that a situation like
> this would not arise in the future - the bar for releasing a completed
> marketable feature is higher. So the label ‘experimental’ should not be
> applied retroactively to anything.
>
> 2. It’s possible that a released, once considered production-ready
> feature, might be discovered to be deeply flawed after being released
> already. We need to temporarily mark such a feature as ‘broken' or
> ‘flawed'. Not experimental, and not even ‘unstable’. Make sure we emit a
> warning on its use everywhere, and, if possible, make it opt-in in the next
> major, at the very least, to prevent new uses of it. Announce on dev, add a
> note in NEWS.txt, etc. If the flaws are later addressed, remove the label.
> Removing the feature itself might not be possible, but should be
> considered, with heavy advanced telegraphing to the community.
>
> 3. There is probably room for genuine use of ‘experimental’ as a feature
> label. For opt-in features that we commit with an understanding that they
> might not make it at all. Unstable API is implied here, but a feature can
> also have an unstable API without being experimental - so ‘experimental'
> doesn’t equal to ‘api-unstable’. These should not be relied on by any
> production code, they would be heavily gated by unambiguous configuration
> flags, disabled by default, allowed to be removed or changed in any version
> including a minor one.
>
> 4. New features without known flaws, intended to be production-ready and
> marketable eventually, that we may want to gain some real-world confidence
> with before we are happy to market or make default. UCS, for example, which
> seems to be in heavy use in Astra and doesn’t have any known open issues
> (AFAIK). It’s not experimental, it’s not unstable, it’s not ‘alpha’ or
> ‘beta’, it just hasn't been widely enough used to have gained a lot of
> confidence. It’s just new. I’m not sure what label even applies here. It’s
> just a regular feature that happens to be new, doesn’t need a label, just
> needs to see some widespread use before we can make it a default. No other
> limitation on its use.
>
> 5. Early-integrated, not-yet fully-completed features that are NOT
> experimental in nature. Isolated, gated behind deep configuration flags.
> Have a CEP behind them, we trust that they will be eventually completed,
> but for pragmatic reasons it just made sense to commit them at an earlier
> stage. ‘Preview’, ‘alpha’, ‘beta’ are labels that could apply here
> depending on current feature readiness status. API-instability is implied.
> Once finished they just become a regular new feature, no flag needed, no
> heavy config gating needed.
>

Re: [DISCUSS] 5.1 should be 6.0

2024-12-10 Thread Jeremiah Jordan
 The question is if we are signaling compatibility or purely marketing with
the release number.
We dropped compatibility with a few things in 5.0, which was the reason for
the .0 rather than 4.2.  I don’t know if we are breaking any compatibility
with current trunk?  Though maybe some of the TCM stuff could be considered
that.
If we are purely going for marketing value, then yes, I agree TCM+Accord
would be 6.0 worthy.

-Jeremiah

On Dec 10, 2024 at 10:48:21 AM, Jon Haddad  wrote:

> Keeping this short.  I'm not sure why we're calling the next release 5.1.
> TCM and Accord are a massive thing.  Other .1 / .2 releases were the .0
> with some smaller things added.  Imo this is a huge step forward, as big as
> 5.0 was, so we should call it 6.0.
>
>
>


Re: [DISCUSS] 5.1 should be 6.0

2024-12-10 Thread Jeremiah Jordan
res,
>>> even if they’re not meant to be disruptive.
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Dec 10, 2024, at 9:46 AM, Josh McKenzie 
>>> wrote:
>>> > >>
>>> > >> Currently we reserve MAJOR in semver changes for API breaking only:
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199530302#Patching,versioning,andLTSreleases-Versioningandtargeting
>>> :
>>> > >>
>>> > >> That's consistent w/semver itself: link:
>>> > >>
>>> > >> Given a version number MAJOR.MINOR.PATCH, increment the:
>>> > >>
>>> > >> MAJOR version when you make incompatible API changes
>>> > >> MINOR version when you add functionality in a backward compatible
>>> manner
>>> > >> PATCH version when you make backward compatible bug fixes
>>> > >>
>>> > >>
>>> > >> So absolute literal "correctness" of what we're doing aside, our
>>> version numbers mean something to us as a dev community but also mean
>>> something to Cassandra users. I'm not confident they mean the same thing to
>>> each constituency. I'm also not comfortable with us prioritizing our own
>>> version number needs over that of our users, should they differ in meaning.
>>> > >>
>>> > >> Does anybody have insight into how other well known widely adopted
>>> projects do things we might be able to learn from? I generally only think
>>> about this topic when a discussion like this comes up on our dev list so
>>> don't have much insight to bring to the discussion.
>>> > >>
>>> > >> On Tue, Dec 10, 2024, at 11:52 AM, Jeremiah Jordan wrote:
>>> > >>
>>> > >> The question is if we are signaling compatibility or purely
>>> marketing with the release number.
>>> > >> We dropped compatibility with a few things in 5.0, which was the
>>> reason for the .0 rather than 4.2.  I don’t know if we are breaking any
>>> compatibility with current trunk?  Though maybe some of the TCM stuff could
>>> be considered that.
>>> > >> If we are purely going for marketing value, then yes, I agree
>>> TCM+Accord would be 6.0 worthy.
>>> > >>
>>> > >> -Jeremiah
>>> > >>
>>> > >> On Dec 10, 2024 at 10:48:21 AM, Jon Haddad 
>>> wrote:
>>> > >>
>>> > >> Keeping this short.  I'm not sure why we're calling the next
>>> release 5.1.  TCM and Accord are a massive thing.  Other .1 / .2 releases
>>> were the .0 with some smaller things added.  Imo this is a huge step
>>> forward, as big as 5.0 was, so we should call it 6.0.
>>> > >>
>>> > >>
>>> >
>>>
>>>
>>>


Re: [DISCUSS] 5.1 should be 6.0

2024-12-12 Thread Jeremiah Jordan
 My expectation is that in trunk SCM CASSANDRA_4 would change to SCM
CASSANDRA_5.  I think we should be striving to support full
downgrade/rollback ability to the previous major version from trunk.
With TCM I would expect that when running in CASSANDRA_5 mode that
initializing TCM would not be possible, as once initialized you could no
longer roll back.
Do we have no way to support the gossip paths continuing to work prior to
initializing TCM?

-Jeremiah

On Dec 11, 2024 at 7:41:48 AM, Sam Tunnicliffe  wrote:

> My point is that the upgrade to 5.1/6.0 isn't really complete until the
> CMS is initialised and this can't be done while running with SCM
> CASSANDRA_4 because of the messaging service limitation. Until that point,
> schema changes & node replacements are not supported which affects how long
> a bake time is tolerable.
> This specific issue could probably be fixed by revisiting the SCM
> implementation in 5.1/6.0, so we should certainly do that but the fact
> remains that we don't have great test coverage to indicate how clusters
> behave when running in SCM for a prolonged period.
>
> Thanks,
> Sam
>
> On 11 Dec 2024, at 13:29, Brandon Williams  wrote:
>
>
> On Wed, Dec 11, 2024 at 7:22 AM Sam Tunnicliffe  wrote:
>
> >
>
> > so running in any SCM mode for a prolonged period is not really viable.
>
>
> This is what many users want to do though, upgrade one DC and let it
>
> bake to see how it goes before continuing.  I don't think that's
>
> unreasonable, but from working on CASSANDRA-20118 I know how difficult
>
> that is already.  I don't think we've built enough SCM muscle yet to
>
> think about handling multiple previous versions.
>
>
> Kind Regards,
>
> Brandon
>
>
>


Re: Supporting 2.2 -> 5.0 upgrades

2024-12-12 Thread Jeremiah Jordan
>
> TL;DR - in progress migration off 2.2 to 5.0 is annoying as there were
>> different bugs in the past we have to support again.  Out of process
>> migration to me feels far more plausible, but feels annoying without
>> splitting off our reader/writer… doable… just more annoying…
>
>
This is your main blocker for the 5.0/trunk code converting 2.2 sstables
correctly.  You would have to bring back a bunch of code paths special
casing those versions and dealing with bugs in them.  If you are actually
interested in figuring out an offline 2.2 to 5.0 path I would recommend you
do it in two steps.  Offline 2.2 to 4.1, or whatever the latest version is
that can sstableupgrade a 2.2 sstable, and then offline the result of that
with 5.0.  But at that point maybe you just do it online in those two steps.


On Dec 12, 2024 at 11:34:31 AM, Štefan Miklošovič 
wrote:

> I think it does not make a lot of sense to get away from Ant unless we
> split it into more jars. Splitting it into more jars while moving away from
> Ant at the same time is just too much work. So, what is the point of having
> monolithic cassandra-all in Gradle / Maven? Smoother release? We mastered
> that already. We are releasing, aren't we? Dependencies? That is working
> too. Sure cache and all the "hacks" would go away overnight but otherwise
> ... I think modularising it first so it is easier to reuse and so on is
> more important.
>
> On Thu, Dec 12, 2024 at 6:21 PM David Capwell  wrote:
>
>> but I still find myself very rarely interacting with ant
>>
>>
>> I think that is where most people are as not many actually maintain or
>> modify ant… there are so many things that bug me (lack of cache, making
>> sure new people use the right version (was totally fun to learn 4.1 didn’t
>> build with the default ant that got installed when I was helping out a new
>> hire… we had to downgrade… woohoo….), hand rolled IDE integration, etc…),
>> but I would disagree that ant is a blocker for different jars… we could
>> switch off ant for Makefile, or bash, and we would still be able to produce
>> new jars… Its more of… how many people actually feel comfortable enough to
>> alter our build system to make such a change?  I don’t…
>>
>> Our build system does not impact our ability to offer migration from 2.2
>> to 5.0… so don’t want to keep distracting this thread…
>>
>> TL;DR - in progress migration off 2.2 to 5.0 is annoying as there were
>> different bugs in the past we have to support again.  Out of process
>> migration to me feels far more plausible, but feels annoying without
>> splitting off our reader/writer… doable… just more annoying...
>>
>> On Dec 12, 2024, at 9:04 AM, Alex Petrov  wrote:
>>
>> > I have for a while advocated for a shared lib to also share between
>> Harry, accord, dtests etc
>>
>> Big +1 for a shared lib for our concurrency and test utils. Been
>> intending to start working on this for a while now, but never got to do
>> this so far.
>>
>> On Thu, Dec 12, 2024, at 5:58 PM, Benedict wrote:
>>
>>
>> Why would ant get in the way? We already build multiple jars, and accord
>> will be a submodule. We have far more organisational issues to overcome
>> than ant.
>>
>> I have for a while advocated for a shared lib to also share between
>> Harry, accord, dtests etc
>>
>> I am however not 100% sure about splitting read/write path, at least not
>> as first posited. The idea of maintaining it as an API for dropping in
>> different jars is a whole other world of potential pain I don’t want to
>> countenance. Supporting eg bulk readers or writers or other integrations
>> seems pretty feasible though.
>>
>>
>> On 12 Dec 2024, at 16:53, Paulo Motta  wrote:
>>
>> 
>> >  I think that will not happen until we are out of Ant as doing this
>> multi jar / subproject mumbo jumbo is not too much appealing to ... anybody?
>>
>> This is a contentious/controversial topic, but the more I work with
>> gradle the more I lean towards ant's simplicity. That said, I'd support
>> moving away if it becomes a technical blocker to break up cassandra-all -
>> and if this happen I would vote for maven as replacement. :-D
>>
>> On Thu, Dec 12, 2024 at 11:42 AM Miklosovic, Stefan via dev <
>> dev@cassandra.apache.org> wrote:
>>
>> These are all good ideas but in practical terms I think that will not
>> happen until we are out of Ant as doing this multi jar / subproject mumbo
>> jumbo is not too much appealing to ... anybody?
>>
>> 
>> From: Paulo Motta 
>> Sent: Thursday, December 12, 2024 17:35
>> To: dev@cassandra.apache.org
>> Subject: Re: Supporting 2.2 -> 5.0 upgrades
>>
>> EXTERNAL EMAIL - USE CAUTION when clicking links or attachments
>>
>>
>>
>> >  +1 on moving the read/write logic into its own jar.
>>
>> +1, not only read-write logic but anything used by both the server and
>> subprojects (ie. cassandra-sidecar), for example JMX Mbeans and other
>> interfaces.
>>
>> I think one way to do that would be to split cassandra-a

Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Jeremiah Jordan
>
> On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe 
> wrote:
>
>> You mean like to control the tokenization/analysis of query terms?
>>
>
Yes.  Elastic for example lets you specify the query time analyzer in the
query, overriding what is specified at the index level.

https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html#specify-search-query-analyzer

On Dec 20, 2024 at 5:37:58 PM, Caleb Rackliffe 
wrote:

> So that would look something like...
>
> SELECT ... FROM ... WHERE ... WITH OPTIONS = { 'exclude_indexes' :
> [<index1>, <index2>] }
>

Yeah something like that would work.

> On Fri, Dec 20, 2024 at 5:36 PM Caleb Rackliffe 
> wrote:
>
>> You mean like to control the tokenization/analysis of query terms?
>>
>> On Fri, Dec 20, 2024 at 4:38 PM Jeremiah Jordan <
>> jeremiah.jor...@gmail.com> wrote:
>>
>>> Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}.  If we
>>> move into allowing analysis/tokenization on indexed items, then a more
>>> general WITH OPTIONS would be useful for that too.  That would let us add
>>> any other new options to a SELECT without needing to modify the grammar
>>> further.
>>>
>>> -Jeremiah
>>>
>>> On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe 
>>> wrote:
>>>
>>>> Some of you are probably familiar with work in the DS fork to improve
>>>> the selection of indexes for SAI queries in
>>>> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424
>>>> .
>>>>
>>>> While I'm eagerly anticipating working on that in the new year, I'm
>>>> also wondering whether we think some simple CQL extensions to manually
>>>> control index selection would be helpful. Maxwell proposed this a while
>>>> back in CASSANDRA-18112, and I'd like to propose a syntax:
>>>>
>>>>
>>>> ex. Do not use the specified index during the query.
>>>>
>>>> SELECT ... FROM ... WHERE ... WITHOUT INDEX <index_name>
>>>>
>>>> This could be helpful for intersection queries where one of the
>>>> provided clauses is not very selective and could simply be handled via
>>>> post-filtering.
>>>>
>>>> ex. Require the specified index to be used.
>>>>
>>>> SELECT ... FROM ... WHERE ... WITH INDEX <index_name>
>>>>
>>>> This could be helpful in scenarios where multiple indexes exist on a
>>>> column and was the primary motivation for CASSANDRA-18112.
>>>>
>>>> Thoughts?
>>>>
>>>


Re: [DISCUSS] Index selection syntax for CASSANDRA-18112

2024-12-20 Thread Jeremiah Jordan
 Rather than WITH INDEX/WITHOUT INDEX what about WITH OPTIONS {}.  If we
move into allowing analysis/tokenization on indexed items, then a more
general WITH OPTIONS would be useful for that too.  That would let us add
any other new options to a SELECT without needing to modify the grammar
further.

-Jeremiah

On Dec 20, 2024 at 2:28:58 PM, Caleb Rackliffe 
wrote:

> Some of you are probably familiar with work in the DS fork to improve the
> selection of indexes for SAI queries in
> https://github.com/datastax/cassandra/commit/eeb33dd62b9b74ecf818a263fd73dbe6714b0df0#diff-2830028723b7f4af5ec7450fae2c206aeefa5a2c3455eff6f4a0734a85cb5424
> .
>
> While I'm eagerly anticipating working on that in the new year, I'm also
> wondering whether we think some simple CQL extensions to manually control
> index selection would be helpful. Maxwell proposed this a while back
> in CASSANDRA-18112, and I'd like to propose a syntax:
>
>
> ex. Do not use the specified index during the query.
>
> SELECT ... FROM ... WHERE ... WITHOUT INDEX <index_name>
>
> This could be helpful for intersection queries where one of the provided
> clauses is not very selective and could simply be handled via
> post-filtering.
>
> ex. Require the specified index to be used.
>
> SELECT ... FROM ... WHERE ... WITH INDEX <index_name>
>
> This could be helpful in scenarios where multiple indexes exist on a
> column and was the primary motivation for CASSANDRA-18112.
>
> Thoughts?
>


Re: [DISCUSS] Snapshots outside of Cassandra data directory

2025-01-23 Thread Jeremiah Jordan
For commit log archiving we already have the concept of “commands” to be
executed.  Maybe a similar concept would be useful for snapshots?  Maybe a
new “user snapshot with command” nodetool action could be added.  The
server would make its usual hard links inside a snapshot folder and then it
could shell off a new process running the “snapshot archiving command”
passing it the directory just made.  Then what ever logic wanted could be
implemented in the command script.  Be that copying to S3, or copying to a
folder on another mount point, or whatever the operator wants to happen.

-Jeremiah

On Jan 23, 2025 at 7:54:20 AM, Štefan Miklošovič 
wrote:

> Interesting, I will need to think about it more. Thanks for chiming in.
>
> On Wed, Jan 22, 2025 at 8:10 PM Blake Eggleston 
> wrote:
>
>> Somewhat tangential, but I’d like to see Cassandra provide a backup story
>> that doesn’t involve making copies of sstables. They’re constantly
>> rewritten by compaction, and intelligent backup systems often need to be
>> able to read sstable metadata to optimize storage usage.
>>
>> An interface purpose built to support incremental backup and restore
>> would almost definitely be more efficient since it could account for
>> compaction, and would separate operational requirements from storage layer
>> implementation details.
>>
>> On Jan 22, 2025, at 2:33 AM, Štefan Miklošovič 
>> wrote:
>>
>>
>>
>> On Wed, Jan 22, 2025 at 2:21 AM James Berragan 
>> wrote:
>>
>>> I think this is an idea worth exploring, my guess is that even if the
>>> scope is confined to just "copy if not exists" it would still largely be
>>> used as a cloud-agnostic backup/restore solution, and so will be shaped
>>> accordingly.
>>>
>>> Some thoughts:
>>>
>>> - I think it would be worth exploring more what the directory structure
>>> looks like. You mention a flat directory hierarchy, but it seems to me it
>>> would need to be delimited by node (or token range) in some way as the
>>> SSTable identifier will not be unique across the cluster. If we do need to
>>> delimit by node, is the configuration burden then on the user to mount
>>> individual drives to S3/Azure/wherever to unique per node paths? What do
>>> they do in the event of a host replacement, backup to a new empty
>>> directory?
>>>
>>
>> It will be unique when "uuid_sstable_identifiers_enabled: true", even
>> across the cluster. If we worked with "old identifiers" too, these are
>> indeed not unique (even across different tables in the same node). I am not
>> completely sure how far we want to go with this, I don't have a problem
>> saying that we support this feature only with
>> "uuid_sstable_identifiers_enabled: true". If we were to support the older
>> SSTable identifier naming as well, that would complicate it more. Esop's
>> directory structure of a remote destination is here:
>>
>>
>> https://github.com/instaclustr/esop?tab=readme-ov-file#directory-structure-of-a-remote-destination
>>
>> and how the content of the snapshot's manifest looks just below it.
>>
>> We may go with hierarchical structure as well if this is evaluated to be
>> a better approach. I just find flat hierarchy simpler. We can not have flat
>> hierarchy with old / non-unique identifiers so we would need to find a way
>> how to differentiate one SSTable from another, which naturally leads to
>> them being placed in keyspace/table/sstable hierarchy but I do not want to
>> complicate it more by having flat and non-flat hierarchies supported
>> simultaneously (where a user could pick which one he wants). We should go
>> just with one solution.
>>
>> When it comes to node replacement, I think that it would be just up to an
>> operator to rename the whole directory to reflect a new path for that
>> particular node. Imagine an operator has a bucket in Azure which is empty
>> (/) and it is mounted to /mnt/nfs/cassandra in every node. Then on node1,
>> Cassandra would automatically start to put SSTables into
>> /mnt/azure/cassandra/cluster-name/dc-name/node-id-1 and node 2 would put
>> that into /mnt/nfs/cassandra/cluster-name/dc-name/node-id-2.
>>
>> The part of "cluster-name/dc-name/node-id" would be automatically done by
>> Cassandra itself. It would just append it to /mnt/nfs/cassandra under which
>> a bucket be mounted.
>>
>> If you replaced the node, data would stay, it would just change node's
>> ID. In that case, all that would need to be necessary would be to rename
>> "node-id-1" directory to "node-id-3" (id-3 being a host id of the replaced
>> node). Snapshot manifest does not know anything about host id so content of
>> the manifest would not need to be changed. If you don't rename the node id
>> directory, then snapshots would be indeed made under a new host id
>> directory which would be empty at first.
>>
>>
>>> - The challenge often with restore is restoring from snapshots created
>>> before a cluster topology change (node replacements, token moves,
>>> cluster expansions/shrinks etc). This could be solved by

Re: [DISCUSS] 5.1 should be 6.0

2025-01-29 Thread Jeremiah Jordan
On Jan 29, 2025 at 3:32:13 PM, Josh McKenzie  wrote:

> My opinion is that it would be valuable to take this discussion as a
> forcing function to determine how we plan to handle releases broadly to
> answer the "5.1 should be 6.0" question. Assuming we move away from ad hoc
> per-release debate. If there's broad strong dissent (i.e. let's have 6.0 be
> the next major and talk about this topic separately) I'm happy to open
> another thread, but I didn't see clear consensus on this thread yet and was
> trying to help drive to that.
>

We are 30 messages deep into this thread.  So I am guessing we have lost a
lot of people from it already.  If you want a broad discussion, I would
still suggest a new thread where the subject line is the broad subject you
want to discuss …


Re: [DISCUSS] 5.1 should be 6.0

2025-01-29 Thread Jeremiah Jordan
 This got way off topic from 5.1 should be 6.0, so maybe there should be a
new DISCUSS thread with the correct title to have a discussion around
codifying our upgrade paths?

FWIW this mostly agrees with my thoughts around upgrade support.

T-2 online upgrade supported, T-1 API compatible, deprecate-then-remove is
> a combination of 3 simple things that I think will improve this situation
> greatly and hopefully put a nail in the coffin of the topic, improve
> things, and let us move on to more interesting topics that we can then
> re-litigate endlessly. ;)
>
>
Depending on what “T-2” means for the online upgrade.  If you mean 4.0,
4.1, and 5.0 are all online upgrade supported versions for trunk, then I
agree.  If you mean only 4.1 and 5.0 would be online upgrade targets, I
would suggest we change that to T-3 so you encompass all “currently
supported” releases at the time the new branch is GAed.

-Jeremiah

On Jan 29, 2025 at 10:49:17 AM, Josh McKenzie  wrote:

> To clarify, when I say unspoken it includes "not consciously considered
> but shapes engagement patterns". I don't think there's people sitting
> around deeply against either the status quo or my proposal who are holding
> back for nefarious purposes or anything.
>
> And yeah - my goal is to try and put a little more energy into this to see
> if we can surface pushback as I don't think it'd be appropriate to move to
> a VOTE thread on a proposal with essentially nil engagement. My intuition
> is that the properties of the status quo isn't actually what the polity
> wants, whether or not what I'm proposing is an improvement on that status
> quo.
>
> On Wed, Jan 29, 2025, at 11:15 AM, Benedict wrote:
>
>
> I think you’re making the mistake of assuming a representative sample of
> the community participates in these debates. Sensibly, a majority of the
> community sits these out, and I think on this topic that’s actually the
> rational response.
>
> That doesn’t stop folk voting for something else when the decision
> actually matters, as it shouldn’t - the polity can’t bind itself after all.
>
> Which is only to say, I applaud your optimism but it’s probably wrong to
> assume there’ll be pushback that reifies the community’s revealed
> preferences. There’s no reason to assume there will be, and history shows
> there usually isn’t.
>
> To be clear, I don’t think these are our “unspoken incentives” but our
> collective preferences that simply can’t functionally be codified due to
> the fact nobody is willing to actually argue this is a good thing.
> Sometimes no individual likes what happens, but it’s what the polity
> actually wants, collectively. That’s fine, let’s be at peace with it.
>
> On 29 Jan 2025, at 16:00, Josh McKenzie  wrote:
>
> 
> I've let this topic sit in my head overnight and kind of chewed on it.
> While I agree w/the "we're doing what matches our unspoken incentives"
> angle Benedict, I think we can do better than that both for ourselves and
> our users if we apply energy here and codify something. If people come out
> with energy to push *against* that codification, that'll at least bring
> the unspoken incentives to light to work through.
>
> I think it's important we release on a predictable cadence for our users.
> We've fallen short (in some cases exceptionally) on this in the past, and
> it also adds value for operators to plan out verification and adoption
> cycles. It also helps users considering different databases to see a
> predictable cadence and a healthy project. My current position is that 12
> months is a happy medium min-value, especially with a T-2 supported cycle
> since that gives users between 12 months for high appetite fast adoption up
> to 36 months for slow verification. I don't want to further pry open
> Pandora's box, but I'd love to see us cut alphas from trunk quarterly as
> well.
>
> I also think it's important that our release versioning is clear and
> simple. Right now,  *to my mind*, it is not. The current matrix of:
>
>- Any .MINOR to next MAJOR is supported
>- Any .MAJOR to next MAJOR is supported
>- A release will be supported for some variable amount of time based
>on when we get around to new releases
>- API breaks in MAJOR changes, except when we get excited about a
>feature and want to .MAJOR to signal that in which case it may be
>completely low-risk and easy adoption, or we change JDK's and need to
>signal that, or any of another slew of caveats that require digging into
>NEWS.txt to see what the hell we're up to
>- And all of our CI pain that ensues from the above
>
> In my opinion the above is a mess. This isn't a particularly interesting
> topic to me, and us re-litigating this on every release (even if you
> discount me agitating about it; this isn't just me making noise I think),
> is a giant waste of time and energy for a low value outcome.
>
> T-2 online upgrade supported, T-1 API compatible, deprecate-then-remove is
> a combination of 3 simple

Re: [VOTE][IP CLEARANCE] Spark-Cassandra-Connector

2025-03-18 Thread Jeremiah Jordan
 +1

On Mar 18, 2025 at 3:13:09 AM, Mick Semb Wever  wrote:

> (general@incubator cc'd)
>
> Please vote on the acceptance of the Spark-Cassandra-Connector and its
> IP Clearance:
>
> https://incubator.apache.org/ip-clearance/cassandra-spark-cassandra-connector.html
>
> All consent from original authors of the donation, and tracking of
> collected CLAs, is found in
> https://github.com/datastax/spark-cassandra-connector/pull/1376 and
>
> https://docs.google.com/spreadsheets/d/1rkFtfnXbIckV1tYQlgFtwoHHOKUJj0vv-VndlQWA4rY
> These do not all require acknowledgement before the vote.
>
> The code is prepared for donation at
> https://github.com/datastax/spark-cassandra-connector
>
> Once this vote passes we will request ASF Infra to move the
> datastax/spark-cassandra-connector as-is to
> apache/cassandra-spark-connector  .  The master and gh-pages branches,
> all tags, and all history, will be kept.  The master branch will be
> renamed to trunk.
>
> PMC members, please check carefully the IP Clearance requirements before
> voting.
>
> The vote will be open for 72 hours (or longer). Votes by PMC members
> are considered binding. A vote passes if there are at least three
> binding +1s and no -1's.
>
> regards,
> Mick
>


Re: [DISCUSS] slack notifications for subprojects

2025-04-08 Thread Jeremiah Jordan
 +1 from me for that proposal.

On Apr 8, 2025 at 2:51:09 PM, Ekaterina Dimitrova 
wrote:

> I’d say we mimic the current CASSANDRA ticket handling, plus adding the
> #cassandra-sidecar channel. That means:
>
> 1) Open and close notifications to #cassandra-dev and #cassandra-sidecar
> 2) all other notifications to #cassandra-noise
> WDYT?
>
> On Tue, 8 Apr 2025 at 15:48, Josh McKenzie  wrote:
>
>> Currently we don't have Qbot notifying us on CASSSIDECAR ticket creation
>> and state change. Seems we could:
>>
>>1. notify in #cassandra-dev and #cassandra-sidecar
>>2. notify in the #cassandra-sidecar channel
>>
>> My preference is for 1 since there's a tight relationship between what
>> we're doing with the subprojects and the main db and there's probably
>> shared interest there.
>>
>> Any other opinions?
>>
>


Re: [DISCUSS] How we version our releases

2025-04-11 Thread Jeremiah Jordan
 +1 from me.
No more wondering what the next version number will be.
No more wondering what version I can upgrade from to use the new release.

-Jeremiah

On Apr 10, 2025 at 3:54:13 PM, Josh McKenzie  wrote:

> This came up in the thread from Jon on "5.1 should be 6.0".
>
> I think it's important that our release versioning is clear and simple.
> The current status quo of:
> - Any .MINOR to next MAJOR is supported
> - Any .MAJOR to next MAJOR is supported
> - We reserve .MAJOR for API breaking changes
> - except for when we get excited about a feature and want to .MAJOR to
> signal that
> - or we change JDK's and need to signal that
> - or any of another slew of caveats that require digging into NEWS.txt
> to see what the hell we're up to. :D
> - And all of our CI pain that ensues from the above
>
> In my opinion the above is overly complex and could use simplification. I
> also believe us re-litigating this on every release is a waste of time and
> energy that could better be spent elsewhere on the project or in life. It's
> also a signal about how confusing our release versioning has been for the
> community.
>
> Let's leave aside the decision about whether we scope releases based on
> time or based on features; let's keep this to the discussion about how we
> version our releases.
>
> So here's what I'm thinking: a new release strategy that doesn't use
> .MINOR of semver. Goals:
> - Simplify versioning for end users
> - Provide clearer contracts for users as to what they can expect in
> releases
> - Simplify support for us (CI, merges, etc)
> - Clarify our public API deprecation process
>
> Structure / heuristic:
> - Online upgrades are supported for all GA supported releases at time of
> new .MAJOR
> - T-1 releases are guaranteed API compatible
> - We use a deprecate-then-remove strategy for API breaking changes
>
> This would translate into the following for our upcoming releases
> (assuming we stick with 3 supported majors at any given time):
> 6.0:
> - 5.0, 4.1, 4.0 online upgrades are supported (grandfather window)
> - We drop support for 4.0
> - API compatibility is guaranteed w/5.0
> 7.0:
> - 6.0, 5.0, 4.1 online upgrades are supported (grandfather window)
> - We drop support for 4.1
> - API compatibility is guaranteed w/6.0
> 8.0:
> - 7.0, 6.0, 5.0 online upgrades are supported (fully on new paradigm)
> - We drop support for 5.0
> - API compatibility guaranteed w/7.0
>
> So: what do we think?
>
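The 8.0 bullet points above can be expressed as a tiny rule of thumb. A minimal sketch in Python (the function name and dict shape are invented for illustration only; this is not project code, and it applies only to releases fully on the proposed paradigm):

```python
def upgrade_policy(major: int) -> dict:
    """Illustrative sketch of the proposed rule for releases fully on
    the new paradigm (8.0 and later): the three previous majors are
    supported online-upgrade sources, API compatibility is guaranteed
    with the immediately previous major, and support is dropped for the
    oldest of the three."""
    assert major >= 8, "6.0 and 7.0 fall under the grandfather window"
    return {
        # N-1, N-2, N-3 are online-upgrade sources
        "online_upgrade_from": [f"{m}.0" for m in range(major - 1, major - 4, -1)],
        # API compatibility guaranteed with the previous major
        "api_compatible_with": f"{major - 1}.0",
        # the oldest previously-supported major falls out of support
        "drops_support_for": f"{major - 3}.0",
    }

print(upgrade_policy(8))
# → {'online_upgrade_from': ['7.0', '6.0', '5.0'], 'api_compatible_with': '7.0', 'drops_support_for': '5.0'}
```

This matches the 8.0 line in Josh's list: upgrades from 7.0, 6.0, and 5.0; API compatibility with 7.0; support dropped for 5.0.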


Re: [VOTE] Simplifying our release versioning process

2025-04-17 Thread Jeremiah Jordan
 +1

On Apr 17, 2025 at 10:58:24 AM, Josh McKenzie  wrote:

> [DISCUSS] thread:
> https://lists.apache.org/thread/jy6vodbkh64plhdfwqz3l3364gsmh2lq
>
> The proposed new versioning mechanism:
>
>1. We no longer use semver .MINOR
>2. Online upgrades are supported for all GA supported releases at time
>of new .MAJOR
>3. T-1 releases are guaranteed API compatible for non-deprecated
>features
>4. We use a deprecate-then-remove strategy for API breaking changes
>(deprecate in release N, then remove in N+1)
>
> This would translate into the following for our upcoming releases
> (assuming 3 supported majors at all times):
>
>- 6.0: 5.0, 4.1, 4.0 online upgrades are supported (grandfather
>window). We drop support for 4.0. API compatibility is guaranteed w/5.0
>- 7.0: 6.0, 5.0, 4.1 online upgrades are supported (grandfather
>window). We drop support for 4.1. API compatibility is guaranteed w/6.0
>- 8.0: 7.0, 6.0, 5.0 online upgrades are supported (fully on new
>paradigm). We drop support for 5.0. API compatibility guaranteed w/7.0
>
> David asked the question:
>
> Does this imply that each release is allowed to make breaking changes
> (assuming they followed the “correct” deprecation process)? My first
> instinct is to not like this
>
> Each release *would* be allowed to make breaking changes but only for
> features that have already been deprecated for one major release cycle.
>
> This is a process change so as per our governance:
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance,
> it'll require a super majority of 50% of the roll called PMC in favor.
> Current roll call is 21, so we need 11 PMC members to participate, 8 of
> whom must be in favor of the change.
>
> I'll plan to leave the vote open until we hit enough participation to pass
> or fail it, up to probably a couple of weeks.
>


Re: [VOTE] Simplifying our release versioning process

2025-04-23 Thread Jeremiah Jordan
> The JVM version also isn’t a feature to deprecate, technically.

I agree with this. I think the JVM version the server runs under and how we
cycle those is a separate discussion from feature deprecation.

There can and has been some overlap there that would need to be handled on
a case by case basis (when a new JVM removed something that we did not have
a good way to keep doing without it; looking at you, scripting-runtime-based
UDFs), but in general I don’t think switching JVMs is the same as
feature removal/deprecation.

-Jeremiah


On Wed, Apr 23, 2025 at 4:48 PM Jordan West  wrote:

> I agree with Jon that I’m now a bit confused on part of what I voted for.
> It feels like there is more discussion to be had here. Or we need to split
> it into two votes if we want to make progress on the part where there is
> consensus and revisit where there is not.
>
> Regarding JVM version what I’ve mostly seen as reasons against forcing a
> JVM upgrade with a C* upgrade is risk tolerance. Folks bit by past upgrades
> have a tendency to want to limit as many variables as possible. From a
> technical perspective I’m not sure that’s justified tbh but having been one
> of the folks wanting to reduce variables and still getting bit by upgrades
> I understand it. The JVM version also isn’t a feature to deprecate,
> technically. And having made the decision once to hold off on upgrading the
> JVM and regretting it I too would like to see the project try to keep pace
> with JVM releases instead of being on older LTS or unsupported versions.
>
> Jordan
>
> On Wed, Apr 23, 2025 at 13:49 Jon Haddad  wrote:
>
>> >   If 5.0 supports 17, then 7.0 should too, if we are to say we support
>> 5.0 to 7.0 upgrades.
>>
>> I have to disagree with this.  I don't see a good reason to have a tight
>> coupling of JVM versions to C* versions, and I also don't see a good reason
>> to overlap outside of CI.  Even on CI, the reasoning is a bit weak, Linux
>> distros have supported multiple JDK versions for at least a decade
>> (update-java-alternatives on Ubuntu and alternatives on RedHat).
>>
>> I've heard several folks explain their reasoning for overlap in JVM
>> versions, and it just doesn't resonate with me when weighed against the
>> downsides of being anchored to the limitations imposed by supporting old
>> JVM versions.
>>
>> I don't want this to come back and bite us later - so unless we're
>> exempting the JVM version from this upgrade requirement, I'm changing my
>> vote to  -1.
>>
>> Furthermore, we really shouldn't be changing the terms of the thing we're
>> voting on mid-vote.  This feels really weird to me.  Anyone who cast a vote
>> previously may not be keeping up with the ML on a daily basis and it's not
>> fair to impose changes on them.  People should be aware of what they're
>> voting for and not be surprised when the VOTE is closed.
>>
>> Jon
>>
>>
>>
>> On Wed, Apr 23, 2025 at 1:04 PM Mick Semb Wever  wrote:
>>
>>>.
>>>

 This reads to me that Java 17 would need to be deprecated now, continue
 to be deprecated in 6.0 (at least one major in deprecated), then removed in
 7.0.

>>>
>>>
>>> This is technically true.  But I don't think we need to be explicitly
>>> deprecating jdk versions.  Users are generally aware of Java's LTS cycle,
>>> and we can document this separately.
>>>
>>> Where we are bound is that our upgrade tests require an overlapping
>>> common jdk.  So we can only test upgrades that support a common jdk.  And
>>> 🥁  IMHO, we should not be saying we recommend/support upgrades that we
>>> don't test (regardless if not having broken compatibility means we think
>>> untested upgrade paths would still work).   If 5.0 supports 17, then 7.0
>>> should too, if we are to say we support 5.0 to 7.0 upgrades.
>>>
>>>
>>


Re: [VOTE][IP CLEARANCE] easy-cass-stress

2025-04-30 Thread Jeremiah Jordan
+1

The code is prepared for donation at
> https://github.com/rustyrazorblade/easy-cass-stress
>
For anyone else trying to find it, the code that is prepped with proper
licenses etc for donation is in
https://github.com/rustyrazorblade/easy-cass-stress/tree/jwest/donation_prep
(the branch from PR 41), not in main.

The code in the donation_prep branch looks good to me.
The collected CLAs look good.

On Apr 30, 2025 at 10:15:57 AM, Jordan West  wrote:

> (general@incubator cc'd)
>
> Please vote on the acceptance of the easy-cass-stress (to be renamed
> cassandra-stress) and its IP Clearance:
>
> https://incubator.apache.org/ip-clearance/cassandra-easy-cass-stress.html
>
> All consent from original authors of the donation, and tracking of
> collected CLAs, is found in
>
> https://github.com/rustyrazorblade/easy-cass-stress/pull/41/files and
> 
> https://delicate-tail-8c0.notion.site/easy-cass-stress-submission-141ac849cc9d80a4972cc8623aa54667
>
> These do not all require acknowledgement before the vote.
>
> The code is prepared for donation at
> https://github.com/rustyrazorblade/easy-cass-stress
>
> Once this vote passes we will request ASF Infra to move the
> rustyrazorblade/easy-cass-stress as-is to apache/cassandra-stress. The main
> branch and gh-pages branches, all tags, and all history, will be kept.  The
> main branch will continue to be named main.
>
> PMC members, please check carefully the IP Clearance requirements before
> voting.
>
> The vote will be open for 72 hours (or longer). Votes by PMC members
>
> are considered binding. A vote passes if there are at least three binding
> +1s and no -1's.
>
> Thanks,
>
> Jordan
>


Re: Python and Go callouts during ant compile/build task

2025-04-23 Thread Jeremiah Jordan
I think the default build should be to build and check everything.  I think
that if someone is new it is better to have everything built and checked by
default to flag issues.

If someone knows what they are doing and wants to speed up the process it
is very easy to add the right settings to the ant command so things are
faster.

-Jeremiah

On Wed, Apr 23, 2025 at 4:36 PM Jordan West  wrote:

> Should we consider making that the default and then passing false
> explicitly in CI/builds? I agree with Alex it’s a bit surprising and
> shorter build times when developing would be helpful.
>
> Jordan
>
> On Wed, Apr 23, 2025 at 13:37 Mick Semb Wever  wrote:
>
>> Python and Go are used by the gen-doc target.
>>
>> Code changes can break these, hence it is part of `ant check`.
>> It is not called by `ant jar`
>>
>> If you want to run check but skip it, it's to add
>> `-Dant.gen-doc.skip=true`
>>
>>
>>
>> On Wed, 23 Apr 2025 at 22:06, Alex Petrov  wrote:
>>
>>> Hi folks,
>>>
>>> Building the Cassandra jar has been getting increasingly slow, and now it
>>> looks like we depend not only on python3 (which was already not optimal),
>>> but also on go:
>>>
>>> ant -Dno-checkstyle=true
>>>
>>> ...
>>>
>>>  [exec] python3 ./scripts/gen-nodetool-docs.py
>>>  [exec] python3 ./scripts/convert_yaml_to_adoc.py
>>> ../conf/cassandra.yaml
>>> ./modules/cassandra/pages/managing/configuration/cass_yaml_file.adoc
>>>  [exec] ./scripts/process-native-protocol-specs-in-docker.sh
>>>  [exec] Go env not found in your system, proceeding with
>>> installation.
>>>  [exec] Downloading Go 1.23.1...
>>>  [exec] Installing Go 1.23.1...
>>>  [exec] Building the cqlprotodoc...
>>>  [exec] Cloning into 'cassandra-website'...
>>>  [exec] Your branch is up to date with 'origin/trunk'.
>>>  [exec] go: downloading github.com/mvdan/xurls v1.1.0
>>>  [exec] Processing the .spec files...
>>>
>>> I personally consider this extremely dangerous, but also unnecessary. My
>>> current stance is that functionality introducing python3 and go should be
>>> moved to a separate task that only runs on demand / ci / release.  I
>>> welcome convincing arguments that would suggest otherwise.
>>>
>>> If you agree we should not require python and go to run `ant
>>> -Dno-checkstyle=true`, please also write a short message; this will be very
>>> helpful as well.
>>>
>>> Thank you,
>>> --Alex
>>>
>>


Re: [DISCUSS] 5.1 should be 6.0

2025-04-10 Thread Jeremiah Jordan
+1 to 6.0

On Thu, Apr 10, 2025 at 1:38 PM Josh McKenzie  wrote:

> +1 to 6.0.
>
> On Thu, Apr 10, 2025, at 2:28 PM, Jon Haddad wrote:
>
> Bringing this back up.
>
> I don't think we have any reason to hold up renaming the version.  We can
> have a separate discussion about what upgrade paths are supported, but
> let's at least address this one issue of version number so we can have
> consistent messaging.  When i talk to people about the next release, I'd
> like to be consistent with what I call it, and have a unified voice as a
> project.
>
> Jon
>
> On Thu, Jan 30, 2025 at 1:41 AM Mick Semb Wever  wrote:
>
> .
>
>
> If you mean only 4.1 and 5.0 would be online upgrade targets, I would
> suggest we change that to T-3 so you encompass all “currently supported”
> releases at the time the new branch is GAed.
>
> I think that's better actually, yeah. I was originally thinking T-2 from
> the "what calendar time frame is reasonable" perspective, but saying "if
> you're on a currently supported branch you can upgrade to a release that
> comes out" makes clean intuitive sense. That'd mean:
>
> 6.0: 5.0, 4.1, 4.0 online upgrades supported. Drop support for 4.0. API
> compatible guaranteed w/5.0.
> 7.0: 6.0, 5.0, 4.1 online upgrades supported. Drop support for 4.1. API
> compatible guaranteed w/6.0.
> 8.0: 7.0, 6.0, 5.0 online upgrades supported. Drop support for 5.0. API
> compatible guaranteed w/7.0.
>
>
>
>
> I like this.
>
>


Re: [DISCUSS] auto-installing golang in `ant gen-doc` (CASSANDRA-19915)

2025-04-28 Thread Jeremiah Jordan
When I first read this thread I assumed the go download was using some
standard tool similar to mvnw or gradlew which download and cache a copy of
the respective tool on use, with md5sum checks and such to verify the
download before use.

But that doesn’t seem to be the case here: the script checks for arm vs
amd64 and Linux vs Mac, and then fetches and untars the Go distro into tmp.
There is no verification of the download.  The only check is whether curl
returned non-zero.

I did some Google searches, and while there does exist a “gow” tool, it does
not seem to be maintained, nor to be in widespread use within the Go
community.

So while it would be nice to keep things such that someone just runs ant
and gets everything built, given this does not seem to be a standard method
of dealing with a go install in build scripts, I would suggest we stop
doing it.  It looks to be very simple to install Go, so maybe switch to
telling people how to install it themselves if it is not found, as well as
giving them the setting to skip that artifact.

-Jeremiah
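For reference, the verification step noted above as missing would look roughly like this. This is a hypothetical sketch, not the project's script: it "downloads" a local demo file and computes its sum in place, where a real script would curl the Go tarball and take the expected SHA256 from the published list on go.dev/dl before untarring.

```shell
# Sketch of checksum verification before install (assumptions: GNU
# coreutils sha256sum is available; the demo file stands in for a
# downloaded tarball, and its expected sum would normally come from
# go.dev/dl rather than being computed locally).
printf 'demo tarball contents' > /tmp/go-demo.tar.gz
expected=$(sha256sum /tmp/go-demo.tar.gz | cut -d' ' -f1)
echo "${expected}  /tmp/go-demo.tar.gz" | sha256sum -c - \
  || { echo "checksum mismatch, refusing to install" >&2; exit 1; }
echo "checksum verified, safe to untar"
```

`sha256sum -c` reads `HASH  FILE` pairs from stdin and fails non-zero on a mismatch, which gives the script a safe abort point that the current curl-only check lacks.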

On Mon, Apr 28, 2025 at 1:49 PM Brandon Williams  wrote:

> On Mon, Apr 28, 2025 at 12:19 PM Jon Haddad 
> wrote:
> >
> > I strongly prefer we didn't install things on users systems they didn't
> ask for.
>
> I agree with you, and it also doesn't make sense to install go for one
> small thing, but depend on (not install) python for a lot more things.
> If we aren't willing to install python why are we installing go?
>
> Kind Regards,
> Brandon
>

