Re: [DISCUSSION] If we fix code that used default encoding to now be UTF-8... is this a regression?

2022-11-29 Thread Derek Chen-Becker
As an initial step, could we introduce some sort of log warning,
metric or other indicator for operators to determine if they're
running with a non-UTF-8 encoding?
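
A minimal sketch of what such a check might look like, assuming it runs
once at startup. The class and method names (DefaultCharsetCheck,
warningFor) are hypothetical and not existing Cassandra code, and a real
version would presumably go through the project's logger or a metric
rather than System.err. On JDK 8 and 11, Charset.defaultCharset() follows
file.encoding and the locale; from JDK 18 onward it defaults to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, for illustration only.
public class DefaultCharsetCheck
{
    /** Returns a warning message when the given charset is not UTF-8, otherwise empty. */
    public static Optional<String> warningFor(Charset defaultCharset)
    {
        if (StandardCharsets.UTF_8.equals(defaultCharset))
            return Optional.empty();
        return Optional.of("JVM default charset is " + defaultCharset.name()
                           + ", not UTF-8; code relying on the default encoding"
                           + " may read or write text differently");
    }

    public static void main(String[] args)
    {
        // Warn the operator once if the JVM default is something other than UTF-8.
        warningFor(Charset.defaultCharset()).ifPresent(System.err::println);

        // Why it matters: the same string encodes to different bytes per charset.
        // The escape avoids depending on the encoding of this source file.
        String s = "\u00e9";
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);      // 2
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 1
    }
}
```

Passing the charset in as a parameter keeps the check testable without
changing JVM flags such as -Dfile.encoding.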

On Mon, Nov 28, 2022 at 1:21 PM David Capwell  wrote:
>
> It probably has to be done on a  case-by-case basis
>
>
> Yeah, this is what I feel as well…
>
> Does the linter provide more detail than just the list?
>
>
> Not really, it shows how to fix but can’t really say if the fix will cause 
> issues… If you are not running with UTF-8 we do the right thing most of the 
> time, but some files “may” break… this would also be true if you 
> backup/restore these files on a different environment...
>
>
> On Nov 10, 2022, at 12:44 PM, Derek Chen-Becker  wrote:
>
> This seems fraught with peril. I think that it should be fixed, but I
> also wonder what the testing requirements would be to validate no
> regression. It probably has to be done on a  case-by-case basis. Is it
> as simple as auditing places where we're calling getBytes or
> PrintReader/PrintWriter without an explicit encoding? Some of them,
> like 
> https://github.com/apache/cassandra/blob/30ad754d7e95501ffa916bf986e4cfda1aa5e441/src/java/org/apache/cassandra/tools/HashPassword.java#L128,
> look like that would be easy to address, but others seem like they
> could be complicated.
>
> Does the linter provide more detail than just the list?
>
> Cheers,
>
> Derek
>
> On Fri, Nov 4, 2022 at 2:09 PM David Capwell  wrote:
>
>
> While testing a linter to see whether it can solve a case for Simulator, I
> see we have 25 cases where we don’t specify the encoding and rely on the
> default, which is based on the system…
>
> If we attempt to fix these cases, I am wondering if this is a regression… it
> “might” be the case that someone set -Dfile.encoding=ascii or updated env
> LANG to something non-UTF based…
>
> Here is the list reported
>
> org.apache.cassandra.cql3.functions.JavaBasedUDFunction since first historized release
> org.apache.cassandra.db.ColumnFamilyStore since first historized release
> org.apache.cassandra.db.compaction.CompactionLogger$CompactionLogSerializer since first historized release
> org.apache.cassandra.db.filter.RowFilter$CustomExpression since first historized release
> org.apache.cassandra.db.lifecycle.LogTransaction since first historized release
> org.apache.cassandra.gms.FailureDetector since first historized release
> org.apache.cassandra.index.sasi.analyzer.StandardTokenizerImpl since first historized release
> org.apache.cassandra.io.sstable.SSTable since first historized release
> org.apache.cassandra.io.util.FileReader since first historized release
> org.apache.cassandra.io.util.FileReader since first historized release
> org.apache.cassandra.io.util.FileWriter since first historized release
> org.apache.cassandra.io.util.FileWriter since first historized release
> org.apache.cassandra.metrics.SamplingManager since first historized release
> org.apache.cassandra.metrics.SamplingManager since first historized release
> org.apache.cassandra.schema.IndexMetadata since first historized release
> org.apache.cassandra.security.PEMBasedSslContextFactory since first historized release
> org.apache.cassandra.tools.HashPassword since first historized release
> org.apache.cassandra.tools.JMXTool$Dump$Format$3 since first historized release
> org.apache.cassandra.tools.NodeTool$NodeToolCmd since first historized release
> org.apache.cassandra.tools.SSTableMetadataViewer since first historized release
> org.apache.cassandra.transport.Client since first historized release
> org.apache.cassandra.utils.ByteArrayUtil since first historized release
> org.apache.cassandra.utils.FBUtilities since first historized release
> org.apache.cassandra.utils.GuidGenerator since first historized release
> org.apache.cassandra.utils.HeapUtils since first historized release
>
>
>
> --
> +---+
> | Derek Chen-Becker |
> | GPG Key available at https://keybase.io/dchenbecker and   |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---+
>
>




Cassandra Summit CFP update

2022-11-29 Thread Patrick McFadin
Hi everyone,

An update on the current CFP process for Cassandra Summit. There are
currently 23 talk submissions which are far behind what we need. Two days
of tracks mean we need 60 approved talks. Ideally, we need over 100
submitted to ensure we have a good pool of quality talks. We already have
quite a few vendor pitches that have nothing to do with Cassandra. Think
of it as like CFP spam.

https://events.linuxfoundation.org/cassandra-summit/program/cfp/

The deadline is December 11th. That is 12 days! If you are assuming that
will get pushed out, don’t. We have a tight schedule before March 13th.
Speakers must be notified of talk acceptance by the beginning of January
to book travel in time. The full schedule will be published by
mid-January.

That being said, I have talked to quite a few people that are working on
a submission. Thank you for being willing to create a talk! How can I
help you get it completed? Again, here is my Calendly link if you need to
talk it over:

https://calendly.com/patrick-mcfadin/15-minute-cassandra-summit-cfp-consult

This is our conference! Let’s make it a festival of the database we love
and the things we build with it.

One more thing. We need sponsors! If your employer can, this is a great
opportunity to get your brand out in front of people building the future.

I’ll be back. Go submit a talk. You’ll be happy you did!

Patrick


Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-11-29 Thread Jon Haddad
So much awesome here.  Big +1 to having checkstyle be the source of truth. 

On 2022/11/24 17:10:28 Maxim Muzafarov wrote:
> Hello everyone,
> 
> 
> First of all, thank you all for this awesome project, which has often
> inspired me. My name is Maxim Muzafarov. I'm a committer and PMC member
> of Apache Ignite, so you most likely don't know me, as I come from
> another part of the ASF. I have worked on much the same things there
> (B-Trees, off-heap memory management, rebalancing, checkpointing,
> snapshotting, and IEPs, which you call CEPs), but on a slightly
> different distributed database architecture.
> 
> While looking for a first issue to gain experience with Cassandra, I
> found a number of open JIRAs related to source code analysis (static
> analysis as well as code style). Such issues still appear in JIRA from
> time to time [1][2][3][4]. It seems to me that not enough attention has
> been paid to this topic, and that the possible options for this analysis
> and code style haven't been widely discussed before. I'd like to
> summarize everything I have found and offer my skills and experience
> for solving whichever of these issues we agree on.
> 
> 
> = Motivation =
> 
> The goal is to make small contributions via GitHub PRs easier and safer
> to apply, for both contributor and committer, by adding automated code
> checks for each new Cassandra contribution. This will also help reduce
> the time an experienced developer needs to review and apply such PRs.
>
> As you may know, the infrastructure team has disabled public sign-up
> to ASF JIRA (GitHub issues are recommended instead). This makes the
> following more important if we are still interested in attracting new
> contributions, as was discussed earlier [6].
>
> I do not want to add extra steps to the existing code review workflow or
> make GitHub pull requests the default for patches, which has also been
> discussed already [7]; I only want to improve the automated checks in it
> and make them more convenient.
> 
> 
> = Proposed Solution =
> 
> == 1. Make the checkstyle config a single point of truth for the
> source code style. ==
> 
> Checkstyle is already used and included in the Cassandra project
> build lifecycle (ant command line, Jenkins, CircleCI). There is no
> need to maintain code style configurations for different IDEs
> (e.g. IntelliJ inspections configuration), since the checkstyle.xml
> file can be imported directly into the IDE a developer uses. This holds
> for IntelliJ IDEA, NetBeans, and Eclipse.
>
> So, I propose to focus on the checks themselves and on checking pull
> requests with automation scripts, rather than on maintaining these
> integrations. The benefit is avoiding all the issues of maintaining
> configurations for different IDEs. Another advantage of this approach
> is the ability to add new checkstyle rules without touching IDE
> configuration, and such tickets will be low-hanging fruit (LHF) and
> easy to commit.
>
> The action items here are:
>
> - create an umbrella JIRA ticket for all checkstyle issues e.g. [8]
> (or label checkstyle);
> - add checkstyle to GitHub pull requests using GitHub actions (execute
> ant command);
> - include checkstyle in the build (already done);
> - remove redundant code style configurations related to IDEs from the
> source code e.g. [9];
> 
> 
> == 2. Add additional tooling for code analysis to the build and GitHub
> pull requests. ==
> 
> The source code static analysis and automated checks have been
> discussed recently in the "SpotBugs to the build" topic [10]. I'd like
> to add my 50 cents here.
> 
> Before discussing the pros and cons of each solution, let's focus on
> the criteria that such solutions must meet. You can find the most
> complete list of such tooling here [11].
> 
> From my point of view, the crucial criteria are:
> - free for open-source (at least licenses should allow such usages);
> - popularity in the ASF ecosystems;
> - convenient integration and/or plugins for IDEs and GitHub;
> - we must be able to integrate with CircleCI and Jenkins, as well as
> launch from a command line;
> 
> 
> === Sonar ===
> 
> pros
> + this tool is free for open-source and recommended by the ASF
> infrastructure team [12];
> + was already used for the Cassandra project some time ago at
> sonarcloud.io [13];
> + it has GitHub pull requests analysis [14];
> 
> cons
> - running locally requires additional configuration and a TOKEN_ID because
> the analysis results are stored in an external database (infra will not
> provide it for local usage);
> 
> === SpotBugs (FindBugs) ===
> 
> pros
> + its license allows using it and running it as a library (should be legal
> for us);
> + it analyses the bytecode, which differs from how checkstyle works;
> + can be executed from the command line as well as integrated into the build;
> 
> cons
> - requires compiled source code;
> 
> === PMD ===
> 
> pros
> + BSD licenses more permiss

Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-11-29 Thread Patrick McFadin
I'm going to +1 what Stefan has said. I've heard on many occasions from
newcomers to the project that having to use Ant is a deterrent. As a matter
of fact, a few weeks ago, I spent a Sunday afternoon helping somebody
trying to build Cassandra and Ant caused a ton of problems. "Ok. ant really
super clean this time"

Sure it still works for people that have been doing this for years. I drive
a 20 year old Toyota truck, but I'm reminded by my kids often that it's not
cool. So in that spirit, I feel my saying we need to keep Ant is like
saying "You kids get off my lawn!" If it's something that will help attract
new contributors, I'm all for it.

Patrick

On Fri, Nov 25, 2022 at 2:22 AM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> I agree with what you wrote. How I understand it is that migrating to
> Maven/Gradle makes the project more "attractive" for newcomers. If a
> project is built on "that old un-cool Ant", it might be a little bit
> off-putting and questionable if we are "stuck in the past on build systems
> and not progressing".
>
> So in that sense I agree this is more "marketing" rather than
> technological question but on the other hand, does not Maven/Gradle allow
> us to modularize the project better? Maybe we would like to modularize but
> nobody is up to that because build system makes it impossible or at least
> quite inconvenient to do so. Do you really think there are not any
> significant benefits to switch even if it "just works" now?
>
> 
> From: Benedict 
> Sent: Friday, November 25, 2022 11:07
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
>
> There’s always a handful of people asking for it, but notably few if any
> of the full time contributors doing the majority of the core development of
> Cassandra. It strikes me as something very appealing to others, but less so
> to those wanting to get on with development.
>
> I never really see a good argument articulated for the migration, besides
> general hand waving that ant is old, and people like newer build systems.
> Ant is working fine, so there isn’t a strong technical reason to replace
> it, and there are good organisational reasons not to.
>
> Why do you consider a migration inevitable?
>
>
>
> > On 25 Nov 2022, at 09:58, Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
> >
> > Interesting take on Ant / no-Ant, Benedict. I am very curious how this
> unfolds. My long-term perception is that changing it to something else is
> more or less inevitable but if there is a broader consensus to not do that
>  well.
> >
> > 
> > From: Benedict 
> > Sent: Friday, November 25, 2022 10:52
> > To: dev@cassandra.apache.org
> > Subject: Re: [DISCUSSION] Cassandra's code style and source code analysis
> >
> > I was in a bit of a rush last night. I should say that I’m of course +1
> a general endeavour to clean this up, and to expand our use of linters, and
> I appreciate your volunteering to help out in this way Maxim.
> >
> > However, responding to Stefan, I’m pretty -1 migrating from ant to
> another build system without really good reason. Migration has a real cost
> to productivity for all existing contributors, and the phantom of
> increasing new contributions has never paid off historically. I’m all for
> easing people into participation, but not at penalty to the existing
> contributor base.
> >
> > If the only reason is to make it easier to open in a different IDE, we
> can perhaps have some basic build files outlining code structure for
> importing, that are compatible with our canonical ant build? We could
> perhaps even generate them.
> >
> >
> >> On 25 Nov 2022, at 09:35, Miklosovic, Stefan <
> stefan.mikloso...@netapp.com> wrote:
> >>
> >> For the record, I was testing that same combo Claude mentioned and it
> did not work out of the box but it is definitely possible to set up
> successfully. I do not remember the details.
> >>
> >> To reply to Maxim, it all seems good to me, roughly, but I humbly
> think it all boils down to Maven/Gradle refactoring and on top of that we
> can do all else.
> >>
> >> For example, there is (1) where the solution, besides fixing the tests,
> is to introduce an Ant task which would check this on build. That being
> said, what is that going to look like when we change Ant for something else?
> That stuff suddenly becomes obsolete.
> >>
> >> This case maybe applies to other problems we want to solve as well. I
> do not want to do something tailored for one build system just to rewrite
> it all or to s

Re: [ANNOUNCE] Apache Cassandra 4.1.0 test artifact available

2022-11-29 Thread Mick Semb Wever
> The test build of Cassandra 4.1.0 is available.

Our requirements for 4.1-rc were one green CI run. And no regression
flakies (except the two we have waivers for).

Three consecutive jdk8 and jdk11 circleci runs can be found in these lists:

jdk8
https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/49

jdk11
https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/47
A bit messy here, but
1. https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/47/workflows/9d64c24f-12b3-4180-8279-8c2427c29bab
2. https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/47/workflows/80f7fa43-1b3d-42b0-b1c2-a335355d9dc7
3. https://app.circleci.com/pipelines/github/michaelsembwever/cassandra/47/workflows/d21f140f-0b63-43db-91b3-c46924b2bd49

And the ci-cassandra.a.o run, with 5 flakies,
https://ci-cassandra.apache.org/job/Cassandra-4.1/226/


> A vote of this test build will be initiated next Wednesday 30th. This ensures 
> one week passes from the RC announcement to the GA vote start.

I will be holding off from starting the vote until Monday the 5th.
This gives us more time for all the testing.