Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-29 Thread Chen-Becker, Derek
Does the same guidance apply to 3.x clusters? I read through the JIRA ticket 
linked below, along with the tickets it links to, but it's not clear whether the
new allocation algorithm is available in 3.x or whether there are other reasons
this would be problematic.

Thanks,

Derek

On 1/29/20, 9:54 AM, "Jon Haddad"  wrote:

I've put a lot of my previous clients on 4 tokens, all of which have
resulted in a major improvement.

I wouldn't use any more than 4 except under some pretty unusual
circumstances.

Jon

On Wed, Jan 29, 2020, 11:18 AM Ben Bromhead  wrote:

> +1 to reducing the number of tokens as low as possible for availability
> issues. 4 lgtm
>
> On Wed, Jan 29, 2020 at 1:14 AM Dinesh Joshi  wrote:
>
> > Thanks for restarting this discussion Jeremy. I personally think 4 is a
> > good number as a default. I think whatever we pick, we should have enough
> > documentation for operators to make sense of the new defaults in 4.0.
> >
> > Dinesh
> >
> > > On Jan 28, 2020, at 9:25 PM, Jeremy Hanna 
> > wrote:
> > >
> > > I wanted to start a discussion about the default for num_tokens that
> > we'd like for people starting in Cassandra 4.0.  This is for ticket
> > CASSANDRA-13701 
> > (which has been duplicated a number of times, most recently by me).
> > >
> > > TLDR, based on availability concerns, skew concerns, operational
> > concerns, and based on the fact that the new allocation algorithm can be
> > configured fairly simply now, this is a proposal to go with 4 as the new
> > default and the allocate_tokens_for_local_replication_factor set to 3.
> > That gives a good experience out of the box for people and is the most
> > conservative.  It does assume that racks and DCs have been configured
> > correctly.  We would, of course, go into some detail in the NEWS.txt.
> > >
> > > Joey Lynch and Josh Snyder did an extensive analysis of availability
> > concerns with high num_tokens/virtual nodes in their paper
> > <http://mail-archives.apache.org/mod_mbox/cassandra-dev/201804.mbox/%3CCALShVHcz5PixXFO_4bZZZNnKcrpph-=5QmCyb0M=w-mhdyl...@mail.gmail.com%3E>.
> > This worsens as clusters grow larger.  I won't quote the paper here, but
> > given the goal of a conservative default and the accompanying new
> > allocation algorithm, I think it makes sense as a default.
> > >
> > > The difficulties have always been that virtual nodes have been
> > beneficial for operations but that 256 is too high for the purposes of
> > repair and, as Joey and Josh cover, for availability.  Going lower with
> > the original allocation algorithm has produced skew in allocation in its
> > naive distribution.  Enter CASSANDRA-7032
> > <https://issues.apache.org/jira/browse/CASSANDRA-7032> and the new token
> > allocation algorithm.  CASSANDRA-15260
> > <https://issues.apache.org/jira/browse/CASSANDRA-15260> makes the new
> > algorithm operationally simpler.
> > >
> > > One other item of note: since Joey and Josh's analysis, there have been
> > improvements in streaming and other considerations that can reduce the
> > probability of more than one node representing some token range being
> > unavailable, but it would still be good to be conservative.
> > >
> > > Please chime in with any concerns with having num_tokens=4 and
> > allocate_tokens_for_local_replication_factor=3 and the accompanying
> > rationale so we can improve the experience for all users.
> > >
> > > Other resources:
> > >
> > > https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
> > >
> > > https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/config/configVnodes.html
> > >
> > > https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30
> > >
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>
> --
>
> Ben Bromhead
>
> Instaclustr | www.instaclustr.com | @instaclustr
>  | (650) 284 9692
>
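To make the availability argument in this thread concrete, here is a minimal simulation sketch in Java. It is not the Lynch/Snyder model and not Cassandra code: the class and method names are hypothetical, and it assumes RF = 3 (so QUORUM = 2), a single DC without racks, purely random token placement (so it ignores the skew that the new allocation algorithm addresses), and replicas chosen by walking the ring clockwise until three distinct nodes are found. It counts the node pairs whose simultaneous loss would leave some token range below QUORUM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

/**
 * Hypothetical sketch: fraction of node pairs that, if both are down, leave some
 * range without QUORUM. Assumes RF = 3, one DC, no racks, random token allocation,
 * and SimpleStrategy-style placement (clockwise walk over distinct nodes).
 */
public class VnodeQuorumRisk
{
    static final int RF = 3; // QUORUM = 2, so losing 2 replicas of a range is fatal

    public static double fatalPairFraction(int nodes, int tokensPerNode, long seed)
    {
        if (nodes < RF || tokensPerNode < 1)
        {
            throw new IllegalArgumentException("need at least " + RF + " nodes and 1 token each");
        }
        Random random = new Random(seed);
        // TreeMap keeps tokens in ring order; each token maps to its owning node.
        TreeMap<Long, Integer> ring = new TreeMap<>();
        for (int node = 0; node < nodes; node++)
        {
            for (int t = 0; t < tokensPerNode; t++)
            {
                ring.put(random.nextLong(), node);
            }
        }
        Integer[] owners = ring.values().toArray(new Integer[0]);

        // shared[a][b] becomes true if nodes a and b are both replicas of some range.
        boolean[][] shared = new boolean[nodes][nodes];
        for (int start = 0; start < owners.length; start++)
        {
            List<Integer> replicas = new ArrayList<>();
            for (int i = start; replicas.size() < RF; i = (i + 1) % owners.length)
            {
                if (!replicas.contains(owners[i]))
                {
                    replicas.add(owners[i]);
                }
            }
            for (int a : replicas)
            {
                for (int b : replicas)
                {
                    shared[a][b] |= a != b;
                }
            }
        }

        int fatal = 0;
        for (int a = 0; a < nodes; a++)
        {
            for (int b = a + 1; b < nodes; b++)
            {
                if (shared[a][b])
                {
                    fatal++;
                }
            }
        }
        return fatal / (nodes * (nodes - 1) / 2.0);
    }

    public static void main(String[] args)
    {
        for (int tokens : new int[] {1, 4, 16, 256})
        {
            System.out.printf("num_tokens=%d: %.2f of node pairs are fatal%n",
                              tokens, fatalPairFraction(12, tokens, 42L));
        }
    }
}
```

With one token per node, only nodes within two ring positions of each other share a range, so exactly 24 of the 66 pairs in a 12-node cluster (4/(N-1) in general) are fatal; with many random tokens per node, nearly every pair is, so almost any two simultaneous node failures make some range unavailable at QUORUM.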






Re: [DISCUSS] Documentation donation

2020-05-01 Thread Chen-Becker, Derek
From the peanut gallery, my main concern is less with the features of a 
given markup and more with ensuring that whatever markup/doc system is 
used stays mostly out of the way of people who want to contribute to the 
docs. I don't want to have to learn a whole publishing system just to be 
able to contribute, whereas minor differences in markup syntax seem 
reasonable. Whatever system ends up getting chosen, is there additional 
work that can be done to simplify work for writers? I've used all three 
(albeit not in-depth), so I'm willing to help.


Derek

On 5/1/20 11:08 AM, Jon Haddad wrote:


CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.



Apologies, I didn't mean to upset or insult you.  My intent was to
demonstrate that my opinion is based on experience and I wasn't suggesting
we switch tooling based on a whim.  I also wasn't trying to imply you
aren't knowledgeable about writing documentation.

Apart from this misunderstanding, I think we mostly agree.  I'm not looking
to perform a migration that removes functionality, and the features you've
listed are all important to keep.  Thanks for listing out the bits that are
more complex that I glossed over with my "We write basic text with links
and a menu" comment; that was clearly wrong, and I appreciate the correction.

Regarding the functionality you listed:

* Note and warning are both useful features and come built into
asciidoctor.  I made use of them in the TLP documentation for tlp-cluster
[1]
* I believe the extlinks feature can be replicated easily using a macro.
Here's an example [2].
* The grammar is a bit more difficult and I don't think there's a drop-in
replacement.  Writing a block processor [3] would be the right way to
approach it, I think.
* We'd probably want to set up a configuration for syntax highlighting via
highlight.js (or one of the other supported libs).  We can use the SQL one
[4] as a guide since it's going to be similar to what we need, and ideally
we would contribute it back for CQL support.

I agree with you that any migration would at a *minimum* need the above
functionality to be on par with what we already have.  A POC in a branch
displaying a handful of pages (that work with the site theme, etc.), one of
which is a converted DDL page [5], would suffice, I think, to assess whether
or not it's worth the effort.

No matter which direction we end up going, we still need to get
documentation improvements in for 4.0, so it's probably worth focusing on
that now, and convert to adoc later.  I'm happy to get on a call soon with
folks who are new to the project documentation to answer any questions you
all may have.  I'm also available to review patches to the docs, just set
me as the reviewer and ping me on Slack.  I try to get to them within 24h.

Jon

[1] http://thelastpickle.com/tlp-cluster/#_setup
[2] https://markhneedham.com/blog/2018/02/19/asciidoctor-creating-macro/
[3] https://github.com/asciidoctor/asciidoctorj/blob/v2.1.0/docs/integrator-guide.adoc#blockprocessor
[4] https://github.com/highlightjs/highlight.js/blob/master/src/languages/sql.js
[5] https://cassandra.apache.org/doc/latest/cql/ddl.html

On Thu, Apr 30, 2020 at 2:21 PM Sylvain Lebresne  wrote:


As I mentioned, I really have nothing against asciidoc. In fact, I think it's
great.

Let's just say that I think rst/sphinx is also pretty capable, and that I
consider your characterization of the syntax difference (one "awful", the
other "a dream") a tad over-the-top. I do agree on the point about
documentation, though: the asciidoc one is better (not that I find the rst
one completely inadequate either), and I reckon it's a good argument.

So to be clear, if someone makes the change to asciidoc and it's not
botched, I won't personally stand in the way.

I'll note, however, that asking that we analyze the pros and cons of a
change should not be seen as suspicious. And we should, imo, strive to
justify any change with objective arguments. One's experience certainly
increases the believability of one's arguments, but it doesn't exempt one
from presenting arguments in the first place.

And I wish the substance of your previous email wasn't: I know, you don't,
and the project doesn't have time to wait on you learning, so just trust me.


> You're right about markdown being a little limited, but we're not really
> using anything advanced in sphinx. We write basic text with links and a
> menu.

Not really true of at least the CQL section. It makes somewhat extensive use
of the 'productionlist::' feature, which gives us decent formatting of the
CQL grammar elements "for free", automatic cross-referencing within said
grammar, and easy cross-referencing to said grammar from the rest of the
text. I think that's kind of nice? I could be wrong, but getting the same
even with asciidoc is going to be much more manual, and it definitely would
be with markdown.
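The 'productionlist::' behaviour described here (anchored grammar rules plus cross-references between them) is roughly what a custom block processor, as suggested in [3], would need to reproduce. A minimal sketch of just the text transformation, in plain Java without the AsciidoctorJ API: it assumes single-line Sphinx-style productions of the form "name: definition", with references to other rules written in backquotes, and emits AsciiDoc inline anchors ([[id]]) and cross-references (<<id,text>>). The class name and sample grammar lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: convert single-line "name: definition" productions, with
 * `rule` references in backquotes, into AsciiDoc lines carrying an anchor per
 * rule and a cross-reference for every referenced rule.
 */
public class ProductionListSketch
{
    private static final Pattern PRODUCTION = Pattern.compile("^\\s*(\\w+)\\s*:\\s*(.*)$");
    private static final Pattern REFERENCE = Pattern.compile("`(\\w+)`");

    public static List<String> toAsciidoc(List<String> productions)
    {
        List<String> out = new ArrayList<>();
        for (String line : productions)
        {
            Matcher m = PRODUCTION.matcher(line);
            if (!m.matches())
            {
                throw new IllegalArgumentException("Not a production: " + line);
            }
            String name = m.group(1);
            // Each `ref` becomes an AsciiDoc cross-reference <<ref,ref>>.
            String body = REFERENCE.matcher(m.group(2)).replaceAll("<<$1,$1>>");
            // [[name]] is an inline anchor that other rules (and prose) can link to.
            out.add("[[" + name + "]]" + name + " ::= " + body);
        }
        return out;
    }

    public static void main(String[] args)
    {
        List<String> grammar = List.of(
                "keyspace_name: `name`",
                "name: `unquoted_name` | `quoted_name`");
        toAsciidoc(grammar).forEach(System.out::println);
    }
}
```

Continuation lines, terminal symbols and styling are deliberately left out; a real block processor would also have to handle those and decide how the output is laid out on the page.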

We also use 'note::' an

Re: [VOTE] CEP-16 - Auth Plugin Support for CQLSH

2021-10-11 Thread Chen-Becker, Derek
+1 nb

On 10/11/21, 3:51 AM, "Stefan Miklosovic"  
wrote:




Hi list,

based on the discussion thread about CEP-16 (1), I would like to have
a vote on that.

It seems to me CEP-16 is so straightforward that there is little left to
discuss in depth: the feedback it gathered was mostly formal, and nobody
has raised any objections even though the discussion thread has been open
for a long time.

The vote is open for 72 hours per the guidelines; it needs at least
3 binding +1s and no vetoes.

I am +1 on this.

Regards

(1) 
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-16%3A+Auth+Plugin+Support+for+CQLSH
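The pass condition quoted in this vote (open for 72 hours, at least three binding +1s, no vetoes) can be written down as a small function. This is a hypothetical sketch, not an ASF or Cassandra tool: it assumes a veto means a binding -1, and that non-binding votes such as "+1 nb" are recorded but do not count toward the three required +1s. The class, enum and sample votes are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/**
 * Hypothetical sketch of the vote rule: 72 hours, at least three binding +1s,
 * and no vetoes (assumed here to mean no binding -1).
 */
public class VoteTally
{
    public enum Outcome { OPEN, PASSED, FAILED }

    public static final class Vote
    {
        final int value;        // +1, 0 or -1
        final boolean binding;  // false for "nb" (non-binding) votes

        public Vote(int value, boolean binding)
        {
            this.value = value;
            this.binding = binding;
        }
    }

    static final Duration VOTING_PERIOD = Duration.ofHours(72);
    static final int REQUIRED_BINDING_PLUS_ONES = 3;

    public static Outcome tally(List<Vote> votes, Instant opened, Instant now)
    {
        if (now.isBefore(opened.plus(VOTING_PERIOD)))
        {
            return Outcome.OPEN; // the 72-hour window has not elapsed yet
        }
        long bindingPlusOnes = votes.stream().filter(v -> v.binding && v.value == 1).count();
        boolean vetoed = votes.stream().anyMatch(v -> v.binding && v.value == -1);
        return bindingPlusOnes >= REQUIRED_BINDING_PLUS_ONES && !vetoed ? Outcome.PASSED : Outcome.FAILED;
    }

    public static void main(String[] args)
    {
        Instant opened = Instant.parse("2021-10-11T08:00:00Z");
        List<Vote> votes = List.of(new Vote(1, true), new Vote(1, true),
                                   new Vote(1, true), new Vote(1, false));
        // Three binding +1s, one non-binding +1, checked exactly 72 hours later: prints PASSED
        System.out.println(tally(votes, opened, opened.plus(Duration.ofHours(72))));
    }
}
```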





Re: Updating our Code Contribution/Style Guide

2022-05-13 Thread Chen-Becker, Derek
I have a couple of questions/comments (in no particular order):


  *   When we reference the "Sun Java coding conventions", can we have a
canonical link to that so that people don't have to make an assumption or try
to find the version we're talking about? Are we referring to the (now Oracle)
version here, or something else?
https://www.oracle.com/java/technologies/javase/codeconventions-contents.html
  *   I would recommend that we strengthen the recommendation to use enums
instead of Boolean flags in method parameters. In my experience, the
improvement in readability at the call site outweighs the (modest, IMHO) cost
of introducing a new enum, and the enum also provides a useful "handle" for
documenting the semantics of the flag. There are already a lot of Boolean
parameters in use in the codebase, and I can take a look at what it would
take to clean these up.
  *   I like the section on Method clarity, but I would also call out
non-trivial predicate logic as a candidate for encapsulation in its own method.
  *   Should we consider @NotNull/@Nullable or other annotations besides 
@Override?
  *   In the exception handling section, should we discuss using the most
applicable exception type for the handler, i.e. not catching Exception or
Throwable? This probably falls under the "don't silently swallow or log
exceptions" paragraph.
  *   The guidance on brace placement seems to contradict the Java coding
conventions if we place the opening brace on a new line. Is that intentional,
or am I misreading the statement? Would it be clearer to link to a specific
style as defined somewhere (e.g.
https://en.wikipedia.org/wiki/Indentation_style#Variant:_Java)?
  *   The doc doesn't seem to cover a recommendation for braces with
single-line bodies of conditional/loop statements. In my own experience, code
is easier to read when braces are used uniformly everywhere, but it does look
like there are quite a few places in the code where we have unbraced ifs.
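A few of the suggestions above, sketched in Java for discussion. All names are hypothetical and the code is illustrative only, not taken from the draft guide: an enum in place of a boolean parameter, non-trivial predicate logic pulled into a named method, a catch of the specific expected exception rather than Exception or Throwable, and braces on every conditional body with opening braces on their own line, as in much of the existing Cassandra code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical illustrations of the style suggestions; every name is made up. */
public class StyleExamples
{
    /** An enum instead of a boolean flag: flush(Durability.SYNC) reads better than flush(true). */
    public enum Durability
    {
        /** Block until the data is on disk. */
        SYNC,
        /** Return immediately and complete the write in the background. */
        ASYNC
    }

    public static String flush(Durability durability)
    {
        if (durability == Durability.SYNC)
        {
            return "flushed synchronously";
        }
        return "flush scheduled";
    }

    /** Non-trivial predicate logic moved into a method whose name states the intent. */
    public static boolean isMostlyTombstones(long tombstones, long liveCells)
    {
        long total = tombstones + liveCells;
        return total > 0 && tombstones * 2 > total;
    }

    /** Catch the specific exception that is expected; wrap and rethrow anything else. */
    public static long sizeOrZero(Path path)
    {
        try
        {
            return Files.size(path);
        }
        catch (NoSuchFileException e)
        {
            return 0L; // a missing file is an expected case in this example
        }
        catch (IOException e)
        {
            throw new UncheckedIOException("Could not read the size of " + path, e);
        }
    }
}
```

The enum also gives each constant a natural place for Javadoc, which is the "handle" for documenting a flag's semantics mentioned above.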

Overall the doc is well written and carefully considered, and I appreciate all 
of the work that went into it!

Cheers,

Derek

From: "bened...@apache.org" 
Reply-To: "dev@cassandra.apache.org" 
Date: Friday, May 13, 2022 at 6:41 AM
To: "dev@cassandra.apache.org" 
Subject: RE: [EXTERNAL]Updating our Code Contribution/Style Guide




It’s been a couple of months since I opened this discussion. I think I have 
integrated the feedback into the google doc. Are there any elements anyone 
wants to continue discussing, or things I have not fully addressed? I’ll take 
an absence of response as lazy consensus to commit the changes to the wiki.



From: bened...@apache.org 
Date: Monday, 14 March 2022 at 09:41
To: dev@cassandra.apache.org 
Subject: Updating our Code Contribution/Style Guide
Our style guide hasn’t been updated in about a decade, and I think it is 
overdue some improvements that address some shortcomings as well as modern 
facilities such as streams and lambdas.

Most of this was put together for an effort Dinesh started a few years ago, but 
has languished since, in part because the project has always seemed to have 
other priorities. I figure there’s never a good time to raise a contended 
topic, so here is my suggested update to contributor guidelines:

https://docs.google.com/document/d/1sjw0crb0clQin2tMgZLt_ob4hYfLJYaU4lRX722htTo

Many of these suggestions codify norms already widely employed, sometimes in 
spite of the style guide, but some likely remain contentious. Some potentially 
contentious things to draw your attention to:


  *   Deemphasis of getX() nomenclature, in favour of a richer set of prefixes
and a more succinct x() for simple retrieval where clear
  *   Avoid implementing methods, incl. equals(), hashCode() and toString(), 
unless actually used
  *   Modified new-line rules for multi-line function calls
  *   External dependency rules (require DISCUSS thread before introducing)
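A hypothetical illustration of the first two bullets: a succinct x() accessor in place of getX(), and no equals(), hashCode() or toString() because nothing in this sketch uses them. The "richer set of prefixes" is not spelled out in the list, so the sketch does not guess at it; the class name is made up.

```java
import java.util.Objects;

/** Hypothetical value holder using keyspace() rather than getKeyspace(). */
public final class TableRef
{
    private final String keyspace;
    private final String table;

    public TableRef(String keyspace, String table)
    {
        this.keyspace = Objects.requireNonNull(keyspace);
        this.table = Objects.requireNonNull(table);
    }

    public String keyspace()
    {
        return keyspace;
    }

    public String table()
    {
        return table;
    }

    // No equals()/hashCode()/toString(): nothing in this sketch needs them.
}
```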





Re: [DISSCUSS] Access to JDK internals only after dev mailing list consensus?

2022-11-07 Thread Chen-Becker, Derek
There was one very minor grammatical nit that I commented on, but I think that 
otherwise this is clear and well-written 😊

Cheers,

Derek


From: Ekaterina Dimitrova 
Reply-To: "dev@cassandra.apache.org" 
Date: Friday, November 4, 2022 at 3:51 PM
To: "dev@cassandra.apache.org" 
Subject: RE: [EXTERNAL][DISSCUSS] Access to JDK internals only after dev 
mailing list consensus?




👋

I finally got the chance to put down a proposal for a section at the end of the 
Cassandra Code Style document.
Please help a fellow non-native speaker and definitely not a writer with some 
constructive criticism. :-)
My proposal is in this commit -
https://github.com/ekaterinadimitrova2/cassandra-website/commit/4a9edc7e88fd9fc2c6aa8a84290b75b02cac03bf

I noticed the Dependencies section suggested in the past by Benedict was
missing, even though we had reached consensus on it. I added it back from the
original doc.

Best regards,
Ekaterina

On Fri, 9 Sep 2022 at 13:34, Ekaterina Dimitrova
<e.dimitr...@gmail.com> wrote:
Hi everyone,
It seems to me that people will be fine with a heads-up on the mailing list
and details on the tickets, plus an update to the Code Style adding a point
about that, as suggested?

I will leave this thread open a few more days and if there are no objections I 
will continue with documenting it.

Have a great weekend everyone!

On Fri, 2 Sep 2022 at 14:01, Ekaterina Dimitrova
<e.dimitr...@gmail.com> wrote:
Git and Jira, nothing specific.

On Fri, 2 Sep 2022 at 12:51, Derek Chen-Becker
<de...@chen-becker.org> wrote:
I think it's fine to state it explicitly rather than making it an assumption. 
Are we tracking any usage of internals in the codebase currently?

Cheers,

Derek

On Fri, Sep 2, 2022 at 6:30 AM Ekaterina Dimitrova
<e.dimitr...@gmail.com> wrote:


“ A quick heads up to the dev list with the jira would be sufficient for 
anybody interested in discussing it further to comment on the jira.”

Agreed, I didn't mean voting, but more or less lazy consensus or a chance to
share concerns. Discussing them on a ticket should be enough, but it needs to
happen. Also, I guess it shouldn't happen more often than adding dependencies.

The JDK team keeps closing off more and more internals and warning us about
potential breakages. I want to spare us urgent fixes in patch releases and to
ease maintenance.

I think ensuring that it is clearly documented why an exception is acceptable
and what options were considered will benefit maintenance. Over time, we can
revisit what has changed.

“ . Unless absolutely needed we should avoid accessing the internals. Folks on 
this project should understand why. We can make the dangers of this explicit in 
our contributor documentation. ”
+1

On Fri, 2 Sep 2022 at 1:26, Dinesh Joshi
<djo...@apache.org> wrote:
Personally not opposed to this. However, this is something that should be 
vetted closely by the reviewers. Unless absolutely needed we should avoid 
accessing the internals. Folks on this project should understand why. We can 
make the dangers of this explicit in our contributor documentation. However, 
requiring an entire dev list discussion around it seems unnecessary. A quick 
heads up to the dev list with the jira would be sufficient for anybody 
interested in discussing it further to comment on the jira. WDYT?
Dinesh


On Sep 1, 2022, at 8:31 AM, Ekaterina Dimitrova
<e.dimitr...@gmail.com> wrote:
Hi everyone,

Some time ago we added a note to the project Cassandra Code Style:
“New dependencies should not be included without community consensus first 
being obtained via a [DISCUSS] thread on the 
dev@cassandra.apache.org mailing list”

I would also like to suggest adding a point about accessing JDK internals: any
patch that proposes accessing internals and/or adding more
add-opens/add-exports should be approved on the mailing list prior to commit.

It seems to me the project can only benefit from this visibility. If something
is accepted as an exception, we need to have the right understanding and
visibility of why; in some cases, that may mean looking for alternatives,
opening follow-up tickets, or having someone take ownership. In my opinion
this will be very helpful for maintaining the codebase.

If others agree with that I can add a sentence to the Code Style. Please let me 
know what you think.

Best regards,
Ekaterina
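For context on what "accessing internals" usually involves: since JDK 16, deep reflection into java.base packages is denied by default unless the JVM is started with an option such as --add-opens java.base/java.nio=ALL-UNNAMED. A minimal sketch (hypothetical class name, Java 9+) of checking access up front with AccessibleObject.trySetAccessible(), which returns false rather than throwing when access is denied, so a caller can fall back instead of breaking when a JDK upgrade closes the door.

```java
import java.lang.reflect.Field;

/** Hypothetical sketch: probe reflective access to a field before relying on it. */
public class InternalsAccessCheck
{
    public static boolean canOpen(Class<?> owner, String fieldName)
    {
        try
        {
            Field field = owner.getDeclaredField(fieldName);
            // Returns false (instead of throwing) when strong encapsulation denies access.
            return field.trySetAccessible();
        }
        catch (NoSuchFieldException e)
        {
            // The field was renamed or removed in this JDK version.
            return false;
        }
    }

    public static void main(String[] args)
    {
        // java.nio.Buffer.address is a JDK-internal field; the result depends on the JDK
        // version and on whether --add-opens java.base/java.nio=ALL-UNNAMED was given.
        System.out.println("Buffer.address accessible: " + canOpen(java.nio.Buffer.class, "address"));
    }
}
```

A check like this does not remove the maintenance cost discussed in this thread; it only makes the dependency on an internal explicit and gives the code a defined fallback path.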




--
+---+
| Derek Chen-Becker |
| GPG Key available at https://keybase.io/dchenbecker and   |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---+



Re: A proposal for refactoring the CircleCI config

2022-11-15 Thread Chen-Becker, Derek
It seemed like a lot of other changes were happening around the CircleCI 
config, so I was holding off on the parameterization. I would be happy to work
with Claude on the changes if that’s already in progress, though.

Cheers,

Derek

From: "Claude Warren, Jr via dev" 
Reply-To: "dev@cassandra.apache.org" , "Claude 
Warren, Jr" 
Date: Friday, November 11, 2022 at 1:06 AM
To: "dev@cassandra.apache.org" 
Subject: RE: [EXTERNAL]A proposal for refactoring the CircleCI config




I have been working on 
https://issues.apache.org/jira/projects/CASSANDRA/issues/CASSANDRA-18012 which 
modifies the generate.sh script for the circleci configurations.  Perhaps all 
of this should be rolled into one change?

On Fri, Nov 11, 2022 at 3:47 AM Ekaterina Dimitrova
<e.dimitr...@gmail.com> wrote:
Hey Derek,
Thanks for looking into this.
As we discussed on Slack, probably an easy way to show people how things will
look is to have a prototype with some minimal config. It could even be a
non-Cassandra config, as long as it shows how things will look and how they
improve on the current model.
Thanks,
Ekaterina

On Wed, 2 Nov 2022 at 17:08, David Capwell
<dcapw...@apple.com> wrote:
Here is the ticket I was talking about 
https://issues.apache.org/jira/browse/CASSANDRA-17600



On Nov 2, 2022, at 1:29 PM, Derek Chen-Becker
<de...@chen-becker.org> wrote:

> For the parallel param logic, sounds fine to me.  Not sure if this would also
> work for resource_type, but I still argue that xlarge isn’t needed in 90% of
> the cases it’s used… so fixing this may be better than param there…. So yes,
> I would be cool with this change if it basically removes the patching logic…
> I had another JIRA to have a python script rewrite the YAML, but this method
> may solve it in a cleaner way.

Almost any part of a CircleCI definition can be replaced with a
parameter, so basically we want config-2_1.yml to be a template, and
we plug different values in as desired. Would you mind sending a link
to that JIRA so I can understand that use case?


> About matrix jobs: I don’t know them in CircleCI but have used them in other
> places; this sounds good to me.  I would also argue that JVM is just 1
> config and we sadly have many more:
>
> JVM: [8, 11, 17]
> VNODE: [true, false]
> CDC: [true, false]
> COMPRESSION: [true, false]
> MEMTABLE: [skiplist, shardedskiplist, trie]

My understanding is that we could parameterize all of these such that
we could use a matrix as long as all combinations are valid. Let me
get parameterization of basic configuration reviewed first, and then
we can take a look at how to matricize things.
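Rough arithmetic on the matrix listed above: 3 JVMs * 2 (VNODE) * 2 (CDC) * 2 (COMPRESSION) * 3 memtable types = 72 combinations per job before any exclusions. A hypothetical Java sketch of that expansion (it does not use CircleCI itself), with a predicate standing in for however invalid combinations would be excluded; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: expand matrix axes into the cartesian product of their values. */
public class MatrixExpansion
{
    public static List<Map<String, String>> expand(LinkedHashMap<String, List<String>> axes,
                                                   Predicate<Map<String, String>> valid)
    {
        List<Map<String, String>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>()); // start from one empty combination
        for (Map.Entry<String, List<String>> axis : axes.entrySet())
        {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : combos)
            {
                for (String value : axis.getValue())
                {
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(axis.getKey(), value);
                    next.add(extended);
                }
            }
            combos = next;
        }
        combos.removeIf(valid.negate()); // drop combinations the caller marks invalid
        return combos;
    }

    public static LinkedHashMap<String, List<String>> cassandraAxes()
    {
        LinkedHashMap<String, List<String>> axes = new LinkedHashMap<>();
        axes.put("JVM", List.of("8", "11", "17"));
        axes.put("VNODE", List.of("true", "false"));
        axes.put("CDC", List.of("true", "false"));
        axes.put("COMPRESSION", List.of("true", "false"));
        axes.put("MEMTABLE", List.of("skiplist", "shardedskiplist", "trie"));
        return axes;
    }

    public static void main(String[] args)
    {
        // No exclusions: 3 * 2 * 2 * 2 * 3 = 72 combinations
        System.out.println(expand(cassandraAxes(), c -> true).size());
    }
}
```

Each extra two-valued axis doubles the job count, which is why validating the combinations (or trimming axes) matters before matricizing everything.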

Cheers,

Derek
