Re: Cassandra Java Driver and DataStax

2016-06-08 Thread Edward Capriolo
What a fun topic. I re-joined the list just for this.

As I understand it, the nature of the Apache Software License is that any corporate
entity is allowed to produce open- and closed-source software based on Apache
Cassandra; however, the Cassandra name is a trademark of the ASF.

As I understand it, any corporation or person is free to maintain any
documentation about the software in a public or private form.

IMHO the Apache Cassandra wiki is in a sad state, and corporate site X has
better material, but that is not an indictment of Corporation X.

I will leave planetcassandra.org to be its own issue.

If someone were to propose a Java/Python driver for inclusion in the
source code of Cassandra, and said driver were rejected, that would be a
clear red flag.

There are several awkward things about the driver living somewhere else.
These are all hypothetical but have practical implications.
Following the "itch to scratch" philosophy, perhaps I want to write a
driver over UDP for maximum performance. Even if it were implemented in
the database, the driver living over there ultimately amounts to a veto:
you really cannot accomplish one without the other, and the two have to
move in lockstep to do reasonable development.

There is a saying in Apache, something like "if it did not happen on the
list/in JIRA, it did not happen." We have to ask ourselves honestly:

Q: Is it possible that technical writers "over there" are able to come up
with better documentation than the project itself?

A: Yes. I wrote the Apache Hive book, and I believe it was more up to date
and complete than the project documentation at the time.

Q: Is that happening here? Who is to say?

Q: Is the CQL spec as "written" in code or in documentation good enough for
someone to reasonably re-create the protocol?

Paraphrased things said on this thread that make me laugh, cry, nod:

"There are plenty of drivers, like Kundera and Hector"

These projects were killed off because their maintainers were unable to keep
up with ever-changing Cassandra client specs: the breaking changes from
Thrift 0.6 to 0.7, CQL, and the eventual deprecation of Thrift and the
original data model the database was built around.

"Web server X does not come with a web browser"
HTTP has been an established protocol for 20+ years, and reasonable clients
already exist. Building a client that conforms to an existing protocol is
not the same as building a new protocol and its implementation; try applying
that logic to Google's SPDY.

"Postgres does it like X"
Someone else pointed it out, but this ain't Postgres, and this ain't
MongoHQ. The Apache license and the Apache way are different things.

"No one at company X commits my patches because I don't work there"
As the minority (non-Facebook) Hive committer for years, I can tell you:
"wink wink"


Re: #cassandra-dev IRC logging

2016-08-26 Thread Edward Capriolo
One thing to watch out for: the way apache-gossip is set up, PRs get
sent to the dev list. However, the sending address is not subscribed to the
list, so the project owners get a moderation email asking to approve or
reject every PR and every comment on a PR.

This is OK because we are a small, quiet group, but you probably do not want
that with the number of SCM changes in the Cassandra project.

On Fri, Aug 26, 2016 at 3:05 PM, Jeff Jirsa 
wrote:

> +1 to both as well
>
> On 8/26/16, 11:59 AM, "Tyler Hobbs"  wrote:
>
> >+1 on doing this and using ASFBot in particular.
> >
> >On Fri, Aug 26, 2016 at 1:40 PM, Jason Brown 
> wrote:
> >
> >> @Dave ASFBot looks like a winner. If others are on board with this, I
> can
> >> work on getting it up and going.
> >>
> >> On Fri, Aug 26, 2016 at 11:27 AM, Dave Lester 
> >> wrote:
> >>
> >> > +1. Check out ASFBot for logging IRC, along with other
> integrations.[1]
> >> >
>
>


Re: #cassandra-dev IRC logging

2016-08-26 Thread Edward Capriolo
Yes. I did. My bad.

On Fri, Aug 26, 2016 at 4:07 PM, Jason Brown  wrote:

> Ed, did you mean this to post this to the other active thread today, the
> one about github pull requests? (just want to make sure I'm understanding
> correctly :) )
>
> On Fri, Aug 26, 2016 at 12:28 PM, Edward Capriolo 
> wrote:
>
> > One thing to watch out for. The way apache-gossip is setup the PR's get
> > sent to the dev list. However the address is not part of the list so the
> > project owners get an email asking to approve/reject every PR and comment
> > on the PR.
> >
> > This is ok because we have a small quite group but you probably do not
> want
> > that with the number of SCM changes in the cassandra project.
> >
> > On Fri, Aug 26, 2016 at 3:05 PM, Jeff Jirsa 
> > wrote:
> >
> > > +1 to both as well
> > >
> > > On 8/26/16, 11:59 AM, "Tyler Hobbs"  wrote:
> > >
> > > >+1 on doing this and using ASFBot in particular.
> > > >
> > > >On Fri, Aug 26, 2016 at 1:40 PM, Jason Brown 
> > > wrote:
> > > >
> > > >> @Dave ASFBot looks like a winner. If others are on board with this,
> I
> > > can
> > > >> work on getting it up and going.
> > > >>
> > > >> On Fri, Aug 26, 2016 at 11:27 AM, Dave Lester <
> dave_les...@apple.com>
> > > >> wrote:
> > > >>
> > > >> > +1. Check out ASFBot for logging IRC, along with other
> > > integrations.[1]
> > > >> >
> > >
> > >
> >
>


Re: Github pull requests

2016-08-29 Thread Edward Capriolo
>> I think it goes the other way around. When you push to ASF git with the
right commit message then the integration from that side closes the pull
request.

Yes. This is how apache-gossip is set up. Someone files a JIRA, includes a
link to their branch, and tells me they are done. We review:

git checkout master            # local clone of the ASF repo
git pull otherperson jira-123  # merge the contributor's branch
git push origin master         # push to the ASF remote

The pull request on GitHub is then "magically" closed.

On Mon, Aug 29, 2016 at 8:45 AM, J. D. Jordan 
wrote:

> I think it goes the other way around. When you push to ASF git with the
> right commit message then the integration from that side closes the pull
> request.
>
> > On Aug 28, 2016, at 11:48 PM, Jonathan Ellis  wrote:
> >
> > Don't we need something on the infra side to turn a merged pull request
> > into a commit to the ASF repo?
> >
> > On Sun, Aug 28, 2016 at 11:07 PM, Nate McCall 
> > wrote:
> >
> >>>
> >>>
> >>> Infra is exploring options for giving PMCs greater control over GitHub
> >>> config (including allowing GitHub to be the master with a golden copy
> >>> held at the ASF) but that is a work in progress.
> >> ^  Per Mark's comment, there is not anything we can really do past what
> >> Jake F. described with Thrift. We dealt with this with Usergrid back in
> >> incubation two years ago (Jake F. actually helped us get it all sorted
> at
> >> the time) when we were using https://github.com/usergrid/usergrid as
> the
> >> source:
> >> http://mail-archives.apache.org/mod_mbox/usergrid-dev/201405.mbox/%
> >> 3CCANyrgvdTVzZQD7w3C96LUHa=h7-h4qmu4h7ajsxoat0gd0f...@mail.gmail.com%3E
> >>
> >> Here is the Thrift guide again for reference:
> >> https://github.com/apache/thrift/blob/master/
> CONTRIBUTING.md#contributing-
> >> via-github-pull-requests
> >>
> >> JClouds also has a nice write up/how-to (we based Usergrid on this,
> >> initially):
> >> https://cwiki.apache.org/confluence/display/JCLOUDS/Git+workflow
> >>
> >> Maybe we just amend our 'how-to-commit' with similar details as the two
> >> references above?
> >> http://cassandra.apache.org/doc/latest/development/how_to_commit.html
> >>
> >> -Nate
> >>
> >> On Mon, Aug 29, 2016 at 10:44 AM, Nate McCall 
> >> wrote:
> >>
> >>>
>  Nate, since you have experience with this from Usergrid, can you
> figure
>  out
>  what we need to do to make this happen and follow up with infra?
> >>>
> >>> Yep - i'll look into this.
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
>


Re: Proposal - 3.5.1

2016-09-15 Thread Edward Capriolo
Where did we come from?

We came from a place where we would say, "You probably do not want to run
2.0.x until it reaches 2.0.6."

One thing about Cassandra is that we get into situations where we can only
go forward. For example, when you upgrade from version X to version Y,
version Y might start writing a new version of the sstable format:

X - sstables-v1
Y - sstables-v2

This is very scary from the operations side, because you cannot bring the
system back to running version X: the data written by Y is unreadable to X.
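A toy sketch of that one-way door (the format names and method here are hypothetical illustrations, not Cassandra's actual sstable version strings): an older binary only knows how to deserialize the formats it shipped with, so once the newer release has written its own format to disk, rolling back is no longer safe.

```java
import java.util.List;
import java.util.Set;

public class DowngradeCheck {
    // Formats this (older) binary knows how to read -- hypothetical names.
    private static final Set<String> READABLE = Set.of("v1");

    // A rollback is only safe if every sstable on disk is still readable
    // by the old version; anything written by the newer release is not.
    public static boolean canRollBackWith(List<String> onDiskFormats) {
        return READABLE.containsAll(onDiskFormats);
    }

    public static void main(String[] args) {
        System.out.println(canRollBackWith(List.of("v1")));        // true: pre-upgrade data only
        System.out.println(canRollBackWith(List.of("v1", "v2")));  // false: Y has already written v2
    }
}
```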

Where are we at now?

We now seem to be in a place where you say, "Problem in 3.5 (trunk on a
given day)? Go to 3.9 (trunk at the last tick-tock release)."

http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/

"To get there, we are investing significant effort in making trunk “always
releasable,” with the goal that each release, or at least each odd-numbered
bugfix release, should be usable in production. "

I support a releasable trunk, but the qualifying statement "or at least each
odd-numbered bugfix release" undoes the assertion of "always releasable."
Not trying to nitpick here; I realize it may be hard to reach the desired
state of a releasable trunk in a short time.

Anecdotally, I notice a lot of "movement" in class and function names.
Generally, I can take a stack trace from a piece of software, bring up the
line number on GitHub, and it is dead on, or fairly close to, the line of
code. Recently I have tried this with versions fairly close together and
seen some drastic changes.

Some things I personally do not like:
1) lack of stable-ish APIs in the codebase
2) use of singletons rather than simple dependency injection (even
constructor-based injection)

IMHO these do not fit well with "release often" while always producing a
"high quality release."

I do not love the concept of a "bug fix release." I would not mind waiting
longer for a feature as long as I could have a high trust factor in it
working right the first time.

Take a feature like trickle_fsync. By the description it sounds like a clear
optimization win, yet it is off by default. The description says "turn on
for SSDs," but elsewhere in the configuration we have
# disk_optimization_strategy: ssd. Are we tuning for SSDs by default or not?

Because it defaults to off, it is not tested in the wild. How is it covered
and trusted during tests? How many tests run with it off versus on?

I am not comforted by the idea that trickle_fsync can be added as a feature,
defaulted to false, and only possibly gain real-world coverage. I do not
want to turn it on and hit some weird issue because no one else is running
with it. I would rather it be added on by default, with extreme confidence,
or not added at all.
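For reference, the two cassandra.yaml settings in question (values shown are the stock 3.x defaults as I understand them; note the SSD-oriented tuning hint shipping right alongside the disabled fsync feature):

```yaml
# cassandra.yaml fragment (stock 3.x defaults, to the best of my reading).
# The disk tuning hint assumes SSDs...
# disk_optimization_strategy: ssd

# ...while the feature whose description says "turn on for SSDs" ships off.
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
```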



On Thu, Sep 15, 2016 at 1:37 AM, Jonathan Haddad  wrote:

> In this particular case, I'd say adding a bug fix release for every version
> that's affected would be the right thing.  The issue is so easily
> reproducible and will likely result in massive data loss for anyone on 3.X
> WHERE X < 6 and uses the "date" type.
>
> This is how easy it is to reproduce:
>
> 1. Start Cassandra 3.5
> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> 3. use test;
> 4. create table fail (id int primary key, d date);
> 5. delete d from fail where id = 1;
> 6. Stop Cassandra
> 7. Start Cassandra
>
> You will get this, and startup will fail:
>
> ERROR 05:32:09 Exiting due to error while processing commit log during
> initialization.
> org.apache.cassandra.db.commitlog.CommitLogReplayer$
> CommitLogReplayException:
> Unexpected error deserializing mutation; saved to
> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4rgn/T/
> mutation6313332720566971713dat.
> This may be caused by replaying a mutation against a table with the same
> name but incompatible schema.  Exception follows:
> org.apache.cassandra.serializers.MarshalException: Expected 4 byte long
> for
> date (0)
>
> I mean.. come on.  It's an easy fix.  It cleanly merges against 3.5 (and
> probably the other releases) and requires very little investment from
> anyone.
>
>
> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa 
> wrote:
>
> > We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency fixes,
> but
> > we certainly didn’t/won’t go back and cut new releases from every branch
> > for every critical bug in future releases, so I think we need to draw the
> > line somewhere. If it’s fixed in 3.7 and 3.0.x (x >= 6), it seems like
> > you’ve got options (either stay on the tick and go up to 3.7, or bail
> down
> > to 3.0.x)
> >
> > Perhaps, though, this highlights the fact that tick/tock may not be the
> > best option long term. We’ve tried it for a year, perhaps we should
> instead
> > discuss whether or not it should continue, or if there’s another process
> > that gives us a better way to get useful patches into versions people are
> > willing to run in production.
> >
> >
> >
> > On 9/14/16, 8:55 PM, "Jonathan Haddad"  wrote:
> >
> > >Common sense is what prevents someone from upgrading to yet another
> > >co

Re: Proposal - 3.5.1

2016-09-15 Thread Edward Capriolo
It is funny you say this:

"tick-tock started based off of the 3.0 big bang “we broke everything”
release"

*"Brain battles itself over short-term rewards, long-term goals"*
https://www.princeton.edu/pr/news/04/q4/1014-brain.htm

*Normalization of deviance in software: how broken practices become
standard*
https://news.ycombinator.com/item?id=10811822

I had something really long written; I summarized it to this thought. Huge
generalization coming:

Group 1: "I have 1GB of data on a 200GB disk. I am going to switch to
leveled compaction and see what happens. YOLO DB!"

v.s.

Group 2: "I have 60GB of data on a 200GB disk. If I switch to leveled
compaction I have to do it in a way that does not impact my current users
and will not fill my disks, and doing it in a controlled way might take
days."

Users gravitate toward Group 2, and as they do they become more risk averse.
They are not going to want to upgrade more than twice a year. If they see
risk they will not upgrade at all. And if Group 2 is not upgrading, all the
"testers" that remain belong to Group 1.

I think a new metric system would be fun. In the readme.txt, tag each
change:

TestAdded  -> T
DTestAdded -> D
Feature    -> F
Fix        -> B
Ninja Fix  -> N
Refactor   -> R

Version 3.0
DDFFBBRDD

Version 3.1
FBBTTDD

Over time, if these strings did not gravitate toward F, T, and D, we would
know we are headed in the wrong direction.
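A quick sketch of how that metric could be computed (entirely my own toy code, with made-up names): score each release's tag string by the share of test/dtest/feature tags, and watch the trend across releases.

```java
public class ReleaseMix {
    // Fraction of "healthy" tags in a release's tag string:
    // T (test added), D (dtest added), F (feature) count as healthy;
    // B (fix), N (ninja fix), R (refactor) do not.
    public static double healthyShare(String tags) {
        if (tags.isEmpty()) return 0.0;
        long healthy = tags.chars()
                           .filter(c -> c == 'T' || c == 'D' || c == 'F')
                           .count();
        return (double) healthy / tags.length();
    }

    public static void main(String[] args) {
        System.out.println(healthyShare("DDFFBBRDD")); // the "Version 3.0" example above
        System.out.println(healthyShare("FBBTTDD"));   // the "Version 3.1" example above
    }
}
```

A falling share over successive releases would be the "wrong direction" signal.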

On Thu, Sep 15, 2016 at 2:57 PM, Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:

> Because tick-tock started based off of the 3.0 big bang “we broke
> everything” release I don’t think we can judge wether or not it is working
> until we are another 6 months in.  AKA when we would have been releasing
> the next big bang release.  Right now a lot if not most of the bugs in a
> given tick tock release are bugs that were introduced in 3.0.  Even the bug
> mentioned here, it is not a tick tock bug, it is a 3.0 bug.
>
>
> > On Sep 15, 2016, at 1:48 PM, Jake Luciani  wrote:
> >
> > I'm pretty sure everyone will agree Tick-Tock didn't go well and needs to
> > change.
> >
> > The problem for me is going back to the old way doesn't sound great.
> There
> > are parts of tick-tock I really like,
> > for example, the cadence and limited scope per release.
> >
> > I know at the summit there were a lot of ideas thrown around I can
> > regurgitate but perhaps people
> > who have been thinking about this would like to chime in and present
> ideas?
> >
> > -Jake
> >
> > On Thu, Sep 15, 2016 at 2:28 PM, Benedict Elliott Smith <
> bened...@apache.org
> >> wrote:
> >
> >> I agree tick-tock is a failure.  But for two reasons IMO:
> >>
> >> 1) Ultimately, the users are the real testers and it takes a while for a
> >> release to percolate into the wild for feedback.  The reality is that a
> >> release doesn't have its tires properly kicked for at least three months
> >> after it's cut.  So if we are to have any tocks, they should be
> completely
> >> unwed from the ticks, and should probably happen on a ~3M cadence to
> keep
> >> the labour down but the utility up (and there should probably still be
> more
> >> than one tock per tick)
> >>
> >> 2) Those promised resources to improved process never happened.  We
> haven't
> >> even reached parity with the 2.1 release until very recently, i.e. no
> >> failing u/dtests.
> >>
> >>
> >> On 15 September 2016 at 19:08, Jeff Jirsa 
> >> wrote:
> >>
> >>> I know we’ve got a lot of folks following the dev list without a lot of
> >>> background, so let’s make sure we get some context here so everyone can
> >> be
> >>> on the same page.
> >>>
> >>> Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and
> >> 3.3.1,
> >>> etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first
> before
> >>> the RE manpower is spent on backporting fixes, even critical fixes,
> >> because
> >>> 3.9 has multiple critical fixes for people running 3.7).
> >>>
> >>> Now some background:
> >>>
> >>> For many years, Cassandra used to have a dev process that kept 3 active
> >>> branches - “bleeding edge”, a “stable”, and an “old stable” branch,
> where
> >>> developers would be committing ALL new contributions to the bleeding
> >> edge,
> >>> non-api-breaking changes to stable, and bugfixes only to old stable.
> >> While
> >>> the api changed and major features were added, that bleeding edge would
> >>> just be ‘trunk’, and it’d get cut into a major version when it was
> ready
> >> to
> >>> ship. We saw that with 2.2 / 2.1 / 2.0 (and before that, 2.1 / 2.0 /
> 1.2,
> >>> and before that 2.0 / 1.2 / 1.1 ). When that bleeding edge got released
> >> as
> >>> a major x.y.0, the third, oldest, most stable branch went EOL, and new
> >>> features would go into trunk for the next major version.
> >>>
> >>> There were two big negatives observed with this:
> >>>
> >>> The first big negative is that if multiple major new features were in
> >>> flight, releases were prone to delay. Nobody wants to break an API on a
> >>> x.y.1 release, and nobody wants to add a new feature to a x.y.

Re: Proposal - 3.5.1

2016-09-16 Thread Edward Capriolo
"The historical trend with the Cassandra codebase has been to test
minimally,
throw the code over the wall, and get feedback from people putting it in
prod who run into issues."

At the summit, Brandon and a couple of others were making fun of range
tombstones from Thrift:
https://issues.apache.org/jira/browse/CASSANDRA-5435

I added the Thrift support based on code already in trunk, but there was an
ugly bit in there, and far down the line someone else got stuck with an edge
case and had to fix it. Now, I actually added a number of tests, both unit
tests and nosetests, and I am sure the range tombstones also had their own
set of tests at the storage level.

So as Brandon was making fun of me, I was thinking to myself, "Well I did
not make the bug, I just made it possible for others to find it! So I am
helping!"

The next time I submit a Thrift patch I am going to write 5x the unit tests.
jk :)

On Fri, Sep 16, 2016 at 11:18 AM, Jonathan Haddad  wrote:

> I've worked on a few projects where we've had a branch that new stuff went
> in before merging to master / trunk.  What you've described reminds me a
> lot of git-flow (http://nvie.com/posts/a-successful-git-branching-model/)
> although not quite the same.  I'll be verbose in this email to minimize the
> reader's assumptions.
>
> The goals of the release cycle should be (in descending order of priority):
>
> 1. Minimize bugs introduced through change
> 2. Allow the codebase to iterate quickly
> 3. Not get caught up in a ton of back porting bug fixes
>
> There is significant benefit to having a releasable trunk.  This is
> different from a trunk which is constantly released.  A releasable trunk
> simply means all tests should *always* pass and PMC & committers should
> feel confident that they could actually put it in prod for a project that
> actually matters.  Having it always be releasable (all tests pass, etc)
> means people can at least test the DB on sample data or evaluate it before
> the release happens, and get feedback to the team when there are bugs.
>
> This is a different mentality from having a "features" branch, where it's
> implied that at times it's acceptable that it not be stable.  The
> historical trend with the Cassandra codebase has been to test minimally,
> throw the code over the wall, and get feedback from people putting it in
> prod who run into issues.  In my experience I have found a general purpose
> "features" branch to result in poorly quality codebases.  It's shares a lot
> of the same problems as the 1+ year release cycle did previously, with
> things getting merged in and then an attempt to stabilize later.
>
> Improving the state of testing in trunk will catch more bugs, satisfying
> #1, which naturally leads to #2, and by reducing bugs before they get
> released #3 will happen over time.
>
> My suggestion for a *supported* feature release every 3 months (could just
> as well be 4 or 6) mixed with Benedict's idea of frequent non-supported
> releases (tagged as alpha).  Supported releases should get ~6 months worth
> of bug fixes, which if done right, will decrease over time due to a
> hopefully more stable codebase.  I 100% agree with Mick that semver makes
> sense here, it's not just for frameworks.  Major.Minor.Patch is well
> understood and is pretty standard throughout the world, I don't think we
> need to reinvent versioning.
>
> TL;DR:
> Release every 3 months
> Support for 6
> Keep a stable trunk
> New features get merged into trunk but the standard for code quality and
> testing needs to be property defined as something closer to "production
> ready" rather than "let the poor user figure it out"
>
> Jon
>
>
>
>
>
>
>
> On Fri, Sep 16, 2016 at 3:05 AM Sylvain Lebresne 
> wrote:
>
> > As probably pretty much everyone at this point, I agree the tick-tock
> > experiment
> > isn't working as well as it should and that it's probably worth course
> > correcting. I happen to have been thinking about this quite a bit already
> > as it
> > turns out so I'm going to share my reasoning and suggestion below, even
> > though
> > it's going to be pretty long, in the hope it can be useful (and if it
> > isn't, so
> > be it).
> >
> > My current thinking is that a good cycle should accommodate 2 main
> > constraints:
> >   1) be useful for users
> >   2) be realistic/limit friction on the development side
> > and let me develop what I mean by both points slightly first.
> >
> > I think users mostly want 2 things out of the release schedule: they
> want a
> > clearly labeled stable branch to know what they should run into
> production,
> > and
> > they want new features and improvements. Let me clarify that different
> > users
> > will want those 2 in different degrees and with variation over time, but
> I
> > believe it's mainly some combination of those. On the development side, I
> > don't
> > think it's realistic to expect more than 2/3 branches/series to be
> > supported at
> > any one time (not going to argue that, let's call it a pro

Re: Proposal - 3.5.1

2016-09-16 Thread Edward Capriolo
If you have never seen the movie "Grandma's Boy," I suggest it.

https://www.youtube.com/watch?v=uJLQ5DHmw-U

There is one funny scene where the product/project person says something
like, "The game is ready. We have fixed ALL THE BUGS." The people who made
the movie probably think the coders doing Dance Dance Revolution are the
funny part. To me the funniest part of the movie is the summary statement
that "all the bugs are fixed."

I agree with Sylvain that cutting branches really has nothing to do with
"quality." Quality, like "production ready," is hard to define.

I am phrasing this next part as questions to encourage deep thought, not to
be a jerk.

Someone jokingly said 3.0 was the "break everything" release. What if 4.0
was the "fix everything" release?
What would that mean?
What would we need?
No new features for 6 months?
A vast network of Amazon machines to test things?
Jepsen++?
24-hour integration tests that run CAS operations across a multi-node,
mixed-version cluster while we chaos-monkey nodes?
Could we keep busy for 6 months just looking at the code and fixing all the
bugs for Mr. Cheezle?
Could we fix ALL THE BUGS, and from that day on it is just feature, feature,
feature?
Could we sit there joining and unjoining nodes for 2 days while running
stress, and at the end use the map-reduce export to prove that not a single
datum was lost?

On Fri, Sep 16, 2016 at 2:42 PM, Sylvain Lebresne 
wrote:

> On Fri, Sep 16, 2016 at 6:59 PM, Blake Eggleston 
> wrote:
>
> > Clearly, we won’t get to this point right away, but it should definitely
> > be a goal.
> >
>
> I'm not entirely clear on why anyone would read in what I'm saying that it
> shouldn't be a goal. I'm a huge proponent of this and of putting emphasis
> on quality in general, and because it's Friday night and I'm tired, I'm
> gonna add that I think I have a bigger track record of actually acting on
> improving quality for Cassandra than anyone else that is putting word in my
> mouth.
>
> Mainly, I'm suggesting that we don't have to tie the existence of a clearly
> labeled stable branch (useful to user, especially newcomers) to future
> improvement in the "releasability" of trunk in our design of a new release
> cycle. If we do so, but releasability don't improve as quickly as we'd
> hope, we penalize users in the end. Adopting a release cycle that ensure
> said clearly labeled stable branch does exist no matter the rate of
> improvement to the level of "trunk" releasibility is feels safer, and
> doesn't preclude any effort in improving said releasibilty, nor
> re-evaluating this in 1-2 year to move to release stable releases from
> trunk directly if we have proven we're there.
>
>
>
> >
> > On September 16, 2016 at 9:04:03 AM, Sylvain Lebresne (
> > sylv...@datastax.com) wrote:
> >
> > On Fri, Sep 16, 2016 at 5:18 PM, Jonathan Haddad 
> > wrote:
> >
> > >
> > > This is a different mentality from having a "features" branch, where
> it's
> > > implied that at times it's acceptable that it not be stable.
> >
> >
> > I absolutely never implied that, though I willingly admit my choice of
> > branch
> > names may be to blame. I 100% agree that no releases should be done
> > without a green test board moving forward and if something was implicit
> > in my 'feature' branch proposal, it was that.
> >
> > Where we might not be in the same page is that I just don't believe it's
> > reasonable to expect the project will get any time soon in a state where
> > even a green test board release (with new features) meets the "can be
> > confidently put into production". I'm not even sure it's reasonable to
> > expect from *any* software, and even less so for an open-source
> > project based on volunteering. Not saying it wouldn't be amazing, it
> > would, I just don't believe it's realistic. In a way, the reason why I
> > think
> > tick-tock doesn't work is *exactly* because it's based on that
> unrealistic
> > assumption.
> >
> > Of course, I suppose that's kind of my opinion. I'm sure some will think
> > that the "historical trend" of release instability is simply due to a
> lack
> > of
> > effort (obviously Cassandra developers don't give a shit about users,
> that
> > must the simplest explanation).
> >
>


Question on assert

2016-09-21 Thread Edward Capriolo
There are a variety of assert usages in the Cassandra codebase. You can find
several tickets like mine:

https://issues.apache.org/jira/browse/CASSANDRA-12643

https://issues.apache.org/jira/browse/CASSANDRA-11537

Just to prove that I am not the only one who runs into these:

https://issues.apache.org/jira/browse/CASSANDRA-12484

To paraphrase another ticket that I read today and cannot find:
"The problem is X throws an AssertionError which is not caught by the
exception handler; it bubbles up and causes a thread death."

The jvm.properties file claims this:

# enable assertions.  disabling this in production will give a modest
# performance benefit (around 5%).
-ea

If assertions incur a "5% penalty" but are not always trapped, what value do
they add?

These are common sentiments about how assert should be used (not trying to
make this a "this is what the internet says" type debate):

http://stackoverflow.com/questions/2758224/what-does-the-java-assert-keyword-do-and-when-should-it-be-used

"Assertions (by way of the *assert* keyword) were added in Java 1.4. They
are used to verify the correctness of an invariant in the code. They should
never be triggered in production code, and are indicative of a bug or misuse
of a code path. They can be activated at run-time by way of the -ea option
on the java command, but are not turned on by default."

http://stackoverflow.com/questions/1957645/when-to-use-an-assertion-and-when-to-use-an-exception

"An assertion would stop the program from running, but an exception would
let the program continue running."

I look at how Cassandra uses assert and how that manifests in how the code
operates in production. Assert here is something like a semi-unchecked
exception: all sorts of internal util classes might throw it, downstream
code is essentially unaware of it and rarely handles it specifically, and it
does not always result in the hard death one would expect from an assert.

I know this is a ballpark figure, but would the "5% performance penalty" be
in the ballpark of a checked exception? Given that assertion failures tend
to bubble through uncaught, do they do more harm than good?
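A minimal, self-contained illustration of why the "bubbles through uncaught" complaint holds (the class and method names here are mine, not Cassandra's): AssertionError extends Error, not Exception, so the usual catch (Exception e) handler never sees it. Throwing AssertionError directly keeps the demo independent of whether -ea is set.

```java
public class AssertDemo {
    // Returns "caught" if the failure is trapped by a catch (Exception)
    // handler, or "escaped: <type>" if it bubbles past it.
    public static String tryTrap(Runnable r) {
        try {
            try {
                r.run();
            } catch (Exception e) {   // the typical request/handler catch block
                return "caught";
            }
            return "no failure";
        } catch (Throwable t) {       // only Throwable catches Error subclasses
            return "escaped: " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // An unchecked exception is trapped by the Exception handler...
        System.out.println(tryTrap(() -> { throw new IllegalStateException("bad"); }));  // prints caught
        // ...but an assertion failure is not: it keeps going up the stack.
        System.out.println(tryTrap(() -> { throw new AssertionError("invariant"); }));   // prints escaped: AssertionError
    }
}
```

In a worker thread without an UncaughtExceptionHandler, that "escaped" path is exactly the thread death the tickets above describe.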


Re: Question on assert

2016-09-21 Thread Edward Capriolo
"...a potential 5% performance win when you've corrupted all their data."
This is somewhat my point: why do assertions that are only sometimes trapped
"protect my data" better than a checked exception would?

On Wed, Sep 21, 2016 at 1:24 PM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> I hate that comment with a passion. Please please please please do
> yourself a favor and *always* run with asserts on. `-ea` for life. In
> practice I'd be surprised if you actually got a reliable 5% performance win
> and I doubt your customers will care about a potential 5% performance win
> when you've corrupted all their data.
>
> best,
> kjellman
>
> > On Sep 21, 2016, at 10:21 AM, Edward Capriolo 
> wrote:
> >
> > There are a variety of assert usages in the Cassandra. You can find
> several
> > tickets like mine.
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-12643
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-11537
> >
> > Just to prove that I am not the only one who runs into these:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-12484
> >
> > To paraphrase another ticket that I read today and can not find,
> > "The problem is X throws Assertion which is not caught by the Exception
> > handler and it bubbles over and creates a thread death."
> >
> > The jvm.properties file claims this:
> >
> > # enable assertions.  disabling this in production will give a modest
> > # performance benefit (around 5%).
> > -ea
> >
> > If assertions incur a "5% penalty" but are not always trapped what value
> do
> > they add?
> >
> > These are common sentiments about how assert should be used: (not trying
> to
> > make this a this is what the internet says type debate)
> >
> > http://stackoverflow.com/questions/2758224/what-does-
> the-java-assert-keyword-do-and-when-should-it-be-used
> >
> > "Assertions
> > <http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.10>
> (by
> > way of the *assert* keyword) were added in Java 1.4. They are used to
> > verify the correctness of an invariant in the code. They should never be
> > triggered in production code, and are indicative of a bug or misuse of a
> > code path. They can be activated at run-time by way of the -eaoption on
> the
> > java command, but are not turned on by default."
> >
> > http://stackoverflow.com/questions/1957645/when-to-use-
> an-assertion-and-when-to-use-an-exception
> >
> > "An assertion would stop the program from running, but an exception would
> > let the program continue running."
> >
> > I look at how Cassandra uses assert and how it manifests in how the code
> > operates in production. Assert is something like semi-unchecked
> exception.
> > All types of internal Util classes might throw it, downstream code is
> > essentially unaware and rarely specifically handles it. They do not
> always
> > result in the hard death one would expect from an assert.
> >
> > I know this is a ballpark type figure, but would "5% performance penalty"
> > be in the ballpark of a checked exception? Being that they tend to bubble
> > through things uncaught do they do more danger than good?
>
>


Re: Question on assert

2016-09-21 Thread Edward Capriolo
You are essentially arguing, "if you turn off -ea you're screwed," which is
a symptom of the larger problem that I am pointing out.

Forget the "5%" thing. I am having a discussion about use of assert.

You have:
1) checked exceptions
2) unchecked exceptions
3) Error (like IOError, which we sometimes have to track)

The common case for assert is to only be used in testing. This is why -ea
is off by default.

My point is that using assert as an Apache Cassandra-specific "pseudo
exception" seems problematic. I can point at tickets in the Cassandra JIRA
where this is not trapped properly. It appears to me that having to deal
with a fourth "pseudo exception" is a code smell.

Sometimes you see assert in place of a bounds check or a null check that
you would never want to turn off. Other times it is used as a quasi
IllegalStateException. Other times a class named "estimator" asserts when
the "estimate" "overflows". This seems far away from the defined purpose of
assert.

The glaring issue is that it bubbles through try/catch, so it hardly makes
me feel "safe" either on or off.
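The bubbling behavior described above is easy to demonstrate: AssertionError extends Error, not Exception, so a typical `catch (Exception e)` handler never sees it. A minimal, self-contained sketch (the class and method names are invented for illustration, not Cassandra code):

```java
public class AssertEscape {
    // Stand-in for an internal "estimator"-style helper that asserts on bad input.
    static int estimate(int value) {
        assert value >= 0 : "estimate overflow";  // only checked when run with -ea
        return value * 2;
    }

    public static void main(String[] args) {
        try {
            estimate(-1);
            System.out.println("ran with -da: the assert never fired");
        } catch (Exception e) {
            // Never reached: AssertionError is an Error, not an Exception,
            // so it sails straight past this handler.
            System.out.println("unreachable for asserts");
        } catch (AssertionError e) {
            // Reached only with -ea, and only because we trap Error explicitly.
            System.out.println("trapped: " + e.getMessage());
        }
    }
}
```

Run with `java -ea AssertEscape` and the failure surfaces; without `-ea` the check silently never runs, which is the on/off discomfort described above.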


On Wed, Sep 21, 2016 at 1:34 PM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Asserts have their place as sanity checks. Just like exceptions have their
> place.
>
> They can both live in harmony and they both serve a purpose.
>
> What doesn't serve a purpose is that comment encouraging n00b users to get
> a mythical 5% performance increase and then get silent corruption when
> their disk/io goes sideways and the asserts might have caught things before
> it went really wrong.
>
> Sent from my iPhone
>
> On Sep 21, 2016, at 10:31 AM, Edward Capriolo  <mailto:edlinuxg...@gmail.com>> wrote:
>
> " potential 5% performance win when you've corrupted all their data."
> This is somewhat of my point. Why do assertions that sometimes are trapped
> "protect my data" better then a checked exception?
>
> On Wed, Sep 21, 2016 at 1:24 PM, Michael Kjellman <
> mkjell...@internalcircle.com<mailto:mkjell...@internalcircle.com>> wrote:
>
> I hate that comment with a passion. Please please please please do
> yourself a favor and *always* run with asserts on. `-ea` for life. In
> practice I'd be surprised if you actually got a reliable 5% performance win
> and I doubt your customers will care about a potential 5% performance win
> when you've corrupted all their data.
>
> best,
> kjellman
>
> On Sep 21, 2016, at 10:21 AM, Edward Capriolo  <mailto:edlinuxg...@gmail.com>>
> wrote:
>
> There are a variety of assert usages in the Cassandra. You can find
> several
> tickets like mine.
>
> https://issues.apache.org/jira/browse/CASSANDRA-12643
>
> https://issues.apache.org/jira/browse/CASSANDRA-11537
>
> Just to prove that I am not the only one who runs into these:
>
> https://issues.apache.org/jira/browse/CASSANDRA-12484
>
> To paraphrase another ticket that I read today and can not find,
> "The problem is X throws Assertion which is not caught by the Exception
> handler and it bubbles over and creates a thread death."
>
> The jvm.properties file claims this:
>
> # enable assertions.  disabling this in production will give a modest
> # performance benefit (around 5%).
> -ea
>
> If assertions incur a "5% penalty" but are not always trapped what value
> do
> they add?
>
> These are common sentiments about how assert should be used: (not trying
> to
> make this a this is what the internet says type debate)
>
> http://stackoverflow.com/questions/2758224/what-does-
> the-java-assert-keyword-do-and-when-should-it-be-used
>
> "Assertions
> <http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.10>
> (by
> way of the *assert* keyword) were added in Java 1.4. They are used to
> verify the correctness of an invariant in the code. They should never be
> triggered in production code, and are indicative of a bug or misuse of a
> code path. They can be activated at run-time by way of the -eaoption on
> the
> java command, but are not turned on by default."
>
> http://stackoverflow.com/questions/1957645/when-to-use-
> an-assertion-and-when-to-use-an-exception
>
> "An assertion would stop the program from running, but an exception would
> let the program continue running."
>
> I look at how Cassandra uses assert and how it manifests in how the code
> operates in production. Assert is something like semi-unchecked
> exception.
> All types of internal Util classes might throw it, downstream code is
> essentially unaware and rarely specifically handles it. They do not
> always
> result in the hard death one would expect from an assert.
>
> I know this is a ballpark type figure, but would "5% performance penalty"
> be in the ballpark of a checked exception? Being that they tend to bubble
> through things uncaught do they do more danger than good?
>
>
>


Re: Question on assert

2016-09-22 Thread Edward Capriolo
Yes, obviously we do not need to go in and replace them all at once. Some
rough guidance/general consensus should be in place, because we are
violating the standard usage:

https://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html

Do *not* use assertions for argument checking in public methods.
Do *not* use assertions to do any work that your application requires for
correct operation.

There should be a rationale as to why and when this is right. Otherwise
changes like this might be considered bikeshedding.

In any case I created

https://issues.apache.org/jira/browse/CASSANDRA-12688

since I think we can all agree that Cassandra cannot run without them at
the moment, and we do not want to give someone an incentive to switch
assertions off, which I feel the claim of a 5% performance gain does.
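For the SPECIFIC replacements discussed, the usual shape is to turn an internal assert into an explicit check that stays active regardless of -ea. A hedged sketch (the names are illustrative, not actual Cassandra code):

```java
public class BoundsCheck {
    // before: assert index >= 0 && index < size;
    // after: an explicit check that cannot be switched off and that a
    // caller's exception handler can actually catch.
    static void checkIndex(int index, int size) {
        if (index < 0 || index >= size)
            throw new IllegalStateException(
                "index " + index + " out of bounds for size " + size);
    }

    public static void main(String[] args) {
        checkIndex(2, 4);  // passes silently
        try {
            checkIndex(9, 4);
        } catch (IllegalStateException e) {
            System.out.println("trapped: " + e.getMessage());
        }
    }
}
```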


On Thu, Sep 22, 2016 at 7:29 AM, Benjamin Lerer  wrote:

> I fully agree.
>
> On Thu, Sep 22, 2016 at 11:57 AM, Dave Brosius 
> wrote:
>
> > As an aside, C* for some reason heavily uses asserts in unit tests, which
> > adds to the "can't see the forest for the trees" problem. I see no reason
> > for that. they should all be moved over to junit asserts.
> >
> >
> >
> > On 09/22/2016 03:52 AM, Benjamin Lerer wrote:
> >
> >> We can spend hours arguing about assert vs exceptions. I have seen it
> >> happen in every company I worked for.
> >> Overall, based on the patches I have reviewed, it seems to me that in
> >> general people are using them only has internal safety checks.
> >> Unfortunatly, the code change and we can miss things.
> >> If anybody think that some SPECIFIC assertions should be replaced by
> some
> >> real checks, I think the best way to do it is to open a JIRA ticket to
> >> raise the problem.
> >>
> >>
> >
>


Re: [Discuss] Adding dtest to project

2016-09-23 Thread Edward Capriolo
I love dtest; I think it is a great thing in the tool belt. One thing that
I want to point out: nosetests and dtests are black-box-type testing. You
cannot step through or trace these things very easily.

My dream would be if Cassandra were re-entrant and it were possible to run
a three-node cluster in one JVM and set a breakpoint. I think you could
prove out many things much more easily and faster.

On Thu, Sep 22, 2016 at 11:44 PM, Nate McCall  wrote:

> [Moved from PMC as there is nothing really private involved]
>
> DataStax has graciously offered to contribute cassandra-dtest [0] to
> the project.
>
> There were, however, two issues noted by Jonathan when he presented
> the offer to the PMC:
>
> 1. dtest mixes tests for many cassandra versions together in a single
> project.  So having it live in the main cassandra repo, versioned by
> cassandra release, doesn't really make sense.  Is Infra able to create a
> second repo for this, or is the "one project, one repo" mapping fixed?
>
> 2. DataStax did not require a CLA to contribute to dtest, so the non-DS
> contributors to dtest would need to be contacted for their permission to
> assign copyright to the ASF.  Is the PMC willing to tackle this?
>
> In a brief discussion, it was deduced that #1 can be addressed by
> adding apache/cassandra-dtest to the ASF repo (the example of
> apache/aurora and apache/aurora-packaging was given to justify this).
>
> #2 will be harder as it will require tracking a few people people down
> to sign ASF CLAs.
>
> Before we really put effort into this, I wanted to open the discussion
> up about whether we are willing to take on the development of this in
> the project. Thoughts?
>
> -Nate
>
>
> [0] https://github.com/riptano/cassandra-dtest
>


Re: Create table with ID - Question

2016-09-28 Thread Edward Capriolo
I have a similar set of problems. I will set the stage: in the past, for a
variety of reasons, I had to create tables (column families) by time range
for an event-processing system.

The main reason was that expiring data (TTL) did not purge easily. It was
easier to simply truncate/drop old column families than to deal with
different evolving compaction strategies.

The main loop of my program looked like this:

public void writeThisStuff(List<Event> events) {
  MutationBatch mb = new MutationBatch();
  for (Event event : events) {
    mb.add(event);
  }
  maybeCreateNeededTables(mb);
  executeBatch(mb);
}

public void maybeCreateNeededTables(MutationBatch mb) {
  Set<String> columnFamiliesToCreate = new HashSet<>();
  for (Mutation mutation : mb) {
    columnFamiliesToCreate.add(extractColumnFamilyFromMutation(mutation));
  }
  for (String cf : columnFamiliesToCreate) {
    if (!hectorAstyanaxFlavorOfTheWeekClientDoesCfExist(cf)) {
      hectorAstyanaxFlavorOfTheWeekClientCreateCf(cf);
    }
  }
}

The size of the batches was in the 5-10K range. For a given batch the
number of target CFs was typically one, but at most two. That meant that,
worst case, one would need to be created. Effectively this was one metadata
read before the write. (You could cache the already-existing column
families as well.) One quick read is not a huge cost when you consider the
savings of batching 5K round trips.

Even with this type of scenario you can run into a concurrent-schema
problem. But you can add whatever gizmo you like to confirm schema
agreement here:

for (String cf : columnFamiliesToCreate) {
  *waitForSchemaToSettleGizmo()*
  if (!hectorAstyanaxFlavorOfTheWeekClientDoesCfExist(cf)) {
    *waitForSchemaToSettleGizmo()*
    hectorAstyanaxFlavorOfTheWeekClientCreateCf(cf);
  }
}
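One possible shape for the waitForSchemaToSettleGizmo above: poll a schema-version view (for example, what Thrift's describe_schema_versions returns) until all reachable endpoints report a single version. The Supplier below stands in for whatever client call returns a version-to-endpoints map; this is a sketch, not any particular driver's API:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class SchemaSettle {
    // Returns true once the cluster reports at most one schema version,
    // or false if it never settles within the allotted attempts.
    static boolean waitForAgreement(Supplier<Map<String, List<String>>> versions,
                                    int attempts, long sleepMs) throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (versions.get().size() <= 1) return true;
            Thread.sleep(sleepMs);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated cluster: disagreement on the first poll, agreement on the second.
        Iterator<Map<String, List<String>>> states = Arrays.asList(
            Map.of("v1", List.of("10.0.0.1"), "v2", List.of("10.0.0.2")),
            Map.of("v2", List.of("10.0.0.1", "10.0.0.2"))).iterator();
        System.out.println(waitForAgreement(states::next, 5, 10)); // prints true
    }
}
```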

On Wed, Sep 28, 2016 at 12:01 PM, Aleksey Yeschenko 
wrote:

> No way to do that via Thrift I’m afraid, nor will there be one. Sorry.
>
> --
> AY
>
> On 28 September 2016 at 16:43:58, Roman Bielik (roman.bielik@
> openmindnetworks.com) wrote:
>
> Hi,
>
> in CQL it is possible to create a table with explicit ID: CREATE TABLE ...
> WITH ID='xyz'.
>
> Is something like this possible via Thrift interface?
> There is an int32 "id" field in CfDef, but it has no effect on the table
> ID.
>
> My problem is, that concurrent create table (add_column_family) requests
> for the same table name result in clash with somewhat unpredictable
> behavior.
>
> This problem was reported in:
> https://issues.apache.org/jira/browse/CASSANDRA-9933
>
> and seems to be related to changes from ticket:
> https://issues.apache.org/jira/browse/CASSANDRA-5202
>
> A workaround for me could be using the same ID in create table, however I'm
> using Thrift interface only.
>
> Thank you.
> Regards,
> Roman
>
> --
>
>


Re: Proprietary Replication Strategies: Cassandra Driver Support

2016-10-08 Thread Edward Capriolo
I have contemplated using LocalStrategy as a "do it yourself client side
sharding system".

On Sat, Oct 8, 2016 at 12:37 AM, Vladimir Yudovin 
wrote:

> Hi Prasenjit,
> I would like to get the replication factors of the key-spaces using the
> strategies in the same way we get the replication factors for Simple and
> NetworkTopology.
>  Actually LocalSarategy has no replication factor:
>
> SELECT * FROM system_schema.keyspaces WHERE keyspace_name IN ('system',
> 'system_schema');
>  keyspace_name | durable_writes | replication
> ---------------+----------------+----------------------------------------------------------
>         system |           True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
>  system_schema |           True | {'class': 'org.apache.cassandra.locator.LocalStrategy'}
>
>
> It's used for internal tables and not accessible to users:
>
> CREATE KEYSPACE excel WITH replication = {'class': 'LocalStrategy'};
> ConfigurationException: Unable to use given strategy class: LocalStrategy
> is reserved for internal use.
>
>
> Best regards, Vladimir Yudovin,
> Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
> Launch your cluster in minutes.
>
>
>
>
>  On Fri, 07 Oct 2016 17:06:09 -0400 Prasenjit
> Sarkar wrote 
>
> Thanks Vlad and Jeremiah.
>
> There were questions about support, so let me address that in more detail.
>
> If I look at the latest Cassandra python driver, the support for
> LocalStrategy is very limited (code snippet shown below) and the support
> for EverywhereStrategy is non-existent. By limited I mean that the
> Cassandra python driver only provides the name of the strategy for
> LocalStrategy and not much else.
>
> What I would like (and happy to help) is for the Cassandra python driver to
> provide support for Local and Everywhere to the same extent it is provided
> for Simple and NetworkTopology. I understand that token aware routing is
> not applicable to either strategy but I would like to get the replication
> factors of the key-spaces using the strategies in the same way we get the
> replication factors for Simple and NetworkTopology.
>
> Hope this helps,
> Prasenjit
>
>
> class LocalStrategy(ReplicationStrategy):
>     def __init__(self, options_map):
>         pass
>
>     def make_token_replica_map(self, token_to_host_owner, ring):
>         return {}
>
>     def export_for_schema(self):
>         """
>         Returns a string version of these replication options which are
>         suitable for use in a CREATE KEYSPACE statement.
>         """
>         return "{'class': 'LocalStrategy'}"
>
>     def __eq__(self, other):
>         return isinstance(other, LocalStrategy)
>
> On Fri, Oct 7, 2016 at 11:56 AM, Jeremiah D Jordan <
> jeremiah.jor...@gmail.com> wrote:
>
> > What kind of support are you thinking of? All drivers should support
> them
> > already, drivers shouldn’t care about replication strategy except when
> > trying to do token aware routing.
> > But since anyone can make a custom replication strategy, drivers that
> do
> > token aware routing just need to handle falling back to not doing
> token
> > aware routing if a replication strategy they don’t know about is in
> use.
> > All the open sources drivers I know of do this, so they should all
> > “support” those strategies already.
> >
> > -Jeremiah
> >
> > > On Oct 7, 2016, at 1:02 PM, Prasenjit Sarkar <prasenjit.sar...@datos.io>
> > wrote:
> > >
> > > Hi everyone,
> > >
> > > To the best of my understanding that Datastax has proprietary
> replication
> > > strategies: Local and Everywhere which are not part of the open
> source
> > > Apache Cassandra project.
> > >
> > > Do we know of any plans in the open source Cassandra driver
> community to
> > > support these two replication strategies? Would Datastax have a
> licensing
> > > concern if the open source driver community supported these
> strategies?
> > I'm
> > > fairly new here and would like to understand the dynamics.
> > >
> > > Thanks,
> > > Prasenjit
> >
> >
>
>
>
>
>
>


Low hanging fruit crew

2016-10-18 Thread Edward Capriolo
I go through the Cassandra JIRA weekly and I notice a number of tickets
which appear to be clear issues or requests for simple metrics.

https://issues.apache.org/jira/browse/CASSANDRA-12626

https://issues.apache.org/jira/browse/CASSANDRA-12330

I also have a few JIRA issues that (in my opinion) would be simple to
triage and merge. Getting things merged is the primary pathway to
meritocracy in the ASF.

Across some other ASF projects, I have seen that when the number of small
patches grows larger than the number of bodies available to review them, it
can result in a chicken-and-egg scenario: reviewers feel overburdened, but
the pathway to removing this burden is promoting contributors to
committers.

My suggestion:
Assemble a low-hanging-fruit crew. This crew would provide general support
for small commits: logging, metrics, test coverage, things static analysis
reveals, etc. They would have a reasonable goal like "get two LHF patches
merged a day/week/whatever". If the process is successful, in a few months
there would hopefully be one or two committers graduated who would
naturally wish to move into low-hanging-fruit duties.

Thoughts?


Re: Cleanup after yourselves please

2016-10-18 Thread Edward Capriolo
IMHO, while going through the code, a vast majority of the // comments
would be better as /** */ comments on method declarations. Many believe
that excessive inline comments can indicate a code smell.
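As a small illustration of the suggestion (the class and method here are invented for the example, not taken from the codebase):

```java
public class ReplicaCounter {
    private final int total;
    private final int down;

    public ReplicaCounter(int total, int down) {
        this.total = total;
        this.down = down;
    }

    // An inline comment here would be invisible to IDE tooltips and javadoc:
    //   // returns the live replica count
    // The same fact placed on the declaration travels with the API:

    /** Returns the number of replicas currently believed to be live. */
    public int live() {
        return total - down;
    }

    public static void main(String[] args) {
        System.out.println(new ReplicaCounter(3, 1).live()); // prints 2
    }
}
```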


On Tue, Oct 18, 2016 at 1:21 PM, Josh McKenzie  wrote:

> >
> >  tests hastily and messly commented out line by line (*whyy?*)
>
>  Couldn't we use /* */ comments instead of every single line one by one?
>
>
> When Jake and I were mass porting unit tests for 8099, I know I used idea's
> shortcut (ctrl + /) to block comment out things that wouldn't compile while
> porting over other tests; multi-line comments break from other multi-line
> comments inside/between methods. Unfortunately attribution wasn't retained
> on merges so we don't know whether to blame Sylvain, Jake, or myself on the
> commented out tests that snuck through in the final patch. =/
>
> Not necessarily a good reason, but at least it is *a* reason.
>
> On Tue, Oct 18, 2016 at 12:04 PM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
>
> > Gotcha, I didn't know we were actually bringing them back from the dead!
> >
> > That being said, won't the unit tests need to be re-writtten (or at least
> > refactored) after your work? Couldn't we use /* */ comments instead of
> > every single line one by one? Given we use source control couldn't we
> > remove the dead code and get it from the revision history if we need it
> in
> > the future?
> >
> > > On Oct 18, 2016, at 8:18 AM, Oleksandr Petrov <
> > oleksandr.pet...@gmail.com> wrote:
> > >
> > > I'm currently working on actually making Super Columns work in CQL
> > context.
> > > Currently they do not really work[1].
> > >
> > > It's not a very small piece of work. It was in the pipeline for some
> > time,
> > > although there most likely were more important things that had to be
> > worked
> > > on. I understand your disappointment and am sorry you stumbled upon
> this.
> > > But for now you may just disregard the commented tests. My branch is
> > going
> > > to be ready for review soon.
> > >
> > > [1] https://issues.apache.org/jira/browse/CASSANDRA-12373
> > >
> > >
> > > On Tue, Oct 18, 2016 at 5:10 PM Michael Kjellman <
> > > mkjell...@internalcircle.com> wrote:
> > >
> > >> There was a bunch of tests hastily and messly commented out line by
> line
> > >> (*whyy?*) ColumnFamilyStoreTest with comments that they are pending
> > >> SuperColumns support post 8099.
> > >>
> > >> Could those responsible please cleanup after themselves? It's been a
> > while
> > >> since 8099 was committed in the first place and I don't see us adding
> > Super
> > >> Column support at this point and the unit tests surly will need to be
> > >> rewritten anyways.
> > >>
> > >> As my mother always said, pick your dirty wet towel in the hamper off
> > the
> > >> floor and put it in the hamper please
> > >>
> > >> best,
> > >> kjellman
> > >>
> > >> Sent from my iPhone
> > >
> > > --
> > > Alex Petrov
> >
> >
>


Re: Use of posix_fadvise

2016-10-18 Thread Edward Capriolo
I want to point out something:

https://issues.apache.org/jira/browse/CASSANDRA-6846

"I'm definitively -1 on putting any type of contract on the internals. They
are called internals for a reason, and if rewriting it all entirely
tomorrow is best for Cassandra, we should have the possibility to do so."

Not attempting to pick words apart here, but I find this problematic. A
large problem is highlighted in this debate: everything "works as is" or,
in this case, "not at all". It is hard to say what it ever did, or to know
what it is supposed to do. The black-box testing and code coverage do not
(IMHO) do enough to document what layered internal APIs do.

For example, from my recent experience: I was looking over
OutboundTcpConnection. There are a large number of cases where
"undroppable" verbs are dropped. There is definitely no inline
documentation (and little/no direct testing) that leads me to believe a
contract exists.

In the past (and this is just my biased opinion) I have had my attempts to
refactor some "works as is" code labeled "bikeshedding". It does make it
rather intimidating for someone to pick up an issue. The barrier to entry
for the code "appears" to be black-box testing, while the barrier to making
it better "appears" to be "get off my lawn".

Technically, I think we could do better with clear components (aka APIs),
initialized using clear dependency injection, such that they can be unit-
and mock-tested and such that they all directly do some provable thing. For
example: when does OutboundTcpConnection drop droppable verbs? Is there a
test that shows this? If there is no test, how can the goto statement be
refactored so that the code is more testable? Etc.
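A hedged sketch of that dependency-injection point: extract the drop decision behind an interface so the behavior can be unit-tested without sockets or a running cluster. Every name here is invented for illustration; it is not Cassandra's actual OutboundTcpConnection API:

```java
import java.util.ArrayList;
import java.util.List;

public class OutboundQueue {
    // The provable contract: given a verb and its time on the queue,
    // should the message be dropped instead of sent?
    interface DropPolicy {
        boolean shouldDrop(String verb, long queuedMillis);
    }

    private final DropPolicy policy;
    private final List<String> sent = new ArrayList<>();

    OutboundQueue(DropPolicy policy) {
        this.policy = policy;
    }

    void deliver(String verb, long queuedMillis) {
        if (!policy.shouldDrop(verb, queuedMillis))
            sent.add(verb);
    }

    List<String> sent() {
        return sent;
    }

    public static void main(String[] args) {
        // Inject a deterministic policy: drop anything queued over one second.
        OutboundQueue q = new OutboundQueue((verb, ms) -> ms > 1000);
        q.deliver("MUTATION", 50);    // fresh: sent
        q.deliver("MUTATION", 5000);  // stale: dropped
        System.out.println(q.sent()); // prints [MUTATION]
    }
}
```

With the policy injected, a test can pin down exactly when messages are dropped, which is the kind of provable statement argued for above.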


On Tuesday, October 18, 2016, Benedict Elliott Smith 
wrote:

> I'm not certain this is the best way to go about encouraging people to help
> you, or generally encourage participation in the project.  You have seemed
> to lash out at the project (and in this case me specifically) in a fairly
> antagonistic manner a multitude of times in just a couple of hours.
>
> Your original question, on zero, predates anything I know about.  JIRA is
> your best bet, and provides historical context that is never going to live
> in comments.  I did not imply that the comments were adequate, only that
> *this
> is where you should probably look to answer your question.  *Comment policy
> and norms have changed a lot throughout Cassandra's history, and you're
> asking about a time that predates the current level of maturity, but JIRA
> has always been (AFAIK) the main source of historical context.  I attempted
> to provide some links into this to save you from the "billion" (handful) of
> tickets.
>
> I don't have time for another flamewar, so I will leave out trying to
> assist you in future.
>
>
>
>
> On 18 October 2016 at 18:28, Michael Kjellman <
> mkjell...@internalcircle.com>
> wrote:
>
> > Sorry, No. Always document your assumptions. I shouldn't need to git
> blame
> > a thousand commits and read thru a billion tickets to maybe understand
> why
> > something was done. Clearly thru the conversations on this topic I've had
> > on IRC and the responses so far on this email thread it's not/still not
> > obvious.
> >
> > best,
> > kjellman
> >
> > On Oct 18, 2016, at 10:07 AM, Benedict Elliott Smith <
> bened...@apache.org
> > > wrote:
> >
> > This is what JIRA is for.
> >
> >
>


Re: Low hanging fruit crew

2016-10-19 Thread Edward Capriolo
Yes. The LHF crew should always pay it forward. Not many of us have a
supercomputer to run all the tests, but for things out there marked Patch
Available: apply the patch to see that it applies cleanly, and if it
includes a test, run that test (and possibly some related ones in the
file/folder for quick coverage). A nice initial sweep is a good thing.

I have seen before a process which triggered an auto-build when the patch
was added to the ticket. This took a burden off the committers: by the time
someone got to the ticket they already knew whether the tests passed, so it
was only a style and fine-tuning review.

In this case if we have a good initial pass we can hopefully speed along
the process.
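The initial sweep can itself be scripted. The sketch below fabricates a tiny repository and patch so the applies-cleanly check is self-contained and runnable anywhere; in practice the patch would be the JIRA attachment and the follow-up step would run the touched tests:

```shell
#!/bin/sh
set -e

# Stand-in for "checkout trunk": a throwaway repo with one committed file.
workdir=$(mktemp -d)
cd "$workdir"
git init -q repo
cd repo
echo "v1" > file.txt
git add file.txt
git -c user.email=sweep@example.com -c user.name=sweep commit -qm init

# Stand-in for downloading the JIRA attachment: generate a patch.
echo "v2" > file.txt
git diff > ../candidate.patch
git checkout -q -- file.txt

# The sweep itself: does the patch apply cleanly?
git apply --check ../candidate.patch && echo "applies clean"
git apply ../candidate.patch
grep -q v2 file.txt && echo "patch applied"
```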

On Wed, Oct 19, 2016 at 4:18 AM, kurt Greaves  wrote:

> On 19 October 2016 at 05:30, Nate McCall  wrote:
>
> > if you are offering up resources for review and test coverage,
> > there is work out there. Most immediately in the department of reviews
> > for "Patch Available."
> >
>
> We can certainly put some minds to this. There are a few of us with a very
> good understanding of working Cassandra yet could use more exposure to the
> code base. We'll start getting out there and looking for things to review.
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Re: Low hanging fruit crew

2016-10-19 Thread Edward Capriolo
I realize that a small test passing and a trivial review will not catch all
issues. I am not attempting to trivialize the review process.

Both deep and shallow bugs exist. For the deep bugs, I am not convinced
that even an expert looking at the contribution for N days can account for
a majority of them.

The shallow ones may appear shallow and turn out to be deep, but given that
a bug already exists, an attempt to fix it usually does not arrive at
something worse.

Many of these issues boil down to simple, seemingly clear fixes. Some are
just basic metric additions. Many have no interaction for weeks.


On Wednesday, October 19, 2016, Jeff Jirsa 
wrote:

> Let’s not get too far in the theoretical weeds. The email thread really
> focused on low hanging tickets – tickets that need review, but definitely
> not 8099 level reviews:
>
> 1) There’s a lot of low hanging tickets that would benefit from outside
> contributors as their first-patch in Cassandra (like
> https://issues.apache.org/jira/browse/CASSANDRA-12541 ,
> https://issues.apache.org/jira/browse/CASSANDRA-12776 )
> 2) Some of these patches already exist and just need to be reviewed and
> eventually committed.
>
> Folks like Ed and Kurt have been really active in Jira lately, and there
> aren’t a ton of people currently reviewing/committing – that’s part of OSS
> life, but the less friction that exists getting those patches reviewed and
> committed, the more people will be willing to contribute in the future, and
> the better off the project will be.
>
>
> On 10/19/16, 9:14 AM, "Jeremy Hanna"  > wrote:
>
> >And just to be clear, I think everyone would welcome more testing for
> both regressions of new code correctness.  I think everyone would
> appreciate the time savings around more automation.  That should give more
> time for a thoughtful review - which is likely what new contributors really
> need to get familiar with different parts of the codebase.  These LHF
> reviews won’t be like the code reviews of the vnode or the 8099 ticket
> obviously, so it won’t be a huge burden.  But it has some very tangible
> benefits imo, as has been said.
> >
> >> On Oct 19, 2016, at 11:08 AM, Jonathan Ellis  > wrote:
> >>
> >> I specifically used the phrase "problems that the test would not" to
> show I
> >> am talking about more than mechanical correctness.  Even if the tests
> are
> >> perfect (and as Jeremiah points out, how will you know that without
> reading
> >> the code?), you can still pass tests with bad code.  And is expecting
> >> perfect tests really realistic for multithreaded code?
> >>
> >> But besides correctness, reviews should deal with
> >>
> >> 1. Efficiency.  Is there a quadratic loop that will blow up when the
> number
> >> of nodes in a cluster gets large?  Is there a linear amount of memory
> used
> >> that will cause problems when a partition gets large?  These are not
> >> theoretical problems.
> >>
> >> 2. Maintainability.  Is this the simplest way to accomplish your
> goals?  Is
> >> there a method in SSTableReader that would make your life easier if you
> >> refactored it a bit instead of reinventing it?  Does this reduce
> technical
> >> debt or add to it?
> >>
> >> 3. "Bus factor."  If only the author understands the code and all anyone
> >> else knows is that tests pass, the project will quickly be in bad shape.
> >> Review should ensure that at least one other person understand the code,
> >> what it does, and why, at a level that s/he could fix bugs later in it
> if
> >> necessary.
> >>
> >> On Wed, Oct 19, 2016 at 10:52 AM, Jonathan Haddad  > wrote:
> >>
> >>> Shouldn't the tests test the code for correctness?
> >>>
> >>> On Wed, Oct 19, 2016 at 8:34 AM Jonathan Ellis  > wrote:
> >>>
>  On Wed, Oct 19, 2016 at 8:27 AM, Benjamin Lerer <
>  benjamin.le...@datastax.com 
> > wrote:
> 
> > Having the test passing does not mean that a patch is fine. Which is
> >>> why
>  we
> > have a review check list.
> > I never put a patch available without having the tests passing but
> most
>  of
> > my patches never pass on the first try. We always make mistakes no
> >>> matter
> > how hard we try.
> > The reviewer job is to catch those mistakes by looking at the patch
> >>> from
> > another angle. Of course, sometime, both of them fail.
> >
> 
>  Agreed.  Review should not just be a "tests pass, +1" rubber stamp,
> but
>  actually checking the code for correctness.  The former is just
> process;
>  the latter actually catches problems that the tests would not.  (And
> this
>  is true even if the tests are much much better than ours.)
> 
>  --
>  Jonathan Ellis
>  co-founder, http://www.datastax.com
>  @spy

Re: Low hanging fruit crew

2016-10-19 Thread Edward Capriolo
Also, no one has said anything to the effect of "we want to rubber-stamp
reviews" for some evil reason. Many of us are coders by trade and
understand why that is bad.

On Wednesday, October 19, 2016, Edward Capriolo 
wrote:

> I realize that test passing a small tests and trivial reviews will not
> catch all issues. I am  not attempting to trivialize the review process.
>
> Both deep and shallow bugs exist. The deep bugs, I am not convinced that
> even an expert looking at the contribution for N days can account for a
> majority of them.
>
> The shallow ones may appear shallow and may be deep, but given that a bug
> already exists an attempt to fix it usually does not arrive at something
> worse.
>
> Many of these issues boil down to simple, seeemingly clear fixes. Some are
> just basic metric addition. Many have no interaction for weeks.
>
>
> On Wednesday, October 19, 2016, Jeff Jirsa  > wrote:
>
>> Let’s not get too far in the theoretical weeds. The email thread really
>> focused on low hanging tickets – tickets that need review, but definitely
>> not 8099 level reviews:
>>
>> 1) There’s a lot of low hanging tickets that would benefit from outside
>> contributors as their first-patch in Cassandra (like
>> https://issues.apache.org/jira/browse/CASSANDRA-12541 ,
>> https://issues.apache.org/jira/browse/CASSANDRA-12776 )
>> 2) Some of these patches already exist and just need to be reviewed and
>> eventually committed.
>>
>> Folks like Ed and Kurt have been really active in Jira lately, and there
>> aren’t a ton of people currently reviewing/committing – that’s part of OSS
>> life, but the less friction that exists getting those patches reviewed and
>> committed, the more people will be willing to contribute in the future, and
>> the better off the project will be.
>>
>>
>> On 10/19/16, 9:14 AM, "Jeremy Hanna"  wrote:
>>
>> >And just to be clear, I think everyone would welcome more testing for
>> both regressions of new code correctness.  I think everyone would
>> appreciate the time savings around more automation.  That should give more
>> time for a thoughtful review - which is likely what new contributors really
>> need to get familiar with different parts of the codebase.  These LHF
>> reviews won’t be like the code reviews of the vnode or the 8099 ticket
>> obviously, so it won’t be a huge burden.  But it has some very tangible
>> benefits imo, as has been said.
>> >
>> >> On Oct 19, 2016, at 11:08 AM, Jonathan Ellis 
>> wrote:
>> >>
>> >> I specifically used the phrase "problems that the test would not" to
>> show I
>> >> am talking about more than mechanical correctness.  Even if the tests
>> are
>> >> perfect (and as Jeremiah points out, how will you know that without
>> reading
>> >> the code?), you can still pass tests with bad code.  And is expecting
>> >> perfect tests really realistic for multithreaded code?
>> >>
>> >> But besides correctness, reviews should deal with
>> >>
>> >> 1. Efficiency.  Is there a quadratic loop that will blow up when the
>> number
>> >> of nodes in a cluster gets large?  Is there a linear amount of memory
>> used
>> >> that will cause problems when a partition gets large?  These are not
>> >> theoretical problems.
>> >>
>> >> 2. Maintainability.  Is this the simplest way to accomplish your
>> goals?  Is
>> >> there a method in SSTableReader that would make your life easier if you
>> >> refactored it a bit instead of reinventing it?  Does this reduce
>> technical
>> >> debt or add to it?
>> >>
>> >> 3. "Bus factor."  If only the author understands the code and all
>> anyone
>> >> else knows is that tests pass, the project will quickly be in bad
>> shape.
>> >> Review should ensure that at least one other person understands the
>> code,
>> >> what it does, and why, at a level that s/he could fix bugs later in it
>> if
>> >> necessary.
>> >>
>> >> On Wed, Oct 19, 2016 at 10:52 AM, Jonathan Haddad 
>> wrote:
>> >>
>> >>> Shouldn't the tests test the code for correctness?
>> >>>
>> >>> On Wed, Oct 19, 2016 at 8:34 AM Jonathan Ellis 
>> wrote:
>> >>>
>> >>>> On Wed, Oct 19, 2016 at 8:27 AM, Benjamin Lerer <
>> >>>> benjamin.le...@datastax.com
>> >>>>> wrote:
>
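The "quadratic loop" concern in Jonathan Ellis's review checklist above can be made concrete with a small sketch. The node lists and function names below are hypothetical illustrations, not actual Cassandra code:

```python
# Review concern #1 in miniature: an all-pairs membership test over node
# lists is O(n*m), while building a set first makes it O(n + m).
# Hypothetical node lists, not Cassandra internals.

def common_nodes_quadratic(cluster_a, cluster_b):
    """O(len(a) * len(b)): fine for 3 nodes, painful for 3,000."""
    return [n for n in cluster_a if n in cluster_b]  # 'in' on a list scans it

def common_nodes_linear(cluster_a, cluster_b):
    """O(len(a) + len(b)): one set build, then constant-time lookups."""
    members = set(cluster_b)
    return [n for n in cluster_a if n in members]

a = [f"10.0.0.{i}" for i in range(100)]
b = [f"10.0.0.{i}" for i in range(50, 150)]
assert common_nodes_quadratic(a, b) == common_nodes_linear(a, b)
print(len(common_nodes_linear(a, b)))  # -> 50
```

Both versions return the same answer on a toy cluster; catching the left-hand one before node counts grow is exactly the kind of thing a test suite alone will not flag.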

Re: Rough roadmap for 4.0

2016-11-04 Thread Edward Capriolo
I would like to propose features around seeds:
https://issues.apache.org/jira/browse/CASSANDRA-12627

I have other follow up issues like getting seeds from Amazon API, or from
JNDI/ DNS, etc.

I was hoping 12627 was an easy way to grease the wheels.
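A hedged sketch of the DNS-based seed discovery idea mentioned above. The hostname and function shape are illustrative only; Cassandra's actual extension point is the Java `SeedProvider` interface:

```python
# Resolve a single well-known hostname to a list of seed addresses
# instead of hardcoding IPs. "localhost" stands in for a hypothetical
# round-robin record such as seeds.cassandra.example.com.
import socket

def seeds_from_dns(hostname):
    """Return the sorted, de-duplicated IPv4 addresses behind hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in infos})

print(seeds_from_dns("localhost"))  # -> ['127.0.0.1'] on most machines
```

The appeal is operational: changing the seed set becomes a DNS update rather than a rolling config change across every node.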


On Fri, Nov 4, 2016 at 8:39 AM, Jason Brown  wrote:

> Hey Nate,
>
> I'd like to add CASSANDRA-11559 (Enhance node representation) to the list,
> including the comment I made on the ticket (different storage ports for
> each node). For us, it's a great "would really like to have," as our group
> will probably need this in production within the next year or less.
> However since it hasn't gotten much attention, I'm not sure if we should
> add it to the list of "must haves" for 4.0. I'm planning on bringing it up
> internally today, as well.
>
> Also, from the previous 4.0 email thread that Jonathan started back in July
> (
> https://mail-archives.apache.org/mod_mbox/cassandra-dev/
> 201607.mbox/%3CCALdd-zhW3qJ%3DOWida9nMXPj0JOsru7guOYh6-
> 7uTjqEBvacrgQ%40mail.gmail.com%3E
> )
>
> - CASSANDRA-5 (thrift removal) - ticket not mentioned explicitly in the
> email, but the notion of removing thrift was
> - CASSANDRA-10857 (Allow dropping COMPACT STORAGE flag)
>
> Thanks,
>
> -Jason
>
> On Thu, Nov 3, 2016 at 8:43 PM, sankalp kohli 
> wrote:
>
> > List looks really good. I will let you know if there is something else we
> > plan to add to this list.
> >
> > On Thu, Nov 3, 2016 at 7:47 PM, Nate McCall  wrote:
> >
> > > It was brought up recently at the PMC level that our goals as a
> > > project are not terribly clear.
> > >
> > > This is a pretty good point as outside of Jira 'Fix Version' labelling
> > > (which we actually suck less at compared to a lot of other ASF
> > > projects) this really isn't tracked anywhere outside of general tribal
> > > knowledge about who is working on what.
> > >
> > > I would like to see us change this for two reasons:
> > > - it's important we are clear with our community about where we are
> going
> > > - we need to start working more closely together
> > >
> > > To that end, i've put together a list (in no particular order) of the
> > > *major* features in which I know folks are interested, have patches
> > > coming, are awaiting design review, etc.:
> > >
> > > - CASSANDRA-9425 Immutable node-local schema
> > > - CASSANDRA-10699 Strongly consistent schema alterations
> > > - CASSANDRA-12229 NIO streaming
> > > - CASSANDRA-8457 NIO messaging
> > > - CASSANDRA-12345 Gossip 2.0
> > > - CASSANDRA-9754 Birch trees
> > >
> > > What did I miss? What else would folks like to see? Specifically, this
> > > should be "new stuff that could/will break things" given we are upping
> > > the major version.
> > >
> > > To be clear, it's not my intention to set this in stone and then beat
> > > people about the head with it. More to have it there to point it at a
> > > high level and foster better communication with our users from the
> > > perspective of an open source project.
> > >
> > > Please keep in mind that given everything else going on, I think it's
> > > a fantastic idea to keep this list small and spend some time focusing
> > > on stability particularly as we transition to a new release process.
> > >
> > > -Nate
> > >
> >
>


Re: Moderation

2016-11-04 Thread Edward Capriolo
Is the message in moderation because:
1) it was sent by someone not registered with the list, or
2) some other reason (anti-spam etc.)?

If it is case 1: isn't the correct process to inform the sender and
encourage them to subscribe to the list properly?
If it is case 2: is there an expected ETA for list moderation events?
(probably not)

I see Twitter mentioned. We know that news and sentiment in social media
sometimes move fast and cause reactions based on incorrect/unvetted
information.


On Fri, Nov 4, 2016 at 11:58 AM, Mattmann, Chris A (3010) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hmm. Not excessive but you have a situation where someone is tweeting
> thinking her message didn't go through and conversation is happening there
> when that same conversation could be had on list. If you are ok with that
> continuing to happen then great but I am not. Can someone please moderate
> the message through?
>
> Sent from my iPhone
>
> > On Nov 4, 2016, at 8:54 AM, Mark Thomas  wrote:
> >
> >> On 04/11/2016 15:47, Chris Mattmann wrote:
> >> Hi Folks,
> >>
> >> Kelly Sommers sent a message to dev@cassandra and I'm trying to figure
> out if it's in moderation.
> >>
> >> Can the moderators speak up?
> >
> > Using my infra karma, I checked the mail server. That message is waiting
> > for moderator approval. It has been in moderation for 12 hours which
> > doesn't strike me as at all excessive.
> >
> > Mark
> >
>


Re: DataStax role in Cassandra and the ASF

2016-11-04 Thread Edward Capriolo
On Thu, Nov 3, 2016 at 11:44 PM, Kelly Sommers 
wrote:

> I think the community needs some clarification about what's going on.
> There's a really concerning shift going on and the story about why is
> really blurry. I've heard all kinds of wild claims about what's going on.
>
> I've heard people say the ASF is pushing DataStax out because they don't
> like how much control they have over Cassandra. I've heard other people say
> DataStax and the ASF aren't getting along. I've heard one person who has
> pull with a friend in the ASF complained about a feature not getting
> considered (who also didn't go down the correct path of proposing) kicked
> and screamed and started the ball rolling for control change.
>
> I don't know what's going on, and I doubt the truth is in any of those, the
> truth is probably somewhere in between. As a former Cassandra MVP and
> builder of some of the larger Cassandra clusters in the last 3 years I'm
> concerned.
>
> I've been really happy with Jonathan and DataStax's role in the Cassandra
> community. I think they have done a great job at investing time and money
> towards the good interest in the project. I think it is unavoidable a
> single company bootstraps large projects like this into popularity. It's
> those companies' investments that give the ability to grow diversity in
> later stages. The committer list in my opinion is the most diverse it's
> ever been, hasn't it? Apple is a big player now.
>
> I don't think reducing DataStax's role for the sake of diversity is smart.
> You grow diversity by opening up new opportunities for others. Grow the
> committer list perhaps. Mentor new people to join that list. You don't kick
> someone to the curb and hope things improve. You add.
>
> I may be way off on what I'm seeing but there's not much to go by but
> gossip (ahaha :P) and some ASF meeting notes and DataStax blog posts.
>
> August 17th 2016 ASF changed the Apache Cassandra chair
> https://www.apache.org/foundation/records/minutes/
> 2016/board_minutes_2016_08_17.txt
>
> "The Board expressed continuing concern that the PMC was not acting
> independently and that one company had undue influence over the project."
>
> August 19th 2016 Jonathan Ellis steps down as chair
> http://www.datastax.com/2016/08/a-look-back-a-look-forward
>
> November 2nd 2016 DataStax moves committers to DSE from Cassandra.
> http://www.datastax.com/2016/11/serving-customers-serving-the-community
>
> I'm really concerned if indeed the ASF is trying to change control and
> diversity  of organizations by reducing DataStax's role. As I said earlier,
> I've been really happy at the direction DataStax and Jonathan has taken the
> project and I would much prefer see additional opportunities along side
> theirs grow instead of subtracting. The ultimate question that's really
> important is whether DataStax and Jonathan have been steering the project
> in the right direction. If the answer is yes, then is there really anything
> broken? Only if the answer is no should change happen, in my opinion.
>
> Can someone at the ASF please clarify what is going on? The ASF meeting
> notes are very concerning.
>
> Thank you for listening,
> Kelly Sommers
>

Kelly,

Thank you for taking the time to mention this. I want to react to this
statement:

"I've heard people say the ASF is pushing DataStax out because they don't
like how much control they have over Cassandra. I've heard other people say
DataStax and the ASF aren't getting along. I've heard one person who has
pull with a friend in the ASF complained about a feature not getting
considered (who also didn't go down the correct path of proposing) kicked
and screamed and started the ball rolling for control change."

There is an important saying in the ASF:
https://community.apache.org/newbiefaq.html

   - If it didn't happen on a mailing list, it didn't happen.

It is natural that communication happens outside of Jira. The rough aim of
this mandate is that a conversation that happens by the water cooler should
be summarized and moved into a forum where it can be recorded and discussed.
There is a danger in repeating something anecdotal or 'things you have
heard'. If a party is being suppressed, that is an issue to deal with. If a
party is unwilling to speak for themselves publicly in the ASF forums, that
is on them. Retelling what others told us is 'gossip', as you put it.

"I think it is unavoidable a single company bootstraps large projects like
this into popularity"
"I don't think reducing DataStax's role for the sake of diversity is
smart."

Let me state my opinion as an open source ASF member who was never directly
paid to work on an open source project. I have proposed, and seen others
propose, ideas to several open source projects (inside and outside the ASF)
which were rejected. Later (months, maybe years later) the exact idea, or
roughly the same idea, is implemented by a different person in a slightly
different form. There is a lot of grey area the

Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-04 Thread Edward Capriolo
"There is also the issue of specialisation. Very few people can be trusted
with review of arbitrary
Cassandra patches. I can count them all on fingers of one hand."

I have to strongly disagree here. The Cassandra issue tracker has over
12,000 tickets. I do not think that Cassandra has added 12,000 "features"
since its inception. I reject the concept that only a handful of people are
capable of reviewing and merging things. Firstly, if this process were so
insanely bulletproof we would never have had alternating tick-tock fix
releases. (Unless someone is going to argue we are still fixing zero-day
bugs from the Facebook code drop :). In my spare time I have looked over
code and found things.

I do not mean this to come off as offensive. There clearly are specialists
and they are well respected. When someone says things like:

"real reviews, not rubber-stamping a +1 formally"

I feel that is really standing up on a soap box. What would be the worst
thing that happens here? A "rubber stamp" review sneaks in and causes bug
12001. OMG! NO SOMEONE RUBBER STAMPED SOMETHING AND CREATED A BUG. THAT
NEVER HAPPENED BEFORE IN THE HISTORY OF THE PROJECT. THERE HAS NEVER BEEN A
UNTESTED FEATURE ADDED WHICH BROKE SOMETHING ELSE. ETC ETC.

Be real about this situation. The just-added SASI stuff has bugs.





On Fri, Nov 4, 2016 at 6:27 PM, Aleksey Yeschenko 
wrote:

> I’m sure that impactful, important, and/or potentially destabilising
> patches will continue getting reviewed
> by those engineers.
>
> As for patches that no organisation with a strong enough commercial
> interest cares about, they probably won’t.
> Engineering time is quite expensive, most employers are understaffed as it
> is, overloaded with deadlines and
> fires, so it’s hard to justify donating man hours to work that brings no
> value to your employer - be it Instagram,
> Apple, or DataStax.
>
> I don’t want to sound negative here, but I’d rather not fake optimism
> here, either. Expect that kind of patches
> to stay in unreviewed limbo for the most part.
>
> But significant work will still get reviewed and committed, keeping the
> project overall healthy. I wouldn’t worry much.
>
> --
> AY
>
> On 4 November 2016 at 22:13:42, Aleksey Yeschenko (alek...@apache.org)
> wrote:
>
> This has always been a concern. We’ve always had a backlog on unreviewed
> patches.
>
> Reviews (real reviews, not rubber-stamping a +1 formally) are real work,
> often taking as much work
> as creating the patch in question. And taking as much expertise (or more).
>
> It’s also not ‘fun’ and doesn’t lend itself to scratch-your-own-itch
> drive-by style contributions.
>
> In other words, not something people tend to volunteer for. Something done
> mostly by people
> paid to do the work, reviews assigned to them by their managers.
>
> There is also the issue of specialisation. Very few people can be trusted
> with review of arbitrary
> Cassandra patches. I can count them all on fingers of one hand. There are
> islands of expertise
> and people who can review certain subsystems, and most of them are
> employed by a certain one
> company. A few people at Apple, but with no real post-8099, 3.0 code
> experience at the moment.
>
> What I’m saying is that it’s insufficient to just have desire to volunteer
> - you also need the actual
> knowledge and skill to properly review non-trivial work, and for that we
> largely only have DataStax
> employed contributors, with a sprinkle of people at Apple, and that’s
> sadly about it.
>
> We tried to improve it by holding multiple bootcamps, at Summits, and
> privately within major companies,
> at non-trivial expense to the company, but those initiatives mostly failed
> so far :(
>
> This has always been a problem (lack of review bandwidth), and always will
> be. That said, I don’t expect it to get
> much worse than it is now.
>
> --
> AY
>
> On 4 November 2016 at 21:50:20, Nate McCall (zznat...@gmail.com) wrote:
>
> To be clear, getting the community more involved is a super hard,
> critically important problem to which I don't have a concrete answer
> other than I'm going to keep reaching out for opinions, ideas and
> involvement.
>
> Just like this.
>
> Please speak up here if you have ideas on how this could work.
>
> On Sat, Nov 5, 2016 at 10:38 AM, Nate McCall  wrote:
> > [Moved to a new thread because this topic is important by itself]
> >
> > There are some excellent points here - thanks for bringing this up.
> >
> >> What can inspiring developers contribute to 4.0
> >> that would move the project forward to it’s goals and would be very
> likely
> >> included in the final release?
> >
> > That is a hard question with regards to the tickets I listed. My goal
> > was to list the large, potentially breaking changes which would
> > necessitate a move from '3' to '4' major release. Unfortunately in
> > this context, those types of issues have a huge surface area that
> > requires experience with the code to review in a meaningful way.
> >
> > We are k

Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-04 Thread Edward Capriolo
"I’m sure users running Cassandra in production would prefer actual proper
reviews to non-review +1s."

Again, you are implying that only you can do a proper job.

Let's be specific here: You and I are working on this one:

https://issues.apache.org/jira/browse/CASSANDRA-10825

Now, Ariel reported there was no/low code coverage. I went looking at the
code and found a problem.

If someone were to merge this: I would have more incentive to look for
other things, then I might find more bugs and improvements. If this process
keeps going, I would naturally get exposed to more of the code. Finally in
maybe (I don't know in 10 or 20 years) I could become one of these
specialists.

Let's peel this situation apart:

https://issues.apache.org/jira/browse/CASSANDRA-10825

"If you grep test/src and cassandra-dtest you will find that the string
OverloadedException doesn't appear anywhere."

Now let me flip this situation around:

"I'm sure the users running Cassandra in production would prefer proper
coding practices, like writing unit and integration tests, to rubber-stamp
merges."

When the shoe is on the other foot it does not feel so nice.
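The grep described in the quoted comment can be reproduced as a small script. The directory layout below is a synthetic stand-in built in a temp dir; in a real checkout you would scan test/src and the cassandra-dtest repository directly:

```python
# Scan source trees for a string, like running
# `grep -rl OverloadedException test/src cassandra-dtest`.
import pathlib
import tempfile

def files_mentioning(root, needle):
    """Return sorted paths of files under root whose text contains needle."""
    return sorted(str(p) for p in pathlib.Path(root).rglob("*")
                  if p.is_file() and needle in p.read_text(errors="ignore"))

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "test", "src")
    src.mkdir(parents=True)
    (src / "OverloadTest.java").write_text(
        "assertThrows(OverloadedException.class, call);\n")
    hits = files_mentioning(tmp, "OverloadedException")
    print(len(hits))  # -> 1
```

An empty result from a scan like this is exactly the kind of cheap, mechanical evidence of a coverage gap that the comment points at.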










On Fri, Nov 4, 2016 at 7:08 PM, Aleksey Yeschenko 
wrote:

> Dunno. A sneaky correctness or data corruption bug. A performance
> regression. Or something that can take a node/cluster down.
>
> Of course no process is bullet-proof. The purpose of review is to minimise
> the odds of such a thing happening.
>
> I’m sure users running Cassandra in production would prefer actual proper
> reviews to non-review +1s.
>
> --
> AY
>
> On 4 November 2016 at 23:03:23, Edward Capriolo (edlinuxg...@gmail.com)
> wrote:
>
> I feel that is really standing up on a soap box. What would be the worst
> thing that happens here


Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)

2016-11-05 Thread Edward Capriolo
On Sat, Nov 5, 2016 at 9:19 AM, Benedict Elliott Smith 
wrote:

> Hi Ed,
>
> I would like to try and clear up what I perceive to be some
> misunderstandings.
>
> Aleksey is relating that for *complex* tickets there are desperately few
> people with the expertise necessary to review them.  In some cases it can
> amount to several weeks' work, possibly requiring multiple people, which is
> a huge investment.  EPaxos is an example where its complexity likely needs
> multiple highly qualified reviewers.
>
> Simpler tickets on the other hand languish due to poor incentives - they
> aren't sexy for volunteers, and aren't important for the corporately
> sponsored contributors, who also have finite resources.  Nobody *wants* to
> do them.
>
> This does contribute to an emergent lack of diversity in the pool of
> contributors, but it doesn't discount Aleksey's point.  We need to find a
> way forward that handles both of these concerns.
>
> Sponsored contributors have invested time into efforts to expand the
> committer pool before, though they have universally failed.  Efforts like
> the "low hanging fruit squad" seem like a good idea that might payoff, with
> the only risk being the cloud hanging over the project right now.  I think
> constructive engagement with potential sponsors is probably the way
> forward.
>
> (As an aside, the policy on test coverage was historically very poor
> indeed, but is I believe much stronger today - try not to judge current
> behaviours on those of the past)
>
>
> On 5 November 2016 at 00:05, Edward Capriolo 
> wrote:
>
> > "I’m sure users running Cassandra in production would prefer actual
> proper
> > reviews to non-review +1s."
> >
> > Again, you are implying that only you can do a proper job.
> >
> > Let's be specific here: You and I are working on this one:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-10825
> >
> > Now, Ariel reported there was no/low code coverage. I went looking at the
> > code and found a problem.
> >
> > If someone were to merge this: I would have more incentive to look for
> > other things, then I might find more bugs and improvements. If this
> process
> > keeps going, I would naturally get exposed to more of the code. Finally
> in
> > maybe (I don't know in 10 or 20 years) I could become one of these
> > specialists.
> >
> > Let's peel this situation apart:
> >
> > https://issues.apache.org/jira/browse/CASSANDRA-10825
> >
> > "If you grep test/src and cassandra-dtest you will find that the string
> > OverloadedException doesn't appear anywhere."
> >
> > Now let me flip this situation around:
> >
> > "I'm sure the users running Cassandra in production would prefer proper
> > coding practices, like writing unit and integration tests, to rubber-stamp
> > merges."
> >
> > When the shoe is on the other foot it does not feel so nice.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Fri, Nov 4, 2016 at 7:08 PM, Aleksey Yeschenko 
> > wrote:
> >
> > > Dunno. A sneaky correctness or data corruption bug. A performance
> > > regression. Or something that can take a node/cluster down.
> > >
> > > Of course no process is bullet-proof. The purpose of review is to
> > minimise
> > > the odds of such a thing happening.
> > >
> > > I’m sure users running Cassandra in production would prefer actual
> proper
> > > reviews to non-review +1s.
> > >
> > > --
> > > AY
> > >
> > > On 4 November 2016 at 23:03:23, Edward Capriolo (edlinuxg...@gmail.com
> )
> > > wrote:
> > >
> > > I feel that is really standing up on a soap box. What would be the
> worst
> > > thing that happens here
> >
>

Benedict,

Well said. I think we both see a similar way forward.

"Sponsored contributors have invested time into efforts to expand the
committer pool before, though they have universally failed."

Let's talk about this. I am following a number of tickets. Take for example
this one.

https://issues.apache.org/jira/browse/CASSANDRA-12649

September 19th: User submits a patch along with a clear rationale. (It is
right in the description of the ticket):

October 19th: (me) +1 (non binding) users with unpredictable batch sizes
tend to also have gc problems and this would aid in insight.

October 28th: Someone else: Would be nice to see this committed. We have
seen a lot of users mistakenly batch against multiple partitions.

Note: 3 people have agreed they s
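CASSANDRA-12649, referenced above, is about exposing batch-size metrics. A minimal sketch of the idea follows, with hypothetical names rather than Cassandra's actual metrics classes:

```python
# Track a histogram of partitions-touched-per-batch so operators can
# spot clients that mistakenly batch across many partitions.
from collections import Counter

class BatchSizeMetric:
    def __init__(self):
        self.histogram = Counter()  # partitions-per-batch -> occurrences

    def record(self, partitions_in_batch):
        self.histogram[partitions_in_batch] += 1

    def max_observed(self):
        return max(self.histogram) if self.histogram else 0

metric = BatchSizeMetric()
for batch in ([1], [1, 2, 3], [1]):  # each list = partitions one batch touched
    metric.record(len(batch))
print(metric.max_observed())  # -> 3
```

A metric like this gives the "insight" the +1 comments ask for: the GC problems show up in the tail of the histogram long before they show up in tickets.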

Re: DataStax role in Cassandra and the ASF

2016-11-05 Thread Edward Capriolo
> > blocking point for anyone coming from outside to get involved in the
> > project and help it grow. If someone's changes require moves in the
> > project core or its public APIs, that person will require support from
> > project members to get it done. If such help is not given, no outside
> > change will ever be completed, and no one will invest time in doing
> > something more than fixing typos or the common programmer errors which
> > we all make from time to time. Despite the impersonal nature of
> > communication on the Internet, we still have human interactions, and we
> > all have just one chance to make a first impression. If we got it wrong
> > at the beginning, it is hard to fix later on.
> > Some decisions made in the past by project PMCs led to a situation where
> > the project was forked and maintained outside the ASF (i.e. Stratio
> > Cassandra, which eventually ended up as a Lucene indexes plugin over a
> > year ago); some others hurt users who had been running Cassandra for a
> > long time (i.e. the discontinuation of Thrift). The second decision
> > especially was seen by outsiders, who do not desire a billion writes per
> > second, as marketing-driven. This led to people looking for and finding
> > alternatives with a compatible interface which might be, ironically,
> > even faster (i.e. ScyllaDB).
> >
> > And since there was a quote battle on Twitter between Jim Jagielski and
> > Benedict, I can throw some in as well. At conferences I attended, and
> > even during consultancy services I received, I have spoken with people
> > who have DataStax on their resumes, and even they told me "collaboration
> > with them [the Cassandra team] was hard". Now imagine how an outsider
> > will get any chance to get a change done with such an attitude shown
> > even to one's own colleagues. I must also note that TinkerPop is doing
> > better in this area since it has a much more generic nature.
> > I don’t think this whole topic is to say that you (meaning DataStax) did
> > a bad job, or that you are doing wrong by the project; it is about
> > letting others join forces with you to make Cassandra even better. Maybe
> > there are not a lot of people walking around currently, but once you
> > welcome them and help them work with you on the code base, you may be
> > sure that others will join, making your development efforts easier and
> > shared across the community.
> >
> > Kind regards,
> > Lukasz
> >
> > > Message from Edward Capriolo, written on Nov 4, 2016, at 18:55:
> > >
> > > On Thu, Nov 3, 2016 at 11:44 PM, Kelly Sommers  >
> > > wrote:
> > >
> > >> I think the community needs some clarification about what's going on.
> > >> There's a really concerning shift going on and the story about why is
> > >> really blurry. I've heard all kinds of wild claims about what's going
> > on.
> > >>
> > >> I've heard people say the ASF is pushing DataStax out because they
> don't
> > >> like how much control they have over Cassandra. I've heard other
> people
> > say
> > >> DataStax and the ASF aren't getting along. I've heard one person who
> has
> > >> pull with a friend in the ASF complained about a feature not getting
> > >> considered (who also didn't go down the correct path of proposing)
> > kicked
> > >> and screamed and started the ball rolling for control change.
> > >>
> > >> I don't know what's going on, and I doubt the truth is in any of
> those,
> > the
> > >> truth is probably somewhere in between. As a former Cassandra MVP and
> > >> builder of some of the larger Cassandra clusters in the last 3 years
> I'm
> > >> concerned.
> > >>
> > >> I've been really happy with Jonathan and DataStax's role in the
> > Cassandra
> > >> community. I think they have done a great job at investing time and
> > money
> > >> towards the good interest in the project. I think it is unavoidable a
> > >> single company bootstraps large projects like this into popularity.
> It's
> > >> those companies investments who give the ability to grow diversity in
> > later
> > >> stages. The committer list in my opinion is the most diverse its ever
> > been,
> > >> hasn't it? Apple is a big player now.
> > >>
> > >> I don't think reducing DataStax's role for the sake of diversity is
> > smart.
> > >> Y

Re: Rough roadmap for 4.0

2016-11-18 Thread Edward Capriolo
These tickets claim to duplicate each other:

https://issues.apache.org/jira/browse/CASSANDRA-12674
https://issues.apache.org/jira/browse/CASSANDRA-12746

But one is marked fixed and the other is still open.

What is the status here?

On Thu, Nov 17, 2016 at 5:20 PM, DuyHai Doan  wrote:

> Be very careful, there is a serious bug about AND/OR semantics, not solved
> yet and not going to be solved any soon:
> https://issues.apache.org/jira/browse/CASSANDRA-12674
>
> On Thu, Nov 17, 2016 at 7:32 PM, Jeff Jirsa 
> wrote:
>
> >
> > We’ll be voting in the very near future on timing of major releases and
> > release strategy. 4.0 won’t happen until that vote takes place.
> >
> > But since you asked, I have ONE tick/tock (3.9) cluster being qualified
> > for production because it needs SASI.
> >
> > - Jeff
> >
> > On 11/17/16, 9:59 AM, "Jonathan Haddad"  wrote:
> >
> > >I think it might be worth considering adopting the release strategy
> before
> > >4.0 release.  Are any PMC members putting tick tock in prod? Does anyone
> > >even trust it?  What's the downside of changing the release cycle
> > >independently from 4.0?
> > >
> > >On Thu, Nov 17, 2016 at 9:03 AM Jason Brown 
> wrote:
> > >
> > >Jason,
> > >
> > >That's a separate topic, but we will have a different vote on how the
> > >branching/release strategy should be for the future.
> > >
> > >On Thursday, November 17, 2016, jason zhao yang <
> > zhaoyangsingap...@gmail.com
> > >>
> > >wrote:
> > >
> > >> Hi,
> > >>
> > >> Will we still use tick-tock release for 4.x and 4.0.x ?
> > >>
> > >> Stefan Podkowinski wrote on Wednesday, November 16, 2016, at 4:52 PM:
> > >>
> > >> > From my understanding, this will also effect EOL dates of other
> > >branches.
> > >> >
> > >> > "We will maintain the 2.2 stability series until 4.0 is released,
> and
> > >3.0
> > >> > for six months after that.".
> > >> >
> > >> >
> > >> > On Wed, Nov 16, 2016 at 5:34 AM, Nate McCall wrote:
> > >> >
> > >> > > Agreed. As long as we have a goal I don't see why we have to
> adhere
> > to
> > >> > > arbitrary date for 4.0.
> > >> > >
> > >> > > On Nov 16, 2016 1:45 PM, "Aleksey Yeschenko" <
> alek...@datastax.com
> > >> >
> > >> > wrote:
> > >> > >
> > >> > > > I’ll comment on the broader issue, but right now I want to
> > elaborate
> > >> on
> > >> > > > 3.11/January/arbitrary cutoff date.
> > >> > > >
> > >> > > > Doesn’t matter what the original plan was. We should continue
> with
> > >> 3.X
> > >> > > > until all the 4.0 blockers have been
> > >> > > > committed - and there are quite a few of them remaining yet.
> > >> > > >
> > >> > > > So given all the holidays, and the tickets remaining, I’ll
> > >personally
> > >> > be
> > >> > > > surprised if 4.0 comes out before
> > >> > > > February/March and 3.13/3.14. Nor do I think it’s an issue.
> > >> > > >
> > >> > > > —
> > >> > > > AY
> > >> > > >
> > >> > > > On 16 November 2016 at 00:39:03, Mick Semb Wever (
> > >> > m...@thelastpickle.com 
> > >> > > )
> > >> > > > wrote:
> > >> > > >
> > >> > > > On 4 November 2016 at 13:47, Nate McCall wrote:
> > >> > > >
> > >> > > > > Specifically, this should be "new stuff that could/will break
> > >> things"
> > >> > > > > given we are upping
> > >> > > > > the major version.
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > How does this co-ordinate with the tick-tock versioning¹ leading
> > up
> > >> to
> > >> > > the
> > >> > > > 4.0 release?
> > >> > > >
> > >> > > > To just stop tick-tock and then say yeehaa let's jam in all the
> > >> > breaking
> > >> > > > changes we really want seems to be throwing away some of the
> > learnt
> > >> > > wisdom,
> > >> > > > and not doing a very sane transition from tick-tock to
> > >> > > > features/testing/stable². I really hope all this is done in a
> way
> > >> that
> > >> > > > continues us down the path towards a stable-master.
> > >> > > >
> > >> > > > For example, are we fixing the release of 4.0 to November? or
> > >> > continuing
> > >> > > > tick-tocks until we complete the 4.0 roadmap? or starting the
> > >> > > > features/testing/stable branching approach with 3.11?
> > >> > > >
> > >> > > >
> > >> > > > Background:
> > >> > > > ¹) Sylvain wrote in an earlier thread titled "A Home for 4.0"
> > >> > > >
> > >> > > > > And as 4.0 was initially supposed to come after 3.11, which is
> > >> > coming,
> > >> > > > it's probably time to have a home for those tickets.
> > >> > > >
> > >> > > > ²) The new versioning scheme slated for 4.0, per the "Proposal -
> > >> 3.5.1"
> > >> > > > thread
> > >> > > >
> > >> > > > > three branch plan with “features”, “testing”, and “stable”
> > >starting
> > >> > > with
> > >> > > > 4.0?
> > >> > > >
> > >> > > >
> > >> > > > Mick
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
On Friday, November 18, 2016, Jeff Jirsa  wrote:

> We should assume that we’re ditching tick/tock. I’ll post a thread on
> 4.0-and-beyond here in a few minutes.
>
> The advantage of a prod release every 6 months is fewer incentive to push
> unfinished work into a release.
> The disadvantage of a prod release every 6 months is then we either have a
> very short lifespan per-release, or we have to maintain lots of active
> releases.
>
> 2.1 has been out for over 2 years, and a lot of people (including us) are
> running it in prod – if we have a release every 6 months, that means we’d
> be supporting 4+ releases at a time, just to keep parity with what we have
> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
> old branches.
>
>
> On 11/18/16, 3:10 PM, "beggles...@apple.com  on behalf of
> Blake Eggleston" > wrote:
>
> >> While stability is important if we push back large "core" changes until
> later we're just setting ourselves up to face the same issues later on
> >
> >In theory, yes. In practice, when incomplete features are earmarked for a
> certain release, those features are often rushed out, and not always fully
> baked.
> >
> >In any case, I don’t think it makes sense to spend too much time planning
> what goes into 4.0, and what goes into the next major release with so many
> release strategy related decisions still up in the air. Are we going to
> ditch tick-tock? If so, what will it’s replacement look like? Specifically,
> when will the next “production” release happen? Without knowing that, it's
> hard to say if something should go in 4.0, or 4.5, or 5.0, or whatever.
> >
> >The reason I suggested a production release every 6 months is because (in
> my mind) it’s frequent enough that people won’t be tempted to rush features
> to hit a given release, but not so frequent that it’s not practical to
> support. It wouldn’t be the end of the world if some of these tickets
> didn’t make it into 4.0, because 4.5 would fine.
> >
> >On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com
> ) wrote:
> >
> >On 18 November 2016 at 18:25, Jason Brown  > wrote:
> >
> >> #11559 (enhanced node representation) - decided it's *not* something we
> >> need wrt #7544 storage port configurable per node, so we are punting on
> >>
> >
> >#12344 - Forward writes to replacement node with same address during
> replace
> >depends on #11559. To be honest I'd say #12344 is pretty important,
> >otherwise it makes it difficult to replace nodes without potentially
> >requiring client code/configuration changes. It would be nice to get
> #12344
> >in for 4.0. It's marked as an improvement but I'd consider it a bug and
> >thus think it could be included in a later minor release.
> >
> >Introducing all of these in a single release seems pretty risky. I think
> it
> >> would be safer to spread these out over a few 4.x releases (as they’re
> >> finished) and give them time to stabilize before including them in an
> LTS
> >> release. The downside would be having to maintain backwards
> compatibility
> >> across the 4.x versions, but that seems preferable to delaying the
> release
> >> of 4.0 to include these, and having another big bang release.
> >
> >
> >I don't think anyone expects 4.0.0 to be stable. It's a major version
> >change with lots of new features; in the production world people don't
> >normally move to a new major version until it has been out for quite some
> >time and several minor releases have passed. Really, most people are only
> >migrating to 3.0.x now. While stability is important if we push back large
> >"core" changes until later we're just setting ourselves up to face the
> same
> >issues later on. There should be enough uptake on the early releases of
> 4.0
> >from new users to help test and get it to a production-ready state.
> >
> >
> >Kurt Greaves
> >k...@instaclustr.com 
>
>
> I don't think anyone expects 4.0.0 to be stable

Someone previously described 3.0 as the "break everything release".

We know that many people are still on 2.1 and 3.0. Cassandra will always be
maintaining 3 or 4 active branches, and it will have adoption issues if
releases are not stable and usable.

Given that Cassandra reached 1.0 years ago, I expect things to be stable.
Half-working features, or "adding this broke that", are not appealing to me.



-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
This is especially relevant if people wish to focus on removing things.

For example, gossip 2.0 sounds great, but it seems geared toward huge
clusters, which are likely not the majority of users. For those with a
20-node cluster, are the indirect benefits worth it?

Also, there seems to be a push to first remove things like compact storage
or thrift. Fine, great. But what is the realistic upgrade path for someone?
If the big players are running 2.1 and maintaining backports, the average
shop without a dedicated team is going to be stuck saying: "There are great
features in 4.0 that improve performance; I would probably switch, but it's
not stable, we have that one compact storage CF, and who knows what is going
to happen performance-wise."

We really need to lose this "the release won't be stable for 6 minor
versions" concept.

On Saturday, November 19, 2016, Edward Capriolo 
wrote:

>
>
> On Friday, November 18, 2016, Jeff Jirsa  > wrote:
>
>> We should assume that we’re ditching tick/tock. I’ll post a thread on
>> 4.0-and-beyond here in a few minutes.
>>
>> The advantage of a prod release every 6 months is fewer incentive to push
>> unfinished work into a release.
>> The disadvantage of a prod release every 6 months is then we either have
>> a very short lifespan per-release, or we have to maintain lots of active
>> releases.
>>
>> 2.1 has been out for over 2 years, and a lot of people (including us) are
>> running it in prod – if we have a release every 6 months, that means we’d
>> be supporting 4+ releases at a time, just to keep parity with what we have
>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
>> old branches.
>>
>>
>> On 11/18/16, 3:10 PM, "beggles...@apple.com on behalf of Blake
>> Eggleston"  wrote:
>>
>> >> While stability is important if we push back large "core" changes
>> until later we're just setting ourselves up to face the same issues later on
>> >
>> >In theory, yes. In practice, when incomplete features are earmarked for
>> a certain release, those features are often rushed out, and not always
>> fully baked.
>> >
>> >In any case, I don’t think it makes sense to spend too much time
>> planning what goes into 4.0, and what goes into the next major release with
>> so many release strategy related decisions still up in the air. Are we
>> going to ditch tick-tock? If so, what will it’s replacement look like?
>> Specifically, when will the next “production” release happen? Without
>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
>> 5.0, or whatever.
>> >
>> >The reason I suggested a production release every 6 months is because
>> (in my mind) it’s frequent enough that people won’t be tempted to rush
>> features to hit a given release, but not so frequent that it’s not
>> practical to support. It wouldn’t be the end of the world if some of these
>> tickets didn’t make it into 4.0, because 4.5 would fine.
>> >
>> >On November 18, 2016 at 1:57:21 PM, kurt Greaves (k...@instaclustr.com)
>> wrote:
>> >
>> >On 18 November 2016 at 18:25, Jason Brown  wrote:
>> >
>> >> #11559 (enhanced node representation) - decided it's *not* something we
>> >> need wrt #7544 storage port configurable per node, so we are punting on
>> >>
>> >
>> >#12344 - Forward writes to replacement node with same address during
>> replace
>> >depends on #11559. To be honest I'd say #12344 is pretty important,
>> >otherwise it makes it difficult to replace nodes without potentially
>> >requiring client code/configuration changes. It would be nice to get
>> #12344
>> >in for 4.0. It's marked as an improvement but I'd consider it a bug and
>> >thus think it could be included in a later minor release.
>> >
>> >Introducing all of these in a single release seems pretty risky. I think
>> it
>> >> would be safer to spread these out over a few 4.x releases (as they’re
>> >> finished) and give them time to stabilize before including them in an
>> LTS
>> >> release. The downside would be having to maintain backwards
>> compatibility
>> >> across the 4.x versions, but that seems preferable to delaying the
>> release
>> >> of 4.0 to include these, and having another big bang release.
>> >
>> >
>> >I don't think anyone expects 4.0.0 to be stable. It's a major version
>> >change with lots of new features; in the production world people don't
>> >normally move to a new

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
It has nothing to do with my positivity. It is not only my sentiment; many
people who operate Cassandra will repeat the notion that they don't like
that releases are not stable for 6 minor versions.

There is this concept where people come to accept deviation from the norm.

Of course the tests don't all pass.
Of course the releases won't be stable.

I swear multiple people voted down tick-tock for stability and life-cycle
reasons.

But hey, don't expect any release to be usable; that would be unreasonable.



On Saturday, November 19, 2016, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Honest question: are you *ever* positive Ed?
>
> Maybe give it a shot once in a while. It will be good for your mental
> health.
>
>
> Sent from my iPhone
>
> > On Nov 19, 2016, at 11:50 AM, Edward Capriolo  > wrote:
> >
> > This is especially relevant if people wish to focus on removing things.
> >
> > For example, gossip 2.0 sounds great, but seems geared toward huge
> clusters
> > which is not likely a majority of users. For those with a 20 node cluster
> > are the indirect benefits woth it?
> >
> > Also there seems to be a first push to remove things like compact storage
> > or thrift. Fine great. But what is the realistic update path for someone.
> > If the big players are running 2.1 and maintaining backports, the average
> > shop without a dedicated team is going to be stuck saying (great features
> > in 4.0 that improve performance, i would probably switch but its not
> stable
> > and we have that one compact storage cf and who knows what is going to
> > happen performance wise when)
> >
> > We really need to lose this realease wont be stable for 6 minor versions
> > concept.
> >
> > On Saturday, November 19, 2016, Edward Capriolo  >
> > wrote:
> >
> >>
> >>
> >> On Friday, November 18, 2016, Jeff Jirsa  
> wrote:
> >>
> >>> We should assume that we’re ditching tick/tock. I’ll post a thread on
> >>> 4.0-and-beyond here in a few minutes.
> >>>
> >>> The advantage of a prod release every 6 months is fewer incentive to
> push
> >>> unfinished work into a release.
> >>> The disadvantage of a prod release every 6 months is then we either
> have
> >>> a very short lifespan per-release, or we have to maintain lots of
> active
> >>> releases.
> >>>
> >>> 2.1 has been out for over 2 years, and a lot of people (including us)
> are
> >>> running it in prod – if we have a release every 6 months, that means
> we’d
> >>> be supporting 4+ releases at a time, just to keep parity with what we
> have
> >>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+
> year
> >>> old branches.
> >>>
> >>>
> >>> On 11/18/16, 3:10 PM, "beggles...@apple.com  on behalf
> of Blake
> >>> Eggleston" > wrote:
> >>>
> >>>>> While stability is important if we push back large "core" changes
> >>> until later we're just setting ourselves up to face the same issues
> later on
> >>>>
> >>>> In theory, yes. In practice, when incomplete features are earmarked
> for
> >>> a certain release, those features are often rushed out, and not always
> >>> fully baked.
> >>>>
> >>>> In any case, I don’t think it makes sense to spend too much time
> >>> planning what goes into 4.0, and what goes into the next major release
> with
> >>> so many release strategy related decisions still up in the air. Are we
> >>> going to ditch tick-tock? If so, what will it’s replacement look like?
> >>> Specifically, when will the next “production” release happen? Without
> >>> knowing that, it's hard to say if something should go in 4.0, or 4.5,
> or
> >>> 5.0, or whatever.
> >>>>
> >>>> The reason I suggested a production release every 6 months is because
> >>> (in my mind) it’s frequent enough that people won’t be tempted to rush
> >>> features to hit a given release, but not so frequent that it’s not
> >>> practical to support. It wouldn’t be the end of the world if some of
> these
> >>> tickets didn’t make it into 4.0, because 4.5 would fine.
> >>>>
> >>>> On November 18, 2016 at 1:57:21 PM, kurt Greaves (
> k...@instaclustr.com )
> >>> wrote:
> >>>>
> >>>>> On 18 November 2016 at 18:25, Jaso

Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

2016-11-19 Thread Edward Capriolo
I would say start with a mindset of "people will run this in production",
not "why would you expect this to work?"

Now, how does this logic affect feature development? Maybe use gossip 2.0 as
an example.

I will play my given Debbie Downer role. I can imagine 1 or 2 dtests, the
logic of "don't expect it to work", unleashing 4.0 onto hordes of newbies
with a Twitter announcement of the release, and letting the bugs trickle in.

One could also do something comprehensive, like testing on clusters of 2 to
1000 nodes. Test with Jepsen to see what happens during partitions, inject
things like JVM pauses and account for the behavior, and log convergence
times after given events.
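As an illustration of the "log convergence times" idea only: below is a toy
epidemic-spread model (not Cassandra's actual gossip, and certainly not
gossip 2.0) that counts how many rounds a rumor takes to reach every node.
The function name, fanout, and seed are invented for this sketch.

```python
import random

def gossip_convergence_rounds(n_nodes, fanout=3, seed=42):
    """Toy rumor-spreading model: node 0 observes an event, and every
    informed node gossips to `fanout` random peers each round. Returns
    the number of rounds until all n_nodes have heard about it."""
    rng = random.Random(seed)  # fixed seed keeps the toy run repeatable
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        rounds += 1
        newly = set()
        for _ in informed:
            newly.update(rng.sample(range(n_nodes), fanout))
        informed |= newly
    return rounds

# Convergence grows roughly logarithmically with cluster size; these are
# the kinds of numbers one would want logged after real cluster events.
for n in (20, 100, 1000):
    print(f"{n:>4} nodes -> {gossip_convergence_rounds(n)} rounds")
```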

Take a stand and say: "Look, we engineered and beat the crap out of this
feature. I deployed this release at my company and eat my own dog food. You
are not my crash test dummy."


On Saturday, November 19, 2016, Jeff Jirsa  wrote:

> Any proposal to solve the problem you describe?
>
> --
> Jeff Jirsa
>
>
> > On Nov 19, 2016, at 8:50 AM, Edward Capriolo  > wrote:
> >
> > This is especially relevant if people wish to focus on removing things.
> >
> > For example, gossip 2.0 sounds great, but seems geared toward huge
> clusters
> > which is not likely a majority of users. For those with a 20 node cluster
> > are the indirect benefits woth it?
> >
> > Also there seems to be a first push to remove things like compact storage
> > or thrift. Fine great. But what is the realistic update path for someone.
> > If the big players are running 2.1 and maintaining backports, the average
> > shop without a dedicated team is going to be stuck saying (great features
> > in 4.0 that improve performance, i would probably switch but its not
> stable
> > and we have that one compact storage cf and who knows what is going to
> > happen performance wise when)
> >
> > We really need to lose this realease wont be stable for 6 minor versions
> > concept.
> >
> > On Saturday, November 19, 2016, Edward Capriolo  >
> > wrote:
> >
> >>
> >>
> >> On Friday, November 18, 2016, Jeff Jirsa  
> wrote:
> >>
> >>> We should assume that we’re ditching tick/tock. I’ll post a thread on
> >>> 4.0-and-beyond here in a few minutes.
> >>>
> >>> The advantage of a prod release every 6 months is fewer incentive to
> push
> >>> unfinished work into a release.
> >>> The disadvantage of a prod release every 6 months is then we either
> have
> >>> a very short lifespan per-release, or we have to maintain lots of
> active
> >>> releases.
> >>>
> >>> 2.1 has been out for over 2 years, and a lot of people (including us)
> are
> >>> running it in prod – if we have a release every 6 months, that means
> we’d
> >>> be supporting 4+ releases at a time, just to keep parity with what we
> have
> >>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+
> year
> >>> old branches.
> >>>
> >>>
> >>> On 11/18/16, 3:10 PM, "beggles...@apple.com  on behalf
> of Blake
> >>> Eggleston" > wrote:
> >>>
> >>>>> While stability is important if we push back large "core" changes
> >>> until later we're just setting ourselves up to face the same issues
> later on
> >>>>
> >>>> In theory, yes. In practice, when incomplete features are earmarked
> for
> >>> a certain release, those features are often rushed out, and not always
> >>> fully baked.
> >>>>
> >>>> In any case, I don’t think it makes sense to spend too much time
> >>> planning what goes into 4.0, and what goes into the next major release
> with
> >>> so many release strategy related decisions still up in the air. Are we
> >>> going to ditch tick-tock? If so, what will it’s replacement look like?
> >>> Specifically, when will the next “production” release happen? Without
> >>> knowing that, it's hard to say if something should go in 4.0, or 4.5,
> or
> >>> 5.0, or whatever.
> >>>>
> >>>> The reason I suggested a production release every 6 months is because
> >>> (in my mind) it’s frequent enough that people won’t be tempted to rush
> >>> features to hit a given release, but not so frequent that it’s not
> >>> practical to support. It wouldn’t be the end of the world if some of
> these
> >>> tickets didn’t make it into 4.0, because 4.

Re: Failed Dtest will block cutting releases

2016-12-03 Thread Edward Capriolo
I think it is fair to run a flaky test again. If it is determined that it
flaked out due to a conflict with another test, or to something ephemeral in
a long-running process, it is not worth blocking a release.

Just deleting it is probably not a good path.

I actually enjoy writing, fixing, and tweaking tests, so ping me offline or
whatever.

On Saturday, December 3, 2016, Benjamin Roth 
wrote:

> Excuse me if I jump into an old thread, but from my experience, I have a
> very clear opinion about situations like that as I encountered them before:
>
> Tests are there to give *certainty*.
> *Would you like to pass a crossing with a green light if you cannot be sure
> if green really means green?*
> Do you want to rely on tests that are green, red, green, red? What if a red
> is a real red and you missed it because you simply ignore it because it's
> flaky?
>
> IMHO there are only 3 options how to deal with broken/red tests:
> - Fix the underlying issue
> - Fix the test
> - Delete the test
>
> If I cannot trust a test, it is better not to have it at all. Otherwise
> people are staring at red lights and start to drive.
>
> This causes:
> - Uncertainty
> - Loss of trust
> - Confusion
> - More work
> - *Less quality*
>
> Just as an example:
> Few days ago I created a patch. Then I ran the utest and 1 test failed.
> Hmmm, did I break it? I had to check it twice by checking out the former
> state, running the tests again just to recognize that it wasn't me who made
> it fail. That's annoying.
>
> Sorry again, I'm rather new here but what I just read reminded me much of
> situations I have been in years ago.
> So: +1, John
>
> 2016-12-03 7:48 GMT+01:00 sankalp kohli  >:
>
> > Hi,
> > I dont see any any update on this thread. We will go ahead and make
> > Dtest a blocker for cutting releasing for anything after 3.10.
> >
> > Please respond if anyone has an objection to this.
> >
> > Thanks,
> > Sankalp
> >
> >
> >
> > On Mon, Nov 21, 2016 at 11:57 AM, Josh McKenzie  >
> > wrote:
> >
> > > Caveat: I'm strongly in favor of us blocking a release on a non-green
> > test
> > > board of either utest or dtest.
> > >
> > >
> > > > put something in prod which is known to be broken in obvious ways
> > >
> > > In my experience the majority of fixes are actually shoring up
> > low-quality
> > > / flaky tests or fixing tests that have been invalidated by a commit
> but
> > do
> > > not indicate an underlying bug. Inferring "tests are failing so we know
> > > we're asking people to put things in prod that are broken in obvious
> > ways"
> > > is hyperbolic. A more correct statement would be: "Tests are failing so
> > we
> > > know we're shipping with a test that's failing" which is not helpful.
> > >
> > > Our signal to noise ratio with tests has been very poor historically;
> > we've
> > > been trying to address that through aggressive triage and assigning out
> > > test failures however we need far more active and widespread community
> > > involvement if we want to truly *fix* this problem long-term.
> > >
> > > On Mon, Nov 21, 2016 at 2:33 PM, Jonathan Haddad  >
> > > wrote:
> > >
> > > > +1.  Kind of silly to put advise people to put something in prod
> which
> > is
> > > > known to be broken in obvious ways
> > > >
> > > > On Mon, Nov 21, 2016 at 11:31 AM sankalp kohli <
> kohlisank...@gmail.com 
> > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > We should not cut a releases if Dtest are not passing. I won't
> > > block
> > > > > 3.10 on this since we are just discussing this.
> > > > >
> > > > > Please provide feedback on this.
> > > > >
> > > > > Thanks,
> > > > > Sankalp
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Committer access to CassCI

2016-12-06 Thread Edward Capriolo
I will take this up at the next NYC Cassandra meetup. I have been on the
fence about "charging" for events for a while, but a nice donation piece
would be pretty cool if it can fuel the project.

I have also joked about creating CaSETI (Search for Extra Testing
Infrastructure): building a Docker image that would phone home for testing
work, which we could all run on our workstations and Xboxes.

On Tue, Dec 6, 2016 at 10:47 AM, Michael Shuler 
wrote:

> (Sent to private@ a couple weeks ago)
>
> We are currently working on configuring newly donated ASF recommended
> compute resources to the ASF Jenkins environment and will be
> transferring unit and dtests over there once the infrastructure is
> running jobs successfully.
>
> We are receiving requests for new committers to receive access to
> CassCI, and in the interim can do so, but please note that we are near
> capacity and jobs are starting to back up there. We are also offering
> new committers assistance in setting up a local environment which may
> give them a faster turnaround time in terms of test results. It is our
> preference to not create new CassCI accounts and to spend our efforts
> and contributions on improving running on ASF infrastructure.
>
>
> (Added notes 12/06)
>
> At this time, JIRA ticket reviewers may need to set up dev branch jobs,
> if patch submitters do not currently have their forks set up on CassCI.
> The goal is to eventually migrate off of CassCI, utilizing ASF Jenkins
> for main branch jobs, which is nearly complete. If compute resources
> materialize for dev branches to run on ASF Jenkins, that's great,
> otherwise, I've set up a model of how to set up Jenkins in-house to run
> jobs.
>
> The ASF Jenkins jobs are configured via Job DSL directly from the
> cassandra-builds git repository:
>
> https://git-wip-us.apache.org/repos/asf?p=cassandra-builds.git;a=summary
>
> There are very limited Jenkins plugins installed on the ASF Jenkins, so
> a base Jenkins install with the Job DSL plugin added should get other
> Jenkins admins up and running pretty quickly. Modifications for running
> Jenkins on a user's repo of Apache Cassandra and custom branches should
> be relatively straightforward, but feel free to ask for help.
>
> With 5 dedicated ASF Jenkins slaves for Apache Cassandra, we currently
> cannot support developer branches on the ASF Jenkins infrastructure - we
> would queue jobs for days/weeks waiting to run. If there are community
> members that have a desire to donate compute resources to ASF Jenkins to
> add testing capacity, here's some background and the related INFRA
> tickets as we started testing on ASF and adding/troubleshooting the
> initial 5 servers:
>
> https://issues.apache.org/jira/browse/INFRA-12366
> https://issues.apache.org/jira/browse/INFRA-12823
> https://issues.apache.org/jira/browse/INFRA-12897
> https://issues.apache.org/jira/browse/INFRA-12943
> https://issues.apache.org/jira/browse/INFRA-13018
>
> --
> Kind regards,
> Michael
>


Re: Rollback procedure for Cassandra Upgrade.

2017-01-10 Thread Edward Capriolo
On Tuesday, January 10, 2017, Romain Hardouin 
wrote:

> To be able to downgrade we should be able to pin both commitlog and
> sstables versions, e.g. -Dcassandra.commitlog_version=3
> -Dcassandra.sstable_version=jb
> That would be awesome because it would decorrelate binaries version and
> data version. Upgrades would be much less risky so I guess that adoption of
> new C* versions would increase.
> Best,
> Romain
>
> Le Mardi 10 janvier 2017 6h03, Brandon Williams  > a écrit :
>
>
>  However, it's good to determine *how* it failed.  If nodetool just died or
> timed out, that's no big deal, it'll finish.
>
> On Mon, Jan 9, 2017 at 11:00 PM, Jonathan Haddad  > wrote:
>
> > There's no downgrade procedure. You either upgrade or you go back to a
> > snapshot from the previous version.
> > On Mon, Jan 9, 2017 at 8:13 PM Prakash Chauhan <
> > prakash.chau...@ericsson.com >
> > wrote:
> >
> > > Hi All ,
> > >
> > > Do we have an official procedure to rollback the upgrade of C* from
> 2.0.x
> > > to 2.1.x ?
> > >
> > >
> > > Description:
> > > I have upgraded C* from 2.0.x to 2.1.x . As a part of upgrade
> procedure ,
> > > I have to run nodetool upgradesstables .
> > > What if the command fails in the middle ? Some of the sstables will be
> in
> > > newer format (*-ka-*) where as other might be in older format(*-jb-*).
> > >
> > > Do we have a standard procedure to do rollback in such cases?
> > >
> > >
> > >
> > > Regards,
> > > Prakash Chauhan.
> > >
> > >
> >
>
>
>


It would be amazing if a release could output commitlogs and sstables at a
specific version so that rollbacks are possible.
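One small practical check related to the interrupted-upgradesstables
scenario above: the 2.0/2.1-era sstable data files mentioned in this thread
embed the format version in the file name (`*-jb-*` for 2.0.x, `*-ka-*` for
2.1.x), so a quick scan can show whether a node is left with a mix. The
regex and sample file names below are assumptions based on that naming
convention; later Cassandra versions name files differently.

```python
import re
from collections import Counter

# Matches the two-letter format version in 2.0/2.1-era data file names,
# e.g. "Keyspace1-Standard1-jb-5-Data.db".
VERSION_RE = re.compile(r"-(?P<version>[a-z]{2})-\d+-Data\.db$")

def sstable_versions(filenames):
    """Count sstable format versions among data-file names, to see if an
    interrupted `nodetool upgradesstables` left mixed formats behind."""
    counts = Counter()
    for name in filenames:
        m = VERSION_RE.search(name)
        if m:
            counts[m.group("version")] += 1
    return counts

files = [
    "Keyspace1-Standard1-jb-5-Data.db",  # old 2.0.x format
    "Keyspace1-Standard1-ka-6-Data.db",  # new 2.1.x format
    "Keyspace1-Standard1-ka-7-Data.db",
]
print(sstable_versions(files))  # Counter({'ka': 2, 'jb': 1})
```

More than one version in the result means the upgrade did not finish, which
is exactly the state that makes a clean rollback hard today.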


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: [RELEASE] Apache Cassandra 3.10 released

2017-02-03 Thread Edward Capriolo
On Fri, Feb 3, 2017 at 6:52 PM, Michael Shuler 
wrote:

> The Cassandra team is pleased to announce the release of Apache
> Cassandra version 3.10.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a new feature and bug fix release[1] on the 3.X series.
> As always, please pay attention to the release notes[2] and Let us
> know[3] if you were to encounter any problem.
>
> This is the last tick-tock feature release of Apache Cassandra. Version
> 3.11.0 will continue bug fixes from this point on the cassandra-3.11
> branch in git.
>
> Enjoy!
>
> [1]: (CHANGES.txt) https://goo.gl/J0VghF
> [2]: (NEWS.txt) https://goo.gl/00KNVW
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>
Great job all on this release.


Re: Why does CockroachDB github website say Cassandra has no Availability on datacenter failure?

2017-02-07 Thread Edward Capriolo
On Tue, Feb 7, 2017 at 8:12 AM, Kant Kodali  wrote:

> yes agreed with this response
>
> On Tue, Feb 7, 2017 at 5:07 AM, James Carman 
> wrote:
>
> > I think folks might agree that it's not worth the time to worry about
> what
> > they say.  The ASF isn't a commercial entity, so we don't worry about
> > market share or anything.  Sure, it's not cool for folks to say
> misleading
> > or downright false statements about Cassandra, but we can't police the
> > internet.  We would be better served focusing on what we can control,
> which
> > is Cassandra, making it the best NoSQL database it can be.  Perhaps you
> > should write a blog post showing Cassandra survive a failure and we can
> > link to it from the Cassandra site.
> >
> > Now, this doesn't apply to trademarks, as the PMC is responsible for
> > "defending" its marks.
> >
> >
> >
> > On Tue, Feb 7, 2017 at 7:59 AM Kant Kodali  wrote:
> >
> > > @James I don't see how people can agree to it if they know Cassandra or
> > > even better Distributed systems reasonably well
> > >
> > > On Tue, Feb 7, 2017 at 4:54 AM, Bernardo Sanchez <
> > > bernard...@pointclickcare.com> wrote:
> > >
> > > > same. yra
> > > >
> > > > Sent from my BlackBerry - the most secure mobile device - via the
> Bell
> > > > Network
> > > > From: benjamin.r...@jaumo.com
> > > > Sent: February 7, 2017 7:51 AM
> > > > To: dev@cassandra.apache.org
> > > > Reply-to: dev@cassandra.apache.org
> > > > Subject: Re: Why does CockroachDB github website say Cassandra has no
> > > > Availability on datacenter failure?
> > > >
> > > >
> > > > Btw this isn't the Bronx either. It's not incorrect to be polite.
> > > >
> > > > Am 07.02.2017 13:45 schrieb "Bernardo Sanchez" <
> > > > bernard...@pointclickcare.com>:
> > > >
> > > > > guys this isn't twitter. stop your stupid posts
> > > > >
> > > > > From: benjamin.le...@datastax.com
> > > > > Sent: February 7, 2017 7:43 AM
> > > > > To: dev@cassandra.apache.org
> > > > > Reply-to: dev@cassandra.apache.org
> > > > > Subject: Re: Why does CockroachDB github website say Cassandra has
> no
> > > > > Availability on datacenter failure?
> > > > >
> > > > >
> > > > > Do not get angry for that. It does not worth it. :-)
> > > > >
> > > > > On Tue, Feb 7, 2017 at 1:11 PM, Kant Kodali 
> > wrote:
> > > > >
> > > > > > lol. But seriously are they even allowed to say something that is
> > not
> > > > > true
> > > > > > about another product ?
> > > > > >
> > > > > > On Tue, Feb 7, 2017 at 4:05 AM, kurt greaves <
> k...@instaclustr.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Marketing never lies. Ever
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Join their mailing list:
Tony Cassandra: "Your database is a piece of ..."
Cockroach ML: "What are you talking about?"
Tony Cassandra: "You know what I'm talking about, you cockroach."
::grabs Chaos Monkey and points it at their cluster::


Re: If reading from materialized view with a consistency level of quorum am I guaranteed to have the most recent view?

2017-02-11 Thread Edward Capriolo
If you want to test these scenarios, this project would be helpful:
http://github.com/edwardcapriolo/ec

I use brute force at different CLs and assert if I detect any consistency
issues. Having MVs would be nice.
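The brute-force idea boils down to the quorum-intersection argument: with
R + W > N, every read quorum overlaps every write quorum. Below is only a
toy in-memory model of that argument, not a driver-based test like the ec
project; the `Replicas` class and its methods are invented for illustration.

```python
import itertools

N = 3  # replica count

def quorum(n):
    # QUORUM for n replicas, e.g. 2 of 3
    return n // 2 + 1

class Replicas:
    """Toy model: each replica holds a (timestamp, value) cell. A QUORUM
    write lands on any quorum(N) replicas; a QUORUM read returns the
    newest value among any quorum(N) replicas."""
    def __init__(self):
        self.cells = [(0, None)] * N

    def write(self, ts, value, targets):
        for i in targets:
            if ts > self.cells[i][0]:
                self.cells[i] = (ts, value)

    def read(self, targets):
        return max(self.cells[i] for i in targets)[1]

# Brute force every choice of write quorum and read quorum: a QUORUM
# read must always observe the latest QUORUM write, because R + W > N
# forces the two quorums to share at least one replica.
q = quorum(N)
for w_set in itertools.combinations(range(N), q):
    for r_set in itertools.combinations(range(N), q):
        r = Replicas()
        r.write(1, "old", range(N))   # fully replicated old value
        r.write(2, "new", w_set)      # QUORUM write of the new value
        assert r.read(r_set) == "new", (w_set, r_set)
print("no consistency violations at QUORUM/QUORUM")
```

Real testing against a live cluster adds the hard parts (timeouts, hints,
partial writes), which is what brute-forcing at different CLs exercises.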


On Saturday, February 11, 2017, Benjamin Roth 
wrote:

> For MVs regarding this threads question only the partition key matters.
> Different primary keys can have the same partition key. Which is the case
> in the example in your last comment.
>
> Am 10.02.2017 20:26 schrieb "Kant Kodali"  >:
>
> @Benjamin Roth: How do you say something is a different PRIMARY KEY now?
> looks like you are saying
>
> The below is same partition key and same primary key?
>
> PRIMARY KEY ((a, b), c, d) and
> PRIMARY KEY ((a, b), d, c)
>
> @Russell Great to see you here! As always that is spot on!
>
> On Fri, Feb 10, 2017 at 11:13 AM, Benjamin Roth  >
> wrote:
>
> > Thanks a lot for that post. If I read the code right, then there is one
> > case missing in your post.
> > According to StorageProxy.mutateMV, local updates are NOT put into a
> batch
> > and are instantly applied locally. So a batch is only created if remote
> > mutations have to be applied and only for those mutations.
> >
> > 2017-02-10 19:58 GMT+01:00 DuyHai Doan  >:
> >
> > > See my blog post to understand how MV is implemented:
> > > http://www.doanduyhai.com/blog/?p=1930
> > >
> > > On Fri, Feb 10, 2017 at 7:48 PM, Benjamin Roth <
> benjamin.r...@jaumo.com >
> > > wrote:
> > >
> > > > Same partition key:
> > > >
> > > > PRIMARY KEY ((a, b), c, d) and
> > > > PRIMARY KEY ((a, b), d, c)
> > > >
> > > > PRIMARY KEY ((a), b, c) and
> > > > PRIMARY KEY ((a), c, b)
> > > >
> > > > Different partition key:
> > > >
> > > > PRIMARY KEY ((a, b), c, d) and
> > > > PRIMARY KEY ((a), b, d, c)
> > > >
> > > > PRIMARY KEY ((a), b) and
> > > > PRIMARY KEY ((b), a)
> > > >
> > > >
> > > > 2017-02-10 19:46 GMT+01:00 Kant Kodali  >:
> > > >
> > > > > Okies now I understand what you mean by "same" partition key.  I
> > think
> > > > you
> > > > > are saying
> > > > >
> > > > > PRIMARY KEY(col1, col2, col3) == PRIMARY KEY(col2, col1, col3) //
> so
> > > far
> > > > I
> > > > > assumed they are different partition keys.
> > > > >
> > > > > On Fri, Feb 10, 2017 at 10:36 AM, Benjamin Roth <
> > > benjamin.r...@jaumo.com 
> > > > >
> > > > > wrote:
> > > > >
> > > > > > There are use cases where the partition key is the same. For
> > example
> > > if
> > > > > you
> > > > > > need a sorting within a partition or a filtering different from
> the
> > > > > > original clustering keys.
> > > > > > We actually use this for some MVs.
> > > > > >
> > > > > > If you want "dumb" denormalization with simple append only cases
> > (or
> > > > more
> > > > > > general cases that don't require a read before write on update)
> you
> > > are
> > > > > > maybe better off with batched denormalized atomics writes.
> > > > > >
> > > > > > The main benefit of MVs is if you need denormalization to sort or
> > > > filter
> > > > > by
> > > > > > a non-primary key field.
> > > > > >
> > > > > > 2017-02-10 19:31 GMT+01:00 Kant Kodali  >:
> > > > > >
> > > > > > > yes thanks for the clarification.  But why would I ever have MV
> > > with
> > > > > the
> > > > > > > same partition key? if it is the same partition key I could
> just
> > > read
> > > > > > from
> > > > > > > the base table right? our MV Partition key contains the columns
> > > from
> > > > > the
> > > > > > > base table partition key but in a different order plus an
> > > additional
> > > > > > column
> > > > > > > (which is allowed as of today)
> > > > > > >
> > > > > > > On Fri, Feb 10, 2017 at 10:23 AM, Benjamin Roth <
> > > > > benjamin.r...@jaumo.com 
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > It depends on your model.
> > > > > > > > If the base table + MV have the same partition key, then the
> MV
> > > > > > mutations
> > > > > > > > are applied synchronously, so they are written as soon the
> > write
> > > > > > request
> > > > > > > > returns.
> > > > > > > > => In this case you can rely on R + W > RF
> > > > > > > >
> > > > > > > > If the partition key of the MV is different, the partition of
> > the
> > > > MV
> > > > > is
> > > > > > > > probably placed on a different host (or said differently it
> > > cannot
> > > > be
> > > > > > > > guaranteed that it is on the same host). In this case, the MV
> > > > updates
> > > > > > are
> > > > > > > > executed async in a logged batch. So it can be guaranteed
> they
> > > will
> > > > > be
> > > > > > > > applied eventually but not at the time the write request
> > returns.
> > > > > > > > => You cannot rely and there is no possibility to absolutely
> > > > > guarantee
> > > > > > > > anything, no matter what CL you choose. An MV update may
> always
> > > > > "arrive
> > > > > > > > late". I guess it has been implemented like this to not block
> > in
> > > > case
> > > > > > of
> > > > > > > > remote request to prefer the cluster sanity over consistency.
> > > >
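
The same/different partition-key pairs listed earlier in the thread can be checked mechanically. A small Python sketch (the parsing is deliberately naive and the helper names are invented for illustration):

```python
import re

def partition_key(primary_key: str):
    """Extract the partition key from a 'PRIMARY KEY (...)' declaration.
    With an explicit inner parenthesis the partition key is composite;
    otherwise it is just the first column."""
    inner = primary_key[primary_key.index("(") + 1 : primary_key.rindex(")")]
    m = re.match(r"\s*\((?P<pk>[^)]*)\)", inner)
    if m:
        return tuple(c.strip() for c in m.group("pk").split(","))
    return (inner.split(",")[0].strip(),)

def same_partition(a, b):
    # Order matters inside the composite: ((a, b)) and ((b, a)) are different keys.
    return partition_key(a) == partition_key(b)

# The exact pairs from the thread:
assert same_partition("PRIMARY KEY ((a, b), c, d)", "PRIMARY KEY ((a, b), d, c)")
assert same_partition("PRIMARY KEY ((a), b, c)", "PRIMARY KEY ((a), c, b)")
assert not same_partition("PRIMARY KEY ((a, b), c, d)", "PRIMARY KEY ((a), b, d, c)")
assert not same_partition("PRIMARY KEY ((a), b)", "PRIMARY KEY ((b), a)")
```

Reordering clustering columns never moves data between nodes; changing which columns form the partition key (or splitting a composite one) does, which is why only the latter forces the MV write onto the async batchlog path.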

Re: If reading from materialized view with a consistency level of quorum am I guaranteed to have the most recent view?

2017-02-12 Thread Edward Capriolo
On Sat, Feb 11, 2017 at 3:03 AM, Benjamin Roth 
wrote:

> For MVs regarding this threads question only the partition key matters.
> Different primary keys can have the same partition key. Which is the case
> in the example in your last comment.
>
> On 10.02.2017 20:26, "Kant Kodali" wrote:
>
> @Benjamin Roth: How do you say something is a different PRIMARY KEY now?
> looks like you are saying
>
> The below is same partition key and same primary key?
>
> PRIMARY KEY ((a, b), c, d) and
> PRIMARY KEY ((a, b), d, c)
>
> @Russell Great to see you here! As always that is spot on!
>
> On Fri, Feb 10, 2017 at 11:13 AM, Benjamin Roth 
> wrote:
>
> > Thanks a lot for that post. If I read the code right, then there is one
> > case missing in your post.
> > According to StorageProxy.mutateMV, local updates are NOT put into a
> batch
> > and are instantly applied locally. So a batch is only created if remote
> > mutations have to be applied and only for those mutations.
> >
> > 2017-02-10 19:58 GMT+01:00 DuyHai Doan :
> >
> > > See my blog post to understand how MV is implemented:
> > > http://www.doanduyhai.com/blog/?p=1930
> > >
> > > On Fri, Feb 10, 2017 at 7:48 PM, Benjamin Roth <
> benjamin.r...@jaumo.com>
> > > wrote:
> > >
> > > > Same partition key:
> > > >
> > > > PRIMARY KEY ((a, b), c, d) and
> > > > PRIMARY KEY ((a, b), d, c)
> > > >
> > > > PRIMARY KEY ((a), b, c) and
> > > > PRIMARY KEY ((a), c, b)
> > > >
> > > > Different partition key:
> > > >
> > > > PRIMARY KEY ((a, b), c, d) and
> > > > PRIMARY KEY ((a), b, d, c)
> > > >
> > > > PRIMARY KEY ((a), b) and
> > > > PRIMARY KEY ((b), a)
> > > >
> > > >
> > > > 2017-02-10 19:46 GMT+01:00 Kant Kodali :
> > > >
> > > > > Okies now I understand what you mean by "same" partition key.  I
> > think
> > > > you
> > > > > are saying
> > > > >
> > > > > PRIMARY KEY(col1, col2, col3) == PRIMARY KEY(col2, col1, col3) //
> so
> > > far
> > > > I
> > > > > assumed they are different partition keys.
> > > > >
> > > > > On Fri, Feb 10, 2017 at 10:36 AM, Benjamin Roth <
> > > benjamin.r...@jaumo.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > There are use cases where the partition key is the same. For
> > example
> > > if
> > > > > you
> > > > > > need a sorting within a partition or a filtering different from
> the
> > > > > > original clustering keys.
> > > > > > We actually use this for some MVs.
> > > > > >
> > > > > > If you want "dumb" denormalization with simple append only cases
> > (or
> > > > more
> > > > > > general cases that don't require a read before write on update)
> you
> > > are
> > > > > > maybe better off with batched denormalized atomics writes.
> > > > > >
> > > > > > The main benefit of MVs is if you need denormalization to sort or
> > > > filter
> > > > > by
> > > > > > a non-primary key field.
> > > > > >
> > > > > > 2017-02-10 19:31 GMT+01:00 Kant Kodali :
> > > > > >
> > > > > > > yes thanks for the clarification.  But why would I ever have MV
> > > with
> > > > > the
> > > > > > > same partition key? if it is the same partition key I could
> just
> > > read
> > > > > > from
> > > > > > > the base table right? our MV Partition key contains the columns
> > > from
> > > > > the
> > > > > > > base table partition key but in a different order plus an
> > > additional
> > > > > > column
> > > > > > > (which is allowed as of today)
> > > > > > >
> > > > > > > On Fri, Feb 10, 2017 at 10:23 AM, Benjamin Roth <
> > > > > benjamin.r...@jaumo.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > It depends on your model.
> > > > > > > > If the base table + MV have the same partition key, then the
> MV
> > > > > > mutations
> > > > > > > > are applied synchronously, so they are written as soon the
> > write
> > > > > > request
> > > > > > > > returns.
> > > > > > > > => In this case you can rely on R + W > RF
> > > > > > > >
> > > > > > > > If the partition key of the MV is different, the partition of
> > the
> > > > MV
> > > > > is
> > > > > > > > probably placed on a different host (or said differently it
> > > cannot
> > > > be
> > > > > > > > guaranteed that it is on the same host). In this case, the MV
> > > > updates
> > > > > > are
> > > > > > > > executed async in a logged batch. So it can be guaranteed
> they
> > > will
> > > > > be
> > > > > > > > applied eventually but not at the time the write request
> > returns.
> > > > > > > > => You cannot rely and there is no possibility to absolutely
> > > > > guarantee
> > > > > > > > anything, no matter what CL you choose. An MV update may
> always
> > > > > "arrive
> > > > > > > > late". I guess it has been implemented like this to not block
> > in
> > > > case
> > > > > > of
> > > > > > > > remote request to prefer the cluster sanity over consistency.
> > > > > > > >
> > > > > > > > Is it now 100% clear?
> > > > > > > >
> > > > > > > > 2017-02-10 19:17 GMT+01:00 Kant Kodali :
> > > > > > > >
> > > > > > > > > So R+W > RF doesnt apply for reads on MV right because say
> I
> > 

Re: New committers announcement

2017-02-15 Thread Edward Capriolo
Three cheers!
Hip, Hip, NotFound
1 ms later
Hip, Hip, Hooray
1 ms later
Hooray, Hooray, Hooray

On Tue, Feb 14, 2017 at 5:50 PM, Ben Bromhead  wrote:

> Congrats!!
>
> On Tue, 14 Feb 2017 at 13:37 Joaquin Casares 
> wrote:
>
> > Congratulations!
> >
> > +1 John's sentiments. That's a great list of new committers! :)
> >
> > Joaquin Casares
> > Consultant
> > Austin, TX
> >
> > Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> > On Tue, Feb 14, 2017 at 3:34 PM, Jonathan Haddad 
> > wrote:
> >
> > > Congratulations! Definitely a lot of great contributions from everyone
> on
> > > the list.
> > > On Tue, Feb 14, 2017 at 1:31 PM Jason Brown 
> > wrote:
> > >
> > > > Hello all,
> > > >
> > > > It's raining new committers here in Apache Cassandra!  I'd like to
> > > announce
> > > > the following individuals are now committers for the project:
> > > >
> > > > Branimir Lambov
> > > > Paulo Motta
> > > > Stefan Pokowinski
> > > > Ariel Weisberg
> > > > Blake Eggleston
> > > > Alex Petrov
> > > > Joel Knighton
> > > >
> > > > Congratulations all! Please keep the excellent contributions coming.
> > > >
> > > > Thanks,
> > > >
> > > > -Jason Brown
> > > >
> > >
> >
> --
> Ben Bromhead
> CTO | Instaclustr 
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer
>


Pluggable throttling of read and write queries

2017-02-20 Thread Edward Capriolo
Older versions had a request scheduler API (the `request_scheduler` option in cassandra.yaml).

On Monday, February 20, 2017, Ben Slater > wrote:

> We’ve actually had several customers where we’ve done the opposite - split
> large clusters apart to separate uses cases. We found that this allowed us
> to better align hardware with use case requirements (for example using AWS
> c3.2xlarge for very hot data at low latency, m4.xlarge for more general
> purpose data) we can also tune JVM settings, etc to meet those uses cases.
>
> Cheers
> Ben
>
> On Mon, 20 Feb 2017 at 22:21 Oleksandr Shulgin <
> oleksandr.shul...@zalando.de> wrote:
>
>> On Sat, Feb 18, 2017 at 3:12 AM, Abhishek Verma  wrote:
>>
>>> Cassandra is being used on a large scale at Uber. We usually create
>>> dedicated clusters for each of our internal use cases, however that is
>>> difficult to scale and manage.
>>>
>>> We are investigating the approach of using a single shared cluster with
>>> 100s of nodes and handle 10s to 100s of different use cases for different
>>> products in the same cluster. We can define different keyspaces for each of
>>> them, but that does not help in case of noisy neighbors.
>>>
>>> Does anybody in the community have similar large shared clusters and/or
>>> face noisy neighbor issues?
>>>
>>
>> Hi,
>>
>> We've never tried this approach and given my limited experience I would
>> find this a terrible idea from the perspective of maintenance (remember the
>> old saying about basket and eggs?)
>>
>> What potential benefits do you see?
>>
>> Regards,
>> --
>> Alex
>>
>> --
> 
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
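
The pluggable throttling under discussion is commonly implemented as a token bucket per tenant or keyspace. A minimal sketch, with all names invented (this is not the old request scheduler API):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter, e.g. one bucket per keyspace/tenant."""
    def __init__(self, rate: float, burst: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = burst     # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def try_acquire(self, n: float = 1.0) -> bool:
        # refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False              # caller sheds or queues the request

# One bucket per tenant: a noisy neighbor exhausts its own bucket only,
# leaving the other tenants' budgets untouched.
buckets = {"tenant_a": TokenBucket(rate=1000, burst=100),
           "tenant_b": TokenBucket(rate=1000, burst=100)}
assert buckets["tenant_a"].try_acquire()
```

The design choice being debated in the thread is essentially where this bucket lives: inside one big shared cluster (as above) or implicitly, by giving each use case its own cluster.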


Re: State of triggers

2017-03-03 Thread Edward Capriolo
On Thu, Mar 2, 2017 at 2:10 PM, Kant Kodali  wrote:

> +1
>
> On Thu, Mar 2, 2017 at 11:04 AM, S G  wrote:
>
> > Hi,
> >
> > I am not able to find any documentation on the current state of triggers
> > being production ready.
> >
> > The post at
> > http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-
> > 0-prototype-triggers-support
> > says that "The current implementation is experimental, and there is some
> > work to do before triggers in Cassandra can be declared final and
> > production-ready."
> >
> > So which version of Cassandra should we expect triggers to be stable
> > enough?
> > Our requirement is to develop a solution for several Cassandra users all
> > running on different versions (they won't upgrade easily) and no one is
> > using 3.5+ versions.
> > So the smallest Cassandra version which has production ready triggers
> would
> > be really good to know.
> >
> > Also any advice on common gotchas with Cassandra triggers would be great
> to
> > know.
> >
> > Thanks
> > SG
> >
>

I used them. I built do-it-yourself secondary indexes with them. They have
their gotchas, but so do all the secondary index implementations. Just
because DataStax does not write about something does not mean nobody is
doing it. Let's see, about 5 years ago there was this:
https://github.com/hmsonline/cassandra-triggers

There is a fairly large divergence between what actual users do and what
other groups 'say' actual users do in some cases.
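
For readers who have not seen the pattern: a DIY secondary-index trigger takes each base-table write and emits a matching index-table write. Cassandra's real hook is the Java `ITrigger` interface; the sketch below models only the idea, in Python, with invented names:

```python
def index_trigger(base_mutation):
    """Given a write to the base table, emit the extra writes that keep a
    hand-rolled index table in sync -- the essence of a DIY 2i trigger."""
    pk, columns = base_mutation
    extra = []
    for name, value in columns.items():
        # index table: partition by (column name, value), store the base pk
        extra.append((("idx", name, value), {"base_pk": pk}))
    return extra

# A write of {email: a@b.c} to user 42 also produces an index row:
mutations = index_trigger((42, {"email": "a@b.c"}))
assert mutations == [(("idx", "email", "a@b.c"), {"base_pk": 42})]
```

The gotchas mentioned above show up immediately in this model: an update needs a read-before-write to delete the stale index entry, and a failure between the two writes leaves base and index out of sync.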


Re: State of triggers

2017-03-04 Thread Edward Capriolo
On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa  wrote:

> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo 
> wrote:
>
> >
> > I used them. I built do it yourself secondary indexes with them. They
> have
> > there gotchas, but so do all the secondary index implementations. Just
> > because datastax does not write about something. Lets see like 5 years
> ago
> > there was this: https://github.com/hmsonline/cassandra-triggers
> >
> >
> Still in use? How'd it work? Production ready? Would you still do it that
> way in 2017?
>
>
> > There is a fairly large divergence to what actual users do and what other
> > groups 'say' actual users do in some cases.
> >
>
> A lot of people don't share what they're doing (for business reasons, or
> because they don't think it's important, or because they don't know
> how/where), and that's fine but it makes it hard for anyone to know what
> features are used, or how well they're really working in production.
>
> I've seen a handful of "how do we use triggers" questions in IRC, and they
> weren't unreasonable questions, but seemed like a lot of pain, and more
> than one of those people ultimately came back and said they used some other
> mechanism (and of course, some of them silently disappear, so we have no
> idea if it worked or not).
>
> If anyone's actively using triggers, please don't keep it a secret. Knowing
> that they're being used would be a great way to justify continuing to
> maintain them.
>
> - Jeff
>

"Still in use? How'd it work? Production ready? Would you still do it that
way in 2017?"

I mean that is a loaded question. How long has Cassandra had secondary
indexes? Did they work well? Would you use them? How many times were they
rewritten?


Re: State of triggers

2017-03-04 Thread Edward Capriolo
On Saturday, March 4, 2017, Edward Capriolo  wrote:

>
>
> On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa  > wrote:
>
>> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo > >
>> wrote:
>>
>> >
>> > I used them. I built do it yourself secondary indexes with them. They
>> have
>> > there gotchas, but so do all the secondary index implementations. Just
>> > because datastax does not write about something. Lets see like 5 years
>> ago
>> > there was this: https://github.com/hmsonline/cassandra-triggers
>> >
>> >
>> Still in use? How'd it work? Production ready? Would you still do it that
>> way in 2017?
>>
>>
>> > There is a fairly large divergence to what actual users do and what
>> other
>> > groups 'say' actual users do in some cases.
>> >
>>
>> A lot of people don't share what they're doing (for business reasons, or
>> because they don't think it's important, or because they don't know
>> how/where), and that's fine but it makes it hard for anyone to know what
>> features are used, or how well they're really working in production.
>>
>> I've seen a handful of "how do we use triggers" questions in IRC, and they
>> weren't unreasonable questions, but seemed like a lot of pain, and more
>> than one of those people ultimately came back and said they used some
>> other
>> mechanism (and of course, some of them silently disappear, so we have no
>> idea if it worked or not).
>>
>> If anyone's actively using triggers, please don't keep it a secret.
>> Knowing
>> that they're being used would be a great way to justify continuing to
>> maintain them.
>>
>> - Jeff
>>
>
> "Still in use? How'd it work? Production ready? Would you still do it that
> way in 2017?"
>
> I mean that is a loaded question. How long has cassandra had Secondary
> Indexes? Did they work well? Would you use them? How many times were they
> re-written?
>
>
>
The state of triggers, IMHO, was more about the long-standing opinion that
users should not be able to inject code into Cassandra.

That stance reversed and people could inject code; eventually all the
walls toppled: sandboxes, the mandate to copy a jar to every server.

In the mix, the secondary index implementations (which read before write,
and maybe still do) were pitched as the supported way to do it correctly.

To be fair, I would probably do this in an application server in front of
Cassandra, unless the trigger had to generate events in the hundreds or
thousands.


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: State of triggers

2017-03-04 Thread Edward Capriolo
On Sat, Mar 4, 2017 at 10:26 AM, Jeff Jirsa  wrote:

>
>
>
> > On Mar 4, 2017, at 7:06 AM, Edward Capriolo 
> wrote:
> >
> >> On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa  wrote:
> >>
> >> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo 
> >> wrote:
> >>
> >>>
> >>> I used them. I built do it yourself secondary indexes with them. They
> >> have
> >>> there gotchas, but so do all the secondary index implementations. Just
> >>> because datastax does not write about something. Lets see like 5 years
> >> ago
> >>> there was this: https://github.com/hmsonline/cassandra-triggers
> >>>
> >>>
> >> Still in use? How'd it work? Production ready? Would you still do it
> that
> >> way in 2017?
> >>
> >>
> >>> There is a fairly large divergence to what actual users do and what
> other
> >>> groups 'say' actual users do in some cases.
> >>>
> >>
> >> A lot of people don't share what they're doing (for business reasons, or
> >> because they don't think it's important, or because they don't know
> >> how/where), and that's fine but it makes it hard for anyone to know what
> >> features are used, or how well they're really working in production.
> >>
> >> I've seen a handful of "how do we use triggers" questions in IRC, and
> they
> >> weren't unreasonable questions, but seemed like a lot of pain, and more
> >> than one of those people ultimately came back and said they used some
> other
> >> mechanism (and of course, some of them silently disappear, so we have no
> >> idea if it worked or not).
> >>
> >> If anyone's actively using triggers, please don't keep it a secret.
> Knowing
> >> that they're being used would be a great way to justify continuing to
> >> maintain them.
> >>
> >> - Jeff
> >>
> >
> > "Still in use? How'd it work? Production ready? Would you still do it
> that way in 2017?"
> >
> > I mean that is a loaded question. How long has cassandra had Secondary
> > Indexes? Did they work well? Would you use them? How many times were
> they re-written?
>
> It wasn't really meant to be a loaded question; I was being sincere
>
> But I'll answer: secondary indexes suck for many use cases, but they're
> invaluable for their actual intended purpose, and I have no idea how many
> times they've been rewritten but they're production ready for their narrow
> use case (defined by cardinality).
>
> Is there a real triggers use case still? Alternative to MVs? Alternative
> to CDC? I've never implemented triggers - since you have, what's the level
> of surprise for the developer?


:) You mention alternatives; let's break them down.

MV:
They seem to have a lot of promise, i.e. you can use them for things other
than equality searches, and I do think the CQL example with the top-N high
scores is pretty useful. Then again, our buddy Mr. Roth has a thread named
"Rebuild / remove node with MV is inconsistent". I actually think a lot of
the use cases for MVs fall into the category of "something you should
actually be doing with Storm". I can vibe with the concept of not needing a
streaming platform, but I KNOW Storm would do this correctly. I don't want
to land on something like secondary index v1/v2, where there were
fundamental flaws at scale. (Not saying this is the case, but the rebuild
thing seems a bit scary.)

CDC:
I am slightly afraid of this. Rationale: an extensible piece designed
specifically for a closed-source implementation of hub-and-spoke
replication. I have some experience trying to "play along" with extensible
things:
https://issues.apache.org/jira/browse/CASSANDRA-12627
"Thus, I'm -1 on {{PropertyOrEnvironmentSeedProvider}}."

Not a rub, but I can't even get something committed using an existing
extensible interface. Heaven forbid a use case I have would want to *change*
the interface; I would probably get a -12. So I have no desire to try and
maintain a CDC implementation. I see myself falling into the same old "why
do you want to do this? -1" trap.

Coordinator triggers:
To bring things back: really old-school coordinator triggers, which
everyone always wanted. In a nutshell, I DO believe they are easier to
reason about than MVs. It is pretty basic: it happens on the coordinator,
there are no batchlogs or whatever; best effort, possibly requiring more
nodes, as the keys might be on different servers. Actually, I tend to like
features like that. Once something comes on the downs

Re: Code quality, principles and rules

2017-03-16 Thread Edward Capriolo
On Thu, Mar 16, 2017 at 3:10 PM, Jeff Jirsa  wrote:

>
>
> On 2017-03-16 10:32 (-0700), François Deliège 
> wrote:
> >
> > To get this started, here is an initial proposal:
> >
> > Principles:
> >
> > 1. Tests always pass.  This is the starting point. If we don't care
> about test failures, then we should stop writing tests. A recurring failing
> test carries no signal and is better deleted.
> > 2. The code is tested.
> >
> > Assuming we can align on these principles, here is a proposal for their
> implementation.
> >
> > Rules:
> >
> > 1. Each new release passes all tests (no flakinesss).
> > 2. If a patch has a failing test (test touching the same code path), the
> code or test should be fixed prior to being accepted.
> > 3. Bugs fixes should have one test that fails prior to the fix and
> passes after fix.
> > 4. New code should have at least 90% test coverage.
> >
> First I was
> I agree with all of these and hope they become codified and followed. I
> don't know anyone who believes we should be committing code that breaks
> tests - but we should be more strict with requiring green test runs, and
> perhaps more strict with reverting patches that break tests (or cause them
> to be flakey).
>
> Ed also noted on the user list [0] that certain sections of the code
> itself are difficult to test because of singletons - I agree with the
> suggestion that it's time to revisit CASSANDRA-7837 and CASSANDRA-10283
>
> Finally, we should also recall Jason's previous notes [1] that the actual
> test infrastructure available is limited - the system provided by Datastax
> is not generally open to everyone (and not guaranteed to be permanent), and
> the infrastructure currently available to the ASF is somewhat limited (much
> slower, at the very least). If we require tests passing (and I agree that
> we should), we need to define how we're going to be testing (or how we're
> going to be sharing test results), because the ASF hardware isn't going to
> be able to do dozens of dev branch dtest runs per day in its current form.
>
> 0: https://lists.apache.org/thread.html/f6f3fc6d0ad1bd54a6185ce7bd7a2f
> 6f09759a02352ffc05df92eef6@%3Cuser.cassandra.apache.org%3E
> 1: https://lists.apache.org/thread.html/5fb8f0446ab97644100e4ef987f36e
> 07f44e8dd6d38f5dc81ecb3cdd@%3Cdev.cassandra.apache.org%3E
>
>
>
Ed also noted on the user list [0] that certain sections of the code itself
are difficult to test because of singletons - I agree with the suggestion
that it's time to revisit CASSANDRA-7837 and CASSANDRA-10283

Thanks for the shout out!

I was just looking at a patch about compaction. The patch was to calculate
free space correctly in case X. Compaction is not something that requires
multiple nodes to test. The logic on the surface seems simple: find tables
of similar size, select them, and merge them. The reality turns out not to
be that way. The coverage itself, both branch and line, may be very high,
but what the code does not do is directly account for a wide variety of
scenarios. Without direct tests you end up with a mental approximation of
what it does, and that varies from person to person and accounts only for
the cases that fit in your mind. For example, you personally are only
running LevelDB-inspired compaction.

Given that this is not a multi-node problem, you should be able to refactor
this heavily: pull everything out to a static method where all the
parameters are arguments, or inject a lot of mocks into the current code,
and develop some scenario-based coverage.

That is how I typically "rescue" code I take over. I look at the nightmare
and say, "damn, I am really afraid to touch this". I construct 8 scenarios
that test green. Then I force some testing into it through careful
refactoring. Now I probably know -something- about it. Now you are fairly
free to do a wide-ranging refactor, because you at least accounted for 8
scenarios and you put in unit-test traps so that some rules are enforced
(or the person changing the code has to actively REMOVE your tests,
asserting they were not or are no longer valid). Later on you (or someone
else) __STILL__ might screw the entire thing up, but at least you can now
build forward.

Anyway, that patch on compaction was great and I am sure it improved
things. That being said, it did not add any tests :). So it can easily be
undone by the next person who does not understand the specific issue being
addressed. Inline comments almost scream "we need a test" to me; not
everyone believes that.
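
To make the point concrete: once the selection logic is a pure function whose inputs are all arguments, each scenario is one plain assertion. A simplified size-tiered bucketing sketch (not the real Cassandra code; thresholds invented):

```python
def bucket_by_size(sstable_sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstables whose sizes fall within [low*avg, high*avg] of a
    bucket's running average -- the core idea of size-tiered compaction,
    written as a pure function with no singletons to mock."""
    buckets = []  # list of (running_avg, [sizes])
    for size in sorted(sstable_sizes):
        for i, (avg, members) in enumerate(buckets):
            if bucket_low * avg <= size <= bucket_high * avg:
                members.append(size)
                buckets[i] = (sum(members) / len(members), members)
                break
        else:
            buckets.append((size, [size]))
    return [members for _, members in buckets]

# Scenario: two similar small tables merge; the huge one never joins them.
assert bucket_by_size([100, 110, 10_000]) == [[100, 110], [10_000]]
```

Each "scenario that tests green" in the rescue workflow above is just one more input list and expected grouping; removing an assertion is a deliberate, visible act.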


Re: Code quality, principles and rules

2017-03-16 Thread Edward Capriolo
On Thu, Mar 16, 2017 at 5:18 PM, Jason Brown  wrote:

> >> do we have plan to integrate with a dependency injection framework?
>
> No, we (the maintainers) have been pretty much against more frameworks due
> to performance reasons, overhead, and dependency management problems.
>
> On Thu, Mar 16, 2017 at 2:04 PM, Qingcun Zhou 
> wrote:
>
> > Since we're here, do we have plan to integrate with a dependency
> injection
> > framework like Dagger2? Otherwise it'll be difficult to write unit test
> > cases.
> >
> > On Thu, Mar 16, 2017 at 1:16 PM, Edward Capriolo 
> > wrote:
> >
> > > On Thu, Mar 16, 2017 at 3:10 PM, Jeff Jirsa  wrote:
> > >
> > > >
> > > >
> > > > On 2017-03-16 10:32 (-0700), François Deliège <
> franc...@instagram.com>
> > > > wrote:
> > > > >
> > > > > To get this started, here is an initial proposal:
> > > > >
> > > > > Principles:
> > > > >
> > > > > 1. Tests always pass.  This is the starting point. If we don't care
> > > > about test failures, then we should stop writing tests. A recurring
> > > failing
> > > > test carries no signal and is better deleted.
> > > > > 2. The code is tested.
> > > > >
> > > > > Assuming we can align on these principles, here is a proposal for
> > their
> > > > implementation.
> > > > >
> > > > > Rules:
> > > > >
> > > > > 1. Each new release passes all tests (no flakinesss).
> > > > > 2. If a patch has a failing test (test touching the same code
> path),
> > > the
> > > > code or test should be fixed prior to being accepted.
> > > > > 3. Bugs fixes should have one test that fails prior to the fix and
> > > > passes after fix.
> > > > > 4. New code should have at least 90% test coverage.
> > > > >
> > > > First I was
> > > > I agree with all of these and hope they become codified and
> followed. I
> > > > don't know anyone who believes we should be committing code that
> breaks
> > > > tests - but we should be more strict with requiring green test runs,
> > and
> > > > perhaps more strict with reverting patches that break tests (or cause
> > > them
> > > > to be flakey).
> > > >
> > > > Ed also noted on the user list [0] that certain sections of the code
> > > > itself are difficult to test because of singletons - I agree with the
> > > > suggestion that it's time to revisit CASSANDRA-7837 and
> CASSANDRA-10283
> > > >
> > > > Finally, we should also recall Jason's previous notes [1] that the
> > actual
> > > > test infrastructure available is limited - the system provided by
> > > Datastax
> > > > is not generally open to everyone (and not guaranteed to be
> permanent),
> > > and
> > > > the infrastructure currently available to the ASF is somewhat limited
> > > (much
> > > > slower, at the very least). If we require tests passing (and I agree
> > that
> > > > we should), we need to define how we're going to be testing (or how
> > we're
> > > > going to be sharing test results), because the ASF hardware isn't
> going
> > > to
> > > > be able to do dozens of dev branch dtest runs per day in its current
> > > form.
> > > >
> > > > 0: https://lists.apache.org/thread.html/
> f6f3fc6d0ad1bd54a6185ce7bd7a2f
> > > > 6f09759a02352ffc05df92eef6@%3Cuser.cassandra.apache.org%3E
> > > > 1: https://lists.apache.org/thread.html/
> 5fb8f0446ab97644100e4ef987f36e
> > > > 07f44e8dd6d38f5dc81ecb3cdd@%3Cdev.cassandra.apache.org%3E
> > > >
> > > >
> > > >
> > > Ed also noted on the user list [0] that certain sections of the code
> > itself
> > > are difficult to test because of singletons - I agree with the
> suggestion
> > > that it's time to revisit CASSANDRA-7837 and CASSANDRA-10283
> > >
> > > Thanks for the shout out!
> > >
> > > I was just looking at a patch about compaction. The patch was to
> > calculate
> > > free space correctly in case X. Compaction is not something that
> requires
> > > multiple nodes to test. The logic on the surface seems simple: find
> > tables
> > > of similar size and select them and

Re: Code quality, principles and rules

2017-03-17 Thread Edward Capriolo
On Fri, Mar 17, 2017 at 6:41 AM, Ryan Svihla  wrote:

> Different DI frameworks have different initialization costs, even inside of
> spring even depending on how you wire up dependencies (did it use autowire
> with reflection, parse a giant XML of explicit dependencies, etc).
>
> To back this assertion up for awhile in that community benching different
> DI frameworks perf was a thing and you can find benchmarks galore with a
> quick Google.
>
> The practical cost is also dependent on the lifecycles used (transient
> versus Singleton style for example) and features used (Interceptors
> depending on implementation can get expensive).
>
> So I think there should be some quantification of cost before a framework
> is considered, something like dagger2 which uses codegen I wager is only a
> cost at compile time (have not benched it, but looking at it's feature set,
> that's my guess) , Spring I know from experience even with the most optimal
> settings is slower on initialization time than doing by DI "by hand" at
> minimum, and that can sometimes be substantial.
>
>
> On Mar 17, 2017 12:29 AM, "Edward Capriolo"  wrote:
>
> On Thu, Mar 16, 2017 at 5:18 PM, Jason Brown  wrote:
>
> > >> do we have plan to integrate with a dependency injection framework?
> >
> > No, we (the maintainers) have been pretty much against more frameworks
> due
> > to performance reasons, overhead, and dependency management problems.
> >
> > On Thu, Mar 16, 2017 at 2:04 PM, Qingcun Zhou 
> > wrote:
> >
> > > Since we're here, do we have plan to integrate with a dependency
> > injection
> > > framework like Dagger2? Otherwise it'll be difficult to write unit test
> > > cases.
> > >
> > > On Thu, Mar 16, 2017 at 1:16 PM, Edward Capriolo <
> edlinuxg...@gmail.com>
> > > wrote:
> > >
> > > > On Thu, Mar 16, 2017 at 3:10 PM, Jeff Jirsa 
> wrote:
> > > >
> > > > >
> > > > >
> > > > > On 2017-03-16 10:32 (-0700), François Deliège <
> > franc...@instagram.com>
> > > > > wrote:
> > > > > >
> > > > > > To get this started, here is an initial proposal:
> > > > > >
> > > > > > Principles:
> > > > > >
> > > > > > 1. Tests always pass.  This is the starting point. If we don't
> care
> > > > > about test failures, then we should stop writing tests. A recurring
> > > > failing
> > > > > test carries no signal and is better deleted.
> > > > > > 2. The code is tested.
> > > > > >
> > > > > > Assuming we can align on these principles, here is a proposal for
> > > their
> > > > > implementation.
> > > > > >
> > > > > > Rules:
> > > > > >
> > > > > > 1. Each new release passes all tests (no flakinesss).
> > > > > > 2. If a patch has a failing test (test touching the same code
> > path),
> > > > the
> > > > > code or test should be fixed prior to being accepted.
> > > > > > 3. Bugs fixes should have one test that fails prior to the fix
> and
> > > > > passes after fix.
> > > > > > 4. New code should have at least 90% test coverage.
> > > > > >
> > > > > First I was
> > > > > I agree with all of these and hope they become codified and
> > followed. I
> > > > > don't know anyone who believes we should be committing code that
> > breaks
> > > > > tests - but we should be more strict with requiring green test
> runs,
> > > and
> > > > > perhaps more strict with reverting patches that break tests (or
> cause
> > > > them
> > > > > to be flakey).
> > > > >
> > > > > Ed also noted on the user list [0] that certain sections of the
> code
> > > > > itself are difficult to test because of singletons - I agree with
> the
> > > > > suggestion that it's time to revisit CASSANDRA-7837 and
> > CASSANDRA-10283
> > > > >
> > > > > Finally, we should also recall Jason's previous notes [1] that the
> > > actual
> > > > > test infrastructure available is limited - the system provided by
> > > > Datastax
> > > > > is not generally open to everyone (and not guaranteed to be
> > permanent),
> > > > and

Re: Code quality, principles and rules

2017-03-17 Thread Edward Capriolo
On Fri, Mar 17, 2017 at 9:46 AM, Edward Capriolo 
wrote:

>
>
> On Fri, Mar 17, 2017 at 6:41 AM, Ryan Svihla  wrote:
>
>> Different DI frameworks have different initialization costs, even inside
>> of
>> spring even depending on how you wire up dependencies (did it use autowire
>> with reflection, parse a giant XML of explicit dependencies, etc).
>>
>> To back this assertion up for awhile in that community benching different
>> DI frameworks perf was a thing and you can find benchmarks galore with a
>> quick Google.
>>
>> The practical cost is also dependent on the lifecycles used (transient
>> versus Singleton style for example) and features used (Interceptors
>> depending on implementation can get expensive).
>>
>> So I think there should be some quantification of cost before a framework
>> is considered, something like dagger2 which uses codegen I wager is only a
>> cost at compile time (have not benched it, but looking at it's feature
>> set,
>> that's my guess) , Spring I know from experience even with the most
>> optimal
>> settings is slower on initialization time than doing by DI "by hand" at
>> minimum, and that can sometimes be substantial.
>>
>>
>> On Mar 17, 2017 12:29 AM, "Edward Capriolo" 
>> wrote:
>>
>> On Thu, Mar 16, 2017 at 5:18 PM, Jason Brown 
>> wrote:
>>
>> > >> do we have plan to integrate with a dependency injection framework?
>> >
>> > No, we (the maintainers) have been pretty much against more frameworks
>> due
>> > to performance reasons, overhead, and dependency management problems.
>> >
>> > On Thu, Mar 16, 2017 at 2:04 PM, Qingcun Zhou 
>> > wrote:
>> >
>> > > Since we're here, do we have plan to integrate with a dependency
>> > injection
>> > > framework like Dagger2? Otherwise it'll be difficult to write unit
>> test
>> > > cases.
>> > >
>> > > On Thu, Mar 16, 2017 at 1:16 PM, Edward Capriolo <
>> edlinuxg...@gmail.com>
>> > > wrote:
>> > >
>> > > > On Thu, Mar 16, 2017 at 3:10 PM, Jeff Jirsa 
>> wrote:
>> > > >
>> > > > >
>> > > > >
>> > > > > On 2017-03-16 10:32 (-0700), François Deliège <
>> > franc...@instagram.com>
>> > > > > wrote:
>> > > > > >
>> > > > > > To get this started, here is an initial proposal:
>> > > > > >
>> > > > > > Principles:
>> > > > > >
>> > > > > > 1. Tests always pass.  This is the starting point. If we don't
>> care
>> > > > > about test failures, then we should stop writing tests. A
>> recurring
>> > > > failing
>> > > > > test carries no signal and is better deleted.
>> > > > > > 2. The code is tested.
>> > > > > >
>> > > > > > Assuming we can align on these principles, here is a proposal
>> for
>> > > their
>> > > > > implementation.
>> > > > > >
>> > > > > > Rules:
>> > > > > >
>> > > > > > 1. Each new release passes all tests (no flakinesss).
>> > > > > > 2. If a patch has a failing test (test touching the same code
>> > path),
>> > > > the
>> > > > > code or test should be fixed prior to being accepted.
>> > > > > > 3. Bugs fixes should have one test that fails prior to the fix
>> and
>> > > > > passes after fix.
>> > > > > > 4. New code should have at least 90% test coverage.
>> > > > > >
>> > > > > First I was
>> > > > > I agree with all of these and hope they become codified and
>> > followed. I
>> > > > > don't know anyone who believes we should be committing code that
>> > breaks
>> > > > > tests - but we should be more strict with requiring green test
>> runs,
>> > > and
>> > > > > perhaps more strict with reverting patches that break tests (or
>> cause
>> > > > them
>> > > > > to be flakey).
>> > > > >
>> > > > > Ed also noted on the user list [0] that certain sections of the
>> code
>> > > > > itself are difficult to test because of singletons - I agree 

Re: Code quality, principles and rules

2017-03-17 Thread Edward Capriolo
On Fri, Mar 17, 2017 at 12:33 PM, Blake Eggleston 
wrote:

> I think we’re getting a little ahead of ourselves talking about DI
> frameworks. Before that even becomes something worth talking about, we’d
> need to have made serious progress on un-spaghettifying Cassandra in the
> first place. It’s an extremely tall order. Adding a DI framework right now
> would be like throwing gasoline on a raging tire fire.
>
> Removing singletons seems to come up every 6-12 months, and usually
> abandoned once people figure out how difficult they are to remove properly.
> I do think removing them *should* be a long term goal, but we really need
> something more immediately actionable. Otherwise, nothing’s going to
> happen, and we’ll be having this discussion again in a year or so when
> everyone’s angry that Cassandra 5.0 still isn’t ready for production, a
> year after it’s release.
>
> That said, the reason singletons regularly get brought up is because doing
> extensive testing of anything in Cassandra is pretty much impossible, since
> the code is basically this big web of interconnected global state. Testing
> anything in isolation can’t be done, which, for a distributed database, is
> crazy. It’s a chronic problem that handicaps our ability to release a
> stable database.
>
> At this point, I think a more pragmatic approach would be to draft and
> enforce some coding standards that can be applied in day to day development
> that drive incremental improvement of the testing and testability of the
> project. What should be tested, how it should be tested. How to write new
> code that talks to the rest of Cassandra and is testable. How to fix bugs
> in old code in a way that’s testable. We should also have some guidelines
> around refactoring the wildly untested sections, how to get started, what
> to do, what not to do, etc.
>
> Thoughts?


To make the conversation practical: there is one class I personally really
want to refactor so it can be tested:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java

There is little coverage here. Questions like:
what errors cause the connection to restart?
when are undroppable messages dropped?
what happens when the queue fills up?
the infamous throw new AssertionError(ex); (which probably bubbles up to nowhere)
what does the COALESCED strategy do in case XYZ?
A nifty label (wow, a label, you just never see those much!):
outer:
while (!isStopped)

Comments pointing at jiras that are probably not explicitly tested:
// If we haven't retried this message yet, put it back on the queue to
// retry after re-connecting.
// See CASSANDRA-5393 and CASSANDRA-12192.

If I were to undertake this cleanup, would there actually be support? I.e.,
is this going to turn into an "it ain't broken, don't fix it" thing, or a
"we don't want to change stuff just to add tests" thing? Will someone
pledge to agree it's kinda wonky and merge the effort in < 1 year's time?


Re: Code quality, principles and rules

2017-03-17 Thread Edward Capriolo
On Fri, Mar 17, 2017 at 2:31 PM, Jason Brown  wrote:

> To François's point about code coverage for new code, I think this makes a
> lot of sense wrt large features (like the current work on 8457/12229/9754).
> It's much simpler to (mentally, at least) isolate those changed sections
> and it'll show up better in a code coverage report. With small patches,
> that might be harder to achieve - however, as the patch should come with
> *some* tests (unless it's a truly trivial patch), it might just work itself
> out.
>
> On Fri, Mar 17, 2017 at 11:19 AM, Jason Brown 
> wrote:
>
> > As someone who spent a lot of time looking at the singletons topic in the
> > past, Blake brings a great perspective here. Figuring out and
> communicating
> > how best to test with the system we have (and of course incrementally
> > making that system easier to work with/test) seems like an achievable
> goal.
> >
> > On Fri, Mar 17, 2017 at 10:17 AM, Edward Capriolo  >
> > wrote:
> >
> >> On Fri, Mar 17, 2017 at 12:33 PM, Blake Eggleston  >
> >> wrote:
> >>
> >> > I think we’re getting a little ahead of ourselves talking about DI
> >> > frameworks. Before that even becomes something worth talking about,
> we’d
> >> > need to have made serious progress on un-spaghettifying Cassandra in
> the
> >> > first place. It’s an extremely tall order. Adding a DI framework right
> >> now
> >> > would be like throwing gasoline on a raging tire fire.
> >> >
> >> > Removing singletons seems to come up every 6-12 months, and usually
> >> > abandoned once people figure out how difficult they are to remove
> >> properly.
> >> > I do think removing them *should* be a long term goal, but we really
> >> need
> >> > something more immediately actionable. Otherwise, nothing’s going to
> >> > happen, and we’ll be having this discussion again in a year or so when
> >> > everyone’s angry that Cassandra 5.0 still isn’t ready for production,
> a
> >> > year after it’s release.
> >> >
> >> > That said, the reason singletons regularly get brought up is because
> >> doing
> >> > extensive testing of anything in Cassandra is pretty much impossible,
> >> since
> >> > the code is basically this big web of interconnected global state.
> >> Testing
> >> > anything in isolation can’t be done, which, for a distributed
> database,
> >> is
> >> > crazy. It’s a chronic problem that handicaps our ability to release a
> >> > stable database.
> >> >
> >> > At this point, I think a more pragmatic approach would be to draft and
> >> > enforce some coding standards that can be applied in day to day
> >> development
> >> > that drive incremental improvement of the testing and testability of
> the
> >> > project. What should be tested, how it should be tested. How to write
> >> new
> >> > code that talks to the rest of Cassandra and is testable. How to fix
> >> bugs
> >> > in old code in a way that’s testable. We should also have some
> >> guidelines
> >> > around refactoring the wildly untested sections, how to get started,
> >> what
> >> > to do, what not to do, etc.
> >> >
> >> > Thoughts?
> >>
> >>
> >> To make the conversation practical. There is one class I personally
> really
> >> want to refactor so it can be tested:
> >>
> >> https://github.com/apache/cassandra/blob/trunk/src/java/org/
> >> apache/cassandra/net/OutboundTcpConnection.java
> >>
> >> There is little coverage here. Questions like:
> >> what errors cause the connection to restart?
> >> when are undropable messages are dropped?
> >> what happens when the queue fills up?
> >> Infamous throw new AssertionError(ex); (which probably bubble up to
> >> nowhere)
> >> what does the COALESCED strategy do in case XYZ.
> >> A nifty label (wow a label you just never see those much!)
> >> outer:
> >> while (!isStopped)
> >>
> >> Comments to jira's that probably are not explicitly tested:
> >> // If we haven't retried this message yet, put it back on the queue to
> >> retry after re-connecting.
> >> // See CASSANDRA-5393 and CASSANDRA-12192.
> >>
> >> If I were to undertake this cleanup, would there actually be support? IE
> >> if
> >> this going to turn into an "it aint broken. don't fix it thing" or a "we
> >> don't want to change stuff just to add tests" . Like will someone pledge
> >> to
> >> agree its kinda wonky and merge the effort in < 1 years time?
> >>
> >
> >
>

So... :) If I open a ticket to refactor OutboundTcpConnection.java to do
specific unit testing, and possibly even pull things out to the point that I
can actually open a socket and do an end-to-end test, will you/anyone
support that? (It sounds like you're saying I must/should make a large
feature to add a test.)
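For what it's worth, the "open a socket and do an end-to-end test" part need not be heavyweight: a loopback test against an ephemeral port is self-contained JDK code. A sketch of a generic echo round-trip (not Cassandra's actual test harness; requires Java 9+ for readNBytes):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of a loopback end-to-end test: bind an ephemeral port,
// connect to it, and assert that bytes round-trip through real sockets.
public class LoopbackEchoTest {
    public static String roundTrip(String payload) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = ephemeral
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept()) {
                    // Echo exactly payload.length() bytes back to the client.
                    s.getOutputStream().write(
                        s.getInputStream().readNBytes(payload.length()));
                } catch (IOException ignored) {
                }
            });
            echo.start();
            try (Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                client.getOutputStream().write(payload.getBytes());
                client.getOutputStream().flush();
                return new String(client.getInputStream().readNBytes(payload.length()));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        assert roundTrip("ping").equals("ping");
    }
}
```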


Re: Code quality, principles and rules

2017-03-19 Thread Edward Capriolo
On Saturday, March 18, 2017, Qingcun Zhou  wrote:

> I wanted to contribute some unit test cases. However the unit test approach
> in Cassandra seems weird to me after looking into some examples. Not sure
> if anyone else has the same feeling.
>
> Usually, at least for all Java projects I have seen, people use mock
> (mockito, powermock) for dependencies. And then in a particular test case
> you verify the behavior using junit.assert* or mockito.verify. However we
> don't use mockito in Cassandra. Is there any reason for this? Without
> these, how easy do people think about adding unit test cases?
>
>
> Besides that, we have lots of singletons and there are already a handful of
> tickets to eliminate them. Maybe I missed something but I'm not seeing much
> progress. Is anyone actively working on this?
>
> Maybe a related problem. Some unit test cases have method annotated with
> @BeforeClass to do initialization work. However, it not only initializes
> direct dependencies, but also indirect ones, including loading
> cassandra.yaml and initializing indirect dependencies. This seems to me
> more like functional/integration test but not unit test style.
>
>
> On Fri, Mar 17, 2017 at 2:56 PM, Jeremy Hanna  >
> wrote:
>
> > https://issues.apache.org/jira/browse/CASSANDRA-7837 may be some
> > interesting context regarding what's been worked on to get rid of
> > singletons and static initialization.
> >
> > > On Mar 17, 2017, at 4:47 PM, Jonathan Haddad  > wrote:
> > >
> > > I'd like to think that if someone refactors existing code, making it
> more
> > > testable (with tests, of course) it should be acceptable on it's own
> > > merit.  In fact, in my opinion it sometimes makes more sense to do
> these
> > > types of refactorings for the sole purpose of improving stability and
> > > testability as opposed to mixing them with features.
> > >
> > > You referenced the issue I fixed in one of the early emails.  The fix
> > > itself was a couple lines of code.  Refactoring the codebase to make it
> > > testable would have been a huge effort.  I wish I had time to do it.  I
> > > created CASSANDRA-13007 as a follow up with the intent of working on
> > > compaction from a purely architectural standpoint.  I think this type
> of
> > > thing should be done throughout the codebase.
> > >
> > > Removing the singletons is a good first step, my vote is we just rip
> off
> > > the bandaid, do it, and move forward.
> > >
> > > On Fri, Mar 17, 2017 at 2:20 PM Edward Capriolo  >
> > > wrote:
> > >
> > >>> On Fri, Mar 17, 2017 at 2:31 PM, Jason Brown  >
> > wrote:
> > >>>
> > >>> To François's point about code coverage for new code, I think this
> > makes
> > >> a
> > >>> lot of sense wrt large features (like the current work on
> > >> 8457/12229/9754).
> > >>> It's much simpler to (mentally, at least) isolate those changed
> > sections
> > >>> and it'll show up better in a code coverage report. With small
> patches,
> > >>> that might be harder to achieve - however, as the patch should come
> > with
> > >>> *some* tests (unless it's a truly trivial patch), it might just work
> > >> itself
> > >>> out.
> > >>>
> > >>> On Fri, Mar 17, 2017 at 11:19 AM, Jason Brown  >
> > >>> wrote:
> > >>>
> > >>>> As someone who spent a lot of time looking at the singletons topic
> in
> > >> the
> > >>>> past, Blake brings a great perspective here. Figuring out and
> > >>> communicating
> > >>>> how best to test with the system we have (and of course
> incrementally
> > >>>> making that system easier to work with/test) seems like an
> achievable
> > >>> goal.
> > >>>>
> > >>>> On Fri, Mar 17, 2017 at 10:17 AM, Edward Capriolo <
> > >> edlinuxg...@gmail.com 
> > >>>>
> > >>>> wrote:
> > >>>>
> > >>>>> On Fri, Mar 17, 2017 at 12:33 PM, Blake Eggleston <
> > >> beggles...@apple.com 
> > >>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> I think we’re getting a little ahead of ourselves talking about DI
> > >>>>>> frameworks. Before that even becomes something worth ta

Re: Can we kill the wiki?

2017-03-19 Thread Edward Capriolo
Wikis are still good for collaborative design etc. It's a burden to edit the
docs, and they're not the place for all info.

On Friday, March 17, 2017, Murukesh Mohanan 
wrote:

> I wonder if the recent influx has anything to do with GSoC. The student
> application period begins in a few days. I don't see any Cassandra issues
> on the GSoC ideas list, though.
>
> On Sat, 18 Mar 2017 at 10:40 Anthony Grasso  >
> wrote:
>
> +1 to killing the wiki as well. If that is not possible, we should at least
> put a note on there saying it is deprecated and point people to the new
> docs.
>
> On 18 March 2017 at 08:09, Jonathan Haddad  > wrote:
>
> > +1 to killing the wiki.
> >
> > On Fri, Mar 17, 2017 at 2:08 PM Blake Eggleston  >
> > wrote:
> >
> > > With CASSANDRA-8700, docs were moved in tree, with the intention that
> > they
> > > would replace the wiki. However, it looks like we’re still getting
> > regular
> > > requests to edit the wiki. It seems like we should be directing these
> > folks
> > > to the in tree docs and either disabling edits for the wiki, or just
> > > removing it entirely, and replacing it with a link to the hosted docs.
> > I'd
> > > prefer we just remove it myself, makes things less confusing for
> > newcomers.
> > >
> > > Does that seem reasonable to everyone?
> >
>
> --
>
> Murukesh Mohanan,
> Yahoo! Japan
>


-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: splitting CQL parser & spec into separate repo

2017-03-21 Thread Edward Capriolo
On Tue, Mar 21, 2017 at 3:24 PM, Mark Dewey  wrote:

> I can immediately think of a project I would use that in. +1
>
> On Tue, Mar 21, 2017 at 12:18 PM Jonathan Haddad 
> wrote:
>
> > I created CASSANDRA-13284 a few days ago with the intent of starting a
> > discussion around the topic of breaking the CQL parser out into a
> separate
> > project.  I see a few benefits to doing it and was wondering what the
> folks
> > here thought as well.
> >
> > First off, the Java CQL parser would obviously continue to be the
> reference
> > parser.  I'd love to see other languages have CQL parsers as well, but
> the
> > intent here isn't for the OSS C* team to be responsible for maintaining
> > that.  My vision here is simply the ability to have some high level
> > CQLParser.parse(statement) call that returns the parse tree, nothing
> more.
> >
> > It would be nice to be able to leverage that parser in other projects
> such
> > as IDEs, code gen tools, etc.  It would be outstanding to be able to
> create
> > the parser tests in such a way that they can be referenced by other
> parsers
> > in other languages.  Yay code reuse.  It also has the benefit of making
> the
> > codebase a little more modular and a bit easier to understand.
> >
> > Thoughts?
> >
> > Jon
> >
>

It turns out that a similar thing was done with Hive.

https://calcite.apache.org/

https://calcite.apache.org/community/#apache-calcite-one-planner-fits-all

The challenge is typically adoption. The elevator pitch is something like:
"EVERYONE WILL SHARE THIS AND IT WILL BE AWESOME". Maybe this is the wrong
word, but let's just say frenemies exist, and they do not like control of
something moving to a shared medium. There are also technical issues, like
ANTLR 3 vs ANTLR 4, etc. For something like Hive, the parser/planner only
needs to be fast enough for analytic queries, but that would not be the
right move for, say, CQL.
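The high-level CQLParser.parse(statement) call Jon describes could be as small as this; a sketch with hypothetical names (ParseTree, StatementKind) and a toy recognizer standing in for a real ANTLR-generated parser:

```java
// Hypothetical shape of the proposed standalone-parser API.
// CQLParser, ParseTree, and StatementKind are illustrative names only,
// and the string-splitting below is a stand-in for a real grammar.
public class CQLParser {
    public enum StatementKind { SELECT, INSERT, OTHER }

    public static class ParseTree {
        public final StatementKind kind;
        public final String table; // null if not recognized

        ParseTree(StatementKind kind, String table) {
            this.kind = kind;
            this.table = table;
        }
    }

    public static ParseTree parse(String statement) {
        String[] tokens = statement.trim().split("\\s+");
        switch (tokens[0].toUpperCase()) {
            case "SELECT":
                // SELECT ... FROM <table> ...
                for (int i = 1; i < tokens.length - 1; i++)
                    if (tokens[i].equalsIgnoreCase("FROM"))
                        return new ParseTree(StatementKind.SELECT, tokens[i + 1]);
                return new ParseTree(StatementKind.SELECT, null);
            case "INSERT":
                // INSERT INTO <table> ...
                return new ParseTree(StatementKind.INSERT,
                                     tokens.length > 2 ? tokens[2] : null);
            default:
                return new ParseTree(StatementKind.OTHER, null);
        }
    }

    public static void main(String[] args) {
        assert parse("SELECT * FROM users WHERE id = 1").kind == StatementKind.SELECT;
        assert "users".equals(parse("SELECT * FROM users WHERE id = 1").table);
    }
}
```

A separate project exposing just this entry point is what IDEs and code-gen tools would consume, without dragging in the rest of the database.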


Re: splitting CQL parser & spec into separate repo

2017-03-22 Thread Edward Capriolo
On Tue, Mar 21, 2017 at 5:45 PM, Anthony Grasso 
wrote:

> This is a great idea
>
> +1 (non-binding)
>
> On 22 March 2017 at 07:04, Edward Capriolo  wrote:
>
> > On Tue, Mar 21, 2017 at 3:24 PM, Mark Dewey  wrote:
> >
> > > I can immediately think of a project I would use that in. +1
> > >
> > > On Tue, Mar 21, 2017 at 12:18 PM Jonathan Haddad 
> > > wrote:
> > >
> > > > I created CASSANDRA-13284 a few days ago with the intent of starting
> a
> > > > discussion around the topic of breaking the CQL parser out into a
> > > separate
> > > > project.  I see a few benefits to doing it and was wondering what the
> > > folks
> > > > here thought as well.
> > > >
> > > > First off, the Java CQL parser would obviously continue to be the
> > > reference
> > > > parser.  I'd love to see other languages have CQL parsers as well,
> but
> > > the
> > > > intent here isn't for the OSS C* team to be responsible for
> maintaining
> > > > that.  My vision here is simply the ability to have some high level
> > > > CQLParser.parse(statement) call that returns the parse tree, nothing
> > > more.
> > > >
> > > > It would be nice to be able to leverage that parser in other projects
> > > such
> > > > as IDEs, code gen tools, etc.  It would be outstanding to be able to
> > > create
> > > > the parser tests in such a way that they can be referenced by other
> > > parsers
> > > > in other languages.  Yay code reuse.  It also has the benefit of
> making
> > > the
> > > > codebase a little more modular and a bit easier to understand.
> > > >
> > > > Thoughts?
> > > >
> > > > Jon
> > > >
> > >
> >
> > It turns out that a similar thing was done with Hive.
> >
> > https://calcite.apache.org/
> >
> > https://calcite.apache.org/community/#apache-calcite-one-
> planner-fits-all
> >
> > The challenge is typically adoption. The elevator pitch is like:
> > "EVERYONE WILL SHARE THIS AND IT WILL BE AWESOME". Maybe this is the
> wrong
> > word, but lets just say frenemies
> > exist and they do not like control of something moving to a shared
> medium.
> > Technical issues like ANTLR 3 vs ANTRL 4 etc.
> > For something like Hive the challenge is the parser/planner needs only be
> > fast enough for analytic queries but that would not
> > be the right move for say CQL.
> >
>

I believe you could accomplish a similar goal by making it a multi-module
project: https://maven.apache.org/guides/mini/guide-multiple-modules.html.
Probably not as easy thanks to ant, but I think that is a better route. Once
there actually are N dependent projects in the wild, you can make the case
for the overhead, which is both technical and ASF-process related.


Re: splitting CQL parser & spec into separate repo

2017-03-23 Thread Edward Capriolo
On Thu, Mar 23, 2017 at 10:56 AM, Eric Evans 
wrote:

> On Wed, Mar 22, 2017 at 10:01 AM, Edward Capriolo 
> wrote:
> > I believe you could accomplish a similar goal by making a multi-module
> > project https://maven.apache.org/guides/mini/guide-multiple-modules.html
> .
> > Probably not as easy thanks to ant, but I think that is a better route.
> One
> > there actually are N dependent projects in the wild you can make the case
> > for overhead which is both technical and in ASF based.
>
> This was my first thought: If we were using Maven, we'd probably
> already have created this as a module[*].
>
>
> [*]: Maybe a surprise to some given how strongly I pushed back against
> it in the Early Days, but we would be so much better off at this point
> with Maven.
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>

Well, the ant/maven bit is a separate issue: it could still be done with
ant, and done in a way that makes an eventual port very easy.
http://ant.apache.org/easyant/history/trunk/ref/anttasks/SubModuletask.html


Re: DataStax Client List

2017-03-23 Thread Edward Capriolo
Well that is quite unsettling.

On Thu, Mar 23, 2017 at 10:33 AM, Theresa Taylor <
theresa.tay...@onlinedatatech.biz> wrote:

> Hi,
>
> Would you be interested in acquiring a list of DataStax users' information
> in an Excel sheet for unlimited marketing usage?
>
> List includes – First and Last name, Phone number, Email Address, Company
> Name, Job Title, Address, City, State, Zip, SIC code/Industry, Revenue and
> Company Size. The leads can also be further customized as per requirements.
>
> We can provide contact lists from any country/industry/title.
>
> If your target criteria are different kindly get back to us with your
> requirement with geography and job titles to provide you with counts and
> more information.
>
> Let me know your thoughts!
>
> Thanks,
>
>
> Theresa
> Senior Information Analyst
>
>
> If you wish not to receive marketing emails, please reply back “Opt
> Out” In headlines
>


Re: Spam Moderation

2017-03-23 Thread Edward Capriolo
On Thu, Mar 23, 2017 at 12:42 PM, Daryl Hawken 
wrote:

> +1.
>
> On Thu, Mar 23, 2017 at 12:10 PM, Michael Shuler 
> wrote:
>
> > I won't reply to the obvious spam to hilight it any further, so new
> > message..
> >
> > Could the mailing list moderator that approved the "client list" message
> > identify themselves and possibly explain how that was seen as a valid
> > message about the development of Apache Cassandra?
> >
> > --
> > Kind regards,
> > Michael
> >
>
>
>
> --
> *Most people have more than the average number of legs*
>

While the dev list is clearly not the place, and the email is spam-looking,
it is interesting to know that someone is marketing such a list. I have
spoken at different events, and those entities likely have my email, so I am
curious about the list.

I think the situation is much like the "OpenBSD backdoor emails"
http://marc.info/?l=openbsd-tech&m=129236621626462&w=2 . I.e., even if you
believe the info is 99.999% untrue, do you pass it along?


Re: [DISCUSS] Implementing code quality principles, and rules (was: Code quality, principles and rules)

2017-03-27 Thread Edward Capriolo
On Mon, Mar 27, 2017 at 7:03 PM, Josh McKenzie  wrote:

> How do we plan on verifying #4? Also, root-cause to tie back new code that
> introduces flaky tests (i.e. passes on commit, fails 5% of the time
> thereafter) is a non-trivial pursuit (thinking #2 here), and a pretty
> common problem in this environment.
>
> On Mon, Mar 27, 2017 at 6:51 PM, Nate McCall  wrote:
>
> > I don't want to lose track of the original idea from François, so
> > let's do this formally in preparation for a vote. Having this all in
> > place will make transition to new testing infrastructure more
> > goal-oriented and keep us more focused moving forward.
> >
> > Does anybody have specific feedback/discussion points on the following
> > (awesome, IMO) proposal:
> >
> > Principles:
> >
> > 1. Tests always pass. This is the starting point. If we don't care
> > about test failures, then we should stop writing tests. A recurring
> > failing test carries no signal and is better deleted.
> > 2. The code is tested.
> >
> > Assuming we can align on these principles, here is a proposal for
> > their implementation.
> >
> > Rules:
> >
> > 1. Each new release passes all tests (no flakinesss).
> > 2. If a patch has a failing test (test touching the same code path),
> > the code or test should be fixed prior to being accepted.
> > 3. Bugs fixes should have one test that fails prior to the fix and
> > passes after fix.
> > 4. New code should have at least 90% test coverage.
> >
>

True, #4 is hard to verify in the current state. This was mentioned in a
separate thread: if the code were in submodules, the code coverage tools
would have less work to do, because they typically only count coverage for a
module and the tests inside that module. At that point it should be easy to
write a plugin on top of something like this:
http://alvinalexander.com/blog/post/java/sample-cobertura-ant-build-script.

This is also an option:

https://about.sonarqube.com/news/2016/05/02/continuous-analysis-for-oss-projects.html


Re: Code quality, principles and rules

2017-03-29 Thread Edward Capriolo
On Sat, Mar 18, 2017 at 9:21 PM, Qingcun Zhou  wrote:

> I wanted to contribute some unit test cases. However the unit test approach
> in Cassandra seems weird to me after looking into some examples. Not sure
> if anyone else has the same feeling.
>
> Usually, at least for all Java projects I have seen, people use mock
> (mockito, powermock) for dependencies. And then in a particular test case
> you verify the behavior using junit.assert* or mockito.verify. However we
> don't use mockito in Cassandra. Is there any reason for this? Without
> these, how easy do people think about adding unit test cases?
>
>
> Besides that, we have lots of singletons and there are already a handful of
> tickets to eliminate them. Maybe I missed something but I'm not seeing much
> progress. Is anyone actively working on this?
>
> Maybe a related problem. Some unit test cases have method annotated with
> @BeforeClass to do initialization work. However, it not only initializes
> direct dependencies, but also indirect ones, including loading
> cassandra.yaml and initializing indirect dependencies. This seems to me
> more like functional/integration test but not unit test style.
>
>
> On Fri, Mar 17, 2017 at 2:56 PM, Jeremy Hanna 
> wrote:
>
> > https://issues.apache.org/jira/browse/CASSANDRA-7837 may be some
> > interesting context regarding what's been worked on to get rid of
> > singletons and static initialization.
> >
> > > On Mar 17, 2017, at 4:47 PM, Jonathan Haddad 
> wrote:
> > >
> > > I'd like to think that if someone refactors existing code, making it
> more
> > > testable (with tests, of course) it should be acceptable on it's own
> > > merit.  In fact, in my opinion it sometimes makes more sense to do
> these
> > > types of refactorings for the sole purpose of improving stability and
> > > testability as opposed to mixing them with features.
> > >
> > > You referenced the issue I fixed in one of the early emails.  The fix
> > > itself was a couple lines of code.  Refactoring the codebase to make it
> > > testable would have been a huge effort.  I wish I had time to do it.  I
> > > created CASSANDRA-13007 as a follow up with the intent of working on
> > > compaction from a purely architectural standpoint.  I think this type
> of
> > > thing should be done throughout the codebase.
> > >
> > > Removing the singletons is a good first step, my vote is we just rip
> off
> > > the bandaid, do it, and move forward.
> > >
> > > On Fri, Mar 17, 2017 at 2:20 PM Edward Capriolo  >
> > > wrote:
> > >
> > >>> On Fri, Mar 17, 2017 at 2:31 PM, Jason Brown 
> > wrote:
> > >>>
> > >>> To François's point about code coverage for new code, I think this
> > makes
> > >> a
> > >>> lot of sense wrt large features (like the current work on
> > >> 8457/12229/9754).
> > >>> It's much simpler to (mentally, at least) isolate those changed
> > sections
> > >>> and it'll show up better in a code coverage report. With small
> patches,
> > >>> that might be harder to achieve - however, as the patch should come
> > with
> > >>> *some* tests (unless it's a truly trivial patch), it might just work
> > >> itself
> > >>> out.
> > >>>
> > >>> On Fri, Mar 17, 2017 at 11:19 AM, Jason Brown 
> > >>> wrote:
> > >>>
> > >>>> As someone who spent a lot of time looking at the singletons topic
> in
> > >> the
> > >>>> past, Blake brings a great perspective here. Figuring out and
> > >>> communicating
> > >>>> how best to test with the system we have (and of course
> incrementally
> > >>>> making that system easier to work with/test) seems like an
> achievable
> > >>> goal.
> > >>>>
> > >>>> On Fri, Mar 17, 2017 at 10:17 AM, Edward Capriolo <
> > >> edlinuxg...@gmail.com
> > >>>>
> > >>>> wrote:
> > >>>>
> > >>>>> On Fri, Mar 17, 2017 at 12:33 PM, Blake Eggleston <
> > >> beggles...@apple.com
> > >>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> I think we’re getting a little ahead of ourselves talking about DI
> > >>>>>> frameworks. Before that even becomes something worth talking
> about,

Re: findbugs

2012-07-30 Thread Edward Capriolo
I am sure no one would have an issue with an optional findbugs target.

On Mon, Jul 30, 2012 at 10:32 AM, Radim Kolar  wrote:
> was any decision about findbugs made? you do not consider code style
> recommended by findbugs as good practice which should be followed?
>
> I can submit a few findbugs patches, but it will probably turn into a
> WE vs FINDBUGS flamewar like this one:
> https://issues.apache.org/jira/browse/HADOOP-8619
>
> findbugs problems are pretty easy to fix and there are just 70 of them, it
> could be done in two days.
>
> I do not care about the findbugs+cas-dev issue much because I need to fork
> cassandra anyway to get my performance patches in there. It's just a matter
> of schedule for me whether I should feed you findbugs patches before I fork it.


Re: maximum sstable size

2012-11-03 Thread Edward Capriolo
I have another ticket open for this.

On Sat, Nov 3, 2012 at 6:29 PM, Radim Kolar  wrote:
> done
> https://issues.apache.org/jira/browse/CASSANDRA-4897


Re: 2.0

2012-11-30 Thread Edward Capriolo
Good idea. Let's remove thrift. CQL3 is still beta, but I am willing to
upgrade to a version that removes thrift. Then when all our clients cannot
connect, they will be forced to get with the program.

On Fri, Nov 30, 2012 at 5:33 PM, Jason Brown  wrote:

> Hi Jonathan,
>
> I'm in favor of paying off the technical debt, as well, and I wonder if
> there is value in removing support for thrift with 2.0? We're currently in
> 'do as little as possible' mode with thrift, so should we aggressively cast
> it off and push the binary CQL protocol? Seems like a jump to '2.0', along
> with the other initiatives, would be a reasonable time/milestone to do so.
>
> Thanks,
>
> -Jason
>
>
> On Fri, Nov 30, 2012 at 12:12 PM, Jonathan Ellis 
> wrote:
>
> > The more I think about it, the more I think we should call 1.2-next,
> > 2.0.  I'd like to spend some time paying off our technical debt:
> >
> > - replace supercolumns with composites (CASSANDRA-3237)
> > - rewrite counters (CASSANDRA-4775)
> > - improve storage engine support for wide rows
> > - better stage management to improve latency (disruptor? lightweight
> > threads?  custom executor + queue?)
> > - improved repair (CASSANDRA-3362, 2699)
> >
> > Of course, we're planning some new features as well:
> > - triggers (CASSANDRA-1311)
> > - improved query fault tolerance (CASSANDRA-4705)
> > - row size limits (CASSANDRA-3929)
> > - cql3 integration for hadoop (CASSANDRA-4421)
> > - improved caching (CASSANDRA-1956, 2864)
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
> >
>


Re: 2.0

2012-12-01 Thread Edward Capriolo
I do not understand why everyone wants to force this issue of removing
thrift. If CQL, CQL sparse tables, and the new transport are better, people
will naturally begin to use them, but as it stands now I see it this way:

Thrift still has more clients for more languages, and more higher-level
clients for more languages.
Thrift has Hadoop, Hive, and Pig support in the wild.
Thrift has third-party tools like ORM tools, and support for tools like Flume.

Most CQL3 features, like collections, do not work with compact tables,
and compact tables are much more space efficient than their CQL3 sparse
counterparts, with their composite rows with UTF-8 column names, blank
rows, etc.
The CQL3 binary client is only available, in beta stage, for a few languages.

So the project can easily remove thrift today, but until a majority of the
community's tooling adopts the transport and, for the most part, CQL's
sparse tables, it is not going to mean anything. Many people already have
code live in production working fine with the old toolset and will be
unwilling to convert something just because.

Think about it like this: take a company like mine that already has
something in production. Even if you could convince us that the CQL native
transport was better (and by the way, no one has shown me a compelling
performance reason to this point), they still may not want to invest the
resources to convert their app. Many companies endured the painful
Cassandra 0.6 to 0.7 conversion, and they are not eagerly going to
entertain another change which is mostly cosmetic.

Also I find issues like this extremely frustrating.
https://issues.apache.org/jira/browse/CASSANDRA-4924

It seems like the project is drawing a hard line in the sand, dividing
people. Is it the case that CQL3's sparse tables can't be accessed by
thrift, or is it the case that no one wants to make this happen? Is it
technically impossible? It seems not, to me: in Cassandra, row key, column,
and value are all still byte arrays, right? So I do not see why thrift
users need to be locked out of them. Just like with composites, we will
figure out how to pack the bytes.
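To make the byte-packing point concrete, here is a small sketch of the composite layout Cassandra's CompositeType uses on the wire, as I understand it (each component is a 2-byte big-endian length, the component bytes, then an end-of-component byte); this is an illustrative reimplementation, not the project's code:

```python
import struct

def pack_composite(*components: bytes) -> bytes:
    """Pack components in CompositeType style:
    <2-byte big-endian length><bytes><end-of-component byte>, repeated."""
    out = b""
    for c in components:
        out += struct.pack(">H", len(c)) + c + b"\x00"
    return out

def unpack_composite(packed: bytes) -> list:
    comps, i = [], 0
    while i < len(packed):
        (n,) = struct.unpack_from(">H", packed, i)
        comps.append(packed[i + 2 : i + 2 + n])
        i += 2 + n + 1  # skip length prefix, component bytes, and EOC byte
    return comps

packed = pack_composite(b"Jayne Cobb", b"2012-07-23")
assert unpack_composite(packed) == [b"Jayne Cobb", b"2012-07-23"]
```

A thrift client that packs column names this way can, in principle, read and write the same cells a composite-comparator table stores.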

I hope that we can stop talking about removing thrift until there is some
consensus among active users that it is not in use anymore.
This consensus is not as simple as n committers saying that something is
technically not needed anymore. It has to look at the users, the number of
clients, the number of languages, the number of high-level tools available.
In the meantime, when issues like 4924 pop up, it would be better if people
tried to find solutions for maximum forward and backward compatibility
instead of drawing a line and trying to shut thrift users out of things.

Avro was much the same way. I had a spirited debate on IRC and got
basically insulted because I believed thrift was not dead. The glory of
Avro never came true because it really did not work for clients outside a
few languages. CQL and the binary transport have to pass this same litmus
test. Let them gain momentum, with rock-solid clients for 5 languages and
higher-level tools written on top, and then it's easy to say thrift is
not needed anymore.


On Saturday, December 1, 2012, Sylvain Lebresne wrote:

> I agree on 2.0.
>
> For the thrift part, we've said clearly that we wouldn't remove it any time
> soon so let's stick to that. Besides, I would agree it's too soon anyway.
> What we can do however in the relatively short term on that front, is to
> pull thrift into its own jar (we've almost removed all internal dependencies
> on thrift, and the few remaining ones will be easy to kill) and make that
> jar optional if you don't want to use it.
>
> --
> Sylvain
>
>
> On Sat, Dec 1, 2012 at 2:52 AM, Ray Slakinski 
> 
> >wrote:
>
> > I agree, I don't think its a great idea to drop thrift until the back
> > end tools are 100% compatible and have some level of agreement from the
> > major users of
> > Cassandra.
> >
> > Paying off technical debt though I'm all for, and I think it's key to the
> > long-term success of the application. Right now someone new coming to the
> > system might see supercolumns and think "Hey, these things look great. Let's
> > use them" and in a few months' time hate all things that are cassandra.
> >
> > Ray Slakinski
> >
> > On 12/01, Jonathan Ellis wrote:
> > > As attractive as it would be to clean house, I think we owe it to our
> > > users to keep Thrift around for the forseeable future rather than
> > > orphan all Thrift-using applications (which is virtually everyone) on
> > > 1.2.
> > >
> > > On Sat, Dec 1, 2012 at 7:33 AM, Jason Brown 
> > > 
> >
> > wrote:
> > > > Hi Jonathan,
> > > >
> > > > I'm in favor of paying off the technical debt, as well, and I wonder
> if
> > > > there is value in removing support for thrift with 2.0? We're
> > currently in
> > > > 'do as little as possible' mode with thrift, so should we
> aggressively
> > cast
> > > > it off and push the binary CQL protocol? Seems lik

Re: Stable Hector version with cassandra 1.1.6

2012-12-04 Thread Edward Capriolo
One thing to note: the Maven group id has moved from me.prettyprint to
org.hectorclient, so that should aid your searches of the Maven repo.
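In a pom.xml, the change would look roughly like the following sketch; the artifact id and version numbers here are illustrative recollections and should be verified against the repository:

```xml
<!-- old coordinates (pre-move) -->
<dependency>
  <groupId>me.prettyprint</groupId>
  <artifactId>hector-core</artifactId>
  <version>1.0-5</version> <!-- illustrative version only -->
</dependency>

<!-- new coordinates after the move -->
<dependency>
  <groupId>org.hectorclient</groupId>
  <artifactId>hector-core</artifactId>
  <version>1.1-2</version> <!-- illustrative version only -->
</dependency>
```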

On Tuesday, December 4, 2012, Bisht, Jaikrit  wrote:
>
> Hi,
>
> Could someone recommend the stable version of Hector libraries for
Cassandra 1.1.6?
>
> Regards
> Jay
>


Re: Compund/Composite column names

2012-12-17 Thread Edward Capriolo
This was discussed in one of the tickets. The problem with CQL3's sparse
tables is that they have different metadata that has NOT been added to
thrift's CFMetaData. Thus thrift does not know exactly how to validate the
insert.

Originally it was made impossible for thrift to see a sparse table, but it
seems that restriction has been lifted. It is probably a bad idea to insert
into a sparse table via thrift until Cassandra no longer has two distinct
sources of meta information.





On Mon, Dec 17, 2012 at 9:52 AM, Vivek Mishra wrote:

> Looks like Thrift API is not working as expected?
>
> -Vivek
>
>
>
>
> 
>  From: Brian O'Neill 
> To: dev@cassandra.apache.org
> Cc: Vivek Mishra 
> Sent: Monday, December 17, 2012 8:12 PM
> Subject: Re: Compund/Composite column names
>
> FYI -- I'm still seeing this on 1.2-beta1.
>
> If you create a table via CQL, then insert into it (via Java API) with
> an incorrect number of components.  The insert works, but select *
> from CQL results in a TSocket read error.
>
> I showed this in the webinar last week, just in case people ran into
> it.  It would be great to translate the ArrayIndexOutofBoundsException
> from the server side into something meaningful in cqlsh to help people
> diagnose the problem.  (a regular user probably doesn't have access to
> the server-side logs)
>
> You can see it at minute 41 in the video from the webinar:
> http://www.youtube.com/watch?v=AdfugJxfd0o&feature=youtu.be
>
> -brian
>
>
> On Tue, Oct 9, 2012 at 9:39 AM, Jonathan Ellis  wrote:
> > Sounds like you're running into the keyspace drop bug.  It's "mostly"
> fixed
> > in 1.1.5 but you might need the latest from 1.1 branch.  1.1.6 will be
> > released soon with the final fix.
> > On Oct 9, 2012 1:58 AM, "Vivek Mishra"  wrote:
> >
> >>
> >>
> >> Ok. I am able to understand the problem now. Issue is:
> >>
> >> If i create a column family altercations as:
> >>
> >>
> >>
> **8
> >> CREATE TABLE altercations (
> >>instigator text,
> >>started_at timestamp,
> >>ships_destroyed int,
> >>energy_used float,
> >>alliance_involvement boolean,
> >>PRIMARY KEY (instigator,started_at,ships_destroyed)
> >>);
> >> /
> >>INSERT INTO altercations (instigator, started_at, ships_destroyed,
> >>  energy_used, alliance_involvement)
> >>  VALUES ('Jayne Cobb', '2012-07-23', 2, 4.6,
> 'false');
> >>
> >>
> *
> >>
> >> It works!
> >>
> >> But if i create a column family with compound primary key with 2
> composite
> >> column as:
> >>
> >>
> >>
> *
> >> CREATE TABLE altercations (
> >>instigator text,
> >>started_at timestamp,
> >>ships_destroyed int,
> >>energy_used float,
> >>alliance_involvement boolean,
> >>PRIMARY KEY (instigator,started_at)
> >>);
> >>
> >>
> >>
> *
> >> and Then drop this column family:
> >>
> >>
> >>
> *
> >> drop columnfamily altercations;
> >>
> >>
> *
> >>
> >> and then try to create same one with primary compound key with 3
> composite
> >> column:
> >>
> >>
> >>
> *
> >>
> >> CREATE TABLE altercations (
> >>instigator text,
> >>started_at timestamp,
> >>ships_destroyed int,
> >>energy_used float,
> >>alliance_involvement boolean,
> >>PRIMARY KEY (instigator,started_at,ships_destroyed)
> >>);
> >>
> >>
> *
> >>
> >> it gives me error: "TSocket read 0 bytes"
> >>
> >> Rest, as no column family is created, so nothing onwards will work.
> >>
> >> Is this an issue?
> >>
> >> -Vivek
> >>
> >>
> >> 
> >>  From: Jonathan Ellis 
> >> To: dev@cassandra.apache.org; Vivek Mishra 
> >> Sent: Tuesday, October 9, 2012 9:08 AM
> >> Subject: Re: Compund/Composite column names
> >>
> >> Works for me on latest 1.1 in cql3 mode.  cql2 mode gives a parse error.
> >>
> >> On Mon, Oct 8, 2012 at 9:18 PM, Vivek Mishra 
> >> wrote:
> >> > Hi All,
> >> >
> >> > I am trying to use compound primary key column name and i am referring
> >> to:
> >> > http://ww

Re: [VOTE CLOSED] Release Apache Cassandra 1.2.0-rc1

2013-01-01 Thread Edward Capriolo
A question about 1.2.0-beta2:

Why does the thrift interface have 2 CQL methods?

  CqlResult execute_cql_query(1:required binary query, 2:required
Compression compression)
throws (1:InvalidRequestException ire,
2:UnavailableException ue,
3:TimedOutException te,
4:SchemaDisagreementException sde)

  CqlResult execute_cql3_query(1:required binary query, 2:required
Compression compression, 3:required ConsistencyLevel consistency)
throws (1:InvalidRequestException ire,
2:UnavailableException ue,
3:TimedOutException te,
4:SchemaDisagreementException sde)

Is this something we are going to continue, just naming methods
execute_cql3_query? I wish we could have done the cassandra 0.6.X -> 0.7.X
migration this way :)

get(String keyspace, String columnFamily, String rowkey, String column)

get7(String columnFamily, binary rowkey, binary column)


On Mon, Dec 3, 2012 at 12:34 PM, Sylvain Lebresne wrote:

> Alright, seems we can use a beta 3 before calling this a RC1.
> So I'm closing this vote and I'll rebrand this as beta3 and do a short 24h
> with that. And hopefully we'll have a true RC1 quickly after that.
>
> Stay tuned.
>
> --
> Sylvain
>
>
> On Mon, Dec 3, 2012 at 5:57 AM, Brandon Williams  wrote:
>
> > On Sun, Dec 2, 2012 at 10:45 PM, Jonathan Ellis 
> wrote:
> > > I'm not a fan of blocking a new rc because of bugs that are not
> > > regressions new in that release.  I'd also like to get more testing on
> > > the 1.2 fixes since b2.  But we can call it b3 instead of rc1 if you
> > > want.
> >
> > I agree with everything you've said.  I'm fine with calling it b3,
> > though I expect we'll have that ticket closed soon and could re-roll
> > an rc1 on Tuesday.
> >
> > -Brandon
> >
>


Re: [VOTE CLOSED] Release Apache Cassandra 1.2.0-rc1

2013-01-02 Thread Edward Capriolo
With thrift, methods cannot be overloaded, but objects can have optional
parameters.

In the future should we avoid:

CqlResult execute_cql3_query(1:required binary query, 2:required
Compression compression, 3:required ConsistencyLevel consistency)
throws (1:InvalidRequestException ire,
2:UnavailableException ue,
3:TimedOutException te,
4:SchemaDisagreementException sde)

Instead

CqlResult execute_cql3_query(1:required CqlRequestObject object)

where the CqlRequestObject contains all optional parameters? I cannot
find the exact reference, but I remember reading that this is the way
Google suggests using protobufs: always mark all fields optional, for
maximum compatibility.
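A sketch of what that request-object style might look like in Thrift IDL; the struct name and fields here are hypothetical illustrations, not an actual proposal from the project:

```thrift
// Hypothetical: one evolvable request struct instead of a growing
// positional parameter list.
struct CqlRequestObject {
  1: optional binary query,
  2: optional Compression compression,
  3: optional ConsistencyLevel consistency,
  // new knobs become new optional fields; old clients simply omit them
}

service Cassandra {
  CqlResult execute_cql3_query(1: required CqlRequestObject request)
    throws (1: InvalidRequestException ire,
            2: UnavailableException ue,
            3: TimedOutException te,
            4: SchemaDisagreementException sde)
}
```

The appeal is that adding a field to the struct never changes the method signature, so old and new clients keep working against the same service definition.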

On Tue, Jan 1, 2013 at 2:25 PM, Jonathan Ellis  wrote:

> On Tue, Jan 1, 2013 at 11:42 AM, Edward Capriolo 
> wrote:
> > Question. 1.2.0-beta2
> >
> > Why does the thrift interface have 2 CQL methods?
>
> To preserve cql2 compatibility.  cql3 pulls consistencylevel into the
> method call instead of the query language.
>
> > Is this something we are going to continue?
>
> When necessary for compatibility, yes.
>
> > I wish we could have done the cassandra 0.6.X -> 0.7.X
> > migration this way:)
>
> In retrospect, I agree.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>


Re: max_compaction_threshold removed - bad move

2013-01-09 Thread Edward Capriolo
:( Seems like a good thing to have; I can imagine at least one degenerate
scenario where having it helps: a corrupt sstable. Compaction will never be
able to remove it, and then each compaction will likely try to compact it
again... and fail.

On Wed, Jan 9, 2013 at 10:35 AM, Brandon Williams  wrote:

> On Wed, Jan 9, 2013 at 9:21 AM, Radim Kolar  wrote:
> > removing max_compaction_threshold in 1.2 was bad move, keeping it low
> helps
> > compaction throughput because it lowers number of disk seeks.
>
> :(
>


Re: max_compaction_threshold removed - bad move

2013-01-09 Thread Edward Capriolo
Was the change well accounted for in the CHANGES.txt or the README.txt?

It hasn't been removed, it has been renamed max_threshold and moved into
the compaction options map for CQL3 (nothing has changed for thrift or
CQL2). The CQL3 reference doc hasn't been updated correctly however, which
I'll fix.

^ Statements like the above are scary. I worry that at this point cassandra
is becoming two separate databases in one: we rename something and move it
around, or give it a double meaning, or something. Things are becoming very
unclear and have different meanings in different contexts.
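Concretely, the renamed max_compaction_threshold now lives in CQL3's compaction options map; a sketch of the 1.2 syntax (the table and threshold values are illustrative):

```sql
CREATE TABLE users (
    id bigint PRIMARY KEY,
    name text
) WITH compaction = { 'class' : 'SizeTieredCompactionStrategy',
                      'min_threshold' : 4,
                      'max_threshold' : 16 };
```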

https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/cql3/CFDefinition.java#L90


"// Note that isCompact means here that no componet of the comparator
correspond to the column names
// defined in the CREATE TABLE QUERY. This is not exactly equivalent to the
'WITH COMPACT STORAGE'
// option when creating a table in that "static CF" without a composite
type will have isCompact == false
  // even though one must use 'WITH COMPACT STORAGE' to declare them."


Confused

On Wed, Jan 9, 2013 at 10:56 AM, Sylvain Lebresne wrote:

> On Wed, Jan 9, 2013 at 4:21 PM, Radim Kolar  wrote:
>
> > removing max_compaction_threshold in 1.2 was bad move, keeping it low
> > helps compaction throughput because it lowers number of disk seeks.
> >
>
> It hasn't been removed, it has been renamed max_threshold and moved into
> the compaction options map for CQL3 (nothing has changed for thrift or
> CQL2). The CQL3 reference doc hasn't been updated correctly however, which
> I'll fix.
>
>
> >
> > if you have redhat linux, check during install category "performance
> > tools" or something like that, you will get tools for disk monitoring.
> > Learn to use them.
> >
>
> Obviously no Cassandra dev has ever heard of a "disk monitoring tool".
> That's so helpful and not at all condescending for no good reason, thanks
> Radim.
>
> --
> Sylvain
>


Re: max_compaction_threshold removed - bad move

2013-01-09 Thread Edward Capriolo
If you want to complain about bad names in the code, start with the class
implementing keyspaces being called Table.

OMG that is terrible!

We should only be wrongfully calling a "column family" a "table" :)

(In HBase tables are actually a collection of column families, right? So
that is probably where that came from.)

On Wed, Jan 9, 2013 at 11:25 AM, Sylvain Lebresne wrote:

> On Wed, Jan 9, 2013 at 5:04 PM, Edward Capriolo  >wrote:
>
> > Was the change well accounted for in the changes.TXT or the readme.txt?
> >
>
> The news file says:
> "CQL3 is now considered final in this release. Compared to the beta
>  version that is part of 1.1, this final version has a few additions
>  (collections), but also some (incompatible) changes in the syntax for the
>  options of the create/alter keyspace/table statements.
>  (...)
>  Please refer to the CQL3 documentation for details"
>
> That last sentence refers to
> http://cassandra.apache.org/doc/cql3/CQL.html and yes, that should be
> in the news file, but that same url was pointing to
> the 1.1 CQL3 doc before 1.2.0 was released, so I didn't want to list it in
> the news file for the betas and rcs, and I forgot to add back the link to
> that news file for the final, my bad (I'm sorry and I will add the link to
> the NEWS file for the next release). And of course having forgotten to
> update the max_threshold thing in said reference doc was unfortunate, but
> that's fixed now.
>
> Now I know you are not happy with us having made breaking changes between
> CQL3 beta in 1.1 and CQL3 final in 1.2. I'm sorry we did, but I am happy
> with the coherence of the language we have in that final, so I think it
> was probably worth it in the end. I do want to stress that the goal was to
> have a CQL3 final for which we won't make breaking changes for the
> foreseeable future.
>
>
> >
> > "// Note that isCompact means here that no componet of the comparator
> > correspond to the column names
> > // defined in the CREATE TABLE QUERY. This is not exactly equivalent to
> the
> > 'WITH COMPACT STORAGE'
> > // option when creating a table in that "static CF" without a composite
> > type will have isCompact == false
> >   // even though one must use 'WITH COMPACT STORAGE' to declare them."
> >
> >
> > Confused
> >
>
> Granted that is not the cleanest thing ever and we could probably rename
> that isCompact variable but you do realize that is just an implementation
> "detail" that have no impact whatsoever on users. If you want to complain
> about bad names in the code, start with the class implementing keyspaces
> being called Table.
>
> --
> Sylvain
>


Re: Proposal: require Java7 for Cassandra 2.0

2013-02-07 Thread Edward Capriolo
Counter-proposal: Java 8 and closures. Jk
On Thursday, February 7, 2013, Carl Yeksigian  wrote:
> +1
>
>
> On Wed, Feb 6, 2013 at 5:21 PM, Jonathan Ellis  wrote:
>
>> Java 6 EOL is this month.  Java 7 will be two years old when C* 2.0
>> comes out (July).  Anecdotally, a bunch of people are running C* on
>> Java7 with no issues, except for the Snappy-on-OS-X problem (which
>> will be moot if LZ4 becomes our default, as looks likely).
>>
>> Upgrading to Java7 lets us take advantage of new (two year old)
>> features as well as simplifying interoperability with other
>> dependencies, e.g., Jetty's BlockingArrayQueue requires java7.
>>
>> Thoughts?
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder, http://www.datastax.com
>> @spyced
>>
>


Re: Understanding Read and Writes During Transient States

2013-02-16 Thread Edward Capriolo
When a node is joining/bootstrapping the ring and the replication factor
is 3, the write operation should be delivered to 4 nodes: the three
current natural endpoints and the new one. That way, if the joining
node fails to join, the other nodes have not missed any writes.

The joining node will not answer read requests until it is done bootstrapping.
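A toy sketch of the rule described above (not Cassandra's actual code; node names and function names are invented): writes fan out to natural plus pending endpoints, while reads use only the natural ones.

```python
# With RF = 3, a write during bootstrap goes to the natural endpoints
# plus any pending (joining) node, so the joiner misses no writes.
def write_targets(natural_endpoints, pending_endpoints):
    return list(natural_endpoints) + list(pending_endpoints)

def read_targets(natural_endpoints, pending_endpoints):
    # Pending nodes never serve reads until bootstrap completes.
    return list(natural_endpoints)

natural = ["n1", "n2", "n3"]   # RF = 3
pending = ["n4"]               # node currently bootstrapping

assert len(write_targets(natural, pending)) == 4
assert read_targets(natural, pending) == ["n1", "n2", "n3"]
```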

On Fri, Feb 15, 2013 at 5:24 PM, Muntasir Raihan Rahman
 wrote:
> Hi,
>
> I am trying to understand what happens to reads and writes to cassandra
> while nodes leave or join the system. Specifically, what happens when a
> node is about to leave or join, but gets a read/write request?
>
> Any pointers on this?
>
> Muntasir.
>
> --
> Best Regards
> Muntasir Raihan Rahman
> Email: muntasir.rai...@gmail.com
> Phone: 1-217-979-9307
> Department of Computer Science,
> University of Illinois Urbana Champaign,
> 3111 Siebel Center,
> 201 N. Goodwin Avenue,
> Urbana, IL  61801


Re: Notes from committer's meeting: overview

2013-02-25 Thread Edward Capriolo
I am curious what you mean when you say "does the fat client work right
now?"

What does not work about it? I have a fat client app running in the same
JVM as C*, and it seems to work well.



On Monday, February 25, 2013, Jonathan Ellis  wrote:
> Last Thursday, DataStax put together a meeting of the active Cassandra
> committers in San Mateo.  Dave Brosius was unable to make it to the
> West coast, but Brandon, Eric, Gary, Jason, Pavel, Sylvain, Vijay,
> Yuki, and I were able to attend, with Aleksey and Jake able to attend
> part time over Google Hangout.
>
> We started by asking each committer to outline his top 3 priorities
> for 2.0.  There was pretty broad consensus around the following big
> items, which I will break out into separate threads:
>
> * Streaming and repair
> * Counters
>
> There was also a lot of consensus that we'll be able to ship some form
> of Triggers [1] in 2.0.  Gary's suggestion was to focus on getting the
> functionality nailed down first, then worry about classloader voodoo
> to allow live reloading.  There was also general agreement that we
> need to split jar loading from trigger definition, to allow a single
> trigger to be applied to be multiple tables.
>
> There was less consensus around CAS [2], primarily because of
> implementation difficulties.  (I've since read up some more on Paxos
> and Spinnaker and posted my thoughts to the ticket.)
>
> Other subjects discussed:
>
> A single Cassandra process does not scale well beyond 12 physical
> cores.  Further research is needed to understand why.  One possibility
> is GC overhead.  Vijay is going to test Azul's Zing VM to confirm or
> refute this.
>
> Server-side aggregation functions [3].  This would remove the need to
> pull a lot of data over the wire to a client unnecessarily.  There was
> some unease around moving beyond the relatively simple queries we've
> traditionally supported, but I think there was general agreement that
> this can be addressed by fencing aggregation to a single partition
> unless explicitly allowed otherwise a la ALLOW FILTERING [4].
>
> Extending cross-datacenter forwarding [5] to a "star" model.  That is,
> in the case of three or more datacenters, instead of the original
> coordinator in DC A sending to replicas in DC B & C, A would forward
> to B, which would forward to C.  Thus, the bandwidth required for any
> one DC would be constant as more datacenters are added.
>
> Vnode improvements such as a vnode-aware replication strategy [6].
>
> Cluster merging and splitting -- if I have multiple applications using
> a single cassandra cluster, and one gets a lot more traffic than the
> others, I may want to split that out into its own cluster.  I think
> there was a concrete proposal as to how this could work but someone
> else will have to fill that in because I didn't write it down.
>
> Auto-paging of SELECT queries for CQL [7], or put another way,
> transparent cursors for the native CQL driver.
>
> Make the storage engine more CQL-aware.  Low-hanging fruit here
> includes a prefix dictionary for all the composite cell names [8].
>
> Resurrecting the StorageProxy API aka Fat Client.  ("Does it even work
> right now?"  "Not really.")
>
> Reducing context switches and increasing fairness in client
> connections.  HSHA prefers to accept new connections vs servicing
> existing ones, so overload situations are problematic.
>
> "Gossip is unreliable at 100s of nodes."  Here again I missed any
> concrete proposals to address this.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-1311.  Start with
>
https://issues.apache.org/jira/browse/CASSANDRA-1311?focusedCommentId=13492827&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13492827
> for the parts relevant to Vijay's proof of concept patch.
> [2] https://issues.apache.org/jira/browse/CASSANDRA-5062
> [3] https://issues.apache.org/jira/browse/CASSANDRA-4914
> [4] https://issues.apache.org/jira/browse/CASSANDRA-4915
> [5] https://issues.apache.org/jira/browse/CASSANDRA-3577
> [6] https://issues.apache.org/jira/browse/CASSANDRA-4123
> [7] https://issues.apache.org/jira/browse/CASSANDRA-4415
> [8] https://issues.apache.org/jira/browse/CASSANDRA-4175
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-02 Thread Edward Capriolo
It might be reasonable to enforce length on bytes and strings, since this
is an upper limit, but adding it to the grammar just for compatibility is
just more grammar. Personally I like NoSQL because of the no-grammar part;
CQL CREATE TABLE is not too cumbersome, but I don't want to jump through
hoops specifying stuff that is not actually important in the final outcome.

On Sat, Mar 2, 2013 at 6:11 AM, Andrew Prendergast  wrote:

> *DESCRIPTION*
>
> When creating a table in all ANSI-SQL compliant RDBMS' the VARCHAR datatype
> takes a numeric parameter, however this parameter is generating errors in
> CQL3.
>
> *STEPS TO REPRODUCE*
>
> CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR(256)); // emits Bad
> Request: line 1:54 mismatched input '(' expecting ')'
>
> CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR); // this works
>
> *SUGGESTED RESOLUTION*
>
> The current fail-fast approach does not create the column so that the user
> is 100% clear that the length parameter means nothing to NOSQL.
>
> I would like to propose that the column length be allowed in the grammar
> (but ignored by cassandra), allowing better ANSI-SQL client compatibility.
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-02 Thread Edward Capriolo
If the syntax effectively does nothing, I do not see the point of adding it.
CQL is never going to be a 100% ANSI-SQL-compatible dialect.

On Sat, Mar 2, 2013 at 12:19 PM, Michael Kjellman
wrote:

> Might want to create a Jira ticket at issues.apache.org instead of
> submitting the bug report thru email.
>
> On Mar 2, 2013, at 3:11 AM, "Andrew Prendergast" 
> wrote:
>
> > *DESCRIPTION*
> >
> > When creating a table in all ANSI-SQL compliant RDBMS' the VARCHAR
> datatype
> > takes a numeric parameter, however this parameter is generating errors in
> > CQL3.
> >
> > *STEPS TO REPRODUCE*
> >
> > CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR(256)); // emits
> Bad
> > Request: line 1:54 mismatched input '(' expecting ')'
> >
> > CREATE TABLE test (id BIGINT PRIMARY KEY, col1 VARCHAR); // this works
> >
> > *SUGGESTED RESOLUTION*
> >
> > The current fail-fast approach does not create the column so that the
> user
> > is 100% clear that the length parameter means nothing to NOSQL.
> >
> > I would like to propose that the column length be allowed in the grammar
> > (but ignored by cassandra), allowing better ANSI-SQL client
> compatibility.
>
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Edward Capriolo
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less

Does your tool handle the fact that foreign keys do not work? Or, for
that matter, how are you dealing with the fact that a "primary key" in
Cassandra is nothing like a "primary key" in an RDBMS?

I am generally under the impression that CRUD tools that auto-generate CQL
schemas can give someone the rope to hang themselves.

On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast  wrote:

> Hi Tristan,
>
> I've spent the last couple weekends testing the CRUD DML stuff and its very
> close to meeting that objective (although NULL handling needs some tuning).
>
> The main hiccups are in the JDBC driver which I have been working through
> with Rick - once he accepts my patches it'll be pretty solid in terms of
> cross-platform compatibility.
>
> On the DDL, I personally have a need for similar compatibility. One app I'm
> working on  programmatically creates the schema for a rather big ETL
> environment. It includes a very nice abstraction that creates databases and
> tables to accommodate tuples as they pass through the pipeline and behaves
> the same regardless of which DBMS is being used as the storage engine.
>
> This is possible because it turns out there is a subset of DDL that is
> common to all of the DBMS platforms and it would be very useful to see that
> in Cassandra.
>
> ap
>
>
>
>
> On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
> wrote:
>
> > On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne  > >wrote:
> >
> > > > This is just one of a few small adjustments that can be made to the
> > > grammar
> > > > to make everyone's life easier while still maintaining the spirit of
> > > NOSQL.
> > >
> > > To be clear, I am *not* necessarily against making CQL3 closer to the
> > > ANSI-SQL
> > > as a convenience. But only if that doesn't compromise the language
> > > "integrity"
> > > and is justified. Adding a syntax with a well known semantic but
> without
> > >
> >
> > To me database DDL (such as the CREATE statement we are talking about) is
> > always going to be handled in a custom fashion by applications.
> > While ANSI SQL compatibility for CRUD operations is a great objective, I
> > don't think it really matters for DDL.
> >
> > Tristan
> >
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Edward Capriolo
Not to say that you cannot do it, or that it is impossible to do
correctly, but currently Cassandra does not allow its validation to accept
parameters per column. I.e., you can set a column to be varchar (UTF8Type) or
int (Int32Type), but you CAN'T attach more properties to that type, such as
the size of the text or the integer.
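Since the server-side type system stops at the validator class, any varchar(n)-style length constraint would have to be enforced client-side. A minimal sketch of what that looks like (the `SCHEMA` table and `check_row` helper are hypothetical, not part of any driver):

```python
# The server only knows "UTF8Type"; a varchar(n)-style limit has to live in
# the client.  SCHEMA maps column -> (type name, max length or None); both
# names here are made up for illustration.
SCHEMA = {"id": ("bigint", None), "col1": ("varchar", 256)}

def check_row(row):
    for col, value in row.items():
        _, max_len = SCHEMA[col]
        if max_len is not None and len(value) > max_len:
            raise ValueError(f"{col} exceeds declared length {max_len}")

check_row({"id": 1, "col1": "x" * 256})  # passes: exactly at the limit
```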

I am very wary of Cassandra adding any more schema. I signed up for a
schema-LESS database. If schema can be added that does not conflict with
the original use cases, so be it. However, the latest round of "schema" has
caused COMPACT STORAGE tables and CQL tables to be very different and
essentially not compatible with each other.

With schema and Cassandra, less is more.

On Tue, Mar 5, 2013 at 4:08 PM, Edward Capriolo wrote:

>
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
>
> Does your the tool handle the fact that foreign keys do not work? Or for
> that matter, how are your dealing with the fact that a "primary key" in
> cassandra is nothing like a "primary key" in a RDBMS?
>
> Generally under the impression that CRUD tools that auto-generate CQL
> schema's can give someone the rope to hang themselves.
>
> On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast <
> a...@andrewprendergast.com> wrote:
>
>> Hi Tristan,
>>
>> I've spent the last couple weekends testing the CRUD DML stuff and its
>> very
>> close to meeting that objective (although NULL handling needs some
>> tuning).
>>
>> The main hiccups are in the JDBC driver which I have been working through
>> with Rick - once he accepts my patches it'll be pretty solid in terms of
>> cross-platform compatibility.
>>
>> On the DDL, I personally have a need for similar compatibility. One app
>> I'm
>> working on  programmatically creates the schema for a rather big ETL
>> environment. It includes a very nice abstraction that creates databases
>> and
>> tables to accommodate tuples as they pass through the pipeline and behaves
>> the same regardless of which DBMS is being used as the storage engine.
>>
>> This is possible because it turns out there is a subset of DDL that is
>> common to all of the DBMS platforms and it would be very useful to see
>> that
>> in Cassandra.
>>
>> ap
>>
>>
>>
>>
>> On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
>> wrote:
>>
>> > On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne > > >wrote:
>> >
>> > > > This is just one of a few small adjustments that can be made to the
>> > > grammar
>> > > > to make everyone's life easier while still maintaining the spirit of
>> > > NOSQL.
>> > >
>> > > To be clear, I am *not* necessarily against making CQL3 closer to the
>> > > ANSI-SQL
>> > > as a convenience. But only if that doesn't compromise the language
>> > > "integrity"
>> > > and is justified. Adding a syntax with a well known semantic but
>> without
>> > >
>> >
>> > To me database DDL (such as the CREATE statement we are talking about)
>> is
>> > always going to be handled in a custom fashion by applications.
>> > While ANSI SQL compatibility for CRUD operations is a great objective, I
>> > don't think it really matters for DDL.
>> >
>> > Tristan
>> >
>>
>
>


Re: bug report - CQL3 grammar should ignore VARCHAR column length in CREATE statements

2013-03-05 Thread Edward Capriolo
yes. It doesn't use foreign keys or any constraints, they slow things down.

Exactly what you do not want. Check the history of the "features" that do
read before write: counters, the old read-before-write secondary indexes,
the new collection functions that impose read before write.

Once people start using them they send an email to the Cassandra mailing list
that goes like this:
"
Subject: Why is Cassandra so slow?
Message: I am using secondary indexes and as I write data I see my
READ_STAGE filling up. What is going on? I thought Cassandra was faster
than MySQL? Once my database gets bigger than X GB it slows to a crawl.
Please help.
"
If we make tools that design anti-pattern schemas people will use them; no
one wins.


On Tue, Mar 5, 2013 at 4:30 PM, Andrew Prendergast  wrote:

> *>
>
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
> *
> Thanks for the link Ed, I'm aware of all that.
>
> *> Does your the tool handle the fact that foreign keys do not work?
> *
> yes. It doesn't use foreign keys or any constraints, they slow things down.
>
> *> how are your dealing with the fact that a "primary key" in cassandra is
> nothing like a "primary key" in a RDBMS?
> *
> locality preserving sequences & natural keys. There are no range queries.
>
> *> Generally under the impression that CRUD tools that auto-generate CQL
> schema's can give someone the rope to hang themselves.
> *
> For those of us that know what we are doing and have had to put up with SQL
> based ETL, refining CQL3 would be life changing and ease the transition.
>
> ap
>
>
>
>
> On Wed, Mar 6, 2013 at 8:08 AM, Edward Capriolo  >wrote:
>
> >
> >
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/schema_vs_schema_less
> >
> > Does your the tool handle the fact that foreign keys do not work? Or for
> > that matter, how are your dealing with the fact that a "primary key" in
> > cassandra is nothing like a "primary key" in a RDBMS?
> >
> > Generally under the impression that CRUD tools that auto-generate CQL
> > schema's can give someone the rope to hang themselves.
> >
> > On Tue, Mar 5, 2013 at 3:46 PM, Andrew Prendergast <
> > a...@andrewprendergast.com
> > > wrote:
> >
> > > Hi Tristan,
> > >
> > > I've spent the last couple weekends testing the CRUD DML stuff and its
> > very
> > > close to meeting that objective (although NULL handling needs some
> > tuning).
> > >
> > > The main hiccups are in the JDBC driver which I have been working
> through
> > > with Rick - once he accepts my patches it'll be pretty solid in terms
> of
> > > cross-platform compatibility.
> > >
> > > On the DDL, I personally have a need for similar compatibility. One app
> > I'm
> > > working on  programmatically creates the schema for a rather big ETL
> > > environment. It includes a very nice abstraction that creates databases
> > and
> > > tables to accommodate tuples as they pass through the pipeline and
> > behaves
> > > the same regardless of which DBMS is being used as the storage engine.
> > >
> > > This is possible because it turns out there is a subset of DDL that is
> > > common to all of the DBMS platforms and it would be very useful to see
> > that
> > > in Cassandra.
> > >
> > > ap
> > >
> > >
> > >
> > >
> > > On Tue, Mar 5, 2013 at 8:26 PM, Tristan Tarrant
> > > wrote:
> > >
> > > > On Tue, Mar 5, 2013 at 10:20 AM, Sylvain Lebresne <
> > sylv...@datastax.com
> > > > >wrote:
> > > >
> > > > > > This is just one of a few small adjustments that can be made to
> the
> > > > > grammar
> > > > > > to make everyone's life easier while still maintaining the spirit
> > of
> > > > > NOSQL.
> > > > >
> > > > > To be clear, I am *not* necessarily against making CQL3 closer to
> the
> > > > > ANSI-SQL
> > > > > as a convenience. But only if that doesn't compromise the language
> > > > > "integrity"
> > > > > and is justified. Adding a syntax with a well known semantic but
> > > without
> > > > >
> > > >
> > > > To me database DDL (such as the CREATE statement we are talking
> about)
> > is
> > > > always going to be handled in a custom fashion by applications.
> > > > While ANSI SQL compatibility for CRUD operations is a great
> objective,
> > I
> > > > don't think it really matters for DDL.
> > > >
> > > > Tristan
> > > >
> > >
> >
>


Re: Bitmap indexes - reviving CASSANDRA-1472

2013-04-12 Thread Edward Capriolo
I am not sure about the collection case, but for compact storage you can
specify multiple ranges in a slice query.

https://issues.apache.org/jira/browse/CASSANDRA-3885

I am not sure this will get you all the way to bitmap indexes, but in a
wide-row scenario it seems like you could support "event contains 1 or
event contains 2 or event contains 3".

I am not sure how arbitrarily complex the CQL query handler can/will
become. For intravert (something I am dabbling with) the concept is to apply
a server-side function to the result of a slice.

https://github.com/zznate/intravert-ug/wiki/Filter-mode

There is a huge win in having multiple indexes behind the pluggable index
support; not all of the pluggable indexes and query options will be easy to
CQL-ify.
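The set manipulation being discussed is easy to sketch outside Cassandra: plain Python integers work as arbitrary-width bitmaps, so "performed events 1, 2, and 3 but not 4" is two ANDs and a complement. A toy model of the idea, not the CASSANDRA-1472 implementation:

```python
# One bitmap per event type over user ids: bit u set means user u performed
# that event.  Plain Python ints serve as arbitrary-width bitmaps.
events = {
    1: 0b10110,  # users 1, 2, 4
    2: 0b00110,  # users 1, 2
    3: 0b01110,  # users 1, 2, 3
    4: 0b00100,  # user 2
}

# "performed 1, 2 and 3, but not 4": intersect the first three bitmaps,
# then subtract the fourth via its complement.
result = events[1] & events[2] & events[3] & ~events[4]

matching_users = [u for u in range(result.bit_length()) if result >> u & 1]
print(matching_users)  # → [1]
```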




On Fri, Apr 12, 2013 at 10:52 AM, Jonathan Ellis  wrote:

> Something like this?
>
> SELECT * FROM users
> WHERE user_id IN (select user_id from events where type in (1, 2, 3))
>   AND user_id NOT IN (select user_id from events where type=4)
>
> This doesn't really look like a Cassandra query to me.  More like a
> query for Hive (or Drill, or Impala).
>
> But, I know Sylvain is looking forward to adding index support to
> Collections [1], so something like this might fit:
>
> SELECT * FROM users
> WHERE (events CONTAINS 1 OR events CONTAINS 2 OR events CONTAINS 3)
>AND NOT (events CONTAINS 4)
>
> However, even this is more than our current query planner can handle;
> we don't really handle disjunctions at all, except for the special
> case of IN on the partition key (which translates to multiget), let
> alone arbitrary logical predicates.
>
> I think that between "bitmap indexes" and "query planning," the latter
> is actually the hard part.  QueryProcessor is about at the limits of
> tractable complexity already; I think we'd need a new approach if we
> want to handle arbitrarily complex predicates like that.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-4511
>
>
> On Wed, Apr 10, 2013 at 4:40 PM, mrevilgnome 
> wrote:
> > What do you think about set manipulation via indexes in Cassandra? I'm
> > interested in answering queries such as give me all users that performed
> > event 1, 2, and 3, but not 4. If the answer is yes than I can make a case
> > for spending my time on C*. The only downside for us would be our current
> > prototype is in C++ so we would loose some performance and the ability to
> > dedicate an entire machine to caching/performing queries.
> >
> >
> > On Wed, Apr 10, 2013 at 11:57 AM, Jonathan Ellis 
> wrote:
> >
> >> If you mean, "Can someone help me figure out how to get started updating
> >> these old patches to trunk and cleaning out the Avro?" then yes, I've
> been
> >> knee-deep in indexing code recently.
> >>
> >>
> >> On Wed, Apr 10, 2013 at 11:34 AM, mrevilgnome 
> >> wrote:
> >>
> >> > I'm currently building a distributed cluster on top of cassandra to
> >> perform
> >> > fast set manipulation via bitmap indexes. This gives me the ability to
> >> > perform unions, intersections, and set subtraction across sub-queries.
> >> > Currently I'm storing index information for thousands of dimensions as
> >> > cassandra rows, and my cluster keeps this information cached,
> distributed
> >> > and replicated in order to answer queries.
> >> >
> >> > Every couple of days I think to myself this should really exist in C*.
> >> > Given all the benifits would there be any interest in
> >> > reviving CASSANDRA-1472?
> >> >
> >> > Some downsides are that this is very memory intensive, even for sparse
> >> > bitmaps.
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder, http://www.datastax.com
> >> @spyced
> >>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>


Re: Major compaction does not seems to free the disk space a lot if wide rows are used.

2013-05-16 Thread Edward Capriolo
This makes sense. Unless you are running major compaction, a tombstone could
only be purged if the bloom filters confirmed the row was not in the sstables
not being compacted. If your rows are wide, the odds are that they are in
most/all sstables, and then finally removing them becomes tricky.
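A toy model of that purge rule, with Python sets standing in for per-sstable bloom filters (an illustration of the logic described above, not Cassandra's actual compaction code):

```python
# sstables modeled as sets of row keys; membership tests play the role of
# the bloom-filter check (minus false positives).
compacting = [{"a", "b"}, {"b", "c"}]  # the files in this compaction
others     = [{"b"}]  # e.g. a memtable flushed to disk mid-compaction

def tombstone_purgeable(key, other_sstables):
    # A tombstone can only be dropped if no sstable OUTSIDE the compaction
    # might still hold a fragment of that row.
    return all(key not in sst for sst in other_sstables)

print(tombstone_purgeable("a", others))  # → True: "a" lives only in the compaction
print(tombstone_purgeable("b", others))  # → False: the wide row spans other files
```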


On Thu, May 16, 2013 at 12:00 PM, Louvet, Jacques <
jacques_lou...@cable.comcast.com> wrote:

>  Boris,
>
>  We hit exactly the same issue, and you are correct the newly created
> SSTables are the cause of why most of the column-tombstone not being purged.
>
>  There is an improvement in 1.2 train where both the minimum and maximum
> timestamp for a row is now stored and used during the compaction to
> determine if the portion of the row can be purged.
> However, this only appears to help Major compaction has the other
> restriction where all the files encompassing the deleted rows must be part
> of the compaction for the row to be purged still remains.
>
>  We have switched to column delete rather that row delete wherever
> practical. A little more work on the app, but a big improvement in reads
> due to much more efficient compaction.
>
>  Regards,
> Jacques
>
>   From: Boris Yen 
> Reply-To: "u...@cassandra.apache.org" 
> Date: Thursday, May 16, 2013 04:07
> To: "u...@cassandra.apache.org" , "
> dev@cassandra.apache.org" 
> Subject: Major compaction does not seems to free the disk space a lot if
> wide rows are used.
>
>  Hi All,
>
> Sorry for the wide distribution.
>
>  Our cassandra is running on 1.0.10. Recently, we are facing a weird
> situation. We have a column family containing wide rows (each row might
> have a few million of columns). We delete the columns on a daily basis and
> we also run major compaction on it everyday to free up disk space (the
> gc_grace is set to 600 seconds).
>
>  However, every time we run the major compaction, only 1 or 2GB disk space
> is freed. We tried to delete most of the data before running compaction,
> however, the result is pretty much the same.
>
>  So, we tried to check the source code. It seems that the column
> tombstones could only be purged when the row key is not in other sstables.
> I know the major compaction should include all sstables, however, in our
> use case, columns get inserted rapidly. This will make the cassandra flush
> the memtables to disk and create new sstables. The newly created sstables
> will have the same keys as the sstables that are being compacted (the
> compaction will take 2 or 3 hours to finish). My question is that will
> these newly created sstables be the cause of why most of the
> column-tombstone not being purged?
>
>  p.s. We also did some other tests. We inserted data to the same CF with
> the same wide-row pattern and deleted most of the data. This time we
> stopped all the writes to cassandra and did the compaction. The disk usage
> decreased dramatically.
>
>  Any suggestions or is this a know issue.
>
>  Thanks and Regards,
>  Boris
>


Re: CQL vs Thrift

2013-07-18 Thread Edward Capriolo
If you understand how CQL collections are written, you can decode them and
work with them from Thrift. It's quite a chore and I would not suggest
trying to do it, however.

(I suspect Tyler tried it and Jonathan broke his hand, jk)

There is a Perl Cassandra driver that did something like this.
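Much of the chore boils down to splitting CompositeType-encoded column names. As I understand the encoding, each component is a 2-byte big-endian length, the value bytes, and a one-byte end-of-component marker. A hedged sketch (the example column name and map key are invented):

```python
import struct

def decode_composite(blob):
    """Split a Cassandra CompositeType-encoded column name into its raw
    component byte strings.  Assumed layout per component:
    [2-byte big-endian length][value bytes][1-byte end-of-component]."""
    parts, i = [], 0
    while i < len(blob):
        (length,) = struct.unpack_from(">H", blob, i)
        i += 2
        parts.append(blob[i:i + length])
        i += length + 1  # skip the end-of-component byte
    return parts

# Hypothetical composite name for a map entry: column "emails", map key "home"
name = b"\x00\x06emails\x00" + b"\x00\x04home\x00"
print([p.decode() for p in decode_composite(name)])  # → ['emails', 'home']
```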

On Wednesday, July 17, 2013, Jonathan Ellis  wrote:
> On Wed, Jul 17, 2013 at 4:03 PM, Tyler Hobbs  wrote:
>> I'll leave it to somebody else to comment on adding collections, etc to
>> Thrift.
>
> Doesn't make sense, since Thrift is all about the raw data cells, and
> collections are an abstraction layer on top of that.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>


Re: Fw: Fwd: CQL & Thrift

2013-08-30 Thread Edward Capriolo
This is always so hard to explain, but:

http://www.datastax.com/dev/blog/thrift-to-cql3

Get to the part that looks like this:

update column family user_profiles
with key_validation_class = UTF8Type
and comparator = UTF8Type
and column_metadata=[]

"Since the static column values validation types have been dropped, they
are not available to your client library anymore. In particular, as can be
seen in the output above, cqlsh display some value in a non human-readable
format. And unless the client library exposes an easy way to force the
deserialization format for a value, such deserialization will have to be
done manually in client code."

I think the above is the largest reason. Due to the way 'CQL' wants to
present 'thrift' column families, you have to lose your 'thrift' notion of
schema, because it is not compatible with the 'cql' notion of schema. I am
wrapping 'thrift' and 'cql' in quotes because 'CQL' is an access language,
but when you define tables as non-compact storage they gain 'features' that
make them not understandable by non-CQL clients.

They have two different schema systems and two different access languages;
there is some compatibility between the two, but working out which feature
sets mix and match is more effort than just picking one.
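A small illustration of the "deserialization will have to be done manually" point from the blog quote above: once the validators are dropped, every value comes back as opaque bytes and the client has to know what each column held. The raw values below are hypothetical:

```python
import struct

# After `column_metadata=[]`, the driver no longer knows the value types.
# The application must remember them and decode by hand.
raw_age  = b"\x00\x00\x00\x00\x00\x00\x00\x2a"  # was a bigint
raw_name = b"caprioloed"                         # was UTF8Type

age  = struct.unpack(">q", raw_age)[0]  # big-endian signed 64-bit integer
name = raw_name.decode("utf-8")
print(age, name)  # → 42 caprioloed
```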


On Fri, Aug 30, 2013 at 2:05 PM, Vivek Mishra wrote:

> fyi. Just curious to know the real reason behind "not to mix thrift and
> CQL3".
>
> Any pointers?
>
> -Vivek
>
>
>
> -- Forwarded message --
> From: Vivek Mishra 
> Date: Fri, Aug 30, 2013 at 11:21 PM
> Subject: Re: CQL & Thrift
> To: u...@cassandra.apache.org
>
>
>
> Hi,
> I understand that, but i want to understand the reason behind
> such behavior?  Is it because of maintaining different metadata objects for
> CQL3 and thrift?
>
> Any suggestion?
>
> -Vivek
>
>
>
> On Fri, Aug 30, 2013 at 11:15 PM, Jon Haddad  wrote:
>
> If you're going to work with CQL, work with CQL.  If you're going to work
> with Thrift, work with Thrift.  Don't mix.
> >
> >
> >On Aug 30, 2013, at 10:38 AM, Vivek Mishra  wrote:
> >
> >Hi,
> >>If i a create a table with CQL3 as
> >>
> >>
> >>create table user(user_id text PRIMARY KEY, first_name text, last_name
> text, emailid text);
> >>
> >>
> >>and create index as:
> >>create index on user(first_name);
> >>
> >>
> >>then inserted some data as:
> >>insert into user(user_id,first_name,last_name,"emailId")
> values('@mevivs','vivek','mishra','vivek.mis...@impetus.co.in');
> >>
> >>
> >>
> >>
> >>
> >>Then if update same column family using Cassandra-cli as:
> >>
> >>
> >>update column family user with key_validation_class='UTF8Type' and
> column_metadata=[{column_name:last_name, validation_class:'UTF8Type',
> index_type:KEYS},{column_name:first_name, validation_class:'UTF8Type',
> index_type:KEYS}];
> >>
> >>
> >>
> >>
> >>
> >>Now if i connect via cqlsh and explore user table, i can see column
> first_name,last_name are not part of table structure anymore. Here is the
> output:
> >>
> >>
> >>CREATE TABLE user (
> >>  key text PRIMARY KEY
> >>) WITH
> >>  bloom_filter_fp_chance=0.01 AND
> >>  caching='KEYS_ONLY' AND
> >>  comment='' AND
> >>  dclocal_read_repair_chance=0.00 AND
> >>  gc_grace_seconds=864000 AND
> >>  read_repair_chance=0.10 AND
> >>  replicate_on_write='true' AND
> >>  populate_io_cache_on_flush='false' AND
> >>  compaction={'class': 'SizeTieredCompactionStrategy'} AND
> >>  compression={'sstable_compression': 'SnappyCompressor'};
> >>
> >>
> >>cqlsh:cql3usage> select * from user;
> >>
> >>
> >> user_id
> >>-
> >> @mevivs
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>I understand that, CQL3 and thrift interoperability is an issue. But
> this looks to me a very basic scenario.
> >>
> >>
> >>
> >>
> >>
> >>
> >>Any suggestions? Or If anybody can explain a reason behind this?
> >>
> >>
> >>-Vivek
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >


Re: Node side processing

2014-02-27 Thread Edward Capriolo
Check out intravert on GitHub. I am working to get many of those features into
Cassandra.

On Thursday, February 27, 2014, Brandon Williams  wrote:
> A few:
>
> https://issues.apache.org/jira/browse/CASSANDRA-4914
>
> https://issues.apache.org/jira/browse/CASSANDRA-5184
>
> https://issues.apache.org/jira/browse/CASSANDRA-6704
>
> https://issues.apache.org/jira/browse/CASSANDRA-6167
>
>
>
> On Thu, Feb 27, 2014 at 7:50 AM, David Semeria wrote:
>
>> Hi List,
>>
>> I was wondering whether there have been any past proposals for
>> implementing node side processing (NSP) in C*. By NSP, I mean the
passing a
>> reference to a Java class which would then process the result set before
it
>> being returned to the client.
>>
>> In our particular use case our clients typically loop through result sets
>> of a million or more rows to produce a tiny amount of output (sums,
means,
>> variance, etc). The bottleneck -- quite obviously -- is the need to
>> transfer a million rows to the client before processing can take place.
It
>> would be extremely useful to execute this processing on the coordinator
>> node and only transfer the results to the client.
>>
>> I mention this here because I can imagine other C* users having similar
>> requirements.
>>
>> Thanks
>>
>> D.
>>
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
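The aggregates David mentions (sums, means, variance) can all be computed in one streaming pass, which is what makes the coordinator-side idea attractive: ship three numbers to the client instead of a million rows. A sketch of such a fold using Welford's algorithm (illustrative only; no such server-side hook exists in Cassandra):

```python
def streaming_stats(rows):
    """One-pass count/mean/variance via Welford's algorithm -- the kind of
    fold that, run on the coordinator, would return a tiny result set."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in rows:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / n if n else float("nan")
    return n, mean, variance

result = streaming_stats(range(1, 1_000_001))
print(result[0], round(result[1], 1))  # → 1000000 500000.5
```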


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
I am -1, for a few reasons:

Cassandra will be the only database (that I know of) where the only
official client to the database lives in source control outside of the
project. I would like some clarity on how this development will go on in an
open-source fashion. Namely:

1) Who does regression testing between the database server and the client,
and how? I.e., are the bugs "on the client" or "in the server"? Hard to say
when there is no official client.
2) How can an open source Apache project depend on a non-Apache-managed
resource to accomplish basic development? I.e., how does a Cassandra
committer who does not have commit access on the driver source code get work
done?
3) Who has the "final word" on how a feature is implemented in the native
protocol? Imagine there are two implementations of CQL, native-cql-ruby and
native-cql-java, and these libraries have interpreted the transport spec
differently. One of them has to be broken to fix the problem. Who resolves
this issue, and how?

"With static columns and LWT batch support [1] landing in 2.0.6, and
 UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
 done in CQL."

Do we mean CQL the transport, CQL the storage engine, CQL the procedure
engine (auto timestamps), or CQL the language? :) It's hard for Thrift to
"do things" when specific "read before write list collection operations"
are impossible to do from a "transport".

"To a large degree, this merely formalizes what is already de facto
reality.  Most thrift clients have not even added support for
atomic_batch_mutate and cas from 2.0, and popular clients like Astyanax are
migrating to the native protocol."

This is such a loaded statement; most committers have not even "committed"
to adding features to Thrift. Take for example
https://issues.apache.org/jira/browse/CASSANDRA-5435: adding range
tombstones to Thrift was actually a very simple effort. One day I just got
off my couch and went through the simple effort of pushing this along. What
is happening is a self-fulfilling prophecy: if everyone throws tons of
development effort in one direction, unsurprisingly the other direction lags
behind.



On Tue, Mar 11, 2014 at 1:43 PM, Gary Dusbabek  wrote:

> +1
>
>
>
> On Tue, Mar 11, 2014 at 12:00 PM, Jonathan Ellis 
> wrote:
>
> > CQL3 is almost two years old now and has proved to be the better API
> > that Cassandra needed.  CQL drivers have caught up with and passed the
> > Thrift ones in terms of features, performance, and usability.  CQL is
> > easier to learn and more productive than Thrift.
> >
> > With static columns and LWT batch support [1] landing in 2.0.6, and
> > UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
> > done in CQL.  Contrawise, CQL makes many things easy that are
> > difficult to impossible in Thrift.  New development is overwhelmingly
> > done using CQL.
> >
> > To date we have had an unofficial and poorly defined policy of "add
> > support for new features to Thrift when that is 'easy.'"  However,
> > even relatively simple Thrift changes can create subtle complications
> > for the rest of the server; for instance, allowing Thrift range
> > tombtones would make filter conversion for CASSANDRA-6506 more
> > difficult.
> >
> > Thus, I think it's time to officially close the book on Thrift.  We
> > will retain it for backwards compatibility, but we will commit to
> > adding no new features or changes to the Thrift API after 2.1.0.  This
> > will help send an unambiguous message to users and eliminate any
> > remaining confusion from supporting two APIs.  If any new use cases
> > come to light that can be done with Thrift but not CQL, we will commit
> > to supporting those in CQL.
> >
> > (To a large degree, this merely formalizes what is already de facto
> > reality.  Most thrift clients have not even added support for
> > atomic_batch_mutate and cas from 2.0, and popular clients like
> > Astyanax are migrating to the native protocol.)
> >
> > Reasonable?
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
> > [2] https://issues.apache.org/jira/browse/CASSANDRA-5590
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
> >
>


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
"The native protocol spec is the source of truth.  If Cassandra's behavior
doesn't match the spec, it's a bug.  Likewise for any drivers.  I'm not
sure how this makes it unclear whether a bug is server-side or
client-side.  Maybe an example scenario would be useful?"

In the near future: I am a Cassandra committer. I find a bug between the
Cassandra server and the Java client driver. For example, the server is
sending an unsigned byte while the other side is expecting a signed byte.

As a Cassandra committer I can only change half of the equation. If I change
the Cassandra server, that would break the Ruby client. That won't work,
will it?
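The hypothetical signedness bug is easy to demonstrate: the same wire byte decodes to two different values depending on which reading of the spec a driver picked.

```python
import struct

wire = b"\xff"  # one byte off the wire

(as_unsigned,) = struct.unpack(">B", wire)  # spec read as "unsigned byte"
(as_signed,)   = struct.unpack(">b", wire)  # spec read as "signed byte"
print(as_unsigned, as_signed)  # → 255 -1
```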

My only recourse as a Cassandra committer is to go ask some other entity to
change their driver.

"This means the spec is ambiguous.  In that case, I imagine the proper
solution would be to create a jira ticket and decide how to resolve the
ambiguity in the spec."

Yes, but then after you change the spec, one client is broken and one is
not. Is one client more "official" than another? Do you change the spec to
match the client with "more users"?

Think about MySQL. Does it ship with a driver? Yes. Who writes the driver?
MySQL. Where is the source code for this driver? Inside the same repository
as the server. Cassandra should be the same way.






On Tue, Mar 11, 2014 at 2:58 PM, Tyler Hobbs  wrote:

> On Tue, Mar 11, 2014 at 1:37 PM, Edward Capriolo  >wrote:
>
> >
> > 1) Who does and how do they do regression testing between the database
> > server and the client? I.E. are the bugs "on the client" or "in the
> server"
> > hard to say when there is no official client.
> >
>
> The native protocol spec is the source of truth.  If Cassandra's behavior
> doesn't match the spec, it's a bug.  Likewise for any drivers.  I'm not
> sure how this makes it unclear whether a bug is server-side or
> client-side.  Maybe an example scenario would be useful?
>
>
> > 2) How can an open source apache project depend on a non apache managed
> > resource to accomplish basic development? IE if there is a cassandra
> > committer that does not have commit on the driver source code get work
> > done?
> >
>
> Cassandra itself already depends on external projects for basic development
> (ant, libraries, etc).  The drivers are no different (and most are Apache
> licensed themselves).
>
>
> > 3) Who has the "final word" on how a feature is implemented in the native
> > protocol? Imagine there are two implementations of CQL native-cql-ruby
> and
> > native-cql-java. Let's say these libraries have both interpreted the
> > transport spec differently. One of them has to be broken to fix the
> > problem. Who resolves this issue and how?
>
>
> This means the spec is ambiguous.  In that case, I imagine the proper
> solution would be to create a jira ticket and decide how to resolve the
> ambiguity in the spec.
>
> Basically, I think you're looking for a reference implementation instead of
> a spec.  Perhaps a reference implementation would be useful, but that's a
> separate debate.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
"Other databases treat this issue differently, and there are a set of
tradeoffs.  Mysql's decision may not be the best for Cassandra."

Do you know of any other database that does not provide its own driver?


On Tue, Mar 11, 2014 at 3:55 PM, Tyler Hobbs  wrote:

> On Tue, Mar 11, 2014 at 2:24 PM, Edward Capriolo  >wrote:
>
> > "The native protocol spec is the source of truth.  If Cassandra's
> behavior
> > doesn't match the spec, it's a bug.  Likewise for any drivers.  I'm not
> > sure how this makes it unclear whether a bug is server-side or
> > client-side.  Maybe an example scenario would be useful?"
> >
> > In the near future: I am a Cassandra committer. I find a bug between
> > the Cassandra server and the Java client driver. For example, the server
> > is sending an unsigned byte but the other side is expecting a signed byte.
> >
> > As a Cassandra committer I can only change half of the equation. If I
> > change the Cassandra server, that would break the ruby-client. That won't
> > work, will it?
> >
> > My only recourse as a Cassandra committer is to go ask some other entity
> > to change their driver.
> >
>
> The solution would be:
> 1. Update the spec (for the current protocol version) to specify that it's
> an unsigned byte.  (Perhaps add a note that this will change in the next
> protocol version.)
> 2. In the next version of the protocol, specify that the byte is signed and
> change Cassandra's behavior to match this.   Note this change in the
> "changes" section of the spec.
>
> This doesn't break existing clients and it allows the behavior to be fixed
> with the next protocol version.  (Cassandra also supports multiple versions
> of the native protocol, fwiw.)
>
>
> >
> > "This means the spec is ambiguous.  In that case, I imagine the proper
> > solution would be to create a jira ticket and decide how to resolve the
> > ambiguity in the spec."
> >
> > Yes but then after you change the spec, one client is broken and one is
> > not. Is one client more "official" then another? Do you change the spec
> to
> > match the client with "more users".
> >
>
> You change the spec to match whatever Cassandra is doing.  It's not a
> matter of what driver is more popular.
>
>
> >
> > Think about mysql. Does it ship with a driver? Yes. Who writes the
> driver?
> > mysql. Where is the source code for this driver? Inside the same
> repository
> > as the server. Cassandra should be the same way.
>
>
> Other databases treat this issue differently, and there are a set of
> tradeoffs.  Mysql's decision may not be the best for Cassandra.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
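The hypothetical signed/unsigned bug maps directly onto Java's type system, since Java's `byte` is signed. A minimal sketch, not taken from any actual driver or protocol code, of why the two sides disagree:

```java
// Minimal illustration of the signed/unsigned byte mismatch discussed above.
// Java bytes are signed, so a value a server intends as unsigned 255 reads
// back as -1 unless the client masks it.
public class ByteMismatch {
    public static void main(String[] args) {
        byte wire = (byte) 0xFF;  // server writes 0xFF, intending unsigned 255
        int naive = wire;         // signed interpretation via sign extension: -1
        int masked = wire & 0xFF; // unsigned interpretation: 255
        System.out.println(naive);  // prints -1
        System.out.println(masked); // prints 255
    }
}
```

A spec that merely says "[byte]" without stating signedness leaves exactly this ambiguity, which is why the proposed fix is to pin it down in the spec for the current protocol version and change behavior only in the next one.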


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
"How about the myriad of thrift wrappers that aren't in-tree either?"

How about all the times we trashed HBase, saying "HBase treats non-Java
people like second-class citizens"?

http://mail-archives.apache.org/mod_mbox/hbase-user/201108.mbox/%3ccafk14gsrnysj_oev2_utwc-+u4ssdmdsmp2dgrst90hoypw...@mail.gmail.com%3E

Nice to see us pulling a total 180.


On Tue, Mar 11, 2014 at 4:09 PM, Brandon Williams  wrote:

> How about the myriad of thrift wrappers that aren't in-tree either?
>
>
> On Tue, Mar 11, 2014 at 3:03 PM, Edward Capriolo  >wrote:
>
> > "Other databases treat this issue differently, and there are a set of
> > tradeoffs.  Mysql's decision may not be the best for Cassandra."
> >
> > Do you know of any other database that does not provide it's own driver?
> >
> >
> > On Tue, Mar 11, 2014 at 3:55 PM, Tyler Hobbs  wrote:
> >
> > > On Tue, Mar 11, 2014 at 2:24 PM, Edward Capriolo <
> edlinuxg...@gmail.com
> > > >wrote:
> > >
> > > > "The native protocol spec is the source of truth.  If Cassandra's
> > > behavior
> > > > doesn't match the spec, it's a bug.  Likewise for any drivers.  I'm
> not
> > > > sure how this makes it unclear whether a bug is server-side or
> > > > client-side.  Maybe an example scenario would be useful?"
> > > >
> > > > In the near future. I am a cassadra committer. I find a bug between
> > > > cassanda server and java client driver. For example, the server is
> > > sending
> > > > an unsigned by the other is expecting a signed byte.
> > > >
> > > > As a cassandra committer I can only change half of the equation. I
> > change
> > > > the cassandra server, that would break the ruby-client. That won't
> work
> > > > will it?
> > > >
> > > > My only recourse as a cassandra committer is to go ask some other
> > entity
> > > to
> > > > change their driver.
> > > >
> > >
> > > The solution would be:
> > > 1. Update the spec (for the current protocol version) to specify that
> > it's
> > > an unsigned byte.  (Perhaps add a note that this will change in the
> next
> > > protocol version.)
> > > 2. In the next version of the protocol, specify that the byte is signed
> > and
> > > change Cassandra's behavior to match this.   Note this change in the
> > > "changes" section of the spec.
> > >
> > > This doesn't break existing clients and it allows the behavior to be
> > fixed
> > > with the next protocol version.  (Cassandra also supports multiple
> > versions
> > > of the native protocol, fwiw.)
> > >
> > >
> > > >
> > > > "This means the spec is ambiguous.  In that case, I imagine the
> proper
> > > > solution would be to create a jira ticket and decide how to resolve
> the
> > > > ambiguity in the spec."
> > > >
> > > > Yes but then after you change the spec, one client is broken and one
> is
> > > > not. Is one client more "official" then another? Do you change the
> spec
> > > to
> > > > match the client with "more users".
> > > >
> > >
> > > You change the spec to match whatever Cassandra is doing.  It's not a
> > > matter of what driver is more popular.
> > >
> > >
> > > >
> > > > Think about mysql. Does it ship with a driver? Yes. Who writes the
> > > driver?
> > > > mysql. Where is the source code for this driver? Inside the same
> > > repository
> > > > as the server. Cassandra should be the same way.
> > >
> > >
> > > Other databases treat this issue differently, and there are a set of
> > > tradeoffs.  Mysql's decision may not be the best for Cassandra.
> > >
> > >
> > > --
> > > Tyler Hobbs
> > > DataStax <http://datastax.com/>
> > >
> >
>


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
If only some languages have support via a third-party entity, then every
language that does not have support is a second-class citizen.

On Tue, Mar 11, 2014 at 4:16 PM, Jonathan Ellis  wrote:

> What part of the native protocol makes any language a second class citizen?
>
> On Tue, Mar 11, 2014 at 3:13 PM, Edward Capriolo 
> wrote:
> > "How about the myriad of thrift wrappers that aren't in-tree either?"
> >
> > How about all the times we trashed hbase saying "hbase treats non java
> > people like second class citizens"
> >
> >
> http://mail-archives.apache.org/mod_mbox/hbase-user/201108.mbox/%3ccafk14gsrnysj_oev2_utwc-+u4ssdmdsmp2dgrst90hoypw...@mail.gmail.com%3E
> >
> > Nice to see us pulling a total 180.
> >
> >
> > On Tue, Mar 11, 2014 at 4:09 PM, Brandon Williams 
> wrote:
> >
> >> How about the myriad of thrift wrappers that aren't in-tree either?
> >>
> >>
> >> On Tue, Mar 11, 2014 at 3:03 PM, Edward Capriolo  >> >wrote:
> >>
> >> > "Other databases treat this issue differently, and there are a set of
> >> > tradeoffs.  Mysql's decision may not be the best for Cassandra."
> >> >
> >> > Do you know of any other database that does not provide it's own
> driver?
> >> >
> >> >
> >> > On Tue, Mar 11, 2014 at 3:55 PM, Tyler Hobbs 
> wrote:
> >> >
> >> > > On Tue, Mar 11, 2014 at 2:24 PM, Edward Capriolo <
> >> edlinuxg...@gmail.com
> >> > > >wrote:
> >> > >
> >> > > > "The native protocol spec is the source of truth.  If Cassandra's
> >> > > behavior
> >> > > > doesn't match the spec, it's a bug.  Likewise for any drivers.
>  I'm
> >> not
> >> > > > sure how this makes it unclear whether a bug is server-side or
> >> > > > client-side.  Maybe an example scenario would be useful?"
> >> > > >
> >> > > > In the near future. I am a cassadra committer. I find a bug
> between
> >> > > > cassanda server and java client driver. For example, the server is
> >> > > sending
> >> > > > an unsigned by the other is expecting a signed byte.
> >> > > >
> >> > > > As a cassandra committer I can only change half of the equation. I
> >> > change
> >> > > > the cassandra server, that would break the ruby-client. That won't
> >> work
> >> > > > will it?
> >> > > >
> >> > > > My only recourse as a cassandra committer is to go ask some other
> >> > entity
> >> > > to
> >> > > > change their driver.
> >> > > >
> >> > >
> >> > > The solution would be:
> >> > > 1. Update the spec (for the current protocol version) to specify
> that
> >> > it's
> >> > > an unsigned byte.  (Perhaps add a note that this will change in the
> >> next
> >> > > protocol version.)
> >> > > 2. In the next version of the protocol, specify that the byte is
> signed
> >> > and
> >> > > change Cassandra's behavior to match this.   Note this change in the
> >> > > "changes" section of the spec.
> >> > >
> >> > > This doesn't break existing clients and it allows the behavior to be
> >> > fixed
> >> > > with the next protocol version.  (Cassandra also supports multiple
> >> > versions
> >> > > of the native protocol, fwiw.)
> >> > >
> >> > >
> >> > > >
> >> > > > "This means the spec is ambiguous.  In that case, I imagine the
> >> proper
> >> > > > solution would be to create a jira ticket and decide how to
> resolve
> >> the
> >> > > > ambiguity in the spec."
> >> > > >
> >> > > > Yes but then after you change the spec, one client is broken and
> one
> >> is
> >> > > > not. Is one client more "official" then another? Do you change the
> >> spec
> >> > > to
> >> > > > match the client with "more users".
> >> > > >
> >> > >
> >> > > You change the spec to match whatever Cassandra is doing.  It's not
> a
> >> > > matter of what driver is more popular.
> >> > >
> >> > >
> >> > > >
> >> > > > Think about mysql. Does it ship with a driver? Yes. Who writes the
> >> > > driver?
> >> > > > mysql. Where is the source code for this driver? Inside the same
> >> > > repository
> >> > > > as the server. Cassandra should be the same way.
> >> > >
> >> > >
> >> > > Other databases treat this issue differently, and there are a set of
> >> > > tradeoffs.  Mysql's decision may not be the best for Cassandra.
> >> > >
> >> > >
> >> > > --
> >> > > Tyler Hobbs
> >> > > DataStax <http://datastax.com/>
> >> > >
> >> >
> >>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
"I am confused how any of this is relevant to Jonathan's original email."

Here is how:

I believe that if native is the new official transport, Cassandra should
include the Java driver source code with the project.

Without the driver code inside the project, how can someone use or develop
the software?


On Tue, Mar 11, 2014 at 4:24 PM, Brandon Williams  wrote:

> I am confused how any of this is relevant to Jonathan's original email.
>
>
> On Tue, Mar 11, 2014 at 3:13 PM, Edward Capriolo  >wrote:
>
> > "How about the myriad of thrift wrappers that aren't in-tree either?"
> >
> > How about all the times we trashed hbase saying "hbase treats non java
> > people like second class citizens"
> >
> >
> >
> http://mail-archives.apache.org/mod_mbox/hbase-user/201108.mbox/%3ccafk14gsrnysj_oev2_utwc-+u4ssdmdsmp2dgrst90hoypw...@mail.gmail.com%3E
> >
> > Nice to see us pulling a total 180.
> >
> >
> > On Tue, Mar 11, 2014 at 4:09 PM, Brandon Williams 
> > wrote:
> >
> > > How about the myriad of thrift wrappers that aren't in-tree either?
> > >
> > >
> > > On Tue, Mar 11, 2014 at 3:03 PM, Edward Capriolo <
> edlinuxg...@gmail.com
> > > >wrote:
> > >
> > > > "Other databases treat this issue differently, and there are a set of
> > > > tradeoffs.  Mysql's decision may not be the best for Cassandra."
> > > >
> > > > Do you know of any other database that does not provide it's own
> > driver?
> > > >
> > > >
> > > > On Tue, Mar 11, 2014 at 3:55 PM, Tyler Hobbs 
> > wrote:
> > > >
> > > > > On Tue, Mar 11, 2014 at 2:24 PM, Edward Capriolo <
> > > edlinuxg...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > "The native protocol spec is the source of truth.  If Cassandra's
> > > > > behavior
> > > > > > doesn't match the spec, it's a bug.  Likewise for any drivers.
>  I'm
> > > not
> > > > > > sure how this makes it unclear whether a bug is server-side or
> > > > > > client-side.  Maybe an example scenario would be useful?"
> > > > > >
> > > > > > In the near future. I am a cassadra committer. I find a bug
> between
> > > > > > cassanda server and java client driver. For example, the server
> is
> > > > > sending
> > > > > > an unsigned by the other is expecting a signed byte.
> > > > > >
> > > > > > As a cassandra committer I can only change half of the equation.
> I
> > > > change
> > > > > > the cassandra server, that would break the ruby-client. That
> won't
> > > work
> > > > > > will it?
> > > > > >
> > > > > > My only recourse as a cassandra committer is to go ask some other
> > > > entity
> > > > > to
> > > > > > change their driver.
> > > > > >
> > > > >
> > > > > The solution would be:
> > > > > 1. Update the spec (for the current protocol version) to specify
> that
> > > > it's
> > > > > an unsigned byte.  (Perhaps add a note that this will change in the
> > > next
> > > > > protocol version.)
> > > > > 2. In the next version of the protocol, specify that the byte is
> > signed
> > > > and
> > > > > change Cassandra's behavior to match this.   Note this change in
> the
> > > > > "changes" section of the spec.
> > > > >
> > > > > This doesn't break existing clients and it allows the behavior to
> be
> > > > fixed
> > > > > with the next protocol version.  (Cassandra also supports multiple
> > > > versions
> > > > > of the native protocol, fwiw.)
> > > > >
> > > > >
> > > > > >
> > > > > > "This means the spec is ambiguous.  In that case, I imagine the
> > > proper
> > > > > > solution would be to create a jira ticket and decide how to
> resolve
> > > the
> > > > > > ambiguity in the spec."
> > > > > >
> > > > > > Yes but then after you change the spec, one client is broken and
> > one
> > > is
> > > > > > not. Is one client more "official" then another? Do you change
> the
> > > spec
> > > > > to
> > > > > > match the client with "more users".
> > > > > >
> > > > >
> > > > > You change the spec to match whatever Cassandra is doing.  It's
> not a
> > > > > matter of what driver is more popular.
> > > > >
> > > > >
> > > > > >
> > > > > > Think about mysql. Does it ship with a driver? Yes. Who writes
> the
> > > > > driver?
> > > > > > mysql. Where is the source code for this driver? Inside the same
> > > > > repository
> > > > > > as the server. Cassandra should be the same way.
> > > > >
> > > > >
> > > > > Other databases treat this issue differently, and there are a set
> of
> > > > > tradeoffs.  Mysql's decision may not be the best for Cassandra.
> > > > >
> > > > >
> > > > > --
> > > > > Tyler Hobbs
> > > > > DataStax <http://datastax.com/>
> > > > >
> > > >
> > >
> >
>


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
I will move on. I coincidentally happen to have just added a Thrift feature:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/thrift_isn_t_going_anywhere.
I also have 2-3 JIRA tickets open to add Thrift features.

It seems like an interesting time to call a vote that effectively adds
language stopping me from doing what I want to do.


On Tue, Mar 11, 2014 at 4:43 PM, Jonathan Ellis  wrote:

> Nobody can seriously use or develop against Cassandra with only the
> raw Thrift generated code either, so I agree that this is really a
> different discussion.
>
> On Tue, Mar 11, 2014 at 3:38 PM, Edward Capriolo 
> wrote:
> > "I am confused how any of this is relevant to Jonathan's original email."
> >
> > Here is how:
> >
> > I believe if native is the new official transport, Cassandra should
> include
> > the Java driver source code with the project.
> >
> > Without the driver code inside the project how can someone use/develop
> the
> > software.
> >
> >
> > On Tue, Mar 11, 2014 at 4:24 PM, Brandon Williams 
> wrote:
> >
> >> I am confused how any of this is relevant to Jonathan's original email.
> >>
> >>
> >> On Tue, Mar 11, 2014 at 3:13 PM, Edward Capriolo  >> >wrote:
> >>
> >> > "How about the myriad of thrift wrappers that aren't in-tree either?"
> >> >
> >> > How about all the times we trashed hbase saying "hbase treats non java
> >> > people like second class citizens"
> >> >
> >> >
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/hbase-user/201108.mbox/%3ccafk14gsrnysj_oev2_utwc-+u4ssdmdsmp2dgrst90hoypw...@mail.gmail.com%3E
> >> >
> >> > Nice to see us pulling a total 180.
> >> >
> >> >
> >> > On Tue, Mar 11, 2014 at 4:09 PM, Brandon Williams 
> >> > wrote:
> >> >
> >> > > How about the myriad of thrift wrappers that aren't in-tree either?
> >> > >
> >> > >
> >> > > On Tue, Mar 11, 2014 at 3:03 PM, Edward Capriolo <
> >> edlinuxg...@gmail.com
> >> > > >wrote:
> >> > >
> >> > > > "Other databases treat this issue differently, and there are a
> set of
> >> > > > tradeoffs.  Mysql's decision may not be the best for Cassandra."
> >> > > >
> >> > > > Do you know of any other database that does not provide it's own
> >> > driver?
> >> > > >
> >> > > >
> >> > > > On Tue, Mar 11, 2014 at 3:55 PM, Tyler Hobbs 
> >> > wrote:
> >> > > >
> >> > > > > On Tue, Mar 11, 2014 at 2:24 PM, Edward Capriolo <
> >> > > edlinuxg...@gmail.com
> >> > > > > >wrote:
> >> > > > >
> >> > > > > > "The native protocol spec is the source of truth.  If
> Cassandra's
> >> > > > > behavior
> >> > > > > > doesn't match the spec, it's a bug.  Likewise for any drivers.
> >>  I'm
> >> > > not
> >> > > > > > sure how this makes it unclear whether a bug is server-side or
> >> > > > > > client-side.  Maybe an example scenario would be useful?"
> >> > > > > >
> >> > > > > > In the near future. I am a cassadra committer. I find a bug
> >> between
> >> > > > > > cassanda server and java client driver. For example, the
> server
> >> is
> >> > > > > sending
> >> > > > > > an unsigned by the other is expecting a signed byte.
> >> > > > > >
> >> > > > > > As a cassandra committer I can only change half of the
> equation.
> >> I
> >> > > > change
> >> > > > > > the cassandra server, that would break the ruby-client. That
> >> won't
> >> > > work
> >> > > > > > will it?
> >> > > > > >
> >> > > > > > My only recourse as a cassandra committer is to go ask some
> other
> >> > > > entity
> >> > > > > to
> >> > > > > > change their driver.
> >> > > > > >
> >> > > > >
> >> > > > > The solution would be:
> >> > > > > 1. Update the

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
With support officially deprecated, that will be the only way to go. If a
user wants to add a function to Thrift, they will have to fork Cassandra,
code the function themselves, and write and manage the internals. I see
this as a very hard task because the server could change rapidly with no
regard for them. It could also cause a proliferation of functions. Could
you imagine a Thrift server with 300 methods :). This is why I think
keeping the support in trunk and carefully adding things would be sane, but
seemingly no one wants to support it at all, so a fork is probably in order.


On Tue, Mar 11, 2014 at 7:46 PM, Russ Bradberry wrote:

> I would like to suggest the possibility of having the interface somewhat
> pluggable so another project can provide the Thrift interface as a drop in
> JAR. Thoughts?
>
> Sent from my iPhone
>
> > On Mar 11, 2014, at 7:26 PM, Edward Capriolo 
> wrote:
> >
> > If you are using thrift there probably isn't a reason to upgrade to 2.1
> >
> > What? Upgrading gets you performance regardless of your api.
> >
> > We have already gone from "no new feature" talk to "less enphisis on
> > testing".
> >
> > How comforting.
> >> On Tuesday, March 11, 2014, Dave Brosius 
> wrote:
> >>
> >> +1,
> >>
> >> altho supporting thrift in 2.1 seems overly conservative.
> >>
> >> If you are using thrift there probably isn't a reason to upgrade to 2.1,
> > in fact doing so will become an increasingly dumb idea as lesser and
> lesser
> > emphasis will be placed on testing with 2.1+. This would allow us to
> > greatly simplify the code footprint in 2.1
> >>
> >>
> >>
> >>
> >>> On 03/11/2014 01:00 PM, Jonathan Ellis wrote:
> >>>
> >>> CQL3 is almost two years old now and has proved to be the better API
> >>> that Cassandra needed.  CQL drivers have caught up with and passed the
> >>> Thrift ones in terms of features, performance, and usability.  CQL is
> >>> easier to learn and more productive than Thrift.
> >>>
> >>> With static columns and LWT batch support [1] landing in 2.0.6, and
> >>> UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
> >>> done in CQL.  Contrawise, CQL makes many things easy that are
> >>> difficult to impossible in Thrift.  New development is overwhelmingly
> >>> done using CQL.
> >>>
> >>> To date we have had an unofficial and poorly defined policy of "add
> >>> support for new features to Thrift when that is 'easy.'"  However,
> >>> even relatively simple Thrift changes can create subtle complications
> >>> for the rest of the server; for instance, allowing Thrift range
> >>> tombtones would make filter conversion for CASSANDRA-6506 more
> >>> difficult.
> >>>
> >>> Thus, I think it's time to officially close the book on Thrift.  We
> >>> will retain it for backwards compatibility, but we will commit to
> >>> adding no new features or changes to the Thrift API after 2.1.0.  This
> >>> will help send an unambiguous message to users and eliminate any
> >>> remaining confusion from supporting two APIs.  If any new use cases
> >>> come to light that can be done with Thrift but not CQL, we will commit
> >>> to supporting those in CQL.
> >>>
> >>> (To a large degree, this merely formalizes what is already de facto
> >>> reality.  Most thrift clients have not even added support for
> >>> atomic_batch_mutate and cas from 2.0, and popular clients like
> >>> Astyanax are migrating to the native protocol.)
> >>>
> >>> Reasonable?
> >>>
> >>> [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
> >>> [2] https://issues.apache.org/jira/browse/CASSANDRA-5590
> >
> > --
> > Sorry this was sent from mobile. Will do less grammar and spell check
> than
> > usual.
>
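The drop-in-JAR idea floated above could be sketched as a tiny service-provider interface. Everything below is invented for illustration; none of these types exist in Cassandra:

```java
// Hypothetical transport SPI: a Thrift (or REST) listener shipped as a
// separate JAR would implement TransportPlugin and be started by the server.
import java.util.List;

interface TransportPlugin {
    String name();
    void start(int port);
    void stop();
}

// Stand-in plugin; a real one would bind a Thrift or HTTP listener.
class EchoTransport implements TransportPlugin {
    private boolean running;
    public String name() { return "echo"; }
    public void start(int port) { running = true; }
    public void stop() { running = false; }
    public boolean isRunning() { return running; }
}

public class PluggableDemo {
    public static void main(String[] args) {
        // In a real server this list would come from ServiceLoader discovery
        // of JARs on the classpath rather than being hard-coded.
        List<TransportPlugin> plugins = List.of(new EchoTransport());
        for (TransportPlugin p : plugins) {
            p.start(9160);
            System.out.println(p.name() + " started");
        }
    }
}
```

The version-skew problem does not go away with this design: a plugin compiled against one release can still break against the next unless the SPI itself is kept stable.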


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
I meant to say that Thrift provides a facade over the StorageProxy. Without
Thrift, the only user of the Cassandra engine would be CQL. At that point
the storage engine would likely evolve to be less usable and less
pluggable. Thrift "has it easy" because it has friendly methods like
StorageProxy.batch_mutate() to call. Without that project-level support,
many of the things that pluggable_application_x would want to call would be
buried inside a set of interfaces designed only with the CQL use case in
mind. In a simple case, imagine something you want inside
cool_new_interface_x is marked private in Cassandra. You then need to fork
the code or convince upstream to make it accessible.

BTW, I think you know, but I already took a stab at what you're describing:
pluggable, REST, and JVM languages (https://github.com/zznate/intravert-ug).


On Tue, Mar 11, 2014 at 8:16 PM, Russell Bradberry wrote:

> I didn't mean a someone should maintain a fork of Cassandra. More like
> something that could be dropped in. Just like clients have to keep up with
> the server, a project like this would also.  I think if the interface was
> pluggable it would also allow others to expand and come up with new
> interfaces that can possibly expand the user base.  One example would be a
> built in REST interface that doesn't rely on an external web server that
> translates requests to CQL, just drop in a JAR and the interface comes
> available.
>
> This would also lend itself to allow anyone to write an interface in any
> (JVM) language they want, if they want to add external stored procedures
> via this interface then they would be able to.   I'm for the removal of
> Thrift in the trunk, but I think there is a use-case for an extensible
> interface.
>
> I still seem to remember there was a few angry users when Avro was removed.
>
>
> On Tue, Mar 11, 2014 at 8:04 PM, Edward Capriolo  >wrote:
>
> > With support officially deprecated that will be the only way to go. If a
> > user wants to add a function to thrift they will have to fork off
> > cassandra, code the function themselves write the internals, manage the
> > internals. I see this as being a very hard task because the server could
> > change rapidly with no regards to them. Also this could cause a
> > proliferation of functions. Could you imagine a thrift server with 300
> > methods :). This is why I think keeping the support in trunk and
> carefully
> > adding things would be sane, but seemingly no one wants to support it at
> > all so a fork is probably in order.
> >
> >
> > On Tue, Mar 11, 2014 at 7:46 PM, Russ Bradberry  > >wrote:
> >
> > > I would like to suggest the possibility of having the interface
> somewhat
> > > pluggable so another project can provide the Thrift interface as a drop
> > in
> > > JAR. Thoughts?
> > >
> > > Sent from my iPhone
> > >
> > > > On Mar 11, 2014, at 7:26 PM, Edward Capriolo 
> > > wrote:
> > > >
> > > > If you are using thrift there probably isn't a reason to upgrade to
> 2.1
> > > >
> > > > What? Upgrading gets you performance regardless of your api.
> > > >
> > > > We have already gone from "no new feature" talk to "less enphisis on
> > > > testing".
> > > >
> > > > How comforting.
> > > >> On Tuesday, March 11, 2014, Dave Brosius 
> > > wrote:
> > > >>
> > > >> +1,
> > > >>
> > > >> altho supporting thrift in 2.1 seems overly conservative.
> > > >>
> > > >> If you are using thrift there probably isn't a reason to upgrade to
> > 2.1,
> > > > in fact doing so will become an increasingly dumb idea as lesser and
> > > lesser
> > > > emphasis will be placed on testing with 2.1+. This would allow us to
> > > > greatly simplify the code footprint in 2.1
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>> On 03/11/2014 01:00 PM, Jonathan Ellis wrote:
> > > >>>
> > > >>> CQL3 is almost two years old now and has proved to be the better
> API
> > > >>> that Cassandra needed.  CQL drivers have caught up with and passed
> > the
> > > >>> Thrift ones in terms of features, performance, and usability.  CQL
> is
> > > >>> easier to learn and more productive than Thrift.
> > > >>>
> > > >>> With static columns and LWT batch support [1] landing in 2.0.6, and
> > > >>> UDT in 2.1 [2], I don't
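The "facade over the StorageProxy" point can be pictured as a thin transport layer delegating to one stable internal entry point. This is a hypothetical sketch; `StorageEngine`, `Mutation`, and `ThriftFacade` are invented stand-ins, not Cassandra's real internals:

```java
// Hypothetical sketch of a transport facade over a stable internal API,
// analogous to the StorageProxy.batch_mutate() style described above.
import java.util.List;

interface StorageEngine {
    void apply(Mutation m); // the stable entry point any transport would call
}

record Mutation(String key, String column, String value) {}

class ThriftFacade {
    private final StorageEngine engine;
    ThriftFacade(StorageEngine engine) { this.engine = engine; }

    // Friendly transport-level batch method delegating to the engine.
    void batchMutate(List<Mutation> mutations) {
        mutations.forEach(engine::apply);
    }
}

public class FacadeDemo {
    public static void main(String[] args) {
        StorageEngine engine = m -> System.out.println("applied " + m.key());
        new ThriftFacade(engine).batchMutate(
                List.of(new Mutation("row1", "col", "val")));
    }
}
```

If the only internal consumer is CQL, nothing forces an entry point like `apply` to stay public and stable, which is exactly the worry about things being marked private.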

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
I can agree with not liking the "construction kit" approach.

Redis: 40-plus commands over a telnet-friendly protocol:
http://redis.io/commands

Elasticsearch: JSON over HTTP:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html

CouchDB: JSON over HTTP and JavaScript:
http://docs.couchdb.org/en/latest/intro/tour.html

MongoDB: JSON over a binary API, with JavaScript and in-database map-reduce.

At this point it is just different strokes for different folks: some people
want a query API because they don't get NoSQL, and some don't.


On Tue, Mar 11, 2014 at 8:35 PM, Jonathan Ellis  wrote:

> I don't think we're well-served by the "construction kit" approach.
> It's difficult enough to evaluate NoSQL without deciding if you should
> run CQLSandra or Hectorsandra or Intravertandra etc.
>
> On Tue, Mar 11, 2014 at 7:16 PM, Russell Bradberry 
> wrote:
> > I didn't mean a someone should maintain a fork of Cassandra. More like
> > something that could be dropped in. Just like clients have to keep up
> with
> > the server, a project like this would also.  I think if the interface was
> > pluggable it would also allow others to expand and come up with new
> > interfaces that can possibly expand the user base.  One example would be
> a
> > built in REST interface that doesn't rely on an external web server that
> > translates requests to CQL, just drop in a JAR and the interface comes
> > available.
> >
> > This would also lend itself to allow anyone to write an interface in any
> > (JVM) language they want, if they want to add external stored procedures
> > via this interface then they would be able to.   I'm for the removal of
> > Thrift in the trunk, but I think there is a use-case for an extensible
> > interface.
> >
> > I still seem to remember there was a few angry users when Avro was
> removed.
> >
> >
> > On Tue, Mar 11, 2014 at 8:04 PM, Edward Capriolo  >wrote:
> >
> >> With support officially deprecated that will be the only way to go. If a
> >> user wants to add a function to thrift they will have to fork off
> >> cassandra, code the function themselves write the internals, manage the
> >> internals. I see this as being a very hard task because the server could
> >> change rapidly with no regards to them. Also this could cause a
> >> proliferation of functions. Could you imagine a thrift server with 300
> >> methods :). This is why I think keeping the support in trunk and
> carefully
> >> adding things would be sane, but seemingly no one wants to support it at
> >> all so a fork is probably in order.
> >>
> >>
> >> On Tue, Mar 11, 2014 at 7:46 PM, Russ Bradberry  >> >wrote:
> >>
> >> > I would like to suggest the possibility of having the interface
> somewhat
> >> > pluggable so another project can provide the Thrift interface as a
> drop
> >> in
> >> > JAR. Thoughts?
> >> >
> >> > Sent from my iPhone
> >> >
> >> > > On Mar 11, 2014, at 7:26 PM, Edward Capriolo  >
> >> > wrote:
> >> > >
> >> > > If you are using thrift there probably isn't a reason to upgrade to
> 2.1
> >> > >
> >> > > What? Upgrading gets you performance regardless of your api.
> >> > >
> >> > > We have already gone from "no new feature" talk to "less enphisis on
> >> > > testing".
> >> > >
> >> > > How comforting.
> >> > >> On Tuesday, March 11, 2014, Dave Brosius  >
> >> > wrote:
> >> > >>
> >> > >> +1,
> >> > >>
> >> > >> altho supporting thrift in 2.1 seems overly conservative.
> >> > >>
> >> > >> If you are using thrift there probably isn't a reason to upgrade to
> >> 2.1,
> >> > > in fact doing so will become an increasingly dumb idea as lesser and
> >> > lesser
> >> > > emphasis will be placed on testing with 2.1+. This would allow us to
> >> > > greatly simplify the code footprint in 2.1
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>> On 03/11/2014 01:00 PM, Jonathan Ellis wrote:
> >> > >>>
> >> > >>> CQL3 is almost two years old now and has proved to be the better
> API
> >> > >>> that Cassandra needed.  CQL drivers have caught up with and passed

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-11 Thread Edward Capriolo
"If you are using thrift there probably isn't a reason to upgrade to 2.1"

What? Upgrading gets you performance regardless of your API.

We have already gone from "no new feature" talk to "less emphasis on
testing".

How comforting.
On Tuesday, March 11, 2014, Dave Brosius  wrote:
>
> +1,
>
> altho supporting thrift in 2.1 seems overly conservative.
>
> If you are using thrift there probably isn't a reason to upgrade to 2.1,
in fact doing so will become an increasingly dumb idea as lesser and lesser
emphasis will be placed on testing with 2.1+. This would allow us to
greatly simplify the code footprint in 2.1
>
>
>
>
> On 03/11/2014 01:00 PM, Jonathan Ellis wrote:
>>
>> CQL3 is almost two years old now and has proved to be the better API
>> that Cassandra needed.  CQL drivers have caught up with and passed the
>> Thrift ones in terms of features, performance, and usability.  CQL is
>> easier to learn and more productive than Thrift.
>>
>> With static columns and LWT batch support [1] landing in 2.0.6, and
>> UDT in 2.1 [2], I don't know of any use cases for Thrift that can't be
>> done in CQL.  Contrawise, CQL makes many things easy that are
>> difficult to impossible in Thrift.  New development is overwhelmingly
>> done using CQL.
>>
>> To date we have had an unofficial and poorly defined policy of "add
>> support for new features to Thrift when that is 'easy.'"  However,
>> even relatively simple Thrift changes can create subtle complications
>> for the rest of the server; for instance, allowing Thrift range
>> tombstones would make filter conversion for CASSANDRA-6506 more
>> difficult.
>>
>> Thus, I think it's time to officially close the book on Thrift.  We
>> will retain it for backwards compatibility, but we will commit to
>> adding no new features or changes to the Thrift API after 2.1.0.  This
>> will help send an unambiguous message to users and eliminate any
>> remaining confusion from supporting two APIs.  If any new use cases
>> come to light that can be done with Thrift but not CQL, we will commit
>> to supporting those in CQL.
>>
>> (To a large degree, this merely formalizes what is already de facto
>> reality.  Most thrift clients have not even added support for
>> atomic_batch_mutate and cas from 2.0, and popular clients like
>> Astyanax are migrating to the native protocol.)
>>
>> Reasonable?
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-6561
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-5590
>>
>
>

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
", I don't know of any use cases for Thrift that can't be
> done in CQL"

Can dynamic composites be used from CQL?


On Wed, Mar 12, 2014 at 4:44 AM, Sylvain Lebresne wrote:

> +1 to Jonathan's proposal.
>
>
>


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Edward Capriolo
I am glad the project is adopting unambiguous language about its
position. It is nice to have the clarity that volunteer efforts to add
features to Thrift will be rejected.

This is a shining example of how a volunteer Apache Software Foundation
project should be run: if users are attempting to add features, call a vote
and add language to stop them.

+1
On Wednesday, March 12, 2014, Sylvain Lebresne  wrote:
> On Wed, Mar 12, 2014 at 1:38 PM, Edward Capriolo wrote:
>
>> ", I don't know of any use cases for Thrift that can't be
>> > done in CQL"
>>
>> Can dynamic composites be used from CQL?
>>
>
> Sure, you can use any AbstractType class you want as a type in CQL, the
> same way you would do it with the thrift API.
>
> --
> Sylvain
>
>
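Sylvain's answer is worth illustrating. A minimal sketch of what he
describes, using a quoted AbstractType class name as a CQL type; the
keyspace, table, column names, and subtype aliases below are illustrative,
not from the thread:

```sql
-- Hedged sketch: a CQL table whose clustering column uses a custom
-- comparator class, per Sylvain's suggestion. All names are hypothetical.
CREATE TABLE ks.dynamic_demo (
    key   blob,
    name  'DynamicCompositeType(s=>UTF8Type, i=>Int32Type)',
    value blob,
    PRIMARY KEY (key, name)
) WITH COMPACT STORAGE;
```

Whether this gives the full flexibility of Thrift-era dynamic composites
(mixing component types per row) is exactly the open question Ed raises.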

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-13 Thread Edward Capriolo
There was a paging bug in 2.0 and a user just reported a bug sorting a one
row dataset.

So if you want to argue CQL has surpassed Thrift in all ways, one way it
clearly has not is correctness.

To demonstrate, search the changelog for CQL bugs that return the wrong
result. Then do the same search for Thrift bugs that return the wrong
result and compare.

If newcomers to the mailing list can pick up bugs and performance
regressions, it is a serious issue.

On Wednesday, March 12, 2014, Jonathan Ellis  wrote:
> I don't know if an IN query already does this without source diving,
> but it could certainly do so without needing extra syntax.
>
> On Wed, Mar 12, 2014 at 7:16 PM, Nicolas Favre-Felix 
wrote:
>>> If any new use cases
>>> come to light that can be done with Thrift but not CQL, we will commit
>>> to supporting those in CQL.
>>
>> Hello,
>>
>> (going back to the original topic...)
>>
>> I just wanted to point out that there is in my opinion an important
>> use case that is doable in Thrift but not in CQL, which is to fetch
>> several CQL rows from the same partition in a single isolated read. We
>> lose the benefit of partition-level isolation if there is no way to
>> read rows together.
>> Of course we can perform range queries and even scan over
>> multi-dimensional clustering keys with CASSANDRA-4851, but we still
>> can't fetch rows using a set of clustering keys.
>>
>> I couldn't find a JIRA for this feature, does anyone know if there is
one?
>>
>> Cheers,
>> Nicolas
>>
>> --
>> For what it's worth, +1 on freezing Thrift.
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>
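Jonathan's suggestion of an IN query can be sketched as follows; the table
and column names are hypothetical, and whether this satisfies Nicolas's
requirement depends on the server executing it as a single isolated
partition read:

```sql
-- Hedged sketch: fetching a chosen set of CQL rows from one partition
-- by listing clustering keys in an IN clause. Schema is illustrative.
SELECT * FROM ks.timeline
 WHERE user_id = 42
   AND event_id IN (101, 205, 317);
```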

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-13 Thread Edward Capriolo
"IMHO, If you put 1/4 of the energy into CQL that you do into fighting for
Thrift, I'm scared to think how amazing CQL would be."

I was just recently putting my energy into four Thrift tickets, and I was
then planning to help out on some CQL issues. But then this happened,
which I feel was directly aimed at my efforts, and it has left a very
bitter taste in my mouth.

Open source projects should not herd people into supporting only the things
that a specific group of people want. To give you an example: this
conversation resulted in the creation of this ticket

https://issues.apache.org/jira/browse/CASSANDRA-6846

One of the committers ran over to the ticket and dropped a -1 on it right
away.

"I'm definitively -1 on putting any type of contract on the internals. They
are called internals for a reason, and if rewriting it all entirely
tomorrow is best for Cassandra, we should have the possibility to do so.
And if we're creating a new abstraction on top of it, well, we're just
creating a new API, and well, I really thing we should focus on having just
one API to Cassandra and focus efforts there."
This is not open source this is stalin source.




On Thu, Mar 13, 2014 at 9:09 AM, Michael Kjellman
wrote:

> Ed-
>
> I understand and respect your enthusiasm for Thrift, but its ship has
> sailed. Yes, if you understand the low-level thrift API I'm sure you can
> have a rewarding experience, but as someone who wrote a client and had to
> abstract thrift...I don't have many kind words, and I certainly have less
> hair on my head...
>
> Every line of code ever written has the chance of bugs and thankfully this
> project has a super dedicated group of people who are very very responsive
> at fixing those. The sorting and paging bugs might not have happened in
> thrift because that logic is and has always been pushed onto the client.
> (Where there are also lots of bugs). I like the model where the bug is
> fixed once for all languages and clients personally...
>
> CQL has worked for me in 9 different sets of application logic as of
> now... and C* is more accessible to others because of it. Application
> code is simpler, client code is simpler, the learning curve for new users
> is easier.
> Win. Win. Win.
>
> IMHO, If you put 1/4 of the energy into CQL that you do into fighting for
> Thrift, I'm scared to think how amazing CQL would be.
>
> Best,
> Michael
>
>

