Reminder: critical fixes only in 2.1

2016-07-18 Thread Jonathan Ellis
We're at the stage of the release cycle where we should be committing
critical fixes only to the 2.1 branch.  Many people depend on 2.1 working
reliably and it's not worth the risk of introducing regressions for (e.g.)
performance improvements.

I think some of the patches committed so far for 2.1.16 do not meet this
bar and should be reverted.  I include a summary of what people have to
live with if we leave them unfixed:

https://issues.apache.org/jira/browse/CASSANDRA-11349
  Repair suffers false-negative tree mismatches and overstreams data.

https://issues.apache.org/jira/browse/CASSANDRA-10433
  Reduced performance on inserts (and reads?) (for Thrift clients only?)

https://issues.apache.org/jira/browse/CASSANDRA-12030
  Reduced performance on reads for workloads with range tombstones

Anyone want to make a case that these are more critical than they appear
and should not be reverted?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Reminder: critical fixes only in 2.1

2016-07-18 Thread Jeremiah D Jordan
Looking at those tickets, in all three of them the “is this critical to fix”
question came up in the JIRA discussion, and it was decided that they were
indeed critical enough to commit to 2.1.

> On Jul 18, 2016, at 11:47 AM, Jonathan Ellis  wrote:



Re: Reminder: critical fixes only in 2.1

2016-07-18 Thread Jonathan Ellis
Except there really wasn't.

Patch submitter: "I want this in 2.1."
Reviewer: "Okay."

That's not exactly the bar we're looking for.  To consider a performance
fix "critical", for example, you really need to show, at the very least,
what new workload you found that can't live with it the way everyone else
did for the previous 15 releases.

I note that on 10433 the committer even said, "I'm not [sure] I agree this
is critical for 2.1 at this point, but as it's simple enough and has been
somewhat vetted on 2.2 by now, not going to argue."

So consider this me putting on my bad cop hat and opening up the argument.

On Mon, Jul 18, 2016 at 12:24 PM, Jeremiah D Jordan wrote:



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: MSc Project - compaction strategy

2016-07-18 Thread steve landiss
So much for compaction of information, eh?

On Tuesday, July 12, 2016 10:06 AM, Pedro Gordo wrote:

Hi

Yes, I just saw Marcus's reply now; sorry for the duplicate email. My email
filters were not set up correctly. Thanks to both of you!

Best regards

Pedro Gordo

On 12 July 2016 at 12:39, Robert Stupp  wrote:

> As Marcus already mentioned, the best place to discuss the idea of your
> compaction strategy is a JIRA ticket.
> It would be best to include as much detail (written, not coded) as
> necessary to understand why this compaction strategy is useful and how it
> works.
>
> Implementation questions and clarifications are best taken to the
> #cassandra-dev IRC channel.
>
> Robert
>
> —
> Robert Stupp
> @snazy
>
> > On 12 Jul 2016, at 19:42, Pedro Gordo  wrote:
> >
> > Hi all
> >
> > I'm finishing an MSc in which my final project is to implement a new
> > compaction strategy in Cassandra. I've discussed the main points of the
> > strategy with other community members and received valuable feedback.
> > I understand this will be a tough challenge for someone who has never
> > worked with Cassandra, but after getting to know the technology, I've
> > found it fascinating. Since I wanted my MSc project to be a contribution
> > to an open source project, Cassandra was the ideal technology to go
> > forward with, which is why I've chosen it.
> >
> > Since this is my first time contributing to an open source project, I
> > have some questions on how to proceed correctly. Looking at the How To
> > Contribute page, I see that we're supposed to create a ticket before
> > starting work on something. In this case, however, does someone need to
> > validate the usefulness of the strategy, or can I just proceed and
> > implement it, or should I do something else? Also, is this the correct
> > mailing list for asking this sort of question? :)
> >
> > As for the code itself, if I have a question like "Should we be using an
> > abstract class for compaction classes?" or "What is this method supposed
> > to do?", can I ask it here? What is the best way to learn the details of
> > Cassandra's code? I've already seen that it has some comments, but they
> > probably won't be enough.
> >
> > The strategy I have in mind will be very simple until I finish the MSc.
> > After the submission, I'll improve it with other features and the
> > feedback I've received, but for the moment I'll keep it at a basic level.
> > The strategy will run only during certain periods of time (for example, a
> > time of day when the cluster has little traffic (1)), during which rows
> > will be made unique across all SSTables. The resulting SSTables will be
> > capped at a configurable size, so a single compaction can produce
> > multiple SSTables. This operation only happens if a prior analysis finds
> > that a row exists in more SSTables than a certain threshold. What I'm
> > trying to address here is the continuous high CPU usage of LCS (1), but
> > also the need for lots of disk space when STCS produces big SSTables. I
> > suppose it's a naive strategy, but the aim here is to give me experience
> > with C*, and of course I'll be happy to take suggestions. However, I'll
> > probably only use those ideas after delivering the project because, at
> > the moment, I need to keep it simple. Otherwise, I'll never be able to
> > submit it. :)
> >
> > Sorry for the long email, and thanks in advance for all the help! I'm
> > very excited about this project and look forward to being part of this
> > community!
> >
> > Best regards,
> > Pedro Gordo
>
>
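The strategy Pedro describes above is essentially a small decision procedure: run only inside a low-traffic window, check whether any row is spread across more than a threshold number of SSTables, and if so merge rows so each appears once, writing the result into size-capped SSTables. The Python sketch below illustrates that logic under loudly stated assumptions: an SSTable is modeled as a plain dict from row key to a (timestamp, value) pair, "size" is approximated by row count, and merging is row-level last-write-wins (Cassandra itself reconciles individual cells by timestamp and must also handle tombstones). All names (`maybe_compact`, `max_rows`, `in_low_traffic_window`, etc.) are hypothetical; Cassandra's real compaction strategies are Java classes with a much richer contract, and nothing here uses or mimics that API.

```python
from collections import Counter
from datetime import time

# Toy model (assumption): an SSTable maps row key -> (write_timestamp, value).
SSTable = dict[str, tuple[int, str]]


def in_low_traffic_window(now: time, start: time, end: time) -> bool:
    """True if `now` is in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def rows_over_threshold(sstables: list[SSTable], threshold: int) -> set[str]:
    """Row keys found in more than `threshold` SSTables."""
    # Each dict holds a key at most once, so this counts SSTables per key.
    counts = Counter(key for table in sstables for key in table)
    return {key for key, n in counts.items() if n > threshold}


def merge_and_split(sstables: list[SSTable], max_rows: int) -> list[SSTable]:
    """Keep one version of each row (newest timestamp wins), then cap each output."""
    merged: SSTable = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Sorted by key here; real SSTables are ordered by partition token.
    keys = sorted(merged)
    return [
        {k: merged[k] for k in keys[i:i + max_rows]}
        for i in range(0, len(keys), max_rows)
    ]


def maybe_compact(
    sstables: list[SSTable],
    now: time,
    window: tuple[time, time],
    threshold: int,
    max_rows: int,
) -> list[SSTable]:
    """Compact only inside the window, and only if some row is spread too widely."""
    start, end = window
    if not in_low_traffic_window(now, start, end):
        return sstables  # outside the low-traffic window: do nothing
    if not rows_over_threshold(sstables, threshold):
        return sstables  # no row exceeds the threshold: not worth compacting
    return merge_and_split(sstables, max_rows)


tables = [
    {"a": (1, "x"), "b": (1, "y")},
    {"a": (2, "x2"), "c": (1, "z")},
    {"a": (3, "x3"), "b": (2, "y2")},
]
print(maybe_compact(tables, time(3, 0), (time(1, 0), time(5, 0)), threshold=2, max_rows=2))
# prints [{'a': (3, 'x3'), 'b': (2, 'y2')}, {'c': (1, 'z')}]
```

In the example, row `a` sits in three SSTables, which exceeds the threshold of 2, so a run at 03:00 merges everything and splits the three unique rows into two capped outputs. A real strategy would measure size in bytes rather than rows and would need to handle tombstones, which this sketch ignores.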

  

Re: MSc Project - compaction strategy

2016-07-18 Thread Chris Mattmann
Dev discussion about the project should ideally be on the dev list.

Further, all *decisions* must be made on the dev list for the project.
JIRA has the drawback that its notifications get lost in many people’s
email filters, and it is hard to separate the signal from the noise.

I would consider including some well-formed emails to the dev list as part
of your plan as well, so that the community can follow along.

Cheers,
Chris




On 7/18/16, 10:42 PM, "steve landiss"  wrote:
