Re: URGENT: CASSANDRA-14092 causes Data Loss
Hi, I think CASSANDRA-14227 has been pending for a long time now. Though the data loss issue was addressed in CASSANDRA-14092, Cassandra users are still prevented from using long TTLs (20+ years), as the maximum expiration timestamp that can be represented by the storage engine is 2038-01-19T03:14:06+00:00 (due to the encoding of localExpirationTime as an int32). As per the JIRA comments, the fix seems relatively simple. Considering the high impact/returns and the relatively little effort involved, are there any plans to prioritize this fix for upcoming releases?

Thanks Anuj

On Saturday, 27 January, 2018, 8:35:20 PM IST, Anuj Wadehra wrote:

Hi Paulo, Thanks for coming out with the emergency hot fix!! The patch will help many Cassandra users in saving their precious data. I think the criticality and urgency of the bug is very high. How can we make sure that as many Cassandra users as possible are alerted about the silent deletion problem? What are the formal ways of working for broadcasting such critical alerts? I still see that the JIRA is marked as a "Major" defect and not a "Blocker". What could be worse for a database than irrecoverable silent deletion of successfully inserted data? I hope you understand.

Thanks Anuj

On Fri, 26 Jan 2018 at 18:57, Paulo Motta wrote:

> I have serious concerns regarding reducing the TTL to 15 yrs. The patch will immediately break all existing applications in Production which are using 15+ yrs TTL.

In order to prevent applications from breaking, I will update the patch to automatically set the maximum TTL to '03:14:08 UTC 19 January 2038' when it overflows and log a warning as an initial measure. We will work on extending this limit or lifting this limitation, probably for the 3.0+ series due to the large-scale compatibility changes required on lower versions, but community patches are always welcome. Companies that cannot upgrade to a version with the proper fix will need to work around this limitation in some other way: run a batch job to delete old data periodically, perform deletes with timestamps in the future, etc.

> If it's a 32 bit timestamp, can't we just save/read localDeletionTime as
> unsigned int?

The proper fix will likely be along these lines, but this involves many changes throughout the codebase where localDeletionTime is consumed, plus extensive testing, reviewing, etc., so we're now looking into an emergency hot fix to prevent silent data loss while the permanent fix is not in place.

2018-01-26 6:27 GMT-02:00 Anuj Wadehra :

> Hi Jeff,
> One correction in my last message: "it may be more feasible to SUPPORT (not extend) the 20 year limit in Cassandra in 2.1/2.2".
> I completely agree that the existing 20 years of TTL support is okay for older versions.
>
> If I have understood your last message correctly, the upcoming patches are along the following lines:
>
> 1. New patches shall be released for 2.1, 2.2 and 3.x.
> 2. The patches for 2.1 & 2.2 would support the existing 20 year TTL limit and ensure that there is no data loss when 20 years is set as the TTL.
> 3. The patches for 2.1 and 2.2 are unlikely to update the sstable format.
> 4. The 3.x patches may even remove the 20 year TTL constraint (and extend TTL support beyond 2038).
>
> I think that the JIRA priority should be increased from "Major" to "Blocker", as the JIRA may cause unexpected data loss. Also, all impacted versions should be included in the JIRA. This will attract the due attention of all Cassandra users.
> Thanks Anuj
>
> On Friday 26 January 2018, 12:47:18 PM IST, Anuj Wadehra wrote:
>
> Hi Jeff,
>
> Thanks for the prompt action!
> I agree that patching an application MAY have a shorter life cycle than patching Cassandra in production. But, in the interest of the larger Cassandra user community, we should put in our best effort to avoid breaking all the affected applications in production. We should also consider that updating business logic as per the new 15 year TTL constraint may have business implications for many users. I have a limited understanding of the complexity of the code patch, but it may be more feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather than asking all impacted users to do an immediate business logic adaptation. Moreover, now that we officially support Cassandra 2.1 & 2.2 until the 4.0 release and provide critical fixes for 2.1, it becomes even more reasonable to provide this extremely critical patch for 2.1 & 2.2 (unless it's absolutely impossible). Many users still run Cassandra 2.1 and 2.2 in their most critical production systems.
>
> Thanks
> Anuj
>
> On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa wrote:
>
> We'll get patches out. They almost certainly aren't going to ch
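For illustration, a minimal standalone sketch of the "cap on overflow" behaviour Paulo describes above (cap the expiration time at the maximum representable value and log a warning) might look like the following. This is not the actual CASSANDRA-14092 hot fix; the class and method names, the exact cap value and the warning text are illustrative assumptions.

import java.util.concurrent.TimeUnit;

// Minimal sketch of capping an overflowing TTL, NOT the actual Cassandra patch.
public final class TtlOverflowCapSketch
{
    // Maximum localExpirationTime representable by a signed 32-bit epoch-seconds field.
    static final int MAX_EXPIRATION_EPOCH_SECONDS = Integer.MAX_VALUE; // 2038-01-19T03:14:07Z

    // Returns the expiration time in epoch seconds, capped at the 2038 limit.
    static int computeLocalExpirationTime(long nowMillis, int ttlSeconds)
    {
        long expiration = TimeUnit.MILLISECONDS.toSeconds(nowMillis) + ttlSeconds;
        if (expiration > MAX_EXPIRATION_EPOCH_SECONDS)
        {
            System.err.println("WARN: requested TTL overflows the maximum representable "
                    + "expiration timestamp; capping at 2038-01-19T03:14:07Z");
            return MAX_EXPIRATION_EPOCH_SECONDS;
        }
        return (int) expiration;
    }

    public static void main(String[] args)
    {
        long now = System.currentTimeMillis();
        int oneDay = 24 * 60 * 60;
        int twentyYears = 20 * 365 * 24 * 60 * 60;
        System.out.println(computeLocalExpirationTime(now, oneDay));
        System.out.println(computeLocalExpirationTime(now, twentyYears));
    }
}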
Impact of removing compactions_in_progress folder
Often we face errors on Cassandra start regarding unfinished compactions, particularly when Cassandra was abruptly shut down. The problem gets resolved when we delete the /var/lib/cassandra/data/system/compactions_in_progress folder. Does deletion of this folder have any impact on the integrity of data or any other aspect?

Thanks Anuj Wadehra
Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists
Recently we faced an issue where every repair operation caused the addition of hundreds of SSTables (CASSANDRA-9146). In order to bring the situation under control and make sure reads were not impacted, we were left with no option but to run a major compaction to ensure that thousands of tiny SSTables were compacted.

Queries:

Does major compaction have any drawback now that automatic tombstone compaction exists (implemented in 1.2 via the tombstone_threshold sub-property, CASSANDRA-3442)? I understand that the huge SSTable created after a major compaction won't be compacted with new data any time soon, but is that a problem if purged data is removed via automatic tombstone compaction?

If major compaction results in a huge file, say 500 GB, what are its drawbacks?

If one big SSTable is a problem, is there any way of solving it? We tried running sstablesplit after the major compaction to split the big SSTable, but as the new SSTables were of the same size, they were compacted back into a single huge SSTable once Cassandra was started after executing sstablesplit.

Thanks Anuj Wadehra
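To make the question concrete, here is a minimal standalone sketch of the kind of check automatic tombstone compaction is based on: an SSTable whose estimated droppable-tombstone ratio exceeds tombstone_threshold becomes a candidate for a single-SSTable compaction. This is an illustrative model, not the actual compaction-strategy code; the 0.2 default and the helper names are assumptions.

// Sketch of the tombstone_threshold idea from CASSANDRA-3442 (not Cassandra source).
public final class TombstoneCompactionCheck
{
    // Assumed default threshold; configurable per table via the compaction sub-properties.
    private static final double TOMBSTONE_THRESHOLD = 0.2;

    // An SSTable becomes a candidate for single-SSTable compaction when its
    // estimated ratio of droppable tombstones exceeds the threshold.
    static boolean eligibleForTombstoneCompaction(long droppableTombstones, long totalCells)
    {
        if (totalCells == 0)
            return false;
        double ratio = (double) droppableTombstones / totalCells;
        return ratio > TOMBSTONE_THRESHOLD;
    }

    public static void main(String[] args)
    {
        // Even a huge SSTable left by a major compaction would be rewritten on its
        // own once enough of its data expires or is deleted and gc_grace passes.
        System.out.println(eligibleForTombstoneCompaction(30, 100)); // true
        System.out.println(eligibleForTombstoneCompaction(10, 100)); // false
    }
}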
Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists
I haven't got much response regarding this on the user list, so posting it on the dev list too.

Thanks Anuj Wadehra

Sent from Yahoo Mail on Android

From: "Anuj Wadehra"
Date: Tue, 14 Apr, 2015 at 7:05 am
Subject: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

Recently we faced an issue where every repair operation caused the addition of hundreds of SSTables (CASSANDRA-9146). In order to bring the situation under control and make sure reads were not impacted, we were left with no option but to run a major compaction to ensure that thousands of tiny SSTables were compacted.

Queries:

Does major compaction have any drawback now that automatic tombstone compaction exists (implemented in 1.2 via the tombstone_threshold sub-property, CASSANDRA-3442)? I understand that the huge SSTable created after a major compaction won't be compacted with new data any time soon, but is that a problem if purged data is removed via automatic tombstone compaction?

If major compaction results in a huge file, say 500 GB, what are its drawbacks?

If one big SSTable is a problem, is there any way of solving it? We tried running sstablesplit after the major compaction to split the big SSTable, but as the new SSTables were of the same size, they were compacted back into a single huge SSTable once Cassandra was started after executing sstablesplit.

Thanks Anuj Wadehra
Repair with -pr and -local after CASSANDRA-7450
Hi, This is regarding the execution of repair -pr in a local DC. CASSANDRA-7313 disabled using -pr with the -local option. Later, CASSANDRA-7450 allowed it. But when I look at the code of Cassandra 2.0.13, I see that using -pr with -local is still illegal. How can we run a repair with the -pr and local DC options in 2.0.13/2.0.14? We don't want to run a full repair on each node of a DC. Moreover, we don't want to incur cross-DC repair.

public void forceKeyspaceRepairPrimaryRange(final String keyspaceName, boolean isSequential, boolean isLocal, final String... columnFamilies) throws IOException
{
    // primary range repair can only be performed for whole cluster.
    // NOTE: we should omit the param but keep API as is for now.
    if (isLocal)
    {
        throw new IllegalArgumentException("You need to run primary range repair on all nodes in the cluster.");
    }

    forceKeyspaceRepairRange(keyspaceName, getLocalPrimaryRanges(keyspaceName), isSequential ? RepairParallelism.SEQUENTIAL : RepairParallelism.PARALLEL, false, columnFamilies);
}

Thanks Anuj
Contribution to Cassandra Community and Branching Strategy
Hi, I want to submit patches for Cassandra JIRA tickets. I have some questions:

1. As per http://wiki.apache.org/cassandra/HowToContribute, we need to clone trunk and provide a patch against it. So, I need to understand how such a patch is going to be merged into 2.0.x and 2.1.x?
2. Where can I find the detailed branching strategy followed by Cassandra?
3. How do I indicate that I am looking into a JIRA? Should I just attach a patch when it is ready?

Thanks Anuj Wadehra
Re: Repair with -pr and -local after CASSANDRA-7450
Ok. So, does that mean that -pr is not usable in 2.0.x unless you are willing to pay the additional cost of cross-DC repair (which I think is not practical)?

Thanks Anuj

Sent from Yahoo Mail on Android

From: "Yuki Morishita"
Date: Tue, 19 May, 2015 at 1:06 am
Subject: Re: Repair with -pr and -local after CASSANDRA-7450

CASSANDRA-7450 is for version 2.1.1 and higher. So it is not available in 2.0.x.

On Mon, May 18, 2015 at 1:43 PM, Anuj Wadehra wrote:
> Hi,
> This is regarding the execution of repair -pr in a local DC. CASSANDRA-7313 disabled using -pr with the -local option. Later, CASSANDRA-7450 allowed it. But when I look at the code of Cassandra 2.0.13, I see that using -pr with -local is still illegal.
> How can we run a repair with the -pr and local DC options in 2.0.13/2.0.14? We don't want to run a full repair on each node of a DC. Moreover, we don't want to incur cross-DC repair.
>
> public void forceKeyspaceRepairPrimaryRange(final String keyspaceName, boolean isSequential, boolean isLocal, final String... columnFamilies) throws IOException
> {
>     // primary range repair can only be performed for whole cluster.
>     // NOTE: we should omit the param but keep API as is for now.
>     if (isLocal)
>     {
>         throw new IllegalArgumentException("You need to run primary range repair on all nodes in the cluster.");
>     }
>
>     forceKeyspaceRepairRange(keyspaceName, getLocalPrimaryRanges(keyspaceName), isSequential ? RepairParallelism.SEQUENTIAL : RepairParallelism.PARALLEL, false, columnFamilies);
> }
>
> Thanks Anuj

--
Yuki Morishita
t:yukim (http://twitter.com/yukim)
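For what it's worth, my understanding of the idea behind CASSANDRA-7450 (an assumption, not a quote of the 2.1.1 implementation) is that with -pr -local a node repairs only the primary ranges it owns when the ring is restricted to its own datacenter. Below is a simplified standalone model of that idea; it ignores vnodes and real replication strategies, and all names are invented.

import java.util.*;

// Simplified model of "primary ranges restricted to one DC"; NOT Cassandra source.
public final class LocalDcPrimaryRangeSketch
{
    static final class Node
    {
        final String name, dc;
        final long token;
        Node(String name, String dc, long token) { this.name = name; this.dc = dc; this.token = token; }
    }

    // Primary range of 'self' when only nodes in self's DC are considered:
    // the interval (previous DC-local token, self's token].
    static long[] primaryRangeWithinDc(List<Node> ring, Node self)
    {
        List<Node> dcRing = new ArrayList<>();
        for (Node n : ring)
            if (n.dc.equals(self.dc))
                dcRing.add(n);
        dcRing.sort(Comparator.comparingLong(n -> n.token));
        int i = dcRing.indexOf(self);
        long prev = dcRing.get((i - 1 + dcRing.size()) % dcRing.size()).token;
        return new long[] { prev, self.token };
    }

    public static void main(String[] args)
    {
        List<Node> ring = Arrays.asList(new Node("dc1-a", "DC1", 0), new Node("dc2-a", "DC2", 10),
                                        new Node("dc1-b", "DC1", 20), new Node("dc2-b", "DC2", 30));
        // dc1-b's DC-local primary range is (0, 20], even though dc2-a owns (0, 10] cluster-wide.
        System.out.println(Arrays.toString(primaryRangeWithinDc(ring, ring.get(2))));
    }
}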
Is deletion of compactions_in_progress files safe?
I need to understand how the compaction algorithm works at a high level.

1. What is the significance of the system.compactions_in_progress table and the files in the compactions_in_progress directory?
2. We are on 2.0.3. We frequently face scenarios where Cassandra fails to restart with an exception regarding unfinished compactions. The problem is how to deal with such a scenario till we upgrade. When we delete the files in the compactions_in_progress folder, we are able to start Cassandra. Is that a SAFE thing to do in PRODUCTION? Are there any better alternatives? We are concerned about data integrity.

Thanks Anuj
Behavior of nodetool stop compaction
Firing the nodetool stop command prints a CompactionInterruptedException stack trace.

1. The stack trace gives the impression of a compaction being killed forcefully; is nodetool stop COMPACTION a clean way to interrupt an ongoing MAJOR compaction?
2. How does the logic work? When nodetool stop is fired to stop MINOR compactions, are all minor compactions temporarily suspended and resumed automatically afterwards, or is the work done by in-progress compactions discarded?

Thanks Anuj
Re: Behavior of nodetool stop compaction
org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    ... 3 more

Thanks Anuj Wadehra

On Monday, 25 May 2015 2:44 PM, Jason Wee wrote:

Hello, could you paste the exception and also show what is the cassandra version running?

jason

On Sun, May 24, 2015 at 2:12 AM, Anuj Wadehra wrote:
> Firing nodetool stop command prints CompactionInterruptedException stacktrace.
>
> 1. Exception stacktrace gives an impression of killing a compaction forcefully, is nodetool stop COMPACTION a clean way to interrupt an ongoing MAJOR compaction?
>
> 2. How the logic works? When nodetool stop is fired to stop MINOR compactions, are we temporarily suspending all minor compactions and these will resume automatically afterwards or the work done by in progress compactions is discarded ?
>
> Thanks
> Anuj
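As far as I understand it (this is an assumption based on the stack trace above, not a quote of the Cassandra source), nodetool stop works cooperatively: it sets a stop flag on the running compaction tasks, each task checks the flag as it iterates and throws CompactionInterruptedException, the partially written output is discarded, and the input SSTables stay untouched; interrupted minor compactions are simply re-selected by the compaction strategy later rather than resumed. A tiny standalone sketch of that pattern, with invented names:

// Sketch of cooperative compaction interruption; NOT the actual CompactionManager code.
public final class StopCompactionSketch
{
    static final class CompactionInterruptedException extends RuntimeException
    {
        CompactionInterruptedException(String msg) { super(msg); }
    }

    static final class CompactionTask
    {
        private volatile boolean stopRequested = false;

        void stop() { stopRequested = true; }   // what "nodetool stop" would trigger

        void run(int totalPartitions)
        {
            int written = 0;
            for (int p = 0; p < totalPartitions; p++)
            {
                if (stopRequested)
                    // Partial output would be deleted here; source SSTables are untouched.
                    throw new CompactionInterruptedException("Compaction interrupted after " + written + " partitions");
                written++; // stand-in for merging and writing one partition
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        CompactionTask task = new CompactionTask();
        Thread t = new Thread(() -> {
            try { task.run(Integer.MAX_VALUE); }
            catch (CompactionInterruptedException e) { System.out.println(e.getMessage()); }
        });
        t.start();
        Thread.sleep(100);
        task.stop();
        t.join();
    }
}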
Re: Versioning policy?
Hi Jonathan, Thanks for the crisp communication regarding the tick tock release & EOL. I think its worth considering some points regarding EOL policy and it would be great if you can share your thoughts on below points: 1. EOL of a release should be based on "most stable"/"production ready" version date rather than "GA" date of subsequent major releases. 2. I think we should have "Formal EOL Announcement" on Apache Cassandra website. 3. "Formal EOL Announcement" should come at least 6 months before the EOL, so that users get reasonable time to upgrade. 4. EOL Policy (even if flexible) should be stated on Apache Cassandra website EOL thread on users mailing list ended with the conclusion of raising a Wishlist JIRA but I think above points are more about working on policy and processes rather than just a wish list. ThanksAnuj Sent from Yahoo Mail on Android On Thu, 14 Jan, 2016 at 10:57 pm, Jonathan Ellis wrote: Hi Maciek, First let's talk about the tick-tock series, currently 3.x. This is pretty simple: outside of the regular monthly releases, we will release fixes for critical bugs against the most recent bugfix release, the way we did recently with 3.1.1 for CASSANDRA-10822 [1]. No older tick-tock releases will be patched. Now, we also have three other release series currently being supported: 2.1.x: supported with critical fixes only until 4.0 is released, projected in November 2016 [2] 2.2.x: maintained until 4.0 is released 3.0.x: maintained for 6 months after 4.0, i.e. projected until May 2017 I will add this information to the releases page [3]. [1] https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201512.mbox/%3CCAKkz8Q3StqRFHfMgCMRYaaPdg+HE5N5muBtFVt-=v690pzp...@mail.gmail.com%3E [2] 4.0 will be an ordinary tick-tock release after 3.11, but we will be sunsetting deprecated features like Thrift so bumping the major version seems appropriate [3] http://cassandra.apache.org/download/ On Sun, Jan 10, 2016 at 9:29 PM, Maciek Sakrejda wrote: > There was a discussion recently about changing the Cassandra EOL policy on > the users list [1], but it didn't really go anywhere. I wanted to ask here > instead to clear up the status quo first. What's the current versioning > policy? The tick-tock versioning blog post [2] states in passing that two > major releases are maintained, but I have not found this as an official > policy stated anywhere. For comparison, the Postgres project lays this out > very clearly [3]. To be clear, I'm not looking for any official support, > I'm just asking for clarification regarding the maintenance policy: if a > critical bug or security vulnerability is found in version X.Y.Z, when can > I expect it to be fixed in a bugfix patch to that major version, and when > do I need to upgrade to the next major version. > > [1]: http://www.mail-archive.com/user@cassandra.apache.org/msg45324.html > [2]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ > [3]: http://www.postgresql.org/support/versioning/ > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Repair when a replica is Down
Hi,

We are on 2.0.14, RF=3 in a 3 node cluster, and we use repair -pr. Recently, we observed that repair -pr fails on all nodes if a node is down. Then I found the JIRA https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-2290 where an intentional decision was taken to abort the repair if a replica is down. I need to understand the reasoning behind aborting the repair instead of proceeding with the available replicas.

I have the following concerns with the approach. We say that we have a fault-tolerant Cassandra system such that we can afford a single node failure, because RF=3 and we read/write at QUORUM. But when a node goes down and we are not sure how much time will be needed to restore it, the entire system health is in question, as gc_grace_period is approaching and we are not able to run repair -pr on any of the nodes. Then there is a dilemma:

Whether to remove the faulty node well before the gc grace period so that we get enough time to save the data by repairing the other two nodes? This may cause massive streaming, which may be unnecessary if we are able to bring back the faulty node before the gc grace period.

OR wait and hope that the issue will be resolved before gc grace time and we will have some buffer to run repair -pr on all nodes.

OR increase the gc grace period temporarily. Then we should have capacity planning to accommodate the extra storage needed for the extra gc grace that may be needed in node failure scenarios.

I also need to understand the recommended approach for maintaining a fault-tolerant system which can handle such node failures without hiccups.

Thanks Anuj
Re: Repair when a replica is Down
Hi I have intentionally posted this message to the dev mailing list instead of users list because its regarding a conscious design decision taken regarding a bug and I feel that dev team is the most appropriate team who could respond to it. Please let me know if there are better ways to get it addressed. ThanksAnuj Sent from Yahoo Mail on Android On Fri, 15 Jan, 2016 at 11:36 pm, Anuj Wadehra wrote: Hi We are on 2.0.14,RF=3 in a 3 node cluster. We use repair -pr . Recently, we observed that repair -pr for all nodes fails if a node is down. Then I found the JIRA https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-2290 where an intentional decision was taken to abort the repair if a replica is down. I need to understand the reasoning behind aborting the repair instead of proceeding with available replicas. I have following concerns with the approach: We say that we have a fault tolerant Cassandra system such that we can afford single node failure because RF=3 and we read/write at QUORUM.But when a node goes down and we are not sure how much time will be needed to restore the node, entire system health is in question as gc_grace_period is approaching and we are not able to run repair -pr on any of the nodes. Then there is a dilemma:Whether to remove the faulty node well before gc grace period so that we get enough time to save data by repairing other two nodes? This may cause massive streaming which may be unnecessary if we are able to bring back the faulty node before gc grace period. OR Wait and hope that the issue will be resolved before gc grace time and we will have some buffer to run repair -pr on all nodes. OR Increase the gc grace period temporarily. Then we should have capacity planning to accomodate the extra storage needed for extra gc grace that may be needed in case of node failure scenarios. I need to understand the recommeded approach too for maintaing a fault tolerant system which can handle such node failures without hiccups. ThanksAnuj
Re: Versioning policy?
Hi Jonathan It would be really nice if you could share your thoughts on the four points raised regarding the Cassandra EOL process. I think similar things happen for other open source products and it would be really nice if we could streamline such things for Apache Cassandra. ThanksAnuj Sent from Yahoo Mail on Android On Thu, 14 Jan, 2016 at 11:28 pm, Anuj Wadehra wrote: Hi Jonathan, Thanks for the crisp communication regarding the tick tock release & EOL. I think its worth considering some points regarding EOL policy and it would be great if you can share your thoughts on below points: 1. EOL of a release should be based on "most stable"/"production ready" version date rather than "GA" date of subsequent major releases. 2. I think we should have "Formal EOL Announcement" on Apache Cassandra website. 3. "Formal EOL Announcement" should come at least 6 months before the EOL, so that users get reasonable time to upgrade. 4. EOL Policy (even if flexible) should be stated on Apache Cassandra website EOL thread on users mailing list ended with the conclusion of raising a Wishlist JIRA but I think above points are more about working on policy and processes rather than just a wish list. ThanksAnuj Sent from Yahoo Mail on Android On Thu, 14 Jan, 2016 at 10:57 pm, Jonathan Ellis wrote: Hi Maciek, First let's talk about the tick-tock series, currently 3.x. This is pretty simple: outside of the regular monthly releases, we will release fixes for critical bugs against the most recent bugfix release, the way we did recently with 3.1.1 for CASSANDRA-10822 [1]. No older tick-tock releases will be patched. Now, we also have three other release series currently being supported: 2.1.x: supported with critical fixes only until 4.0 is released, projected in November 2016 [2] 2.2.x: maintained until 4.0 is released 3.0.x: maintained for 6 months after 4.0, i.e. projected until May 2017 I will add this information to the releases page [3]. [1] https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201512.mbox/%3CCAKkz8Q3StqRFHfMgCMRYaaPdg+HE5N5muBtFVt-=v690pzp...@mail.gmail.com%3E [2] 4.0 will be an ordinary tick-tock release after 3.11, but we will be sunsetting deprecated features like Thrift so bumping the major version seems appropriate [3] http://cassandra.apache.org/download/ On Sun, Jan 10, 2016 at 9:29 PM, Maciek Sakrejda wrote: > There was a discussion recently about changing the Cassandra EOL policy on > the users list [1], but it didn't really go anywhere. I wanted to ask here > instead to clear up the status quo first. What's the current versioning > policy? The tick-tock versioning blog post [2] states in passing that two > major releases are maintained, but I have not found this as an official > policy stated anywhere. For comparison, the Postgres project lays this out > very clearly [3]. To be clear, I'm not looking for any official support, > I'm just asking for clarification regarding the maintenance policy: if a > critical bug or security vulnerability is found in version X.Y.Z, when can > I expect it to be fixed in a bugfix patch to that major version, and when > do I need to upgrade to the next major version. > > [1]: http://www.mail-archive.com/user@cassandra.apache.org/msg45324.html > [2]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ > [3]: http://www.postgresql.org/support/versioning/ > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Versioning policy?
I was not referring to Enterprise support here. When I said Open source "product" by mistake, I was just referring to some other Apache open source projects like Apache Cassandra where you get EOL announcements, info etc on the main Apache web site. I think all four points are very relevant in context of an Open source project and thats why I wanted to thoughts on these points. ThanksAnuj Sent from Yahoo Mail on Android On Sun, 17 Jan, 2016 at 1:43 am, Michael Kjellman wrote: Correct, this is an open source project. If you want a Enterprise support story Datastax has an Enterprise option for you. > On Jan 16, 2016, at 11:19 AM, Anuj Wadehra wrote: > > Hi Jonathan > > It would be really nice if you could share your thoughts on the four points > raised regarding the Cassandra EOL process. I think similar things happen for > other open source products and it would be really nice if we could streamline > such things for Apache Cassandra. > > ThanksAnuj > > Sent from Yahoo Mail on Android > > On Thu, 14 Jan, 2016 at 11:28 pm, Anuj Wadehra >wrote: Hi Jonathan, > Thanks for the crisp communication regarding the tick tock release & EOL. > I think its worth considering some points regarding EOL policy and it would > be great if you can share your thoughts on below points: > 1. EOL of a release should be based on "most stable"/"production ready" > version date rather than "GA" date of subsequent major releases. > 2. I think we should have "Formal EOL Announcement" on Apache Cassandra > website. > 3. "Formal EOL Announcement" should come at least 6 months before the EOL, so > that users get reasonable time to upgrade. > 4. EOL Policy (even if flexible) should be stated on Apache Cassandra website > > EOL thread on users mailing list ended with the conclusion of raising a > Wishlist JIRA but I think above points are more about working on policy and > processes rather than just a wish list. > > ThanksAnuj > > > > Sent from Yahoo Mail on Android > > On Thu, 14 Jan, 2016 at 10:57 pm, Jonathan Ellis wrote: >Hi Maciek, > > First let's talk about the tick-tock series, currently 3.x. This is pretty > simple: outside of the regular monthly releases, we will release fixes for > critical bugs against the most recent bugfix release, the way we did > recently with 3.1.1 for CASSANDRA-10822 [1]. No older tick-tock releases > will be patched. > > Now, we also have three other release series currently being supported: > > 2.1.x: supported with critical fixes only until 4.0 is released, projected > in November 2016 [2] > 2.2.x: maintained until 4.0 is released > 3.0.x: maintained for 6 months after 4.0, i.e. projected until May 2017 > > I will add this information to the releases page [3]. > > [1] > https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201512.mbox/%3CCAKkz8Q3StqRFHfMgCMRYaaPdg+HE5N5muBtFVt-=v690pzp...@mail.gmail.com%3E > [2] 4.0 will be an ordinary tick-tock release after 3.11, but we will be > sunsetting deprecated features like Thrift so bumping the major version > seems appropriate > [3] http://cassandra.apache.org/download/ > >> On Sun, Jan 10, 2016 at 9:29 PM, Maciek Sakrejda wrote: >> >> There was a discussion recently about changing the Cassandra EOL policy on >> the users list [1], but it didn't really go anywhere. I wanted to ask here >> instead to clear up the status quo first. What's the current versioning >> policy? 
The tick-tock versioning blog post [2] states in passing that two >> major releases are maintained, but I have not found this as an official >> policy stated anywhere. For comparison, the Postgres project lays this out >> very clearly [3]. To be clear, I'm not looking for any official support, >> I'm just asking for clarification regarding the maintenance policy: if a >> critical bug or security vulnerability is found in version X.Y.Z, when can >> I expect it to be fixed in a bugfix patch to that major version, and when >> do I need to upgrade to the next major version. >> >> [1]: http://www.mail-archive.com/user@cassandra.apache.org/msg45324.html >> [2]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ >> [3]: http://www.postgresql.org/support/versioning/ > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder, http://www.datastax.com > @spyced >
Re: Versioning policy?
HAPPY to see that Apache Cassandra web site has been updated to include EOL information :) Thanks !!! I have some queries on the updated content: 1. Earlier, Apache web site always used to show 2 Cassandra versions - one which is "most stable" (production-ready) and other for development use. Now, I can't see that "most stable"(production-ready) tag on any version. Site says "The latest tick-tock release is 3.2" As per the tick-tock logic, Does that mean 3.1 is the latest most stable Cassandra version available today as it had bug-fixes for 3.0 ? Is 3.1 production ready? If NO, then how would production users on earlier releases get indication on their next upgrade version e.g. 3.x or 2.2 ? 2. I am assuming that going forward EOL announcements will be published at the Apache web site before hand just like some other Apache projects do. Is that assumption valid? It will certainly help to get such insights before hand on Apache site so that community users can prepare their upgrade road map. ThanksAnuj On Sunday, 17 January 2016 12:48 AM, Anuj Wadehra wrote: Hi Jonathan It would be really nice if you could share your thoughts on the four points raised regarding the Cassandra EOL process. I think similar things happen for other open source products and it would be really nice if we could streamline such things for Apache Cassandra. ThanksAnuj Sent from Yahoo Mail on Android On Thu, 14 Jan, 2016 at 11:28 pm, Anuj Wadehra wrote: Hi Jonathan, Thanks for the crisp communication regarding the tick tock release & EOL. I think its worth considering some points regarding EOL policy and it would be great if you can share your thoughts on below points: 1. EOL of a release should be based on "most stable"/"production ready" version date rather than "GA" date of subsequent major releases. 2. I think we should have "Formal EOL Announcement" on Apache Cassandra website. 3. "Formal EOL Announcement" should come at least 6 months before the EOL, so that users get reasonable time to upgrade. 4. EOL Policy (even if flexible) should be stated on Apache Cassandra website EOL thread on users mailing list ended with the conclusion of raising a Wishlist JIRA but I think above points are more about working on policy and processes rather than just a wish list. ThanksAnuj Sent from Yahoo Mail on Android On Thu, 14 Jan, 2016 at 10:57 pm, Jonathan Ellis wrote: Hi Maciek, First let's talk about the tick-tock series, currently 3.x. This is pretty simple: outside of the regular monthly releases, we will release fixes for critical bugs against the most recent bugfix release, the way we did recently with 3.1.1 for CASSANDRA-10822 [1]. No older tick-tock releases will be patched. Now, we also have three other release series currently being supported: 2.1.x: supported with critical fixes only until 4.0 is released, projected in November 2016 [2] 2.2.x: maintained until 4.0 is released 3.0.x: maintained for 6 months after 4.0, i.e. projected until May 2017 I will add this information to the releases page [3]. 
[1] https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201512.mbox/%3CCAKkz8Q3StqRFHfMgCMRYaaPdg+HE5N5muBtFVt-=v690pzp...@mail.gmail.com%3E [2] 4.0 will be an ordinary tick-tock release after 3.11, but we will be sunsetting deprecated features like Thrift so bumping the major version seems appropriate [3] http://cassandra.apache.org/download/ On Sun, Jan 10, 2016 at 9:29 PM, Maciek Sakrejda wrote: > There was a discussion recently about changing the Cassandra EOL policy on > the users list [1], but it didn't really go anywhere. I wanted to ask here > instead to clear up the status quo first. What's the current versioning > policy? The tick-tock versioning blog post [2] states in passing that two > major releases are maintained, but I have not found this as an official > policy stated anywhere. For comparison, the Postgres project lays this out > very clearly [3]. To be clear, I'm not looking for any official support, > I'm just asking for clarification regarding the maintenance policy: if a > critical bug or security vulnerability is found in version X.Y.Z, when can > I expect it to be fixed in a bugfix patch to that major version, and when > do I need to upgrade to the next major version. > > [1]: http://www.mail-archive.com/user@cassandra.apache.org/msg45324.html > [2]: http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ > [3]: http://www.postgresql.org/support/versioning/ > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Repair when a replica is Down
Thanks Tyler!! I understand that we need to consider a node as lost when it's down for gc grace and bootstrap it. My question is more about the JIRA https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-2290 where an intentional decision was taken to abort the repair if a single replica is down. Precisely, I need to understand the reasoning behind aborting the repair instead of proceeding with the available replicas. As it is related to a specific fix, I thought that the developers involved in the decision could better explain the reasoning. So, I posted it on the dev list first.

Consider a scenario where I have a 20 node cluster, RF=5, Read/Write Quorum, gc grace period=20. My cluster is fault tolerant and it can afford a 2 node failure. Suddenly, one node goes down due to some hardware issue. It's 10 days since my node went down, none of the 19 nodes are being repaired, and now it's decision time. I am not sure how soon the issue will be fixed, maybe 8 days before gc grace, so I shouldn't remove the node early and add a node back, as it would cause unnecessary streaming. At the same time, if I don't remove the failed node, my entire system health would be in question and it would be a panic situation, as no data got repaired in the last 10 days and gc grace is approaching. I need sufficient time to repair 19 nodes. What looked like a fault-tolerant system which can afford a 2 node failure required urgent attention and manual decision making when a single node went down.

Why can't we just go ahead and repair the remaining replicas if some replicas are down? If the failed node comes up before the gc grace period, we would run repair to fix inconsistencies, and otherwise we would discard its data and bootstrap. I think that would be a really robust, fault-tolerant system.

Thanks Anuj

On Tue, 19 Jan, 2016 at 9:44 pm, Tyler Hobbs wrote:

On Fri, Jan 15, 2016 at 12:06 PM, Anuj Wadehra wrote:
> Increase the gc grace period temporarily. Then we should have capacity planning to accomodate the extra storage needed for extra gc grace that may be needed in case of node failure scenarios.

I would do this. Nodes that are down for longer than gc_grace_seconds should not re-enter the cluster, because they may contain data that has been deleted and the tombstone has already been purged (repairing doesn't change this). Bringing them back up will result in "zombie" data.

Also, I do think that the user mailing list is a better place for the first round of this conversation.

--
Tyler Hobbs
DataStax <http://datastax.com/>
Re: Repair when a replica is Down
There is a JIRA Issue https://issues.apache.org/jira/browse/CASSANDRA-10446 . But its open with Minor prority and type as Improvement. I think its a very valid concern for all and especially for users who have bigger clusters. More of an issue related with Design decision rather than an improvement. Can we change its priority so that it gets appropriate attention? ThanksAnuj On Tue, 19 Jan, 2016 at 10:35 pm, Tyler Hobbs wrote: On Tue, Jan 19, 2016 at 10:44 AM, Anuj Wadehra wrote: Consider a scenario where I have a 20 node clsuter, RF=5, Read/Write Quorum, gc grace period=20. My cluster is fault tolerant and it can afford 2 node failure. Suddenly, one node goes down due to some hardware issue. Its 10 days since my node is down, none of the 19 nodes are being repaired and now its decision time. I am not sure how soon issue would be fixed may be 8 days before gc grace, so I shouldnt remove node early and add node back as it would cause unnecessary streaming. At the same time, if I dont remove the failed node, my entire system health would be in question and it would be a panic situation as no data got repaired in last 10 days and gc grace is approaching. I need sufficient time to repair 19 nodes. What looked like a fault tolerant system which can afford 2 node failure, required urgent attention and manual decision making when a single node went down. Why cant we just go ahead and repair remaining replicas if some replicas are down? If failed node comes up before gc grace period, we would run repair to fix inconsistencies and otheriwse we would discard data and bootstrap. I think that would be a really robust fault tolerant system. That makes sense. It seems like having the option to ignore down replicas during repair could be at least somewhat helpful, although it may be tricky to decide how this should interact with incremental repairs. If there isn't a jira ticket for this already, can you open one with the scenario above? -- Tyler Hobbs DataStax
Re: Repair when a replica is Down
Hi Tyler, I think the scenario needs some correction. 20 node clsuter, RF=5, Read/Write Quorum, gc grace period=20. If a node goes down, repair -pr would fail on 4 nodes maintaining replicas and full repair would fail on even greater no.of number of nodes but not 19. Please confirm. Anyways the system health would get impacted as multiple nodes are not repairing with a single node failure. ThanksAnujSent from Yahoo Mail on Android On Tue, 19 Jan, 2016 at 10:48 pm, Anuj Wadehra wrote: There is a JIRA Issue https://issues.apache.org/jira/browse/CASSANDRA-10446 . But its open with Minor prority and type as Improvement. I think its a very valid concern for all and especially for users who have bigger clusters. More of an issue related with Design decision rather than an improvement. Can we change its priority so that it gets appropriate attention? ThanksAnuj On Tue, 19 Jan, 2016 at 10:35 pm, Tyler Hobbs wrote: On Tue, Jan 19, 2016 at 10:44 AM, Anuj Wadehra wrote: Consider a scenario where I have a 20 node clsuter, RF=5, Read/Write Quorum, gc grace period=20. My cluster is fault tolerant and it can afford 2 node failure. Suddenly, one node goes down due to some hardware issue. Its 10 days since my node is down, none of the 19 nodes are being repaired and now its decision time. I am not sure how soon issue would be fixed may be 8 days before gc grace, so I shouldnt remove node early and add node back as it would cause unnecessary streaming. At the same time, if I dont remove the failed node, my entire system health would be in question and it would be a panic situation as no data got repaired in last 10 days and gc grace is approaching. I need sufficient time to repair 19 nodes. What looked like a fault tolerant system which can afford 2 node failure, required urgent attention and manual decision making when a single node went down. Why cant we just go ahead and repair remaining replicas if some replicas are down? If failed node comes up before gc grace period, we would run repair to fix inconsistencies and otheriwse we would discard data and bootstrap. I think that would be a really robust fault tolerant system. That makes sense. It seems like having the option to ignore down replicas during repair could be at least somewhat helpful, although it may be tricky to decide how this should interact with incremental repairs. If there isn't a jira ticket for this already, can you open one with the scenario above? -- Tyler Hobbs DataStax
Re: Repair when a replica is Down
Actually I have not checked how repair -pr abort logic is implemented in code. So irrespective of repair pr or full repair scenarios, problem can be stated as follows: 20 node cluster, RF=5, Read/Write Quorum, gc grace period=20. If a node goes down, 1/20 th of data for which the failed node was responsible(owner) cant be repaired as 1 out of 5 replicas is down. This will put entire system health in question just because of single node failure. ThanksAnuj Sent from Yahoo Mail on Android On Tue, 19 Jan, 2016 at 11:12 pm, Anuj Wadehra wrote: Hi Tyler, I think the scenario needs some correction. 20 node clsuter, RF=5, Read/Write Quorum, gc grace period=20. If a node goes down, repair -pr would fail on 4 nodes maintaining replicas and full repair would fail on even greater no.of number of nodes but not 19. Please confirm. Anyways the system health would get impacted as multiple nodes are not repairing with a single node failure. ThanksAnujSent from Yahoo Mail on Android On Tue, 19 Jan, 2016 at 10:48 pm, Anuj Wadehra wrote: There is a JIRA Issue https://issues.apache.org/jira/browse/CASSANDRA-10446 . But its open with Minor prority and type as Improvement. I think its a very valid concern for all and especially for users who have bigger clusters. More of an issue related with Design decision rather than an improvement. Can we change its priority so that it gets appropriate attention? ThanksAnuj On Tue, 19 Jan, 2016 at 10:35 pm, Tyler Hobbs wrote: On Tue, Jan 19, 2016 at 10:44 AM, Anuj Wadehra wrote: Consider a scenario where I have a 20 node clsuter, RF=5, Read/Write Quorum, gc grace period=20. My cluster is fault tolerant and it can afford 2 node failure. Suddenly, one node goes down due to some hardware issue. Its 10 days since my node is down, none of the 19 nodes are being repaired and now its decision time. I am not sure how soon issue would be fixed may be 8 days before gc grace, so I shouldnt remove node early and add node back as it would cause unnecessary streaming. At the same time, if I dont remove the failed node, my entire system health would be in question and it would be a panic situation as no data got repaired in last 10 days and gc grace is approaching. I need sufficient time to repair 19 nodes. What looked like a fault tolerant system which can afford 2 node failure, required urgent attention and manual decision making when a single node went down. Why cant we just go ahead and repair remaining replicas if some replicas are down? If failed node comes up before gc grace period, we would run repair to fix inconsistencies and otheriwse we would discard data and bootstrap. I think that would be a really robust fault tolerant system. That makes sense. It seems like having the option to ignore down replicas during repair could be at least somewhat helpful, although it may be tricky to decide how this should interact with incremental repairs. If there isn't a jira ticket for this already, can you open one with the scenario above? -- Tyler Hobbs DataStax
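To put rough numbers on the corrected scenario above, here is a small standalone model. It assumes a single DC, a single token per node and SimpleStrategy-style placement where each primary range is replicated on the next RF-1 nodes clockwise; this is a simplifying assumption, not how NetworkTopologyStrategy or vnodes actually place replicas. It counts how many live nodes cannot complete repair -pr because the down node holds a replica of their primary range, which matches the earlier 4-node correction.

import java.util.*;

// Simplified model: 20 nodes, RF=5, SimpleStrategy-like placement; NOT Cassandra source.
public final class DownReplicaRepairSketch
{
    public static void main(String[] args)
    {
        int nodes = 20, rf = 5, downNode = 7;
        List<Integer> blocked = new ArrayList<>();
        for (int primary = 0; primary < nodes; primary++)
        {
            if (primary == downNode)
                continue; // the down node cannot run repair at all
            // Node 'primary''s range is replicated on itself and the next rf-1 nodes clockwise.
            for (int r = 0; r < rf; r++)
            {
                if ((primary + r) % nodes == downNode)
                {
                    blocked.add(primary);
                    break;
                }
            }
        }
        // Prints the rf-1 live nodes whose "repair -pr" would abort: [3, 4, 5, 6]
        System.out.println("Live nodes whose repair -pr aborts: " + blocked);
    }
}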
Re: 3.1 status?
I agree with the thought of not recommending any production ready version. If something is not production ready, it should ideally be release candidate and when GA happens, it should implicitly mean stable as it is assumed that the GA is only done for production ready releases. ThanksAnuj Sent from Yahoo Mail on Android On Wed, 20 Jan, 2016 at 11:03 am, Jonathan Ellis wrote: On Tue, Jan 19, 2016 at 11:17 PM, Jack Krupansky wrote: > It's great to see clear support status marked on the 3.0.x and 2.x releases > on the download page now. A couple more questions... > > 1. What is the support and stability status of 3.1 and 3.2 (as opposed to > 3.2.1)? Are they "for non-production development only"? Are they considered > "stable"? The page should say. > I disagree that the page should make a recommendation here, but see below. 2. Is there simply no "stable" release for 3.x, or is the latest tick-tock > release by definition considered "stable"? > If you want to have that mental box, then I would put the most recent bug fix release in it. (3.1.1 will be going back on the download page soon; removing it was an oversight.) > 3. The first paragraph says "If a critical bug is found, a patch will be > released against the most recent bug fix release", but in fact the latest > critical patch (3.2.1) is against a feature release, not a bug fix release. > Should that simply say "... against the most recent tick-tock release" > regardless of whether it was an even (feature) or odd (bug fix) release? > Case by case basis. In this instance, the bug that prompted the release was a new regression, so there was no need to patch 3.1. (And no, I don't want to belabor the syntax on the download page to spell this out in minute detail.) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
EOL for COMPACT STORAGE
Hi, Is there any plan to completely phase out the COMPACT STORAGE table format such that backward compatibility is broken in a future release?

Thanks Anuj
Re: EOL for COMPACT STORAGE
I would appreciate it if someone from the dev team could reply.

Thanks Anuj

Sent from Yahoo Mail on Android

On Sun, 31 Jan, 2016 at 7:23 pm, Anuj Wadehra wrote:

Hi, Is there any plan to completely phase out the COMPACT STORAGE table format such that backward compatibility is broken in a future release?

Thanks Anuj
Criteria for upgrading to 3.x releases in PROD
Hi, The tick-tock release strategy in 3.x was a good initiative to ensure frequent & stable releases. While odd releases are supposed to get all the bug fixes and should be the most stable, many people like me, who got used to the comforting "production ready/stable" tag on the Apache website, are still reluctant to take the latest 3.x odd releases into production. I think the hesitation is somewhat justified, as processes often take time to mature. So here I would like to ask the experts, the people who know the ground situation and who actively develop and manage it: considering the current scenario, what should be a reasonable criterion for taking 3.x releases into production?

Thanks Anuj
Re: Criteria for upgrading to 3.x releases in PROD
Can someone help me with this one? ThanksAnuj Sent from Yahoo Mail on Android On Sun, 10 Apr, 2016 at 11:07 PM, Anuj Wadehra wrote: Hi, Tick-Tock release strategy in 3.x was a good intiative to ensure frequent & stable releases. While odd releases are supposed to get all the bug fixes and should be most stable, many people like me, who got used to the comforting "production ready/stable" tag on Apache website, are still reluctant to take latest 3.x odd releases into production. I think the hesitation is somewhat justified as processes often take time to mature. So here I would like to ask the experts, people who know the ground situation, people who actively develop it and manage it. Considering the current scenario, What should be a resonable criteria for taking 3.x releases in production? ThanksAnuj
Re: Criteria for upgrading to 3.x releases in PROD
Hi All,

For the last several months, the "most stable version" question pops up on the user mailing list and people get all sorts of responses/suggestions:

If you are conservative go for x, if adventurous y.
If you have a good risk appetite go for x, else y.
If you want features go for x, else y.

Unfortunately, all of the above responses don't help many users; they only reinforce the low confidence in the latest releases. Who wants to be adventurous in production? Who wants to test his risk appetite in production? And who would want features over stability in production? Not many, I am sure.

So my question is: would it be a wise decision to mention the "most stable/production ready" version (as it used to be before 3.x) on the Apache website till the tick-tock release strategy evolves and matures? That will somewhat contradict the tick-tock philosophy of stable odd releases, but it would be more realistic, as every big change needs time to stabilise. It is slightly unfair if users are kept in a confused state till the strategy matures and starts delivering solid stable builds. I think the question is more appropriate for the dev list, so I have kept it here.

Thanks Anuj

Sent from Yahoo Mail on Android

On Mon, 11 Apr, 2016 at 11:39 PM, Aleksey Yeschenko wrote:

The answer will depend on how conservative you are.

The most conservative choice overall would be to go with the 2.2.x line.

3.0.x if you want the new nice and shiny 3.0 things, but can tolerate some risk (the branch has a lot of relatively new core code, and hasn't yet been tried out by as many users as the 2.x branch had).

The latest odd 3.x if you want the shiniest (3.5 to be released soon, with features like the new SASI secondary indexes support). Also, there hasn't yet been that much divergence between 3.0.x and 3.x, so risk levels are around the same, so long as you limit yourself to only the features present in 3.0.x.

Either way, make sure to properly test whatever release you go for in staging first, as Michael says, and you'll be alright.

--
AY

On 11 April 2016 at 18:42:31, Anuj Wadehra (anujw_2...@yahoo.co.in.invalid) wrote:

Can someone help me with this one?

Thanks Anuj

Sent from Yahoo Mail on Android

On Sun, 10 Apr, 2016 at 11:07 PM, Anuj Wadehra wrote:

Hi, The tick-tock release strategy in 3.x was a good initiative to ensure frequent & stable releases. While odd releases are supposed to get all the bug fixes and should be the most stable, many people like me, who got used to the comforting "production ready/stable" tag on the Apache website, are still reluctant to take the latest 3.x odd releases into production. I think the hesitation is somewhat justified, as processes often take time to mature. So here I would like to ask the experts, the people who know the ground situation and who actively develop and manage it: considering the current scenario, what should be a reasonable criterion for taking 3.x releases into production?

Thanks Anuj
Re: Criteria for upgrading to 3.x releases in PROD
I am sorry but here, I am not expecting thousands to decide a stable version for my use case. I have a serious question about publishing some info on the Apache website. As dev list has active contributors, I posted it here. If not this forum, Whats the best way to put your suggestions regarding Apache content and initiate a meaningful and conclusive discussion thread? ThanksAnuj Sent from Yahoo Mail on Android On Mon, 18 Apr, 2016 at 11:27 PM, Michael Kjellman wrote: This is best for the users list. Test the releases yourself and then decide when it's ready for your use case, ops team, and organization. This is a personal decision and not one for *thousands* of others on this mailing list to make for you. best, kjellman > On Apr 18, 2016, at 10:54 AM, Anuj Wadehra > wrote: > > Hi All, > For last several months, the "most stable version" question pops up on the > user mailing list and then people get all sorts of responses/suggestions.. > If you are conservative go for x if adventurous y.. > If you have good risk appetite go for x else y.. > If you want features go for x else y.. > > Unfortunately, all above responses dont help many users..but only reinforce > the low confidence in latest releases.Who wants to be adventurous in > Production? Who wants to test his risk appetite in Production? And who would > want features for stability in Production? Not many..I am sure. > So my question is: > Would it be a wise decision to mention the "most stable/production ready" > version (as it used to be before 3.x) on the Apache website till tick-tock > release strategy evolves and matures? > That will somewhat contradict the tick-tock philosphy of stable odd releases >but would be more realistic as every big change needs time to stabilise. Its >slightly unfair, if users are kept in confused state till the strategy matures >and starts delivering solid stable builds. > I think the question is more appropriate in dev list so I have kept it here. > ThanksAnuj > Sent from Yahoo Mail on Android > > On Mon, 11 Apr, 2016 at 11:39 PM, Aleksey Yeschenko >wrote: The answer will depend on how conservative you are. > > The most conservative choice overall would be to go with the 2.2.x line. > > 3.0.x if you want to the new nice and shiny 3.0 things, but can tolerate some > risk (the branch has a lot of relatively new core code, and hasn’t yet been > tried out by as many users as the 2.x branch had). > > The latest odd 3.x if you want the shiniest (3.5 to be released soon, with > features like the new SASI secondary indexes support). Also, there hasn’t yet > been that much divergence between 3.0.x and 3.x, so risk levels are around > the same, so long as you limit yourself to only the features present in 3.0.x. > > Either way, make sure to properly test whatever release you go for in staging > first, as Michael says, and you’ll be alright. > > -- > AY > > On 11 April 2016 at 18:42:31, Anuj Wadehra (anujw_2...@yahoo.co.in.invalid) > wrote: > > Can someone help me with this one? > ThanksAnuj > > Sent from Yahoo Mail on Android > > On Sun, 10 Apr, 2016 at 11:07 PM, Anuj Wadehra wrote: > Hi, > Tick-Tock release strategy in 3.x was a good intiative to ensure frequent & > stable releases. While odd releases are supposed to get all the bug fixes and > should be most stable, many people like me, who got used to the comforting > "production ready/stable" tag on Apache website, are still reluctant to take > latest 3.x odd releases into production. 
I think the hesitation is somewhat > justified as processes often take time to mature. > So here I would like to ask the experts, people who know the ground > situation, people who actively develop it and manage it. Considering the > current scenario, What should be a resonable criteria for taking 3.x releases > in production? > > > ThanksAnuj > > > > >
Re: Criteria for upgrading to 3.x releases in PROD
Hi All, Let me reiterate, my question is not about selecting right Cassandra for me. The intent is to get dev community response on below question. Question: Would it be a wise decision to mention the "most stable/production ready" version (as it used to be before 3.x) on the Apache website till tick-tock release strategy evolves and matures? Drivers for posting above info on website: I have read all the posts/forums and realized that there is no absolute answer for selecting Production Ready Cassandra version one should use..Even now, people often hesitate to recommend latest releases for Prod and go back to 2.1 and 2.2..In every suggestion there are too many ifs..like I said...if you want features x..if u want rock solid y..if you are adventurous zno offense but who would not want a rock solid version for Production? Who would want features for stability in Prod? And who would want to take risks in Prod? The stability of a release should NOT depend my risk appetite and use case..if some version of 2.1 or 2.2 or 3.0.x is stable for production why not put that info until tick-tock matures? Please realize that everyone goes for thorough testing before upgrading but the scope of application testing cant uncover most critical bugs..Community guidance and a bigger picture on stability can help the community until tick-tock matures and we deliver stable production ready releases. ThanksAnuj Sent from Yahoo Mail on Android On Tue, 19 Apr, 2016 at 3:01 AM, Carlos Rolo wrote: My blog post regarding this: https://www.pythian.com/blog/cassandra-version-production/ There is a choice for everyone, and explained. Regards, Carlos Juzarte Rolo Cassandra Consultant / Datastax Certified Architect / Cassandra MVP Pythian - Love your data rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>* Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649 www.pythian.com On Mon, Apr 18, 2016 at 7:12 PM, Anuj Wadehra < anujw_2...@yahoo.co.in.invalid> wrote: > I am sorry but here, I am not expecting thousands to decide a stable > version for my use case. I have a serious question about publishing some > info on the Apache website. As dev list has active contributors, I posted > it here. If not this forum, Whats the best way to put your suggestions > regarding Apache content and initiate a meaningful and conclusive > discussion thread? > > ThanksAnuj > > Sent from Yahoo Mail on Android > > On Mon, 18 Apr, 2016 at 11:27 PM, Michael Kjellman< > mkjell...@internalcircle.com> wrote: This is best for the users list. > Test the releases yourself and then decide when it's ready for your use > case, ops team, and organization. This is a personal decision and not one > for *thousands* of others on this mailing list to make for you. > > best, > kjellman > > > On Apr 18, 2016, at 10:54 AM, Anuj Wadehra > wrote: > > > > Hi All, > > For last several months, the "most stable version" question pops up on > the user mailing list and then people get all sorts of > responses/suggestions.. > > If you are conservative go for x if adventurous y.. > > If you have good risk appetite go for x else y.. > > If you want features go for x else y.. > > > > Unfortunately, all above responses dont help many users..but only > reinforce the low confidence in latest releases.Who wants to be adventurous > in Production? Who wants to test his risk appetite in Production? And who > would want features for stability in Production? Not many..I am sure. 
> > So my question is: > > Would it be a wise decision to mention the "most stable/production > ready" version (as it used to be before 3.x) on the Apache website till > tick-tock release strategy evolves and matures? > > That will somewhat contradict the tick-tock philosphy of stable odd > releases but would be more realistic as every big change needs time to > stabilise. Its slightly unfair, if users are kept in confused state till > the strategy matures and starts delivering solid stable builds. > > I think the question is more appropriate in dev list so I have kept it > here. > > ThanksAnuj > > Sent from Yahoo Mail on Android > > > > On Mon, 11 Apr, 2016 at 11:39 PM, Aleksey Yeschenko > wrote: The answer will depend on how conservative you are. > > > > The most conservative choice overall would be to go with the 2.2.x line. > > > > 3.0.x if you want to the new nice and shiny 3.0 things, but can tolerate > some risk (the branch has a lot of relatively new core code, and hasn’t yet > been tried out by as many users as the 2.x branch had). > > > > The latest odd 3.x if you want the shiniest (3.5 to be released soon, >
Re: Criteria for upgrading to 3.x releases in PROD
Jonathan, I understand you point. In my perspective, people in production usually prefer stability over features and would always want at least emergency fix releases if not fully supported versions.I am glad that today we have such releases which are very stable and not yet EOL. Its just that users are tempted to use latest odd releases as per the tick-tock strategy highlighted on the website and then probably fallback to previous ones after discussing stable versions on various forums. I just wanted to make their decisions simpler :) I agree with you - Every thing cant be white and black..stable and unstable..At the same..I feel.. most of the time there would be a single stable release which is not EOL. Thanks for your time. Anuj Sent from Yahoo Mail on Android On Tue, 19 Apr, 2016 at 7:06 AM, Jonathan Ellis wrote: Anuj, The problem is that this question defies a simplistic answer like "version X is the most stable" (are you willing to use unsupported releases? what about emergency-fix-only? what features can you not live without?) so we're intentionally resisting the urge to oversimplify the situation. On Mon, Apr 18, 2016 at 8:25 PM, Anuj Wadehra < anujw_2...@yahoo.co.in.invalid> wrote: > Hi All, > Let me reiterate, my question is not about selecting right Cassandra for > me. The intent is to get dev community response on below question. > Question: > Would it be a wise decision to mention the "most stable/production > ready" version (as it used to be before 3.x) on the Apache website till > tick-tock release strategy evolves and matures? > > Drivers for posting above info on website: > I have read all the posts/forums and realized that there is no absolute > answer for selecting Production Ready Cassandra version one should > use..Even now, people often hesitate to recommend latest releases for Prod > and go back to 2.1 and 2.2..In every suggestion there are too many > ifs..like I said...if you want features x..if u want rock solid y..if you > are adventurous zno offense but who would not want a rock solid > version for Production? Who would want features for stability in Prod? And > who would want to take risks in Prod? > The stability of a release should NOT depend my risk appetite and use > case..if some version of 2.1 or 2.2 or 3.0.x is stable for production why > not put that info until tick-tock matures? > > Please realize that everyone goes for thorough testing before upgrading > but the scope of application testing cant uncover most critical > bugs..Community guidance and a bigger picture on stability can help the > community until tick-tock matures and we deliver stable production ready > releases. > > > > ThanksAnuj > Sent from Yahoo Mail on Android > > On Tue, 19 Apr, 2016 at 3:01 AM, Carlos Rolo wrote: > My blog post regarding this: > > https://www.pythian.com/blog/cassandra-version-production/ > > There is a choice for everyone, and explained. > > Regards, > > Carlos Juzarte Rolo > Cassandra Consultant / Datastax Certified Architect / Cassandra MVP > > Pythian - Love your data > > rolo@pythian | Twitter: @cjrolo | Linkedin: * > linkedin.com/in/carlosjuzarterolo > <http://linkedin.com/in/carlosjuzarterolo>* > Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649 > www.pythian.com > > On Mon, Apr 18, 2016 at 7:12 PM, Anuj Wadehra < > anujw_2...@yahoo.co.in.invalid> wrote: > > > I am sorry but here, I am not expecting thousands to decide a stable > > version for my use case. I have a serious question about publishing some > > info on the Apache website. 
As dev list has active contributors, I posted > > it here. If not this forum, Whats the best way to put your suggestions > > regarding Apache content and initiate a meaningful and conclusive > > discussion thread? > > > > ThanksAnuj > > > > Sent from Yahoo Mail on Android > > > > On Mon, 18 Apr, 2016 at 11:27 PM, Michael Kjellman< > > mkjell...@internalcircle.com> wrote: This is best for the users list. > > Test the releases yourself and then decide when it's ready for your use > > case, ops team, and organization. This is a personal decision and not one > > for *thousands* of others on this mailing list to make for you. > > > > best, > > kjellman > > > > > On Apr 18, 2016, at 10:54 AM, Anuj Wadehra > > wrote: > > > > > > Hi All, > > > For last several months, the "most stable version" question pops up on > > the user mailing list and then people get all sorts of > > responses/suggestions.. > > > If you are conservative go for x if adventurous y.. > > > If you have good risk appetite go
Re: Criteria for upgrading to 3.x releases in PROD
Jack, The question was about publishing "most stable" release on Apache website as it done before 3.x. Regarding your comments, I still feel adventure cant happen in production systems. And you should certainly test every release before upgrading but you woulf not like to upgrade to latest releases based on your limited testing. I feel that you cant do exhaustive testing of the database and can easily miss critical corner cases which may trigger in production. But its just my perspective of looking at things. People may think differently. Thanks All of you for your comments !! ThanksAnuj Sent from Yahoo Mail on Android On Sun, 24 Apr, 2016 at 1:28 AM, Jack Krupansky wrote: Is the question whether a new application can go into production with 3.x, or whether an existing application in production with 2.x.y should be upgraded to 3.x? For the latter, a "If it ain't broke, don't fix it" philosophy is best. And if there are critical bug fixes needed, simply upgrade the 2.x line that you are already on. Or if your production is on 3.0.x, upgrade to 3.0.x+k. For the former, we aren't hearing people hollering that 3.x is crap, so it is reasonably safe for a new app going into production, subject to your own testing. Given the relative stability of 3.x due to the tick-tock and "trunk always releasable" strategies, users are no longer faced with the kind of wild instabilities of the past. Ultimately, stability really is subjective and in the eye of the beholder - how conservative or adventurous are you and your organization. Sure, maybe 2.2.x is more stable in some abstract sense, but for a new app, why start so far behind the curve? In fact, for a new app you should be trying to take advantage of new features and performance improvements, like materialized views, SASI, and wide rows coming soon. In the past, upgrading from 2.x to 2.y was a big deal. That just isn't a problem with upgrading from 3.x to 3.y. At least in theory, and again, nobody has been hollering about having problems doing that. For EOL, you will have to judge for yourself how long it may take your organization to carefully migrate a production 2.x system to 3.x somewhere down the road. No need to rush, but don't wait until the last minute either. And I suspect that you won't even want to think about upgrading 2.x to 4.x - IOW, upgrade to 3.x well before 3.x EOL. -- Jack Krupansky On Sat, Apr 23, 2016 at 3:28 PM, Anuj Wadehra < anujw_2...@yahoo.co.in.invalid> wrote: > Jonathan, > I understand you point. In my perspective, people in production usually > prefer stability over features and would always want at least emergency fix > releases if not fully supported versions.I am glad that today we have such > releases which are very stable and not yet EOL. Its just that users are > tempted to use latest odd releases as per the tick-tock strategy > highlighted on the website and then probably fallback to previous ones > after discussing stable versions on various forums. I just wanted to make > their decisions simpler :) I agree with you - Every thing cant be white and > black..stable and unstable..At the same..I feel.. most of the time there > would be a single stable release which is not EOL. > Thanks for your time. > > > Anuj > Sent from Yahoo Mail on Android > > On Tue, 19 Apr, 2016 at 7:06 AM, Jonathan Ellis > wrote: Anuj, > > The problem is that this question defies a simplistic answer like "version > X is the most stable" (are you willing to use unsupported releases? what > about emergency-fix-only? 
what features can you not live without?) so > we're intentionally resisting the urge to oversimplify the situation. > > On Mon, Apr 18, 2016 at 8:25 PM, Anuj Wadehra < > anujw_2...@yahoo.co.in.invalid> wrote: > > > Hi All, > > Let me reiterate, my question is not about selecting right Cassandra for > > me. The intent is to get dev community response on below question. > > Question: > > Would it be a wise decision to mention the "most stable/production > > ready" version (as it used to be before 3.x) on the Apache website till > > tick-tock release strategy evolves and matures? > > > > Drivers for posting above info on website: > > I have read all the posts/forums and realized that there is no absolute > > answer for selecting Production Ready Cassandra version one should > > use..Even now, people often hesitate to recommend latest releases for > Prod > > and go back to 2.1 and 2.2..In every suggestion there are too many > > ifs..like I said...if you want features x..if u want rock solid y..if you > > are adventurous zno offense but who would not want a rock solid > > version for Production? Who would want features for stability in Prod? >
Re: [Proposal] Mandatory comments
I think this is very basic. If its not followed till now, we should do that now on.Just a suggestion. So,there should be a rule and may be a code review checklist point to verify that "quality" of comments is ok and comments are sufficient. Regarding, high level comments, I feel that they are wonderful and often you can effortlessly get the low level design by reading them. The only drawback is their maintenance. Maintenance of "big picture" when code starts drifiting from it is tough. You will make an important change in class X and should take care that the big picture is updated at some other place. One may debate that such instances would be rare. ThanksAnuj Sent from Yahoo Mail on Android On Tue, 3 May, 2016 at 12:26 AM, Sylvain Lebresne wrote: On Mon, May 2, 2016 at 7:16 PM, Jonathan Ellis wrote: > What I'd like to see is more comments like the one in StreamSession: > something that can give me the "big picture" for a piece of functionality. > I wholeheartedly agree that we need more of those. I don't think though that we need a single kind of comment, nor even that we're lacking on a single kind. > > I wonder if focusing on class-based comments might miss an opportunity > here. I don't meant to imply any exclusive focusing by my suggestion. I'm constantly seeing classes that not well explained and methods that make complex and undocumented assumptions, so I'm very much convinced improvements are needed on that front. Without, again, invalidating the equal need for big picture comments. > > Is this a case for package-level javadoc, and organizing our class > hierarchy better along those lines? > I agree that this would probably be the best place for those bit-picture documentation. I'd be totally fine adding on top of the rule I suggested another one that says: - If you create a new package, you should have a package level javadoc that describe the big picture of what that package is about. I do want to note that I'm trying to focus the discussion here on a few simple concrete points we could hopefully easily agree on so that we improve our ways moving forward and I'd personally love to focus on that first. That won't fix existing code by itself, but the optimistic in me hopes that if we get more consistent quality of comments in new code, our inconfort with the lack of comments in old code will grow and we'll end up fixing it naturally over time. -- Sylvain > > On Mon, May 2, 2016 at 11:26 AM, Sylvain Lebresne > wrote: > > > There could be disagreement on this, but I think that we are in general > not > > very good at commenting Cassandra code and I would suggest that we make a > > collective effort to improve on this. And to help with that goal, I would > > like > > to suggest that we add the following rule to our code style guide > > (https://wiki.apache.org/cassandra/CodeStyle): > > - Every public class and method must have a quality javadoc. That > > javadoc, on > > top of describing what the class/method does, should call particular > > attention to the assumptions that the class/method does, if any. > > > > And of course, we'd also start enforcing that rule by not +1ing code > unless > > it > > adheres to this rule. > > > > Note that I'm not pretending this arguably simplistic rule will magically > > make > > comments perfect, it won't. It's still up to us to write good and > complete > > comments, and it doesn't even talk about comments within methods that are > > important too. But I think some basic rule here would be beneficial and > > that > > one feels sensible to me. 
> > > > Looking forward to other's opinions and feedbacks on this proposal. > > > > -- > > Sylvain > > > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder, http://www.datastax.com > @spyced >
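For concreteness, the kind of comment the proposal asks for might look like the sketch below. The class, method and package names are invented for illustration and are not actual Cassandra code; the point is only the shape of the javadoc (behaviour plus assumptions) and the package-level "big picture" comment.

    // Hypothetical illustration of the proposed rule: class- and method-level javadoc
    // that states behaviour and assumptions, plus a package-level "big picture" comment.

    /**
     * Tracks the state of a single streaming session with one peer.
     *
     * Assumptions: instances are confined to a single thread, and {@link #complete()}
     * is called exactly once, after every file transfer has been acknowledged.
     */
    public class StreamSessionState
    {
        /**
         * Marks the session complete and releases its resources.
         *
         * Assumes all transfers have already been acknowledged; calling this while
         * transfers are still in flight leaves the session in an undefined state.
         */
        public void complete()
        {
            // ...
        }
    }

    // The "big picture" documentation would live in a package-info.java file, e.g.:
    /**
     * High-level description of what a streaming session is, the states it moves
     * through, and which classes own which responsibilities.
     */
    package org.apache.cassandra.streaming.example;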
Possible Bug: bucket_low has no effect in STCS
Hi,

I am trying to understand the algorithm of STCS. As per my current understanding of the code, there seems to be no impact of setting bucket_low in the STCS compaction algorithm. Moreover, I see some scope for optimization. I would appreciate it if some designer could correct me or confirm that it's a bug, so that I can raise a JIRA.

Details -- The getBuckets() method of SizeTieredCompactionStrategy sorts sstables by size in ascending order and then iterates over them one by one to associate them with an existing/new bucket. When iterating sstables in ascending order of size, I can't find ANY single scenario where the current sstable in the outer loop iteration is below the oldAverageSize of any existing bucket. The current sstable being iterated will ALWAYS be greater than or equal to the oldAverageSize of ALL existing buckets, as ALL previous sstables in existing buckets were smaller than or equal in size to the sstable being iterated. So, there is NO scenario where size > (oldAverageSize * bucketLow) and size < oldAverageSize, so the bucket_low property never comes into play, no matter what value you set for it.

Also, while iterating over sstables (sortedfiles) by size in ascending order, there is no point iterating over all existing buckets. We could just start from the LAST bucket where the previous sstable was associated. The oldAverageSize of ALL other buckets will NEVER allow the sstable being iterated.

for (Entry<Long, List<T>> entry : buckets.entrySet()) {...}

Thanks
Anuj
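For illustration, below is a minimal Java sketch of the bucketing loop described above. It is simplified and uses invented names; it is not the actual SizeTieredCompactionStrategy source. Because the sizes arrive in ascending order, every candidate size is already greater than or equal to each bucket's running average, so the bucketLow half of the condition never decides anything: running the sketch with bucketLow = 0.5 and bucketLow = 0.9 produces identical buckets.

    import java.util.*;

    // Simplified sketch of an STCS-style getBuckets() (illustrative only): sstable sizes
    // are sorted ascending and matched against each bucket's running average size.
    public class StcsBucketSketch
    {
        static List<List<Long>> getBuckets(List<Long> sizes, double bucketLow, double bucketHigh, long minSSTableSize)
        {
            Collections.sort(sizes); // ascending, as in the real implementation
            Map<Long, List<Long>> buckets = new HashMap<>(); // key = current average size of the bucket
            for (long size : sizes)
            {
                boolean placed = false;
                for (Map.Entry<Long, List<Long>> entry : new ArrayList<>(buckets.entrySet()))
                {
                    long oldAverageSize = entry.getKey();
                    // Because sizes are iterated in ascending order, size >= oldAverageSize always
                    // holds, so the "size > oldAverageSize * bucketLow && size < oldAverageSize"
                    // part of this condition can never be the reason a bucket is chosen.
                    if ((size > oldAverageSize * bucketLow && size < oldAverageSize * bucketHigh)
                        || (size < minSSTableSize && oldAverageSize < minSSTableSize))
                    {
                        List<Long> bucket = entry.getValue();
                        bucket.add(size);
                        long newAverageSize = (oldAverageSize * (bucket.size() - 1) + size) / bucket.size();
                        buckets.remove(oldAverageSize);
                        buckets.put(newAverageSize, bucket);
                        placed = true;
                        break;
                    }
                }
                if (!placed)
                {
                    List<Long> bucket = new ArrayList<>();
                    bucket.add(size);
                    buckets.put(size, bucket);
                }
            }
            return new ArrayList<>(buckets.values());
        }

        public static void main(String[] args)
        {
            List<Long> sizes = Arrays.asList(100L, 110L, 120L, 400L, 420L, 2000L);
            System.out.println(getBuckets(new ArrayList<>(sizes), 0.5, 1.5, 50L));
            System.out.println(getBuckets(new ArrayList<>(sizes), 0.9, 1.5, 50L)); // same buckets
        }
    }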
Re: Possible Bug: bucket_low has no effect in STCS
Can any developer confirm the issue? ThanksAnuj Sent from Yahoo Mail on Android On Mon, 13 Jun, 2016 at 11:15 PM, Anuj Wadehra wrote: Hi, I am trying to understand the algorithm of STCS. As per my current understanding of the code, there seems to be no impact of setting bucket_low in the STCS compaction algorithm. Moreover, I see some optimization. I would appreciate if some designer can correct me or confirm that it's a bug sonthat I can raise a JIRA. Details -- getBuckets() method of SizeTieredCompactionStrategy sorts sstables by size in ascending order and then iterates over them one by one to associate them to an existing/new bucket. When, iterating sstables in ascending order of size, I can't find ANY single scenario where the current sstable in the outer loop iteration is below the oldAverageSize of any existing bucket. Current sstable being iterated will ALWAYS be greater than/equal to the oldAverageSize of ALL existing buckets as ALL previous sstables in existing buckets were smaller/equal in size to the sstable being iterated. So, there is NO scenario when size > (oldAverageSize * bucketLow) and size < oldAverageSize, so bucket_low property never comes into play no matter what value you set for it. Also, while iteraitng over sstables (sortedfiles) by size in ascending order, there is no point iterating over all existing buckets. We could just start from the LAST bucket where previous sstable was associated. oldAverageSize of ALL other buckets will NEVER allow the sstable being iterated. for (Entry> entry : buckets.entrySet()) {...} Thanks Anuj
Re: Possible Bug: bucket_low has no effect in STCS
Should I raise JIRA ?? Or some develiper with knowledge of STCS could confirm the bug ?? Anuj Sent from Yahoo Mail on Android On Tue, 14 Jun, 2016 at 12:52 PM, Anuj Wadehra wrote: Can any developer confirm the issue? ThanksAnuj Sent from Yahoo Mail on Android On Mon, 13 Jun, 2016 at 11:15 PM, Anuj Wadehra wrote: Hi, I am trying to understand the algorithm of STCS. As per my current understanding of the code, there seems to be no impact of setting bucket_low in the STCS compaction algorithm. Moreover, I see some optimization. I would appreciate if some designer can correct me or confirm that it's a bug sonthat I can raise a JIRA. Details -- getBuckets() method of SizeTieredCompactionStrategy sorts sstables by size in ascending order and then iterates over them one by one to associate them to an existing/new bucket. When, iterating sstables in ascending order of size, I can't find ANY single scenario where the current sstable in the outer loop iteration is below the oldAverageSize of any existing bucket. Current sstable being iterated will ALWAYS be greater than/equal to the oldAverageSize of ALL existing buckets as ALL previous sstables in existing buckets were smaller/equal in size to the sstable being iterated. So, there is NO scenario when size > (oldAverageSize * bucketLow) and size < oldAverageSize, so bucket_low property never comes into play no matter what value you set for it. Also, while iteraitng over sstables (sortedfiles) by size in ascending order, there is no point iterating over all existing buckets. We could just start from the LAST bucket where previous sstable was associated. oldAverageSize of ALL other buckets will NEVER allow the sstable being iterated. for (Entry> entry : buckets.entrySet()) {...} Thanks Anuj
Re: [VOTE] Release Apache Cassandra 3.8
Hi Michael,

Just found an issue in CHANGES.txt: the entry "Add cross-DC latency metrics (CASSANDRA-11596)" should be "Track message latency across DCs (CASSANDRA-11569)".

Thanks
Anuj

Sent from Yahoo Mail on Android

On Thu, 21 Jul, 2016 at 3:18 AM, Michael Shuler wrote: I propose the following artifacts for release as 3.8. sha1: c3ded0551f538f7845602b27d53240cd8129265c Git: http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.8-tentative Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1123/org/apache/cassandra/apache-cassandra/3.8/ Staging repository: https://repository.apache.org/content/repositories/orgapachecassandra-1123/ The debian packages are available here: http://people.apache.org/~mshuler/ The vote will be open for 72 hours (longer if needed). [1]: http://goo.gl/oGNH0i (CHANGES.txt) [2]: http://goo.gl/KjMtUn (NEWS.txt) [3]: https://goo.gl/TxVLKo (3.8 Test Summary)
Re: A proposal to move away from Jira-centric development
Hi,

I think tracking things in a single tool would be better than splitting them across the mailing list and JIRA. To make feature JIRAs easier to comprehend, we could close every JIRA discussion with an attached design proposal (mandatory). Once the design is frozen and complete, one can start with the implementation. I am not sure which JIRA customizations are possible, but it would be good if we could customize JIRA tickets to keep discussions isolated from the approved design (within a single JIRA ticket). I personally find it tough to go through long JIRA discussions just to understand the final design concluded for a problem/feature. Discussing initial thoughts about pain areas, improvements etc. can be done on the dev mailing list.

Thanks
Anuj

On Mon, 15 Aug, 2016 at 7:52 PM, Jonathan Ellis wrote: A long time ago, I was a proponent of keeping most development discussions on Jira, where tickets can be self contained and the threadless nature helps keep discussions from getting sidetracked. But Cassandra was a lot smaller then, and as we've grown it has become necessary to separate out the signal (discussions of new features and major changes) from the noise of routine bug reports. I propose that we take advantage of the dev list to perform that separation. Major new features and architectural improvements should be discussed first here, then when consensus on design is achieved, moved to Jira for implementation and review. I think this will also help with the problem when the initial idea proves to be unworkable and gets revised substantially later after much discussion. It can be difficult to figure out what the conclusion was, as review comments start to pile up afterwards. Having that discussion on the list, and summarizing on Jira, would mitigate this. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder, http://www.datastax.com @spyced
Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)
Hi,

We need to understand that time is precious for all of us. Even if a developer has intentions to contribute, he may take months to contribute his first patch, or maybe longer. Some common entry barriers are:

1. Difficult to identify low hanging fruit. 30 JIRA comments on a ticket and a newcomer is LOST, even though the exact fix may be much simpler.
2. Dead JIRA discussions with no clue on the current status of the ticket.
3. No response on newly raised JIRAs. Response time to validate/reject the problem is important. Should I pick it? Is it really a bug? Maybe some expert can confirm it first and then I can pick it up.
4. Ping-pong JIRAs: you read 10 comments on a ticket, then see duplicates and related tickets, then read 30 more comments, and so on until you land back on the same JIRA, which is not concluded yet.

Possible solutions for the above 4 points:
A. Add a new JIRA field to crisply summarize what conclusive discussion has taken place till now, what the status of the current JIRA is, the proposed/feasible solution, etc.
B. Mark low hanging fruit regularly.
C. Validate/reject newly reported JIRAs on priority. Use the dev list to validate/reject the issue before logging the JIRA?
D. Make sure that duplicates are real, proven duplicates.

5. Insufficient code comments.
Solution: Code comments should be a mandatory part of the code review checklist. It makes reviews faster and encourages people to understand the flow and fix things on their own.
6. Insufficient design documentation.
Solution: Detailed documentation for at least new features, so that people are comfortable with the design. Reading English and understanding diagrams/flows is much simpler than just jumping into the code upfront.
7. No/little formal communication on active development and the way forward.
Solution: What about a monthly summary of new/hot/critical JIRAs and new feature development (with JIRA links so that topics of interest are accessible)?

Thanks
Anuj

On Thu, 10 Nov, 2016 at 7:09 AM, Nate McCall wrote: I like the idea of a goal-based approach. I think that would make coming to a consensus a bit easier particularly if a larger number of people are involved. On Tue, Nov 8, 2016 at 8:04 PM, Dikang Gu wrote: > My 2 cents. I'm wondering is it a good idea to have some high level goals > for the major release? For example, the goals could be something like: > 1. Improve the scalability/reliability/performance by X%. > 2. Add Y new features (feature A, B, C, D...). > 3. Fix Z known issues (issue A, B, C, D...). > > I feel If we can have the high level goals, it would be easy to pick the > jiras to be included in the release. > > Does it make sense? > > Thanks > Dikang. > > On Mon, Nov 7, 2016 at 11:22 AM, Oleksandr Petrov < > oleksandr.pet...@gmail.com> wrote: > >> Recently there was another discussion on documentation and comments [1] >> >> On one hand, documentation and comments will help newcomers to familiarise >> themselves with the codebase. On the other - one may get up to speed by >> reading the code and adding some docs. Such things may require less >> oversight and can play some role in improving diversity / increasing an >> amount of involved people. >> >> Same thing with tests. There are some areas where tests need some >> refactoring / improvements, or even just splitting them from one file to >> multiple. It's a good way to experience the process and get involved into >> discussion. 
>> >> For that, we could add some issues with subtasks (just a few for starters) >> or even just a wiki page with a doc/test wishlist where everyone could add >> a couple of points. >> >> Docs and tests could be used in addition to lhf issues, helping people, >> having comprehensive and quick process and everything else that was >> mentioned in this thread. >> >> Thank you. >> >> [1] >> http://mail-archives.apache.org/mod_mbox/cassandra-dev/201605.mbox/% >> 3ccakkz8q088ojbvhycyz2_2eotqk4y-svwiwksinpt6rr9pop...@mail.gmail.com%3E >> >> On Mon, Nov 7, 2016 at 5:38 PM Aleksey Yeschenko >> wrote: >> >> > Agreed. >> > >> > -- >> > AY >> > >> > On 7 November 2016 at 16:38:07, Jeff Jirsa (jeff.ji...@crowdstrike.com) >> > wrote: >> > >> > ‘Accepted’ JIRA status seems useful, but would encourage something more >> > explicit like ‘Concept Accepted’ or similar to denote that the concept is >> > agreed upon, but the actual patch itself may not be accepted yet. >> > >> > /bikeshed. >> > >> > On 11/7/16, 2:56 AM, "Ben Slater" wrote: >> > >> > >Thanks Dave. The shepherd concept sounds a lot like I had in mind (and a >> > >better name). >> > > >> > >One other thing I noted from the Mesos process - they have an “Accepted” >> > >jira status that comes after open and means “at least one Mesos >> developer >> > >thought that the ideas proposed in the issue are worth pursuing >> further”. >> > >Might also be something to consider as part of a process like this? >> > > >> > >Cheers >> > >Ben >> > > >> > >On Mon, 7 Nov 2016 at 09:37 Dave Lester wrote: >> > > >
Re: Broader community involvement in 4.0 (WAS Re: Rough roadmap for 4.0)
Thanks for the information Jeremy. My main concern is around making JIRAs easy to understand. I am not sure how community feels about it. But, I have personally observed that long discussion thread on JIRAs is not user friendly for someone trying to understand the ticket or may be trying to contribute to the discussion/fix . I strongly feel that there should be a better way e.g. a summary field in JIRA which filters out the discussions, arguments, solutions etc.and just crisply summarizes the problem, solution under discussion and the current status. Sometimes description of the defect is not sufficient. For a new comer trying to understand a JIRA, this summary would be a good start to understand the problem upfront and then if you want to go into details, you can understand the long JIRA thread. Also, some JIRAs are in dead state and you don't get a clue what's the current status after so much discussion over the ticket? Some JIRAs are neither rejected nor validated, so even if its a bug, some people would be reluctant to pick JIRAs which have not been validated yet. ThanksAnuj On Friday, 11 November 2016 1:40 AM, Jeremy Hanna wrote: Regarding low hanging fruit, on the How To Contribute page [1] we’ve tried to keep a list of lhf tickets [2] linked to help people get started. They are usually good starting points and don’t require much context. I rarely see duplicates from lhf tickets. Regarding duplicates, in my experience those who resolve tickets as duplicates are generally pretty good. I think the safest bet to start is to look at How To Contribute page and the lhf labeled tickets. [1] https://wiki.apache.org/cassandra/HowToContribute <https://wiki.apache.org/cassandra/HowToContribute> [2] https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+=+12310865+AND+labels+=+lhf+AND+status+!=+resolved <https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+=+12310865+AND+labels+=+lhf+AND+status+!=+resolved> > On Nov 10, 2016, at 12:06 PM, Anuj Wadehra > wrote: > > > Hi, > > We need to understand that time is precious for all of us. Even if a > developer has intentions to contribute, he may take months to contribute his > first patch or may be longer. Some common entry barriers are: > 1. Difficult to identify low hanging fruits. 30 JIRA comments on a ticket and > a new comer is LOST, even though the exact fix may be much simpler. > 2. Dead JIRA discussions with no clue on the current status of the ticket. > 3. No response on new JIRAs raised. Response time to validate/reject the > problem is important. Should I pick? Is it really a bug? Maybe some expert > can confirm it first and then I can pick it.. > 4.Ping Pong JIRAs: Your read 10 comments of a ticket then see duplicates and > related ones..then read 30 more comments and then so on till you land up on > same JIRA which is not concluded yet. > Possible Solution for above 4 points: > A. Add a new JIRA field to crisply summarize what conclusive discussion has > taken place till now ,what's the status of current JIRA, proposed/feasible > solution etc. > B. Mark low hanging fruits regularly. > C. Validate/Reject newly reported JIRAs on priority. Using dev list to > validate/reject the issue before logging the JIRA?? > D. Make sure that duplicates are real proven duplicates. > > 5. Insufficient code comments. > Solution: Coding comments should be a mandatory part of code review > checklist. 
It makes reviews faster and encourage people to understand the > flow and fix things on their own. > 6. Insufficient Design documentation. > Solution:Detailed documentation for at least new features so that people are > comfortable with the design. Reading English and understanding diagrams/flows > is much simpler than just jumping into the code upfront. > 7. No/Little formal communication on active development and way forward. > Solution: What about a monthly summary of New/Hot/critical JIRAs and new > feature development (with JIRA links so that topics of interest are > accessible)? > > ThanksAnuj > > > On Thu, 10 Nov, 2016 at 7:09 AM, Nate McCall wrote: I >like the idea of a goal-based approach. I think that would make > coming to a consensus a bit easier particularly if a larger number of > people are involved. > > On Tue, Nov 8, 2016 at 8:04 PM, Dikang Gu wrote: >> My 2 cents. I'm wondering is it a good idea to have some high level goals >> for the major release? For example, the goals could be something like: >> 1. Improve the scalability/reliability/performance by X%. >> 2. Add Y new features (feature A, B, C, D...). >> 3. Fix Z known issues (issue A, B, C, D...). >> >> I feel If we can have the high
Re: Wrapping up tick-tock
Hi, Now that we are rethinking versioning and release frequency, there exists an opportunity to make life easier for Cassandra users. How often mailing lists are discussing: "Which Cassandra version is stable for production?"OR"Is x version stable?" Your release version should indicate your confidence on the stability of the release , is it a bug fix or a feature release, are there any breaking changes or not. +1 semver and alpha/beta/GA releases So that you dont find every second Cassandra user asking about the latest stable Cassandra version. Thanks Anuj On Sat, 14 Jan, 2017 at 1:04 AM, Jeff Jirsa wrote: Mick proposed it (semver) in one of the release proposals, and I dropped the ball on sending out the actual "vote on which release plan we want to use" email, because I messed up and got busy. On Fri, Jan 13, 2017 at 11:26 AM, Russell Bradberry wrote: > Has any thought been given to SemVer? > > http://semver.org/ > > -Russ > > On 1/13/17, 1:57 PM, "Jason Brown" wrote: > > It's fine to limit the minimum time between major releases to six > months, > but I do not think we should force a major just because n months have > passed. I think we should up the major only when we have significant > (possibly breaking) changes/features. It would seem odd to have a 6.0 > that's basically the same as 4.0 (in terms of features and > protocol/format > compatibility). > > Thoughts? > > On Wed, Jan 11, 2017 at 1:58 AM, Stefan Podkowinski > wrote: > > > I honestly don't understand the release cadence discussion. The 3.x > branch > > is far from production ready. Is this really the time to plan the > next > > major feature releases on top of it, instead of focusing to > stabilize 3.x > > first? Who knows how long that would take, even if everyone would > > exclusively work on bug fixing (which I think should happen). > > > > On Tue, Jan 10, 2017 at 4:29 PM, Jonathan Haddad > > wrote: > > > > > I don't see why it has to be one extreme (yearly) or another > (monthly). > > > When you had originally proposed Tick Tock, you wrote: > > > > > > "The primary goal is to improve release quality. Our current > major “dot > > > zero” releases require another five or six months to make them > stable > > > enough for production. This is directly related to how we pile > features > > in > > > for 9 to 12 months and release all at once. The interactions > between the > > > new features are complex and not always obvious. 2.1 was no > exception, > > > despite DataStax hiring a full tme test engineering team > specifically for > > > Apache Cassandra." > > > > > > I agreed with you at the time that the yearly cycle was too long > to be > > > adding features before cutting a release, and still do now. > Instead of > > > elastic banding all the way back to a process which wasn't working > > before, > > > why not try somewhere in the middle? A release every 6 months > (with > > > monthly bug fixes for a year) gives: > > > > > > 1. long enough time to stabilize (1 year vs 1 month) > > > 2. not so long things sit around untested forever > > > 3. only 2 releases (current and previous) to do bug fix support at > any > > > given time. > > > > > > Jon > > > > > > On Tue, Jan 10, 2017 at 6:56 AM Jonathan Ellis > > wrote: > > > > > > > Hi all, > > > > > > > > We’ve had a few threads now about the successes and failures of > the > > > > tick-tock release process and what to do to replace it, but they > all > > died > > > > out without reaching a robust consensus. 
> > > > > > > > In those threads we saw several reasonable options proposed, but > from > > my > > > > perspective they all operated in a kind of theoretical fantasy > land of > > > > testing and development resources. In particular, it takes > around a > > > > person-week of effort to verify that a release is ready. That > is, > > going > > > > through all the test suites, inspecting and re-running failing > tests to > > > see > > > > if there is a product problem or a flaky test. > > > > > > > > (I agree that in a perfect world this wouldn’t be necessary > because > > your > > > > test ci is always green, but see my previous framing of the > perfect > > world > > > > as a fantasy land. It’s also worth noting that this is a common > > problem > > > > for large OSS projects, not necessarily something to beat > ourselves up > > > > over, but in any case, that's our reality right now.) > > > > > > > > I submit that any process that assumes a monthly release cadence > is not > > > > realistic from a resourcing standpoint for this validation. > Notably, > > we > > > > have struggled to marshal this for 3.10 for two months now. > > > > > > > > Therefore, I suggest first that we collectively roll up our > sleeves to > > > vet > > >
Restore Snapshot
Hi,

I am curious to know how people practically use snapshot restore, given that a snapshot restore may lead to inconsistent reads until a full repair is run on the node being restored (if you have dropped mutations in your cluster).

Example:
9 am - snapshot taken on all 3 nodes
10 am - mutation dropped on node 3
11 am - snapshot restored on node 1

Now the data is only on node 2 (if we are writing at quorum), and we will observe inconsistent reads till we repair node 1.

If you restore the snapshot with join_ring=false, repair the node, and then join the restored node once the repair completes, the node will not lead to inconsistent reads, but it will miss the writes made while it is being repaired: simply booting the node with join_ring=false also stops writes from being pushed to the node (unlike bootstrap with join_ring=false, where writes are pushed to the node being bootstrapped). Thus you would need another full repair to bring the data on the node restored via snapshot in sync with the other nodes.

It's hard to believe that a simple snapshot restore scenario is still broken and people are not complaining. So, I thought of asking the community members: how do you practically use snapshot restore while addressing the read inconsistency issue?

Thanks
Anuj

Sent from Yahoo Mail on Android
Re: Restore Snapshot
I mistakenly posted this on the dev mailing list. Please ignore. Posting it on the user mailing list. :)

Thanks
Anuj

Sent from Yahoo Mail on Android

On Tue, Jun 27, 2017 at 7:01 PM, Anuj Wadehra wrote: Hi, I am curious to know how people practically use snapshot restore, given that a snapshot restore may lead to inconsistent reads until a full repair is run on the node being restored (if you have dropped mutations in your cluster). Example: 9 am - snapshot taken on all 3 nodes; 10 am - mutation dropped on node 3; 11 am - snapshot restored on node 1. Now the data is only on node 2 (if we are writing at quorum), and we will observe inconsistent reads till we repair node 1. If you restore the snapshot with join_ring=false, repair the node, and then join the restored node once the repair completes, the node will not lead to inconsistent reads, but it will miss the writes made while it is being repaired: simply booting the node with join_ring=false also stops writes from being pushed to the node (unlike bootstrap with join_ring=false, where writes are pushed to the node being bootstrapped). Thus you would need another full repair to bring the data on the node restored via snapshot in sync with the other nodes. It's hard to believe that a simple snapshot restore scenario is still broken and people are not complaining. So, I thought of asking the community members: how do you practically use snapshot restore while addressing the read inconsistency issue? Thanks Anuj Sent from Yahoo Mail on Android
URGENT: CASSANDRA-14092 causes Data Loss
Hi, For all those people who use MAX TTL=20 years for inserting/updating data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092 can silently cause irrecoverable Data Loss. This seems like a certain TOP MOST BLOCKER to me. I think the category of the JIRA must be raised to BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no one seems to be actively working on it. Just like any other critical vulnerability, this vulnerability demands immediate attention from some very experienced folks to bring out an Urgent Fast Track Patch for all currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my understanding of the JIRA comments, the changes may not be that trivial for older releases. So, community support on the patch is very much appreciated. Thanks Anuj
Re: URGENT: CASSANDRA-14092 causes Data Loss
Hi Jeremiah,

Validation is on the TTL value, not on (system_time + TTL). You can test it with the example below. The insert is successful, the overflow happens silently and the data is lost:

create table test(name text primary key,age int);
insert into test(name,age) values('test_20yrs',30) USING TTL 630720000;
select * from test where name='test_20yrs';

 name | age
------+-----

(0 rows)

insert into test(name,age) values('test_20yr_plus_1',30) USING TTL 630720001;
InvalidRequest: Error from server: code=2200 [Invalid query] message="ttl is too large. requested (630720001) maximum (630720000)"

Thanks
Anuj

On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan wrote: Where is the dataloss? Does the INSERT operation return successfully to the client in this case? From reading the linked issues it sounds like you get an error client side. -Jeremiah > On Jan 25, 2018, at 1:24 PM, Anuj Wadehra > wrote: > > Hi, > > For all those people who use MAX TTL=20 years for inserting/updating data in > production, https://issues.apache.org/jira/browse/CASSANDRA-14092 can > silently cause irrecoverable Data Loss. This seems like a certain TOP MOST > BLOCKER to me. I think the category of the JIRA must be raised to BLOCKER > from Major. Unfortunately, the JIRA is still "Unassigned" and no one seems to > be actively working on it. Just like any other critical vulnerability, this > vulnerability demands immediate attention from some very experienced folks to > bring out an Urgent Fast Track Patch for all currently Supported Cassandra > versions 2.1,2.2 and 3.x. As per my understanding of the JIRA comments, the > changes may not be that trivial for older releases. So, community support on > the patch is very much appreciated. > > Thanks > Anuj - To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org
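To see where the overflow happens, here is a small illustrative Java sketch; it is not the actual Cassandra storage-engine code, just 32-bit arithmetic showing that adding a 20-year TTL to the current time wraps past 2^31 - 1 (19 January 2038) and goes negative, which matches the silent expiry shown in the cqlsh example above.

    // Illustrative sketch (not actual Cassandra code): a 20-year TTL added to the
    // current time overflows a signed 32-bit expiration timestamp once the result
    // passes 2^31 - 1, i.e. 19 January 2038.
    public class TtlOverflowSketch
    {
        public static void main(String[] args)
        {
            int nowInSeconds = (int) (System.currentTimeMillis() / 1000L);
            int twentyYearTtl = 20 * 365 * 24 * 60 * 60;            // 630,720,000 seconds
            int localExpirationTime = nowInSeconds + twentyYearTtl; // silently wraps negative

            System.out.println("now                 = " + nowInSeconds);
            System.out.println("ttl                 = " + twentyYearTtl);
            System.out.println("localExpirationTime = " + localExpirationTime);
            // A wrapped (negative) expiration time lies in the past, so the inserted
            // row appears to be already expired, as in the cqlsh example above.
            System.out.println("overflowed          = " + (localExpirationTime < 0));
        }
    }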
Re: URGENT: CASSANDRA-14092 causes Data Loss
Hi Paulo, Thanks for looking into the issue on priority. I have serious concerns regarding reducing the TTL to 15 yrs.The patch will immediately break all existing applications in Production which are using 15+ yrs TTL. And then they would be stuck again until all the logic in Production software is modified and the software is upgraded immediately. This may take days. Such heavy downtime is generally not acceptable for any business. Yes, they will not have silent data loss but they would not be able to do any business either. I think the permanent fix must be prioritized and put on extremely fast track. This is a certain Blocker and the impact could be enormous--with and without the 15 year short-term patch. And believe me --there are plenty such business use cases where you use very long TTLs such as 20 yrs for compliance and other reasons. Thanks Anuj On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman wrote: why are people inserting data with a 15+ year TTL? sorta curious about the actual use case for that. > On Jan 25, 2018, at 12:36 PM, horschi wrote: > > The assertion was working fine until yesterday 03:14 UTC. > > The long term solution would be to work with a long instead of a int. The > serialized seems to be a variable-int already, so that should be fine > already. > > If you change the assertion to 15 years, then applications might fail, as > they might be setting a 15+ year ttl. > > regards, > Christian > > On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta > wrote: > >> Thanks for raising this. Agreed this is bad, when I filed >> CASSANDRA-14092 I thought a write would fail when localDeletionTime >> overflows (as it is with 2.1), but that doesn't seem to be the case on >> 3.0+ >> >> I propose adding the assertion back so writes will fail, and reduce >> the max TTL to something like 15 years for the time being while we >> figure a long term solution. >> >> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan : >>> If you aren’t getting an error, then I agree, that is very bad. Looking >> at the 3.0 code it looks like the assertion checking for overflow was >> dropped somewhere along the way, I had only been looking into 2.1 where you >> get an assertion error that fails the query. >>> >>> -Jeremiah >>> >>>> On Jan 25, 2018, at 2:21 PM, Anuj Wadehra >> wrote: >>>> >>>> >>>> Hi Jeremiah, >>>> Validation is on TTL value not on (system_time+ TTL). You can test it >> with below example. Insert is successful, overflow happens silently and >> data is lost: >>>> create table test(name text primary key,age int); >>>> insert into test(name,age) values('test_20yrs',30) USING TTL 63072; >>>> select * from test where name='test_20yrs'; >>>> >>>> name | age >>>> --+- >>>> >>>> (0 rows) >>>> >>>> insert into test(name,age) values('test_20yr_plus_1',30) USING TTL >> 630720001;InvalidRequest: Error from server: code=2200 [Invalid query] >> message="ttl is too large. requested (630720001) maximum (63072)" >>>> ThanksAnuj >>>> On Friday 26 January 2018, 12:11:03 AM IST, J. D. Jordan < >> jeremiah.jor...@gmail.com> wrote: >>>> >>>> Where is the dataloss? Does the INSERT operation return successfully >> to the client in this case? From reading the linked issues it sounds like >> you get an error client side. 
>>>> >>>> -Jeremiah >>>> >>>>> On Jan 25, 2018, at 1:24 PM, Anuj Wadehra >> wrote: >>>>> >>>>> Hi, >>>>> >>>>> For all those people who use MAX TTL=20 years for inserting/updating >> data in production, https://issues.apache.org/jira/browse/CASSANDRA-14092 >> can silently cause irrecoverable Data Loss. This seems like a certain TOP >> MOST BLOCKER to me. I think the category of the JIRA must be raised to >> BLOCKER from Major. Unfortunately, the JIRA is still "Unassigned" and no >> one seems to be actively working on it. Just like any other critical >> vulnerability, this vulnerability demands immediate attention from some >> very experienced folks to bring out an Urgent Fast Track Patch for all >> currently Supported Cassandra versions 2.1,2.2 and 3.x. As per my >> understanding of the JIRA comments, the changes may not be that trivial for >> older releases. So, community support on the patch is very much appreciated. >>
Re: URGENT: CASSANDRA-14092 causes Data Loss
Hi Jeff, Thanks for the prompt action! I agree that patching an application MAY have a shorter life cycle than patching Cassandra in production. But, in the interest of the larger Cassandra user community, we should put our best effort to avoid breaking all the affected applications in production. We should also consider that updating business logic as per the new 15 year TTL constraint may have business implications for many users. I have a limited understanding about the complexity of the code patch, but it may be more feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather than asking all impacted users to do an immediate business logic adaptation. Moreover, now that we officially support Cassandra 2.1 & 2.2 until 4.0 release and provide critical fixes for 2.1, it becomes even more reasonable to provide this extremely critical patch for 2.1 & 2.2 (unless its absolutely impossible). Still, many users use Cassandra 2.1 and 2.2 in their most critical production systems. Thanks Anuj On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa wrote: We’ll get patches out. They almost certainly aren’t going to change the sstable format for old versions (unless whoever writes the patch makes a great argument for it), so there’s probably not going to be post-2038 ttl support for 2.1/2.2. For those old versions, we can definitely make it not lose data, but we almost certainly aren’t going to make the ttl go past 2038 in old versions. More importantly, any company trying to do 20 year ttl’s that’s waiting for a patched version should start by patching their app to not write invalid ttls - your app release cycle is almost certainly faster than db patch / review / test / release / validation, and you can avoid the data loss application side by calculating the ttl explicitly. It’s not the best solution, but it beats doing nothing, and we’re not rushing out a release in less than a day (we haven’t even started a vote, and voting window is 72 hours for members to review and approve or reject the candidate). -- Jeff Jirsa > On Jan 25, 2018, at 9:07 PM, Jeff Jirsa wrote: > > Patches welcome. > > -- > Jeff Jirsa > > >> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra >> wrote: >> >> Hi Paulo, >> >> Thanks for looking into the issue on priority. I have serious concerns >> regarding reducing the TTL to 15 yrs.The patch will immediately break all >> existing applications in Production which are using 15+ yrs TTL. And then >> they would be stuck again until all the logic in Production software is >> modified and the software is upgraded immediately. This may take days. Such >> heavy downtime is generally not acceptable for any business. Yes, they will >> not have silent data loss but they would not be able to do any business >> either. I think the permanent fix must be prioritized and put on extremely >> fast track. This is a certain Blocker and the impact could be enormous--with >> and without the 15 year short-term patch. >> >> And believe me --there are plenty such business use cases where you use very >> long TTLs such as 20 yrs for compliance and other reasons. >> >> Thanks >> Anuj >> >> On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman >> wrote: >> >> why are people inserting data with a 15+ year TTL? sorta curious about the >> actual use case for that. >> >>> On Jan 25, 2018, at 12:36 PM, horschi wrote: >>> >>> The assertion was working fine until yesterday 03:14 UTC. >>> >>> The long term solution would be to work with a long instead of a int. 
The >>> serialized seems to be a variable-int already, so that should be fine >>> already. >>> >>> If you change the assertion to 15 years, then applications might fail, as >>> they might be setting a 15+ year ttl. >>> >>> regards, >>> Christian >>> >>> On Thu, Jan 25, 2018 at 9:19 PM, Paulo Motta >>> wrote: >>> >>>> Thanks for raising this. Agreed this is bad, when I filed >>>> CASSANDRA-14092 I thought a write would fail when localDeletionTime >>>> overflows (as it is with 2.1), but that doesn't seem to be the case on >>>> 3.0+ >>>> >>>> I propose adding the assertion back so writes will fail, and reduce >>>> the max TTL to something like 15 years for the time being while we >>>> figure a long term solution. >>>> >>>> 2018-01-25 18:05 GMT-02:00 Jeremiah D Jordan : >>>>> If you aren’t getting an error, then I agree, that is very bad. Looking >>>> at the 3.0 code it looks like the asser
Re: URGENT: CASSANDRA-14092 causes Data Loss
Hi Jeff, One correction in my last message: "it may be more feasible to SUPPORT (not extend) the 20 year limit in Cassandra in 2.1/2.2". I completely agree that the existing 20 years TTL support is okay for older versions. If I have understood your last message correctly, upcoming patches are on following lines : 1. New Patches shall be released for 2.1, 2.2 and 3.x.2. The patches for 2.1 & 2.2 would support the existing 20 year TTL limit and ensure that there is no data loss when 20 year is set as TTL.3. The patches for 2.1 and 2.2 are unlikely to update the sstable format. 4. 3.x patches may even remove the 20 year TTL constraint (and extend TTL support beyond 2038). I think that the JIRA priority should be increased from "Major" to "Blocker" as the JIRA may cause unexpected data loss. Also, all impacted versions should be included in the JIRA. This will attract the due attention of all Cassandra users. ThanksAnuj On Friday 26 January 2018, 12:47:18 PM IST, Anuj Wadehra wrote: Hi Jeff, Thanks for the prompt action! I agree that patching an application MAY have a shorter life cycle than patching Cassandra in production. But, in the interest of the larger Cassandra user community, we should put our best effort to avoid breaking all the affected applications in production. We should also consider that updating business logic as per the new 15 year TTL constraint may have business implications for many users. I have a limited understanding about the complexity of the code patch, but it may be more feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather than asking all impacted users to do an immediate business logic adaptation. Moreover, now that we officially support Cassandra 2.1 & 2.2 until 4.0 release and provide critical fixes for 2.1, it becomes even more reasonable to provide this extremely critical patch for 2.1 & 2.2 (unless its absolutely impossible). Still, many users use Cassandra 2.1 and 2.2 in their most critical production systems. Thanks Anuj On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa wrote: We’ll get patches out. They almost certainly aren’t going to change the sstable format for old versions (unless whoever writes the patch makes a great argument for it), so there’s probably not going to be post-2038 ttl support for 2.1/2.2. For those old versions, we can definitely make it not lose data, but we almost certainly aren’t going to make the ttl go past 2038 in old versions. More importantly, any company trying to do 20 year ttl’s that’s waiting for a patched version should start by patching their app to not write invalid ttls - your app release cycle is almost certainly faster than db patch / review / test / release / validation, and you can avoid the data loss application side by calculating the ttl explicitly. It’s not the best solution, but it beats doing nothing, and we’re not rushing out a release in less than a day (we haven’t even started a vote, and voting window is 72 hours for members to review and approve or reject the candidate). -- Jeff Jirsa > On Jan 25, 2018, at 9:07 PM, Jeff Jirsa wrote: > > Patches welcome. > > -- > Jeff Jirsa > > >> On Jan 25, 2018, at 8:15 PM, Anuj Wadehra >> wrote: >> >> Hi Paulo, >> >> Thanks for looking into the issue on priority. I have serious concerns >> regarding reducing the TTL to 15 yrs.The patch will immediately break all >> existing applications in Production which are using 15+ yrs TTL. 
And then >> they would be stuck again until all the logic in Production software is >> modified and the software is upgraded immediately. This may take days. Such >> heavy downtime is generally not acceptable for any business. Yes, they will >> not have silent data loss but they would not be able to do any business >> either. I think the permanent fix must be prioritized and put on extremely >> fast track. This is a certain Blocker and the impact could be enormous--with >> and without the 15 year short-term patch. >> >> And believe me --there are plenty such business use cases where you use very >> long TTLs such as 20 yrs for compliance and other reasons. >> >> Thanks >> Anuj >> >> On Friday 26 January 2018, 4:57:13 AM IST, Michael Kjellman >> wrote: >> >> why are people inserting data with a 15+ year TTL? sorta curious about the >> actual use case for that. >> >>> On Jan 25, 2018, at 12:36 PM, horschi wrote: >>> >>> The assertion was working fine until yesterday 03:14 UTC. >>> >>> The long term solution would be to work with a long instead of a int. The >>> serialized seems to be a variable-int already, so that should be fine >>> al
Re: URGENT: CASSANDRA-14092 causes Data Loss
Hi Paulo, Thanks for coming out with the Emergency Hot Fix!! The patch will help many Cassandra users in saving their precious data. I think the criticality and urgency of the bug is too high. How can we make sure that maximum Cassandra users are alerted about the silent deletion problem? What are formal ways of working for broadcasting such critical alerts? I still see that the JIRA is marked as a "Major" defect and not a "Blocker". What worst can happen to a database than irrecoverable silent deletion of successfully inserted data. I hope you understand. ThanksAnuj On Fri, 26 Jan 2018 at 18:57, Paulo Motta wrote: > I have serious concerns regarding reducing the TTL to 15 yrs.The patch will immediately break all existing applications in Production which are using 15+ yrs TTL. In order to prevent applications from breaking I will update the patch to automatically set the maximum TTL to '03:14:08 UTC 19 January 2038' when it overflows and log a warning as a initial measure. We will work on extending this limit or lifting this limitation, probably for the 3.0+ series due to the large scale compatibility changes required on lower versions, but community patches are always welcome. Companies that cannot upgrade to a version with the proper fix will need to workaround this limitation in some other way: do a batch job to delete old data periodically, perform deletes with timestamps in the future, etc. > If its a 32 bit timestamp, can't we just save/read localDeletionTime as > unsinged int? The proper fix will likely be along these lines, but this involve many changes throughout the codebase where localDeletionTime is consumed and extensive testing, reviewing, etc, so we're now looking into a emergency hot fix to prevent silent data loss while the permanent fix is not in place. 2018-01-26 6:27 GMT-02:00 Anuj Wadehra : > Hi Jeff, > One correction in my last message: "it may be more feasible to SUPPORT (not > extend) the 20 year limit in Cassandra in 2.1/2.2". > I completely agree that the existing 20 years TTL support is okay for older > versions. > > If I have understood your last message correctly, upcoming patches are on > following lines : > > 1. New Patches shall be released for 2.1, 2.2 and 3.x.2. The patches for 2.1 > & 2.2 would support the existing 20 year TTL limit and ensure that there is > no data loss when 20 year is set as TTL.3. The patches for 2.1 and 2.2 are > unlikely to update the sstable format. > 4. 3.x patches may even remove the 20 year TTL constraint (and extend TTL > support beyond 2038). > I think that the JIRA priority should be increased from "Major" to "Blocker" > as the JIRA may cause unexpected data loss. Also, all impacted versions > should be included in the JIRA. This will attract the due attention of all > Cassandra users. > ThanksAnuj > On Friday 26 January 2018, 12:47:18 PM IST, Anuj Wadehra > wrote: > > Hi Jeff, > > Thanks for the prompt action! I agree that patching an application MAY have a > shorter life cycle than patching Cassandra in production. But, in the > interest of the larger Cassandra user community, we should put our best > effort to avoid breaking all the affected applications in production. We > should also consider that updating business logic as per the new 15 year TTL > constraint may have business implications for many users. 
I have a limited understanding about the complexity of the code patch, but it may be more feasible to extend the 20 year limit in Cassandra in 2.1/2.2 rather than asking all impacted users to do an immediate business logic adaptation. Moreover, now that we officially support Cassandra 2.1 & 2.2 until the 4.0 release and provide critical fixes for 2.1, it becomes even more reasonable to provide this extremely critical patch for 2.1 & 2.2 (unless it's absolutely impossible). Many users still run Cassandra 2.1 and 2.2 in their most critical production systems.

Thanks
Anuj

On Friday 26 January 2018, 11:06:30 AM IST, Jeff Jirsa wrote:

> We’ll get patches out. They almost certainly aren’t going to change the sstable format for old versions (unless whoever writes the patch makes a great argument for it), so there’s probably not going to be post-2038 TTL support for 2.1/2.2. For those old versions, we can definitely make it not lose data, but we almost certainly aren’t going to make the TTL go past 2038 in old versions.
>
> More importantly, any company trying to do 20-year TTLs that’s waiting for a patched version should start by patching their app to not write invalid TTLs - your app release cycle is almost certainly faster than db patch / review / test / release / validation
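A rough sketch of the application-side mitigation Jeff describes, clamping the requested TTL so the computed expiration never crosses the 2038 limit (illustrative only; the helper and constant names are assumptions, not an official Cassandra API):

    import java.time.Instant;

    public final class SafeTtl {
        // Largest expiration the storage engine can represent today:
        // Integer.MAX_VALUE seconds since the epoch = 2038-01-19T03:14:07Z.
        private static final long MAX_EXPIRATION_EPOCH_SECONDS = Integer.MAX_VALUE;

        // Returns the requested TTL (seconds), reduced if now + ttl would overflow.
        public static int clampTtlSeconds(long requestedTtlSeconds) {
            long now = Instant.now().getEpochSecond();
            long maxSafeTtl = MAX_EXPIRATION_EPOCH_SECONDS - now;
            return (int) Math.min(requestedTtlSeconds, maxSafeTtl);
        }
    }

The clamped value can then be bound to a statement such as INSERT ... USING TTL ?, so writes keep succeeding (with a shorter effective TTL) until a patched Cassandra version is in place.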
Design Proposal for Auditing feature in Cassandra
Hi, Apache Cassandra doesn't provide an auditing feature. As database auditing is critical for any production-level database like Apache Cassandra, our team is keen on designing & implementing this feature in Apache Cassandra. I have submitted the design proposal for the "Database Auditing" feature under the JIRA: https://issues.apache.org/jira/browse/CASSANDRA-12151 . Can some of you please review the proposal and share your feedback?
Thanks
Anuj
Run Mixed Workload using two instances on one node
Hi,

We are trying to decouple our Reporting DB from OLTP. Need urgent help on the feasibility of the proposed solution for PRODUCTION.

Use Case: Currently, our OLTP and Reporting application and DB are the same. Some CFs are used for both OLTP and Reporting while others are used solely for Reporting. Every business transaction synchronously updates the main OLTP CF and asynchronously updates the other Reporting CFs.

Problem Statement:
1. Decouple Reporting and OLTP such that Reporting load can't impact OLTP performance.
2. Scaling of the Reporting and OLTP modules must be independent.
3. The OLTP client should not update all Reporting CFs. We generate data records on the file system/shared disk; Reporting should use these records to create the Reporting DB.
4. Small customers may do OLTP and Reporting on the same 3-node cluster. Bigger customers can be given an option to have dedicated OLTP and Reporting nodes. So, a standard hardware box should be usable for 3 deployments (OLTP, Reporting, or OLTP+Reporting).

Note: Reporting is ad-hoc, may involve full table scans, and does not involve analytics. Data size is huge: 2 TB (OLTP+Reporting) per node.

Hardware: Standard deployment is a 3-node cluster, with each node having 24 cores, 64 GB RAM, and 6 x 400 GB SSDs in RAID5.

Proposed Solution:
1. Split OLTP and Reporting clients into two application components.
2. For small deployments where more than 3 nodes are not required (a rough config sketch for this split follows at the end of this message):
   A. Install 2 Cassandra instances on each node, one for OLTP and the other for Reporting.
   B. To distribute I/O load 2:1, remove RAID5 (as Cassandra offers replication) and assign 4 disks as JBOD for OLTP and 2 disks for Reporting.
   C. RAM is abundant and often under-utilized, so assign 8 GB to each of the 2 Cassandra instances.
   D. To make sure that Reporting is not able to overload the CPU, tune concurrent_reads and concurrent_writes.
   The OLTP client will only write to the OLTP DB and generate the DB record. The Reporting client will poll the FS and populate the Reporting DB in the required format.
3. Larger customers can have Reporting clients and DB on dedicated physical nodes with all resources.

Key Questions:
Is it OK to run 2 Cassandra instances on one node in a production system and limit CPU usage, disk I/O, and RAM as suggested above?
Any other solution for the above problem statement?

Thanks
Anuj
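To make the resource split in option 2 concrete, here is a rough sketch of how the two instances' cassandra.yaml files might differ (the ports, paths, and thread-pool values are illustrative assumptions, not recommendations; heap size and JMX port for each instance would be set separately in cassandra-env.sh, e.g. MAX_HEAP_SIZE="8G"):

    # OLTP instance (4 disks, larger thread pools)
    cluster_name: 'oltp_cluster'
    data_file_directories:
        - /data/disk1/cassandra
        - /data/disk2/cassandra
        - /data/disk3/cassandra
        - /data/disk4/cassandra
    commitlog_directory: /data/disk1/commitlog
    native_transport_port: 9042
    storage_port: 7000
    concurrent_reads: 32
    concurrent_writes: 32

    # Reporting instance (2 disks, throttled thread pools to protect OLTP)
    cluster_name: 'reporting_cluster'
    data_file_directories:
        - /data/disk5/cassandra
        - /data/disk6/cassandra
    commitlog_directory: /data/disk5/commitlog
    native_transport_port: 9043
    storage_port: 7001
    concurrent_reads: 8
    concurrent_writes: 8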