Re: Testing 4.0 upgrading

2018-10-24 Thread Tommy Stendahl
Hi Jason,

Thanks for responding.

I have created two jiras for these things: CASSANDRA-14841 and 
CASSANDRA-14842.

I understand the initial 4.0->3.x connection problem: since the incoming 
connection from the old node isn't accepted, the 4.0 node never learns 
that it is trying to connect to an old node, so it continues to try to 
connect on the storage_port instead of the ssl_storage_port.

I will enable SSL debugging and activate wire tracing and see if I can 
find out more.

I can also add that I have upgraded all nodes in the cluster to 4.0, and 
once a node was upgraded to 4.0 it started talking to the other 4.0 nodes 
as expected. I had data in the cluster from before the upgrade, and after 
running "nodetool upgradesstables" there were no issues restarting the 
clients.

Now I will reinstall the old version and do the upgrade again and 
hopefully I can have more information soon.

/Tommy


On 2018-10-23 19:12, Jason Brown wrote:
> Hi Tommy,
>
> Thank you for taking the time to start kicking the tires on the upgrade to
> 4.0. It looks like you've found two bugs:
>
> 1) "Unknown column coordinator_port during deserialization" (reported on
> 3.x nodes)
>
> - looks like the 4.0 node isn't filtering out a column from a system table
> that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you
> open a JIRA for this, and tag @aweisberg?
>
> 2) SSL connection problems
>
> I understand the 4.0 -> 3.X connection problem, and documented it at [1] in
> MessagingService. TL;DR: we don't know the version of a peer when restarting
> and need to wait for that peer to connect to the local node and pass its
> correct messaging version (if the local node cannot interpret the handshake
> from the peer). However, it is unclear why the inbound connection to the 4.0
> node is seeing SSLv2. Can you open a separate JIRA, and we'll go from
> there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x
> node to see exactly what it is trying to do? Also, you can enable wire
> tracing on the 4.0 node by setting this value to true [3] and recompiling.
> We can follow up further in the JIRA.
>
> Thanks!
>
> -Jason
>
> [1]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668
> [2]
> https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
> [3]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84
>
> On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl 
> wrote:
>
>> Hi,
>>
>> I have been testing the upgrade to 4.0. I started out with a cluster running
>> 3.0.15 with server encryption enabled. Due to some issues in my
>> environment I upgraded one of the nodes to 3.11.3. I think this
>> turned out to be a good thing, since I could observe the behaviour of
>> upgrading from both 3.0.15 and 3.11.3 at the same time.
>>
>> At first I didn't have much success at all. It looks like I found multiple
>> issues, mostly with server encryption, so I decided to simplify things and
>> disabled server encryption.
>>
>> With server encryption disabled the upgrade worked OK. What I
>> did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first
>> 4.0 node started.
>>
>> 3.0.15 exception:
>> 2018-10-22T11:05:38.883+0200 ERROR [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223 Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
>> java.lang.RuntimeException: Unknown column coordinator_port during deserialization
>>   at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>   at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>   at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>   at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>   at org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>   at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>   at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>   at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>
>> 3.11.3 exception:
>> 2018-10-22T11:12:05.060+0200 ERROR
>> [MessagingService-Incoming-/10.216.193.244] CassandraDaemo

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-24 Thread Joshua McKenzie
| The risk from such a patch is very low
If I had a nickel for every time I've heard that... ;)

I'm neutral on the default change, -.5 (i.e. don't agree with it but won't
die on that hill) on the data structure change post-freeze. We put this in,
and that's a slippery slope as I'm sure we can find numerous other
seemingly low-risk trivial optimizations and rewrites that cumulatively
would make a "feature-freeze" effectively meaningless as a tool to start
stabilizing the contents of the release.

In isolation many changes look innocuous. In the context of an organically
grown open-source code-base that's this old, I've learned that it pays to
be very, very cautious.

On Tue, Oct 23, 2018 at 3:33 PM Jeff Jirsa  wrote:

> My objection (-0.5) is based on the freeze, not on code complexity
>
>
>
> --
> Jeff Jirsa
>
>
> > On Oct 23, 2018, at 8:59 AM, Benedict Elliott Smith 
> wrote:
> >
> > To discuss the concerns about the patch for a more efficient
> representation:
> >
> > The risk from such a patch is very low.  It’s a very simple in-memory
> data structure, that we can introduce thorough fuzz tests for.  The reason
> to exclude it would be for reasons of wanting to begin strictly enforcing
> the freeze only.  This is a good enough reason in my book, which is why I’m
> neutral on its addition.  I just wanted to provide some context for
> everyone else's voting intention.
> >
> >
> >> On 23 Oct 2018, at 16:51, Ariel Weisberg  wrote:
> >>
> >> Hi,
> >>
> >> I just asked Jeff. He is -0 and -0.5 respectively.
> >>
> >> Ariel
> >>
> >>> On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
> >>> I’m +1 change of default.  I think Jeff was -1 on that though.
> >>>
> >>>
>  On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
> 
>  Hi,
> 
>  To summarize who we have heard from so far:
> 
>  WRT changing just the default:
> 
>  +1:
>  Jon Haddad
>  Ben Bromhead
>  Alain Rodriguez
>  Sankalp Kohli (not explicit)
> 
>  -0:
>  Sylvain Lebresne
>  Jeff Jirsa
> 
>  Not sure:
>  Kurt Greaves
>  Joshua McKenzie
>  Benedict Elliott Smith
> 
>  WRT changing the representation:
> 
>  +1:
>  There are only conditional +1s at this point
> 
>  -0:
>  Sylvain Lebresne
> 
>  -.5:
>  Jeff Jirsa
> 
>  This (
> https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
> is a rough cut of the change for the representation. It needs better
> naming, unit tests, javadoc etc. but it does implement the change.
> 
>  Ariel
> > On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> > Sorry, to be clear - I'm +1 on changing the configuration default,
> but I
> > think changing the compression in memory representations warrants
> further
> > discussion and investigation before making a case for or against it
> yet.
> > An optimization that reduces in memory cost by over 50% sounds
> pretty good
> > and we never were really explicit that those sort of optimizations
> would be
> > excluded after our feature freeze.  I don't think they should
> necessarily
> > be excluded at this time, but it depends on the size and risk of the
> patch.
> >
> >> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad 
> wrote:
> >>
> >> I think we should try to do the right thing for the most people
> that we
> >> can.  The number of folks impacted by 64KB is huge.  I've worked on
> a lot
> >> of clusters created by a lot of different teams, going from brand
> new to
> >> pretty damn knowledgeable.  I can't think of a single time over the
> last 2
> >> years that I've seen a cluster use non-default settings for
> compression.
> >> With only a handful of exceptions, I've lowered the chunk size
> considerably
> >> (usually to 4 or 8K) and the impact has always been very noticeable,
> >> frequently resulting in hardware reduction and cost savings.  Of
> all the
> >> poorly chosen defaults we have, this is one of the biggest
> offenders that I
> >> see.  There's a good reason ScyllaDB  claims they're so much faster
> than
> >> Cassandra - we ship a DB that performs poorly for 90+% of teams
> because we
> >> ship for a specific use case, not a general one (time series on
> memory
> >> constrained boxes being the specific use case)
> >>
> >> This doesn't impact existing tables, just new ones.  More and more
> teams
> >> are using Cassandra as a general purpose database, we should
> acknowledge
> >> that adjusting our defaults accordingly.  Yes, we use a little bit
> more
> >> memory on new tables if we just change this setting, and what we
> get out of
> >> it is a massive performance win.
> >>
> >> I'm +1 on the change as well.
> >>
> >>
> >>
> >> On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli <
> kohlisank...@gmail.com>
> >> wrote:
> >>
> 
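
For context on the memory cost mentioned in this thread ("we use a little bit
more memory on new tables"), a rough estimate can be sketched as below. It
assumes one 8-byte offset is kept in memory per compressed chunk of
uncompressed data; that per-chunk size is an assumption for illustration and
is not a figure stated in the thread. The function name and constants are
hypothetical.

```python
# Back-of-the-envelope estimate of compression chunk-offset memory.
# Assumption (not stated in the thread): one 8-byte offset per chunk.

BYTES_PER_OFFSET = 8  # assumed size of a single chunk-offset entry
KIB = 1024
MIB = 1024 ** 2
TIB = 1024 ** 4


def chunk_offset_bytes(data_bytes: int, chunk_length_kb: int) -> int:
    """Estimate the bytes needed to hold one offset per chunk."""
    chunk_bytes = chunk_length_kb * KIB
    # Ceiling division, so a partial final chunk still gets an offset.
    chunks = -(-data_bytes // chunk_bytes)
    return chunks * BYTES_PER_OFFSET


if __name__ == "__main__":
    # 64 KB is the default discussed in the thread; 8 and 4 KB are the
    # sizes Jonathan Haddad says he usually lowers it to.
    for kb in (64, 8, 4):
        mib = chunk_offset_bytes(TIB, kb) / MIB
        print(f"chunk_length_in_kb={kb:>2}: ~{mib:.0f} MiB of offsets per TiB")
```

Under that assumption, 1 TiB of uncompressed data needs roughly 128 MiB of
offsets at 64 KB chunks, 1024 MiB at 8 KB, and 2048 MiB at 4 KB, which is the
trade-off the proposed, more compact representation is meant to soften.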

Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-24 Thread Benedict Elliott Smith
If you undertake sufficiently many low-risk things, some will bite you; I think 
everyone understands that.  It’s still valuable to factor a risk assessment 
into the equation, I think?

Either way, somebody asked who didn’t have the context to easily answer, so I 
did my best to offer them that information so they could make an informed 
decision.  I’m not campaigning for its inclusion, just trying to facilitate a 
collective decision.






> On 24 Oct 2018, at 16:27, Joshua McKenzie  wrote:
> 
> | The risk from such a patch is very low
> If I had a nickel for every time I've heard that... ;)
> 
> I'm neutral on the default change, -.5 (i.e. don't agree with it but won't
> die on that hill) on the data structure change post-freeze. We put this in,
> and that's a slippery slope as I'm sure we can find numerous other
> seemingly low-risk trivial optimizations and rewrites that cumulatively
> would make a "feature-freeze" effectively meaningless as a tool to start
> stabilizing the contents of the release.
> 
> In isolation many changes look innocuous. In the context of an organically
> grown open-source code-base that's this old, I've learned that it pays to
> be very, very cautious.
> 
> On Tue, Oct 23, 2018 at 3:33 PM Jeff Jirsa  wrote:
> 
>> My objection (-0.5) is based on the freeze, not on code complexity
>> 
>> 
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>>> On Oct 23, 2018, at 8:59 AM, Benedict Elliott Smith 
>> wrote:
>>> 
>>> To discuss the concerns about the patch for a more efficient
>> representation:
>>> 
>>> The risk from such a patch is very low.  It’s a very simple in-memory
>> data structure, that we can introduce thorough fuzz tests for.  The reason
>> to exclude it would be for reasons of wanting to begin strictly enforcing
>> the freeze only.  This is a good enough reason in my book, which is why I’m
>> neutral on its addition.  I just wanted to provide some context for
>> everyone else's voting intention.
>>> 
>>> 
 On 23 Oct 2018, at 16:51, Ariel Weisberg  wrote:
 
 Hi,
 
 I just asked Jeff. He is -0 and -0.5 respectively.
 
 Ariel
 
> On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
> I’m +1 change of default.  I think Jeff was -1 on that though.
> 
> 
>> On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
>> 
>> Hi,
>> 
>> To summarize who we have heard from so far:
>> 
>> WRT changing just the default:
>> 
>> +1:
>> Jon Haddad
>> Ben Bromhead
>> Alain Rodriguez
>> Sankalp Kohli (not explicit)
>> 
>> -0:
>> Sylvain Lebresne
>> Jeff Jirsa
>> 
>> Not sure:
>> Kurt Greaves
>> Joshua McKenzie
>> Benedict Elliott Smith
>> 
>> WRT changing the representation:
>> 
>> +1:
>> There are only conditional +1s at this point
>> 
>> -0:
>> Sylvain Lebresne
>> 
>> -.5:
>> Jeff Jirsa
>> 
>> This (
>> https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce)
>> is a rough cut of the change for the representation. It needs better
>> naming, unit tests, javadoc etc. but it does implement the change.
>> 
>> Ariel
>>> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
>>> Sorry, to be clear - I'm +1 on changing the configuration default,
>> but I
>>> think changing the compression in memory representations warrants
>> further
>>> discussion and investigation before making a case for or against it
>> yet.
>>> An optimization that reduces in memory cost by over 50% sounds
>> pretty good
>>> and we never were really explicit that those sort of optimizations
>> would be
>>> excluded after our feature freeze.  I don't think they should
>> necessarily
>>> be excluded at this time, but it depends on the size and risk of the
>> patch.
>>> 
 On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad 
>> wrote:
 
 I think we should try to do the right thing for the most people
>> that we
 can.  The number of folks impacted by 64KB is huge.  I've worked on
>> a lot
 of clusters created by a lot of different teams, going from brand
>> new to
 pretty damn knowledgeable.  I can't think of a single time over the
>> last 2
 years that I've seen a cluster use non-default settings for
>> compression.
 With only a handful of exceptions, I've lowered the chunk size
>> considerably
 (usually to 4 or 8K) and the impact has always been very noticeable,
 frequently resulting in hardware reduction and cost savings.  Of
>> all the
 poorly chosen defaults we have, this is one of the biggest
>> offenders that I
 see.  There's a good reason ScyllaDB  claims they're so much faster
>> than
 Cassandra - we ship a DB that performs poorly for 90+% of teams
>> because we
 ship for a specific use case, not a general one (time series on
>> memory
 constrained boxes bei

Re: Testing 4.0 upgrading

2018-10-24 Thread Nate McCall
> Now I will reinstall the old version and do the upgrade again and
> hopefully I can have more information soon.
>

Hi Tommy,
Thanks for taking the time to brave an upgrade test this early on - it
is super helpful to get this feedback.

Anyone else that has bandwidth, we very much appreciate these types of
tests, so please keep them coming.




Re: CASSANDRA-13241 lower default chunk_length_in_kb

2018-10-24 Thread Joshua McKenzie
+1. I use the smiley to let you know I'm mostly just giving you shit. ;)

On Wed, Oct 24, 2018 at 11:43 AM Benedict Elliott Smith 
wrote:

> If you undertake sufficiently many low-risk things, some will bite you; I
> think everyone understands that.  It’s still valuable to factor a risk
> assessment into the equation, I think?
>
> Either way, somebody asked who didn’t have the context to easily answer,
> so I did my best to offer them that information so they could make an
> informed decision.  I’m not campaigning for its inclusion, just trying to
> facilitate a collective decision.
>
>
>
>
>
>
> > On 24 Oct 2018, at 16:27, Joshua McKenzie  wrote:
> >
> > | The risk from such a patch is very low
> > If I had a nickel for every time I've heard that... ;)
> >
> > I'm neutral on the default change, -.5 (i.e. don't agree with it but
> won't
> > die on that hill) on the data structure change post-freeze. We put this
> in,
> > and that's a slippery slope as I'm sure we can find numerous other
> > seemingly low-risk trivial optimizations and rewrites that cumulatively
> > would make a "feature-freeze" effectively meaningless as a tool to start
> > stabilizing the contents of the release.
> >
> > In isolation many changes look innocuous. In the context of an
> organically
> > grown open-source code-base that's this old, I've learned that it pays to
> > be very, very cautious.
> >
> > On Tue, Oct 23, 2018 at 3:33 PM Jeff Jirsa  wrote:
> >
> >> My objection (-0.5) is based on the freeze, not on code complexity
> >>
> >>
> >>
> >> --
> >> Jeff Jirsa
> >>
> >>
> >>> On Oct 23, 2018, at 8:59 AM, Benedict Elliott Smith <
> bened...@apache.org>
> >> wrote:
> >>>
> >>> To discuss the concerns about the patch for a more efficient
> >> representation:
> >>>
> >>> The risk from such a patch is very low.  It’s a very simple in-memory
> >> data structure, that we can introduce thorough fuzz tests for.  The
> reason
> >> to exclude it would be for reasons of wanting to begin strictly
> enforcing
> >> the freeze only.  This is a good enough reason in my book, which is why
> I’m
> >> neutral on its addition.  I just wanted to provide some context for
> >> everyone else's voting intention.
> >>>
> >>>
>  On 23 Oct 2018, at 16:51, Ariel Weisberg  wrote:
> 
>  Hi,
> 
>  I just asked Jeff. He is -0 and -0.5 respectively.
> 
>  Ariel
> 
> > On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
> > I’m +1 change of default.  I think Jeff was -1 on that though.
> >
> >
> >> On 23 Oct 2018, at 16:46, Ariel Weisberg  wrote:
> >>
> >> Hi,
> >>
> >> To summarize who we have heard from so far:
> >>
> >> WRT changing just the default:
> >>
> >> +1:
> >> Jon Haddad
> >> Ben Bromhead
> >> Alain Rodriguez
> >> Sankalp Kohli (not explicit)
> >>
> >> -0:
> >> Sylvain Lebresne
> >> Jeff Jirsa
> >>
> >> Not sure:
> >> Kurt Greaves
> >> Joshua McKenzie
> >> Benedict Elliott Smith
> >>
> >> WRT changing the representation:
> >>
> >> +1:
> >> There are only conditional +1s at this point
> >>
> >> -0:
> >> Sylvain Lebresne
> >>
> >> -.5:
> >> Jeff Jirsa
> >>
> >> This (
> >>
> https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce
> )
> >> is a rough cut of the change for the representation. It needs better
> >> naming, unit tests, javadoc etc. but it does implement the change.
> >>
> >> Ariel
> >>> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
> >>> Sorry, to be clear - I'm +1 on changing the configuration default,
> >> but I
> >>> think changing the compression in memory representations warrants
> >> further
> >>> discussion and investigation before making a case for or against it
> >> yet.
> >>> An optimization that reduces in memory cost by over 50% sounds
> >> pretty good
> >>> and we never were really explicit that those sort of optimizations
> >> would be
> >>> excluded after our feature freeze.  I don't think they should
> >> necessarily
> >>> be excluded at this time, but it depends on the size and risk of
> the
> >> patch.
> >>>
>  On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad <
> j...@jonhaddad.com>
> >> wrote:
> 
>  I think we should try to do the right thing for the most people
> >> that we
>  can.  The number of folks impacted by 64KB is huge.  I've worked
> on
> >> a lot
>  of clusters created by a lot of different teams, going from brand
> >> new to
>  pretty damn knowledgeable.  I can't think of a single time over
> the
> >> last 2
>  years that I've seen a cluster use non-default settings for
> >> compression.
>  With only a handful of exceptions, I've lowered the chunk size
> >> considerably
>  (usually to 4 or 8K) and the impact has always been very
> noticeable,
>  freque