Re: Testing 4.0 upgrading
Hi Jason,

Thanks for responding. I have created two JIRAs for these things: CASSANDRA-14841 and CASSANDRA-14842.

I understand the initial 4.0->3.x connection problem: since the incoming connection from the old node isn't accepted, the 4.0 node will never know that it is trying to connect to an old node, and it continues trying to connect on the storage_port instead of the ssl_storage_port. I will enable SSL debugging and activate wire tracing and see if I can find out more.

I can also add that I have upgraded all nodes in the cluster to 4.0, and once a node was upgraded to 4.0 it started talking to the other 4.0 node as expected. I had data in the cluster from before the upgrade, and after running "nodetool upgradesstables" there were no issues restarting the clients.

Now I will reinstall the old version and do the upgrade again, and hopefully I can have more information soon.

/Tommy

On 2018-10-23 19:12, Jason Brown wrote:
> Hi Tommy,
>
> Thank you for taking the time to start kicking the tires on the upgrade to 4.0. It looks like you've found two bugs:
>
> 1) "Unknown column coordinator_port during deserialization" (reported on 3.x nodes)
>
> - looks like the 4.0 node isn't filtering out a column from a system table that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you open a JIRA for this, and tag @aweisberg?
>
> 2) SSL connection problems
>
> I understand the 4.0 -> 3.X connection problem, and documented it at [1] in MessagingService. TL;DR we don't know the version of a peer when restarting and need to wait for that peer to connect to the local node and pass its correct messaging version (if the local node cannot interpret the handshake from the peer). However, it is unclear why the 4.0 node is seeing SSLv2 on the inbound connection. Can you open a separate JIRA, and we'll go from there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x node to see exactly what it is trying to do? Also, you can enable wire tracing on the 4.0 node by setting this value to true [3] and recompiling. We can follow up further in the JIRA.
>
> Thanks!
>
> -Jason
>
> [1] https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668
> [2] https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
> [3] https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84
>
> On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl wrote:
>> Hi,
>>
>> I have been testing the upgrade to 4.0. I started out with a cluster running 3.0.15 with server encryption enabled. Due to some issues in my environment I upgraded one of the nodes to 3.11.3; I think this turned out to be a good thing, since I could observe the behaviour of upgrading from both 3.0.15 and 3.11.3 at the same time.
>>
>> At first I didn't have much success at all; it looked like I had found multiple issues, mostly with server encryption, so I decided to simplify things and disabled server encryption.
>>
>> With server encryption disabled the upgrade worked OK. What I did notice was exceptions on the 3.0.15 and 3.11.3 nodes once the first 4.0 node started.
>>
>> 3.0.15 exception:
>> 2018-10-22T11:05:38.883+0200 ERROR [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223 Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
>> java.lang.RuntimeException: Unknown column coordinator_port during deserialization
>>     at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>     at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>     at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>     at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>     at org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[apache-cassandra-3.0.15.jar:3.0.15]
>>
>> 3.11.3 exception:
>> 2018-10-22T11:12:05.060+0200 ERROR [MessagingService-Incoming-/10.216.193.244] CassandraDaemo
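Tommy's explanation of the 4.0 -> 3.x SSL problem can be modelled in a few lines. The Java sketch below is a hypothetical model, not Cassandra's `MessagingService` code: the class `PeerPortSelector`, its `PeerVersion` enum, and the rule that a 4.0 peer accepts encrypted connections on `storage_port` while a pre-4.0 peer needs `ssl_storage_port` are all assumptions made for illustration. It shows why a node that has never accepted a handshake from an old peer keeps choosing the same port.

```java
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of outbound port selection; not Cassandra's MessagingService.
// A peer whose version is unknown is assumed to be current (4.0) until its own
// inbound handshake tells us otherwise.
final class PeerPortSelector {
    // Illustrative labels; real messaging version numbers are not modelled.
    enum PeerVersion { PRE_40, V40 }

    private final Map<InetAddress, PeerVersion> knownVersions = new ConcurrentHashMap<>();
    private final int storagePort;
    private final int sslStoragePort;
    private final boolean serverEncryption;

    PeerPortSelector(int storagePort, int sslStoragePort, boolean serverEncryption) {
        this.storagePort = storagePort;
        this.sslStoragePort = sslStoragePort;
        this.serverEncryption = serverEncryption;
    }

    // Called only after an inbound handshake from the peer has been accepted.
    void onInboundHandshake(InetAddress peer, PeerVersion version) {
        knownVersions.put(peer, version);
    }

    int outboundPort(InetAddress peer) {
        PeerVersion version = knownVersions.getOrDefault(peer, PeerVersion.V40);
        // Assumption: encrypted traffic to a pre-4.0 peer must use ssl_storage_port,
        // while a 4.0 peer accepts it on storage_port.
        if (serverEncryption && version == PeerVersion.PRE_40) {
            return sslStoragePort;
        }
        return storagePort;
    }
}
```

In this model the only thing that moves an encrypted connection to `ssl_storage_port` is `onInboundHandshake`; if the old node's inbound connection is rejected, that call never happens and the node keeps retrying `storage_port`, which matches the behaviour Tommy describes.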
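Jason's diagnosis of the first bug is that the 4.0 node sends a column (`coordinator_port`) that a 3.0 node cannot deserialize. A minimal sketch of version-aware filtering follows, assuming a hypothetical map from column name to the major version that introduced it; `PeerColumnFilter` and `columnsFor` are illustrative names, and this is not the actual fix made in Cassandra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: before serializing a column filter for a peer, drop the
// columns that the peer's major version does not know about.
final class PeerColumnFilter {
    static List<String> columnsFor(List<String> requested,
                                   Map<String, Integer> introducedInMajor,
                                   int peerMajor) {
        List<String> visible = new ArrayList<>();
        for (String column : requested) {
            // Columns without an entry are assumed to exist in every version.
            int since = introducedInMajor.getOrDefault(column, 0);
            if (since <= peerMajor) {
                visible.add(column);
            }
        }
        return visible;
    }
}
```

Dropping the column means the old peer simply returns less data; whether that is acceptable for a given system table is a design question this sketch does not answer.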
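When following up on the unexpected SSLv2 on the inbound connection, it can help to look at the first bytes a peer sends (for example, from a packet capture). The hypothetical `HelloSniffer` below distinguishes a TLS record header from an SSLv2-format ClientHello using the layouts in RFC 5246 (section 6.2.1 and Appendix E.2). It is a debugging-aid sketch, not part of Cassandra or of the JDK's SSL debug output.

```java
// Hypothetical debugging aid: classify the first bytes of a connection.
final class HelloSniffer {
    enum Format { TLS_RECORD, SSLV2_CLIENT_HELLO, UNKNOWN }

    static Format classify(byte[] b) {
        if (b.length < 5) {
            return Format.UNKNOWN;
        }
        int b0 = b[0] & 0xFF;
        // TLS record header: content type 22 (handshake), protocol major version 3.
        if (b0 == 0x16 && (b[1] & 0xFF) == 0x03) {
            return Format.TLS_RECORD;
        }
        // SSLv2 two-byte record header: high bit of the first byte set,
        // then msg_type 1 (CLIENT-HELLO) followed by a two-byte version.
        if ((b0 & 0x80) != 0 && (b[2] & 0xFF) == 0x01) {
            int major = b[3] & 0xFF;
            int minor = b[4] & 0xFF;
            if (major == 0x03 || (major == 0x00 && minor == 0x02)) {
                return Format.SSLV2_CLIENT_HELLO;
            }
        }
        return Format.UNKNOWN;
    }
}
```

Version bytes `03 0x` inside an SSLv2-format hello indicate a TLS client using the backward-compatible hello format, rather than a true SSL 2.0 client (`00 02`).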
Re: CASSANDRA-13241 lower default chunk_length_in_kb
| The risk from such a patch is very low

If I had a nickel for every time I've heard that... ;)

I'm neutral on the default change, -.5 (i.e. don't agree with it but won't die on that hill) on the data structure change post-freeze. If we put this in, that's a slippery slope, as I'm sure we can find numerous other seemingly low-risk trivial optimizations and rewrites that cumulatively would make a "feature-freeze" effectively meaningless as a tool to start stabilizing the contents of the release.

In isolation many changes look innocuous. In the context of an organically grown open-source code-base that's this old, I've learned that it pays to be very, very cautious.

On Tue, Oct 23, 2018 at 3:33 PM Jeff Jirsa wrote:
> My objection (-0.5) is based on the freeze, not on code complexity.
>
> --
> Jeff Jirsa
>
> On Oct 23, 2018, at 8:59 AM, Benedict Elliott Smith wrote:
>> To discuss the concerns about the patch for a more efficient representation:
>>
>> The risk from such a patch is very low. It’s a very simple in-memory data structure that we can write thorough fuzz tests for. The only reason to exclude it would be wanting to begin strictly enforcing the freeze. This is a good enough reason in my book, which is why I’m neutral on its addition. I just wanted to provide some context for everyone else's voting intention.
>>
>> On 23 Oct 2018, at 16:51, Ariel Weisberg wrote:
>>> Hi,
>>>
>>> I just asked Jeff. He is -0 and -0.5 respectively.
>>>
>>> Ariel
>>>
>>> On Tue, Oct 23, 2018, at 11:50 AM, Benedict Elliott Smith wrote:
>>>> I’m +1 on the change of default. I think Jeff was -1 on that, though.
>>>>
>>>> On 23 Oct 2018, at 16:46, Ariel Weisberg wrote:
>>>>> Hi,
>>>>>
>>>>> To summarize who we have heard from so far:
>>>>>
>>>>> WRT changing just the default:
>>>>>
>>>>> +1:
>>>>> Jon Haddad
>>>>> Ben Bromhead
>>>>> Alain Rodriguez
>>>>> Sankalp Kohli (not explicit)
>>>>>
>>>>> -0:
>>>>> Sylvain Lebresne
>>>>> Jeff Jirsa
>>>>>
>>>>> Not sure:
>>>>> Kurt Greaves
>>>>> Joshua McKenzie
>>>>> Benedict Elliott Smith
>>>>>
>>>>> WRT changing the representation:
>>>>>
>>>>> +1:
>>>>> There are only conditional +1s at this point
>>>>>
>>>>> -0:
>>>>> Sylvain Lebresne
>>>>>
>>>>> -.5:
>>>>> Jeff Jirsa
>>>>>
>>>>> This (https://github.com/aweisberg/cassandra/commit/a9ae85daa3ede092b9a1cf84879fb1a9f25b9dce) is a rough cut of the change for the representation. It needs better naming, unit tests, javadoc etc. but it does implement the change.
>>>>>
>>>>> Ariel
>>>>>
>>>>> On Fri, Oct 19, 2018, at 3:42 PM, Jonathan Haddad wrote:
>>>>>> Sorry, to be clear - I'm +1 on changing the configuration default, but I think changing the compression in-memory representations warrants further discussion and investigation before making a case for or against it yet. An optimization that reduces in-memory cost by over 50% sounds pretty good, and we never were really explicit that those sorts of optimizations would be excluded after our feature freeze. I don't think they should necessarily be excluded at this time, but it depends on the size and risk of the patch.
>>>>>>
>>>>>> On Sat, Oct 20, 2018 at 8:38 AM Jonathan Haddad wrote:
>>>>>>> I think we should try to do the right thing for the most people that we can. The number of folks impacted by 64KB is huge. I've worked on a lot of clusters created by a lot of different teams, going from brand new to pretty damn knowledgeable. I can't think of a single time over the last 2 years that I've seen a cluster use non-default settings for compression. With only a handful of exceptions, I've lowered the chunk size considerably (usually to 4 or 8K) and the impact has always been very noticeable, frequently resulting in hardware reduction and cost savings. Of all the poorly chosen defaults we have, this is one of the biggest offenders that I see. There's a good reason ScyllaDB claims they're so much faster than Cassandra - we ship a DB that performs poorly for 90+% of teams because we ship for a specific use case, not a general one (time series on memory-constrained boxes being the specific use case).
>>>>>>>
>>>>>>> This doesn't impact existing tables, just new ones. More and more teams are using Cassandra as a general-purpose database; we should acknowledge that by adjusting our defaults accordingly. Yes, we use a little bit more memory on new tables if we just change this setting, and what we get out of it is a massive performance win.
>>>>>>>
>>>>>>> I'm +1 on the change as well.
>>>>>>>
>>>>>>> On Sat, Oct 20, 2018 at 4:21 AM Sankalp Kohli <kohlisank...@gmail.com> wrote:
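Jonathan's argument trades a little memory for less work per read. A back-of-envelope model is sketched below, assuming one 8-byte offset held in memory per compressed chunk and that a single-partition read decompresses exactly one whole chunk; both are simplifications chosen for illustration, not measurements, and `ChunkMath` is a hypothetical name.

```java
// Back-of-envelope model; assumptions, not measurements:
//  - one 8-byte offset is kept in memory per compressed chunk;
//  - a single-partition read decompresses exactly one whole chunk.
final class ChunkMath {
    static long chunkCount(long uncompressedBytes, int chunkLengthKb) {
        long chunkBytes = chunkLengthKb * 1024L;
        return (uncompressedBytes + chunkBytes - 1) / chunkBytes; // ceiling division
    }

    static long offsetMemoryBytes(long uncompressedBytes, int chunkLengthKb) {
        return chunkCount(uncompressedBytes, chunkLengthKb) * Long.BYTES;
    }

    static long bytesDecompressedPerRead(int chunkLengthKb) {
        return chunkLengthKb * 1024L;
    }
}
```

Under those assumptions, each GiB of uncompressed data needs 16,384 offsets (128 KiB) with 64 KB chunks and 262,144 offsets (2 MiB) with 4 KB chunks, while each read decompresses 16 times less data.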
Re: CASSANDRA-13241 lower default chunk_length_in_kb
If you undertake sufficiently many low-risk things, some will bite you; I think everyone understands that. It’s still valuable to factor a risk assessment into the equation, I think?

Either way, somebody asked a question they lacked the context to answer easily, so I did my best to offer them that information so they could make an informed decision. I’m not campaigning for its inclusion, just trying to facilitate a collective decision.

> On 24 Oct 2018, at 16:27, Joshua McKenzie wrote:
>
> | The risk from such a patch is very low
> If I had a nickel for every time I've heard that... ;)
>
> I'm neutral on the default change, -.5 (i.e. don't agree with it but won't die on that hill) on the data structure change post-freeze. If we put this in, that's a slippery slope, as I'm sure we can find numerous other seemingly low-risk trivial optimizations and rewrites that cumulatively would make a "feature-freeze" effectively meaningless as a tool to start stabilizing the contents of the release.
>
> In isolation many changes look innocuous. In the context of an organically grown open-source code-base that's this old, I've learned that it pays to be very, very cautious.
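Benedict describes the representation change as a very simple in-memory data structure that can be covered by thorough fuzz tests. The sketch below is a hypothetical illustration of that kind of structure and test, not the contents of Ariel's commit: `CompactOffsets` stores one absolute `long` every 64 chunks plus a 4-byte delta per chunk, and `fuzz` checks it against a plain `long[]` using random non-decreasing offsets.

```java
import java.util.Random;

// Hypothetical compact chunk-offset table: one absolute long per block of BLOCK
// chunks plus an int delta per chunk relative to that block's base. Roughly
// 4 + 8/64 = 4.125 bytes per chunk (ignoring array headers) versus 8 for a long[].
final class CompactOffsets {
    static final int BLOCK = 64;
    private final long[] blockBase;
    private final int[] delta;

    CompactOffsets(long[] offsets) {
        blockBase = new long[(offsets.length + BLOCK - 1) / BLOCK];
        delta = new int[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            if (i % BLOCK == 0) {
                blockBase[i / BLOCK] = offsets[i];
            }
            long d = offsets[i] - blockBase[i / BLOCK];
            // Assumes offsets are non-decreasing and one block spans < 2 GiB.
            if (d < 0 || d > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("delta out of range at chunk " + i);
            }
            delta[i] = (int) d;
        }
    }

    long get(int chunk) {
        return blockBase[chunk / BLOCK] + delta[chunk];
    }

    int size() {
        return delta.length;
    }

    // Fuzz check: random non-decreasing offsets must round-trip exactly.
    static void fuzz(long seed, int iterations) {
        Random rnd = new Random(seed);
        for (int it = 0; it < iterations; it++) {
            int n = rnd.nextInt(1000);
            long[] expected = new long[n];
            long pos = rnd.nextInt(Integer.MAX_VALUE) * 1024L; // large starting offsets too
            for (int i = 0; i < n; i++) {
                expected[i] = pos;
                pos += rnd.nextInt(70_000); // compressed chunk sizes up to ~68 KiB
            }
            CompactOffsets c = new CompactOffsets(expected);
            if (c.size() != n) {
                throw new AssertionError("size mismatch");
            }
            for (int i = 0; i < n; i++) {
                if (c.get(i) != expected[i]) {
                    throw new AssertionError("mismatch at chunk " + i);
                }
            }
        }
    }
}
```

Comparing against the obvious `long[]` representation is what keeps a change like this cheap to verify; the block size of 64 is an arbitrary choice for the sketch.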
Re: Testing 4.0 upgrading
> Now I will reinstall the old version and do the upgrade again and
> hopefully I can have more information soon.

Hi Tommy,

Thanks for taking the time to brave an upgrade test this early on; it is super helpful to get this feedback.

To anyone else who has the bandwidth: we very much appreciate these types of tests, so please keep them coming.

To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: CASSANDRA-13241 lower default chunk_length_in_kb
+1.

I use the smiley to let you know I'm mostly just giving you shit. ;)

On Wed, Oct 24, 2018 at 11:43 AM Benedict Elliott Smith wrote:
> If you undertake sufficiently many low-risk things, some will bite you; I
> think everyone understands that. It’s still valuable to factor a risk
> assessment into the equation, I think?
>
> Either way, somebody asked a question they lacked the context to answer
> easily, so I did my best to offer them that information so they could make
> an informed decision. I’m not campaigning for its inclusion, just trying to
> facilitate a collective decision.