Hi Jason, Thanks for responding.
I have created two jiras for these things: CASSANDRA-14841 and CASSANDRA-14842. I understand the initial 4.0->3.x connection problem, since the incoming connection from the old node isn't accepted the 4.0 node will never know that it tries to connect to a old node and continuous to try to connect on the storage_port instead on the ssl_storage_port. I will enable SSL debugging and activate wire tracing and see if I can find out more. I can also add that I have upgraded all nodes in the cluster to 4.0 and once a node was upgraded to 4.0 it started talking to the other 4.0 node as expected. I had data in the cluster from before the upgrade and after doing "nodetool upgradesstables" there where no issues restarting the clients. Now I will reinstall the old version and do the upgrade again and hopefully I can have more information soon. /Tommy On 2018-10-23 19:12, Jason Brown wrote: > Hi Tommy, > > Thank you for taking the time to start kicking the tires on the upgrade to > 4.0. It looks like you've found two bugs: > > 1) "Unknown column coordinator_port during deserialization" (reported on > 3.x nodes) > > - looks like the 4.0 node isn't filtering out a column from a system table > that 3.0 doesn't know about. Most likely due to CASSANDRA-7544. Can you > open a JIRA for this, and tag @aweisberg? > > 2) SSL connection problems > > I unserstand the 4.0 -> 3.X connection problem, and documented it at [1] in > MessagingService. TL;DR we don't know the version of a peer when restarting > and need to wait for that peer to connect to the local node and pass it's > correct messaging version (if the local node cannot interpret the handhsake > from the peer). However, why for the inbound connection to the 4.0 node it > is seeing SSLv2 is unclear. Can you open a separate JIRA, and we'll go from > there? In the meantime, maybe enable the JDK's SSL debugging [2] on the 3.x > node to see exactly what it is trying to do? Also, you can enable wire > tracing on the 4.0 node by setting this value to true [3] and recompiling. > We can followup further in the jira. > > Thanks! > > -Jason > > [1] > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L1668 > [2] > https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html > [3] > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/NettyFactory.java#L84 > > On Tue, Oct 23, 2018 at 2:44 AM Tommy Stendahl <tommy.stend...@ericsson.com> > wrote: > >> Hi, >> >> I have been testing upgrade to 4.0, I started out with a cluster with >> 3.0.15 and server encryption enabled. Due to some issues in my >> environment I did upgrade one of the nodes to 3.11.3, I think this >> turned out to be a good thing since I could observer the behaviour of >> upgrading from both 3.0.15 and 3.11.3 at the same time. >> >> At first I didn't have much success at all, it look like found multiple >> issues mostly with server encryption so I decided to simplify thing and >> disabled server encryption. >> >> So with server encryption disabled the upgrade was working ok, what I >> did notice was exceptions in the 3.0.15 and 3.11.3 nodes once the first >> 4.0 node started. >> >> 3.0.15 exception: >> 2018-10-22T11:05:38.883+0200 ERROR >> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:223 >> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244 >> ,5,main] >> java.lang.RuntimeException: Unknown column coordinator_port during >> deserialization >> at >> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> at >> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:447) >> >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> at >> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:647) >> >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> at >> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:584) >> >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> at >> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) >> >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> at >> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201) >> >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> at >> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) >> >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> at >> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) >> >> ~[apache-cassandra-3.0.15.jar:3.0.15] >> >> 3.11.3 exception: >> 2018-10-22T11:12:05.060+0200 ERROR >> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228 >> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244 >> ,5,main] >> java.lang.RuntimeException: Unknown column coordinator_port during >> deserialization >> at >> org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452) >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> at >> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482) >> >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> at >> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760) >> >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> at >> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697) >> >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> at >> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) >> >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> at >> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192) >> >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> at >> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180) >> >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> at >> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94) >> >> ~[apache-cassandra-3.11.3.jar:3.11.3] >> >> I got one or two of these exceptions every minute, I could not see that >> this was causing any problems but I don't really like exceptions in the >> log. Should we expect exceptions like this during an upgrade or is this >> a problem we should look in to? >> >> So now I enabled server encryption again, with this configuration: >> storage_port: 12700 >> ssl_storage_port: 12701 >> server_encryption_options: >> # set to true for allowing secure incoming connections >> enabled: true >> # If enabled and optional are both set to true, encrypted and >> unencrypted connections are handled on the storage_port >> optional: false >> # if enabled, will open up an encrypted listening socket on >> ssl_storage_port. Should be used >> # during upgrade to 4.0; otherwise, set to false. >> enable_legacy_ssl_storage_port: true >> # on outbound connections, determine which type of peers to >> securely connect to. 'enabled' must be set to true. >> internode_encryption: all >> keystore: /usr/share/cassandra/.ssl/cil-intern/server/keystore.jks >> keystore_password: '*********' >> truststore: /usr/share/cassandra/.ssl/cil-intern/server/truststore.jks >> truststore_password: '**********' >> # More advanced defaults below: >> protocol: TLSv1.2 >> # store_type: JKS >> cipher_suites: >> >> [TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA] >> # require_client_auth: false >> # require_endpoint_verification: false >> >> As far as I understand this configuration is correct but the 4.0 node >> can't contact the old nodes. I have tried with the default protocol >> setting also but it makes no difference. The behaviour is different in >> 3.0.15 and 3.11.3, in 3.0.15 I get this exception: >> 2018-10-22T11:57:44.366+0200 ERROR >> [MessagingService-NettyInbound-Thread-3-8] >> InboundHandshakeHandler.java:300 Failed to properly handshake with peer >> /10.216.193.246:58296. Closing the channel. >> io.netty.handler.codec.DecoderException: >> javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled >> at >> >> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459) >> at >> >> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) >> at >> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >> at >> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >> at >> >> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) >> at >> >> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) >> at >> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) >> at >> >> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) >> at >> >> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) >> at >> >> io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:808) >> at >> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:417) >> at >> io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:317) >> at >> >> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) >> at >> >> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) >> at java.lang.Thread.run(Thread.java:748) >> Caused by: javax.net.ssl.SSLHandshakeException: SSLv2Hello is disabled >> at >> sun.security.ssl.InputRecord.handleUnknownRecord(InputRecord.java:637) >> at sun.security.ssl.InputRecord.read(InputRecord.java:527) >> at >> sun.security.ssl.EngineInputRecord.read(EngineInputRecord.java:382) >> at >> sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:962) >> at >> sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:907) >> at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) >> at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) >> at >> io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294) >> at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275) >> at >> io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177) >> at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221) >> at >> >> io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) >> at >> >> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) >> ... 14 common frames omitted >> >> and in the 4.0 debug log I see this related to the 3.0.15 node: >> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1] >> OutboundMessagingConnection.java:259 connection attempt 788 to >> 10.216.193.246:12700 (GOSSIP) >> 2018-10-22T13:25:34.414+0200 DEBUG [ScheduledFastTasks:1] >> NettyFactory.java:321 creating outbound bootstrap to peer >> 10.216.193.246:12700, compression: true, encryption: enabled (jdk), >> coalesce: DISABLED, protocolV >> ersion: 12 >> 2018-10-22T13:25:34.526+0200 DEBUG [GossipTasks:1] >> OutboundMessagingConnection.java:259 connection attempt 789 to >> 10.216.193.246:12700 (GOSSIP) >> 2018-10-22T13:25:34.527+0200 DEBUG [GossipTasks:1] NettyFactory.java:321 >> creating outbound bootstrap to peer 10.216.193.246:12700, compression: >> true, encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: >> 12 >> >> In the 3.11.3 node I don't see any errors, I do see this: >> 2018-10-22T13:20:36.270+0200 INFO [HANDSHAKE-/10.216.193.244] >> OutboundTcpConnection.java:561 Handshaking version with /10.216.193.244 >> >> and in the 4.0 node I find this: >> 2018-10-22T13:20:36.270+0200 INFO >> [MessagingService-NettyInbound-Thread-3-1] >> InboundHandshakeHandler.java:146 connection from peer >> /10.216.193.243:54262, protocol = TLSv1.2, cipher suite = >> TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 >> 2018-10-22T13:20:41.147+0200 DEBUG [main] >> OutboundMessagingConnection.java:259 connection attempt 0 to >> 10.216.193.243:12700 (GOSSIP) >> 2018-10-22T13:20:41.147+0200 DEBUG [main] NettyFactory.java:321 creating >> outbound bootstrap to peer 10.216.193.243:12700, compression: true, >> encryption: enabled (jdk), coalesce: DISABLED, protocolVersion: 11 >> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1] >> OutboundMessagingConnection.java:259 connection attempt 1 to >> 10.216.193.243:12700 (GOSSIP) >> 2018-10-22T13:20:41.248+0200 DEBUG [ScheduledFastTasks:1] >> NettyFactory.java:321 creating outbound bootstrap to peer >> 10.216.193.243:12700, compression: true, encryption: enabled (jdk), >> coalesce: DISABLED, protocolV >> ersion: 11 >> >> It looks like the 4.0 node doesn’t accept incoming messages from the old >> nodes on the ssl_storage_port and continues to try to contact the old >> nodes on the storage_port that are disabled in the old nodes. I don't >> really know what to make of this, any ideas or suggestions? >> >> Regards, >> Tommy >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: dev-h...@cassandra.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org