Re: [VOTE] CEP-17: SSTable format API
+1

On 16/11/21 2:27, Nate McCall wrote:
> +1
>
> On Tue, Nov 16, 2021 at 8:43 AM Branimir Lambov wrote:
>> Hi everyone,
>>
>> I would like to start a vote on this CEP.
>>
>> Proposal:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>>
>> Discussion:
>> https://lists.apache.org/thread.html/r636bebcab4e678dbee042285449193e8e75d3753200a1b404fcc7196%40%3Cdev.cassandra.apache.org%3E
>>
>> The vote will be open for 72 hours.
>> A vote passes if there are at least three binding +1s and no binding
>> vetoes.
>>
>> Regards,
>> Branimir
Re: [VOTE] CEP-17: SSTable format API
+1

> On 15 Nov 2021, at 19:42, Branimir Lambov wrote:
>
> Hi everyone,
>
> I would like to start a vote on this CEP.
>
> Proposal:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>
> Discussion:
> https://lists.apache.org/thread.html/r636bebcab4e678dbee042285449193e8e75d3753200a1b404fcc7196%40%3Cdev.cassandra.apache.org%3E
>
> The vote will be open for 72 hours.
> A vote passes if there are at least three binding +1s and no binding vetoes.
>
> Regards,
> Branimir
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
Hi Bowen,

Very interesting idea indeed. So if I got it right, the key used for the
actual SSTable encryption always stays the same; only its wrapped form
differs. When we rotate, we change only Km, hence the KEK, hence the
result of wrapping, but the original Kr is still the key in use.

Jeremiah - I will prepare that branch very soon.

On Tue, 16 Nov 2021 at 01:09, Bowen Song wrote:
> > > The second question is about key rotation. If an operator needs to
> > > roll the key because it was compromised or there is some policy around
> > > that, we should be able to provide some way to rotate it. Our idea is
> > > to write a tool (either a subcommand of nodetool (rewritesstables)
> > > command or a completely standalone one in tools) which would take the
> > > first, original key, the second, new key and dir with sstables as
> > > input and it would literally take the data and rewrite it to
> > > the second set of sstables which would be encrypted with the second
> > > key. What do you think about this?
> >
> > I would rather suggest that “what key encrypted this” be part of the
> > sstable metadata, and allow there to be multiple keys in the system. This
> > way you can just add a new “current key” so new sstables use the new key,
> > but existing sstables would use the old key. An operator could then
> > trigger a “nodetool upgradesstables --all” to rewrite the existing
> > sstables with the new “current key”.
>
> There's a much better approach to solve this issue. You can store a
> wrapped key in an encryption info file alongside the SSTable file.
> Here's how it works:
> 1. randomly generate a key Kr
> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable
>    file on disk
> 3. derive a key encryption key KEK from the SSTable file's information
>    (e.g.: table UUID + generation) and the user chosen master key Km, so
>    you have KEK = KDF(UUID+GEN, Km)
> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
> 5. hash the Km; the hash will be used as a key ID to identify which master
>    key was used to encrypt the key Kr if the server has multiple master
>    keys in use
> 6. store the WKr and the hash of Km in a separate file alongside
>    the SSTable file
>
> In the read path, the Kr should be kept in memory to help improve
> performance and this will also allow zero-downtime master key rotation.
>
> During a key rotation:
> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
> 2. read the WKr from the encryption information file, and unwrap
>    (decrypt) it using the KEK to get the Kr
> 3. derive a new KEK' from the new master key Km' in the same way as above
> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
> 5. hash the new master key Km', and store it together with the WKr' in
>    the encryption info file
>
> Since the key rotation only involves rewriting the encryption info file,
> the operation should take only a few milliseconds per SSTable file; it
> will be much faster than decrypting and then re-encrypting the SSTable data.
>
> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
> >> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic wrote:
> >>
> >> Hey,
> >>
> >> there are two points we are not completely sure about.
> >>
> >> The first one is streaming. If there is a cluster of 5 nodes, each
> >> node has its own unique encryption key. Hence, if a SSTable is stored
> >> on a disk with the key for node 1 and this is streamed to node 2 -
> >> which has a different key - it would not be able to decrypt that. Our
> >> idea is to actually send data over the wire _decrypted_ however it
> >> would be still secure if internode communication is done via TLS. Is
> >> this approach good with you?
> >
> > So would you fail startup if someone enabled sstable encryption but did not
> > have TLS for internode communication? Another concern here is making sure
> > zero copy streaming does not get triggered for this case.
> > Have you considered having some way to distribute the keys to all nodes
> > such that you don’t need to decrypt on the sending side? Having to do this
> > will mean a lot more overhead for the sending side of a streaming operation.
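The wrap and rotation steps quoted above can be sketched in Java (the language Cassandra is written in). This is only an illustration under assumptions that are not in the thread: the KDF is a single HMAC-SHA256 call keyed with Km, the key wrap KW is the JDK's "AESWrap" cipher, the key ID is SHA-256 of Km, and every class, method and field name is hypothetical. Encrypting the SSTable data itself with Kr (step 2) is left out.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;

// Hypothetical names throughout; not an existing Cassandra API.
final class SSTableKeyWrapSketch {

    /** What the proposed per-SSTable encryption info file would hold. */
    static final class EncryptionInfo {
        final byte[] wrappedKey;  // WKr
        final byte[] masterKeyId; // hash of Km
        EncryptionInfo(byte[] wrappedKey, byte[] masterKeyId) {
            this.wrappedKey = wrappedKey;
            this.masterKeyId = masterKeyId;
        }
    }

    /** Step 1: randomly generate the data key Kr. */
    static SecretKey newDataKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Step 3: KEK = KDF(UUID+GEN, Km). Assumption: KDF = HMAC-SHA256 keyed with Km. */
    private static SecretKey deriveKek(byte[] km, String tableId, long generation)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(km, "HmacSHA256"));
        byte[] out = mac.doFinal((tableId + ":" + generation).getBytes(StandardCharsets.UTF_8));
        return new SecretKeySpec(out, "AES"); // 32 bytes, used as an AES-256 key
    }

    /** Steps 4 to 6: WKr = KW(Kr, KEK), plus the hash of Km as key ID. */
    static EncryptionInfo wrap(SecretKey kr, byte[] km, String tableId, long generation) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, deriveKek(km, tableId, generation));
            byte[] keyId = MessageDigest.getInstance("SHA-256").digest(km);
            return new EncryptionInfo(cipher.wrap(kr), keyId);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Read path / rotation step 2: recover Kr from WKr. */
    static SecretKey unwrap(EncryptionInfo info, byte[] km, String tableId, long generation) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, deriveKek(km, tableId, generation));
            return (SecretKey) cipher.unwrap(info.wrappedKey, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not unwrap Kr", e);
        }
    }

    /** Rotation: unwrap with the old Km, wrap again with the new Km'. Kr is unchanged. */
    static EncryptionInfo rotate(EncryptionInfo old, byte[] oldKm, byte[] newKm,
                                 String tableId, long generation) {
        SecretKey kr = unwrap(old, oldKm, tableId, generation);
        return wrap(kr, newKm, tableId, generation);
    }
}
```

Note that rotate() only reads and writes the small info structure, which is the reason the proposal expects rotation to be far cheaper than rewriting SSTable data.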
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I really believe we likely need a CEP for this. This gets complicated
pretty fast with all the details attached and I do not want to have
endless discussions about this in the ticket. I can clearly see this is
something a broader audience needs to vote on eventually.

On Tue, 16 Nov 2021 at 09:56, Stefan Miklosovic wrote:
>
> Hi Bowen, Very interesting idea indeed. So if I got it right, the very
> key for the actual sstable encryption would be always the same, it is
> just what is wrapped would differ. So if we rotate, we basically only
> change Km hence KEK hence the result of wrapping but there would still
> be the original Kr key used.
>
> Jeremiah - I will prepare that branch very soon.
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
Yes, that's correct. The actual key used to encrypt the SSTable will
stay the same once the SSTable is created. This is a widely used
practice in many encrypt-at-rest applications. One good example is the
LUKS full disk encryption, which also supports multiple keys to unlock
(decrypt) the same data. Multiple unlocking keys are only possible
because the actual key used to encrypt the data is randomly generated
and then stored encrypted by (a key derived from) a user chosen key.

If this approach is adopted, the streaming process can share the Kr
without disclosing the Km, therefore enabling zero-copy streaming.

On 16/11/2021 08:56, Stefan Miklosovic wrote:
> Hi Bowen, Very interesting idea indeed. So if I got it right, the very
> key for the actual sstable encryption would be always the same, it is
> just what is wrapped would differ. So if we rotate, we basically only
> change Km hence KEK hence the result of wrapping but there would still
> be the original Kr key used.
>
> Jeremiah - I will prepare that branch very soon.
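While a rotation is in progress a node may hold both the old and the new master key, and step 5 of the proposal (storing a hash of Km next to WKr) is what would let it pick the right one for a given SSTable. A minimal Java sketch of such a lookup follows; the class and method names are hypothetical, and SHA-256 with Base64-encoded IDs is an assumption, since the thread only says "hash".

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not an existing Cassandra class.
final class MasterKeyRingSketch {
    // Maps key ID (hash of Km) to the master key Km itself.
    private final Map<String, byte[]> keysById = new HashMap<>();

    /** Key ID = hash of Km. Assumption: SHA-256, Base64-encoded for use as a map key. */
    static String keyId(byte[] km) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(km);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Registers a master key and returns the ID to record in the info file. */
    String add(byte[] km) {
        String id = keyId(km);
        keysById.put(id, km.clone());
        return id;
    }

    /** Finds the master key that wrapped a given SSTable's Kr. */
    byte[] lookup(String id) {
        byte[] km = keysById.get(id);
        if (km == null) {
            throw new IllegalArgumentException("unknown master key id: " + id);
        }
        return km.clone();
    }
}
```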
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
Ok, but this also means that Km would need to be the same for all nodes,
right? If we are rolling in node by node fashion, Km is changed at node
1, we change the wrapped key which is stored on disk and we stream this
table to the other node which is still on the old Km. Would this work? I
think we would need to rotate first before anything is streamed. Or no?

On Tue, 16 Nov 2021 at 11:17, Bowen Song wrote:
>
> Yes, that's correct. The actual key used to encrypt the SSTable will
> stay the same once the SSTable is created. This is a widely used
> practice in many encrypt-at-rest applications. One good example is the
> LUKS full disk encryption, which also supports multiple keys to unlock
> (decrypt) the same data. Multiple unlocking keys is only possible
> because the actual key used to encrypt the data is randomly generated
> and then stored encrypted by (a key derived from) a user chosen key.
>
> If this approach is adopted, the streaming process can share the Kr
> without disclosing the Km, therefore enableling zero-copy streaming.
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I assume the key would be decrypted before being streamed, or perhaps
encrypted using a public key provided to you by the receiving node. This
would permit efficient “zero copy” streaming for the data portion, but
not require any knowledge of the recipient node’s master key(s).

Either way, we would still want to ensure we had some authentication of
the recipient node before streaming the file, as it would effectively be
decrypted to any node that could request this streaming action.

From: Stefan Miklosovic
Date: Tuesday, 16 November 2021 at 10:45
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption

> Ok but this also means that Km would need to be the same for all nodes
> right? If we are rolling in node by node fashion, Km is changed at node
> 1, we change the wrapped key which is stored on disk and we stream this
> table to the other node which is still on the old Km. Would this work? I
> think we would need to rotate first before anything is streamed. Or no?
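The second option mentioned above, wrapping the SSTable key with a public key supplied by the receiving node, might look like the following Java sketch. RSA-OAEP with a 2048-bit key is an assumption chosen only because the JDK ships it; the thread does not name an algorithm, and all class and method names are hypothetical. How the sender authenticates that the public key really belongs to the intended recipient is deliberately not shown.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;

// Hypothetical names; RSA-OAEP is an assumption, not something the thread specifies.
final class PublicKeyWrapSketch {
    private static final String TRANSFORM = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    /** Receiver side: key pair whose public half is handed to the sender. */
    static KeyPair newReceiverKeyPair() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048);
            return kpg.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sender side: wrap the SSTable key so that only the receiver can recover it. */
    static byte[] wrapFor(PublicKey receiver, SecretKey sstableKey) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORM);
            cipher.init(Cipher.WRAP_MODE, receiver);
            return cipher.wrap(sstableKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver side: recover the SSTable key with the private key. */
    static SecretKey unwrap(PrivateKey own, byte[] wrapped) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORM);
            cipher.init(Cipher.UNWRAP_MODE, own);
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not unwrap streamed key", e);
        }
    }
}
```

With this shape the encrypted SSTable bytes can be sent unchanged, and only the small wrapped key depends on the recipient.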
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
No, the Km does not need to be the same across nodes. Each node can
store their own encryption info file created by their own Km. The
streaming process only requires that the Kr is shared.

A quick description of the streaming process via an insecure connection:

1. the sender unwraps the wrapped key WKr with their Km to get the key Kr
2. the sender and the receiver use DH key exchange to establish a shared
   secret Ks, so that sender and receiver both know the Ks
3. the sender derives a KEKs from the table info (SSTable gen is not
   persisted across nodes) & streaming info (TODO) and the shared secret
   Ks, so KEKs = KDF(Table UUID + TBD STREAMING INFO, Ks)
4. the sender wraps the Kr with KEKs to get WKrs = KW(Kr, KEKs)
5. the sender sends WKrs and the (encrypted) SSTable file to the receiver
6. the receiver derives the KEKs in the same way as the sender
7. the receiver unwraps WKrs using the KEKs to get Kr
8. the receiver wraps the Kr with a KEK' derived from their own Km

This enables zero-copy streaming, and the Kr is never sent in plaintext
over an insecure communication channel. A passive observer cannot learn
anything about the Kr.

If the streaming is done over TLS, the Kr can be sent over a TLS
connection without all the additional work. The SSTable can be sent via
an insecure connection to enable zero-copy streaming. An HMAC of the
SSTable should also be sent over TLS to ensure the SSTable has not been
damaged or modified.

On 16/11/2021 10:45, Stefan Miklosovic wrote:
> Ok but this also means that Km would need to be the same for all nodes
> right? If we are rolling in node by node fashion, Km is changed at node
> 1, we change the wrapped key which is stored on disk and we stream this
> table to the other node which is still on the old Km. Would this work? I
> think we would need to rotate first before anything is streamed. Or no?
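Steps 2 to 7 of the streaming description above could be sketched in Java roughly as follows. Assumptions not taken from the thread: ECDH on a 256-bit curve stands in for "DH key exchange", the KDF is one HMAC-SHA256 call keyed with Ks, KW is the JDK's "AESWrap" cipher, and the streaming info is a plain string because the thread leaves it TBD. All names are hypothetical. As in the description, this only covers a passive observer: the exchanged public keys are not authenticated here, so an active attacker in the middle is out of scope for the sketch.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyAgreement;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;

// Hypothetical names; ECDH, HMAC-SHA256 and AESWrap are assumed stand-ins for DH, KDF and KW.
final class StreamingKeyExchangeSketch {

    /** Each side creates a fresh key pair for one streaming session. */
    static KeyPair newEphemeralKeyPair() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
            kpg.initialize(256);
            return kpg.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Steps 2, 3 and 6: agree on Ks, then KEKs = KDF(table UUID + streaming info, Ks). */
    static SecretKey deriveKeks(KeyPair own, PublicKey peer, String tableUuid, String streamInfo) {
        try {
            KeyAgreement agreement = KeyAgreement.getInstance("ECDH");
            agreement.init(own.getPrivate());
            agreement.doPhase(peer, true);
            byte[] ks = agreement.generateSecret(); // shared secret Ks
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(ks, "HmacSHA256"));
            byte[] kek = mac.doFinal((tableUuid + "|" + streamInfo).getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(kek, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Step 4 (sender): WKrs = KW(Kr, KEKs). */
    static byte[] wrapForPeer(SecretKey kr, SecretKey keks) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, keks);
            return cipher.wrap(kr);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Step 7 (receiver): recover Kr from WKrs. */
    static SecretKey unwrapFromPeer(byte[] wkrs, SecretKey keks) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, keks);
            return (SecretKey) cipher.unwrap(wkrs, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not unwrap streamed Kr", e);
        }
    }
}
```

Step 1 (unwrapping WKr with the sender's Km) and step 8 (re-wrapping Kr under the receiver's own Km) are the same wrap and unwrap operations used for the on-disk info file, so they are not repeated here.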
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I think authenticating a receiving node is important, but it is perhaps not in the scope of this ticket (or CEP if it becomes one). This applies to not only encrypted SSTables, but also unencrypted SSTables. A malicious node can join the cluster and send bogus requests to other nodes is a general problem not specific to the on-disk encryption. On 16/11/2021 10:50, bened...@apache.org wrote: I assume the key would be decrypted before being streamed, or perhaps encrypted using a public key provided to you by the receiving node. This would permit efficient “zero copy” streaming for the data portion, but not require any knowledge of the recipient node’s master key(s). Either way, we would still want to ensure we had some authentication of the recipient node before streaming the file as it would effectively be decrypted to any node that could request this streaming action. From: Stefan Miklosovic Date: Tuesday, 16 November 2021 at 10:45 To: dev@cassandra.apache.org Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption Ok but this also means that Km would need to be the same for all nodes right? If we are rolling in node by node fashion, Km is changed at node 1, we change the wrapped key which is stored on disk and we stream this table to the other node which is still on the old Km. Would this work? I think we would need to rotate first before anything is streamed. Or no? On Tue, 16 Nov 2021 at 11:17, Bowen Song wrote: Yes, that's correct. The actual key used to encrypt the SSTable will stay the same once the SSTable is created. This is a widely used practice in many encrypt-at-rest applications. One good example is the LUKS full disk encryption, which also supports multiple keys to unlock (decrypt) the same data. Multiple unlocking keys is only possible because the actual key used to encrypt the data is randomly generated and then stored encrypted by (a key derived from) a user chosen key. 
>> If this approach is adopted, the streaming process can share the Kr without disclosing the Km, therefore enabling zero-copy streaming.
>>
>> On 16/11/2021 08:56, Stefan Miklosovic wrote:
>>> Hi Bowen, Very interesting idea indeed. So if I got it right, the very key for the actual sstable encryption would be always the same, it is just what is wrapped would differ. So if we rotate, we basically only change Km hence KEK hence the result of wrapping but there would still be the original Kr key used.
>>>
>>> Jeremiah - I will prepare that branch very soon.
>>>
>>> On Tue, 16 Nov 2021 at 01:09, Bowen Song wrote:
>>>>>> The second question is about key rotation. If an operator needs to roll the key because it was compromised or there is some policy around that, we should be able to provide some way to rotate it. Our idea is to write a tool (either a subcommand of nodetool (rewritesstables) command or a completely standalone one in tools) which would take the first, original key, the second, new key and dir with sstables as input and it would literally take the data and rewrite it to the second set of sstables which would be encrypted with the second key. What do you think about this?
>>>>> I would rather suggest that “what key encrypted this” be part of the sstable metadata, and allow there to be multiple keys in the system. This way you can just add a new “current key” so new sstables use the new key, but existing sstables would use the old key. An operator could then trigger a “nodetool upgradesstables —all” to rewrite the existing sstables with the new “current key”.
>>>> There's a much better approach to solve this issue. You can store a wrapped key in an encryption info file alongside the SSTable file. Here's how it works:
>>>> 1. randomly generate a key Kr
>>>> 2. encrypt the SSTable file with the key Kr, store the encrypted SSTable file on disk
>>>> 3. derive a key encryption key KEK from the SSTable file's information (e.g.: table UUID + generation) and the user chosen master key Km, so you have KEK = KDF(UUID+GEN, Km)
>>>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
>>>> 5. hash the Km, the hash will be used as a key ID to identify which master key was used to encrypt the key Kr if the server has multiple master keys in use
>>>> 6. store the WKr and the hash of Km in a separate file alongside the SSTable file
>>>>
>>>> In the read path, the Kr should be kept in memory to help improve performance and this will also allow zero-downtime master key rotation.
>>>>
>>>> During a key rotation:
>>>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
>>>> 2. read the WKr from the encryption information file, and unwrap (decrypt) it using the KEK to get the Kr
>>>> 3. derive a new KEK' from the new master key Km' in the same way as above
>>>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
>>>> 5. hash the new master key Km', and store it together with the WKr' in the encryption info file
>>>>
>>>> Since the key rotation only involves rewriting the encryption info file, the operati
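The write-path and rotation steps quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the ticket: HMAC-SHA256 stands in for the unspecified KDF, `wrap`/`unwrap` are a toy placeholder for KW (a real implementation would use a proper key-wrap algorithm such as AES Key Wrap), the encryption info file is modelled as a dict, and every function name is hypothetical.

```python
import hashlib
import hmac
import os


def kdf(context: bytes, km: bytes) -> bytes:
    """KEK = KDF(UUID+GEN, Km). HMAC-SHA256 is an assumed choice of KDF."""
    return hmac.new(km, b"kek|" + context, hashlib.sha256).digest()


def key_id(km: bytes) -> str:
    """Step 5: the hash of Km, stored as the key ID."""
    return hashlib.sha256(km).hexdigest()


def _pad(kek: bytes, nonce: bytes, n: int) -> bytes:
    # Keystream for the toy wrap below; only valid for n <= 32 bytes.
    return hmac.new(kek, b"pad|" + nonce, hashlib.sha256).digest()[:n]


def wrap(kr: bytes, kek: bytes) -> bytes:
    """Toy stand-in for KW(Kr, KEK): nonce || ciphertext || tag. Not for production."""
    assert len(kr) <= 32
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(kr, _pad(kek, nonce, len(kr))))
    tag = hmac.new(kek, b"tag|" + nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unwrap(wkr: bytes, kek: bytes) -> bytes:
    """Inverse of wrap(); raises ValueError if the KEK does not match."""
    nonce, ct, tag = wkr[:16], wkr[16:-32], wkr[-32:]
    expected = hmac.new(kek, b"tag|" + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong KEK or corrupted wrapped key")
    return bytes(a ^ b for a, b in zip(ct, _pad(kek, nonce, len(ct))))


def create_info(context: bytes, km: bytes):
    """Steps 1-6: returns Kr and the contents of the encryption info file."""
    kr = os.urandom(32)  # step 1: random data key
    info = {"wkr": wrap(kr, kdf(context, km)), "key_id": key_id(km)}
    return kr, info


def rotate(info: dict, context: bytes, km_old: bytes, km_new: bytes) -> dict:
    """Rotation steps 1-5: only the info file changes, Kr stays the same."""
    if info["key_id"] != key_id(km_old):
        raise ValueError("info file was not written with km_old")
    kr = unwrap(info["wkr"], kdf(context, km_old))
    return {"wkr": wrap(kr, kdf(context, km_new)), "key_id": key_id(km_new)}
```

Note that `rotate` never touches the SSTable data itself, which is the point made in the quoted message: the data stays encrypted under Kr and only the small info file is rewritten.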
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
We already have the facility to authenticate peers; I am suggesting we should, e.g., refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured.

It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator, who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest.

From: Bowen Song
Date: Tuesday, 16 November 2021 at 11:56
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I think authenticating a receiving node is important, but it is perhaps not in the scope of this ticket (or CEP if it becomes one). This applies not only to encrypted SSTables, but also to unencrypted SSTables. A malicious node that can join the cluster and send bogus requests to other nodes is a general problem, not one specific to on-disk encryption.
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
Thanks for the insights, everybody.

I would like to return to Km. If we require that all Km's are the same before streaming, is it not true that we do not need to move any secrets around at all? So TLS would not be required either, as only encrypted tables would ever be streamed. That way Kr would never leave the node, and the new Km would be rolled over first. To use the correct Km, the recipient would have its hash, received along with the table. This would also avoid the fairly complex algorithm in Bowen's last reply, if I got that right.

On Tue, 16 Nov 2021 at 13:02, bened...@apache.org wrote:
> We already have the facility to authenticate peers; I am suggesting we should, e.g., refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured.
>
> It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator, who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest.
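The idea of using the hash of Km to pick the right master key on the receiving side could look something like the sketch below. It assumes SHA-256 as the hash and a small in-memory keyring; the `Keyring` class and its method names are hypothetical and only meant to illustrate the lookup.

```python
import hashlib


def key_id(km: bytes) -> str:
    """Key ID as described in the thread: a hash of the master key Km."""
    return hashlib.sha256(km).hexdigest()


class Keyring:
    """Hypothetical holder for every master key a node currently accepts."""

    def __init__(self, master_keys):
        # Index the keys by their hash so a received key ID maps back to Km.
        self._by_id = {key_id(km): km for km in master_keys}

    def lookup(self, kid: str) -> bytes:
        """Return the Km whose hash matches the ID stored with the SSTable."""
        if kid not in self._by_id:
            raise LookupError("no master key with ID " + kid[:12])
        return self._by_id[kid]
```

During a rolling rotation a node would presumably hold both the old and the new Km in such a keyring, so a table arriving from a node that has not rotated yet can still be matched to the right key.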
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I think a warning message is fine, but Cassandra should not enforce network encryption when on-disk encryption is enabled. It's definitely a valid use case to have Cassandra over IPSec without enabling TLS.

On 16/11/2021 12:02, bened...@apache.org wrote:
> We already have the facility to authenticate peers; I am suggesting we should, e.g., refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured.
>
> It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator, who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest.
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I’m not suggesting enforcing network encryption, just prohibiting unauthenticated connections from peers so that we do not effectively offer a decrypt-all-the-data endpoint. If as an operator you know that it is impossible for unauthenticated peers to open a connection due to your network configuration, then we can offer some special SafeAllowAllInternodeAuthenticator that permits things to proceed as normal, but we should definitely ensure operators have considered internode authentication in the case we have at-rest encryption. It’s far too easy for this to be overlooked otherwise, and for an operator to thereby fail to protect their data.

From: Bowen Song
Date: Tuesday, 16 November 2021 at 12:33
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I think a warning message is fine, but Cassandra should not enforce network encryption when on-disk encryption is enabled. It's definitely a valid use case to have Cassandra over IPSec without enabling TLS.
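A startup guard along the lines suggested above might be sketched like this. It is not Cassandra code: the function and parameter names are hypothetical, and SafeAllowAllInternodeAuthenticator is only the name proposed in this thread, used here as an explicit operator opt-out. The one existing name relied on is AllowAllInternodeAuthenticator, the permissive authenticator.

```python
ALLOW_ALL = "AllowAllInternodeAuthenticator"


def check_startup(authenticator: str, encryption_enabled: bool,
                  encrypted_data_present: bool = False) -> None:
    """Refuse to start when encrypted data could be streamed to any peer."""
    # Compare on the simple class name so fully qualified names also match.
    simple_name = authenticator.rsplit(".", 1)[-1]
    if simple_name != ALLOW_ALL:
        return  # peers are authenticated, or the operator opted out explicitly
    if encryption_enabled:
        raise RuntimeError(
            "at-rest encryption is enabled but internode authentication is not")
    if encrypted_data_present:
        raise RuntimeError(
            "encrypted SSTables found but internode authentication is not configured")
```

With this shape, an operator who relies on network-level controls (IPSec, for instance) would have to name the "safe" variant deliberately, which is the "make it harder to misconfigure by accident" property argued for above.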
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
If the same user chosen key Km is used across all nodes in the same cluster, the sender will only need to share their SSTable generation GEN with the receiving side. This is because the receiving side will need to use the GEN to reproduce the KEK used in the source node. The receiving side will then need to unwrap Kr with the KEK and re-wrap it with a new KEK' derived from their own GEN. GEN is not considered a secret.

On 16/11/2021 12:13, Stefan Miklosovic wrote:
> Thanks for the insights, everybody.
>
> I would like to return to Km. If we require that all Km's are the same before streaming, is it not true that we do not need to move any secrets around at all? So TLS would not be required either, as only encrypted tables would ever be streamed. That way Kr would never leave the node, and the new Km would be rolled over first. To use the correct Km, the recipient would have its hash, received along with the table. This would also avoid the fairly complex algorithm in Bowen's last reply, if I got that right.
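The receiving side's key derivation described in this message can be illustrated as below. This is a sketch under assumptions: HMAC-SHA256 stands in for the unspecified KDF, the context encoding (`table_uuid|gen`) is made up for the example, and the function names are hypothetical. The unwrap and re-wrap of Kr itself is left out and only noted in comments.

```python
import hashlib
import hmac


def kdf(table_uuid: str, gen: int, km: bytes) -> bytes:
    """KEK = KDF(UUID+GEN, Km), with HMAC-SHA256 as an assumed KDF."""
    context = f"{table_uuid}|{gen}".encode()
    return hmac.new(km, context, hashlib.sha256).digest()


def receiver_keks(table_uuid: str, sender_gen: int, local_gen: int, km: bytes):
    """Derive both keys the receiver needs after a streamed SSTable arrives."""
    # KEK as used on the source node, reproduced from the shared Km and the
    # sender's (non-secret) GEN; the receiver unwraps WKr with this key.
    kek_source = kdf(table_uuid, sender_gen, km)
    # KEK' for the local copy, derived from the receiver's own GEN; the
    # receiver re-wraps Kr with this key before writing its info file.
    kek_local = kdf(table_uuid, local_gen, km)
    return kek_source, kek_local
```

Only GEN crosses the wire here; Km is assumed to be configured identically on both nodes, and Kr stays inside its wrapped form until the receiver unwraps it.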
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
Ok, but this does not need to be something which is _explicitly_ sent to it, as I believe a receiving node can derive this on its own - if we say that gen is a hash of keyspace + table + table id, for example (which is the same on every node in the cluster).

On Tue, 16 Nov 2021 at 13:55, Bowen Song wrote:
> If the same user chosen key Km is used across all nodes in the same cluster, the sender will only need to share their SSTable generation GEN with the receiving side. This is because the receiving side will need to use the GEN to reproduce the KEK used in the source node. The receiving side will then need to unwrap Kr with the KEK and re-wrap it with a new KEK' derived from their own GEN. GEN is not considered a secret.
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
Then you are reusing the same KEK for all SSTable files belonging to the same Cassandra table. The reason to have the KEK derived from some unique information is to avoid reusing keys, which may open up some attack vectors.

On that thought, table UUID+GEN is actually not good enough, because the table UUID is the same across all nodes and the GEN is only unique on a given node. The proper solution may require adding an additional UUID field to each SSTable file header, and then using that UUID in the KDF. If this is implemented, no additional information will need to be sent during a streaming session, as the receiving end will have received the SSTable file with the header information anyway.

On 16/11/2021 13:05, Stefan Miklosovic wrote:
Ok, but this does not need to be something which is _explicitly_ sent to it as I believe a receiving node can derive this on its own - if we say that gen is a hash of keyspace + table + table id, for example (which is the same across the cluster for each node).

On Tue, 16 Nov 2021 at 13:55, Bowen Song wrote:
If the same user-chosen key Km is used across all nodes in the same cluster, the sender will only need to share their SSTable generation GEN with the receiving side. This is because the receiving side will need to use the GEN to reproduce the KEK used in the source node. The receiving side will then need to unwrap Kr with the KEK and re-wrap it with a new KEK' derived from their own GEN. GEN is not considered a secret.

On 16/11/2021 12:13, Stefan Miklosovic wrote:
Thanks for the insights of everybody. I would like to return to Km. If we require that all Km's are the same before streaming, is it not true that we do not need to move any secrets around at all? So TLS would not be required either as only encrypted tables would ever be streamed. That way Kr would never ever leave the node and the new Km would be rolled over first. To use the correct Km, the recipient would have a hash of it available upon receiving the table.
This would also avoid the fairly complex algorithm in the last Bowen's reply when I got that right. On Tue, 16 Nov 2021 at 13:02, bened...@apache.org wrote: We already have the facility to authenticate peers, I am suggesting we should e.g. refuse to enable encryption if there is no such facility configured for a replica, or fail to start if there is encrypted data present and no authentication facility configured. It is in my opinion much more problematic to remove encryption from data and ship it to another node in the network than it is to ship data that is already unencrypted to another node on the network. Either is bad, but it is probably fine to leave the unencrypted case to the cognizance of the operator who may be happy relying on their general expectation that there are no nefarious actors on the network. Encrypting data suggests this is not an acceptable assumption, so I think we should make it harder for users that require encryption to accidentally misconfigure in this way, since they probably have higher security expectations (and compliance requirements) than users that do not encrypt their data at rest. From: Bowen Song Date: Tuesday, 16 November 2021 at 11:56 To: dev@cassandra.apache.org Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption I think authenticating a receiving node is important, but it is perhaps not in the scope of this ticket (or CEP if it becomes one). This applies to not only encrypted SSTables, but also unencrypted SSTables. A malicious node can join the cluster and send bogus requests to other nodes is a general problem not specific to the on-disk encryption. On 16/11/2021 10:50, bened...@apache.org wrote: I assume the key would be decrypted before being streamed, or perhaps encrypted using a public key provided to you by the receiving node. This would permit efficient “zero copy” streaming for the data portion, but not require any knowledge of the recipient node’s master key(s). 
Either way, we would still want to ensure we had some authentication of the recipient node before streaming the file as it would effectively be decrypted to any node that could request this streaming action. From: Stefan Miklosovic Date: Tuesday, 16 November 2021 at 10:45 To: dev@cassandra.apache.org Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption Ok but this also means that Km would need to be the same for all nodes right? If we are rolling in node by node fashion, Km is changed at node 1, we change the wrapped key which is stored on disk and we stream this table to the other node which is still on the old Km. Would this work? I think we would need to rotate first before anything is streamed. Or no? On Tue, 16 Nov 2021 at 11:17, Bowen Song wrote: Yes, that's correct. The actual key used to encrypt the SSTable will stay the same once the SSTable is created. This is a widely used practice in many encrypt-at-rest applications. One good example is the LUKS full disk enc
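The scheme discussed in this message (a random data key Kr, wrapped by a KEK derived from the master key Km and a per-SSTable UUID) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the CASSANDRA-9633 code: HMAC-SHA256 stands in for the KDF, the XOR "wrap" is a placeholder for a real key-wrap algorithm such as AES Key Wrap, and the names derive_kek, toy_wrap and rewrap_for_new_master are hypothetical.

```python
import hashlib
import hmac
import os
import uuid


def derive_kek(master_key: bytes, sstable_uuid: uuid.UUID) -> bytes:
    """KEK = KDF(UUID, Km). HMAC-SHA256 is an assumed stand-in for the KDF."""
    return hmac.new(master_key, b"sstable-kek" + sstable_uuid.bytes,
                    hashlib.sha256).digest()


def toy_wrap(key: bytes, kek: bytes) -> bytes:
    """Placeholder for a real key wrap (e.g. AES Key Wrap); NOT secure.

    XOR is its own inverse, so calling this again with the same KEK unwraps.
    """
    if len(key) != 32 or len(kek) != 32:
        raise ValueError("this sketch assumes 32-byte keys")
    return bytes(a ^ b for a, b in zip(key, kek))


def rewrap_for_new_master(wrapped: bytes, old_km: bytes, new_km: bytes,
                          sstable_uuid: uuid.UUID) -> bytes:
    """Rotate Km: only the wrapped key changes, Kr (and the data) stay as-is."""
    kr = toy_wrap(wrapped, derive_kek(old_km, sstable_uuid))
    return toy_wrap(kr, derive_kek(new_km, sstable_uuid))


# Sender side: random data key Kr, wrapped under a KEK unique to this SSTable.
km = os.urandom(32)            # master key, assumed identical on all nodes
kr = os.urandom(32)            # key that actually encrypts the SSTable
header_uuid = uuid.uuid4()     # hypothetical UUID stored in the SSTable header
wrapped_kr = toy_wrap(kr, derive_kek(km, header_uuid))

# Receiver side: same Km plus the UUID read from the streamed header is enough.
assert toy_wrap(wrapped_kr, derive_kek(km, header_uuid)) == kr

# Rotating Km re-wraps Kr without touching the encrypted data.
new_km = os.urandom(32)
rotated = rewrap_for_new_master(wrapped_kr, km, new_km, header_uuid)
assert toy_wrap(rotated, derive_kek(new_km, header_uuid)) == kr
```

Because the UUID travels in the file header, the receiver needs nothing beyond the shared Km, which is the property that would keep whole-file streaming possible in this design.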
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I think a CEP is wise (or a more thorough design document on the ticket) given how easy it is to do security incorrectly and key management, rotation and key derivation are not particularly straightforward. I am curious what advantage Cassandra implementing encryption has over asking the user to use an encrypted filesystem or disks instead where the kernel or device will undoubtedly be able to do the crypto more efficiently than we can in the JVM and we wouldn't have to further complicate the storage engine? I think the state of encrypted filesystems (e.g. LUKS on Linux) is significantly more user friendly these days than it was in 2015 when that ticket was created. If the application has existing exfiltration paths (e.g. backups) it's probably better to encrypt/decrypt in the backup/restore process via something extremely fast (and modern) like piping through age [1] isn't it? [1] https://github.com/FiloSottile/age -Joey On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic wrote: > > Hi list, > > an engineer from Intel - Shylaja Kokoori (who is watching this list > closely) has retrofitted the original code from CASSANDRA-9633 work in > times of 3.4 to the current trunk with my help here and there, mostly > cosmetic. > > I would like to know if there is a general consensus about me going to > create a CEP for this feature or what is your perception on this. I > know we have it a little bit backwards here as we should first discuss > and then code but I am super glad that we have some POC we can > elaborate further on and CEP would just cement and summarise the > approach / other implementation aspects of this feature. > > I think that having 9633 merged will fill quite a big operational gap > when it comes to security. There are a lot of enterprises who desire > this feature so much. I can not remember when I last saw a ticket with > 50 watchers which was inactive for such a long time. 
> > Regards
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I don't object to having the discussion about whether we actually need this feature at all :) Let's hear from people in the field what their perception is on this. Btw, if we should rely on file system encryption, for what reason is there encryption of commit logs and hints already? So this should be removed? I find it rather strange to offer commit log and hints encryption at rest but for some reason sstable encryption would be omitted. On Tue, 16 Nov 2021 at 15:46, Joseph Lynch wrote: > > I think a CEP is wise (or a more thorough design document on the > ticket) given how easy it is to do security incorrectly and key > management, rotation and key derivation are not particularly > straightforward. > > I am curious what advantage Cassandra implementing encryption has over > asking the user to use an encrypted filesystem or disks instead where > the kernel or device will undoubtedly be able to do the crypto more > efficiently than we can in the JVM and we wouldn't have to further > complicate the storage engine? I think the state of encrypted > filesystems (e.g. LUKS on Linux) is significantly more user friendly > these days than it was in 2015 when that ticket was created. > > If the application has existing exfiltration paths (e.g. backups) it's > probably better to encrypt/decrypt in the backup/restore process via > something extremely fast (and modern) like piping through age [1] > isn't it? > > [1] https://github.com/FiloSottile/age > > -Joey > > > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic > wrote: > > > > Hi list, > > > > an engineer from Intel - Shylaja Kokoori (who is watching this list > > closely) has retrofitted the original code from CASSANDRA-9633 work in > > times of 3.4 to the current trunk with my help here and there, mostly > > cosmetic. > > > > I would like to know if there is a general consensus about me going to > > create a CEP for this feature or what is your perception on this. 
I > > know we have it a little bit backwards here as we should first discuss > > and then code but I am super glad that we have some POC we can > > elaborate further on and CEP would just cement and summarise the > > approach / other implementation aspects of this feature. > > > > I think that having 9633 merged will fill quite a big operational gap > > when it comes to security. There are a lot of enterprises who desire > > this feature so much. I can not remember when I last saw a ticket with > > 50 watchers which was inactive for such a long time. > > > > Regards
Re: [VOTE] CEP-17: SSTable format API
+1 On Tue, 16 Nov 2021 at 08:39, Sam Tunnicliffe wrote: > +1 > > > On 15 Nov 2021, at 19:42, Branimir Lambov wrote: > > > > Hi everyone, > > > > I would like to start a vote on this CEP. > > > > Proposal: > > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API > > > > Discussion: > > > https://lists.apache.org/thread.html/r636bebcab4e678dbee042285449193e8e75d3753200a1b404fcc7196%40%3Cdev.cassandra.apache.org%3E > > > > The vote will be open for 72 hours. > > A vote passes if there are at least three binding +1s and no binding > vetoes. > > > > Regards, > > Branimir
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
> I find it rather strange to offer commit log and hints encryption at rest but for some reason sstable encryption would be omitted. I also think file/disk encryption may be superior in those cases, but I imagine they were easier to implement in that you don't have to worry nearly as much about key management since both commit logs and hints are short lived files that should never leave the box (except maybe for CDC but I feel like that's similar to backup in terms of "exfiltration by design"). To be clear, I think in 2015 this feature would have been extremely useful, but with operating systems and cloud providers often offering full disk encryption by default now and doing it with really good (performant and secure) implementations ... I question if it's something we want to sink cycles into. -Joey On Tue, Nov 16, 2021 at 7:01 AM Stefan Miklosovic wrote: > > I don't object to having the discussion about whether we actually need > this feature at all :) > > Let's hear from people in the field what their perception is on this. > > Btw, if we should rely on file system encryption, for what reason is > there encryption of commit logs and hints already? So this should be > removed? I find it rather strange to offer commit log and hints > encryption at rest but for some reason sstable encryption would be > omitted. > > On Tue, 16 Nov 2021 at 15:46, Joseph Lynch wrote: > > > > I think a CEP is wise (or a more thorough design document on the > > ticket) given how easy it is to do security incorrectly and key > > management, rotation and key derivation are not particularly > > straightforward. > > > > I am curious what advantage Cassandra implementing encryption has over > > asking the user to use an encrypted filesystem or disks instead where > > the kernel or device will undoubtedly be able to do the crypto more > > efficiently than we can in the JVM and we wouldn't have to further > > complicate the storage engine? I think the state of encrypted > > filesystems (e.g. 
LUKS on Linux) is significantly more user friendly > > these days than it was in 2015 when that ticket was created. > > > > If the application has existing exfiltration paths (e.g. backups) it's > > probably better to encrypt/decrypt in the backup/restore process via > > something extremely fast (and modern) like piping through age [1] > > isn't it? > > > > [1] https://github.com/FiloSottile/age > > > > -Joey > > > > > > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic > > wrote: > > > > > > Hi list, > > > > > > an engineer from Intel - Shylaja Kokoori (who is watching this list > > > closely) has retrofitted the original code from CASSANDRA-9633 work in > > > times of 3.4 to the current trunk with my help here and there, mostly > > > cosmetic. > > > > > > I would like to know if there is a general consensus about me going to > > > create a CEP for this feature or what is your perception on this. I > > > know we have it a little bit backwards here as we should first discuss > > > and then code but I am super glad that we have some POC we can > > > elaborate further on and CEP would just cement and summarise the > > > approach / other implementation aspects of this feature. > > > > > > I think that having 9633 merged will fill quite a big operational gap > > > when it comes to security. There are a lot of enterprises who desire > > > this feature so much. I can not remember when I last saw a ticket with > > > 50 watchers which was inactive for such a long time. 
> > > > > > Regards
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
On Tue, 16 Nov 2021 at 16:17, Joseph Lynch wrote:
>
> > I find it rather strange to offer commit log and hints
> > encryption at rest but for some reason sstable encryption would be
> > omitted.
>
> I also think file/disk encryption may be superior in those cases

Just for the record, I do not have any particular opinion / I am not leaning towards any solution as of now when it comes to superiority / inferiority of file system encryption. It would be very beneficial if more people expressed their views on this matter.

> but
> I imagine they were easier to implement in that you don't have to
> worry nearly as much about key management since both commit logs and
> hints are short lived files that should never leave the box (except
> maybe for CDC but I feel like that's similar to backup in terms of
> "exfiltration by design").
>
> To be clear, I think in 2015 this feature would have been extremely
> useful, but with operating systems and cloud providers often offering
> full disk encryption by default now and doing it with really good
> (performant and secure) implementations ... I question if it's
> something we want to sink cycles into.
>
> -Joey
>
> On Tue, Nov 16, 2021 at 7:01 AM Stefan Miklosovic wrote:
> >
> > I don't object to having the discussion about whether we actually need
> > this feature at all :)
> >
> > Let's hear from people in the field what their perception is on this.
> >
> > Btw, if we should rely on file system encryption, for what reason is
> > there encryption of commit logs and hints already? So this should be
> > removed? I find it rather strange to offer commit log and hints
> > encryption at rest but for some reason sstable encryption would be
> > omitted.
> > > > On Tue, 16 Nov 2021 at 15:46, Joseph Lynch wrote: > > > > > > I think a CEP is wise (or a more thorough design document on the > > > ticket) given how easy it is to do security incorrectly and key > > > management, rotation and key derivation are not particularly > > > straightforward. > > > > > > I am curious what advantage Cassandra implementing encryption has over > > > asking the user to use an encrypted filesystem or disks instead where > > > the kernel or device will undoubtedly be able to do the crypto more > > > efficiently than we can in the JVM and we wouldn't have to further > > > complicate the storage engine? I think the state of encrypted > > > filesystems (e.g. LUKS on Linux) is significantly more user friendly > > > these days than it was in 2015 when that ticket was created. > > > > > > If the application has existing exfiltration paths (e.g. backups) it's > > > probably better to encrypt/decrypt in the backup/restore process via > > > something extremely fast (and modern) like piping through age [1] > > > isn't it? > > > > > > [1] https://github.com/FiloSottile/age > > > > > > -Joey > > > > > > > > > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic > > > wrote: > > > > > > > > Hi list, > > > > > > > > an engineer from Intel - Shylaja Kokoori (who is watching this list > > > > closely) has retrofitted the original code from CASSANDRA-9633 work in > > > > times of 3.4 to the current trunk with my help here and there, mostly > > > > cosmetic. > > > > > > > > I would like to know if there is a general consensus about me going to > > > > create a CEP for this feature or what is your perception on this. I > > > > know we have it a little bit backwards here as we should first discuss > > > > and then code but I am super glad that we have some POC we can > > > > elaborate further on and CEP would just cement and summarise the > > > > approach / other implementation aspects of this feature. 
> > > > > > > > I think that having 9633 merged will fill quite a big operational gap > > > > when it comes to security. There are a lot of enterprises who desire > > > > this feature so much. I can not remember when I last saw a ticket with > > > > 50 watchers which was inactive for such a long time. > > > > > > > > Regards
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I don't like the idea of treating Full Disk Encryption (FDE) as an alternative to application-managed encryption at rest. Each has its own advantages and disadvantages.

For example, if the encryption key is the same across nodes in the same cluster, and Cassandra can share the key securely between authenticated nodes, a rolling restart of the servers will be a lot simpler than if the servers were using FDE - someone will have to type in the passphrase on each reboot, or have a script to mount the encrypted device over SSH and then start the Cassandra service after a reboot.

Another valid use case for encryption implemented in Cassandra is to selectively encrypt some tables, but leave others unencrypted. Doing this outside Cassandra at the filesystem level is very tedious and error-prone - it takes a lot of symlinks and makes it pretty hard to handle newly created tables or keyspaces.

However, I don't know if there's enough demand to justify the above use cases.

On 16/11/2021 14:45, Joseph Lynch wrote:
I think a CEP is wise (or a more thorough design document on the ticket) given how easy it is to do security incorrectly and key management, rotation and key derivation are not particularly straightforward.

I am curious what advantage Cassandra implementing encryption has over asking the user to use an encrypted filesystem or disks instead where the kernel or device will undoubtedly be able to do the crypto more efficiently than we can in the JVM and we wouldn't have to further complicate the storage engine? I think the state of encrypted filesystems (e.g. LUKS on Linux) is significantly more user friendly these days than it was in 2015 when that ticket was created.

If the application has existing exfiltration paths (e.g. backups) it's probably better to encrypt/decrypt in the backup/restore process via something extremely fast (and modern) like piping through age [1] isn't it?
[1] https://github.com/FiloSottile/age -Joey On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic wrote: Hi list, an engineer from Intel - Shylaja Kokoori (who is watching this list closely) has retrofitted the original code from CASSANDRA-9633 work in times of 3.4 to the current trunk with my help here and there, mostly cosmetic. I would like to know if there is a general consensus about me going to create a CEP for this feature or what is your perception on this. I know we have it a little bit backwards here as we should first discuss and then code but I am super glad that we have some POC we can elaborate further on and CEP would just cement and summarise the approach / other implementation aspects of this feature. I think that having 9633 merged will fill quite a big operational gap when it comes to security. There are a lot of enterprises who desire this feature so much. I can not remember when I last saw a ticket with 50 watchers which was inactive for such a long time. Regards
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
For FDE you'd probably have the key file in a tmpfs pulled from a remote secret manager and when the machine boots it mounts the encrypted partition that contains your data files. I'm not aware of anyone doing FDE with a password in production. If you wanted selective encryption it would make sense to me to support placing keyspaces on different data directories (this may already be possible) but since crypto in the kernel is so cheap I don't know why you'd do selective encryption. Also I think it's worth noting many hosting providers (e.g. AWS) just encrypt the disks for you so you can check the "data is encrypted at rest" box. I think Cassandra will be pretty handicapped by being in the JVM which generally has very slow crypto. I'm slightly concerned that we're already slow at streaming and compaction, and adding slow JVM crypto will make C* even less competitive. For example, if we have to disable full sstable streaming (zero copy or otherwise) I think that would be very unfortunate (although Bowen's approach of sharing one secret across the cluster and then having files use a key derivation function may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to try to offload to native crypto like how internode networking did with tcnative to fix the perf issues with netty TLS with JVM crypto I'd feel a little less concerned but ... crypto that is both secure and performant in the JVM is a hard problem ... I guess I'm just concerned we're going to introduce something that is either insecure or too slow to be useful. -Joey On Tue, Nov 16, 2021 at 8:10 AM Bowen Song wrote: > > I don't like the idea that FDE Full Disk Encryption as an alternative to > application managed encryption at rest. Each has their own advantages > and disadvantages. 
> > For example, if the encryption key is the same across nodes in the same > cluster, and Cassandra can share the key securely between authenticated > nodes, rolling restart of the servers will be a lot simpler than if the > servers were using FDE - someone will have to type in the passphrase on > each reboot, or have a script to mount the encrypted device over SSH and > then start Cassandra service after a reboot. > > Another valid use case of encryption implemented in Cassandra is > selectively encrypt some tables, but leave others unencrypted. Doing > this outside Cassandra on the filesystem level is very tedious and > error-prone - a lots of symlinks and pretty hard to handle newly created > tables or keyspaces. > > However, I don't know if there's enough demand to justify the above use > cases. > > > On 16/11/2021 14:45, Joseph Lynch wrote: > > I think a CEP is wise (or a more thorough design document on the > > ticket) given how easy it is to do security incorrectly and key > > management, rotation and key derivation are not particularly > > straightforward. > > > > I am curious what advantage Cassandra implementing encryption has over > > asking the user to use an encrypted filesystem or disks instead where > > the kernel or device will undoubtedly be able to do the crypto more > > efficiently than we can in the JVM and we wouldn't have to further > > complicate the storage engine? I think the state of encrypted > > filesystems (e.g. LUKS on Linux) is significantly more user friendly > > these days than it was in 2015 when that ticket was created. > > > > If the application has existing exfiltration paths (e.g. backups) it's > > probably better to encrypt/decrypt in the backup/restore process via > > something extremely fast (and modern) like piping through age [1] > > isn't it? 
> > > > [1] https://github.com/FiloSottile/age > > > > -Joey > > > > > > On Sat, Nov 13, 2021 at 6:01 AM Stefan Miklosovic > > wrote: > >> Hi list, > >> > >> an engineer from Intel - Shylaja Kokoori (who is watching this list > >> closely) has retrofitted the original code from CASSANDRA-9633 work in > >> times of 3.4 to the current trunk with my help here and there, mostly > >> cosmetic. > >> > >> I would like to know if there is a general consensus about me going to > >> create a CEP for this feature or what is your perception on this. I > >> know we have it a little bit backwards here as we should first discuss > >> and then code but I am super glad that we have some POC we can > >> elaborate further on and CEP would just cement and summarise the > >> approach / other implementation aspects of this feature. > >> > >> I think that having 9633 merged will fill quite a big operational gap > >> when it comes to security. There are a lot of enterprises who desire > >> this feature so much. I can not remember when I last saw a ticket with > >> 50 watchers which was inactive for such a long time. > >> > >> Regards
Re: [VOTE] CEP-17: SSTable format API
+1 On Tue, Nov 16, 2021 at 10:14 AM Andrés de la Peña wrote: > +1 > > On Tue, 16 Nov 2021 at 08:39, Sam Tunnicliffe wrote: > > > +1 > > > > > On 15 Nov 2021, at 19:42, Branimir Lambov wrote: > > > > > > Hi everyone, > > > > > > I would like to start a vote on this CEP. > > > > > > Proposal: > > > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API > > > > > > Discussion: > > > > > > https://lists.apache.org/thread.html/r636bebcab4e678dbee042285449193e8e75d3753200a1b404fcc7196%40%3Cdev.cassandra.apache.org%3E > > > > > > The vote will be open for 72 hours. > > > A vote passes if there are at least three binding +1s and no binding > > vetoes. > > > > > > Regards, > > > Branimir