Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I wanted to provide a bit of background on the interest we've seen in this ticket/feature (at Instaclustr). Essentially it comes down to in-db encryption at rest being a feature that compliance people are used to seeing in databases, and their having a very hard time believing that operating system level encryption is an equivalent control (whatever the reality may be). I've seen this be a significant obstacle for people who want to adopt Apache Cassandra many times and an insurmountable obstacle on multiple occasions. From what I've seen, I think this is one of the most watched tickets with the most "is this coming soon" comments in the project backlog, and it's something we pretty regularly get asked whether we know if/when it's coming.

That said, I completely agree that we don't want to be engaging in security theatre or "introducing something that is either insecure or too slow to be useful", and I think there are some really good suggestions in this thread to come up with a strong solution for what will undoubtedly be a pretty complex and major change.

Cheers
Ben

On Wed, 17 Nov 2021 at 03:34, Joseph Lynch wrote:

> For FDE you'd probably have the key file in a tmpfs pulled from a remote secret manager and when the machine boots it mounts the encrypted partition that contains your data files. I'm not aware of anyone doing FDE with a password in production. If you wanted selective encryption it would make sense to me to support placing keyspaces on different data directories (this may already be possible) but since crypto in the kernel is so cheap I don't know why you'd do selective encryption. Also I think it's worth noting many hosting providers (e.g. AWS) just encrypt the disks for you so you can check the "data is encrypted at rest" box.
>
> I think Cassandra will be pretty handicapped by being in the JVM which generally has very slow crypto. I'm slightly concerned that we're already slow at streaming and compaction, and adding slow JVM crypto will make C* even less competitive. For example, if we have to disable full sstable streaming (zero copy or otherwise) I think that would be very unfortunate (although Bowen's approach of sharing one secret across the cluster and then having files use a key derivation function may avoid that). Maybe if we did something like CASSANDRA-15294 [1] to try to offload to native crypto like how internode networking did with tcnative to fix the perf issues with netty TLS with JVM crypto I'd feel a little less concerned but ... crypto that is both secure and performant in the JVM is a hard problem ...
>
> I guess I'm just concerned we're going to introduce something that is either insecure or too slow to be useful.
>
> -Joey
>
> On Tue, Nov 16, 2021 at 8:10 AM Bowen Song wrote:
> >
> > I don't like the idea of treating FDE (Full Disk Encryption) as an alternative to application managed encryption at rest. Each has its own advantages and disadvantages.
> >
> > For example, if the encryption key is the same across nodes in the same cluster, and Cassandra can share the key securely between authenticated nodes, rolling restart of the servers will be a lot simpler than if the servers were using FDE - someone will have to type in the passphrase on each reboot, or have a script to mount the encrypted device over SSH and then start the Cassandra service after a reboot.
> >
> > Another valid use case of encryption implemented in Cassandra is selectively encrypting some tables while leaving others unencrypted. Doing this outside Cassandra at the filesystem level is very tedious and error-prone - lots of symlinks, and it is pretty hard to handle newly created tables or keyspaces.
> >
> > However, I don't know if there's enough demand to justify the above use cases.
> >
> >
> > On 16/11/2021 14:45, Joseph Lynch wrote:
> > > I think a CEP is wise (or a more thorough design document on the ticket) given how easy it is to do security incorrectly and key management, rotation and key derivation are not particularly straightforward.
> > >
> > > I am curious what advantage Cassandra implementing encryption has over asking the user to use an encrypted filesystem or disks instead where the kernel or device will undoubtedly be able to do the crypto more efficiently than we can in the JVM and we wouldn't have to further complicate the storage engine? I think the state of encrypted filesystems (e.g. LUKS on Linux) is significantly more user friendly these days than it was in 2015 when that ticket was created.
> > >
> > > If the application has existing exfiltration paths (e.g. backups) it's probably better to encrypt/decrypt in the backup/restore process via something extremely fast (and modern) like piping through age [1], isn't it?
> > >
> > > [1] https://github.com/FiloSottile/age
> > >
> > > -Joey
> > >
> > >
> > > On Sat, Nov 13, 2021 at 6:01 AM Ste
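As a rough illustration of the key-derivation idea mentioned in the quoted thread (one secret shared across the cluster, with each file using a key derived from it), here is a minimal Java sketch. It assumes HKDF with HMAC-SHA256 (RFC 5869) as the KDF, which the thread does not specify. The class name `FileKeyDerivation`, the method `deriveFileKey` and the `fileId` label are hypothetical and are not part of Cassandra.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

/** Hypothetical helper: derives a per-file key from one cluster-wide secret using HKDF (RFC 5869). */
public final class FileKeyDerivation {
    private static final String HMAC = "HmacSHA256";
    private static final int HASH_LEN = 32;

    private FileKeyDerivation() {}

    // HMAC over the concatenation of all parts.
    private static byte[] hmac(byte[] key, byte[]... parts) {
        try {
            Mac mac = Mac.getInstance(HMAC);
            mac.init(new SecretKeySpec(key, HMAC));
            for (byte[] part : parts) {
                mac.update(part);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * @param clusterSecret the shared secret (assumed to be distributed securely by other means)
     * @param salt          optional salt; null or empty means HASH_LEN zero bytes, as in RFC 5869
     * @param fileId        hypothetical per-file label, used as the HKDF "info" input
     * @param length        number of key bytes wanted, e.g. 32 for AES-256
     */
    public static byte[] deriveFileKey(byte[] clusterSecret, byte[] salt, String fileId, int length) {
        if (length <= 0 || length > 255 * HASH_LEN) {
            throw new IllegalArgumentException("bad length: " + length);
        }
        byte[] effectiveSalt = (salt == null || salt.length == 0) ? new byte[HASH_LEN] : salt;
        byte[] prk = hmac(effectiveSalt, clusterSecret);            // HKDF-Extract
        byte[] info = fileId.getBytes(StandardCharsets.UTF_8);
        byte[] okm = new byte[length];
        byte[] block = new byte[0];
        int written = 0;
        for (int counter = 1; written < length; counter++) {        // HKDF-Expand
            block = hmac(prk, block, info, new byte[] { (byte) counter });
            int n = Math.min(block.length, length - written);
            System.arraycopy(block, 0, okm, written, n);
            written += n;
        }
        return okm;
    }
}
```

Because any node holding the cluster secret can recompute the same per-file key from the file's label, a streamed file would not need to be re-encrypted on the receiving side, which is presumably why this approach is mentioned as a way to keep full sstable streaming.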
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
I agree with Joey that most users may be better served by OS level encryption, but I also think this ticket can likely be delivered fairly easily. If we have a new contributor willing to produce a patch then the overhead for the project in shepherding it shouldn’t be that onerous. If we also have known use cases in the community then on balance there’s a good chance it will be a net positive investment for the project to enable users that desire in-database encryption. It might even spur further improvements to e.g. streaming performance.

I would scope the work to the minimum viable (but efficient) solution. So, in my view, that would mean encrypting per-sstable encryption keys with per-node master keys that can be rotated cheaply, requiring authentication to receive a stream containing both the unencrypted sstable encryption key and the encrypted sstable, and the receiving node encrypting the encryption key before serializing it to disk.

Since there are already compression hooks, this means only a little bit of special handling, and I _anticipate_ the patch should be quite modest for such a notable feature.

From: Ben Slater
Date: Thursday, 18 November 2021 at 09:07
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption

I wanted to provide a bit of background in the interest we've seen in this ticket/feature (at Instaclustr) - essentially it comes down to in-db encryption at rest being a feature that compliance people are used to seeing in databases and having a very hard time believing that operating system level encryption is an equivalent control (whatever the reality may be). I've seen this be a significant obstacle for people who want to adopt Apache Cassandra many times and an insurmountable obstacle on multiple occasions. From what I've seen, I think this is one of the most watched tickets with the most "is this coming soon" comments in the project backlog and it's something we pretty regularly get asked whether we know if/when it's coming.

That said, I completely agree that we don't want to be engaging in security theatre or "introducing something that is either insecure or too slow to be useful." and I think there are some really good suggestions in this thread to come up with a strong solution for what will undoubtedly be a pretty complex and major change.

Cheers
Ben
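The scoping described in this message (per-sstable encryption keys encrypted with a per-node master key that can be rotated cheaply) is a form of envelope encryption. A minimal sketch of it might look like the following. It assumes AES-256-GCM for wrapping, which the message does not specify, and every name here (`SSTableKeyWrapping`, `wrap`, `unwrap`, `rotate`) is hypothetical rather than taken from any Cassandra patch.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

/** Hypothetical sketch: per-sstable data keys wrapped by a per-node master key. */
public final class SSTableKeyWrapping {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LEN = 12;     // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128;

    private SSTableKeyWrapping() {}

    /** Generates a fresh random AES-256 key (usable as a master key or an sstable key). */
    public static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts the sstable key under the master key. Output layout: IV || ciphertext+tag. */
    public static byte[] wrap(SecretKey masterKey, SecretKey sstableKey) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, masterKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(sstableKey.getEncoded());
            byte[] out = Arrays.copyOf(iv, IV_LEN + ciphertext.length);
            System.arraycopy(ciphertext, 0, out, IV_LEN, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recovers the sstable key; fails if the master key is wrong or the blob was modified. */
    public static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, masterKey,
                        new GCMParameterSpec(TAG_BITS, wrapped, 0, IV_LEN));
            byte[] raw = cipher.doFinal(wrapped, IV_LEN, wrapped.length - IV_LEN);
            return new SecretKeySpec(raw, "AES");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Master-key rotation re-wraps only the small key blob; the sstable data is not rewritten. */
    public static byte[] rotate(SecretKey oldMaster, SecretKey newMaster, byte[] wrapped) {
        return wrap(newMaster, unwrap(oldMaster, wrapped));
    }
}
```

Under this layout, streaming would send the unwrapped sstable key over an authenticated channel together with the still-encrypted sstable, and the receiver would call something like `wrap` with its own master key before writing the key to disk.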
Re: [DISCUSS] Mentoring newcomers
You have my bow.

On Fri, Nov 12, 2021 at 11:05 AM Benjamin Lerer wrote:

> Hi everybody
>
> As discussed in the *Creating a new slack channel for newcomers* thread, a solution to help newcomers engage with the project would be to provide a list of mentors that newcomers can contact when they feel insecure about asking questions through our cassandra-dev channel or through the mailing list.
>
> I would like to collect the list of people that are interested in helping out newcomers so that we can post that list on our website.
Re: Implementing a secondary index
Hi Claude,

In code space, the best place to start would be the secondary index API and the manager that maintains the indexes on a per-table basis:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/Index.java
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/SecondaryIndexManager.java

If you have any questions about either, feel free to reach out, either here or in ASF Slack.

P.S. If you're interested in where secondary indexing in Cassandra is headed, follow https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-7%3A+Storage+Attached+Index .

On Wed, Nov 17, 2021 at 4:34 AM DuyHai Doan wrote:

> Hello Claude
>
> I wrote a blog post about secondary index architecture a long time ago, but most of the content should still be relevant and worth checking:
>
> https://www.doanduyhai.com/blog/?p=13191
>
> Regards
>
> Duy Hai DOAN
>
> On Wed, 17 Nov 2021 at 10:17, Claude Warren wrote:
>
> > Greetings,
> >
> > I am looking to implement a Multidimensional Bloom filter index [1] [2] on a Cassandra table. OK, I know that is a lot to take in. What I need is any documentation that explains the architecture of the index options, or someone I can ask questions of -- a mentor if you will.
> >
> > I have a proof of concept for the index that works from the client side [3]. What I want to do is move some of that processing to the server side.
> >
> > Basically, I think I need to do the following:
> >
> >    1. On each partition create an SST to store the index data. This table comprises 2 integer data points and the primary key for the data table.
> >    2. When the index cell gets updated in the original table (there will only be one column), update one or more rows in the SST table.
> >    3. When querying, perform multiple queries against the index data, and return the primary key values (or the data associated with the primary keys -- I am unclear on this bit).
> >
> > Any help or guidance would be appreciated,
> > Claude
> >
> > [1] https://archive.org/details/arxiv-1501.01941/mode/2up
> > [2] https://archive.fosdem.org/2020/schedule/event/bloom_filters/
> > [3] https://github.com/Claude-at-Instaclustr/blooming_cassandra
> >
> > --
> >
> > *Claude Warren*
> >
> > Principal Software Engineer
> >
> > Instaclustr
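To make the query step in the quoted list a little more concrete: a Bloom-filter index lookup boils down to finding every stored filter that has all of the target filter's bits set. The sketch below is an in-memory illustration of that matching rule only. It does not use Cassandra's `Index` API, it does not reproduce the table layout described above, and the class name, the 64-bit filter size and the hashing scheme are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory illustration of Bloom-filter containment matching. */
public final class BloomIndexSketch {
    private static final int BITS = 64;     // filter width, assumed for the example
    private static final int HASHES = 3;    // bits set per item, assumed for the example

    private final Map<String, BitSet> filtersByPrimaryKey = new LinkedHashMap<>();

    /** Builds a filter with HASHES bit positions set for each item. */
    public static BitSet filterOf(String... items) {
        BitSet bits = new BitSet(BITS);
        for (String item : items) {
            int h1 = item.hashCode();
            int h2 = (h1 >>> 16) | 1;        // odd step so successive probes differ
            for (int i = 0; i < HASHES; i++) {
                bits.set(Math.floorMod(h1 + i * h2, BITS));
            }
        }
        return bits;
    }

    /** Stores the filter that belongs to one row of the data table. */
    public void put(String primaryKey, BitSet filter) {
        filtersByPrimaryKey.put(primaryKey, (BitSet) filter.clone());
    }

    /**
     * Returns primary keys whose stored filter contains every bit of the target.
     * False positives are possible; false negatives are not.
     */
    public List<String> candidates(BitSet target) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : filtersByPrimaryKey.entrySet()) {
            BitSet missing = (BitSet) target.clone();
            missing.andNot(entry.getValue());   // target bits absent from the stored filter
            if (missing.isEmpty()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A server-side index would replace the linear scan with lookups against the index data, but the containment test applied to each candidate would stay the same.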
Re: [VOTE] CEP-17: SSTable format API
+1

> On Nov 17, 2021, at 1:22 AM, Benjamin Lerer wrote:
>
> +1
>
> On Tue, 16 Nov 2021 at 18:05, Joshua McKenzie wrote:
>
> > +1
> >
> > On Tue, Nov 16, 2021 at 10:14 AM Andrés de la Peña wrote:
> >
> > > +1
> > >
> > > On Tue, 16 Nov 2021 at 08:39, Sam Tunnicliffe wrote:
> > >
> > > > +1
> > > >
> > > > > On 15 Nov 2021, at 19:42, Branimir Lambov wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > I would like to start a vote on this CEP.
> > > > >
> > > > > Proposal:
> > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
> > > > >
> > > > > Discussion:
> > > > > https://lists.apache.org/thread.html/r636bebcab4e678dbee042285449193e8e75d3753200a1b404fcc7196%40%3Cdev.cassandra.apache.org%3E
> > > > >
> > > > > The vote will be open for 72 hours.
> > > > > A vote passes if there are at least three binding +1s and no binding vetoes.
> > > > >
> > > > > Regards,
> > > > > Branimir

To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
RE: Resurrection of CASSANDRA-9633 - SSTable encryption
To address Joey's concern, the OpenJDK JVM and its derivatives optimize Java crypto based on the underlying HW capabilities. For example, if the underlying HW supports AES-NI, JVM intrinsics will use those for crypto operations. Likewise, the new vector AES available on the latest Intel platform is utilized by the JVM while running on that platform to make crypto operations faster.

From our internal experiments, we see single digit % regression when transparent data encryption is enabled.

-----Original Message-----
From: bened...@apache.org
Sent: Thursday, November 18, 2021 1:23 AM
To: dev@cassandra.apache.org
Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption

I agree with Joey that most users may be better served by OS level encryption, but I also think this ticket can likely be delivered fairly easily. If we have a new contributor willing to produce a patch then the overhead for the project in shepherding it shouldn't be that onerous. If we also have known use cases in the community then on balance there's a good chance it will be a net positive investment for the project to enable users that desire in-database encryption. It might even spur further improvements to e.g. streaming performance.

I would scope the work to the minimum viable (but efficient) solution. So, in my view, that would mean encrypting per-sstable encryption keys with per-node master keys that can be rotated cheaply, requiring authentication to receive a stream containing both the unencrypted sstable encryption key and the encrypted sstable, and the receiving node encrypting the encryption key before serializing it to disk.

Since there are already compression hooks, this means only a little bit of special handling, and I _anticipate_ the patch should be quite modest for such a notable feature.
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
On Thu, Nov 18, 2021 at 7:23 PM Kokoori, Shylaja wrote:

> To address Joey's concern, the OpenJDK JVM and its derivatives optimize Java crypto based on the underlying HW capabilities. For example, if the underlying HW supports AES-NI, JVM intrinsics will use those for crypto operations. Likewise, the new vector AES available on the latest Intel platform is utilized by the JVM while running on that platform to make crypto operations faster.

Which JDK version were you running? We have had a number of issues with the JVM being 2-10x slower than native crypto on Java 8 (especially MD5, SHA1, and AES-GCM) and to a lesser extent Java 11 (usually ~2x slower). Again I think we could get the JVM crypto penalty down to ~2x native if we linked in e.g. ACCP by default [1, 2] but even the very best Java crypto I've seen (fully utilizing hardware instructions) is still ~2x slower than native code. The operating system has a number of advantages here in that it doesn't pay JVM allocation costs or the JNI barrier (in the case of ACCP) and the kernel also takes advantage of hardware instructions.

> From our internal experiments, we see single digit % regression when transparent data encryption is enabled.

Which workloads are you testing and how are you measuring the regression? I suspect that compaction, repair (validation compaction), streaming, and quorum reads are probably much slower (probably ~10x slower for the throughput bound operations and ~2x slower on the read path). As compaction/repair/streaming usually take up between 10-20% of available CPU cycles, making them 2x slower might show up as a <10% overall utilization increase when you've really regressed 100% or more on key metrics (compaction throughput, streaming throughput, memory allocation rate, etc ...).
For example, if compaction was able to achieve 2 MiBps of throughput before encryption and it was only able to achieve 1 MiBps of throughput afterwards, that would be a huge real world impact to operators as compactions now take twice as long.

I think a CEP or details on the ticket that indicate the performance tests and workloads that will be run might be wise? Perhaps something like "encryption creates no more than a 1% regression of: compaction throughput (MiBps), streaming throughput (MiBps), repair validation throughput (duration of full repair on the entire cluster), read throughput at 10ms p99 tail at quorum consistency (QPS handled while not exceeding P99 SLO of 10ms), etc ... while a sustained load is applied to a multi-node cluster"? Even a microbenchmark that just sees how long it takes to encrypt and decrypt a 500MiB dataset using the proposed JVM implementation versus encrypting it with a native implementation might be enough to confirm/deny. For example, keypipe (C, [3]) achieves around 2.8 GiBps of symmetric AES-GCM and age (golang, ChaCha20-Poly1305, [4]) achieves about 1.6 GiBps encryption and 1.0 GiBps decryption; my past experience with Java crypto is that it would achieve maybe 200 MiBps of _non-authenticated_ AES.

Cheers,
-Joey

[1] https://issues.apache.org/jira/browse/CASSANDRA-15294
[2] https://github.com/corretto/amazon-corretto-crypto-provider
[3] https://github.com/hashbrowncipher/keypipe#encryption
[4] https://github.com/FiloSottile/age
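A microbenchmark of the kind suggested above could start from something like the following sketch, which times AES-GCM encryption and decryption in the JVM using the default JCE provider. It is only a starting point under stated assumptions: AES-128-GCM and a chunked in-memory buffer are choices made for the example, there is no JIT warm-up phase, and a serious comparison would use a harness such as JMH plus an equivalent native run. The class and method names are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

/** Hypothetical rough throughput check for JVM AES-GCM; not a rigorous benchmark. */
public final class AesGcmThroughput {
    private AesGcmThroughput() {}

    /**
     * Encrypts and decrypts about totalBytes of data in chunkBytes pieces.
     * @return a two-element array: { encrypt MiB/s, decrypt MiB/s }
     */
    public static double[] measure(long totalBytes, int chunkBytes) {
        try {
            SecureRandom random = new SecureRandom();
            byte[] keyBytes = new byte[16];
            random.nextBytes(keyBytes);
            SecretKeySpec key = new SecretKeySpec(keyBytes, "AES");
            byte[] plain = new byte[chunkBytes];
            random.nextBytes(plain);

            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            byte[] iv = new byte[12];
            long chunks = Math.max(1, totalBytes / chunkBytes);
            long encryptNanos = 0;
            long decryptNanos = 0;
            for (long i = 0; i < chunks; i++) {
                // GCM needs a unique IV per encryption under one key: use the chunk counter.
                for (int b = 0; b < 8; b++) {
                    iv[4 + b] = (byte) (i >>> (56 - 8 * b));
                }
                long t0 = System.nanoTime();
                cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
                byte[] ciphertext = cipher.doFinal(plain);
                long t1 = System.nanoTime();
                cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
                byte[] decrypted = cipher.doFinal(ciphertext);
                long t2 = System.nanoTime();
                if (!Arrays.equals(plain, decrypted)) {
                    throw new IllegalStateException("round trip failed");
                }
                encryptNanos += t1 - t0;
                decryptNanos += t2 - t1;
            }
            double mib = chunks * (double) chunkBytes / (1024.0 * 1024.0);
            return new double[] { mib / (encryptNanos / 1e9), mib / (decryptNanos / 1e9) };
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `AesGcmThroughput.measure(500L * 1024 * 1024, 64 * 1024)` approximates the 500MiB dataset mentioned above; the numbers it returns depend heavily on JDK version, provider and hardware, so they are only meaningful next to a native measurement taken on the same machine.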
Re: Resurrection of CASSANDRA-9633 - SSTable encryption
> I've seen this be a significant obstacle for people who want to adopt Apache Cassandra many times and an insurmountable obstacle on multiple occasions. From what I've seen, I think this is one of the most watched tickets with the most "is this coming soon" comments in the project backlog and it's something we pretty regularly get asked whether we know if/when it's coming.

I agree encrypted data at rest is a very important feature, but in the six years since the ticket was originally proposed other systems kept getting better at a faster rate, especially easy to use full disk and filesystem encryption. LUKS+LVM in Linux is genuinely excellent and is relatively easy to set up today while that was _not_ true five years ago.

> That said, I completely agree that we don't want to be engaging in security theatre or "introducing something that is either insecure or too slow to be useful." and I think there are some really good suggestions in this thread to come up with a strong solution for what will undoubtedly be a pretty complex and major change.

I think it's important to realize that for us to check the "data is encrypted at rest" box we have to do a lot more than what's currently been implemented. We have to design a pluggable key management system that either retrieves the keys from a remote system (e.g. KMS) or gives some way to load them directly into the process memory (virtual table? or maybe loads them from a tmpfs mounted directory?). We can't just put the key in the yaml file. This will also affect debuggability since we have to encrypt every file that is ever produced by Cassandra including logs (which contain primary keys) and heap dumps which are vital to debugging, so we'll have to ship custom tools to decrypt those things so humans can actually read them to debug problems.

If our primary goal is facilitating our users in being compliant with encryption at rest policies, I believe it is much easier to check that box by encrypting the entire disk or filesystem than building partial solutions into Cassandra.

-Joey
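For the pluggable key management point above, one possible shape is a small provider interface with interchangeable back ends. The sketch below shows such an interface with a single implementation that reads raw key bytes from a directory (for example one mounted on tmpfs and populated at boot). All of it is hypothetical: `KeyProvider`, `DirectoryKeyProvider` and the `<keyId>.key` file naming are assumptions for illustration, not an existing Cassandra interface, and a KMS-backed implementation would need its own client code.

```java
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical plug-in point for key retrieval; not an existing Cassandra interface. */
public interface KeyProvider {

    /** Returns the AES key registered under the given identifier. */
    SecretKey getKey(String keyId);

    /** Loads raw AES keys from files named "<keyId>.key" in one directory, e.g. a tmpfs mount. */
    final class DirectoryKeyProvider implements KeyProvider {
        private final Path directory;

        public DirectoryKeyProvider(Path directory) {
            this.directory = directory;
        }

        @Override
        public SecretKey getKey(String keyId) {
            // Reject anything that could escape the key directory.
            if (!keyId.matches("[A-Za-z0-9_-]+")) {
                throw new IllegalArgumentException("invalid key id: " + keyId);
            }
            try {
                byte[] raw = Files.readAllBytes(directory.resolve(keyId + ".key"));
                if (raw.length != 16 && raw.length != 24 && raw.length != 32) {
                    throw new IllegalStateException("unexpected AES key length: " + raw.length);
                }
                return new SecretKeySpec(raw, "AES");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Keeping the interface this narrow means the storage engine would only ever ask for a key by identifier, while where the key comes from (remote KMS, tmpfs, something else) stays a deployment decision outside the yaml file.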