+1! This is something I always hoped we would get to with Analytics. +1 on creating a new DataLayer as well. It would be good to flesh out in a bit more detail how the SSTableKey will keep it flexible for different backup layouts. Something else to consider is that many people encrypt their backups in S3 (sometimes at the individual SSTable or file level).
On Tue, 7 Oct 2025 at 10:24, Liu Cao <[email protected]> wrote:
> Hi fellow Cassandra devs,
>
> I'd like to propose CEP-56: Spark Bulk Reading from Cassandra Backup
> Uploaded to Object Storage
>
> This is about enabling cassandra-analytics to perform bulk reading from
> Cassandra snapshot backups stored in object storage like S3. This approach
> aims to decouple bulk reading from the online cassandra cluster (including
> side-car), providing full isolation and predictable performance for
> analytics.
>
> The initial object storage support would be AWS S3. Please help review the
> new public interfaces and example usage and provide any feedback as needed.
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-56%3A+Spark+Bulk+Reading+from+Cassandra+Backup+Uploaded+to+Object+Storage
>
> Best Regards,
>
> --
> Liu Cao
