+1 I think this is a good feature to have. However, my question is whether
we need additional parameters to connect to the S3 bucket, such as the
object key, IAM credentials with read access, or an access key and secret
key, to enable the connection and access the bucket. Also, do you think it
is better to use the AWS SDK for Java, since it handles the connection
more robustly? Not sure how much detail we will discuss in the initial
feature request.
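To make the question concrete, here is a minimal, hypothetical sketch of the configuration such an S3 connection might require. None of these names come from the CEP; `S3BackupSource`, its fields, and its methods are illustrative only, and the point is just the parameter choice: static access/secret keys versus falling back to the instance's IAM role (the default credential chain in the AWS SDK for Java).

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch (names are illustrative, not from the CEP) of the
// parameters an S3-backed backup source might need: bucket, key prefix,
// and either static credentials or an implicit IAM role.
final class S3BackupSource {
    private final String bucket;
    private final String keyPrefix;           // where the snapshot SSTables live
    private final Optional<String> accessKey; // empty => fall back to IAM role
    private final Optional<String> secretKey;

    S3BackupSource(String bucket, String keyPrefix,
                   String accessKey, String secretKey) {
        this.bucket = Objects.requireNonNull(bucket, "bucket is required");
        this.keyPrefix = Objects.requireNonNull(keyPrefix, "keyPrefix is required");
        this.accessKey = Optional.ofNullable(accessKey);
        this.secretKey = Optional.ofNullable(secretKey);
        // Static keys only make sense as a pair.
        if (this.accessKey.isPresent() != this.secretKey.isPresent()) {
            throw new IllegalArgumentException(
                "access key and secret key must be provided together");
        }
    }

    /** True when static keys are configured; false means rely on the
     *  instance's IAM role / default credential provider chain. */
    boolean usesStaticCredentials() {
        return accessKey.isPresent();
    }

    String bucket()    { return bucket; }
    String keyPrefix() { return keyPrefix; }
}
```

If the CEP leans on the default credential chain, most of this collapses to just bucket and prefix, which may be the simpler public interface.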

On Tue, Oct 7, 2025 at 12:52 PM James Berragan <[email protected]> wrote:

> +1! This is something I always hoped we would get to with Analytics. +1 on
> creating a new DataLayer; it would be good to flesh out in a bit more
> detail how the SSTableKey will stay flexible for different backup
> layouts. Something else to consider is that many people encrypt their
> backups in S3 (sometimes at an individual SSTable or file level).
>
> On Tue, 7 Oct 2025 at 10:24, Liu Cao <[email protected]> wrote:
>
>> Hi fellow Cassandra devs,
>>
>> I'd like to propose CEP-56: Spark Bulk Reading from Cassandra Backup
>> Uploaded to Object Storage
>>
>> This proposal is about enabling cassandra-analytics to perform bulk reads
>> from Cassandra snapshot backups stored in object storage such as S3. The
>> approach aims to decouple bulk reading from the online Cassandra cluster
>> (including the Sidecar), providing full isolation and predictable
>> performance for analytics workloads.
>>
>> The initial object storage support would be AWS S3. Please help review
>> the new public interfaces and example usage and provide any feedback as
>> needed.
>>
>>
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-56%3A+Spark+Bulk+Reading+from+Cassandra+Backup+Uploaded+to+Object+Storage
>>
>>
>> Best Regards,
>>
>> --
>>
>> Liu Cao
>>
>>
>>