Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread guo Maxwell
In my mind, it may be better to support most cloud storage services: AWS,
Azure, GCP, Aliyun, and so on. We could make this pluggable, but in that case
there may need to be a filesystem interface layer for object storage. Should
we also support distributed file systems such as HDFS, or something else? We
should first discuss what should and should not be done. Supporting only S3
feels a bit customized for one particular user and is not universal enough.
Am I right?

Claude Warren, Jr wrote on Tue, Sep 26, 2023 at 14:36:

> My intention is to develop an S3 storage system using
> https://github.com/carlspring/s3fs-nio
>
> There are several issues yet to be solved:
>
> 1. There are some internal calls that create files in the table
>    directory that do not use the channel proxy.  I believe that these are
>    making calls on File objects.  I think those File objects are Cassandra
>    File objects not Java I/O File objects, but am unsure.
> 2. Determine if the carlspring s3fs-nio library will be performant
>    enough to work in the long run.  There may be issues with it:
>    1. Downloading entire files before using them rather than using
>       views into larger remotely stored files.
>    2. Requiring a complete file to upload rather than using the
>       partial upload capability of the S3 interface.
>
>
>
> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
>
>> "Rather than building this piece by piece, I think it'd be awesome if
>> someone drew up an end-to-end plan to implement tiered storage, so we can
>> make sure we're discussing the whole final state, and not an implementation
>> detail of one part of the final state?"
>>
>> I do agree with Jeff on this. If this feature can be supported in OSS
>> Cassandra, I think it will be very popular, whether in a private
>> deployment environment or a public cloud service (our experience can prove
>> it). In addition, it is also a cost-cutting option for users.
>>
>> Jeff Jirsa wrote on Tue, Sep 26, 2023 at 00:11:
>>
>>>
>>> - I think this is a great step forward.
>>> - Being able to move sstables around between tiers of storage is a
>>> feature Cassandra desperately needs, especially if one of those tiers is
>>> some sort of object storage
>>> - This looks like it's a foundational piece that enables that. Perhaps
>>> by a team that's already implemented this end to end?
>>> - Rather than building this piece by piece, I think it'd be awesome if
>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>> make sure we're discussing the whole final state, and not an implementation
>>> detail of one part of the final state?
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
>>>> I have just filed CEP-36 [1] to allow for keyspace/table storage
>>>> outside of the standard storage space.
>>>>
>>>> There are two desires driving this change:
>>>>
>>>> 1. The ability to temporarily move some keyspaces/tables to storage
>>>>    outside the normal directory tree to other disk so that compaction can
>>>>    occur in situations where there is not enough disk space for compaction
>>>>    and the processing to the moved data can not be suspended.
>>>> 2. The ability to store infrequently used data on slower cheaper
>>>>    storage layers.
>>>>
>>>> I have a working POC implementation [2] though there are some issues
>>>> still to be solved and much logging to be reduced.
>>>>
>>>> I look forward to productive discussions,
>>>> Claude
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory



>>
>> --
>> you are the apple of my eye !
>>
>

-- 
you are the apple of my eye !


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Claude Warren, Jr via dev
The intention of the CEP is to lay the groundwork to allow development of
ChannelProxyFactories that are pluggable in Cassandra.  In this way any
storage system can be a candidate for Cassandra storage provided
FileChannels can be created for the system.

As I stated before I think that there may be a need for a
java.nio.FileSystem implementation for  the proxies but I have not had the
time to dig into it yet.
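
As a rough illustration of the "provided FileChannels can be created"
condition, here is a minimal Java sketch. It is not taken from the POC: the
`ChannelSource` interface and the `ProviderChannelSource` and
`ChannelSourceDemo` classes are hypothetical names used only for this example.
The one JDK behaviour it relies on is that `FileChannel.open(Path, ...)`
delegates to the `FileSystemProvider` that created the `Path`, so a backend
only qualifies if its provider implements `newFileChannel`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical plug-in point: turns a data file path into a readable FileChannel. */
interface ChannelSource {
    FileChannel openForRead(Path file) throws IOException;
}

/** Default backend: FileChannel.open delegates to the provider that created the Path. */
final class ProviderChannelSource implements ChannelSource {
    @Override
    public FileChannel openForRead(Path file) throws IOException {
        return FileChannel.open(file, StandardOpenOption.READ);
    }
}

final class ChannelSourceDemo {
    /** Writes a small temp file, reads it back through the plug-in, returns the text. */
    static String demo() {
        ChannelSource source = new ProviderChannelSource();
        try {
            Path file = Files.createTempFile("sstable-demo", ".db");
            try {
                Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
                try (FileChannel channel = source.openForRead(file)) {
                    ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
                    while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
                        // keep reading until the buffer is full or EOF is reached
                    }
                    return new String(buffer.array(), 0, buffer.position(), StandardCharsets.UTF_8);
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A remote backend would either supply its own `ChannelSource` or hand out
`Path` objects created by its own provider; which of the two is the better
seam is exactly the open question.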

Claude


On Tue, Sep 26, 2023 at 9:01 AM guo Maxwell  wrote:

> In my mind, it may be better to support most cloud storage services: AWS,
> Azure, GCP, Aliyun, and so on. We could make this pluggable, but in that case
> there may need to be a filesystem interface layer for object storage. Should
> we also support distributed file systems such as HDFS, or something else? We
> should first discuss what should and should not be done. Supporting only S3
> feels a bit customized for one particular user and is not universal enough.
> Am I right?
>
> Claude Warren, Jr wrote on Tue, Sep 26, 2023 at 14:36:
>
>> My intention is to develop an S3 storage system using
>> https://github.com/carlspring/s3fs-nio
>>
>> There are several issues yet to be solved:
>>
>> 1. There are some internal calls that create files in the table
>>    directory that do not use the channel proxy.  I believe that these are
>>    making calls on File objects.  I think those File objects are Cassandra
>>    File objects not Java I/O File objects, but am unsure.
>> 2. Determine if the carlspring s3fs-nio library will be performant
>>    enough to work in the long run.  There may be issues with it:
>>    1. Downloading entire files before using them rather than using
>>       views into larger remotely stored files.
>>    2. Requiring a complete file to upload rather than using the
>>       partial upload capability of the S3 interface.
>>
>>
>>
>> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
>>
>>> "Rather than building this piece by piece, I think it'd be awesome if
>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>> make sure we're discussing the whole final state, and not an implementation
>>> detail of one part of the final state?"
>>>
>>> I do agree with Jeff on this. If this feature can be supported in OSS
>>> Cassandra, I think it will be very popular, whether in a private
>>> deployment environment or a public cloud service (our experience can prove
>>> it). In addition, it is also a cost-cutting option for users.
>>>
>>> Jeff Jirsa wrote on Tue, Sep 26, 2023 at 00:11:
>>>

>>>> - I think this is a great step forward.
>>>> - Being able to move sstables around between tiers of storage is a
>>>> feature Cassandra desperately needs, especially if one of those tiers is
>>>> some sort of object storage
>>>> - This looks like it's a foundational piece that enables that. Perhaps
>>>> by a team that's already implemented this end to end?
>>>> - Rather than building this piece by piece, I think it'd be awesome if
>>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>>> make sure we're discussing the whole final state, and not an implementation
>>>> detail of one part of the final state?
>>>>
>>>> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
>>>> dev@cassandra.apache.org> wrote:

> I have just filed CEP-36 [1] to allow for keyspace/table storage
> outside of the standard storage space.
>
> There are two desires  driving this change:
>
>1. The ability to temporarily move some keyspaces/tables to
>storage outside the normal directory tree to other disk so that 
> compaction
>can occur in situations where there is not enough disk space for 
> compaction
>and the processing to the moved data can not be suspended.
>2. The ability to store infrequently used data on slower cheaper
>storage layers.
>
> I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>
>>>
>>> --
>>> you are the apple of my eye !
>>>
>>
>
> --
> you are the apple of my eye !
>


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Josh McKenzie
> it may be better to support most cloud storage
> It simply only supports S3, which feels a bit customized for a certain user 
> and is not universal enough. Am I right?
I agree w/the eventual goal (and constraint on design now) of supporting most 
popular cloud storage vendors, but if we have someone with an itch to scratch 
and at the end of that we end up with first steps in a compatible direction to 
ultimately supporting decoupled / abstracted storage systems, that's fantastic.

To Jeff's point - so long as we can think about and chart a general path of 
where we want to go, if Claude has the time and inclination to handle 
abstracting out the API in that direction and one implementation, that's 
fantastic IMO.

I know there's some other folks out there who've done some interception / 
refactoring of the FileChannel stuff to support disaggregated storage; curious 
what their experiences were like.


On Tue, Sep 26, 2023, at 4:20 AM, Claude Warren, Jr via dev wrote:
> The intention of the CEP is to lay the groundwork to allow development of 
> ChannelProxyFactories that are pluggable in Cassandra.  In this way any 
> storage system can be a candidate for Cassandra storage provided FileChannels 
> can be created for the system. 
> 
> As I stated before I think that there may be a need for a java.nio.FileSystem 
> implementation for  the proxies but I have not had the time to dig into it 
> yet.
> 
> Claude
> 
> 
> On Tue, Sep 26, 2023 at 9:01 AM guo Maxwell  wrote:
>> In my mind, it may be better to support most cloud storage services: AWS,
>> Azure, GCP, Aliyun, and so on. We could make this pluggable, but in that case
>> there may need to be a filesystem interface layer for object storage. Should
>> we also support distributed file systems such as HDFS, or something else? We
>> should first discuss what should and should not be done. Supporting only S3
>> feels a bit customized for one particular user and is not universal enough.
>> Am I right?
>>
>> Claude Warren, Jr wrote on Tue, Sep 26, 2023 at 14:36:
>>> My intention is to develop an S3 storage system using  
>>> https://github.com/carlspring/s3fs-nio 
>>> 
>>> There are several issues yet to be solved:
>>> 1. There are some internal calls that create files in the table directory
>>>    that do not use the channel proxy.  I believe that these are making calls
>>>    on File objects.  I think those File objects are Cassandra File objects not
>>>    Java I/O File objects, but am unsure.
>>> 2. Determine if the carlspring s3fs-nio library will be performant enough
>>>    to work in the long run.  There may be issues with it:
>>>    1. Downloading entire files before using them rather than using views
>>>       into larger remotely stored files.
>>>    2. Requiring a complete file to upload rather than using the partial
>>>       upload capability of the S3 interface.
>>> 
>>> 
>>> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell wrote:
>>>> "Rather than building this piece by piece, I think it'd be awesome if
>>>> someone drew up an end-to-end plan to implement tiered storage, so we can
>>>> make sure we're discussing the whole final state, and not an
>>>> implementation detail of one part of the final state?"
>>>>
>>>> I do agree with Jeff on this. If this feature can be supported in OSS
>>>> Cassandra, I think it will be very popular, whether in a private
>>>> deployment environment or a public cloud service (our experience can prove
>>>> it). In addition, it is also a cost-cutting option for users.
>>>>
>>>> Jeff Jirsa wrote on Tue, Sep 26, 2023 at 00:11:
> 
> - I think this is a great step forward. 
> - Being able to move sstables around between tiers of storage is a 
> feature Cassandra desperately needs, especially if one of those tiers is 
> some sort of object storage
> - This looks like it's a foundational piece that enables that. Perhaps by 
> a team that's already implemented this end to end? 
> - Rather than building this piece by piece, I think it'd be awesome if 
> someone drew up an end-to-end plan to implement tiered storage, so we can 
> make sure we're discussing the whole final state, and not an 
> implementation detail of one part of the final state?
> 
> 
> 
> 
> 
> 
> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev wrote:
>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside 
>> of the standard storage space.  
>> 
>> There are two desires  driving this change:
>>  1. The ability to temporarily move some keyspaces/tables to storage 
>> outside the normal directory tree to other disk so that compaction can 
>> occur in situations where there is not enough disk space for compaction 
>> and the processing to the moved data can not be suspended.
>>  2. The ability to store infrequently used data on slower cheaper 
>> storage layers.
>> I have a working POC implementation [2] though there are som

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread guo Maxwell
Yeah, there is a lot to do, since Cassandra (shared-nothing) is different
from some other systems such as HBase, so I think we can break the final goal
into multiple steps. The first is what Claude proposed. But I suggest that
the design make the interface more extensible and take cloud storage
implementations into account, so that someone can extend the interface in
the future.

Josh McKenzie wrote on Tue, Sep 26, 2023 at 18:40:

> it may be better to support most cloud storage
> It simply only supports S3, which feels a bit customized for a certain
> user and is not universal enough. Am I right?
>
> I agree w/the eventual goal (and constraint on design now) of supporting
> most popular cloud storage vendors, but if we have someone with an itch to
> scratch and at the end of that we end up with first steps in a compatible
> direction to ultimately supporting decoupled / abstracted storage systems,
> that's fantastic.
>
> To Jeff's point - so long as we can think about and chart a general path
> of where we want to go, if Claude has the time and inclination to handle
> abstracting out the API in that direction and one implementation, that's
> fantastic IMO.
>
> I know there's some other folks out there who've done some interception /
> refactoring of the FileChannel stuff to support disaggregated storage;
> curious what their experiences were like.
>
>
> On Tue, Sep 26, 2023, at 4:20 AM, Claude Warren, Jr via dev wrote:
>
> The intention of the CEP is to lay the groundwork to allow development of
> ChannelProxyFactories that are pluggable in Cassandra.  In this way any
> storage system can be a candidate for Cassandra storage provided
> FileChannels can be created for the system.
>
> As I stated before I think that there may be a need for a
> java.nio.FileSystem implementation for  the proxies but I have not had the
> time to dig into it yet.
>
> Claude
>
>
> On Tue, Sep 26, 2023 at 9:01 AM guo Maxwell  wrote:
>
> In my mind, it may be better to support most cloud storage services: AWS,
> Azure, GCP, Aliyun, and so on. We could make this pluggable, but in that case
> there may need to be a filesystem interface layer for object storage. Should
> we also support distributed file systems such as HDFS, or something else? We
> should first discuss what should and should not be done. Supporting only S3
> feels a bit customized for one particular user and is not universal enough.
> Am I right?
>
> Claude Warren, Jr wrote on Tue, Sep 26, 2023 at 14:36:
>
> My intention is to develop an S3 storage system using
> https://github.com/carlspring/s3fs-nio
>
> There are several issues yet to be solved:
>
> 1. There are some internal calls that create files in the table
>    directory that do not use the channel proxy.  I believe that these are
>    making calls on File objects.  I think those File objects are Cassandra
>    File objects not Java I/O File objects, but am unsure.
> 2. Determine if the carlspring s3fs-nio library will be performant
>    enough to work in the long run.  There may be issues with it:
>    1. Downloading entire files before using them rather than using views
>       into larger remotely stored files.
>    2. Requiring a complete file to upload rather than using the
>       partial upload capability of the S3 interface.
>
>
>
> On Tue, Sep 26, 2023 at 4:11 AM guo Maxwell  wrote:
>
> "Rather than building this piece by piece, I think it'd be awesome if
> someone drew up an end-to-end plan to implement tiered storage, so we can
> make sure we're discussing the whole final state, and not an implementation
> detail of one part of the final state?"
>
> I do agree with Jeff on this. If this feature can be supported in OSS
> Cassandra, I think it will be very popular, whether in a private
> deployment environment or a public cloud service (our experience can prove
> it). In addition, it is also a cost-cutting option for users.
>
> Jeff Jirsa wrote on Tue, Sep 26, 2023 at 00:11:
>
>
> - I think this is a great step forward.
> - Being able to move sstables around between tiers of storage is a feature
> Cassandra desperately needs, especially if one of those tiers is some sort
> of object storage
> - This looks like it's a foundational piece that enables that. Perhaps by
> a team that's already implemented this end to end?
> - Rather than building this piece by piece, I think it'd be awesome if
> someone drew up an end-to-end plan to implement tiered storage, so we can
> make sure we're discussing the whole final state, and not an implementation
> detail of one part of the final state?
>
>
>
>
>
>
> On Sun, Sep 24, 2023 at 11:49 PM Claude Warren, Jr via dev <
> dev@cassandra.apache.org> wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
>
> There are two desires  driving this change:
>
>1. The ability to temporarily move some keyspaces/tables to storage
>outside the normal directory tree to other disk so that compaction can
>occur in situati

CASSANDRA-18773 compaction speedup

2023-09-26 Thread Miklosovic, Stefan
Hi list,

We would like to merge CASSANDRA-18773 into 4.0 and up to trunk (so it will
also be in 5.0 alpha2), and I want to be sure we are all OK with that,
especially for the 5.0 alpha release.

The patch significantly speeds up compaction throughput when you have a lot of
SSTables in a key-value table without a secondary index.

My colleague Cameron Zemek identified and fixed the issue with the help of
Branimir Lambov.

It is a little hard to believe, but when a table contains thousands of
SSTables and has no 2i's (tested on around 2500 SSTables), we saw a 50x (fifty
times) speedup in compaction throughput for major compactions. Reportedly, it
also affects operations when switching from STCS to LCS.

As mentioned, we plan to merge this to 4.0, 4.1, 5.0 and trunk.

Any objections to that?

Regards

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Ariel Weisberg
Hi,

Support for multiple storage backends including remote storage backends is a 
pretty high value piece of functionality. I am happy to see there is interest 
in that.

I think that `ChannelProxyFactory` as an integration point is going to quickly 
turn into a dead end as we get into really using multiple storage backends. We 
need to be able to list files, and really the full range of filesystem 
interactions that Java supports should work with any backend, to make 
development, testing, and using existing code straightforward.

It's a little more work to get C* to create paths for alternate backends where 
appropriate, but that work is probably necessary even with 
`ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
Filesystems). There will probably also be backend-specific behaviors that show 
up above the `ChannelProxy` layer that will depend on the backend.

Ideally there would be some config to specify several backend filesystems and 
their individual configuration that can be used, as well as configuration and 
support for a "backend file router" for file creation (and opening) that can be 
used to route files to the backend most appropriate.
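
To make the "backend file router" idea a bit more concrete, here is a minimal
Java sketch under stated assumptions. Nothing in it exists in Cassandra: the
`BackendFileRouter` class, its `route` and `resolve` methods, and the choice
of routing by keyspace name are all hypothetical. The roots are plain
`java.nio.file.Path` objects, so in principle each one could come from a
different `FileSystem`.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical router: picks a backend root per keyspace, falling back to a default. */
final class BackendFileRouter {
    private final Path defaultRoot;
    private final Map<String, Path> rootsByKeyspace = new LinkedHashMap<>();

    BackendFileRouter(Path defaultRoot) {
        this.defaultRoot = defaultRoot;
    }

    /** Registers a root for one keyspace; the Path may come from any FileSystem. */
    BackendFileRouter route(String keyspace, Path root) {
        rootsByKeyspace.put(keyspace, root);
        return this;
    }

    /** Returns where a file of the given keyspace/table should be created or opened. */
    Path resolve(String keyspace, String table, String fileName) {
        Path root = rootsByKeyspace.getOrDefault(keyspace, defaultRoot);
        return root.resolve(keyspace).resolve(table).resolve(fileName);
    }

    /** Small self-check that illustrates the routing decision. */
    static String demo() {
        BackendFileRouter router = new BackendFileRouter(Paths.get("data"))
                .route("archive", Paths.get("cold"));
        Path hot = router.resolve("app", "events", "nb-1-big-Data.db");
        Path cold = router.resolve("archive", "events", "nb-1-big-Data.db");
        // First path element shows which backend root was chosen.
        return hot.getName(0) + "," + cold.getName(0);
    }
}
```

In a real design the routing table would presumably be built from
configuration rather than code, and the routing key might be a table, a
compaction level, or an SSTable age rather than a keyspace.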

Regards,
Ariel

On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.  
> 
> There are two desires  driving this change:
>  1. The ability to temporarily move some keyspaces/tables to storage outside 
> the normal directory tree to other disk so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing to the moved data can not be suspended.
>  2. The ability to store infrequently used data on slower cheaper storage 
> layers.
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
> 
> I look forward to productive discussions,
> Claude
> 
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory 
> 
> 


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Benedict
I agree with Ariel, the more suitable insertion point is probably the JDK level 
FileSystemProvider and FileSystem abstraction.

It might also be that we can reuse existing work here in some cases?
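
A small Java sketch of why the JDK-level abstraction is attractive: the same
`Files` calls (create, write, list) run unchanged against two different
providers. This is only an illustration, with the JDK's built-in zip file
system standing in for a remote backend; the class name
`ProviderAgnosticListing` and the file names are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ProviderAgnosticListing {
    /** Creates one file under dir and lists the directory; works for any provider. */
    static List<String> writeAndList(Path dir) throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve("nb-1-big-Data.db"), "x".getBytes(StandardCharsets.UTF_8));
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
    }

    /** Runs the same code on the default provider and on the zip provider. */
    static String demo() {
        try {
            // Temp files are left behind for brevity.
            Path tmp = Files.createTempDirectory("fs-demo");
            // Default (local disk) provider.
            List<String> local = writeAndList(tmp.resolve("local"));
            // Non-default provider: the JDK zip file system stands in for a remote backend.
            URI zipUri = URI.create("jar:" + tmp.resolve("demo.zip").toUri());
            List<String> zipped;
            try (FileSystem zipFs = FileSystems.newFileSystem(zipUri, Collections.singletonMap("create", "true"))) {
                zipped = writeAndList(zipFs.getPath("/ks/table"));
            }
            return local + " " + zipped;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Listing, creating, and opening files all go through the provider here, which
is the part that a channel-only integration point does not cover.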

> On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> 
> 
> Hi,
> 
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
> 
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
> 
> It's a little more work to get C* to create paths for alternate backends 
> where appropriate, but that work is probably necessary even with 
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
> Filesystems). There will probably also be backend-specific behaviors that show 
> up above the `ChannelProxy` layer that will depend on the backend.
> 
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
> 
> Regards,
> Ariel
> 
>> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
>> the standard storage space.  
>> 
>> There are two desires  driving this change:
>> The ability to temporarily move some keyspaces/tables to storage outside the 
>> normal directory tree to other disk so that compaction can occur in 
>> situations where there is not enough disk space for compaction and the 
>> processing to the moved data can not be suspended.
>> The ability to store infrequently used data on slower cheaper storage layers.
>> I have a working POC implementation [2] though there are some issues still 
>> to be solved and much logging to be reduced.
>> 
>> I look forward to productive discussions,
>> Claude
>> 
>> [1] 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory 
>> 
>> 
> 


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Jake Luciani
We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).
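
For readers unfamiliar with the "bucket prefix per node" layout, a minimal
Java sketch of one way it could look. This is an assumption for illustration
only and says nothing about how the Astra provider actually does it: the
`NodePrefixedKeys` class, the `nodes/<host id>/` key layout, and the method
names are all hypothetical.

```java
import java.util.UUID;

/** Hypothetical key layout: every node writes under its own prefix in a shared bucket. */
final class NodePrefixedKeys {
    private final String nodePrefix;

    NodePrefixedKeys(UUID hostId) {
        this.nodePrefix = "nodes/" + hostId + "/";
    }

    /** Maps a node-local relative path (keyspace/table/file) to an object key. */
    String keyFor(String relativePath) {
        String normalized = relativePath.replace('\\', '/');
        while (normalized.startsWith("/")) {
            normalized = normalized.substring(1);
        }
        return nodePrefix + normalized;
    }

    /** True if the key belongs to this node, e.g. when listing the shared bucket. */
    boolean owns(String objectKey) {
        return objectKey.startsWith(nodePrefix);
    }
}
```

With such a layout two nodes never collide on a key, but anything that runs
outside the owning node would need to know which prefix to read from and
write to.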

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK 
> level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends 
> where appropriate, but that work is probably necessary even with 
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
> Filesystems). There will probably also be backend-specific behaviors that show 
> up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.
>
> There are two desires  driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the 
> normal directory tree to other disk so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing to the moved data can not be suspended.
> The ability to store infrequently used data on slower cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


-- 
http://twitter.com/tjake


Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Miklosovic, Stefan
Would it be possible to make the Jimfs integration production-ready then? I see 
we are already using it in the tests.

It might be one of the reference implementations of this CEP. If there is a 
type of workload / type of node with plenty of RAM but no disk, some kind of 
compute node, it would just hold everything in memory and we might "flush" it 
to cloud-based storage once it is deemed no longer necessary (whatever that 
means).

We could then completely bypass the memtables, as fetching data from an SSTable 
held in memory would be roughly the same?

On the other hand, that might be achieved by creating a ramdisk, so I am not 
sure what exactly we would gain here. However, if it eventually stored these 
SSTables in cloud storage, we might "compact" "TWCS tables" automatically 
after a certain period by moving them there.
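
A minimal Java sketch of the "move them there after a period" idea, assuming
the cold tier is reachable as a `java.nio.file.Path` (local disk, Jimfs, or
some cloud-backed provider). The `AgeBasedTiering` class and its method names
are hypothetical, and using the file's last-modified time as the age is a
simplification; a real policy would more likely look at SSTable metadata.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class AgeBasedTiering {
    /** Moves regular files in hot that were last modified before now - maxAge into cold. */
    static List<String> moveOlderThan(Path hot, Path cold, Duration maxAge, Instant now) throws IOException {
        Instant cutoff = now.minus(maxAge);
        List<Path> candidates;
        try (Stream<Path> entries = Files.list(hot)) {
            candidates = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> moved = new ArrayList<>();
        Files.createDirectories(cold);
        for (Path file : candidates) {
            if (Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)) {
                // Path.resolve(String) keeps this working when cold lives on another FileSystem.
                Path target = cold.resolve(file.getFileName().toString());
                Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
                moved.add(target.getFileName().toString());
            }
        }
        return moved;
    }

    /** Creates one 30-day-old file and one fresh file, then tiers with a 7-day limit. */
    static String demo() {
        try {
            Path root = Files.createTempDirectory("tier-demo");
            Path hot = Files.createDirectories(root.resolve("hot"));
            Instant now = Instant.now();
            Path oldFile = Files.createFile(hot.resolve("old-Data.db"));
            Files.createFile(hot.resolve("new-Data.db"));
            Files.setLastModifiedTime(oldFile, FileTime.from(now.minus(Duration.ofDays(30))));
            return moveOlderThan(hot, root.resolve("cold"), Duration.ofDays(7), now).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch ignores everything that makes this hard in practice, such as
moving all components of an SSTable together and keeping readers consistent
while files change location.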


From: Jake Luciani 
Sent: Tuesday, September 26, 2023 19:03
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external 
storage locations

We (DataStax) have a FileSystemProvider for Astra we can provide.
Works with S3/GCS/Azure.

I'll ask someone on our end to make it accessible.

This would work by having a bucket prefix per node. But there are lots
of details needed to support things like out of bound compaction
(mentioned in CEP).

Jake

On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
>
> I agree with Ariel, the more suitable insertion point is probably the JDK 
> level FileSystemProvider and FileSystem abstraction.
>
> It might also be that we can reuse existing work here in some cases?
>
> On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
>
> 
> Hi,
>
> Support for multiple storage backends including remote storage backends is a 
> pretty high value piece of functionality. I am happy to see there is interest 
> in that.
>
> I think that `ChannelProxyFactory` as an integration point is going to 
> quickly turn into a dead end as we get into really using multiple storage 
> backends. We need to be able to list files and really the full range of 
> filesystem interactions that Java supports should work with any backend to 
> make development, testing, and using existing code straightforward.
>
> It's a little more work to get C* to create paths for alternate backends 
> where appropriate, but that work is probably necessary even with 
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple 
> Filesystems). There will probably also be backend-specific behaviors that show 
> up above the `ChannelProxy` layer that will depend on the backend.
>
> Ideally there would be some config to specify several backend filesystems and 
> their individual configuration that can be used, as well as configuration and 
> support for a "backend file router" for file creation (and opening) that can 
> be used to route files to the backend most appropriate.
>
> Regards,
> Ariel
>
> On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>
> I have just filed CEP-36 [1] to allow for keyspace/table storage outside of 
> the standard storage space.
>
> There are two desires  driving this change:
>
> The ability to temporarily move some keyspaces/tables to storage outside the 
> normal directory tree to other disk so that compaction can occur in 
> situations where there is not enough disk space for compaction and the 
> processing to the moved data can not be suspended.
> The ability to store infrequently used data on slower cheaper storage layers.
>
> I have a working POC implementation [2] though there are some issues still to 
> be solved and much logging to be reduced.
>
> I look forward to productive discussions,
> Claude
>
> [1] 
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>
>
>


--
http://twitter.com/tjake


Re: [DISCUSS] Backport CASSANDRA-18816 to 5.0? Add support for repair coordinator to retry messages that timeout

2023-09-26 Thread David Capwell
Thanks all for the feedback!  The patch has two +1s on trunk and has been 
backported to 5.0; I am making sure it's stable now and plan to merge early 
this week.

> On Sep 21, 2023, at 2:07 PM, Ekaterina Dimitrova wrote:
> 
> +1 from me too. Moreover, this work has started as part of the test efforts 
> and identifying weak points during the 4.0 testing, if I recall correctly. 
> 5.0 sounds like a good place to land. Thank you David and everyone else 
> involved for your efforts!
> 
> On Thu, 21 Sep 2023 at 1:01, Berenguer Blasi wrote:
>> +1 I agree with Brandon. It's more like a bug imo.
>> On 20/9/23 21:42, Caleb Rackliffe wrote:
>>> +1 on a 5.0 backport
>>> 
>>> On Wed, Sep 20, 2023 at 2:26 PM Brandon Williams wrote:
>>>> I think it could be argued that not retrying messages is a bug, I am
>>>> +1 on including this in 5.0.
>>>>
>>>> Kind Regards,
>>>> Brandon
>>>>
>>>> On Tue, Sep 19, 2023 at 1:16 PM David Capwell wrote:
>>>> >
>>>> > To try to get repair more stable, I added optional retry logic (patch is
>>>> > still in review) to a handful of critical repair verbs.  This patch is
>>>> > disabled by default but allows you to opt-in to retries so ephemeral
>>>> > issues don’t cause a repair to fail after running for a long time
>>>> > (assuming they resolve within the retry window). There are 2 protocol
>>>> > level changes to enable this: VALIDATION_RSP and SYNC_RSP now send an
>>>> > ACK (if the sender doesn’t attach a callback, these ACKs get ignored in
>>>> > all versions; see
>>>> > org.apache.cassandra.net.ResponseVerbHandler#doVerb and
>>>> > Verb.REPAIR_RSP).  Given that we have already forked, I believe we would
>>>> > need to give a waiver to allow this patch due to this change.
>>>> >
>>>> > The patch was written on trunk, but figured back porting 5.0 would be
>>>> > rather trivial and this was brought up during the review, so floating
>>>> > this to a wider audience.
>>>> >
>>>> > If you look at the patch you will see that it is very large, but this is
>>>> > only to make testing of repair coordination easier and deterministic,
>>>> > the biggest code changes are:
>>>> >
>>>> > 1) Moving from ActiveRepairService.instance to
>>>> > ActiveRepairService.instance() (this is the main reason so many files
>>>> > were touched; this was needed so unit tests don’t load the whole world)
>>>> > 2) Repair no longer reaches into global space and instead is provided
>>>> > the subsystems needed to perform repair; this change is local to repair
>>>> > code
>>>> >
>>>> > Both of these changes were only for testing as they allow us to simulate
>>>> > 1k repairs in around 15 seconds with 100% deterministic execution.
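
For readers who want a picture of the opt-in retry behaviour described in the
quoted message, the following is a minimal sketch only. It is not the
CASSANDRA-18816 patch: the class name RetryingSender, its constructor
parameters, and the use of a RuntimeException to stand in for a message
timeout are all assumptions made for illustration. It shows retries being
disabled unless enabled, and a send being repeated until a response (the
"ACK") arrives or the attempts run out.

```java
import java.util.function.Supplier;

/** Hypothetical opt-in retry helper; not the CASSANDRA-18816 implementation. */
final class RetryingSender {
    private final boolean enabled;     // retries are off unless the operator opts in
    private final int maxAttempts;     // total attempts, including the first one
    private final long backoffMillis;  // fixed pause between attempts

    RetryingSender(boolean enabled, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.enabled = enabled;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    /** Calls send until it returns a response or the attempts are exhausted. */
    <T> T send(Supplier<T> send) {
        int attempts = enabled ? maxAttempts : 1;
        RuntimeException last = null;
        for (int i = 1; i <= attempts; i++) {
            try {
                return send.get();
            } catch (RuntimeException e) {
                last = e;                  // assumed to model a timeout or other ephemeral failure
                if (i < attempts) pause();
            }
        }
        throw last;                        // every attempt failed; surface the final error
    }

    private void pause() {
        try {
            Thread.sleep(backoffMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}

final class RetryDemo {
    public static void main(String[] args) {
        int[] calls = {0};
        RetryingSender sender = new RetryingSender(true, 3, 10);
        // The first two sends "time out"; the third one is acknowledged.
        String ack = sender.send(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("timeout");
            return "ACK";
        });
        System.out.println(ack + " after " + calls[0] + " attempts");
    }
}
```

With the enabled flag set to false the helper makes exactly one attempt, which
mirrors the "disabled by default" behaviour mentioned above.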



Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Claude Warren, Jr via dev
I spent a little (very little) time building an S3 implementation using an
Apache licensed S3 filesystem package.  I have not yet tested it but if
anyone is interested it is at
https://github.com/Aiven-Labs/S3-Cassandra-ChannelProxy

In looking at some of the code I think the Cassandra File class needs to be
modified to ask the ChannelProxy for the default file system for the file
in question.  This should resolve some of the issues my original demo has
with some files being created in the data tree.  It may also handle many of
the cases for offline tools as well.
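
A rough sketch of that idea, using only JDK classes, might look like the
following. The names FileSystemLookup and ProxiedFile are hypothetical and are
not the Cassandra File or ChannelProxy API; the JDK zip file system is used
purely as a stand-in for an external backend such as S3, and the path prefixes
are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup: which FileSystem owns a given path string? */
final class FileSystemLookup {
    private final Map<String, FileSystem> byPrefix = new LinkedHashMap<>();

    void alias(String pathPrefix, FileSystem fs) {
        byPrefix.put(pathPrefix, fs);
    }

    /** Returns the aliased file system, or the JVM default when no prefix matches. */
    FileSystem fileSystemFor(String path) {
        for (Map.Entry<String, FileSystem> e : byPrefix.entrySet())
            if (path.startsWith(e.getKey())) return e.getValue();
        return FileSystems.getDefault();
    }
}

/** Hypothetical stand-in for a File class that no longer assumes the default file system. */
final class ProxiedFile {
    final Path path;

    ProxiedFile(FileSystemLookup lookup, String path) {
        // Ask the lookup first, then build the Path on whichever file system it names.
        this.path = lookup.fileSystemFor(path).getPath(path);
    }
}

final class FileSystemLookupDemo {
    /** Creates a throwaway zip file system to play the role of an external backend. */
    static FileSystem openStandInBackend() {
        try {
            Path zip = Files.createTempDirectory("cep36").resolve("backend.zip");
            return FileSystems.newFileSystem(URI.create("jar:" + zip.toUri()), Map.of("create", "true"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        FileSystemLookup lookup = new FileSystemLookup();
        try (FileSystem backend = openStandInBackend()) {
            lookup.alias("/archive/", backend);
            ProxiedFile cold = new ProxiedFile(lookup, "/archive/ks/tbl/nb-1-big-Data.db");
            ProxiedFile hot = new ProxiedFile(lookup, "/var/lib/cassandra/data/ks/tbl/nb-2-big-Data.db");
            System.out.println("cold file on backend: " + (cold.path.getFileSystem() == backend));
            System.out.println("hot file on default:  " + (hot.path.getFileSystem() == FileSystems.getDefault()));
        }
    }
}
```

The point of the sketch is only that every Path is created through the lookup,
so no caller can accidentally fall back to the default file system for a file
that has been aliased elsewhere.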


On Tue, Sep 26, 2023 at 7:33 PM Miklosovic, Stefan <
stefan.mikloso...@netapp.com> wrote:

> Would it be possible to make Jimfs integration production-ready then? I
> see we are using it in the tests already.
>
> It might be one of the reference implementations of this CEP. If there is
> a type of workload / type of nodes with plenty of RAM but no disk, some
> kind of compute nodes, it would just hold it all in memory and we might
> "flush" it to a cloud-based storage once it is no longer needed
> (whatever that means).
>
> We could then completely bypass the memtables, as fetching data from an
> SSTable held in memory would be roughly the same?
>
> On the other hand, that might be achieved by creating a ramdisk so I am
> not sure what exactly we would gain here. However, if it was eventually
> storing these SSTables in a cloud storage, we might "compact" "TWCS tables"
> automatically after so-and-so period by moving them there.
>
> 
> From: Jake Luciani 
> Sent: Tuesday, September 26, 2023 19:03
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias
> external storage locations
>
> We (DataStax) have a FileSystemProvider for Astra we can provide.
> Works with S3/GCS/Azure.
>
> I'll ask someone on our end to make it accessible.
>
> This would work by having a bucket prefix per node. But there are lots
> of details needed to support things like out of bound compaction
> (mentioned in CEP).
>
> Jake
>
> On Tue, Sep 26, 2023 at 12:56 PM Benedict  wrote:
> >
> > I agree with Ariel, the more suitable insertion point is probably the
> JDK level FileSystemProvider and FileSystem abstraction.
> >
> > It might also be that we can reuse existing work here in some cases?
> >
> > On 26 Sep 2023, at 17:49, Ariel Weisberg  wrote:
> >
> > 
> > Hi,
> >
> > Support for multiple storage backends including remote storage backends
> is a pretty high value piece of functionality. I am happy to see there is
> interest in that.
> >
> > I think that `ChannelProxyFactory` as an integration point is going to
> quickly turn into a dead end as we get into really using multiple storage
> backends. We need to be able to list files and really the full range of
> filesystem interactions that Java supports should work with any backend to
> make development, testing, and using existing code straightforward.
> >
> > It's a little more work to get C* to create paths for alternate
> backends where appropriate, but that work is probably necessary even with
> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
> FileSystems). There will probably also be backend-specific behaviors that
> show up above the `ChannelProxy` layer that will depend on the backend.
> >
> > Ideally there would be some config to specify several backend
> filesystems and their individual configuration that can be used, as well as
> configuration and support for a "backend file router" for file creation
> (and opening) that can be used to route files to the backend most
> appropriate.
> >
> > Regards,
> > Ariel
> >
> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
> >
> > I have just filed CEP-36 [1] to allow for keyspace/table storage outside
> of the standard storage space.
> >
> > There are two desires  driving this change:
> >
> > 1. The ability to temporarily move some keyspaces/tables to storage outside
> the normal directory tree to other disk so that compaction can occur in
> situations where there is not enough disk space for compaction and the
> processing to the moved data can not be suspended.
> > 2. The ability to store infrequently used data on slower cheaper storage
> layers.
> >
> > I have a working POC implementation [2] though there are some issues
> still to be solved and much logging to be reduced.
> >
> > I look forward to productive discussions,
> > Claude
> >
> > [1]
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
> >
> >
> >
>
>
> --
> http://twitter.com/tjake
>
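
The "backend file router" suggested in the quoted thread could be pictured
roughly as below. This is a sketch under stated assumptions, not a proposed
design: the class BackendFileRouter, the backend names "local" and "cold", the
root locations (including the example bucket with a per-node prefix), and the
per-table routing rule are all invented for illustration. It shows several
named backends being configured and a file location being chosen per
keyspace/table at creation or open time.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical router: picks a named backend for each keyspace.table. */
final class BackendFileRouter {
    private final Map<String, String> backendRoots;                  // backend name -> root location
    private final Map<String, String> tableRules = new HashMap<>();  // "ks.table" -> backend name
    private final String defaultBackend;

    BackendFileRouter(Map<String, String> backendRoots, String defaultBackend) {
        if (!backendRoots.containsKey(defaultBackend))
            throw new IllegalArgumentException("unknown default backend: " + defaultBackend);
        this.backendRoots = new HashMap<>(backendRoots);
        this.defaultBackend = defaultBackend;
    }

    /** Sends all files of one table to the named backend. */
    void route(String keyspace, String table, String backend) {
        if (!backendRoots.containsKey(backend))
            throw new IllegalArgumentException("unknown backend: " + backend);
        tableRules.put(keyspace + "." + table, backend);
    }

    String backendFor(String keyspace, String table) {
        return tableRules.getOrDefault(keyspace + "." + table, defaultBackend);
    }

    /** Location at which to create or open a file for the table on its routed backend. */
    String locate(String keyspace, String table, String fileName) {
        String root = backendRoots.get(backendFor(keyspace, table));
        return root + "/" + keyspace + "/" + table + "/" + fileName;
    }
}

final class BackendFileRouterDemo {
    public static void main(String[] args) {
        Map<String, String> backends = new HashMap<>();
        backends.put("local", "/var/lib/cassandra/data");
        backends.put("cold", "s3://example-bucket/node1");  // assumed per-node bucket prefix
        BackendFileRouter router = new BackendFileRouter(backends, "local");
        router.route("metrics", "events_2022", "cold");     // infrequently used data goes to cheaper storage
        System.out.println(router.locate("metrics", "events_2022", "nb-1-big-Data.db"));
        System.out.println(router.locate("app", "users", "nb-7-big-Data.db"));
    }
}
```

A real router would presumably return java.nio.file.Path objects from the
relevant FileSystem rather than strings; strings are used here only to keep the
example free of external storage dependencies.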