Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-04-11 Thread Jon Haddad
Right, exactly. Which (I think) makes the object store about as valuable as an ephemeral disk if you don't keep everything on there. It's a tradeoff I'd never use given the cost / benefit. Does that mean you agree that we should focus on write-through cache mode first? Jon On Fri, Apr 11, 202

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-04-11 Thread Jeff Jirsa
> On Apr 11, 2025, at 1:15 PM, Jon Haddad wrote: > > > I also keep running up against my concern about treating object store as a > write back cache instead of write through. "Tiering" data off has real > consequences for the user, the big one being data loss, especially with > regards to

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-04-11 Thread Jon Haddad
I've been thinking about this a bit more recently, and I think Joey's suggestion about improving the yaml based disk configuration is a better first step than what I wrote (table definition), for a couple reasons. 1. Attaching it to the schema means we need to have the disk configuration as part o

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-08 Thread Jon Haddad
I really like the data directories and replication configuration. I think it makes a ton of sense to put it in the yaml, but we should probably yaml it all, and not nest JSON :), and we can probably simplify it a little with a file uri scheme, something like this: data_file_locations: disk:
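To make the file URI idea a little more concrete, here is a minimal Java sketch (not from the thread) of how entries in a `data_file_locations` list might be grouped by URI scheme before each group is handed to the matching storage implementation. The class name, the `s3://` example scheme and the rule that a bare path means local disk are assumptions for illustration; the YAML layout Jon proposed is cut off in the snippet.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Cassandra: groups data_file_locations entries by
// URI scheme so each group could be routed to the storage implementation for that scheme.
final class DataFileLocations {
    private DataFileLocations() {}

    static Map<String, List<URI>> byScheme(List<String> locations) {
        Map<String, List<URI>> grouped = new LinkedHashMap<>();
        for (String location : locations) {
            URI uri = URI.create(location);
            // Assumption: a bare path such as /mnt/disk2/data means the local file system.
            String scheme = uri.getScheme() == null ? "file" : uri.getScheme();
            grouped.computeIfAbsent(scheme, k -> new ArrayList<>()).add(uri);
        }
        return grouped;
    }

    static boolean isLocal(URI uri) {
        return uri.getScheme() == null || "file".equals(uri.getScheme());
    }
}
```

With `file:///var/lib/cassandra/data`, `/mnt/disk2/data` and `s3://my-bucket/node1`, the first two land under `file` and the last under `s3`.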

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-08 Thread Štefan Miklošovič
That is cool but this still does not show / explain what it would look like when it comes to dependencies needed for actually talking to storages like s3. Maybe I am missing something here, and please explain if I am mistaken, but if I understand that correctly, for talking to s3 we would need to u

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-08 Thread Joseph Lynch
Jon, I like where you are headed with that, just brainstorming out what the end interface might look like (might be getting a bit ahead of things talking about directories if we don't even have files implemented yet). What do folks think about pairing data_file_locations (fka data_file_directories)

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-08 Thread Jon Haddad
Taking this a step further, this opens up a different way of bootstrapping new nodes, by using the object store's native copy commands. This is something that's impossible when just using the filesystem mount. I think Joey mentioned something along these lines to me several years ago, maybe at t
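A rough Java sketch of bootstrapping by server-side copy, assuming the one-bucket-prefix-per-node layout Jake Luciani mentions for the Astra FileSystemProvider. The `CopyingObjectStore` interface and the other class names are hypothetical, and the in-memory store only makes the sketch runnable. Deciding which objects a real bootstrap should copy depends on token ownership, which this sketch ignores.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical minimal object-store API. Real stores offer server-side copies
// (for example S3's CopyObject), so the bytes never pass through a Cassandra node.
interface CopyingObjectStore {
    List<String> list(String prefix);
    void copy(String sourceKey, String destinationKey);
}

// In-memory stand-in used only to make the sketch runnable.
final class InMemoryCopyingStore implements CopyingObjectStore {
    final Map<String, byte[]> objects = new TreeMap<>();

    @Override
    public List<String> list(String prefix) {
        return objects.keySet().stream()
                .filter(key -> key.startsWith(prefix))
                .collect(Collectors.toList());
    }

    @Override
    public void copy(String sourceKey, String destinationKey) {
        objects.put(destinationKey, objects.get(sourceKey).clone());
    }
}

final class PrefixCopyBootstrap {
    private PrefixCopyBootstrap() {}

    // Copies every object under one node's prefix to a new node's prefix and
    // returns how many objects were copied.
    static int copyPrefix(CopyingObjectStore store, String sourcePrefix, String targetPrefix) {
        List<String> keys = store.list(sourcePrefix); // materialized before any copy is issued
        for (String key : keys) {
            store.copy(key, targetPrefix + key.substring(sourcePrefix.length()));
        }
        return keys.size();
    }
}
```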

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-08 Thread Jon Haddad
Thanks Jordan and Joey for the additional info. One thing I'd like to clarify - what I'm mostly after is 100% of my data on object store, local disk acting as an LRU cache, although there's also a case for the mirror. What I see so far are three high level ways of running this: 1. Mirror Mode Th
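The write-through arrangement described here (every file on the object store, local disk as an LRU cache) can be sketched in Java roughly as follows. This is a hypothetical illustration: `ObjectStore`, `InMemoryObjectStore` and `WriteThroughLruCache` are made-up names, an in-memory map stands in for a bucket, and whole objects are cached rather than chunks. The property that matters is that evicting from the local tier is always safe, because every write reached the object store first.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Stand-in for an object store client (assumption: simple whole-object put/get).
interface ObjectStore {
    void put(String key, byte[] bytes);
    Optional<byte[]> get(String key);
}

final class InMemoryObjectStore implements ObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    int gets = 0; // counts remote reads so cache hits and misses are observable

    public void put(String key, byte[] bytes) { objects.put(key, bytes.clone()); }

    public Optional<byte[]> get(String key) {
        gets++;
        return Optional.ofNullable(objects.get(key));
    }
}

// Write-through: writes go to the object store first, so 100% of the data is remote;
// the local tier only keeps the most recently used entries.
final class WriteThroughLruCache {
    private final ObjectStore store;
    private final LinkedHashMap<String, byte[]> local;

    WriteThroughLruCache(ObjectStore store, int capacity) {
        this.store = store;
        // accessOrder=true orders entries from least to most recently accessed.
        this.local = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity; // safe to drop: the object store has a copy
            }
        };
    }

    void write(String key, byte[] bytes) {
        store.put(key, bytes); // durable copy first
        local.put(key, bytes); // then populate the local cache
    }

    Optional<byte[]> read(String key) {
        byte[] cached = local.get(key);
        if (cached != null) return Optional.of(cached);
        Optional<byte[]> remote = store.get(key);
        remote.ifPresent(b -> local.put(key, b)); // fill the cache on a miss
        return remote;
    }

    boolean isCached(String key) { return local.containsKey(key); }
}
```

A write-back design would instead acknowledge the local write and upload later, which is exactly the window in which a lost disk means lost data.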

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-08 Thread Joseph Lynch
Great discussion - I agree strongly with Jon's points, giving operators this option will make many operators' lives easier. Even if you still have to have 100% disk space to meet performance requirements, that's still much more efficient than you can run C* with just disks (as you need to leave hea

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
Because an earlier reply hinted that mounting a bucket yields "terrible results". That has moved the discussion, in my mind, practically to the place of "we are not going to do this", to which I explained that in this particular case I do not find the speed important, because the use cases you want

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Jordan West
I too initially felt we should just use mounts and was excited by e.g. Single Zone Express mounting. As Cheng mentioned we tried it…and the results were disappointing (except for use cases that could sometimes tolerate seconds of p99 latency). That brought me around to needing an implementation we ow

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Jon Haddad
Supporting a filesystem mount is perfectly reasonable. If you wanted to use that with the S3 mount, there's nothing that should prevent you from doing so, and the filesystem version is probably the default implementation that we'd want to ship with since, to your point, it doesn't require additiona

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Jon Haddad
If that's not your intent, then you should be more careful with your replies. When you write something like this: > While this might work, what I find tricky is that we are forcing this on users. Not everybody is interested in putting everything in a bucket and serving traffic from that. They just

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
I was explaining multiple times (1) that I don't have anything against what is discussed here. Having questions about what that is going to look like does not mean I am dismissive. (1) https://lists.apache.org/thread/ofh2q52p92cr89wh2l3djsm5n9dmzzsg On Fri, Mar 7, 2025 at 5:44 PM Jon Haddad wro

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
The only way I see that working is that, if everything was in a bucket, if you take a snapshot, these SSTables would be "copied" from live data dir (living in a bucket) to snapshots dir (living in a bucket). Basically, we would need to say "and if you go to take a snapshot on this table, instead of

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Jon Haddad
Nobody is saying you can't work with a mount, and this isn't a conversation about snapshots. Nobody is forcing users to use object storage either. You're making a ton of negative assumptions here about both the discussion, and the people you're having it with. Try to be more open-minded. On Fr

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread guo Maxwell
Thank you very much, I'm certainly interested. I'll start working on the update for CEP-36 next week. On Fri, Mar 7, 2025 at 7:07 PM, Mick Semb Wever wrote: > > > On Thu, 6 Mar 2025 at 09:40, Štefan Miklošovič > wrote: > >> That is cool but this still does not show / explain how it would look >> like when it c

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Mick Semb Wever
On Thu, 6 Mar 2025 at 09:40, Štefan Miklošovič wrote: > That is cool but this still does not show / explain how it would look like > when it comes to dependencies needed for actually talking to storages like > s3. > As Benedict writes, dealing with optional dependencies is not hard (and as Jon

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
BTW, snapshots are quite special because these are not "files", they are just hard links. They "materialize" as regular files once underlying SSTables are compacted away. How are you going to hardlink from local storage to an object storage anyway? We will always need to "upload". On Fri, Mar 7, 2
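A small Java sketch of the local behaviour described here: a snapshot is a hard link, so deleting the live SSTable (as compaction eventually does) leaves the snapshot readable without any copy having been made. The file name and directory layout are illustrative assumptions. `Files.createLink` only works within a single file system, and a link cannot span two `FileSystemProvider`s (the JDK's default provider typically rejects a foreign `Path` with `ProviderMismatchException`), which is why an object-store snapshot needs a copy or upload instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SnapshotLinkDemo {
    private SnapshotLinkDemo() {}

    // Hard-links a live SSTable component into a snapshot directory, so no data is copied.
    // Works only when both paths are on the same local file system.
    static Path snapshot(Path liveFile, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        return Files.createLink(snapshotDir.resolve(liveFile.getFileName()), liveFile);
    }

    // Returns the snapshot's contents after the live file has been deleted.
    static String survivesCompaction() {
        try {
            Path tableDir = Files.createTempDirectory("tbl");
            Path live = Files.writeString(tableDir.resolve("nb-1-big-Data.db"), "rows");
            Path link = snapshot(live, tableDir.resolve("snapshots").resolve("snap1"));
            Files.delete(live); // what compaction eventually does to the original SSTable
            return Files.readString(link, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```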

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-07 Thread Štefan Miklošovič
Jon, all "big three" support mounting a bucket locally. That being said, I do not think that completely ditching this possibility for Cassandra working with a mount, e.g. for just uploading snapshots there etc, is reasonable. GCP https://cloud.google.com/storage/docs/cloud-storage-fuse/quickstar

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-06 Thread Joel Shepherd
On 3/6/2025 7:16 AM, Jon Haddad wrote: Assuming everything else is identical, might not matter for S3. However, not every object store has a filesystem mount. Regarding sprawling dependencies, we can always make the provider specific libraries available as a separate download and put them on

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-06 Thread Jon Haddad
Hey Joel, thanks for chiming in! Regarding dependencies - while it's possible to provide pluggable interfaces, the issue I'm concerned about is conflicting versions of transitive dependencies at runtime. For example, I used a java agent that had a different version of snakeyaml, and it ended up b

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-06 Thread Benedict
It anyway seems reasonable to me that we would support multiple FileSystemProviders. So perhaps this is really two problems we’re maybe conflating: 1) a mechanism for dropping jars that can register a FileSystemProvider for Cassandra to utilise; 2) a way to mark directories (from any provider) as “rem

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-06 Thread Jon Haddad
Assuming everything else is identical, might not matter for S3. However, not every object store has a filesystem mount. Regarding sprawling dependencies, we can always make the provider specific libraries available as a separate download and put them on their own thread with a separate class path.
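Keeping provider-specific libraries on their own class path could look roughly like this Java sketch. It assumes each provider ships as an ordinary jar that registers a `java.nio.file.spi.FileSystemProvider` through `META-INF/services`; the class name and any jar location (say, a `lib/providers/s3/` directory) are hypothetical. Using the platform class loader as the parent, rather than Cassandra's application class loader, is what keeps a plugin's transitive dependencies (its own snakeyaml, for instance) from colliding with Cassandra's. The "own thread" part of the suggestion is not covered.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.spi.FileSystemProvider;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class IsolatedProviderLoader {
    private IsolatedProviderLoader() {}

    // Discovers FileSystemProvider implementations in a separate set of jars.
    // The parent is the platform class loader, which cannot see Cassandra's class path,
    // so plugin classes and their dependencies are resolved only from the plugin jars.
    static List<String> schemesFrom(URL[] pluginJars) {
        // Deliberately left open: provider classes must stay loadable while in use.
        URLClassLoader loader = new URLClassLoader(pluginJars, ClassLoader.getPlatformClassLoader());
        List<String> schemes = new ArrayList<>();
        // A broken plugin jar surfaces here as a ServiceConfigurationError.
        for (FileSystemProvider provider : ServiceLoader.load(FileSystemProvider.class, loader)) {
            schemes.add(provider.getScheme());
        }
        return schemes;
    }
}
```

The shared type between Cassandra and the plugin is only `FileSystemProvider` itself, which lives in `java.base`, so both sides agree on it without sharing anything else.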

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-06 Thread Benedict
I think another way of saying what Stefan may be getting at is: what does a library give us that an appropriately configured mount dir doesn’t? We don’t want to treat S3 the same as local disk, but this can be achieved easily with config. Is there some other benefit of direct integration? Well defin

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-05 Thread Mick Semb Wever
It’s not an area where I can currently dedicate engineering effort. But if > others are interested in contributing a feature like this, I’d see it as > valuable for the project and would be happy to collaborate on > design/architecture/goals. > Jake mentioned 17 months ago a custom FileSys

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-05 Thread Štefan Miklošovič
Scott, what you wrote is all correct, but I have a feeling that both you and Jeff are talking about something different, some other aspect of using that. It seems that I still need to explain myself that I don't consider object storage to be useless, it is as if everybody has to make the point ab

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread C. Scott Andreas
To Jeff’s point on tactical vs. strategic, here’s the big picture for me on object storage: – Object storage is 70% cheaper: Replicated flash block storage is extremely expensive, and more so with compute resources constantly attached. If one were to build a storage platform on top of a cloud provide

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Cheng Wang via dev
I agree with all the points mentioned by Scott. We are actually very interested in exploring tiered storage for the same reasons above. Our first experiment with S3 single zone express was, unfortunately, awfully slow compared to ephemeral and EBS. On Tue, Mar 4, 2025 at 9:22 PM C. Scott Andreas

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Jon Haddad
I've come around on the rsync vs built in topic, and I think this is the same. Having this managed in process gives us more options for control. I think it's critical that 100% of the data be pushed to the object store. I mentioned this in my email on Dec 15, but nobody directly responded to that

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Štefan Miklošovič
Jeff, when it comes to snapshots, there was already discussion in the other thread I am not sure you are aware of (1), here (2) I am talking about Sidecar + snapshots specifically. One "caveat" of Sidecar is that you actually _need_ sidecar if we ever contemplated Sidecar doing upload / backup (by

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Jeff Jirsa
Mounted dirs give up the opportunity to change the IO model to account for different behaviors. The configurable channel proxy may suffer from the same IO constraints depending on implementation, too. But it may also become viable. The snapshot outside of the mounted file system seems like you’re i

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Štefan Miklošovič
For what it's worth, as it might seem to somebody that I am rejecting this altogether (which is not the case; all I am trying to say is that we should just think about it more), it would be cool to know more about the experience of others when it comes to this. Maybe somebody already tried to mount and

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread guo Maxwell
If we want to do this, we should wrap the object storage downwards and provide the file system API capabilities upwards (to the Cassandra layer), if my understanding is correct. On Tue, Mar 4, 2025 at 9:55 PM, Brandon Williams wrote: > A failing remote api that you are calling and a failing filesystem you > are using h

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Štefan Miklošovič
I would be very careful not to "reinvent the wheel" here. Are we confident that we can implement it in a way that is bug-free / robust enough? Do we think we can do a better job than the authors of "s3 driver"? If they did a good job (which I assume they did) then failing an operation whil

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Jeff Jirsa
Most obviously, you don’t need to move all components of the sstable to s3, you could keep index + compression offsets locally. On Mar 4, 2025, at 1:46 PM, Štefan Miklošovič wrote: I don't say that using remote object storage is useless. I am just saying that I don't see the difference. I have not
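Jeff's point that not every SSTable component has to move can be sketched as a per-component placement rule. In this hypothetical Java example only `Data.db` goes to the object store, while the index, compression offsets and other small metadata stay local. The `nb-1-big-Data.db` naming mirrors familiar SSTable file names, but the exact split, the `Tier` enum and the class name are assumptions for illustration.

```java
import java.util.Set;

// Where a component should live in this sketch.
enum Tier { LOCAL, OBJECT_STORE }

// Hypothetical placement policy: only the bulk Data.db component goes to the object
// store, while small, frequently read metadata (index, compression offsets, filters,
// statistics, TOC) stays on local disk.
final class ComponentPlacement {
    private static final Set<String> REMOTE_COMPONENTS = Set.of("Data.db");

    private ComponentPlacement() {}

    static Tier tierFor(String fileName) {
        // Assumes "<version>-<generation>-<format>-<Component>" naming, e.g. nb-1-big-Index.db.
        int dash = fileName.lastIndexOf('-');
        String component = dash < 0 ? fileName : fileName.substring(dash + 1);
        return REMOTE_COMPONENTS.contains(component) ? Tier.OBJECT_STORE : Tier.LOCAL;
    }
}
```

Keeping the index and `CompressionInfo.db` local means a read can locate the right compressed chunk without a remote round trip, and only then issue a ranged read against the remote data file.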

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Brandon Williams
A failing remote api that you are calling and a failing filesystem you are using have different implications. Kind Regards, Brandon On Tue, Mar 4, 2025 at 7:47 AM Štefan Miklošovič wrote: > > I don't say that using remote object storage is useless. > > I am just saying that I don't see the diffe
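One way to read this point: an error from a remote API is something the caller can classify and react to, for example by retrying, whereas an I/O error from a mounted filesystem is likely to surface as a disk error subject to Cassandra's disk failure policy. The Java sketch below retries only errors marked as transient, with exponential backoff, and otherwise lets the caller decide. `RemoteRetry` and `TransientRemoteException` are hypothetical names, not Cassandra or SDK classes.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical marker for remote errors worth retrying (throttling, timeouts, ...).
final class TransientRemoteException extends Exception {
    TransientRemoteException(String message) {
        super(message);
    }
}

final class RemoteRetry {
    private RemoteRetry() {}

    // Retries transient remote failures with exponential backoff. Any other exception,
    // or a transient one after maxAttempts, propagates so the caller can apply its own
    // failure policy instead of treating the object store like a failed local disk.
    static <T> T withRetries(Callable<T> call, int maxAttempts, Duration initialBackoff) throws Exception {
        Duration backoff = initialBackoff;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (TransientRemoteException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff.toMillis());
                backoff = backoff.multipliedBy(2);
            }
        }
    }
}
```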

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Štefan Miklošovič
I don't say that using remote object storage is useless. I am just saying that I don't see the difference. I have not measured that but I can imagine that s3 mounted would use, under the hood, the same calls to s3 api. How else would it be done? You need to talk to remote s3 storage eventually any

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Jeff Jirsa
Mounting an s3 bucket as a directory is an easy but poor implementation of object backed storage for databases. Object storage is durable (most data loss is due to bugs not concurrent hardware failures), cheap (can be 5-10x cheaper) and ubiquitous. A huge number of modern systems are object-storage-on

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Štefan Miklošovič
I do not think we need this CEP, honestly. I don't want to diss this unnecessarily but if you mount a remote storage locally (e.g. mounting s3 bucket as if it was any other directory on node's machine), then what is this CEP good for? Not talking about the necessity to put all dependencies to be a

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-03-04 Thread Rolo, Carlos via dev
26 February 2025 14:54 To: dev@cassandra.apache.org Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations Is anyone else interested in continuing to discuss this topic? guo Maxwell mailto:

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-02-26 Thread C. Scott Andreas
I’d love to see this implemented, where “this” is a proxy for some notion of support for remote object storage, perhaps usable by compaction strategies like TWCS to migrate data older than a threshold from a local filesystem to remote object. It’s not an area where I can currently dedicate enginee
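The TWCS idea can be sketched as a selection step: pick SSTables whose newest data is older than a threshold, since under TWCS those belong to closed time windows and normally are not compacted again. In this hypothetical Java example, `SSTableInfo` is a stand-in record (real SSTables record such timestamps in their metadata), and the actual move to object storage is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-in for SSTable metadata; only what the sketch needs.
record SSTableInfo(String name, Instant maxTimestamp) {}

final class AgeBasedOffload {
    private AgeBasedOffload() {}

    // Returns the SSTables whose newest data is older than now - threshold,
    // i.e. candidates to migrate from local disk to object storage.
    static List<String> candidates(List<SSTableInfo> sstables, Instant now, Duration threshold) {
        Instant cutoff = now.minus(threshold);
        return sstables.stream()
                .filter(s -> s.maxTimestamp().isBefore(cutoff))
                .map(SSTableInfo::name)
                .collect(Collectors.toList());
    }
}
```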

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2025-02-26 Thread guo Maxwell
Is anyone else interested in continuing to discuss this topic? On Fri, Sep 20, 2024 at 09:44, guo Maxwell wrote: > I discussed this offline with Claude, he is no longer working on this. > > It's a pity. I think this is a very valuable thing. Commitlog's archiving > and restore may be able to use the relevant c

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2024-09-19 Thread guo Maxwell
I discussed this offline with Claude, he is no longer working on this. It's a pity. I think this is a very valuable thing. Commitlog's archiving and restore may be able to use the relevant code if it is completed. On Fri, Sep 20, 2024 at 2:01 AM, Patrick McFadin wrote: > Thanks for reviving this one! > > On Wed

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2024-09-19 Thread Patrick McFadin
Thanks for reviving this one! On Wed, Sep 18, 2024 at 12:06 AM guo Maxwell wrote: > Is there any update on this topic? It seems that things can make a big > progress if Jake Luciani can find someone who can make the > FileSystemProvider code accessible. > > On Sat, Dec 16, 2023 at 05:29, Jon Haddad wrote:

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2024-09-18 Thread guo Maxwell
Is there any update on this topic? It seems that things could make big progress if Jake Luciani could find someone who can make the FileSystemProvider code accessible. On Sat, Dec 16, 2023 at 05:29, Jon Haddad wrote: > At a high level I really like the idea of being able to better leverage > cheaper storage

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-12-15 Thread Jon Haddad
At a high level I really like the idea of being able to better leverage cheaper storage especially object stores like S3. One important thing though - I feel pretty strongly that there's a big, deal breaking downside. Backups, disk failure policies, snapshots and possibly repairs would get more

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-12-14 Thread Claude Warren
Is there still interest in this? Can we get some points down on electrons so that we all understand the issues? While it is fairly simple to redirect the read/write to something other than the local system for a single node this will not solve the problem for tiered storage. Tiered storage w

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-31 Thread Claude Warren, Jr via dev
@henrik, Have you made any progress on this? I would like to help drive it forward but I am waiting to see what your code looks like and figure out what I need to do. Any update on timeline would be appreciated. On Mon, Oct 23, 2023 at 9:07 PM Jon Haddad wrote: > I think this is a great more

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-23 Thread Jon Haddad
I think this is great, and more generally useful than the two scenarios you've outlined. I think it could / should be possible to use an object store as the primary storage for sstables and rely on local disk as a cache for reads. I don't know the roadmap for TCM, but imo if it allowed for more

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-19 Thread Claude Warren, Jr via dev
nd yesterday. >>>>> >>>>> Henrik, How does your system work? What is the design strategy? >>>>> Also is your code available somewhere? >>>>> >>>>> After looking at the code some more I think that the best solution is >

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-18 Thread guo Maxwell
Also >>>> is your code available somewhere? >>>> >>>> After looking at the code some more I think that the best solution is >>>> not a FileChannelProxy but to modify the Cassandra File class to get a >>>> FileSystem object for a Factory

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-18 Thread Claude Warren, Jr via dev
that this makes it a very small change that will pick up >>> 90+% of the cases. We then just need to find the edge cases. >>> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev < >>> dev@cassandra.

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-18 Thread Claude Warren, Jr via dev
> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev < >> dev@cassandra.apache.org> wrote: >> >>> Super excited about this as well. Happy to help test with Azure and any >>> other way needed. >>> >>> Thanks, >>> German

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-10-10 Thread Claude Warren, Jr via dev
> -- >> *From:* guo Maxwell >> *Sent:* Wednesday, September 27, 2023 7:38 PM >> *To:* dev@cassandra.apache.org >> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy >> to alias external storage locations >> >

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-28 Thread Claude Warren, Jr via dev
> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable ChannelProxy > to alias external storage locations > > Thanks. So I think a jira can be created now. And I'd be happy to provide > some help with this as well if needed. > > On Thu, Sep 28, 2023, Henrik Ingo

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-28 Thread German Eichberger via dev
ChannelProxy to alias external storage locations Thanks. So I think a jira can be created now. And I'd be happy to provide some help with this as well if needed. On Thu, Sep 28, 2023 at 00:21, Henrik Ingo <henrik.i...@datastax.com> wrote: It seems I was volunteered to rebase the Astra implementat

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-27 Thread Jeff Jirsa
From: Jake Luciani <jak...@gmail.com> Sent: Tuesday, September 26, 2023 19:03 To: dev@cassandra.apache.org Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-27 Thread guo Maxwell
Thanks. So I think a jira can be created now. And I'd be happy to provide some help with this as well if needed. On Thu, Sep 28, 2023 at 00:21, Henrik Ingo wrote: > It seems I was volunteered to rebase the Astra implementation of this > functionality (FileSystemProvider) onto Cassandra trunk. (And publish it,

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-27 Thread Henrik Ingo
It seems I was volunteered to rebase the Astra implementation of this functionality (FileSystemProvider) onto Cassandra trunk. (And publish it, of course) I'll try to get going today or tomorrow, so that this discussion can then benefit from having that code available for inspection. And potentiall

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Claude Warren, Jr via dev
we might "compact" "TWCS tables" > automatically after so-and-so period by moving them there. > > > From: Jake Luciani > Sent: Tuesday, September 26, 2023 19:03 > To: dev@cassandra.apache.org > Subject: Re: [DISCUSS]

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Miklosovic, Stefan
From: Jake Luciani Sent: Tuesday, September 26, 2023 19:03 To: dev@cassandra.apache.org Subject: Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Jake Luciani
We (DataStax) have a FileSystemProvider for Astra we can provide. Works with S3/GCS/Azure. I'll ask someone on our end to make it accessible. This would work by having a bucket prefix per node. But there are lots of details needed to support things like out of bound compaction (mentioned in CEP).

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Benedict
I agree with Ariel, the more suitable insertion point is probably the JDK level FileSystemProvider and FileSystem abstraction. It might also be that we can reuse existing work here in some cases? > On 26 Sep 2023, at 17:49, Ariel Weisberg wrote: > >  > Hi, > > Support for multiple storage ba
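To illustrate why the JDK-level `FileSystemProvider`/`FileSystem` abstraction is an attractive insertion point, here is a Java sketch in which code written only against `Path` and `Files` runs unchanged on a non-default provider. The JDK's built-in zip file system provider (scheme `jar`) stands in for a remote one; nothing here is Cassandra code, and `ProviderAgnosticWriter` is a made-up name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

final class ProviderAgnosticWriter {
    private ProviderAgnosticWriter() {}

    // Uses only Path/Files, so it works with whichever provider produced `dir`:
    // the default one, or a plugged-in provider for some other storage system.
    static void writeComponent(Path dir, String name, String contents) throws IOException {
        Files.createDirectories(dir);
        Files.writeString(dir.resolve(name), contents, StandardCharsets.UTF_8);
    }

    // Round-trips a file through the JDK's zip file system provider as a stand-in
    // for a remote provider; returns "<scheme>:<contents>".
    static String roundTripThroughZipProvider() {
        try {
            Path zip = Files.createTempDirectory("fsp").resolve("store.zip");
            URI uri = URI.create("jar:" + zip.toUri());
            try (FileSystem fs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
                Path tableDir = fs.getPath("/ks/tbl");
                writeComponent(tableDir, "Data.db", "rows");
                String scheme = tableDir.getFileSystem().provider().getScheme();
                return scheme + ":" + Files.readString(tableDir.resolve("Data.db"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`writeComponent` never mentions zip files; swapping in a different provider only changes how the root `Path` is obtained.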

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Ariel Weisberg
Hi, Support for multiple storage backends including remote storage backends is a pretty high value piece of functionality. I am happy to see there is interest in that. I think that `ChannelProxyFactory` as an integration point is going to quickly turn into a dead end as we get into really usin

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread guo Maxwell
Yeah, there are so many things to do, as Cassandra (shared-nothing) is different from some other systems like HBase. So I think we can break the final goal into multiple steps. The first is what Claude proposed. But I suggest that this design can make the interface more scalable and we can consider the im

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Josh McKenzie
> it may be better to support most cloud storage > It simply only supports S3, which feels a bit customized for a certain user > and is not universal enough. Am I right? I agree w/the eventual goal (and constraint on design now) of supporting most popular cloud storage vendors, but if we have som

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread Claude Warren, Jr via dev
The intention of the CEP is to lay the groundwork to allow development of ChannelProxyFactories that are pluggable in Cassandra. In this way any storage system can be a candidate for Cassandra storage provided FileChannels can be created for the system. As I stated before I think that there may b

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-26 Thread guo Maxwell
In my mind, it may be better to support most cloud storage: AWS, Azure, GCP, Aliyun and so on. We may make it pluggable. But that way, it seems there may need to be a filesystem interface layer for object storage. And should we support distributed systems like HDFS, or something else? We should firs

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Claude Warren, Jr via dev
My intention is to develop an S3 storage system using https://github.com/carlspring/s3fs-nio There are several issues yet to be solved: 1. There are some internal calls that create files in the table directory that do not use the channel proxy. I believe that these are making calls on F

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread guo Maxwell
"Rather than building this piece by piece, I think it'd be awesome if someone drew up an end-to-end plan to implement tiered storage, so we can make sure we're discussing the whole final state, and not an implementation detail of one part of the final state?" I do agree with Jeff on this. If the

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Jeff Jirsa
- I think this is a great step forward. - Being able to move sstables around between tiers of storage is a feature Cassandra desperately needs, especially if one of those tiers is some sort of object storage - This looks like it's a foundational piece that enables that. Perhaps by a team that's alr

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread Claude Warren, Jr via dev
external storage can be any storage that you can produce a FileChannel for. There is an S3 library that does this so S3 is a definite possibility for storage in this solution. My example code only writes to a different directory on the same system. And there are a couple of places where I did no
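A hedged Java sketch of the idea that any storage able to produce a `FileChannel` could back a table. The `ChannelProxyFactory` name comes from the CEP, but the method signature, the keyspace-to-root aliasing and the directory layout below are assumptions for illustration, not the CEP's actual interface. `FileChannel.open` delegates to the `Path`'s provider, so a root on a non-default provider works only if that provider implements `newFileChannel`; otherwise it throws `UnsupportedOperationException`.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

// Hypothetical shape of a pluggable factory; CEP-36's real interface may differ.
interface ChannelProxyFactory {
    FileChannel open(String keyspace, String table, String fileName, StandardOpenOption... options)
            throws IOException;
}

// Aliases selected keyspaces to an alternate root (another disk, or any Path whose
// provider supports FileChannels); all other keyspaces use the default root.
final class AliasingChannelProxyFactory implements ChannelProxyFactory {
    private final Path defaultRoot;
    private final Map<String, Path> keyspaceAliases;

    AliasingChannelProxyFactory(Path defaultRoot, Map<String, Path> keyspaceAliases) {
        this.defaultRoot = defaultRoot;
        this.keyspaceAliases = keyspaceAliases;
    }

    Path resolve(String keyspace, String table, String fileName) {
        Path root = keyspaceAliases.getOrDefault(keyspace, defaultRoot);
        return root.resolve(keyspace).resolve(table).resolve(fileName);
    }

    @Override
    public FileChannel open(String keyspace, String table, String fileName, StandardOpenOption... options)
            throws IOException {
        Path path = resolve(keyspace, table, fileName);
        Files.createDirectories(path.getParent());
        return FileChannel.open(path, options);
    }
}
```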

Re: [DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-25 Thread guo Maxwell
Great suggestion. Can external storage only be local storage media? Or can it be any storage medium, such as object storage like S3? We have previously implemented a tiered storage capability, that is, there are multiple storage media on one node (SSD, HDD), and data placement based on reques

[DISCUSS] CEP-36: A Configurable ChannelProxy to alias external storage locations

2023-09-24 Thread Claude Warren, Jr via dev
I have just filed CEP-36 [1] to allow for keyspace/table storage outside of the standard storage space. There are two desires driving this change: 1. The ability to temporarily move some keyspaces/tables to storage outside the normal directory tree to other disk so that compaction can o