Henrik and Guo,

Have you moved forward on this topic?  I have not seen anything recently.
I have posted a solution that intercepts calls for directories and injects
directories from different FileSystems.  This means that a node can have
keyspaces both on the local file system and one or more other FileSystem
implementations.
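
A minimal sketch of that idea, not the posted code: a lookup that returns
the normal data directory unless a keyspace has been redirected, in which
case it returns a directory under another root.  The names
KeyspaceDirectoryResolver, redirect and directoryFor are invented for
illustration, and the roots are plain local paths; a Path obtained from
another FileSystem implementation could presumably be registered the same
way.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; class and method names are not from Cassandra.
final class KeyspaceDirectoryResolver {
    private final Path defaultRoot;
    // keyspace name -> root directory, possibly on another FileSystem
    private final Map<String, Path> redirects = new ConcurrentHashMap<>();

    KeyspaceDirectoryResolver(Path defaultRoot) {
        this.defaultRoot = defaultRoot;
    }

    // Register an alternate root for one keyspace.
    void redirect(String keyspace, Path alternateRoot) {
        redirects.put(keyspace, alternateRoot);
    }

    // Intercept the directory lookup: redirected keyspaces get the alternate root.
    Path directoryFor(String keyspace, String table) {
        Path root = redirects.getOrDefault(keyspace, defaultRoot);
        return root.resolve(keyspace).resolve(table);
    }

    public static void main(String[] args) {
        KeyspaceDirectoryResolver resolver =
                new KeyspaceDirectoryResolver(Paths.get("data"));
        resolver.redirect("archive", Paths.get("cold-storage"));
        System.out.println(resolver.directoryFor("ks1", "users"));
        System.out.println(resolver.directoryFor("archive", "events"));
    }
}
```

Keyspaces without a redirect keep resolving under the default root, so a
node can hold both kinds of keyspace at once.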

I look forward to hearing from you,
Claude


On Wed, Oct 18, 2023 at 9:00 AM Claude Warren, Jr <claude.war...@aiven.io>
wrote:

> After a bit more analysis and some testing I have a new branch that I
> think solves the problem. [1]  I have also created a pull request internal
> to my clone so that it is easy to see the changes. [2]
>
> The strategy change is to move the insertion of the proxy from the
> Cassandra File class to the Directories class.  This means that all
> interaction with the table is captured (this solves a problem encountered
> in the earlier strategy).
> The strategy is to create a path on a different FileSystem and return
> that.  The example code only moves the data for the table to another
> directory on the same FileSystem but using a different FileSystem
> implementation should be a trivial change.
>
> The current code works on an entire keyspace.  While code exists to
> limit the redirect to a single table, I have not tested that branch yet
> and am not certain that it will work.  There is also some code (i.e. the
> PathParser) that may no longer be needed but has not been removed yet.
>
> Please take a look and let me know if you see any issues with this
> solution.
>
> Claude
>
> [1] https://github.com/Claudenw/cassandra/tree/FileSystemProxy
> [2] https://github.com/Claudenw/cassandra/pull/5/files
>
>
>
> On Tue, Oct 10, 2023 at 10:28 AM Claude Warren, Jr <claude.war...@aiven.io>
> wrote:
>
>> I have been exploring adding a second Path to the Cassandra File object.
>> The original path being the path within the standard Cassandra directory
>> tree and the second being a translated path when there is what was called a
>> ChannelProxy in place.
>>
>> A problem arises when Directories.getLocationForDisk() is called.  It
>> seems to look for locations that start with the data directory
>> absolute path.  I can change it to look for the original path rather than
>> the translated path, but in other cases the translated path is the one
>> that is needed.
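
A small sketch of why the two paths behave differently under a prefix
check.  It assumes only that the lookup is some form of "starts with the
data directory" test, as described above; DualPath and
isUnderDataDirectory are hypothetical names, and the directory and file
names are made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not the Cassandra File class.
final class DualPath {
    final Path original;   // location inside the standard directory tree
    final Path translated; // location after the proxy has redirected it

    DualPath(Path original, Path translated) {
        this.original = original;
        this.translated = translated;
    }

    // Prefix test against a configured data directory, using the original path.
    boolean isUnderDataDirectory(Path dataDirectory) {
        return original.startsWith(dataDirectory);
    }

    public static void main(String[] args) {
        Path dataDir = Paths.get("/var/lib/cassandra/data");
        DualPath file = new DualPath(
                dataDir.resolve("ks1/users/nb-1-big-Data.db"),
                Paths.get("/mnt/cold/ks1/users/nb-1-big-Data.db"));
        System.out.println(file.isUnderDataDirectory(dataDir));  // true
        System.out.println(file.translated.startsWith(dataDir)); // false
    }
}
```

The translated path never matches the configured data directory, which is
the mismatch described above; callers that do real I/O still need the
translated one.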
>>
>> I notice that there is a concept of multiple file locations in the code
>> base, particularly in the Directories.DataDirectories class where there are
>> "locationsForNonSystemKeyspaces" and "locationsForSystemKeyspace" in the
>> constructor, and in the
>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations() method
>> which returns an array of String and is populated from the cassandra.yaml
>> file.
>>
>> The DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()
>> only ever seems to return an array of one item.
>>
>> Why does
>> DatabaseDescriptor.getNonLocalSystemKeyspacesDataFileLocations()  return an
>> array?
>>
>> Should the system set the path to the root of the ColumnFamilyStore in
>> the ColumnFamilyStore directories instance?
>> Should the Directories.getLocationForDisk() do the proxy to the other
>> file system?
>>
>> Where is the proper location to change from the standard internal
>> representation to the remote location?
>>
>>
>> On Fri, Sep 29, 2023 at 8:07 AM Claude Warren, Jr <claude.war...@aiven.io>
>> wrote:
>>
>>> Sorry I was out sick and did not respond yesterday.
>>>
>>> Henrik,  How does your system work?  What is the design strategy?  Also
>>> is your code available somewhere?
>>>
>>> After looking at the code some more I think that the best solution is
>>> not a FileChannelProxy but to modify the Cassandra File class to get a
>>> FileSystem object from a Factory and use it to build the Path that is
>>> used within that object.  I think that this makes it a very small change
>>> that will pick up 90+% of the cases.  We then just need to find the edge
>>> cases.
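
A rough sketch of what such a change could look like, under the assumption
that the File wrapper builds its Path in one place.  SketchFile and
setFileSystemFactory are hypothetical names and do not reflect the actual
Cassandra File class.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical illustration only; not the Cassandra File class.
final class SketchFile {
    // Factory consulted whenever a path is built; defaults to the local FileSystem.
    private static volatile Supplier<FileSystem> fileSystemFactory = FileSystems::getDefault;

    // Install a different factory, e.g. one returning a remote-storage FileSystem.
    static void setFileSystemFactory(Supplier<FileSystem> factory) {
        fileSystemFactory = Objects.requireNonNull(factory);
    }

    private final Path path;

    SketchFile(String first, String... more) {
        // The only change relative to Paths.get(first, more): ask the factory first.
        this.path = fileSystemFactory.get().getPath(first, more);
    }

    Path toPath() {
        return path;
    }

    public static void main(String[] args) {
        SketchFile file = new SketchFile("data", "ks1", "users");
        System.out.println(file.toPath());
        System.out.println(file.toPath().getFileSystem() == FileSystems.getDefault()); // true
    }
}
```

Code that only goes through the wrapper picks up the alternate FileSystem
without further changes; code that builds paths some other way is where the
edge cases would be.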
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Sep 29, 2023 at 1:14 AM German Eichberger via dev <
>>> dev@cassandra.apache.org> wrote:
>>>
>>>> Super excited about this as well. Happy to help test with Azure and any
>>>> other way needed.
>>>>
>>>> Thanks,
>>>> German
>>>> ------------------------------
>>>> *From:* guo Maxwell <cclive1...@gmail.com>
>>>> *Sent:* Wednesday, September 27, 2023 7:38 PM
>>>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>> *Subject:* [EXTERNAL] Re: [DISCUSS] CEP-36: A Configurable
>>>> ChannelProxy to alias external storage locations
>>>>
>>>> Thanks , So I think a jira can be created now. And I'd be happy to
>>>> provide some help with this as well if needed.
>>>>
>>>> Henrik Ingo <henrik.i...@datastax.com> wrote on Thu, Sep 28, 2023 at 00:21:
>>>>
>>>> It seems I was volunteered to rebase the Astra implementation of this
>>>> functionality (FileSystemProvider) onto Cassandra trunk (and publish it,
>>>> of course). I'll try to get going today or tomorrow, so that this
>>>> discussion can then benefit from having that code available for inspection,
>>>> and potentially use it as a solution to this use case.
>>>>
>>>> On Tue, Sep 26, 2023 at 8:04 PM Jake Luciani <jak...@gmail.com> wrote:
>>>>
>>>> We (DataStax) have a FileSystemProvider for Astra we can provide.
>>>> Works with S3/GCS/Azure.
>>>>
>>>> I'll ask someone on our end to make it accessible.
>>>>
>>>> This would work by having a bucket prefix per node. But there are lots
>>>> of details needed to support things like out of bound compaction
>>>> (mentioned in CEP).
>>>>
>>>> Jake
>>>>
>>>> On Tue, Sep 26, 2023 at 12:56 PM Benedict <bened...@apache.org> wrote:
>>>> >
>>>> > I agree with Ariel, the more suitable insertion point is probably the
>>>> JDK level FileSystemProvider and FileSystem abstraction.
>>>> >
>>>> > It might also be that we can reuse existing work here in some cases?
>>>> >
>>>> > On 26 Sep 2023, at 17:49, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>>> >
>>>> > 
>>>> > Hi,
>>>> >
>>>> > Support for multiple storage backends including remote storage
>>>> backends is a pretty high value piece of functionality. I am happy to see
>>>> there is interest in that.
>>>> >
>>>> > I think that `ChannelProxyFactory` as an integration point is going
>>>> to quickly turn into a dead end as we get into really using multiple
>>>> storage backends. We need to be able to list files and really the full
>>>> range of filesystem interactions that Java supports should work with any
>>>> backend to make development, testing, and using existing code
>>>> straightforward.
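
A short sketch of that point: code written against java.nio.file.Files
runs unchanged on whichever FileSystem the Path belongs to.  The JDK's
built-in zip FileSystem stands in here for a remote backend, purely for
illustration; BackendAgnosticIo and writeAndList are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; the zip FileSystem is a stand-in backend.
final class BackendAgnosticIo {
    // Writes one file under dir and returns the sorted entry names in dir.
    // Nothing here depends on which FileSystemProvider backs the Path.
    static List<String> writeAndList(Path dir, String name, String content) {
        try {
            Files.createDirectories(dir);
            Files.write(dir.resolve(name), content.getBytes(StandardCharsets.UTF_8));
            try (Stream<Path> entries = Files.list(dir)) {
                return entries.map(p -> p.getFileName().toString())
                              .sorted()
                              .collect(Collectors.toList());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("backend-demo");

        // Default (local) FileSystem.
        System.out.println(writeAndList(tmp.resolve("local"), "a-Data.db", "x"));

        // A second FileSystem implementation, created through the same JDK API.
        URI zipUri = URI.create("jar:" + tmp.resolve("store.zip").toUri());
        try (FileSystem zipFs = FileSystems.newFileSystem(
                zipUri, Collections.singletonMap("create", "true"))) {
            System.out.println(writeAndList(zipFs.getPath("/remote"), "a-Data.db", "x"));
        }
    }
}
```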
>>>> >
>>>> > It's a little more work to get C* to create paths for alternate
>>>> backends where appropriate, but that work is probably necessary even with
>>>> `ChannelProxyFactory` and munging UNIX paths (vs supporting multiple
>>>> FileSystems). There will probably also be backend specific behaviors that
>>>> show up above the `ChannelProxy` layer that will depend on the backend.
>>>> >
>>>> > Ideally there would be some config to specify several backend
>>>> filesystems and their individual configuration that can be used, as well as
>>>> configuration and support for a "backend file router" for file creation
>>>> (and opening) that can be used to route files to the backend most
>>>> appropriate.
>>>> >
>>>> > Regards,
>>>> > Ariel
>>>> >
>>>> > On Mon, Sep 25, 2023, at 2:48 AM, Claude Warren, Jr via dev wrote:
>>>> >
>>>> > I have just filed CEP-36 [1] to allow for keyspace/table storage
>>>> outside of the standard storage space.
>>>> >
>>>> > There are two desires driving this change:
>>>> >
>>>> > 1. The ability to temporarily move some keyspaces/tables to storage
>>>> outside the normal directory tree, on another disk, so that compaction can
>>>> occur in situations where there is not enough disk space for compaction and
>>>> processing of the moved data cannot be suspended.
>>>> > 2. The ability to store infrequently used data on slower, cheaper storage
>>>> layers.
>>>> >
>>>> > I have a working POC implementation [2] though there are some issues
>>>> still to be solved and much logging to be reduced.
>>>> >
>>>> > I look forward to productive discussions,
>>>> > Claude
>>>> >
>>>> > [1]
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-36%3A+A+Configurable+ChannelProxy+to+alias+external+storage+locations
>>>> > [2] https://github.com/Claudenw/cassandra/tree/channel_proxy_factory
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>> --
>>>> http://twitter.com/tjake
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Henrik Ingo
>>>>
>>>> c. +358 40 569 7354
>>>>
>>>> w. www.datastax.com
>>>>
>>>> <https://www.facebook.com/datastax>  <https://twitter.com/datastax>
>>>> <https://www.linkedin.com/company/datastax/>
>>>> <https://github.com/datastax/>
>>>>
>>>>
>>>>
>>>> --
>>>> you are the apple of my eye !
>>>>
>>>
