Re: Proposing an Apache Cassandra Management process

Sankalp Kohli Thu, 16 Aug 2018 21:28:19 -0700

I am bumping this thread because a patch with repair functionality has 
landed for this.


I have the following proposal, which I can put in the JIRA ticket or the doc:

1. We should see if we can keep this in a separate repo like Dtest. 
2. It should have its own release process.
3. It should have integration tests for different versions of Cassandra it will 
support. 

Does anyone have any ideas on this?

Thanks
Sankalp 

> On Apr 18, 2018, at 19:20, Dinesh Joshi <[email protected]> 
> wrote:
> 
> Thank you all for the feedback. Nate made a Google doc with the proposal in 
> it, and it is linked from the ticket. I will try to flesh it out as I gather 
> your input.
> Dinesh 
> 
>    On Wednesday, April 18, 2018, 5:12:49 PM PDT, kurt greaves 
> <[email protected]> wrote:  
> 
> For anyone looking, Dinesh made a ticket already: CASSANDRA-14395
> <https://issues.apache.org/jira/browse/CASSANDRA-14395>
> 
>> On 18 April 2018 at 18:14, Vinay Chella <[email protected]> wrote:
>> 
>> This is a good initiative. We have advocated for and run a sidecar for the
>> past 5+ years, and we've learned and improved it a lot. We look forward to
>> moving features from Priam (such as backup, HTTP -> JMX, etc) incrementally
>> to this sidecar as they make sense.
>> 
>> 
>> Thanks,
>> Vinay Chella
>> 
>> On Fri, Apr 13, 2018 at 7:01 AM, Eric Evans <[email protected]>
>> wrote:
>> 
>>> On Thu, Apr 12, 2018 at 4:41 PM, Dinesh Joshi
>>> <[email protected]> wrote:
>>>> Hey all -
>>>> With the uptick in discussion around Cassandra operability and after
>>>> discussing potential solutions with various members of the community, we
>>>> would like to propose the addition of a management process/sub-project
>>>> into Apache Cassandra. The process would be responsible for common
>>>> operational tasks like bulk execution of nodetool commands,
>>>> backup/restore, and health checks, among others. We feel we have a
>>>> proposal that will garner some discussion and debate but is likely to
>>>> reach consensus.
>>>> While the community, in large part, agrees that these features should
>>>> exist “in the database”, there is debate on how they should be
>>>> implemented. Primarily, whether or not to use an external process or
>>>> build on CassandraDaemon. This is an important architectural decision but
>>>> we feel the most critical aspect is not where the code runs but that the
>>>> operator still interacts with the notion of a single database.
>>>> Multi-process databases are as old as Postgres and continue to be common
>>>> in newer systems like Druid. As such, we propose a separate management
>>>> process for the following reasons:
>>>> 
>>>>     - Resource isolation & Safety: Features in the management process
>>>>       will not affect C*'s read/write path which is critical for
>>>>       stability. An isolated process has several technical advantages
>>>>       including preventing use of unnecessary dependencies in
>>>>       CassandraDaemon, separation of JVM resources like thread pools and
>>>>       heap, and preventing bugs from adversely affecting the main
>>>>       process. In particular, GC tuning can be done separately for the
>>>>       two processes, hopefully helping to improve, or at least not
>>>>       adversely affect, tail latencies of the main process.
>>>> 
>>>>     - Health Checks & Recovery: Currently users implement health checks
>>>>       in their own sidecar process. Implementing them in the serving
>>>>       process does not make sense because if the JVM running the
>>>>       CassandraDaemon goes south, the healthchecks and potentially any
>>>>       recovery code may not be able to run. Having a management process
>>>>       running in isolation opens up the possibility to not only report
>>>>       the health of the C* process such as long GC pauses or stuck JVM
>>>>       but also to recover from it. Having a list of basic health checks
>>>>       that are tested with every C* release and officially supported will
>>>>       help boost confidence in C* quality and make it easier to operate.
>>>> 
>>>>     - Reduced Risk: By having a separate Daemon we open the possibility
>>>>       to contribute features that otherwise would not have been
>>>>       considered before, e.g. a UI. A library that started many
>>>>       background threads and is operated completely differently would
>>>>       likely be considered too risky for CassandraDaemon but is a good
>>>>       candidate for the management process.
>>> 
>>> Makes sense IMO.
>>> 
>>>> What can go into the management process?
>>>>     - Features that are non-essential for serving reads & writes, e.g.
>>>>       Backup/Restore or Running Health Checks against the
>>>>       CassandraDaemon, etc.
>>>> 
>>>>     - Features that do not make the management process critical for
>>>>       functioning of the serving process. In other words, if someone
>>>>       does not wish to use this management process, they are free to
>>>>       disable it.
>>>> 
>>>> We would like to initially build a minimal set of features such as
>>>> health checks and bulk commands into the first iteration of the
>>>> management process. We would use the same software stack that is used to
>>>> build the current CassandraDaemon binary. This would be critical for
>>>> sharing code between CassandraDaemon & management processes. The code
>>>> should live in-tree to make this easy.
>>>> With regards to more in-depth features like repair scheduling and
>>>> discussions around compaction in or out of CassandraDaemon, while the
>>>> management process may be a suitable host, it is not our goal to decide
>>>> that at this time. The management process could be used in these cases,
>>>> as they meet the criteria above, but other technical/architectural
>>>> reasons may exist for why it should not be.
>>>> We are looking forward to your comments on our proposal,
>>> 
>>> Sounds good to me.
>>> 
>>> Personally, I'm a little less interested in things like
>>> health/availability checks and metrics collection, because there are
>>> already tools to solve this problem (and most places will already be
>>> using them).  I'm more interested in things like cluster status,
>>> streaming, repair, etc.  Something to automate/centralize
>>> database-specific command and control, and improve visibility.
>>> 
>>> In-tree also makes sense (tools/ maybe?), but I would suggest working
>>> out of a branch initially, and seeking inclusion when there is
>>> something more concrete to discuss.
>>> 
>>> 
>>> --
>>> Eric Evans
>>> [email protected]
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>> 
>>> 
