Thank you all for the feedback. Nate made a Google doc with the proposal, and it is linked from the ticket. I will try to flesh it out as I gather your input.

Dinesh
On Wednesday, April 18, 2018, 5:12:49 PM PDT, kurt greaves <k...@instaclustr.com> wrote:

For anyone looking, Dinesh made a ticket already: CASSANDRA-14395 <https://issues.apache.org/jira/browse/CASSANDRA-14395>

On 18 April 2018 at 18:14, Vinay Chella <vinaykumar...@gmail.com> wrote:

> This is a good initiative. We have advocated for and run a sidecar for the past 5+ years, and we've learned and improved it a lot. We look forward to moving features from Priam (such as backup, HTTP -> JMX, etc) incrementally to this sidecar as they make sense.
>
> Thanks,
> Vinay Chella
>
> On Fri, Apr 13, 2018 at 7:01 AM, Eric Evans <john.eric.ev...@gmail.com> wrote:
>
> > On Thu, Apr 12, 2018 at 4:41 PM, Dinesh Joshi <dinesh.jo...@yahoo.com.invalid> wrote:
> > > Hey all -
> > > With the uptick in discussion around Cassandra operability and after discussing potential solutions with various members of the community, we would like to propose the addition of a management process/sub-project into Apache Cassandra. The process would be responsible for common operational tasks like bulk execution of nodetool commands, backup/restore, and health checks, among others. We feel we have a proposal that will garner some discussion and debate but is likely to reach consensus.
> > > While the community, in large part, agrees that these features should exist “in the database”, there is debate on how they should be implemented. Primarily, whether or not to use an external process or build on CassandraDaemon. This is an important architectural decision but we feel the most critical aspect is not where the code runs but that the operator still interacts with the notion of a single database. Multi-process databases are as old as Postgres and continue to be common in newer systems like Druid.
> > > As such, we propose a separate management process for the following reasons:
> > >
> > > - Resource isolation & Safety: Features in the management process will not affect C*'s read/write path, which is critical for stability. An isolated process has several technical advantages, including preventing use of unnecessary dependencies in CassandraDaemon, separation of JVM resources like thread pools and heap, and preventing bugs from adversely affecting the main process. In particular, GC tuning can be done separately for the two processes, hopefully helping to improve, or at least not adversely affect, tail latencies of the main process.
> > >
> > > - Health Checks & Recovery: Currently users implement health checks in their own sidecar process. Implementing them in the serving process does not make sense because if the JVM running the CassandraDaemon goes south, the health checks and potentially any recovery code may not be able to run. Having a management process running in isolation opens up the possibility to not only report the health of the C* process, such as long GC pauses or a stuck JVM, but also to recover from it. Having a list of basic health checks that are tested with every C* release and officially supported will help boost confidence in C* quality and make it easier to operate.
> > >
> > > - Reduced Risk: By having a separate Daemon we open the possibility to contribute features that otherwise would not have been considered before, e.g. a UI. A library that started many background threads and is operated completely differently would likely be considered too risky for CassandraDaemon but is a good candidate for the management process.
> >
> > Makes sense IMO.
> >
> > > What can go into the management process?
> > > - Features that are non-essential for serving reads & writes, e.g. Backup/Restore or running health checks against the CassandraDaemon.
> > >
> > > - Features that do not make the management process critical for functioning of the serving process. In other words, if someone does not wish to use this management process, they are free to disable it.
> > >
> > > We would like to initially build a minimal set of features such as health checks and bulk commands into the first iteration of the management process. We would use the same software stack that is used to build the current CassandraDaemon binary. This would be critical for sharing code between CassandraDaemon & management processes. The code should live in-tree to make this easy.
> > > With regard to more in-depth features like repair scheduling and discussions around compaction in or out of CassandraDaemon, while the management process may be a suitable host, it is not our goal to decide that at this time. The management process could be used in these cases, as they meet the criteria above, but other technical/architectural reasons may exist for why it should not be.
> > > We are looking forward to your comments on our proposal,
> >
> > Sounds good to me.
> >
> > Personally, I'm a little less interested in things like health/availability checks and metrics collection, because there are already tools to solve this problem (and most places will already be using them). I'm more interested in things like cluster status, streaming, repair, etc. Something to automate/centralize database-specific command and control, and improve visibility.
> >
> > In-tree also makes sense (tools/ maybe?), but I would suggest working out of a branch initially, and seeking inclusion when there is something more concrete to discuss.
> >
> > --
> > Eric Evans
> > john.eric.ev...@gmail.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org