My two cents as a (relatively small) user. I'm coming at this from the ops/user side, so my apologies if some of these don't make sense based on a more detailed understanding of the codebase:

Repair is definitely a major missing piece of Cassandra. An integrated scheduler would be easier, but a sidecar might be more flexible. As an intermediate step that works towards both options, does it make sense to start with finer-grained tracking and reporting for subrange repairs? That is, expose a set of interfaces (both internally and via JMX) that give a scheduler enough information to run subrange repairs across multiple keyspaces, or even non-overlapping ranges, at the same time (rough sketch at the bottom of this mail). That lets people experiment with and quickly/safely/easily iterate on different scheduling strategies in the short term, and long term those strategies can be integrated into a built-in scheduler.

On the subject of scheduling, I think adjusting parallelism/aggressiveness, with a possible whitelist or blacklist, would be a lot more useful than a "time between repairs". That is, if repairs run for a few hours and then don't run for a few (somewhat hard-to-predict) hours, I still have to size the cluster for the load while repairs are running. The only reason I can think of for an interval between repairs is to allow re-compaction from repair anticompactions, and subrange repairs seem to eliminate this. Even if they didn't, a more direct method along the lines of "don't repair when the compaction queue is too long" might make more sense (second sketch below). Blacklisted timeslots might be useful for avoiding peak time or batch jobs, but only if they can be specified for consistent time-of-day intervals instead of unpredictable lulls between repairs.

I really like the idea of automatically adjusting gc_grace_seconds based on repair state. The only_purge_repaired_tombstones option fixes this elegantly for sequential/incremental repairs on STCS, but not for subrange repairs or LCS (unless a scheduler somehow gains the ability to determine that every subrange in an sstable has been repaired and mark it accordingly?).

On 2018/04/03 17:48:14, Blake Eggleston <b...@apple.com> wrote:
> Hi dev@,
>
> The question of the best way to schedule repairs came up on CASSANDRA-14346, and I thought it would be good to bring up the idea of an external tool on the dev list.
>
> Cassandra lacks any sort of tools for automating routine tasks that are required for running clusters, specifically repair. Regular repair is a must for most clusters, like compaction. This means that, especially as far as eventual consistency is concerned, Cassandra isn’t totally functional out of the box. Operators either need to find a 3rd party solution or implement one themselves. Adding this to Cassandra would make it easier to use.
>
> Is this something we should be doing? If so, what should it look like?
>
> Personally, I feel like this is a pretty big gap in the project and would like to see an out of process tool offered. Ideally, Cassandra would just take care of itself, but writing a distributed repair scheduler that you trust to run in production is a lot harder than writing a single process management application that can failover.
>
> Any thoughts on this?
>
> Thanks,
>
> Blake
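
(The sketches referenced above. Both are purely illustrative; none of the names or signatures below exist in Cassandra today.)

For the finer-grained tracking idea, something like a read-only MBean along these lines is roughly what I mean. The point is how little a scheduler would actually need from the server side to coordinate subrange repairs:

    import java.util.List;

    // Hypothetical MBean, names made up for illustration only.
    public interface SubrangeRepairStateMBean
    {
        // Token ranges for a keyspace, split into scheduler-sized subranges,
        // with the time of the last successful repair of each subrange.
        // Returned as "start:end=epochMillis" strings to keep JMX types simple.
        List<String> getSubrangeRepairTimes(String keyspace, int splitCount);

        // Subrange repairs currently running on this node, so a scheduler can
        // run non-overlapping ranges (or other keyspaces) in parallel.
        List<String> getActiveSubrangeRepairs();

        // Coarse back-off signal, e.g. pending compactions (see next sketch).
        int getPendingCompactionTasks();
    }

An external tool could poll that over JMX and drive plain "nodetool repair -st <start> -et <end>" invocations, which already exist, so no new repair machinery would be needed on the server side in the short term.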
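
For the "don't repair when the compaction queue is too long" point, the scheduler-side check could be as simple as the following. The pending-compactions count is the same figure nodetool compactionstats already reports; everything else here is made up:

    // Hypothetical back-off check for an external scheduler.
    public final class RepairBackoff
    {
        private final int maxPendingCompactions;

        public RepairBackoff(int maxPendingCompactions)
        {
            this.maxPendingCompactions = maxPendingCompactions;
        }

        // Returns true if it's OK to start the next subrange repair on this node.
        public boolean canRepairNow(int pendingCompactions)
        {
            // Back off only while there's an actual compaction backlog,
            // instead of pausing for a fixed "time between repairs".
            return pendingCompactions < maxPendingCompactions;
        }
    }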
Repair is definitely a major missing piece of Cassandra. Integrated would be easier, but a sidecar might be more flexible. As an intermediate step that works towards both options, does it make sense to start with finer-grained tracking and reporting for subrange repairs? That is, expose a set of interfaces (both internally and via JMX) that give a scheduler enough information to run subrange repairs across multiple keyspaces or even non-overlapping ranges at the same time. That lets people experiment with and quickly/safely/easily iterate on different scheduling strategies in the short term, and long-term those strategies can be integrated into a built-in scheduler On the subject of scheduling, I think adjusting parallelism/aggression with a possible whitelist or blacklist would be a lot more useful than a "time between repairs". That is, if repairs run for a few hours then don't run for a few (somewhat hard-to-predict) hours, I still have to size the cluster for the load when the repairs are running. The only reason I can think of for an interval between repairs is to allow re-compaction from repair anticompactions, and subrange repairs seem to eliminate this. Even if they didn't, a more direct method along the lines of "don't repair when the compaction queue is too long" might make more sense. Blacklisted timeslots might be useful for avoiding peak time or batch jobs, but only if they can be specified for consistent time-of-day intervals instead of unpredictable lulls between repairs. I really like the idea of automatically adjusting gc_grace_seconds based on repair state. The only_purge_repaired_tombstones option fixes this elegantly for sequential/incremental repairs on STCS, but not for subrange repairs or LCS (unless a scheduler gains the ability somehow to determine that every subrange in an sstable has been repaired and mark it accordingly?) On 2018/04/03 17:48:14, Blake Eggleston <b...@apple.com> wrote: > Hi dev@,> > > > > > The question of the best way to schedule repairs came up on CASSANDRA-14346, and I thought it would be good to bring up the idea of an external tool on the dev list.> > > > > > Cassandra lacks any sort of tools for automating routine tasks that are required for running clusters, specifically repair. Regular repair is a must for most clusters, like compaction. This means that, especially as far as eventual consistency is concerned, Cassandra isn’t totally functional out of the box. Operators either need to find a 3rd party solution or implement one themselves. Adding this to Cassandra would make it easier to use.> > > > > > Is this something we should be doing? If so, what should it look like?> > > > > > Personally, I feel like this is a pretty big gap in the project and would like to see an out of process tool offered. Ideally, Cassandra would just take care of itself, but writing a distributed repair scheduler that you trust to run in production is a lot harder than writing a single process management application that can failover.> > > > > > Any thoughts on this?> > > > > > Thanks,> > > > > > Blake> > >