I'd like to discuss the inclusion of the above tickets for a 3.11.x release. These are not pure bug fixes, so I'll need a waiver to get them into 3.11.x (and implicitly, 4.0.x).

The first two are straightforward oversights: neither *nodetool garbagecollect* nor *nodetool scrub* currently accepts a *--user-defined* parameter list of SSTables in the same way that *nodetool compact* does. This is an operational problem for large tables. I often need to scrub just one file that is corrupted for some reason, and not scrub an entire 1TB+ of data for a table on a node. This renders 'nodetool scrub' operationally useless for large tables. For *garbagecollect* it is often operationally easy to identify which tables are likely to be full of bloat, and operationally useful to do this task in small increments.

The existing order in which garbagecollect processes SSTables prevents it from being useful in any incremental fashion -- if you stop it and later restart, it will first process the SSTables you just garbage collected. The third ticket adds an option for *nodetool garbagecollect*, *--oldest-fraction*, that can select a fraction of the oldest table data in bytes, and garbagecollect only the SSTables that 'cover' that percentage of data. Operationally, this lends itself to easy automation -- for example, running this once a week on 10% of a table's data would imply that there is no data on disk that was overwritten more than about 10 weeks ago. This caps data bloat in ways neither LCS nor STCS can currently achieve without regular major compactions or full-pass garbagecollect.

I have a large LCS table that has existed in steady state for about two years. Its oldest SSTable files were about 20 months old. These old SSTables were 95% bloated by that time -- 'garbagecollect' was able to shrink them to 5% of their original size. Being able to automate garbagecollect on a small fraction of the older data would be a big disk space and performance win, without the downsides of a major compaction.

The overall risk of these additions is low:
- They do not modify any existing behavior, only add new options.
- They reuse existing machinery for most of the work, and only add logic in areas that are already well tested. The areas that need the most scrutiny in review have good test coverage.
- Scripts that worked with nodetool before should continue to work, except for the case where a keyspace is named --user-defined or --oldest-fraction, but this flaw already exists with other nodetool commands.
- There is no modification to sensitive areas like the read, write, or autocompaction path. This merely does the same thing that is already done, just on a subset of SSTables rather than all of them.

Thanks for considering this proposal,
-Scott Carey

P.S. You might wonder why --oldest-fraction is necessary when one can use --user-defined and some OS-level scripting.
1. --oldest-fraction calculates the SSTable fraction based on the total data size, not file count.
2. nodetool can avoid race conditions with autocompaction on SSTable selection.
3. nodetool has access to the current state of active SSTables; a script just sees files on disk, including files that might be scheduled for deletion or that are actively being written to.
4. Even if used at a 100% fraction, it processes from oldest to newest by SSTable generation number, meaning that if it is interrupted halfway through and then restarted, it won't immediately work on the files that were just processed, as those will have the largest generation numbers.
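
To make the selection rule in points 1 and 4 concrete, here is a minimal Python sketch of the idea. It is only an illustration, not the code from the tickets: the SSTable record, its fields, and select_oldest_fraction are hypothetical names, and I am assuming that 'cover' means the SSTable that crosses the byte threshold is included in the selection.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    # Hypothetical stand-in for a live SSTable; not a Cassandra class.
    generation: int   # larger generation number = written more recently
    size_bytes: int   # on-disk size of the SSTable


def select_oldest_fraction(sstables: List[SSTable], fraction: float) -> List[SSTable]:
    """Return the oldest SSTables whose combined size covers `fraction` of all bytes."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # The target is measured in bytes of data, not in number of files.
    total = sum(t.size_bytes for t in sstables)
    target = fraction * total

    selected: List[SSTable] = []
    covered = 0
    # Walk from oldest to newest by generation number.
    for t in sorted(sstables, key=lambda t: t.generation):
        if covered >= target:
            break
        # Assumption: the SSTable that crosses the threshold is included.
        selected.append(t)
        covered += t.size_bytes
    return selected
```

For example, with four SSTables of generations 1 to 4 and sizes 400, 100, 300 and 200 bytes, a fraction of 0.5 selects generations 1 and 2 (500 of 1000 bytes). Because rewritten SSTables receive new, higher generation numbers, a later run starts with different files.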