> On May 9, 2025, at 12:59 PM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>
> I am a *big* fan of getting repair really working with MVs. It does seem
> problematic that the number of merkle trees will be equal to the number of
> ranges in the cluster and repair of MVs would become an all-node operation.
> How would down nodes be handled, and how many nodes would simultaneously be
> working to validate a given base table range at once? How many base table
> ranges could simultaneously be repairing MVs?
>
> If a row containing a column that creates an MV partition is deleted, and the
> MV isn't updated, then how does the merkle tree approach propagate the
> deletion to the MV? The CEP says that anti-compaction would remove extra
> rows, but I am not clear on how that works. When is anti-compaction performed
> in the repair process, and what is/isn't included in the outputs?
I thought about these two points last night after I sent my email. There are two things in this proposal that give me a lot of pause; I've put a couple of toy sketches at the end of this mail to make both concrete.

One is the lack of tombstones/deletions in the merkle trees, which makes properly dealing with writes/deletes/inconsistency very hard (afaict).

The second is the reality that repairing a single partition in the base table may repair all hosts/ranges in the MV table, and vice versa. Basically, scanning either the base or the MV is effectively scanning the whole cluster (modulo what you can avoid via the clean/dirty repaired sets). This makes me really, really concerned about how it scales and how likely it is to be schedulable automatically without blowing up.

The paxos vs accord comments so far are interesting in that I think both could be made to work, but I am very concerned about how the merkle tree comparisons are likely to behave with wide partitions, which lead to massive fanout in ranges.
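On the first point (and Ariel's deletion question above), here's a toy Java sketch of what worries me. It is NOT Cassandra code: MerkleDeleteDemo and leafHash are made up, and it assumes a merkle leaf is, in effect, a hash over the live rows of a range, which is how I read the CEP. The mismatch is detectable, but nothing in the hashes distinguishes "replica A is missing a row" from "replica B has an extra row", so naive reconciliation resurrects the deleted row:

    import java.util.*;

    // Toy sketch, NOT Cassandra code: MerkleDeleteDemo and leafHash are
    // made up. Assumes a merkle leaf is a hash over a range's live rows.
    public class MerkleDeleteDemo {

        // Stand-in for one merkle tree leaf: a hash of a replica's live rows.
        static int leafHash(SortedMap<String, String> liveRows) {
            return liveRows.hashCode();
        }

        public static void main(String[] args) {
            SortedMap<String, String> replicaA =
                    new TreeMap<>(Map.of("k1", "v1", "k2", "v2"));
            SortedMap<String, String> replicaB = new TreeMap<>(replicaA);

            // Replica A applies DELETE k2; replica B never hears about it.
            replicaA.remove("k2");

            // The leaves mismatch, but the hashes carry no tombstone, so
            // nothing says *why*: "A lost k2" and "B gained k2" look the
            // same. Naively streaming the diff from B to A resurrects k2.
            System.out.println(leafHash(replicaA) == leafHash(replicaB)); // false
        }
    }

This is exactly why I don't see how anti-compaction of "extra rows" recovers the deletion without something tombstone-like in the comparison itself.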
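On the second point, a similarly hand-wavy sketch of the fanout. Again NOT Cassandra code: MvFanoutDemo is made up and token() is a fake hash, not Murmur3Partitioner. The MV's partition key comes from a base-table column, so each row of a single base partition can hash to a different MV partition anywhere on the ring:

    import java.util.*;

    // Toy sketch, NOT Cassandra code: MvFanoutDemo and token() are
    // stand-ins (token() is a fake hash, not the real partitioner).
    public class MvFanoutDemo {

        // Any decent hash scatters partition keys across the token ring.
        static long token(String partitionKey) {
            return partitionKey.hashCode() * 2654435761L;
        }

        public static void main(String[] args) {
            // One base partition, where a column value (say, a date) is the
            // MV's partition key. Four rows -> up to four MV partitions.
            List<String> mvKeysFromOneBasePartition =
                    List.of("2025-05-01", "2025-05-02", "2025-05-03", "2025-05-04");

            Set<Long> mvTokens = new TreeSet<>();
            for (String mvKey : mvKeysFromOneBasePartition)
                mvTokens.add(token(mvKey));

            // Each distinct token can fall in a different MV replica range,
            // so validating this one base partition can touch ranges (and
            // nodes) anywhere on the ring; a wide partition touches
            // essentially all of them.
            System.out.println("distinct MV tokens: " + mvTokens);
        }
    }

Scale that up to a wide base partition with millions of rows and the merkle comparison for one base range is effectively a comparison against the entire MV keyspace.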