> The MV repair tool in Cassandra is intended to address inconsistencies that
> may occur in materialized views due to various factors. This component is the
> most complex and demanding part of the development effort, representing
> roughly 70% of the overall work.
> but I am very concerned about how the merkle tree comparisons are likely to
> work with wide partitions leading to massive fanout in ranges.

As far as I can tell, being based off Accord means you don't need to care
about repair, as Accord will manage the consistency for you; you can't get
out of sync. Being based off Accord also means you can deal with multiple
partitions/tokens, whereas LWT is limited to a single token.

I am not sure how the following would work with the proposed design and LWT:

CREATE TABLE tbl (pk int, ck int, v int, PRIMARY KEY (pk, ck));
CREATE MATERIALIZED VIEW tbl2 AS
  SELECT * FROM tbl WHERE ck > 42
  PRIMARY KEY (pk, ck);

-- mutations
UPDATE tbl SET v=42 WHERE pk IN (0, 1) AND ck IN (50, 74); -- this touches 2 partition keys

BEGIN BATCH -- also touches 2 partition keys
  INSERT INTO tbl (pk, ck, v) VALUES (0, 47, 0);
  INSERT INTO tbl (pk, ck, v) VALUES (1, 48, 0);
END BATCH;

> On May 9, 2025, at 1:03 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> On May 9, 2025, at 12:59 PM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>
>> I am a *big* fan of getting repair really working with MVs. It does seem
>> problematic that the number of merkle trees will be equal to the number of
>> ranges in the cluster and repair of MVs would become an all-node operation.
>> How would down nodes be handled, and how many nodes would simultaneously be
>> working to validate a given base table range at once? How many base table
>> ranges could simultaneously be repairing MVs?
>>
>> If a row containing a column that creates an MV partition is deleted, and
>> the MV isn't updated, then how does the merkle tree approach propagate the
>> deletion to the MV? The CEP says that anti-compaction would remove extra
>> rows, but I am not clear on how that works. When is anti-compaction
>> performed in the repair process and what is/isn't included in the outputs?
>
> I thought about these two points last night after I sent my email.
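To make the multi-token point concrete, here is a toy sketch (not from the CEP, and not Cassandra's actual Murmur3Partitioner; md5 and the 4-node ring below are stand-ins for illustration) of why `pk IN (0, 1)` maps to two independent tokens, and therefore potentially two different replica sets, which a single-token LWT cannot coordinate atomically:

```python
import hashlib

def token(pk: int) -> int:
    """Hash a partition key to a signed 64-bit token (md5 stand-in for Murmur3)."""
    digest = hashlib.md5(str(pk).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def replica(tok: int, ring: list[int]) -> int:
    """Index of the node owning the token: first node whose ring position is
    >= tok, wrapping around (simplified single-replica view)."""
    for i, boundary in enumerate(ring):
        if tok <= boundary:
            return i
    return 0  # wrap around to the first node

# Toy 4-node ring with evenly spaced token boundaries.
ring = [-2**62, 0, 2**62, 2**63 - 1]

# The UPDATE ... WHERE pk IN (0, 1) from the example touches two partitions,
# each with its own token and owning node:
for pk in (0, 1):
    t = token(pk)
    print(f"pk={pk} -> token={t} -> node {replica(t, ring)}")

# An LWT runs Paxos over a single token, so a statement whose partitions hash
# to different tokens cannot be covered by one LWT, while an Accord
# transaction can span both tokens.
```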
> There's 2 things in this proposal that give me a lot of pause.
>
> One is the lack of tombstones / deletions in the merkle trees, which makes
> properly dealing with writes/deletes/inconsistency very hard (afaict).
>
> The second is the reality that repairing a single partition in the base table
> may repair all hosts/ranges in the MV table, and vice versa. Basically,
> scanning either base or MV is effectively scanning the whole cluster (modulo
> what you can avoid in the clean/dirty repaired sets). This makes me really,
> really concerned with how it scales, and how likely it is to be able to
> schedule automatically without blowing up.
>
> The paxos vs accord comments so far are interesting in that I think both
> could be made to work, but I am very concerned about how the merkle tree
> comparisons are likely to work with wide partitions leading to massive fanout
> in ranges.