> The MV repair tool in Cassandra is intended to address inconsistencies that
> may occur in materialized views due to various factors. This component is the
> most complex and demanding part of the development effort, representing
> roughly 70% of the overall work.
> but I am very concerned about how the merkle tree comparisons are likely to
> work with wide partitions leading to massive fanout in ranges.

As far as I can tell, being based off Accord means you don't need to care
about repair, as Accord will manage the consistency for you; you can't get
out of sync. Being based off Accord also means you can deal with multiple
partitions/tokens, whereas LWT is limited to a single token.

I am not sure how the following would work with the proposed design and LWT:

CREATE TABLE tbl (pk int, ck int, v int, PRIMARY KEY (pk, ck));
CREATE MATERIALIZED VIEW tbl2 AS
  SELECT * FROM tbl WHERE ck > 42
  PRIMARY KEY (pk, ck);

-- mutations
UPDATE tbl SET v=42 WHERE pk IN (0, 1) AND ck IN (50, 74); -- this touches 2 partition keys

BEGIN BATCH -- also touches 2 partition keys
  INSERT INTO tbl (pk, ck, v) VALUES (0, 47, 0);
  INSERT INTO tbl (pk, ck, v) VALUES (1, 48, 0);
END BATCH;

> On May 9, 2025, at 1:03 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> On May 9, 2025, at 12:59 PM, Ariel Weisberg <ar...@weisberg.ws> wrote:
>>
>> I am a *big* fan of getting repair really working with MVs. It does seem
>> problematic that the number of merkle trees will be equal to the number of
>> ranges in the cluster and repair of MVs would become an all-node operation.
>> How would down nodes be handled, and how many nodes would simultaneously be
>> working to validate a given base table range at once? How many base table
>> ranges could simultaneously be repairing MVs?
>>
>> If a row containing a column that creates an MV partition is deleted, and
>> the MV isn't updated, then how does the merkle tree approach propagate the
>> deletion to the MV? The CEP says that anti-compaction would remove extra
>> rows, but I am not clear on how that works. When is anti-compaction
>> performed in the repair process and what is/isn't included in the outputs?
>
> I thought about these two points last night after I sent my email.
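To make the multi-token point concrete, here is a toy sketch (not from the CEP, and not Cassandra's actual Murmur3Partitioner; md5 and the 4-node ring below are stand-ins for illustration) of why `pk IN (0, 1)` maps to two independent tokens, and therefore potentially two different replica sets, which a single-token LWT cannot coordinate atomically:

```python
import hashlib

def token(pk: int) -> int:
    """Hash a partition key to a signed 64-bit token (md5 stand-in for Murmur3)."""
    digest = hashlib.md5(str(pk).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def replica(tok: int, ring: list[int]) -> int:
    """Index of the node owning the token: first node whose ring position is
    >= tok, wrapping around (simplified single-replica view)."""
    for i, boundary in enumerate(ring):
        if tok <= boundary:
            return i
    return 0  # wrap around to the first node

# Toy 4-node ring with evenly spaced token boundaries.
ring = [-2**62, 0, 2**62, 2**63 - 1]

# The UPDATE ... WHERE pk IN (0, 1) from the example touches two partitions,
# each with its own token and owning node:
for pk in (0, 1):
    t = token(pk)
    print(f"pk={pk} -> token={t} -> node {replica(t, ring)}")

# An LWT runs Paxos over a single token, so a statement whose partitions hash
# to different tokens cannot be covered by one LWT, while an Accord
# transaction can span both tokens.
```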
> There's 2 things in this proposal that give me a lot of pause.
>
> One is the lack of tombstones / deletions in the merkle trees, which makes
> properly dealing with writes/deletes/inconsistency very hard (afaict).
>
> The second is the reality that repairing a single partition in the base table
> may repair all hosts/ranges in the MV table, and vice versa. Basically,
> scanning either base or MV is effectively scanning the whole cluster (modulo
> what you can avoid in the clean/dirty repaired sets). This makes me really,
> really concerned with how it scales, and how likely it is to be able to
> schedule automatically without blowing up.
>
> The paxos vs accord comments so far are interesting in that I think both
> could be made to work, but I am very concerned about how the merkle tree
> comparisons are likely to work with wide partitions leading to massive fanout
> in ranges.