> Sorry, Blake—I was traveling last week and couldn’t reply to your email sooner.

No worries, I’ll be taking some time off soon as well.

> I don’t think the first or second option is ideal. We should treat the base table as the source of truth. Modifying it—or forcing an update on it, even if it’s just a timestamp change—is not a good approach and won’t solve all problems.

I agree the first option probably isn’t the right way to go. Could you say a bit more about why the second option is not a good approach and which problems it won’t solve?

On Sun, Jun 22, 2025, at 6:09 PM, Runtian Liu wrote:
> Sorry, Blake—I was traveling last week and couldn’t reply to your email sooner.
>
> > First - we interpret view data with higher timestamps than the base table as data that’s missing from the base and replicate it into the base table. The timestamp of the missing data may be below the paxos timestamp low bound so we’d have to adjust the paxos coordination logic to allow that in this case. Depending on how the view got this way it may also tear writes to the base table, breaking the write atomicity promise.
>
> As discussed earlier, we want this MV repair mechanism to handle all edge cases. However, it would be difficult to design it in a way that detects the root cause of each mismatch and repairs it accordingly. Additionally, as you mentioned, this approach could introduce other issues, such as violating the write atomicity guarantee.
>
> > Second - If this happens it means that we’ve either lost base table data or paxos metadata. If that happened, we could force a base table update that rewrites the current base state with new timestamps making the extra view data removable. However this wouldn’t fix the case where the view cell has a timestamp from the future - although that’s not a case that C* can fix today either.
>
> I don’t think the first or second option is ideal. We should treat the base table as the source of truth. Modifying it—or forcing an update on it, even if it’s just a timestamp change—is not a good approach and won’t solve all problems.
>
> > the idea to use anti-compaction makes a lot more sense now (in principle - I don’t think it’s workable in practice)
>
> I have one question regarding anti-compaction. Is the main concern that processing too much data during anti-compaction could cause issues for the cluster?
>
> > I guess you could add some sort of assassin cell that is meant to remove a cell with a specific timestamp and value, but is otherwise invisible.
>
> The idea of the assassination cell is interesting. To prevent data from being incorrectly removed during the repair process, we need to ensure a quorum of nodes is available and agrees on the same value before repairing a materialized view (MV) row or cell. However, this could be very expensive, as it requires coordination to repair even a single row.
>
> I think there are a few key differences between MV repair and normal anti-entropy repair:
>
> 1. For normal anti-entropy repair, there is no single source of truth. All replicas act as sources of truth, and eventual consistency is achieved by merging data across them. Even if some data is lost or bugs result in future timestamps, the replicas will eventually converge on the same (possibly incorrect) value, and that value becomes the accepted truth. In contrast, MV repair relies on the base table as the source of truth and modifies the MV if inconsistencies are detected. This means that in some cases, simply merging data won’t resolve the issue, since Cassandra resolves conflicts using timestamps and there’s no guarantee the base table will always "win" unless we change the MV merging logic—which I’ve been trying to avoid (see the sketch below this list).
>
> 2. I see MV repair as more of an on-demand operation, whereas normal anti-entropy repair needs to run regularly. This means we shouldn’t treat MV repair the same way as existing repairs. When an operator initiates MV repair, they need to ensure that sufficient resources are available to support it.
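> To make the merging problem in point 1 concrete, here is a minimal sketch of timestamp-based reconciliation. The types are hypothetical, not Cassandra’s actual Cell classes:
>
>     final class MergeSketch {
>         // Illustrative only: merging is purely timestamp-based, so nothing
>         // here marks the base table as canonical.
>         record Cell(Object value, long timestamp) {}
>
>         static Cell reconcile(Cell base, Cell view) {
>             // A view cell with a higher (e.g. future) timestamp survives
>             // the merge even though the base table disagrees.
>             return view.timestamp() > base.timestamp() ? view : base;
>         }
>     }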
>
> On Thu, Jun 12, 2025 at 8:53 AM Blake Eggleston <bl...@ultrablake.com> wrote:
>> Got it, thanks for clearing that up. I’d seen the strict liveness code around but didn’t realize it was MV related and hadn’t dug into what it did or how it worked.
>>
>> I think you’re right about the row liveness update working for extra data with timestamps lower than the most recent base table update.
>>
>> I see what you mean about the timestamp from the future case. I thought of 3 options:
>>
>> First - we interpret view data with higher timestamps than the base table as data that’s missing from the base and replicate it into the base table. The timestamp of the missing data may be below the paxos timestamp low bound so we’d have to adjust the paxos coordination logic to allow that in this case. Depending on how the view got this way it may also tear writes to the base table, breaking the write atomicity promise.
>>
>> Second - If this happens it means that we’ve either lost base table data or paxos metadata. If that happened, we could force a base table update that rewrites the current base state with new timestamps making the extra view data removable. However this wouldn’t fix the case where the view cell has a timestamp from the future - although that’s not a case that C* can fix today either.
>>
>> Third - we add a new tombstone type or some mechanism to delete specific cells that doesn’t preclude correct writes with lower timestamps from being visible. I’m not sure how this would work, and the idea to use anti-compaction makes a lot more sense now (in principle - I don’t think it’s workable in practice). I guess you could add some sort of assassin cell that is meant to remove a cell with a specific timestamp and value, but is otherwise invisible. This seems dangerous though, since it’s likely there’s a replication problem that may resolve itself and the repair process would actually be removing data that the user intended to write.
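>> Roughly the matching rule I have in mind for such an assassin cell - entirely hypothetical, nothing like this exists in the storage engine today:
>>
>>     import java.util.Objects;
>>
>>     final class AssassinSketch {
>>         record Cell(Object value, long timestamp) {}
>>
>>         // Suppresses only the cell matching both the exact timestamp and
>>         // value, and is otherwise invisible, so a correct write with a
>>         // lower timestamp would still be visible.
>>         record AssassinCell(long targetTimestamp, Object targetValue) {
>>             boolean suppresses(Cell c) {
>>                 return c.timestamp() == targetTimestamp
>>                     && Objects.equals(c.value(), targetValue);
>>             }
>>         }
>>     }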
>>
>> Paulo - I don’t think storage changes are off the table, but they do expand the scope and risk of the proposal, so we should be careful.
>>
>> On Wed, Jun 11, 2025, at 4:44 PM, Paulo Motta wrote:
>>> > I’m not sure if this is the only edge case—there may be other issues as well. I’m also unsure whether we should redesign the tombstone handling for MVs, since that would involve changes to the storage engine. To minimize impact there, the original proposal was to rebuild the affected ranges using anti-compaction, just to be safe.
>>>
>>> I haven't been following the discussion but I think one of the issues with the materialized view "strict liveness" fix[1] is that we avoided making invasive changes to the storage engine at the time, but this was considered by Zhao on [1]. I think we shouldn't be trying to avoid updates to the storage format as part of the MV implementation, if this is what it takes to make MVs V2 reliable.
>>>
>>> [1] - https://issues.apache.org/jira/browse/CASSANDRA-11500?focusedCommentId=16101603&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16101603
>>>
>>> On Wed, Jun 11, 2025 at 7:02 PM Runtian Liu <curly...@gmail.com> wrote:
>>>> The current design leverages strict liveness to shadow the old view row. When the view-indexed value changes from 'a' to 'b', no tombstone is written; instead, the old row is marked as expired by updating its liveness info with the timestamp of the change. If the column is later set back to 'a', the view row is re-inserted with a new, non-expired liveness info reflecting the latest timestamp.
>>>>
>>>> To delete an extra row in the materialized view (MV), we can likely use the same approach—marking it as shadowed by updating the liveness info. However, in the case of inconsistencies where a column in the MV has a higher timestamp than the corresponding column in the base table, this row-level liveness mechanism is insufficient.
>>>>
>>>> Even for the case where we delete the row by marking its liveness info as expired during repair, there are concerns. Since this introduces a data mutation as part of the repair process, it’s unclear whether there could be edge cases we’re missing. This approach may risk unexpected side effects if the repair logic is not carefully aligned with write path semantics.
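>>>> One way to picture why the row-level mechanism falls short for the higher-timestamp case - a simplified model, not the real LivenessInfo semantics:
>>>>
>>>>     final class LivenessSketch {
>>>>         record ShadowMarker(long timestamp) {}
>>>>
>>>>         // Simplified: a shadowing marker only suppresses cells it
>>>>         // postdates, so a view cell carrying a timestamp from the future
>>>>         // stays visible unless we write an even higher (i.e. also
>>>>         // future) timestamp, which we want to avoid.
>>>>         static boolean isShadowed(long cellTimestamp, ShadowMarker marker) {
>>>>             return marker.timestamp() >= cellTimestamp;
>>>>         }
>>>>     }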
>>>>
>>>> On Thu, Jun 12, 2025 at 3:59 AM Blake Eggleston <bl...@ultrablake.com> wrote:
>>>>> That’s a good point, although as described I don’t think that could ever work properly, even in normal operation. Either we’re misunderstanding something, or this is a flaw in the current MV design.
>>>>>
>>>>> Assuming changing the view indexed column results in a tombstone being applied to the view row for the previous value, if we wrote the other base columns (the non view indexed ones) to the view with the same timestamps they have on the base, then changing the view indexed value from ‘a’ to ‘b’, then back to ‘a’ would always cause this problem. I think you’d need to always update the column timestamps on the view to be >= the view column timestamp on the base.
>>>>>
>>>>> On Tue, Jun 10, 2025, at 11:38 PM, Runtian Liu wrote:
>>>>>> > In the case of a missed update, we'll have a new value and we can send a tombstone to the view with the timestamp of the most recent update.
>>>>>>
>>>>>> > then something has gone wrong and we should issue a tombstone using the paxos repair timestamp as the tombstone timestamp.
>>>>>>
>>>>>> The current MV implementation uses “strict liveness” to determine whether a row is live. I believe that using regular tombstones during repair could cause problems. For example, consider a base table with schema (pk, ck, v1, v2) and a materialized view with schema (v1, pk, ck) -> v2. If, for some reason, we detect an extra row in the MV and delete it using a tombstone with the latest update timestamp, we may run into issues. Suppose we later update the base table’s v1 field to match the MV row we previously deleted, and the v2 value now has an older timestamp. In that case, the previously issued tombstone could still shadow the v2 column, which is unintended. That is why I was asking if we are going to introduce a new kind of tombstone. I’m not sure if this is the only edge case—there may be other issues as well. I’m also unsure whether we should redesign the tombstone handling for MVs, since that would involve changes to the storage engine. To minimize impact there, the original proposal was to rebuild the affected ranges using anti-compaction, just to be safe.
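>>>>>> A concrete replay of that sequence, with made-up timestamps:
>>>>>>
>>>>>>     final class ShadowReplay {
>>>>>>         // Illustrative timestamps only.
>>>>>>         static final long T10 = 10; // base INSERT (pk, ck, v1='a', v2='x') -> view row ('a', pk, ck), v2 written @10
>>>>>>         static final long T20 = 20; // repair deletes the "extra" view row with a tombstone @20
>>>>>>         static final long T30 = 30; // base UPDATE sets v1='a' again; v2 is untouched and still carries @10
>>>>>>
>>>>>>         // Last-write-wins: the repair tombstone @20 still shadows v2@10
>>>>>>         // on the re-inserted view row, silently dropping a legitimate value.
>>>>>>         static final boolean V2_SHADOWED = T20 >= T10; // true
>>>>>>     }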
>>>>>>
>>>>>> On Wed, Jun 11, 2025 at 1:20 AM Blake Eggleston <bl...@ultrablake.com> wrote:
>>>>>>>> Extra row in MV (assuming the tombstone is gone in the base table) — how should we fix this?
>>>>>>>
>>>>>>> This would mean that the base table had either updated or deleted a row and the view didn't receive the corresponding delete.
>>>>>>>
>>>>>>> In the case of a missed update, we'll have a new value and we can send a tombstone to the view with the timestamp of the most recent update. Since timestamps issued by paxos and accord writes are monotonically increasing and don't have collisions, this is safe.
>>>>>>>
>>>>>>> In the case of a row deletion, we'd also want to send a tombstone with the same timestamp, however since tombstones can be purged, we may not have that information and would have to treat it like the view has a higher timestamp than the base table.
>>>>>>>
>>>>>>>> Inconsistency (timestamps don’t match) — it’s easy to fix when the base table has higher timestamps, but how do we resolve it when the MV columns have higher timestamps?
>>>>>>>
>>>>>>> There are 2 ways this could happen. First is that a write failed and paxos repair hasn't completed it, which is expected, and the second is a replication bug or base table data loss. You'd need to compare the view timestamp to the paxos repair history to tell which it is. If the view timestamp is higher than the most recent paxos repair timestamp for the key, then it may just be a failed write and we should do nothing. If the view timestamp is less than the most recent paxos repair timestamp for that key and higher than the base timestamp, then something has gone wrong and we should issue a tombstone using the paxos repair timestamp as the tombstone timestamp. This is safe to do because the paxos repair timestamps act as a low bound for ballots paxos will process, so it wouldn't be possible for a legitimate write to be shadowed by this tombstone.
>>>>>>>
>>>>>>>> Do we need to introduce a new kind of tombstone to shadow the rows in the MV for cases 2 and 3? If yes, how will this tombstone work? If no, how should we fix the MV data?
>>>>>>>
>>>>>>> No, a normal tombstone would work.
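>>>>>>> That decision rule, spelled out as a sketch (hypothetical names, not an actual API):
>>>>>>>
>>>>>>>     import java.util.OptionalLong;
>>>>>>>
>>>>>>>     final class ViewTimestampRepair {
>>>>>>>         // Returns the tombstone timestamp to issue, or empty to do nothing.
>>>>>>>         static OptionalLong resolve(long viewTs, long baseTs, long lastPaxosRepairTs) {
>>>>>>>             if (viewTs > lastPaxosRepairTs)
>>>>>>>                 return OptionalLong.empty(); // may just be a failed write: do nothing
>>>>>>>             if (viewTs > baseTs)
>>>>>>>                 // Paxos repair timestamps are a low bound for future ballots,
>>>>>>>                 // so no legitimate write can be shadowed by this tombstone.
>>>>>>>                 return OptionalLong.of(lastPaxosRepairTs);
>>>>>>>             return OptionalLong.empty(); // base timestamp wins normally
>>>>>>>         }
>>>>>>>     }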
>>>>>>>
>>>>>>> On Tue, Jun 10, 2025, at 2:42 AM, Runtian Liu wrote:
>>>>>>>> Okay, let’s put the efficiency discussion on hold for now. I want to make sure the actual repair process after detecting inconsistencies will work with the index-based solution.
>>>>>>>>
>>>>>>>> When a mismatch is detected, the MV replica will need to stream its index file to the base table replica. The base table will then perform a comparison between the two files.
>>>>>>>>
>>>>>>>> There are three cases we need to handle:
>>>>>>>>
>>>>>>>> 1. Missing row in MV — this is straightforward; we can propagate the data to the MV.
>>>>>>>>
>>>>>>>> 2. Extra row in MV (assuming the tombstone is gone in the base table) — how should we fix this?
>>>>>>>>
>>>>>>>> 3. Inconsistency (timestamps don’t match) — it’s easy to fix when the base table has higher timestamps, but how do we resolve it when the MV columns have higher timestamps?
>>>>>>>>
>>>>>>>> Do we need to introduce a new kind of tombstone to shadow the rows in the MV for cases 2 and 3? If yes, how will this tombstone work? If no, how should we fix the MV data?
>>>>>>>>
>>>>>>>> On Mon, Jun 9, 2025 at 11:00 AM Blake Eggleston <bl...@ultrablake.com> wrote:
>>>>>>>>> > hopefully we can come up with a solution that everyone agrees on.
>>>>>>>>>
>>>>>>>>> I’m sure we can, I think we’ve been making good progress.
>>>>>>>>>
>>>>>>>>> > My main concern with the index-based solution is the overhead it adds to the hot path, as well as having to build indexes periodically.
>>>>>>>>>
>>>>>>>>> So the additional overhead of maintaining a storage attached index on the client write path is pretty minimal - it’s basically adding data to an in memory trie. It’s a little extra work and memory usage, but there isn’t any extra io or other blocking associated with it. I’d expect the latency impact to be negligible. (See the sketch at the end of this message.)
>>>>>>>>>
>>>>>>>>> > As mentioned earlier, this MV repair should be an infrequent operation
>>>>>>>>>
>>>>>>>>> I don’t think that’s a safe assumption. There are a lot of situations outside of data loss bugs where repair would need to be run.
>>>>>>>>>
>>>>>>>>> These use cases could probably be handled by repairing the view with other view replicas:
>>>>>>>>>
>>>>>>>>> - Scrubbing corrupt sstables
>>>>>>>>> - Node replacement via backup
>>>>>>>>>
>>>>>>>>> These use cases would need an actual MV repair to check consistency with the base table:
>>>>>>>>>
>>>>>>>>> - Restoring a cluster from a backup
>>>>>>>>> - Imported sstables via nodetool import
>>>>>>>>> - Data loss from operator error
>>>>>>>>> - Proactive consistency checks - ie preview repairs
>>>>>>>>>
>>>>>>>>> Even if it is an infrequent operation, when operators need it, it needs to be available and reliable.
>>>>>>>>>
>>>>>>>>> It’s a fact that there are clusters where non-incremental repairs are run on a cadence of a week or more to manage the overhead of validation compactions. Assuming the cluster doesn’t have any additional headroom, that would mean that any one of the above events could cause views to remain out of sync for up to a week while the full set of merkle trees is being built.
>>>>>>>>>
>>>>>>>>> This delay eliminates a lot of the value of repair as a risk mitigation tool. If I had to make a recommendation where a bad call could cost me my job, the prospect of a 7 day delay on repair would mean a strong no.
>>>>>>>>>
>>>>>>>>> Some users also run preview repair continuously to detect data consistency errors, so at least a subset of users will probably be running MV repairs continuously - at least in preview mode.
>>>>>>>>>
>>>>>>>>> That’s why I say that the replication path should be designed to never need repair, and MV repair should be designed to be prepared for the worst.
>>>>>>>>>
>>>>>>>>> > I’m wondering if it’s possible to enable or disable index building dynamically so that we don’t always incur the cost for something that’s rarely needed.
>>>>>>>>>
>>>>>>>>> I think this would be a really reasonable compromise as long as the default is on. That way it’s as safe as possible by default, but users who don’t care or have a separate system for repairing MVs can opt out.
>>>>>>>>>
>>>>>>>>> > I’m not sure what you mean by “data problems” here.
>>>>>>>>>
>>>>>>>>> I mean out of sync views - either due to bugs, operator error, corruption, etc.
>>>>>>>>>
>>>>>>>>> > Also, this does scale with cluster size—I’ve compared it to full repair, and this MV repair should behave similarly. That means as long as full repair works, this repair should work as well.
>>>>>>>>>
>>>>>>>>> You could build the merkle trees at about the same cost as a full repair, but the actual data repair path is completely different for MVs, and that’s the part that doesn’t scale well. As you know, with normal repair, we just stream data for ranges detected as out of sync. For MVs, since the data isn’t in base partition order, the view data for an out of sync view range needs to be read out and streamed to every base replica that it’s detected a mismatch against. So in the example I gave with the 300 node cluster, you’re looking at reading and transmitting the same partition at least 100 times in the best case, and the cost of this keeps going up as the cluster increases in size. That's the part that doesn't scale well.
>>>>>>>>>
>>>>>>>>> This is also one of the benefits of the index design. Since it stores data in segments that roughly correspond to points on the grid, you’re not rereading the same data over and over. A repair for a given grid point only reads an amount of data proportional to the data in common for the base/view grid point, and it’s stored in a small enough granularity that the base can calculate what data needs to be sent to the view without having to read the entire view partition.
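>>>>>>>>> A rough sketch of what “adding data to an in memory trie” amounts to on the write path - illustrative only, using a skip-list map as a stand-in for the actual SAI trie:
>>>>>>>>>
>>>>>>>>>     import java.util.concurrent.ConcurrentSkipListMap;
>>>>>>>>>
>>>>>>>>>     // Hypothetical: a per-memtable index keyed by (base token, view
>>>>>>>>>     // token), updated in memory on each write and flushed with the
>>>>>>>>>     // sstable later - no extra IO or blocking on the hot path.
>>>>>>>>>     final class GridIndexSketch {
>>>>>>>>>         private final ConcurrentSkipListMap<Long, ConcurrentSkipListMap<Long, Long>> grid = new ConcurrentSkipListMap<>();
>>>>>>>>>
>>>>>>>>>         void onWrite(long baseToken, long viewToken, long timestamp) {
>>>>>>>>>             grid.computeIfAbsent(baseToken, t -> new ConcurrentSkipListMap<>())
>>>>>>>>>                 .merge(viewToken, timestamp, Math::max); // keep the newest timestamp
>>>>>>>>>         }
>>>>>>>>>     }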
>>>>>>>>>
>>>>>>>>> On Sat, Jun 7, 2025, at 7:42 PM, Runtian Liu wrote:
>>>>>>>>>> Thanks, Blake. I’m open to iterating on the design, and hopefully we can come up with a solution that everyone agrees on.
>>>>>>>>>>
>>>>>>>>>> My main concern with the index-based solution is the overhead it adds to the hot path, as well as having to build indexes periodically. As mentioned earlier, this MV repair should be an infrequent operation, but the index-based approach shifts some of the work to the hot path in order to allow repairs that touch only a few nodes.
>>>>>>>>>>
>>>>>>>>>> I’m wondering if it’s possible to enable or disable index building dynamically so that we don’t always incur the cost for something that’s rarely needed.
>>>>>>>>>>
>>>>>>>>>> > it degrades operators’ ability to react to data problems by imposing a significant upfront processing burden on repair, and that it doesn’t scale well with cluster size
>>>>>>>>>>
>>>>>>>>>> I’m not sure what you mean by “data problems” here. Also, this does scale with cluster size—I’ve compared it to full repair, and this MV repair should behave similarly. That means as long as full repair works, this repair should work as well.
>>>>>>>>>>
>>>>>>>>>> For example, regardless of how large the cluster is, you can always enable Merkle tree building on 10% of the nodes at a time until all the trees are ready.
>>>>>>>>>>
>>>>>>>>>> I understand that coordinating this type of repair is harder than what we currently support, but with CEP-37, we should be able to handle this coordination without adding too much burden on the operator side.
>>>>>>>>>>
>>>>>>>>>> On Sat, Jun 7, 2025 at 8:28 AM Blake Eggleston <bl...@ultrablake.com> wrote:
>>>>>>>>>>>> I don't see any outcome here that is good for the community though. Either Runtian caves and adopts your design that he (and I) consider inferior, or he is prevented from contributing this work.
>>>>>>>>>>>
>>>>>>>>>>> Hey Runtian, fwiw, these aren't the only 2 options. This isn’t a competition. We can collaborate and figure out the best approach to the problem. I’d like to keep discussing it if you’re open to iterating on the design.
>>>>>>>>>>>
>>>>>>>>>>> I’m not married to our proposal, it’s just the cleanest way we could think of to address what Jon and I both see as blockers in the current proposal. It’s not set in stone though.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 6, 2025, at 1:32 PM, Benedict Elliott Smith wrote:
>>>>>>>>>>>> Hmm, I am very surprised as I helped write that and I distinctly recall a specific goal was avoiding binding vetoes as they're so toxic.
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I guess you can block this work if you like.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't see any outcome here that is good for the community though. Either Runtian caves and adopts your design that he (and I) consider inferior, or he is prevented from contributing this work. That isn't a functioning community in my mind, so I'll be noping out for a while, as I don't see much value here right now.
>>>>>>>>>>>>
>>>>>>>>>>>> On 2025/06/06 18:31:08 Blake Eggleston wrote:
>>>>>>>>>>>> > Hi Benedict, that’s actually not true.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Here’s a link to the project governance page: https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Project+Governance
>>>>>>>>>>>> >
>>>>>>>>>>>> > The CEP section says:
>>>>>>>>>>>> >
>>>>>>>>>>>> > “*Once the proposal is finalized and any major committer dissent reconciled, call a [VOTE] on the ML to have the proposal adopted. The criteria for acceptance is consensus (3 binding +1 votes and no binding vetoes). The vote should remain open for 72 hours.*”
>>>>>>>>>>>> >
>>>>>>>>>>>> > So they’re definitely vetoable.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Also note the part about “*Once the proposal is finalized and any major committer dissent reconciled,*” being a prerequisite for moving a CEP to [VOTE]. Given the as yet unreconciled committer dissent, it wouldn’t even be appropriate to move to a VOTE until we get to the bottom of this repair discussion.
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Fri, Jun 6, 2025, at 12:31 AM, Benedict Elliott Smith wrote:
>>>>>>>>>>>> > > > but the snapshot repair design is not a viable path forward. It’s the first iteration of a repair design. We’ve proposed a second iteration, and we’re open to a third iteration.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > I shan't be participating further in discussion, but I want to make a point of order. The CEP process has no vetoes, so you are not empowered to declare that a design is not viable without the input of the wider community.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > On 2025/06/05 03:58:59 Blake Eggleston wrote:
>>>>>>>>>>>> > > > You can detect and fix the mismatch in a single round of repair, but the amount of work needed to do it is _significantly_ higher with snapshot repair. Consider a case where we have a 300 node cluster w/ RF 3, where each view partition contains entries mapping to every token range in the cluster - so 100 ranges. If we lose a view sstable, it will affect an entire row/column of the grid. Repair is going to scan all data in the mismatching view token ranges 100 times, and each base range once. So you’re looking at 200 range scans.
>>>>>>>>>>>> > > >
>>>>>>>>>>>> > > > Now, you may argue that you can merge the duplicate view scans into a single scan while you repair all token ranges in parallel. I’m skeptical that’s going to be achievable in practice, but even if it is, we’re now talking about the view replica hypothetically doing a pairwise repair with every other replica in the cluster at the same time. Neither of these options is workable.
>>>>>>>>>>>> > > >
>>>>>>>>>>>> > > > Let’s take a step back though, because I think we’re getting lost in the weeds.
>>>>>>>>>>>> > > >
>>>>>>>>>>>> > > > The repair design in the CEP has some high level concepts that make a lot of sense, the idea of repairing a grid is really smart. However, it has some significant drawbacks that remain unaddressed. I want this CEP to succeed, and I know Jon does too, but the snapshot repair design is not a viable path forward. It’s the first iteration of a repair design. We’ve proposed a second iteration, and we’re open to a third iteration. This part of the CEP process is meant to identify and address shortcomings, I don’t think that continuing to dissect the snapshot repair design is making progress in that direction.
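>>>>>>>>>>>> > > > The arithmetic behind those numbers, spelled out (back-of-envelope only):
>>>>>>>>>>>> > > >
>>>>>>>>>>>> > > >     int nodes = 300, rf = 3;
>>>>>>>>>>>> > > >     int ranges = nodes / rf;                // 100 distinct token ranges
>>>>>>>>>>>> > > >     int viewScans = ranges;                 // mismatching view ranges scanned once per base range
>>>>>>>>>>>> > > >     int baseScans = ranges;                 // each base range scanned once
>>>>>>>>>>>> > > >     int totalScans = viewScans + baseScans; // 200 range scans for one lost sstable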
>>>>>>>>>>>> > > >
>>>>>>>>>>>> > > > On Wed, Jun 4, 2025, at 2:04 PM, Runtian Liu wrote:
>>>>>>>>>>>> > > > > > We potentially have to do it several times on each node, depending on the size of the range. Smaller ranges increase the size of the board exponentially, larger ranges increase the number of SSTables that would be involved in each compaction.
>>>>>>>>>>>> > > > >
>>>>>>>>>>>> > > > > As described in the CEP example, this can be handled in a single round of repair. We first identify all the points in the grid that require repair, then perform anti-compaction and stream data based on a second scan over those identified points. This applies to the snapshot-based solution—without an index, repairing a single point in that grid requires scanning the entire base table partition (token range). In contrast, with the index-based solution—as in the example you referenced—if a large block of data is corrupted, even though the index is used for comparison, many key mismatches may occur. This can lead to random disk access to the original data files, which could cause performance issues. For the case you mentioned for the snapshot-based solution, it should not take months to repair all the data; instead, one round of repair should be enough. The actual repair phase is split from the detection phase.
>>>>>>>>>>>> > > > >
>>>>>>>>>>>> > > > > On Thu, Jun 5, 2025 at 12:12 AM Jon Haddad <j...@rustyrazorblade.com> wrote:
>>>>>>>>>>>> > > > >> > This isn’t really the whole story. The amount of wasted scans on index repairs is negligible. If a difference is detected with snapshot repairs though, you have to read the entire partition from both the view and base table to calculate what needs to be fixed.
>>>>>>>>>>>> > > > >>
>>>>>>>>>>>> > > > >> You nailed it.
>>>>>>>>>>>> > > > >>
>>>>>>>>>>>> > > > >> When the base table is converted to a view, and sent to the view, the information we have is that one of the view's partition keys needs a repair. That's going to be different from the partition key of the base table. As a result, on the base table, for each affected range, we'd have to issue another compaction across the entire set of sstables that could have the data the view needs (potentially many GB), in order to build the corrected version of the partition, then send it over to the view. Without an index in place, we have to do yet another scan, per affected range.
>>>>>>>>>>>> > > > >>
>>>>>>>>>>>> > > > >> Consider the case of a single corrupted SSTable on the view that's removed from the filesystem, or the data is simply missing after being restored from an inconsistent backup. It presumably contains lots of partitions, which map to base partitions all over the cluster, in a lot of different token ranges. For every one of those ranges (hundreds, to tens of thousands of them given the checkerboard design), when finding the missing data in the base, you'll have to perform a compaction across all the SSTables that potentially contain the missing data just to rebuild the view-oriented partitions that need to be sent to the view. The complexity of this operation can be looked at as O(N*M), where N and M are the number of ranges in the base table and the view affected by the corruption, respectively. Without an index in place, finding the missing data is very expensive. We potentially have to do it several times on each node, depending on the size of the range. Smaller ranges increase the size of the board exponentially, larger ranges increase the number of SSTables that would be involved in each compaction.
>>>>>>>>>>>> > > > >>
>>>>>>>>>>>> > > > >> Then you send that data over to the view, the view does its anti-compaction thing, again, once per affected range. So now the view has to do an anti-compaction once per block on the board that's affected by the missing data.
>>>>>>>>>>>> > > > >>
>>>>>>>>>>>> > > > >> Doing hundreds or thousands of these will add up pretty quickly.
>>>>>>>>>>>> > > > >>
>>>>>>>>>>>> > > > >> When I said that a repair could take months, this is what I had in mind.
>>>>>>>>>>>> > > > >>
>>>>>>>>>>>> > > > >> On Tue, Jun 3, 2025 at 11:10 AM Blake Eggleston <bl...@ultrablake.com> wrote:
>>>>>>>>>>>> > > > >>> > Adds overhead in the hot path due to maintaining indexes. Extra memory needed during write path and compaction.
>>>>>>>>>>>> > > > >>>
>>>>>>>>>>>> > > > >>> I’d make the same argument about the overhead of maintaining the index that Jon just made about the disk space required. The relatively predictable overhead of maintaining the index as part of the write and compaction paths is a pro, not a con. Although you’re not always paying the cost of building a merkle tree with snapshot repair, it can impact the hot path and you do have to plan for it.
>>>>>>>>>>>> > > > >>>
>>>>>>>>>>>> > > > >>> > Verifies index content, not actual data—may miss low-probability errors like bit flips
>>>>>>>>>>>> > > > >>>
>>>>>>>>>>>> > > > >>> Presumably this could be handled by the views performing repair against each other? You could also periodically rebuild the index or perform checksums against the sstable content.
>>>>>>>>>>>> > > > >>>
>>>>>>>>>>>> > > > >>> > Extra data scan during inconsistency detection
>>>>>>>>>>>> > > > >>> > Index: Since the data covered by certain indexes is not guaranteed to be fully contained within a single node as the topology changes, some data scans may be wasted.
>>>>>>>>>>>> > > > >>> > Snapshots: No extra data scan
>>>>>>>>>>>> > > > >>>
>>>>>>>>>>>> > > > >>> This isn’t really the whole story. The amount of wasted scans on index repairs is negligible. If a difference is detected with snapshot repairs though, you have to read the entire partition from both the view and base table to calculate what needs to be fixed.
>>>>>>>>>>>> > > > >>>
>>>>>>>>>>>> > > > >>> On Tue, Jun 3, 2025, at 10:27 AM, Jon Haddad wrote:
>>>>>>>>>>>> > > > >>>> One practical aspect that isn't immediately obvious is the disk space consideration for snapshots.
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>> When you have a table with a mixed workload using LCS or UCS with scaling parameters like L10 and initiate a repair, the disk usage will increase as long as the snapshot persists and the table continues to receive writes. This aspect is understood and factored into the design.
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>> However, a more nuanced point is the necessity to maintain sufficient disk headroom specifically for running repairs. This echoes the challenge with STCS compaction, where enough space must be available to accommodate the largest SSTables, even when they are not being actively compacted.
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>> For example, if a repair involves rewriting 100GB of SSTable data, you'll consistently need to reserve 100GB of free space to facilitate this.
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>> Therefore, while the snapshot-based approach leads to variable disk space utilization, operators must provision storage as if the maximum potential space will be used at all times to ensure repairs can be executed.
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>> This introduces a rate-of-churn dynamic, where the write throughput dictates the required extra disk space, rather than the existing on-disk data volume.
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>> If 50% of your SSTables are rewritten during a snapshot, you would need 50% free disk space. Depending on the workload, the snapshot method could consume significantly more disk space than an index-based approach. Conversely, for relatively static workloads, the index method might require more space. It's not as straightforward as stating "No extra disk space needed".
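>>>>>>>>>>>> > > > >>>> To put rough numbers on it (illustrative only):
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>>     // Snapshot headroom tracks churn, not data size.
>>>>>>>>>>>> > > > >>>>     double dataGb = 200.0;
>>>>>>>>>>>> > > > >>>>     double churn = 0.5;                 // fraction of sstables rewritten while the snapshot is held
>>>>>>>>>>>> > > > >>>>     double headroomGb = dataGb * churn; // 100GB must stay free for repair to run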
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>> Jon
>>>>>>>>>>>> > > > >>>>
>>>>>>>>>>>> > > > >>>> On Mon, Jun 2, 2025 at 2:49 PM Runtian Liu <curly...@gmail.com> wrote:
>>>>>>>>>>>> > > > >>>>> > Regarding your comparison between approaches, I think you also need to take into account the other dimensions that have been brought up in this thread. Things like minimum repair times and vulnerability to outages and topology changes are the first that come to mind.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> Sure, I added a few more points. For each perspective: Index-Based Solution vs. Snapshot-Based Solution.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> 1. Hot path overhead
>>>>>>>>>>>> > > > >>>>>    - Index: Adds overhead in the hot path due to maintaining indexes. Extra memory needed during write path and compaction.
>>>>>>>>>>>> > > > >>>>>    - Snapshot: No impact on the hot path.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> 2. Extra disk usage when repair is not running
>>>>>>>>>>>> > > > >>>>>    - Index: Requires additional disk space to store persistent indexes.
>>>>>>>>>>>> > > > >>>>>    - Snapshot: No extra disk space needed.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> 3. Extra disk usage during repair
>>>>>>>>>>>> > > > >>>>>    - Index: Minimal or no additional disk usage.
>>>>>>>>>>>> > > > >>>>>    - Snapshot: Requires additional disk space for snapshots.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> 4. Fine-grained repair to deal with emergency situations / topology changes
>>>>>>>>>>>> > > > >>>>>    - Index: Supports fine-grained repairs by targeting specific index ranges. This allows repair to be retried on smaller data sets, enabling incremental progress when repairing the entire table. This is especially helpful when there are down nodes or topology changes during repair, which are common in day-to-day operations.
>>>>>>>>>>>> > > > >>>>>    - Snapshot: Coordination across all nodes is required over a long period of time. For each round of repair, if all replica nodes are down or if there is a topology change, the data ranges that were not covered will need to be repaired in the next round.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> 5. Validating data used in reads directly
>>>>>>>>>>>> > > > >>>>>    - Index: Verifies index content, not actual data—may miss low-probability errors like bit flips.
>>>>>>>>>>>> > > > >>>>>    - Snapshot: Verifies actual data content, providing stronger correctness guarantees.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> 6. Extra data scan during inconsistency detection
>>>>>>>>>>>> > > > >>>>>    - Index: Since the data covered by certain indexes is not guaranteed to be fully contained within a single node as the topology changes, some data scans may be wasted.
>>>>>>>>>>>> > > > >>>>>    - Snapshot: No extra data scan.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> 7. The overhead of actual data repair after an inconsistency is detected
>>>>>>>>>>>> > > > >>>>>    - Index: Only indexes are streamed to the base table node, and the actual data being fixed can be as accurate as the row level.
>>>>>>>>>>>> > > > >>>>>    - Snapshot: Anti-compaction is needed on the MV table, and potential over-streaming may occur due to the lack of row-level insight into data quality.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> > one of my biggest concerns I haven't seen discussed much is LOCAL_SERIAL/SERIAL on read
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> Paxos v2 introduces an optimization where serial reads can be completed in just one round trip, reducing latency compared to traditional Paxos, which may require multiple phases.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> > I think a refresh would be low-cost and give users the flexibility to run them however they want.
>>>>>>>>>>>> > > > >>>>>
>>>>>>>>>>>> > > > >>>>> I think this is an interesting idea. Does it suggest that the MV should be rebuilt on a regular schedule? It sounds like an extension of the snapshot-based approach—rather than detecting mismatches, we would periodically reconstruct a clean version of the MV based on the snapshot. This seems to diverge from the current MV model in Cassandra, where consistency between the MV and base table must be maintained continuously. This could be an extension of the CEP-48 work, where the MV is periodically rebuilt from a snapshot of the base table, assuming the user can tolerate some level of staleness in the MV data.