Thanks for the information, Yifan and James! Given that, we can scope this email discussion to this specific MV repair only. Two points:
1. Can this MV repair job provide some added value?
2. If yes, does it even make sense to merge this MV repair tooling, which uses Spark as its underlying technology, into the Cassandra Analytics project?
Oh, I just noticed that James already mentioned it.
On Fri, Dec 6, 2024 at 3:51 PM Yifan Cai wrote:
> I would like to highlight existing tooling for "many things beyond the
> MV work, such as counting rows, etc."
>
> The Apache Cassandra Analytics project (
> http://github.com/apache/cassandra-analytics/) could be a great resource
> for this type of task.
I would like to highlight existing tooling for "many things beyond the
MV work, such as counting rows, etc."
The Apache Cassandra Analytics project
(http://github.com/apache/cassandra-analytics/) could be a great resource
for this type of task. It reads directly from the SSTables in the Spark
executors.
There are two approaches I have been thinking about for MV.
*1. Short Term (Status Quo)*
Here, we do not improve the Cassandra MV architecture to drastically reduce
data inconsistencies; thus, we continue to mark MV as an experimental
feature.
In this case, we can have two sub-options.
It feels uncomfortable asking users to rely on a third party that's as
heavyweight as Spark to use a built-in feature.
Can we really not do this internally? I get that the obvious way with Merkle
trees is hard because of the range fan-out of the MV partitioning on a
different key, but have we tried
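To make the fan-out concrete, here is a toy sketch (an illustrative hash stands in for the real partitioner): rows sitting in one contiguous base-table token range scatter across the view's token space, because the view partitions on a different column.

// Toy illustration only: hashCode stands in for the partitioner.
case class BaseRow(id: Int, category: String)

object FanOut extends App {
  // The base table partitions on `id`; the view partitions on `category`.
  val rows = Seq(BaseRow(1, "x"), BaseRow(2, "q"), BaseRow(3, "m"))

  val baseTokens = rows.map(r => r.id.toString.hashCode).sorted
  val viewTokens = rows.map(r => r.category.hashCode).sorted

  // Even when baseTokens form one narrow range, viewTokens are arbitrary
  // points, so a Merkle tree built per base range has no single view range
  // to compare against.
  println(s"base tokens: $baseTokens")
  println(s"view tokens: $viewTokens")
}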
I think this would be useful and - having never really used Materialized
Views - I didn't know it was an issue for some users. I would say the
Cassandra Analytics library (http://github.com/apache/cassandra-analytics/)
could be utilized for much of this, with a specialized Spark job for this
purpose.
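A rough sketch of what such a specialized job might do, assuming both the base table and the view can be loaded as DataFrames (e.g. via the bulk reader); the column names pk, ck, v are hypothetical:

import org.apache.spark.sql.{DataFrame, functions => F}

// Flag rows that exist on only one side. Schemas are hypothetical:
// base(pk, ck, v) and a view keyed on (v, pk, ck).
def diffBaseAndView(base: DataFrame, view: DataFrame): DataFrame = {
  val b = base.select("v", "pk", "ck").withColumn("in_base", F.lit(true))
  val m = view.select("v", "pk", "ck").withColumn("in_view", F.lit(true))

  b.join(m, Seq("v", "pk", "ck"), "full_outer")
    .withColumn("status",
      F.when(F.col("in_base").isNull, "orphaned_in_view")
       .when(F.col("in_view").isNull, "missing_from_view")
       .otherwise("in_sync"))
    .filter(F.col("status") =!= "in_sync")
}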
Hi,
*NOTE:* This email does not promote using Cassandra's Materialized Views
(MV) but assists those stuck with them for various reasons.
The primary issue with MV is that once it goes out of sync with the base
table, no tooling is available to remediate it. This Spark job aims to fill
this gap by locating inconsistencies between the base table and its MV and
repairing them.
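At a high level, once mismatched rows are identified, one plausible remediation step is to re-upsert the affected base rows through the regular write path so the server-side MV machinery regenerates the view entries. A minimal sketch using the DataStax spark-cassandra-connector (the connector choice and the write-back approach are assumptions here, not a confirmed design):

import org.apache.spark.sql.DataFrame

// Re-upsert base rows whose view entries were found missing, letting the
// coordinator's MV write path rebuild them. Uses the DataStax
// spark-cassandra-connector; keyspace/table names are illustrative.
def rewriteBaseRows(missing: DataFrame, keyspace: String, table: String): Unit = {
  missing.write
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> keyspace, "table" -> table))
    .mode("append")
    .save()
}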