Stability of MaterializedView in 3.11.x | 4.0
Hello,

I have a concern about Materialized Views (MVs) in Cassandra. Unfortunately, starting with version 3.11, MVs are officially considered experimental and not ready for production use, as you can read here:

http://mail-archives.apache.org/mod_mbox/cassandra-user/201710.mbox/%3cetpan.59f24f38.438f4e99.7...@apple.com%3E

Could someone please give some constructive feedback on this? It would help us decide on further implementation around MVs in Cassandra.

Is anyone facing critical issues such as data loss or synchronization problems?

Regards,
Pankaj
Re: Stability of MaterializedView in 3.11.x | 4.0
Hi Michael,

Thanks for the very useful information: "Users of MVs *must* determine for themselves, through thorough testing and understanding, if they wish to use them."

My conclusion from this is that if an issue occurs in the future, the only solution is to rebuild the MVs, since Cassandra is not able to keep them consistently in sync.

Also, we are using 10+ MVs in practice and have not faced any issue so far. So my question to all community members: has anyone faced any critical issues? Do we need to start migrating from MVs to manually maintained query tables?

Also, I understand now that MVs are experimental and not ready for production, so the advice is simply to avoid them if possible, right?

Thanks,
Pankaj

On 27/08/19, 19:03, "Michael Shuler" wrote:

It appears that you found the first message of the chain. I suggest reading the linked JIRA and the complete dev@ thread that arrived at this conclusion; there are loads of well-formed opinions and information.

Users of MVs *must* determine for themselves, through thorough testing and understanding, if they wish to use them.

Linkage:
https://issues.apache.org/jira/browse/CASSANDRA-13959

(sub-linkage..)
https://issues.apache.org/jira/browse/CASSANDRA-13595
https://issues.apache.org/jira/browse/CASSANDRA-13911
https://issues.apache.org/jira/browse/CASSANDRA-13880
https://issues.apache.org/jira/browse/CASSANDRA-12872
https://issues.apache.org/jira/browse/CASSANDRA-13747

Very much worth reading the complete thread:
part1: https://lists.apache.org/thread.html/d81a61da48e1b872d7599df4edfa8e244d34cbd591a18539f724796f@
part2: https://lists.apache.org/thread.html/19b7fcfd3b47f1526d6e993b3bb97f6c43e5ce204bc976ec0701cdd3@

Quick JQL for open tickets with "mv":
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CASSANDRA%20AND%20text%20~%20mv%20AND%20status%20!%3D%20Resolved

--
Michael
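For readers who want to see what the "Quick JQL" link above actually searches for, here is a minimal Python sketch that decodes the query string using only the standard library. The helper name `decode_jql` is made up for illustration; the URL is the one quoted in the message.

```python
from urllib.parse import parse_qs, urlsplit

# The search URL exactly as quoted in the message above.
JIRA_SEARCH_URL = (
    "https://issues.apache.org/jira/issues/"
    "?jql=project%20%3D%20CASSANDRA%20AND%20text%20~%20mv"
    "%20AND%20status%20!%3D%20Resolved"
)


def decode_jql(url: str) -> str:
    """Return the percent-decoded JQL expression carried in the `jql` parameter.

    `decode_jql` is a hypothetical helper name, not part of any JIRA client.
    """
    query = parse_qs(urlsplit(url).query)
    return query["jql"][0]


if __name__ == "__main__":
    print(decode_jql(JIRA_SEARCH_URL))
```

Decoded, the filter reads `project = CASSANDRA AND text ~ mv AND status != Resolved`, so it is a plain text match on "mv" and may return tickets that are not about materialized views.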
Re: Stability of MaterializedView in 3.11.x | 4.0
Understood. What about Cassandra running on a single node? We don't have a cluster setup (i.e. 3+ nodes).

Do MVs perform well on a single-node machine?

Note: I know about HA, so let's set it aside for now; it is only possible with a cluster setup.

On 29/08/19, 06:21, "Dor Laor" wrote:

On Wed, Aug 28, 2019 at 5:43 PM Jon Haddad wrote:

> > Arguably, the other alternative to server-side denormalization is to do the denormalization client-side which comes with the same axes of costs and complexity, just with more of each.
>
> That's not completely true. You can write to any number of tables without doing a read, and the cost of reading data off disk is significantly greater than an insert alone. You can crush a cluster with a write-heavy workload and MVs that would otherwise be completely fine to do all writes.
>
> The other issue with MVs is that you still need to understand fundamentals of data modeling; they don't magically solve the problem of enormous partitions. One of the reasons I've had to un-MV a lot of clusters is because people have put an MV on a table with a low-cardinality field and found themselves with a 10GB partition nightmare, so they need to go back and remodel the view as something more complex anyways. In this case, the MV was extremely high cost since now they've not only pushed out a poor implementation to begin with but now have the cost of a migration as well as a rewrite.

+1

Moreover, the hard part is that an update for the base table means that the original data needs to be read and the database (or the poor developer who implements the denormalized model) needs to delete the data in the view and then to write the new ones. All need to be of course resilient to all types of errors and failures. Had it been simple, there would be no need for a database MV.

> On Wed, Aug 28, 2019 at 9:58 AM Joshua McKenzie wrote:
>
> > > so we need to start migration from MVs to manual query base table?
> >
> > Arguably, the other alternative to server-side denormalization is to do the denormalization client-side which comes with the same axes of costs and complexity, just with more of each.
> >
> > Jeff's spot on when he discusses the risk appetite vs. mitigation aspect of it. There's a reason banks do end-of-day close-out validation analysis and have redundant systems for things like this.
> >
> > On Wed, Aug 28, 2019 at 11:49 AM Jon Haddad wrote:
> >
> > > I've helped a lot of teams (a dozen to two dozen maybe) migrate away from MVs due to inconsistencies, issues with streaming (have you added or removed nodes yet?), and massive performance issues to the point of cluster failure under (what I consider) trivial load. I haven't gone too deep into analyzing their issues, folks are usually fine with "move off them", vs having me do a ton of analysis.
> > >
> > > tlp-stress has a materialized view workload built in, and you can add arbitrary CQL via the --cql flag to add a MV to any existing workload such as KeyValue or BasicTimeSeries.
> > >
> > > On Wed, Aug 28, 2019 at 8:11 AM Jeff Jirsa wrote:
> > >
> > > > There have been people who have had operational issues related to MVs (many of them around running repair), but the biggest concern is correctness.
> > > >
> > > > It probably ultimately depends on what type of database you're running. If you're running some sort of IoT / analytics workload and you just want another way to SELECT the data, but you won't notice one of a billion records going missing, using MVs may be fine. If you're a bank, and one of a billion records going missing means you lose someone's bank account, I would avoid using MVs.
> > > >
> > > > It's all just risk management.
> > > >
> > > > On Wed, Aug 28, 2019 at 7:18 AM Pankaj Gajjar <
> > > > pankaj.gaj...
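Jon Haddad's point in this thread about low-cardinality view keys can be checked before a view is built. The following is a rough Python sketch, not a Cassandra tool: it counts how many rows would share each partition for a candidate key column. The function name `partition_sizes` and the sample data are assumptions made up for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def partition_sizes(rows: Iterable[Sequence[Hashable]], key_index: int) -> Counter:
    """Count how many rows would land in each partition for the chosen key column."""
    return Counter(row[key_index] for row in rows)


# Hypothetical data: 1000 users, every tenth one marked "inactive".
rows = [
    (user_id, "inactive" if user_id % 10 == 0 else "active")
    for user_id in range(1000)
]

by_user_id = partition_sizes(rows, key_index=0)  # high-cardinality key
by_status = partition_sizes(rows, key_index=1)   # low-cardinality key

if __name__ == "__main__":
    print("user_id:", len(by_user_id), "partitions, largest =", max(by_user_id.values()))
    print("status: ", len(by_status), "partitions, largest =", max(by_status.values()))
```

With this sample data, keying by `user_id` gives 1000 partitions of one row each, while keying by `status` gives only two partitions, the larger holding 900 of the 1000 rows. The same skew on real data is what produces the oversized partitions described above.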
Re: Stability of MaterializedView in 3.11.x | 4.0
Hi Team,

Thanks, but that is not the point. The question remains: is there any plan to fix these MV issues in an upcoming Cassandra release, such as 4.0? If yes, it would be worth waiting.

Or is there any plugin or workaround to resolve this issue on a Cassandra setup?

--
Regards,
Pankaj G.

On 31/08/19, 00:33, "Jon Haddad" wrote:

If you don't have any intent on running across multiple nodes, Cassandra is probably the wrong DB for you. Postgres will give you a better feature set for a single node.
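On the workaround question, the alternative raised in this thread is client-side denormalization, and Dor Laor's point was that an update then needs the old base row to be read so the stale view row can be deleted. Below is a toy in-memory Python model of that read-before-write step. It is only a sketch of the idea, not how Cassandra implements MVs, and every name in it (`BaseWithView`, `blind_write`, `maintained_write`) is hypothetical.

```python
from typing import Dict, Set, Tuple


class BaseWithView:
    """Toy model: a base table keyed by user_id and a 'view' keyed by (email, user_id)."""

    def __init__(self) -> None:
        self.base: Dict[int, str] = {}          # user_id -> email
        self.view: Set[Tuple[str, int]] = set()  # (email, user_id)
        self.reads = 0                           # number of base-table reads performed

    def blind_write(self, user_id: int, email: str) -> None:
        """Write both tables without reading first; cheap, but leaves stale view rows."""
        self.base[user_id] = email
        self.view.add((email, user_id))

    def maintained_write(self, user_id: int, email: str) -> None:
        """Read the old base row first so the stale view row can be removed."""
        old_email = self.base.get(user_id)
        self.reads += 1
        if old_email is not None and old_email != email:
            self.view.discard((old_email, user_id))
        self.base[user_id] = email
        self.view.add((email, user_id))


if __name__ == "__main__":
    blind = BaseWithView()
    blind.blind_write(1, "a@example.com")
    blind.blind_write(1, "b@example.com")
    print(sorted(blind.view))  # the old email is still present in the view

    careful = BaseWithView()
    careful.maintained_write(1, "a@example.com")
    careful.maintained_write(1, "b@example.com")
    print(sorted(careful.view), careful.reads)
```

The blind path matches Jon Haddad's remark that insert-only workloads can write to several tables without a read; as soon as rows are updated, the extra read (and the failure handling around it, which this sketch ignores) is the cost that either the database or the application has to pay.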
Virtual tables in Cassandra 4.0
Hello All,

I was exploring virtual tables, and there is not much material available on the internet or even on the Apache Cassandra site. I found one link, a slightly old one:

https://thelastpickle.com/blog/2019/03/08/virtual-tables-in-cassandra-4_0.html

Have you explored them? Which use cases do they fit well?

Regards,
Pankaj G.
Materialized view creation taking too much time with 12 million data set
Hi everyone,

We recently ran into serious trouble creating a materialized view (MV) on a data set of 12 million: we waited almost 72 hours. With the same data model and roughly half the data (7 million), it took only 10-15 minutes to build the same MV.

Any input on this is welcome.

--
Regards,
Pankaj G.
ContentSphere
Re: Login page with multiple types of users - Data Modelling
Hi Sandeep,

When modelling data for a use case like this in Cassandra, first think about how you will access the data. What is the access pattern? Based on that, you can design the tables, whether that means a single table or separate tables, each with a suitable partition key.

On Wed, Oct 25, 2017 at 1:10 PM, sandeep gajjam wrote:

> Hi All,
>
> I want to create a login page with multiple types of users (Admin, Normal Users, Analytics team, up-loaders, annotators). I can add a usertype and divert them to the respective pages, but I have tables where I need to store multiple kinds of users (for example up-loaders and annotators). In that case, would creating a separate table for each kind of user be the better approach?
>
> Regards,
> Sandeep G

--
Regards,
Pankaj G.
ContentSphere
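To make the "start from the access pattern" advice concrete, here is a small Python sketch under one stated assumption: the only query at login is "fetch one user by username". In that case a single table partitioned by username, with the user type as an ordinary column, is enough to route the user. The class, field, and page names below are all made up for illustration and are not taken from the original question.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class User:
    username: str       # assumed partition key: the value used to look a user up at login
    password_hash: str  # placeholder; real code would store a salted hash
    user_type: str      # e.g. "admin", "uploader", "annotator"


# Hypothetical mapping from user type to landing page.
LANDING_PAGES: Dict[str, str] = {
    "admin": "/admin",
    "analytics": "/analytics",
    "uploader": "/upload",
    "annotator": "/annotate",
}


class UsersByUsername:
    """In-memory stand-in for one table whose partition key is the username."""

    def __init__(self) -> None:
        self._rows: Dict[str, User] = {}

    def insert(self, user: User) -> None:
        self._rows[user.username] = user

    def login_lookup(self, username: str) -> Optional[User]:
        """Single-partition read: the only query the login page needs."""
        return self._rows.get(username)


def landing_page(user: User) -> str:
    """Route by user type; unknown types fall back to a default page."""
    return LANDING_PAGES.get(user.user_type, "/home")
```

If other access patterns exist, such as "list all annotators", they would call for an additional table keyed for that query rather than a change to this one.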