While many users may apparently be using MVs successfully, the problem is that few (if any) know what guarantees they are getting. Since we aren't absolutely certain ourselves, it cannot be many. Most of the shortcomings we are aware of are complicated, concern failure scenarios, and aren't fully explained; i.e. if you're lucky they'll never be a problem, but some users must surely be bitten, and they won't have had fair warning. The same goes for as-yet undiscovered edge cases.
It is my humble opinion that averting problems like this for just a handful of users who cannot readily tolerate corruption offsets any inconvenience we might cause to those who can.

For the record, while it's true that detecting inconsistencies is as much of a problem for user-rolled solutions, it's worth remembering that the inconsistencies themselves are not equally likely: in cases where C* is not the database of record, it is quite easy to provide very good consistency guarantees when rolling your own. Conversely, a global CAS with synchronous QUORUM updates that are retried until success, while much slower, also doesn't easily suffer these consistency problems, and is the naive approach a user might take if C* were the database of record.

Given our approach isn't uniformly superior, I think we should be very cautious about how it is made available until we're very confident in it, and we and the community fully understand it.

> On 3 Oct 2017, at 18:51, kurt greaves <k...@instaclustr.com> wrote:
>
> Lots of users are already using MVs, believe it or not in some cases quite effectively, and also on older versions which were still exposed to a lot of the bugs that cause inconsistencies. 3.11.1 has come a long way since then, and I think with a bit more documentation around the current issues, marking MVs as experimental is unnecessary and likely annoying for current users. On that note, we've already had complaints about changing defaults and behaviours willy-nilly across majors and minors; I can't see this helping our cause. Sure, you can make it "seamless" from an upgrade perspective, but that doesn't account for every single way operators do things. I'm sure someone will express surprise when they run up a new cluster or datacenter for testing with default config and find out that they have to enable MVs. Meanwhile they've been using them the whole time and haven't had any major issues because they didn't touch the edge cases.
>
> I'd like to point out that introducing "experimental" features sets a precedent for future releases, and will likely result in using the "experimental" tag to push out features that are not ready (again). In fact, we already routinely say >=3 isn't production ready yet, so why don't we just mark 3+ as "experimental" as well? I don't think experimental is the right approach for a database. The better solution, as I said, is more verification and testing during the release process (by users!). A lot of other projects take this approach, and it certainly makes sense. It could also be coupled with beta releases, so people can start getting verification of their new features at an earlier date. Granted, this is similar to experimental features, but applied to the whole release rather than just individual features.
>
> > * There's no way to determine if a view is out of sync with the base table.
>
> As already pointed out by Jake, this is still true when you don't use MVs. We should document this. I think it's entirely fair to say that users *should not* expect this to be done for them. There is also no way for a user to determine they have inconsistencies short of their own verification. Also, a lot of the synchronisation problems have been resolved; undoubtedly there are more unknowns out there, but what MVs have is still better than managing your own.
>
> > * If you do determine that a view is out of sync, the only way to fix it is to drop and rebuild the view.
>
> This is undoubtedly a problem, but also no worse than managing your own views. Also, at least there is still a way to fix your view. It certainly shouldn't be as common in 3.11.1/3.0.15, and we have enough insight now to be able to tell when out-of-sync will actually occur, so we can document those cases.
>
> > * There are liveness issues with updates being reflected in the view.
>
> What specific issues are you referring to here?
> The only one I'm aware of is deletion of unselected columns in the view affecting out-of-order updates. If we deem this a major problem, we can document it, or at least put a restriction in place until it's fixed in CASSANDRA-13826 <https://issues.apache.org/jira/browse/CASSANDRA-13826>.
>
> > In this case, 'out of sync' means 'you lost data', since the current design + repair should keep things eventually consistent, right?
>
> I'd like Zhao or Paulo to confirm here, but I believe the only way you can really "lose data" (that can't be repaired) here would be partition deletions on massively wide rows in the view that will not fit in the batchlog (256MB / max value size) as it currently stands. Frankly, this is probably an anti-pattern for MVs at the moment anyway, and one we should advise against.
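For concreteness, the "synchronous QUORUM updates that are retried until success" pattern I mentioned could be sketched roughly as below. This is only an illustration: `update_base_and_view` and its `execute` parameter are hypothetical names, `execute` stands in for whatever driver call issues a single statement at QUORUM and raises on failure, and the global-CAS (LWT) conditional step is elided.

```python
import time

def update_base_and_view(execute, base_stmt, view_stmt,
                         initial_backoff=0.1, max_backoff=10.0):
    """Apply a base-table write and the matching denormalised 'view'
    write synchronously, retrying each until it succeeds.

    `execute` is a hypothetical stand-in for a driver session call
    that issues one statement at QUORUM and raises on failure
    (e.g. a write timeout); it is not a real Cassandra API.
    """
    for stmt in (base_stmt, view_stmt):
        backoff = initial_backoff
        while True:
            try:
                execute(stmt)   # QUORUM write; raises if not acknowledged
                break           # durable at QUORUM, move to the next write
            except Exception:
                time.sleep(backoff)                    # back off and retry
                backoff = min(backoff * 2, max_backoff)
```

The point is only that each write blocks until it is acknowledged at QUORUM before the next is attempted, which is much slower than MVs but doesn't easily produce the divergence modes discussed above.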