Re: Big Data Question

2023-08-17 Thread daemeon reiydelle
I started to respond, then realized that I and the other posters are not thinking about the same thing: what is the business case for availability and data loss/reload/recoverability? You all argue for higher availability and damn the cost. But no one asked "can you lose access, for 20 minutes, to a portion of the da

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
I was assuming Reaper did incremental repairs?  That was probably a bad assumption. nodetool repair -pr, I know it well now! :) -Joe On 8/17/2023 4:47 PM, Bowen Song via user wrote: I don't have experience with Cassandra on Kubernetes, so I can't comment on that. For repairs, may I interest you in i

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
I don't have experience with Cassandra on Kubernetes, so I can't comment on that. For repairs, may I interest you in incremental repairs? They will make repairs a hell of a lot faster. Of course, an occasional full repair is still needed, but that's another story. On 17/08/2023 21:36, Joe Obernb

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
Thank you.  Enjoying this conversation. Agreed on blade servers, where each blade has a small number of SSDs.  Yea/nay to a Kubernetes approach, assuming fast persistent storage?  I think that might be easier to manage. In my current benchmarks, the performance is excellent, but the repairs are

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
From my experience, that's not entirely true. For large nodes, the bottleneck is usually the JVM garbage collector. The GC pauses can easily get out of control on very large heaps, and long STW pauses may also result in nodes flipping up and down from other nodes' perspective, which often rende

Re: Big Data Question

2023-08-17 Thread daemeon reiydelle
A lot of (actually all) of these seem to be based on local nodes with 1Gb networks of spinning rust. Much of what is mentioned below is TOTALLY wrong for cloud. So clarify whether you are "real world" or rusty slow data center world (definitely not a modern DC either). E.g. should not handle more than 2TB of

Re: Materialized View inconsistency issue

2023-08-17 Thread Miklosovic, Stefan
Why can't you do it like this? You would have two tables: create table visits (user_id bigint, visitor_id bigint, visit_date timestamp, primary key ((user_id, visitor_id), visit_date)) order by visit_date desc create table visitors_by_user_id (user_id bigint, visitor_id bigint, primary key ((us
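(For readability, here is a sketch of the two-table layout the snippet above appears to describe. The column names come from the truncated preview; the clustering order syntax and the second table's key are my best guess at the intent, since a bare "order by" is not valid inside a CREATE TABLE statement and the preview cuts off mid-key.)

    -- visits for a given (user_id, visitor_id) pair, newest first
    create table visits (
        user_id bigint,
        visitor_id bigint,
        visit_date timestamp,
        primary key ((user_id, visitor_id), visit_date)
    ) with clustering order by (visit_date desc);

    -- one row per distinct visitor of a user (partitioning assumed)
    create table visitors_by_user_id (
        user_id bigint,
        visitor_id bigint,
        primary key ((user_id), visitor_id)
    );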

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
The optimal node size largely depends on the table schema and read/write pattern. In some cases 500 GB per node is too large, but in other cases 10 TB per node works totally fine. It's hard to estimate that without benchmarking. Again, just pointing out the obvious, you did not count the o

RE: Big Data Question

2023-08-17 Thread Durity, Sean R via user
For a variety of reasons, we have clusters with 5 TB of disk per host as a “standard.” In our larger data clusters, it does take longer to add/remove nodes or do things like upgradesstables after an upgrade. These nodes have 3+ TB of actual data on the drive. But we were able to shrink the node

Re: Big Data Question

2023-08-17 Thread C. Scott Andreas
A few thoughts on this: – 80 TB per machine is pretty dense. Consider the amount of data you'd need to re-replicate in the event of a hardware failure that takes down all 80 TB (DIMM failure requiring replacement, non-redundant PSU failure, NIC, etc.). – 24 GB of heap is also pretty generous. Dependi
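(To put a rough, illustrative number on the re-replication concern: assuming a replacement node can sustain about 10 Gbps of streaming throughput, roughly 1.25 GB/s, re-streaming 80 TB takes 80,000 GB / 1.25 GB/s = 64,000 seconds, close to 18 hours of pure transfer time, before counting compaction and validation. The 10 Gbps figure is my own assumption, not from the thread.)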

Re: Big Data Question

2023-08-17 Thread Joe Obernberger
Thanks for this - yeah - duh - forgot about replication in my example! So - is 2 TBytes per Cassandra instance advisable?  Better to use more/less?  Modern 2U servers can be had with 24 x 3.8 TByte SSDs; so assuming 80 TBytes per server, you could do: (1024*3)/80 = 39 servers, but you'd have to run 40

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
Just pointing out the obvious: for 1 PB of data on nodes with a 2 TB disk each, you will need far more than 500 nodes. 1, it is unwise to run Cassandra with replication factor 1. It usually makes sense to use RF=3, so 1 PB of data will cost 3 PB of storage space, a minimum of 1500 such nodes. 2, depend
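(Back-of-the-envelope restatement of the point above, using 1 PB = 1000 TB: 1 PB x RF 3 = 3 PB of storage, and 3 PB / 2 TB per node = 1500 nodes, before leaving any disk headroom for compaction, snapshots or repairs.)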

Re: 2 nodes marked as '?N' in 5 node cluster

2023-08-17 Thread Bowen Song via user
The first thing to look at is the logs, specifically the /var/log/cassandra/system.log file on each node. A 5 second time drift is enough to cause Cassandra to fail. You should ensure the time difference between Cassandra nodes is very low by ensuring time sync is working correctly, otherwise cross