Re: Challenge with initial data load with TWCS

2019-09-28 Thread Jeff Jirsa
We used to do either: (1) CQLSSTableWriter and explicitly break between windows (then nodetool refresh or sstableloader to push them into the system), or (2) use the normal write path for a single window at a time, explicitly calling flush between windows. You can’t have current data writing whi
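
The second approach above (one window at a time through the normal write path, with a flush between windows) can be sketched roughly as follows. This is only an illustration, not the method used in the thread: it assumes windows are aligned to UTC day boundaries, and `write_row` and `flush` are hypothetical callbacks that the caller would supply (for example a driver insert and a wrapper around `nodetool flush`).

```python
from collections import defaultdict
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def window_start(ts, window_days=1):
    """Return the UTC start of the day-based window containing ts.

    ts must be a timezone-aware datetime. Alignment to epoch-based
    UTC day boundaries is an assumption of this sketch.
    """
    size = SECONDS_PER_DAY * window_days
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def group_by_window(rows, window_days=1):
    """Bucket (timestamp, payload) rows by window, oldest window first."""
    buckets = defaultdict(list)
    for ts, payload in rows:
        buckets[window_start(ts, window_days)].append((ts, payload))
    return dict(sorted(buckets.items()))


def load_by_window(rows, write_row, flush, window_days=1):
    """Write one window at a time and flush before starting the next.

    write_row and flush are hypothetical callbacks, not a real API.
    """
    for _start, batch in group_by_window(rows, window_days).items():
        for ts, payload in batch:
            write_row(ts, payload)
        flush()  # close out this window before writing the next one
```

Flushing after each window is what keeps historical data from different windows out of the same SSTable, which is the point of breaking the load up in the first place.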

Re: Cluster sizing for huge dataset

2019-09-28 Thread Jeff Jirsa
A few random thoughts here: 1) 90 nodes / 900T in a cluster isn’t that big; a petabyte per cluster is a manageable size. 2) The 2TB guidance is old and irrelevant for most people; what you really care about is how fast you can replace the failed machine. You’d likely be ok going significantly lar

Challenge with initial data load with TWCS

2019-09-28 Thread DuyHai Doan
Hello users, TWCS works great in steady state. It creates SSTables of roughly fixed size if your insertion rate is pretty constant. The real challenge is the initial load. Let's say we configure TWCS with window unit = day and window size = 1; we would have 1 SSTable per day and with TT

Cluster sizing for huge dataset

2019-09-28 Thread DuyHai Doan
Hello users, I'm facing a very challenging exercise: sizing a cluster for a huge dataset. Use case: IoT. Number of sensors: 30 million. Frequency of data: every 10 minutes. Estimated size of a data point: 100 bytes (including clustering columns). Data retention: 2 years. Replication factor: 3 (pretty s
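
As a back-of-the-envelope check, the figures quoted above can be multiplied out directly. The sketch below uses only the numbers stated in the message and deliberately ignores compression, per-row storage overhead, and compaction headroom, so the result is a rough order of magnitude and not a sizing recommendation.

```python
# Inputs taken from the message above.
SENSORS = 30_000_000
INTERVAL_MINUTES = 10
BYTES_PER_POINT = 100          # includes clustering columns
RETENTION_DAYS = 2 * 365
REPLICATION_FACTOR = 3

TB = 10**12                    # decimal terabyte

# Each sensor reports 24 * 60 / 10 = 144 times per day.
points_per_day = SENSORS * (24 * 60 // INTERVAL_MINUTES)

# Data volume retained over two years, before and after replication.
raw_bytes = points_per_day * RETENTION_DAYS * BYTES_PER_POINT
replicated_bytes = raw_bytes * REPLICATION_FACTOR

print(f"points per day:     {points_per_day:,}")
print(f"raw data:           {raw_bytes / TB:.2f} TB")
print(f"with replication:   {replicated_bytes / TB:.2f} TB")
```

Under these assumptions the raw data comes to roughly 315 TB, and about 946 TB once the replication factor of 3 is applied.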