Oh sorry - a cluster per application makes sense. Sharding within an application also makes sense, to avoid very, very large clusters (think: ~a thousand nodes). One cluster per app/use case.
On Fri, Nov 12, 2021 at 1:39 PM S G <sg.online.em...@gmail.com> wrote:

> Thanks Jeff.
> Any side effects on the client config from the small-clusters perspective?
>
> Several smaller clusters mean more CassandraClient objects on the client
> side, but the number of connections should stay roughly the same, since
> the number of physical nodes will most likely remain the same. So I think
> the client side would not see any major issue.
>
>
> On Fri, Nov 12, 2021 at 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> Most people are better served building multiple clusters and spending
>> their engineering time optimizing for maintaining multiple clusters,
>> rather than spending it learning how to work around the sharp edges that
>> make large shared clusters hard.
>>
>> Large multi-tenant clusters give you less waste and a bit more
>> elasticity (one tenant can burst and use spare capacity that would
>> typically be left for the other tenants). However, one bad use case /
>> table can ruin things for everyone (one bad read that triggers GC affects
>> all use cases), and eventually certain mechanisms/subsystems don't scale
>> past certain points (e.g. schema - large schemas and large clusters are
>> much harder to manage than small schemas and small clusters).
>>
>>
>> On Fri, Nov 12, 2021 at 11:31 AM S G <sg.online.em...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Is there any case where we would prefer one big giant cluster (with
>>> multiple large tables) over several smaller clusters?
>>> Apart from some management overhead of multiple Cassandra clients, it
>>> seems several smaller clusters are always better than a big one:
>>>
>>> 1. Avoids a SPOF for all tables
>>> 2. Helps debugging (less noise from other tables in the logs)
>>> 3. Traffic spikes on one table do not affect tables in other clusters
>>> 4. Tables can be scaled independently of each other - so colder data
>>>    can live in a smaller cluster (more data per node) while hotter data
>>>    can be on a bigger cluster (less data per node)
>>>
>>> It does not mean that every table should be in its own cluster.
>>> But large ones (say, more than a few terabytes) can be moved to their
>>> own dedicated clusters, and smaller ones can be grouped together in one
>>> or a few clusters.
>>>
>>> Please share any recommendations for the above from actual production
>>> experience.
>>> Thanks for helping!
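
For the client-side point above (one client object per cluster, with the total connection count driven mostly by the number of nodes contacted), here is a minimal sketch of what that might look like with the DataStax Java driver 4.x. The cluster names, hostnames, and datacenter name are illustrative assumptions, not details from the thread:

```java
import com.datastax.oss.driver.api.core.CqlSession;

import java.net.InetSocketAddress;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PerClusterSessions {

    // One CqlSession per cluster. Each session keeps its own connection pool,
    // but the overall connection count is still roughly
    // (number of nodes) x (connections per node), i.e. about the same as a
    // single large cluster with the same total node count.
    static CqlSession connect(List<InetSocketAddress> contactPoints, String localDc) {
        return CqlSession.builder()
                .addContactPoints(contactPoints)
                .withLocalDatacenter(localDc)
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical clusters split by use case / table group.
        Map<String, List<InetSocketAddress>> clusters = Map.of(
                "hot-data",  List.of(new InetSocketAddress("hot-cass-1.example.com", 9042)),
                "cold-data", List.of(new InetSocketAddress("cold-cass-1.example.com", 9042)));

        Map<String, CqlSession> sessions = clusters.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> connect(e.getValue(), "dc1")));

        // Route each workload to the session for its own cluster.
        sessions.get("hot-data").execute("SELECT release_version FROM system.local");

        sessions.values().forEach(CqlSession::close);
    }
}
```

The main extra cost on the client is keeping a small registry of sessions keyed by cluster/use case instead of a single shared session; the per-node connection pools themselves look the same either way.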