RE: Recycled-Commitlogs

2025-06-28 Thread Jeff Jirsa
atible I naturally assumed that > these items were in both applications. > > Marc > > From: Jeff Jirsa > Sent: Thursday, June 26, 2025 7:35 PM > To: user@cassandra.apache.org > Cc: user@cassandra.apache.org > Subject: Re: Recycled-Commitlogs > > EXTER

Re: Recycled-Commitlogs

2025-06-26 Thread Jeff Jirsa
What version of Cassandra is this? Recycling segments was a thing from like 1.1 to 2.2 but really very different in modern versions (and CDC / point-in-time backup mirrors some of the concepts around hanging onto segments). Knowing the version would be super helpful, though. Is this … 1.2? 2.0? On Jun 2

Re: How to fix delta data when reattach EBS async replica?

2025-03-03 Thread Jeff Jirsa
you do not want to do this until you build a system that archives and restore the commitlog segments. > On Feb 28, 2025, at 6:10 PM, Jeff Jirsa wrote: > >  > >> On 2025/03/01 00:27:27 Jaydeep Chovatia wrote: >> Hi, >> >> I want to reattach an asynchronousl

Re: How to fix delta data when reattach EBS async replica?

2025-02-28 Thread Jeff Jirsa
On 2025/03/01 00:27:27 Jaydeep Chovatia wrote: > Hi, > > I want to reattach an asynchronously replicated EBS volume to Cassandra. I > want to know how to fix the delta inconsistency when reattaching other than > running a repair on the dataset. > > Here is the scenario. > Three Cassandra nodes

Re: Enable audit log

2025-01-14 Thread Jeff Jirsa
Surprising. Feels like something that should change. If it’s enabled in yaml, why WOULDN’T we want it started on start? > On Jan 14, 2025, at 7:40 AM, Štefan Miklošovič wrote: > > Hi Sebastian, > > the behaviour you see seems to be a conscious decision: > > https://github.com/apache/cassand

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-18 Thread Jeff Jirsa
I think this is one of those cases where if someone tells us they’re feeling pain, instead of telling them it shouldn’t be painful, we try to learn a bit more about the pain. For example, both you and Scott expressed surprise at the concern of rolling restarts (you repeatedly, Scott mentioned t

Re: Cassandra 5 Upgrade - Storage Compatibility Modes

2024-12-17 Thread Jeff Masud
We have similar issues with 3.x repairs, and run manually as well as with Reaper.  Can someone tell me, if I cannot get a table repaired because it is locking up a node, is it still possible to upgrade to 4.0?  Jeff From: Jon Haddad Reply-To: Date: Tuesday, December 17, 2024 at 2:20 PM

Re: Cassandra Restore Issue

2024-12-01 Thread Jeff Jirsa
3.11.2 is from Feb 2018. It’s a 6 year old release. It’s VERY hard to guess what’s happening here without a lot more info. How are you doing backups? How are you doing restores? What consistency level are you using for writes? Reads? Is the data in the sstable (can you find it with sstabledump?

Re: Token Assignment Strategy for Single-Token Nodes with Multi-Datacenter

2024-11-30 Thread Jeff Jirsa
You've enumerated the options and tradeoffs correctly. I've personally seen both implemented, and they're both fine. With option 1, there's also an option that you don't just do "primary range" based repairs, but rather, let a scheduler run through the token range, and use any replica in any D

Re: Unexplained stuck memtable flush

2024-11-05 Thread Jeff Jirsa
> On Nov 5, 2024, at 4:12 AM, Bowen Song via user > wrote: > > Writes on this node starts to timeout and fail. But if left untouched, it's > only gonna get worse, and eventually lead to JVM OOM and crash. > > By inspecting the heap dump created at OOM, we can see that both of the > Memtable

Re: Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-09 Thread Jeff Jirsa
The easiest option here, though still unpleasant, is sstabledump to JSON and look at the tombstone. Usually when this happens it’s because something unexpected is happening - actually writing nulls or doing deletes or weird short TTLs without realizing it. Dump the sstable and look, it’ll be faster th

Re: Resources on Using Single Vnode in Cassandra

2024-10-08 Thread Jeff Jirsa
You don’t have to double. You can add 1 node at a time - you just have to move every other token to stay balanced. Most people don’t write the tooling to do that, but it’s not that complicated: calculate the token positions with N nodes, calculate the token positions with N+1 nodes, bootstrap the new mach
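The token arithmetic described here can be sketched in a few lines (a hedged illustration only; the Murmur3 ring bounds are the one Cassandra-specific fact assumed, and real tooling would then apply the moves with `nodetool move`):

```python
# Sketch of the rebalancing math for a single-token cluster: compute the
# ideal evenly spaced token positions for N and for N+1 nodes on the
# Murmur3 ring, then diff them to see which positions are new (i.e.
# which existing tokens would have to move).
RING_MIN = -(2**63)
RING_SPAN = 2**64

def ideal_tokens(n):
    """Evenly spaced single-token positions for n nodes."""
    return [RING_MIN + (i * RING_SPAN) // n for i in range(n)]

before = ideal_tokens(4)  # current cluster
after = ideal_tokens(5)   # after bootstrapping one more node
moves = [t for t in after if t not in before]
```

Everything here except the ring bounds is illustrative; in practice you bootstrap the new node at one of the new positions, then move the remaining nodes one at a time.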

Re: Resources on Using Single Vnode in Cassandra

2024-10-08 Thread Jeff Jirsa
I’ll take a slightly different position - people who never expect to change the cluster shouldn’t care which they’re using, people who want to grow by 10-20% often should probably use vnodes, everyone else can probably figure out how to get by with single token, with the caveat that they’ll prob

Re: Recommend Cassandra consultant

2024-09-30 Thread Jeff Masud
clean repair before performing an upgrade?  When upgrading we go to 3.11.x first, then 4.0.x, then 4.1.x, is that correct?  Jeff From: Jon Haddad Reply-To: Date: Friday, September 27, 2024 at 3:16 PM To: Subject: Re: Recommend Cassandra consultant Thank you both for the

Recommend Cassandra consultant

2024-09-26 Thread Jeff Masud
nd looking to upgrade to a newer version once we can get a repair successfully. Please reach out to me directly. Thanks Jeff -- Jeff Masud Deasil Works 818-945-0821 x107 310-918-5333 Mobile jeff@deasil.works

Re: [EXTERNAL] Cassandra 3.11 - below normal disk read after restart

2024-09-06 Thread Jeff Jirsa
to see what those nodes were doing at the time, vs what they’re doing “normally”. > On Sep 6, 2024, at 12:29 PM, Pradeep Badiger wrote: > > Thanks, Jeff. We use QUORUM consistency for reads and writes. Even we are > clueless as to why such an issue could occur. Do you think rest

Re: Cassandra 3.11 - below normal disk read after restart

2024-09-06 Thread Jeff Jirsa
If they went up by 1/7th, could potentially assume it was something related to the snitch not choosing the restarted host. They went up by a lot (2-3x?). What consistency level do you use for reads and writes, and do you have graphs for local reads / hint delivery? (I’m GUESSING that you’re seei

Re: null values injected while drop compact storage was executed

2024-05-07 Thread Jeff Jirsa
This sounds a lot like cassandra-13004 which was fixed, but broke data being read-repaired during an alter statement I suspect it’s not actually that same bug, but may be close/related. Reproducing it reliably would be a huge help. - Jeff > On May 7, 2024, at 1:50 AM, Matthias Pfau

Re: ssl certificate hot reloading test - cassandra 4.1

2024-04-15 Thread Jeff Jirsa
It seems like if folks really want the life of a connection to be finite (either client/server or server/server), adding in an option to quietly drain and recycle a connection on some period isn’t that difficult. That type of requirement shows up in a number of environments, usually on interact

Re: Datacenter decommissioning on Cassandra 4.1.4

2024-04-08 Thread Jeff Jirsa
To Jon’s point, if you remove from replication after step 1 or step 2 (probably step 2 if your goal is to be strictly correct), the nodetool decommission phase becomes almost a no-op. If you use the order below, the last nodes to decommission will cause those surviving machines to run out of s

Re: Schema inconsistency in mixed-version cluster

2023-12-12 Thread Jeff Jirsa
A static collection is probably atypical, and again, would encourage you to open a JIRA. This seems like a case we should be able to find in a simulator. On Tue, Dec 12, 2023 at 2:05 PM Sebastian Marsching wrote: > I assume these are column names of a non-system table. > > This is correct. It

Re: Schema inconsistency in mixed-version cluster

2023-12-12 Thread Jeff Jirsa
This deserves a JIRA On Tue, Dec 12, 2023 at 8:30 AM Sebastian Marsching wrote: > Hi, > > while upgrading our production cluster from C* 3.11.14 to 4.1.3, we > experienced the issue that some SELECT queries failed due to supposedly no > replica being available. The system logs on the C* nodes

Re: Remove folders of deleted tables

2023-12-05 Thread Jeff Jirsa
The last time you mentioned this: On Tue, Dec 5, 2023 at 11:57 AM Sébastien Rebecchi wrote: > Hi Bowen, > > Thanks for your answer. > > I was thinking of extreme use cases, but as far as I am concerned I can > deal with creation and deletion of 2 tables every 6 hours for a keyspace. > So it lets

Re: Migrating to incremental repair in C* 4.x

2023-11-27 Thread Jeff Jirsa
I don’t work for datastax, thats not my blog, and I’m on a phone and potentially missing nuance, but I’d never try to convert a cluster to IR by disabling auto compaction . It sounds very much out of date or its optimized for fixing one node in a cluster somehow. It didn’t make sense in the 4.0

Re: Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread Jeff Jirsa
> On Oct 26, 2023, at 12:32 AM, Michalis Kotsiouros (EXT) via user > wrote: > >  > Hello Cassandra community, > We are trying to upgrade our systems from Cassandra 3 to Cassandra 4. We plan > to do this per data center. > During the upgrade, a cluster with mixed SW levels is expected. At thi

Re: Resources to understand rebalancing

2023-10-25 Thread Jeff Jirsa
Data ownership is defined by the token ring concept. Hosts in the cluster may have tokens - let's oversimplify to 5 hosts, each with 1 token A=0, B=1000, C=2000, D=3000, E=4000 The partition key is hashed to calculate the token, and the next 3 hosts in the ring are the "owners" of that data - a k
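The ring walk in that explanation can be modeled directly (a toy sketch built on the 5-host example above; real placement also considers racks and datacenters):

```python
import bisect

# Toy ring from the example: 5 hosts, one token each.
ring = {0: "A", 1000: "B", 2000: "C", 3000: "D", 4000: "E"}

def owners(key_token, rf=3):
    """The host owning the range containing key_token, plus the next
    rf-1 hosts walking clockwise around the ring (with wraparound)."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, key_token) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]
```

A key hashing to 1500 lands on C, D, E; one hashing past the last token wraps back around to A.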

Re: Cassandra 4.0.6 token mismatch issue in production environment

2023-10-23 Thread Jeff Jirsa
happen? > > Jaydeep > > On Sat, Oct 21, 2023 at 10:25 AM Jaydeep Chovatia < > chovatia.jayd...@gmail.com> wrote: > >> Thanks, Jeff! >> I will keep this thread updated on our findings. >> >> Jaydeep >> >> On Sat, Oct 21, 2023 at 9:37 AM Jeff

Re: Cassandra 4.0.6 token mismatch issue in production environment

2023-10-21 Thread Jeff Jirsa
That code path was added to protect against invalid gossip states For this logger to be issued, the coordinator receiving the query must identify a set of replicas holding the data to serve the read, and one of the selected replicas must disagree that it’s a replica based on its view of the toke

Re: java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-12 Thread Jeff Jirsa
Just to be clear: - How many of the proxy nodes are you providing as contact points? One of them or all of them? It sounds like you're saying you're passing all of them, and only one is connecting, and the driver is declining to connect to the rest because they're not in system.peers. I'm not sur

Re: Startup errors - 4.1.3

2023-08-30 Thread Jeff Jirsa
There are at least two bugs in the compaction lifecycle transaction log - one that can drop an ABORT / ADD in the wrong order (and prevent startup), and one that allows for invalid timestamps in the log file (and again, prevent startups). I believe it's safe to work around the former by removing

Re: Big Data Question

2023-08-21 Thread Jeff Jirsa
ing aka faster streaming also works for > STCS. > > Dinesh > > On Aug 21, 2023, at 8:01 AM, Jeff Jirsa wrote: > >  > There's a lot of questionable advice scattered in this thread. Set aside > most of the guidance like 2TB/node, it's old and super nuanced

Re: Big Data Question

2023-08-21 Thread Jeff Jirsa
There's a lot of questionable advice scattered in this thread. Set aside most of the guidance like 2TB/node, it's old and super nuanced. If you're bare metal, do what your organization is good at. If you have millions of dollars in SAN equipment and you know how SANs work and fail and get backed u

Re: Big Data Question

2023-08-16 Thread Jeff Jirsa
A lot of things depend on actual cluster config - compaction settings (LCS vs STCS vs TWCS) and token allocation (single token, vnodes, etc) matter a ton. With 4.0 and LCS, streaming for replacement is MUCH faster, so much so that most people should be fine with 4-8TB/node, because the rebuild tim

Re: Cassandra p95 latencies

2023-08-11 Thread Jeff Jirsa
You’re going to have to help us help you. 4.0 is pretty widely deployed. I’m not aware of a perf regression. Can you give us a schema (anonymized) and queries and show us a trace? On Aug 10, 2023, at 10:18 PM, Shaurya Gupta wrote: The queries are rightly designed as I already explained. 40 ms is wa

Re: write on ONE node vs replication factor

2023-07-15 Thread Jeff Jirsa
Consistency level controls when queries acknowledge/succeed Replication factor is where data lives / how many copies If you write at consistency ONE and replication factor 3, the query finishes successfully when the write is durable on one of the 3 copies. It will get sent to all 3, but it’ll
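A minimal sketch of that distinction (illustrative only, not driver or server code):

```python
# Replication factor fixes how many copies of the data exist;
# consistency level fixes how many durable acks the coordinator
# waits for before reporting success to the client.
def required_acks(cl, rf):
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def write_outcome(cl, rf, acks):
    # The write is sent to all rf replicas regardless; cl only gates
    # when the coordinator tells the client the write succeeded.
    return "success" if acks >= required_acks(cl, rf) else "timeout"
```

So with CL=ONE and RF=3, one durable ack is enough for success, even though all three replicas still receive the write.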

Re: Replacing node without shutting down the old node

2023-05-16 Thread Jeff Jirsa
In-line. On May 15, 2023, at 5:26 PM, Runtian Liu wrote: Hi Jeff, I tried the setup with vnode 16 and NetworkTopologyStrategy replication strategy with replication factor 3 with 3 racks in one cluster. When using the new node token as the old node token - 1. I had said +1 but you’re right that it’s

Re: Replacing node without shutting down the old node

2023-05-08 Thread Jeff Jirsa
You can't have two nodes with the same token (in the current metadata implementation) - it causes problems counting things like how many replicas ACK a write, and what happens if the one you're replacing ACKs a write but the joining host doesn't? It's harder than it seems to maintain consistency gu

Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Jeff Jirsa
cassandra.apache.org> wrote: >> >>> Have you thought of using "-Dcassandra.replace_address_first_boot=..." >>> (or "-Dcassandra.replace_address=..." if you are using an older version)? >>> This will not result in a topology change, which means &q

Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jeff Jirsa
scaling. On May 4, 2023, at 9:25 PM, Jaydeep Chovatia wrote: Thanks, Jeff! But in our environment we replace nodes quite often for various optimization purposes, etc. say, almost 1 node per day (node addition followed by node decommission, which of course changes the topology), and we have a cluster

Re: Is cleanup is required if cluster topology changes

2023-05-04 Thread Jeff Jirsa
Cleanup is fast and cheap and basically a no-op if you haven’t changed the ring After cassandra has transactional cluster metadata to make ring changes strongly consistent, cassandra should do this in every compaction. But until then it’s left for operators to run when they’re sure the state of the

Re: CAS operation result is unknown - proposal accepted by 1 but not a quorum

2023-04-12 Thread Jeff Jirsa
Are you always inserting into the same partition (with contention) or different ? Which version are you using ? The short tldr is that the failure modes of the existing paxos implementation (under contention, under latency, under cluster strain) can cause undefined states. I believe that a su

Re: When are sstables that were compacted deleted?

2023-04-04 Thread Jeff Jirsa
ighting with compaction to make sure we don't run out > of space. > Will open a ticket, thanks. > > > On Wed, Apr 5, 2023 at 12:10 AM Jeff Jirsa wrote: > >> If you restart the node, it'll process/purge those compaction logs on >> startup, but you want them to

Re: When are sstables that were compacted deleted?

2023-04-04 Thread Jeff Jirsa
If you restart the node, it'll process/purge those compaction logs on startup, but you want them to purge/process now. I genuinely dont know when the tidier runs, but it's likely the case that you're too busy compaction to purge (though I don't know what exactly "too busy" means). Since you're cl

Re: Reads not returning data after adding node

2023-04-03 Thread Jeff Jirsa
se nodetool decommission on the node instead. On 03/04/2023 16:32, Jeff Jirsa wrote: FWIW, `nodetool decommission` is strongly preferred. `nodetool removenode` is designed to be run when a host is offline. Only decommission is gu

Re: Reads not returning data after adding node

2023-04-03 Thread Jeff Jirsa
FWIW, `nodetool decommission` is strongly preferred. `nodetool removenode` is designed to be run when a host is offline. Only decommission is guaranteed to maintain consistency / correctness, and removemode probably streams a lot more data around than decommission. On Mon, Apr 3, 2023 at 6:47 AM

Re: Understanding rack in cassandra-rackdc.properties

2023-04-03 Thread Jeff Jirsa
As long as the number of racks is already at/above the number of nodes / replication factor, it's gonna be fine. Where it tends to surprise people is if you have RF=3 and either 1 or 2 racks, and then you add a third, that third rack gets one copy of "all" of the data, so you often run out of disk

Re: Reads not returning data after adding node

2023-04-02 Thread Jeff Jirsa
Just run nodetool rebuild on the new node. If you assassinate it now you violate consistency for your most recent writes. On Apr 2, 2023, at 10:22 PM, Carlos Diaz wrote: That's what nodetool assassinate will do. On Sun, Apr 2, 2023 at 10:19 PM David Tinker wrote: Is it possible

Re: Reads not returning data after adding node

2023-04-02 Thread Jeff Jirsa
Looks like it joined with no data. Did you set auto_bootstrap to false? Or does the node think it’s a seed? You want to use “nodetool rebuild” to stream data to that host. You can potentially end the production outage / incident by taking the host offline, or making it less likely to be querie

Re: Nodetool command to pre-load the chunk cache

2023-03-21 Thread Jeff Jirsa
We serialize the other caches to disk to avoid cold-start problems, I don't see why we couldn't also serialize the chunk cache? Seems worth a JIRA to me. Until then, you can probably use the dynamic snitch (badness + severity) to route around newly started hosts. I'm actually pretty surprised the

Re: Cassandra in Kubernetes: IP switch decommission issue

2023-03-09 Thread Jeff Jirsa
I described something roughly similar to this a few years ago on the list. The specific chain you're describing isn't one I've thought about before, but if you open a JIRA for tracking and attribution, I'll ask some folks to take a peek at it. On Thu, Mar 9, 2023 at 10:57 AM Inès Potier wrote:

Re: Replacing node w/o bootstrapping (streaming)?

2023-02-09 Thread Jeff Jirsa
You don’t have to do anything else. Just use smart rsync flags (including delete). It’ll work fine just the way you described. No special start args. No replacement flag Be sure you rsync the commitlog directory too . Flush and drain to be extra safe > On Feb 9, 2023, at 6:42 PM, Max Campos

Re: Deletions getting omitted

2023-02-04 Thread Jeff Jirsa
While you'd expect only_purge_repaired_tombstones:true to be sufficient, your gc_grace_secnds of 1 hour is making you unusually susceptible to resurrecting data. (To be clear, you should be safe to do this, but if there is a bug hiding in there somewhere, your low gc_grace_seconds will make it lik

Re: removenode stuck - cassandra 4.1.0

2023-01-23 Thread Jeff Jirsa
Those hosts are likely sending streams. If you do `nodetool netstats` on the replicas of the node you're removing, you should see byte counters and file counters - they should all be incrementing. If one of them isnt incremening, that one is probably stuck. There's at least one bug in 4.1 that ca

Re: Failed disks - correct procedure

2023-01-16 Thread Jeff Jirsa
Prior to cassandra-6696 you’d have to treat one missing disk as a failed machine, wipe all the data and re-stream it, as a tombstone for a given value may be on one disk and data on another (effectively redirecting data) So the answer has to be version dependent, too - which version were you usi

Re: Cassandra 4.0.7 - issue - service not starting

2022-12-08 Thread Jeff Jirsa
What version of java are you using? On Thu, Dec 8, 2022 at 8:07 AM Amit Patel via user < user@cassandra.apache.org> wrote: > Hi, > > > > I have installed cassandra-4.0.7-1.noarch - repo ( baseurl= > https://redhat.cassandra.apache.org/40x/noboolean/) on Redhat 7.9. > > > > We have configured t

Re: Unable to gossip with peers when starting cluster

2022-11-09 Thread Jeff Jirsa
When you say you configured them to talk to .0.31 as a seed, did you do that by changing the yaml? Was 0.9 ever a seed before? I expect if you start 0.7 and 0.9 at the same time, it all works. This looks like a logic/state bug that needs to be fixed, though. (If you're going to upgrade, usually

Re: concurrent sstable read

2022-10-25 Thread Jeff Jirsa
Sequentially, and yes - for some definition of "directly" - but not just because it's sequential, but also because each sstable has cost in reading (e.g. JVM garbage created when you open/seek that has to be collected after the read) On Tue, Oct 25, 2022 at 8:27 AM Grzegorz Pietrusza wrote: > HI

Re: Doubts on multiple filter design in cassandra

2022-10-16 Thread Jeff Jirsa
The limit only bounds what you return, not what you scan. On Oct 3, 2022, at 10:56 AM, Regis Le Bretonnic wrote: Hi... We do the same (even if a lot of people will say it's bad and that you shouldn't...) with an "allow filtering" BUT ALWAYS WITHIN A PARTITION AND WITH A LIMIT CLAUSE TO AVOID A FULL P

Re: TWCS recommendation on number of windows

2022-09-28 Thread Jeff Jirsa
So when I wrote TWCS, I wrote it for a use case that had 24h TTLs and 30 days of retention. In that application, we had tested 12h windows, 24h windows, and 7 day windows, and eventually settled on 24h windows because that balanced factors like sstable size, sstables-per-read, and expired data wait
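The window arithmetic behind that choice is simple (a hedged sketch; `window_count` is an illustrative name, not a Cassandra setting):

```python
import math

# Retention divided by window size gives the steady-state window
# (sstable) count for TWCS. Bigger windows mean fewer, larger
# sstables and fewer sstables-per-read, but a longer wait before a
# fully expired window can be dropped.
def window_count(retention_hours, window_hours):
    return math.ceil(retention_hours / window_hours)
```

For the 30-day retention use case above, 24h windows give roughly 30 sstables, 12h windows roughly 60, and 7-day windows only a handful.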

Re: Cassandra GC tuning

2022-09-20 Thread Jeff Jirsa
more news.   Thanks a lot for your valuable input.   BR MK From: Jeff Jirsa Sent: Monday, September 19, 2022 20:06 To: user@cassandra.apache.org; Michail Kotsiouros Subject: Re: Cassandra GC tuning   https://issues.apache.org/jira/browse/CASSANDRA-13019 is in 4.0, you may find that tuning

Re: Cassandra GC tuning

2022-09-19 Thread Jeff Jirsa
https://issues.apache.org/jira/browse/CASSANDRA-13019 is in 4.0, you may find that tuning those thresholds On Mon, Sep 19, 2022 at 9:50 AM Jeff Jirsa wrote: > Snapshots are probably actually caused by a spike in disk IO and disk > latency, not GC (you'll see longer STW pauses as y

Re: Cassandra GC tuning

2022-09-19 Thread Jeff Jirsa
Snapshots are probably actually caused by a spike in disk IO and disk latency, not GC (you'll see longer STW pauses as you get to a safepoint if that disk is hanging). This is especially problematic on SATA SSDs, or nVME SSDs with poor IO scheduler tuning. There's a patch somewhere to throttle har

Re: Local reads metric

2022-09-17 Thread Jeff Jirsa
Yes > On Sep 17, 2022, at 10:46 PM, Gil Ganz wrote: > >  > Hey > Do reads that come from a read repair are somehow counted as part of the > local read metric? > i.e > org.apache.cassandra.metrics.Table... : > ReadLatency.1m_rate > > Version is 4.0.4 > > Gil

Re: TimeWindowCompactionStrategy Operational Concerns

2022-09-15 Thread Jeff Jirsa
If you were able to generate old data offline, using something like the CQLSSTableWriter class, you can add that to the cluster (either via streaming or nodetool import), that would maintain the TWCS invariant. That said, with https://issues.apache.org/jira/browse/CASSANDRA-13418 , IF you're comfo

Re: Bootstrap data streaming order

2022-09-12 Thread Jeff Jirsa
s just one per host). If you're using rack aware (or in AWS, AZ-aware) snitches, it's also influenced by the number of hosts in the rack. On Mon, Sep 12, 2022 at 7:16 AM Jeff Jirsa wrote: > A new node joining will receive (replication factor) streams for each > token it ha

Re: Bootstrap data streaming order

2022-09-12 Thread Jeff Jirsa
A new node joining will receive (replication factor) streams for each token it has. If you use single token and RF=3, three hosts will send data at the same time (the data sent is the “losing” replica of the data based on the next/new topology that will exist after the node finishes bootstrappin
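The stream-count arithmetic in that message sketches out as follows (illustrative only; actual counts are also shaped by rack/AZ-aware snitches and how many distinct hosts hold the replicas):

```python
# A joining node receives (replication factor) candidate source
# streams for each token it owns: the "losing" replica of each range
# under the post-bootstrap topology sends its copy.
def bootstrap_streams(num_tokens, rf):
    return num_tokens * rf

# Single token, RF=3 -> 3 senders at once.
# 16 vnodes, RF=3 -> up to 48 streams (capped in practice by the
# number of distinct replica hosts).
```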

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
> The reply to suggest that folk head off a pay for a course when there are > ‘pre-sales’ questions is not a practical response as any business is > unlikely to be spending speculative money. > > > > *From:* Jeff Jirsa > *Sent:* Tuesday, July 12, 2022 4:43 PM > *To:* cassa

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
On Tue, Jul 12, 2022 at 7:27 AM Marc Hoppins wrote: > > I was asking the questions but no one cared to answer. > This is probably a combination of "it is really hard to answer a question with insufficient data" and your tone. Nobody here gets paid to help you solve your company's problems except

Re: Adding nodes

2022-07-12 Thread Jeff Jirsa
network connectivity or stuck in long STW GC pauses. > Regardless of the reason behind it, the state shown on the joining node > will remain as joining unless the steaming process has failed. > > The node state is propagated between nodes via gossip, and there may be a > delay before a

Re: Adding nodes

2022-07-08 Thread Jeff Jirsa
Having a node UJ but not sending/receiving other streams is an invalid state (unless 4.0 moved the streaming data out of netstats? I'm not 100% sure, but I'm 99% sure it should be there). It likely stopped the bootstrap process long ago with an error (which you may not have seen), and is running w

Re: Adding nodes

2022-07-07 Thread Jeff Jirsa
What version are you using? When you run `nodetool netstats` on the joining node, what is the output? How much data is there per node (presumably more than 86G)? On Thu, Jul 7, 2022 at 7:49 AM Marc Hoppins wrote: > Hi all, > > Cluster of 2 DC and 24 nodes > > DC1 (RF3) = 12 nodes, 16 tokens

Re: Query around Data Modelling -2

2022-06-30 Thread Jeff Jirsa
How are you running repair? -pr? Or -st/-et? 4.0 gives you real incremental repair which helps. Splitting the table won’t make reads faster. It will increase the potential parallelization of compaction. > On Jun 30, 2022, at 7:04 AM, MyWorld wrote: > >  > Hi all, > > Another query around d

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
This is assuming each row is like … I dunno 10-1000 bytes. If you’re storing like a huge 1mb blob use two tables for sure. > On Jun 22, 2022, at 9:06 PM, Jeff Jirsa wrote: > >  > > Ok so here’s how I would think about this > > The writes don’t matter. (There’s a tiny

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
inefficiency of the first model, and I’d be inclined to do that until you demonstrate it won’t work (I bet it works fine for a long long time). > On Jun 22, 2022, at 7:11 PM, MyWorld wrote: > >  > Hi Jeff, > Let me know how no of rows have an impact here. > May be today I have 80-100

Re: Query around Data Modelling

2022-06-22 Thread Jeff Jirsa
How many rows per partition in each model? > On Jun 22, 2022, at 6:38 PM, MyWorld wrote: > >  > Hi all, > > Just a small query around data Modelling. > Suppose we have to design the data model for 2 different use cases which will > query the data on same set of (partion+clustering key). So s

Re: Configuration for new(expanding) cluster and new admins.

2022-06-20 Thread Jeff Jirsa
One of the advantages of faster streaming in 4.0+ is that it’s now very much viable to do this entirely with bootstraps and decoms in the same DC, when you have use cases where you can’t just change DC names Vnodes will cause more compaction than single token, but you can just add in all the ex

Re: Configuration for new(expanding) cluster and new admins.

2022-06-15 Thread Jeff Jirsa
You shouldn't need to change num_tokens at all. num_tokens helps you pretend your cluster is a bigger than it is and randomly selects tokens for you so that your data is approximately evenly distributed. As you add more hosts, it should balance out automatically. The alternative to num_tokens is

Re: Cassandra 3.0 upgrade

2022-06-13 Thread Jeff Jirsa
The versions with caveats should all be enumerated in https://github.com/apache/cassandra/blob/cassandra-3.0/NEWS.txt The biggest caveat was 3.0.14 (which had the fix for cassandra-13004), which you're already on. Personally, I'd qualify exactly one upgrade, and rather than doing 3 different upgr

Re: Gossip issues after upgrading to 4.0.4

2022-06-07 Thread Jeff Jirsa
This deserves a JIRA ticket please. (I assume the sending host is randomly choosing the bad IP and blocking on it for some period of time, causing other tasks to pile up, but it should be investigated as a regression). On Tue, Jun 7, 2022 at 7:52 AM Gil Ganz wrote: > Yes, I know the issue wit

Re: Malformed IPV6 address

2022-04-26 Thread Jeff Jirsa
Oof. From which version did you upgrade? I would try: > export _JAVA_OPTIONS="-Djava.net.preferIPv4Stack=true" There's a chance that fixes it (for an unpleasant reason). Did you get a specific stack trace / log message at all? or just that error? On Tue, Apr 26, 2022 at 1:47 PM Joe Obernbe

Re: about the performance of select * from tbl

2022-04-26 Thread Jeff Jirsa
Yes, you CAN change the fetch size to adjust how many pages of results are returned. But, if you have a million rows, you may still do hundreds or thousands of queries, one after the next. Even if each is 1ms, it's going to take a long time. What Dor suggested is generating a number of SELECT stat
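The suggestion of many parallel SELECTs over token sub-ranges can be sketched like this (a hedged illustration; the table and column names `tbl`/`pk` are placeholders, and a real client would run these through a driver with token-aware routing):

```python
# Split the full Murmur3 token range into sub-ranges and issue one
# SELECT per range, so the full scan can run in parallel across many
# connections instead of paging serially through one query.
def token_subranges(splits):
    lo, hi = -(2**63), 2**63 - 1
    step = (hi - lo) // splits
    bounds = [lo + i * step for i in range(splits)] + [hi]
    return list(zip(bounds, bounds[1:]))

def range_queries(splits):
    return [
        f"SELECT * FROM tbl WHERE token(pk) > {a} AND token(pk) <= {b}"
        for a, b in token_subranges(splits)
    ]
```

Each sub-range query hits only the replicas owning that slice of the ring, which is what makes the parallel version so much faster than serial paging.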

Re: sstables changing in snapshots

2022-03-22 Thread Jeff Jirsa
> On Mar 18, 2022, at 12:15 PM, James Brown wrote: >> >> This in 4.0.3 after running nodetool snapshot that we're seeing sstables >> change, yes. >> >> James Brown >> Infrastructure Architect @ easypost.com >> >> >> On 2022-03-18 at 12:06:

Re: sstables changing in snapshots

2022-03-18 Thread Jeff Jirsa
This is nodetool snapshot yes? 3.11 or 4.0? In versions prior to 3.0, sstables would be written with -tmp- in the name, then renamed when complete, so an sstable definitely never changed once it had the final file name. With the new transaction log mechanism, we use one name and a transaction log

Re: Gossips pending task increasing, nodes are DOWN

2022-03-17 Thread Jeff Jirsa
This release is from Sep 2016 (5.5 years ago) and has no fixes applied to it since. There are likely MANY issues with that version. On Thu, Mar 17, 2022 at 9:07 AM Jean Carlo wrote: > Hello, > > After some restart, we go a list of nodes unreachable. These nodes are > being seen as DOWN for the r

Re: Cassandra Management tools?

2022-03-01 Thread Jeff Jirsa
Most teams are either using things like ansible/python scripts, or have bespoke infrastructure. Some of what you're describing is included in the intent of the `cassandra-sidecar` project: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652224 Goals We target two main goal

Re: [RELEASE] Apache Cassandra 4.0.2 released

2022-02-11 Thread Jeff Jirsa
We don't HAVE TO remove the Config.java entry - we can mark it as deprecated and ignored and remove it in a future version (and you could update Config.java to log a message about having a deprecated config option). It's a much better operator experience: log for a major version, then remove in the

Re: [RELEASE] Apache Cassandra 4.0.2 released

2022-02-11 Thread Jeff Jirsa
Accidentally dropped dev@, so adding back in the dev list, with the hopes that someone on the dev list helps address this. On Fri, Feb 11, 2022 at 2:22 PM Jeff Jirsa wrote: > That looks like https://issues.apache.org/jira/browse/CASSANDRA-17132 + > https://github.com/apache/cassandra/

Re: [RELEASE] Apache Cassandra 4.0.2 released

2022-02-11 Thread Jeff Jirsa
That looks like https://issues.apache.org/jira/browse/CASSANDRA-17132 + https://github.com/apache/cassandra/commit/b6f61e850c8cfb1f0763e0f15721cde8893814b5 I suspect this needs to be reverted, at least in 4.0.x, and it definitely deserved a NEWS.txt entry (and ideally some period of deprecation/wa

Re: Running enablefullquerylog crashes cassandra

2022-02-06 Thread Jeff Jirsa
That looks like a nodetool stack - can you check the Cassandra log for corresponding error? > On Feb 6, 2022, at 12:52 AM, Gil Ganz wrote: > >  > Hey > I'm trying to enable full query log on cassandra 4.01 node and it's causing > cassandra to shutdown > > nodetool enablefullquerylog --path

Re: about memory problem in write heavy system..

2022-01-07 Thread Jeff Jirsa
3.11.4 is a very old release, with lots of known bugs. It's possible the memory is related to that. If you bounce one of the old nodes, where does the memory end up? On Thu, Jan 6, 2022 at 3:44 PM Eunsu Kim wrote: > > Looking at the memory usage chart, it seems that the physical memory usage >

Re: Node failed after drive failed

2021-12-11 Thread Jeff Jirsa
Likely lost (enough of) the system keyspace on that disk so the data files indicating the host was in the cluster are missing and the host tried to rebootstrap > On Dec 11, 2021, at 12:47 PM, Bowen Song wrote: > >  > Hi Joss, > > To unsubscribe from this mailing list, please send an email

Re: Which source replica does rebuild stream from?

2021-11-25 Thread Jeff Jirsa
confining your reads to where you query and you can do a single rebuild + repair after going to 3 > On Nov 25, 2021, at 11:53 AM, Sam Kramer wrote: > >  > Hi both, thank you for your responses! > > Yes Jeff, we expect strictly correct responses. Our starting / ending

Re: Which source replica does rebuild stream from?

2021-11-25 Thread Jeff Jirsa
The risk is not negligible if you expect strictly correct responses The only way to do this correctly is very, very labor intensive at the moment, and it requires repair between rebuilds and incrementally adding replicas such that you don’t violate consistency If you give me the starting topol

Re: Cross DC replication failing

2021-11-13 Thread Jeff Jirsa
> On Nov 13, 2021, at 10:25 AM, Inquistive allen wrote: > >  > Hello team, > Greetings. > > Simple question > > Using Cassandra 3.0.8 > Writing to DC-A using local_quorum > Reading the same data from a DC-B using local quorum. > > It succeeds for a table and fails for other. > Data writte
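Why a LOCAL_QUORUM write in DC-A is not guaranteed to be visible to a LOCAL_QUORUM read in DC-B comes down to quorum overlap arithmetic. A hedged Python sketch (function names are illustrative, not driver APIs):

```python
def local_quorum(rf_in_dc):
    """Replicas that must respond for LOCAL_QUORUM within one datacenter."""
    return rf_in_dc // 2 + 1

def read_sees_write(replicas_holding_write_in_read_dc, read_dc_rf):
    """A read quorum is only guaranteed to include the write if the
    replicas already holding it plus the read quorum exceed the DC's RF
    (pigeonhole overlap)."""
    needed = local_quorum(read_dc_rf)
    return replicas_holding_write_in_read_dc + needed > read_dc_rf
```

A LOCAL_QUORUM write in DC-A synchronously confirms zero replicas in DC-B, so the read in DC-B overlaps the write set only via asynchronous replication: success for one table and failure for another is just a question of which mutations happened to have replicated yet.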

Re: One big giant cluster or several smaller ones?

2021-11-12 Thread Jeff Jirsa
Oh sorry - a cluster per application makes sense. Sharding within an application makes sense to avoid very very very large clusters (think: ~thousand nodes). 1 cluster per app/use case. On Fri, Nov 12, 2021 at 1:39 PM S G wrote: > Thanks Jeff. > Any side-effect on the client config from

Re: One big giant cluster or several smaller ones?

2021-11-12 Thread Jeff Jirsa
Most people are better served building multiple clusters and spending their engineering time optimizing for maintaining multiple clusters, vs spending their engineering time learning how to work around the sharp edges that make large shared clusters hard. Large multi-tenant clusters give you less

Re: Cassandra Delete Query Doubt

2021-11-10 Thread Jeff Jirsa
This type of delete - which doesn't supply a user_id, so it's deleting a range of rows - creates what is known as a range tombstone. It's not tied to any given cell, as it covers a range of cells, and supersedes/shadows them when merged (either in the read path or compaction path). On Wed, Nov 10
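The shadowing behaviour of a range tombstone at merge time can be illustrated with a small sketch. This is a simplified model of the semantics (rows as tuples, timestamps as integers), not Cassandra's actual read or compaction path:

```python
def apply_range_tombstone(rows, start, end, tombstone_ts):
    """Merge a range tombstone into a list of rows.

    A delete over a clustering range shadows any row in [start, end]
    whose write timestamp is <= the tombstone's timestamp; rows written
    after the delete survive.  rows are (clustering_key, write_ts, value).
    """
    surviving = []
    for clustering_key, write_ts, value in rows:
        covered = start <= clustering_key <= end
        if covered and write_ts <= tombstone_ts:
            continue  # shadowed by the range tombstone
        surviving.append((clustering_key, write_ts, value))
    return surviving
```

Note the tombstone carries a single timestamp for the whole range, which is what distinguishes it from per-cell tombstones.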

Re: How does a node decide where each of its vnodes will be replicated to?

2021-11-08 Thread Jeff Jirsa
". Multiple ranges are the "vnodes". There's not a block of data that is a vnode. There's just hosts and ranges they own. On Mon, Nov 8, 2021 at 4:07 PM Tech Id wrote: > Thanks Jeff. > > One follow-up question please: Each node specifies num_tokens. > So if there

Re: How does a node decide where each of its vnodes will be replicated to?

2021-11-08 Thread Jeff Jirsa
which is where it may skip some tokens, if they're found to be on the same rack) On Mon, Nov 8, 2021 at 3:22 PM Tech Id wrote: > Thanks Jeff. > I think what you explained below is before and after vnodes introduction. > The vnodes part is clear - how each node holds a small range of to
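The point in the two entries above — that a node with vnodes simply owns multiple (start, end] ranges on one ring, not "blocks of data" — can be sketched as a ring computation. This is an illustrative model only (integer tokens, no rack-awareness or replication), not Cassandra's placement code:

```python
def token_ranges(tokens_by_node):
    """Map each node to the ring ranges its tokens own.

    Every token owns the range (previous_token_on_ring, token]; a node
    with num_tokens vnodes therefore owns that many disjoint ranges.
    Returns {node: [(start, end), ...]}; the range for the smallest
    token wraps around from the largest.
    """
    ring = sorted((t, node)
                  for node, tokens in tokens_by_node.items()
                  for t in tokens)
    ranges = {node: [] for node in tokens_by_node}
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # i == 0 wraps to the last token
        ranges[node].append((prev_token, token))
    return ranges
```

Running this on two nodes with two tokens each shows the interleaving: each node's "vnodes" are just the ranges ending at its tokens, scattered around the ring.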
