Weird results

2022-12-15 Thread Claude Warren, Jr via dev
I am working on a StandaloneDowngrader.java based on StandaloneUpgrader.java

While working on the tests I had a problem with 2 tests (testFlagArgs and
testDefaultCall) that failed with:

ERROR [main] 2022-12-14 10:35:20,051 SSTableReader.java:496 - Cannot open
/home/claude/apache/cassandra/build/test/cassandra/data/system_schema/tables-afddfb9dbc1e30688056eed6c302ba09/nb-41-big;
partitioner org.apache.cassandra.dht.ByteOrderedPartitioner does not match
system partitioner org.apache.cassandra.dht.Murmur3Partitioner.  Note that
the default partitioner starting with Cassandra 1.2 is Murmur3Partitioner,
so you will need to edit that to match your old partitioner if upgrading.

The same tests failed in the StandaloneUpgraderTests on which the
StandaloneDowngraderTests are based.

After chatting with Jake I added code to set the partitioner using
DatabaseDescriptor.setPartitionerUnsafe() and a try/catch block to make
sure it got reset in one test.  BOTH tests worked.
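The shape of that fix is a save-and-restore around the test body. A minimal, self-contained sketch of the pattern follows; the `Settings` holder here is a hypothetical stand-in for Cassandra's DatabaseDescriptor so the example runs on its own:

```java
// Sketch of the save-and-restore pattern described above. "Settings" is a
// hypothetical stand-in for DatabaseDescriptor's global partitioner state.
public class PartitionerResetSketch {
    static class Settings {
        private static String partitioner = "Murmur3Partitioner";
        static String getPartitioner() { return partitioner; }
        static void setPartitionerUnsafe(String p) { partitioner = p; }
    }

    static void runTestWithPartitioner(String testPartitioner, Runnable test) {
        String previous = Settings.getPartitioner();
        Settings.setPartitionerUnsafe(testPartitioner);
        try {
            test.run();  // test body runs with the overridden partitioner
        } finally {
            // Always restore, even if the test throws, so no state leaks
            // into the next test.
            Settings.setPartitionerUnsafe(previous);
        }
    }

    public static void main(String[] args) {
        runTestWithPartitioner("ByteOrderedPartitioner",
                () -> System.out.println("running with " + Settings.getPartitioner()));
        System.out.println("after: " + Settings.getPartitioner());
    }
}
```

Using try/finally rather than try/catch guarantees the reset happens on both the pass and fail paths.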

I then removed the just added code and both tests continued to work.

I restarted the IDE and both tests continued to work.

So I am not sure how adding and then removing code (including the import
statements) can make the tests pass.  But I wanted to post this here so
that if there are other weird cases perhaps we can figure out what is
happening.


Re: Single slow node dramatically reduces cluster write throughput regardless of CL

2022-12-15 Thread Sarisky, Dan

CASSANDRA-18120 created.

On 12/14/2022 3:13 PM, Jeremiah Jordan wrote:

I have seen this same behavior in the past as well and came to the same 
conclusion about where the issue lies.  It would be good to write this up in a 
ticket.  Giving people the option of using the DynamicEndpointSnitch to order 
batchlog replica selection could mitigate this exact issue, but may involve other 
tradeoffs for batchlog guarantees.


On Dec 14, 2022, at 11:19 AM, Sarisky, Dan  wrote:

We issue writes to Cassandra as logged batches (RF=3, consistency levels TWO, 
QUORUM, or LOCAL_QUORUM)

On clusters of any size - a single extremely slow node causes a ~90% loss of 
cluster-wide throughput using batched writes.  We can replicate this in the lab 
via CPU or disk throttling.  I observe this in 3.11, 4.0, and 4.1.

It appears the mechanism in play is:
Those logged batches are immediately written to two replica nodes, and the 
actual mutations aren't processed until those two nodes acknowledge the batch 
statements.  Those replica nodes are selected randomly from all nodes in the 
local data center currently up in gossip.  If a single node is slow, but still 
thought to be up in gossip, this eventually causes every other node to have all 
of its MutationStage threads waiting while the slow replica accepts batch writes.

The code in play appears to be:
See 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L245.
In the method filterBatchlogEndpoints() there is a Collections.shuffle() to 
order the endpoints and a FailureDetector.isEndpointAlive() check to test 
whether an endpoint is acceptable.
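A simplified, self-contained model of that selection is sketched below. The real filterBatchlogEndpoints() applies additional filtering (such as rack awareness), but this keeps the shuffle-then-liveness-filter shape the thread describes; all names here are illustrative, not Cassandra's:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

public class BatchlogSelectionSketch {
    // Pick two batchlog replicas: shuffle the candidates, then keep the
    // first ones the failure detector still considers alive. A node that is
    // slow but still "up" in gossip passes this filter, so it keeps being
    // chosen -- the failure mode described in the thread.
    static List<String> pickBatchlogEndpoints(List<String> candidates,
                                              Predicate<String> isAlive,
                                              Random rng) {
        List<String> shuffled = new ArrayList<>(candidates);
        Collections.shuffle(shuffled, rng);
        List<String> chosen = new ArrayList<>(2);
        for (String endpoint : shuffled) {
            if (isAlive.test(endpoint)) {
                chosen.add(endpoint);
                if (chosen.size() == 2) break;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4");
        // "10.0.0.3" is down in gossip and gets filtered; a merely slow node
        // passes the liveness check and remains eligible every time.
        List<String> chosen = pickBatchlogEndpoints(nodes,
                e -> !e.equals("10.0.0.3"), new Random(42));
        System.out.println(chosen);
    }
}
```

The sketch makes the point concrete: the only exclusion criterion is the liveness check, so latency never factors into the choice.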

This behavior causes Cassandra to move from a multi-node fault-tolerant system 
to a collection of single points of failure.

We try to take administrative action to kill off the extremely slow nodes, but it would 
be great to have some notion of "which node is a bad choice" when writing logged 
batches to replica nodes.





Re: [DISCUSSION] Cassandra's code style and source code analysis

2022-12-15 Thread Mick Semb Wever
>
> Another angle I forgot to mention is that this is quite a big patch and
> there are quite big pieces of work coming, such as CEP-15. So I am trying
> to figure out whether we are OK to just merge this work first, meaning the
> devs doing CEP-15 will need to rework their imports, or we merge it after
> them and fix their stuff ourselves. I do not know which is preferable.
>


Thank you for bringing this point up Stefan.

I would actively reach out to all those engaged with current CEPs,
asking them about the rebase impact this would cause and whether they are OK
with it. The CEPs are our priority, and we have a significant number of them
in progress compared to anything we've had for many years.


Re: Weird results

2022-12-15 Thread Paulo Motta
I recently came across this issue. I was able to fix it with "ant
realclean" + "ant build", indicating it's probably leftover configuration
from a previous test. We probably need to find which tests with a
non-default partitioner are not being properly cleaned up.
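One way to track down the offender is to snapshot the global setting around each test and flag the tests that change it without restoring it. The sketch below is a hypothetical harness, not Cassandra code; the static `partitioner` field stands in for the leaked state:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeakFinderSketch {
    // Hypothetical global setting standing in for the partitioner that is
    // suspected of leaking between tests.
    static String partitioner = "Murmur3Partitioner";

    // Run each named test and report the ones that change the global setting
    // without restoring it -- those are the tests that need cleanup.
    static List<String> findLeakingTests(Map<String, Runnable> tests) {
        List<String> leakers = new ArrayList<>();
        for (Map.Entry<String, Runnable> e : tests.entrySet()) {
            String before = partitioner;
            e.getValue().run();
            if (!partitioner.equals(before)) {
                leakers.add(e.getKey());
                partitioner = before; // restore so later tests start clean
            }
        }
        return leakers;
    }

    public static void main(String[] args) {
        Map<String, Runnable> tests = new LinkedHashMap<>();
        tests.put("testClean", () -> {});
        tests.put("testLeaky", () -> partitioner = "ByteOrderedPartitioner");
        System.out.println(findLeakingTests(tests)); // prints [testLeaky]
    }
}
```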

On Thu, Dec 15, 2022 at 04:39, Claude Warren, Jr via dev <
dev@cassandra.apache.org> wrote:

> I am working on a StandaloneDowngrader.java based on
> StandaloneUpgrader.java
> [...]
>