Re: Single node slowing down queries in a large cluster

2021-10-17 Thread Jeff Jirsa
Internode speculative retry is on by default with p99. The client-side retry varies by driver / client.
> On Oct 17, 2021, at 1:59 PM, S G wrote:
>
> "The harder thing to solve is a bad coordinator node slowing down all reads coordinated by that node"
> I think this is the root of the

Re: Single node slowing down queries in a large cluster

2021-10-17 Thread S G
Also, for the percentile-based speculative retry, how long a time period is used to calculate the percentile? If it is only a few seconds, then the latency will increase very quickly when server performance degrades. But if it is up to a few minutes (or it is configurable), then its percentile wil

Re: Single node slowing down queries in a large cluster

2021-10-17 Thread S G
"The harder thing to solve is a bad coordinator node slowing down all reads coordinated by that node" I think this is the root of the problem, and since all nodes act as coordinator nodes, it is guaranteed that if any one node slows down (high GC, segment merging, etc.), it will slow down 1/N queries in

Re: Single node slowing down queries in a large cluster

2021-10-13 Thread Jeff Jirsa
Some random notes, not necessarily going to help you, but: - You probably have vnodes enabled, which means one bad node is PROBABLY a replica of almost every other node, so the fanout here is worse than it should be, and - You probably have speculative retry on the table set to a percentile. As the

Single node slowing down queries in a large cluster

2021-10-13 Thread S G
Hello, We have frequently seen that a single bad node running slow can affect the latencies of the entire cluster (especially for queries where the slow node was acting as a coordinator). Is there any suggestion to avoid this behavior? Like something on the client side to not query that bad nod