At the moment, when a read error, such as an unrecoverable bit error or
data corruption, occurs in the SSTable data files, manual (or, to be
precise, external) intervention is required to recover from the error,
regardless of the disk_failure_policy configuration.
Commonly, there are two approaches to
For this to be safe, my understanding is that:
– A repair of the affected range would need to be completed among the
replicas without such corruption (including paxos repair).
– And we'd need a mechanism to execute repair on the affected node without it
being available to respond to queries, eith
Realized I'm somewhat mistaken here:
The repair of the surviving replicas would be necessary for correctness before
the node with deleted data files is able to serve client/internode reads.
But the repair of the node with deleted data files prior to being brought back
into the cluster is more
> – A repair of the affected range would need to be completed among
> the replicas without such corruption (including paxos repair).
It can be safe without a repair by over-streaming the data from more (or
all) available replicas, either within the DC (when LOCAL_* CL is used)
or across the
On Wed, Mar 8, 2023 at 5:25 AM Bowen Song via dev wrote:
> At the moment, when a read error, such as unrecoverable bit error or data
> corruption, occurs in the SSTable data files, regardless of the
> disk_failure_policy configuration, manual (or to be precise, external)
> intervention is require
Link to the next episode:
https://drive.google.com/file/d/1_EOBpG3yiuptDJ-PU-3a7amSVvi7pgM8/view?usp=sharing
s2Ep2 - Aaron Morton
(You may have to download it to listen)
It will remain in staging for 72 hours, going live (assuming no objections)
by Saturday, March 11th (22:00 UTC).
If anyone sh