[
https://issues.apache.org/jira/browse/KAFKA-10800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329364#comment-17329364
]
Jose Armando Garcia Sancio commented on KAFKA-10800:
----------------------------------------------------
1) What does "the state machine" mean here? I assume it's the KafkaRaftClient?
And "attempts to create a snapshot writer", I assume this refers to
`log.createSnapshot(snapshotId)`?
Sorry, by state machine, I mean users of `interface RaftClient`. This basically
means snapshots created through `SnapshotWriter`.
In general there are two ways of creating a snapshot. One is by the state
machine through `RaftClient::createSnapshot` and `SnapshotWriter`. Another way
is by the `KafkaRaftClient` itself downloading the snapshot from the quorum
leader. In the second case we want to trust the leader's snapshot and not
perform the validation described in this issue.
2) "The end offset and epoch of the snapshot is less than the high-watermark",
does the "high-watermark" refer to the leader's high watermark or the
follower's high watermark? If it is the former, shouldn't it be the leader's
responsibility to satisfy this? If it's the latter, then I think the snapshotId
can actually be larger than the follower's own high watermark: say the follower
has lagged too much, and its high watermark == its logEndOffset, which is
smaller than the leader's logStartOffset. In this case, the follower's high
watermark will be updated to the snapshotId's endOffset when the snapshot
fetching has completed. Did I miss anything?
See my answer to 1), but in this issue we are only concerned with snapshots
created locally by either the leader or the follower. Note that both the leader
and the followers are responsible for creating snapshots based on the state of
the local log. Having said that, high watermark means the local high watermark;
this is the high watermark reported by the quorum state object.
3) "validation should not be performed when the raft client creates the
snapshot writer ", if my assumption in Question 1) is correct, then this seems
to be in conflict with 1)
The KafkaRaftClient can download a snapshot from the leader when it is too far
behind. In this case, those snapshots don't need to be validated against the
local quorum state and the local log, because when KafkaRaftClient downloads a
snapshot from the leader the snapshotId will always be greater than the local
LEO (and high-watermark). Instead, the KafkaRaftClient will write the snapshot
to local disk, fully truncate the local log, and update the high watermark
accordingly.
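To make the distinction between the two paths concrete, here is a rough Java
sketch of the check described in this issue. It is not the actual Kafka code:
`SnapshotId`, `Origin`, `EpochCache` and `mayCreateWriter` are hypothetical
names made up for illustration, and the sketch assumes the end offset is
exclusive and that the snapshot epoch is the epoch of the last record included
in the snapshot.

```java
import java.util.Map;
import java.util.TreeMap;

public class SnapshotIdValidation {
    /** Hypothetical snapshot id: exclusive end offset plus epoch of the last included record. */
    static final class SnapshotId {
        final long endOffset;
        final int epoch;

        SnapshotId(long endOffset, int epoch) {
            this.endOffset = endOffset;
            this.epoch = epoch;
        }
    }

    /** Who is asking for the snapshot writer (hypothetical, for illustration only). */
    enum Origin { STATE_MACHINE, LEADER_DOWNLOAD }

    /** Hypothetical stand-in for the leader epoch cache: epoch -> first offset written in it. */
    static final class EpochCache {
        private final TreeMap<Integer, Long> startOffsets = new TreeMap<>();

        void assign(int epoch, long startOffset) {
            startOffsets.put(epoch, startOffset);
        }

        /** True if the last record covered by the id (endOffset - 1) was written in id.epoch. */
        boolean contains(SnapshotId id) {
            Long start = startOffsets.get(id.epoch);
            if (start == null || start >= id.endOffset) {
                return false; // unknown epoch, or the epoch begins at or after the end offset
            }
            Map.Entry<Integer, Long> next = startOffsets.higherEntry(id.epoch);
            // The id's epoch must still be the active one at offset endOffset - 1.
            return next == null || id.endOffset <= next.getValue();
        }
    }

    static boolean mayCreateWriter(SnapshotId id, Origin origin, long highWatermark, EpochCache cache) {
        if (origin == Origin.LEADER_DOWNLOAD) {
            // The local log is out of date here, so the leader's snapshot id is trusted.
            return true;
        }
        // The issue says "less than the high-watermark"; this sketch takes that literally.
        return id.endOffset < highWatermark && cache.contains(id);
    }
}
```

With epoch 1 starting at offset 0, epoch 2 starting at offset 10 and a local
high watermark of 20, a state machine snapshot at (endOffset=10, epoch=1) would
pass, while (endOffset=12, epoch=1) would be rejected because offset 11 belongs
to epoch 2. The same ids would be accepted unconditionally on the download path.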
> Validate the snapshot id when the state machine creates a snapshot
> ------------------------------------------------------------------
>
> Key: KAFKA-10800
> URL: https://issues.apache.org/jira/browse/KAFKA-10800
> Project: Kafka
> Issue Type: Sub-task
> Components: replication
> Reporter: Jose Armando Garcia Sancio
> Assignee: Haoran Xuan
> Priority: Major
>
> When the state machine attempts to create a snapshot writer we should
> validate that the following is true:
> # The end offset and epoch of the snapshot is less than the high-watermark.
> # The end offset and epoch of the snapshot is valid based on the leader
> epoch cache.
> Note that this validation should not be performed when the raft client
> creates the snapshot writer because in that case the local log is out of date
> and the follower should trust the snapshot id sent by the partition leader.