Hi Alyssa,

Thanks for the replies and questions.

RE AH8: Yes, that is correct.

RE AH9: When I say "before starting kafka," I am referring to invoking the
entry point in `Kafka.scala`. I can reword this sentence to make it clearer.

RE AH10: Yes, the broker will never send a broker registration request with
an empty cluster id, because all in-memory readers of the cluster id on the
broker can block until the cluster id is known (either from their own
meta.properties, which is assumed to be correct if it exists, or from
fetching the ClusterIdRecord).
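
To illustrate the blocking-reader idea, here is a rough sketch (class and
method names are hypothetical, not from the KIP or the actual
implementation): the in-memory cluster id sits behind a latch that readers
wait on, so nothing can observe an empty value.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a holder that lets broker components block until the
// cluster id is known, either from meta.properties at startup or from a
// committed ClusterIdRecord observed later.
class ClusterIdHolder {
    private final AtomicReference<String> clusterId = new AtomicReference<>();
    private final CountDownLatch known = new CountDownLatch(1);

    // Set once: from meta.properties if present, otherwise when the
    // ClusterIdRecord is discovered. A conflicting id is a fatal error.
    void set(String id) {
        if (clusterId.compareAndSet(null, id)) {
            known.countDown();
        } else if (!clusterId.get().equals(id)) {
            throw new IllegalStateException(
                "Conflicting cluster ids: " + clusterId.get() + " vs " + id);
        }
    }

    // Readers block here until the cluster id is known, so they can never
    // observe an empty value.
    String await(long timeoutMs) {
        try {
            if (!known.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Timed out waiting for cluster id");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted waiting for cluster id", e);
        }
        return clusterId.get();
    }
}
```

Any component that needs the cluster id (e.g. when building a broker
registration request) would go through the blocking read, which is why the
request can never carry an empty cluster id.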

RE AH11: I am planning to add a docs change as part of this KIP's
implementation to explain this.

Best,
Kevin Wu

On Fri, Apr 3, 2026 at 5:53 PM Alyssa Huang via dev <[email protected]>
wrote:

> Hey Kevin,
>
> Some questions about your revisions:
>
> AH8:
>
> > This means that during formatting, the bootstrap ClusterIdRecord is only
> > written if the node is formatted with a MV that supports this feature.
> When
> > a node runs kafka-storage format...
>
> I want to make sure I understand this properly. Formatting (i.e. operators
> calling kafka-storage format) is no longer necessary for the basic use case
> (no SCRAM, non default features, etc). And this sentence is just saying, if
> you still choose to explicitly format, then "the bootstrap ClusterIdRecord
> is only written if the node is formatted with a MV that supports this
> feature." And if you don't explicitly format the cluster, as long as the
> cluster's MV supports auto-formatting, the first elected KRaft leader will
> write the `ClusterIdRecord` if it does not yet exist in the metadata log.
>
> AH9:
>
> > *Remove the requirement of nodes to format before starting kafka*
>
> Might it make sense to reword this to *Remove the requirement of manually
> formatting nodes*? "Before starting Kafka" is still tripping me up - I can
> interpret it as "no need to format at all" or "can format after kafka
> starts".
>
> AH10:
>
> > the KRaft leader (clusterid = Y) must either receive a request without
> > clusterid, or a request whose clusterid is Y. The broker fulfills neither
> > of these conditions.
>
> Are you saying the broker would never send a request w/ an empty
> clusterId? E.g. in their BrokerRegistrationRequests? Why would that be the
> case?
>
> AH11:
> It would be helpful to have a summary of what the provisioning and startup
> flow would look like with full auto-formatting w/ respect to controllers,
> observers, and brokers. Most of the details are in the KIP but just touched
> on in many different places.
>
> I like the new design though, thanks for the changes!
> Alyssa
>
>
>
> On Wed, Apr 1, 2026 at 10:58 AM Kevin Wu <[email protected]> wrote:
>
> > Hi Jun,
> >
> > Sounds good. I will make an explicit section to document when formatting
> is
> > still required. Thanks again for the feedback and questions.
> >
> > Best,
> > Kevin Wu
> >
> > On Wed, Apr 1, 2026 at 12:02 PM Jun Rao via dev <[email protected]>
> > wrote:
> >
> > > Hi, Kevin,
> > >
> > > Thanks for the explanation. I guess the special thing with dynamic
> > > deployment is the need to write the VotersRecord. We can keep the
> design
> > in
> > > the KIP. Could you document when formatting is still required in the
> KIP?
> > >
> > > Jun
> > >
> > > On Tue, Mar 31, 2026 at 2:49 PM Kevin Wu <[email protected]>
> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > RE JR12:
> > > > Dynamic quorums could technically avoid requiring formatting, but I
> > > > fear it would cause cluster misconfigurations more easily than the
> > > > static quorum case.
> > > > The main difference between the two quorum deployments is that static
> > > > quorum cannot recover from data loss on a voter, whereas dynamic
> quorum
> > > > can. Below is my reasoning for why maintaining this requirement for
> > > dynamic
> > > > clusters is better for Kafka operators given that formatting each
> > node's
> > > > disk is currently required.
> > > >
> > > > The KRaft voter set in both static and dynamic deployments is a piece
> > of
> > > > bootstrapping data that requires some orchestration to manage.
> > Currently,
> > > > the static voter set is managed by the supplier of a `.properties`
> file
> > > > which contains `controller.quorum.voters`. In the static quorum case,
> > the
> > > > requirements from kafka on an orchestration layer to safely manage
> the
> > > > value of `controller.quorum.voters` config is pretty straightforward:
> > > > supply the same value on all nodes all the time forever. I think it
> is
> > > more
> > > > obvious to operators, without needing to know too much about how
> KRaft
> > > > works, that having different values for `controller.quorum.voters` is
> > > > incorrect and unsafe.
> > > >
> > > > The dynamic voter set's contents are initially managed by the caller
> > > > of `kafka-storage format`, and then by KRaft itself. I will just
> > > > focus on what a standalone dynamic controller deployment would look
> > like
> > > > without formatting, but bootstrapping a dynamic quorum with multiple
> > > > controllers is unsafe for the same reasons. In order to remove the
> > > > formatting requirement for dynamic clusters, we could imagine having
> > > > something like `controller.quorum.standalone.enabled`. When that
> config
> > > is
> > > > defined during startup, kafka writes the bootstrapping VotersRecord
> and
> > > > KRaftVersion that would be done during formatting. However, the
> > > > requirements from kafka on an orchestration layer to safely manage
> this
> > > > "standalone" config are more complicated than the static quorum case.
> > > They
> > > > are also not obvious without prior knowledge of bootstrapping
> quorums.
> > > This
> > > > config should only be set on one node whenever the cluster has not
> been
> > > > "bootstrapped," and it cannot be present on any nodes during startup
> if
> > > the
> > > > cluster has already been "bootstrapped." If this config is present
> > after
> > > > the cluster has a voter set, it can result in multiple KRaft leaders
> > if a
> > > > node with the standalone config defined experiences data loss and
> tries
> > > to
> > > > restart.
> > > >
> > > > This "unsafeness" WRT dynamic quorum also applies to `kafka-storage
> > > > format`, but that CLI does not bootstrap with dynamic quorum by
> > > > default, as
> > > > the user has to specify one of `--standalone`,
> `--initial-controllers`,
> > > or
> > > > `--no-initial-controllers` when `controller.quorum.voters` is not
> > > defined.
> > > > Maybe that is a sufficient argument that it is okay to introduce the
> > > > `controller.quorum.standalone.enabled` static config as part of this
> > KIP.
> > > > Interested to know what you think about this.
> > > >
> > > > Best,
> > > > Kevin Wu
> > > >
> > > > On Tue, Mar 31, 2026 at 11:34 AM Jun Rao via dev <
> [email protected]
> > >
> > > > wrote:
> > > >
> > > > > Hi, Kevin,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > JR12. https://kafka.apache.org/42/operations/kraft/ specifies two
> > > > > deployment methods for KRaft: static and dynamic. Are you saying
> that
> > > > > dynamic still requires formatting while static doesn't? Could you
> > > explain
> > > > > why there is a difference? BTW, which method do we recommend?
> > > > >
> > > > > Jun
> > > > >
> > > > > On Mon, Mar 30, 2026 at 9:03 AM Kevin Wu <[email protected]>
> > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > RE JR10: Yes, I will update the KIP to reflect that.
> > > > > >
> > > > > > RE JR11: Yeah, I think it is fine to write V2.
> > > > > >
> > > > > > RE JR 12: Is this in reference to KIP-853: Dynamic Quorum
> > > > > Reconfiguration?
> > > > > > If so, see: "However, operators still have the option to format
> > nodes
> > > > to
> > > > > > set the MV, feature versions, scram credentials, or to properly
> > > > > provision a
> > > > > > kraft.version=1 cluster." In order to bootstrap any dynamic
> quorum
> > > > (i.e.
> > > > > > kraft.version=1) with an initial voter set, it is required to
> > format
> > > a
> > > > > > controller(s) with either `--standalone` or
> `--initial-controllers`
> > > so
> > > > > that
> > > > > > a KRaft VotersRecord is part of the 0-0.checkpoint. Formatting
> > > > > controllers
> > > > > > is still needed if you want to specify a non-default feature
> level
> > or
> > > > > > metadata version, and kraft.version=1 would be a "non-default"
> > KRaft
> > > > > > version (mainly because it is not correct without formatting,
> > > described
> > > > > > below).
> > > > > >
> > > > > > I'm not sure if removing this formatting requirement for new
> > KIP-853
> > > > > > clusters is in-scope for this KIP. The main issue with this is:
> How
> > > > does
> > > > > a
> > > > > > node know it can safely write a "bootstrapping" 0-0.checkpoint
> with
> > > the
> > > > > > KRaft VotersRecord on startup of the kafka process without
> knowing
> > > any
> > > > > > state of the cluster? This can lead to split-brain when a node
> > writes
> > > > > this
> > > > > > for a cluster that has already elected a leader. Currently, the
> > caller
> > > > of
> > > > > > the kafka-storage format command is responsible for writing this
> > > > exactly
> > > > > > once for the lifetime of the cluster.
> > > > > >
> > > > > > Operators still have the option of starting kafka without
> > formatting,
> > > > and
> > > > > > then upgrading the kraft version to kraft.version=1. This path
> > allows
> > > > > for a
> > > > > > dynamic quorum without formatting the cluster.
> > > > > >
> > > > > > Thanks,
> > > > > > Kevin Wu
> > > > > >
> > > > > > On Fri, Mar 27, 2026 at 4:20 PM Jun Rao via dev <
> > > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi, Kevin,
> > > > > > >
> > > > > > > Thanks for the updated KIP. It's better if we can remove the
> > > > formatting
> > > > > > > requirements for all nodes.
> > > > > > >
> > > > > > > JR10. "The reason for this KIP is to remove the requirement of
> > > > brokers
> > > > > > > needing to run kafka-storage format  before starting Kafka."
> > > > > > > Should we change brokers to nodes?
> > > > > > >
> > > > > > > JR11. "When --cluster-id  is specified, the formatter writes
> > > > > > > meta.properties  V1."
> > > > > > > It's a bit weird for the new code to write in V1 format. Could
> it
> > > > write
> > > > > > in
> > > > > > > V2 format?
> > > > > > >
> > > > > > > JR12. Without formatting, is it true that one can only
> bootstrap
> > a
> > > > > > > standalone controller? In other words, does bootstrapping with
> > > > multiple
> > > > > > > controllers still require formatting?
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Thu, Mar 19, 2026 at 1:39 AM Kevin Wu <
> [email protected]
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi José,
> > > > > > > >
> > > > > > > > Thanks for the replies and questions.
> > > > > > > >
> > > > > > > > RE JS1: "Can you clarify that this KIP removes the need for
> all
> > > > Kafka
> > > > > > > nodes
> > > > > > > > to be formatted prior to starting Kafka." Hmmm, I guess in
> the
> > > > static
> > > > > > > > cluster case that skips formatting having a newer software
> > > version
> > > > +
> > > > > > > older
> > > > > > > > MV is not a possible case, so I will remove that mention from
> > the
> > > > > KIP.
> > > > > > We
> > > > > > > > should default to the latest MV if we skip formatting, which
> > will
> > > > > > support
> > > > > > > > writing a ClusterIdRecord.
> > > > > > > >
> > > > > > > > Right now, it is not completely clear to me how we can allow
> > > > > bootstrap
> > > > > > > > controllers (this applies mainly for kraft.version=0, since
> > > > > > > kraft.version=1
> > > > > > > > cannot elect a leader without proper formatting) to also skip
> > > > > > formatting.
> > > > > > > > That is why I said in the proposed changes: "*Remove the
> > > > requirement
> > > > > of
> > > > > > > > brokers and observer controllers to format before starting
> > > > kafka"*. I
> > > > > > > agree
> > > > > > > > that KRaft can still elect a leader without clusterId in this
> > > case,
> > > > > but
> > > > > > > I'm
> > > > > > > > not completely sure how a QuorumController with an "empty"
> > > > clusterId
> > > > > > > which
> > > > > > > > needs to be set later, should behave. My working idea is
> > detailed
> > > > in
> > > > > RE
> > > > > > > > JS6. This is required because the active controller needs to
> > > > > generate a
> > > > > > > > clusterId and write it back to KRaft upon activation in order
> > for
> > > > the
> > > > > > > > committed `ClusterIdRecord` to appear in records passed to
> > > > > > > > `RaftListener#handleCommit()`, so we cannot block its
> > > > initialization.
> > > > > > > > Keeping the assumption that QuorumController.clusterId is
> final
> > > and
> > > > > > > > non-null would be nice, but that requires all KRaft voters to
> > > > format
> > > > > > > with a
> > > > > > > > cluster.id. Let me know what you think about the best way to
> > > > remove
> > > > > > this
> > > > > > > > requirement.
> > > > > > > >
> > > > > > > > RE JS2: My plan was to continue to write meta.properties V1
> > > during
> > > > > > > > formatting with a `cluster.id` field like today, but also
> > write
> > > a
> > > > > > > > `ClusterIdRecord` to the bootstrap snapshot for redundancy if
> > the
> > > > MV
> > > > > > > > supports it (I'm not sure if kafka is expected to handle only
> > > > partial
> > > > > > log
> > > > > > > > directory corruption/destruction). If the "bootstrap
> controller
> > > > > cluster
> > > > > > > id
> > > > > > > > check" from JS4 is correct, then the initial active
> controller
> > is
> > > > > > > > guaranteed to have a non-null `cluster.id` in
> meta.properties.
> > > So
> > > > > long
> > > > > > > as
> > > > > > > > the MV supports it, the active controller would then write
> > > > > > > ClusterIdRecord
> > > > > > > > as part of the bootstrap records.
> > > > > > > >
> > > > > > > > RE JS3: When I said this, I meant that the restriction of
> > waiting
> > > > for
> > > > > > the
> > > > > > > > discovery of cluster.id to persist it to meta.properties
> > during
> > > > > broker
> > > > > > > > startup is no more restrictive than what already currently
> > > exists,
> > > > > > which
> > > > > > > is
> > > > > > > > being caught up to the HWM in order to register with the
> active
> > > > > > > controller.
> > > > > > > >
> > > > > > > > RE JS 4: Yeah, I thought about this, specifically around the
> > > > > > > > kraft.version=1 case since it is less straightforward what a
> > > > > "bootstrap
> > > > > > > > controller" is. Under the current design, in kraft.version=0,
> > any
> > > > > node
> > > > > > > who
> > > > > > > > is part of the `controller.quorum.voters` config must have
> > > > > > > > `meta.properties` with `cluster.id`. In kraft.version=1, any
> > > node
> > > > > who
> > > > > > > has
> > > > > > > > a
> > > > > > > > `0-0.checkpoint` is considered a "bootstrap controller." This
> > is
> > > a
> > > > > > > > heuristic, but I believe it is correct, since in order for
> the
> > > > > > > > 0-0.checkpoint to not exist on a node which formatted with
> > > > > --standalone
> > > > > > > or
> > > > > > > > --initial-controllers, there must have either been another
> > > > checkpoint
> > > > > > > with
> > > > > > > > committed records, which imply an elected initial leader, or
> a
> > > disk
> > > > > > loss.
> > > > > > > > Whenever a voter with id X and initial directory-id A comes
> > back
> > > as
> > > > > (X,
> > > > > > > B),
> > > > > > > > this process incarnation is an observer from the perspective
> of
> > > > > KRaft,
> > > > > > > and
> > > > > > > > I think we can assume it has neither `meta.properties` nor
> > > > > > > `0-0.checkpoint`
> > > > > > > > if the operator did not format it (assumption from RE JS2
> about
> > > the
> > > > > > kinds
> > > > > > > > of storage failures we expect to handle are not partial
> > directory
> > > > > > > > failures). In this case, the "bootstrap controller" check
> does
> > > not
> > > > > > apply
> > > > > > > to
> > > > > > > > (X, B), and if auto-join is enabled, it will follow the steps
> > > > > detailed
> > > > > > in
> > > > > > > > RE JS5 to recover and rejoin the voter set. If we remove the
> > > > > > requirement
> > > > > > > on
> > > > > > > > all nodes to format, then we would not need to implement
> these
> > > > > checks.
> > > > > > > >
> > > > > > > > RE JS5: An observer without clusterId who can auto-join will
> > > fetch
> > > > > > until
> > > > > > > > its KafkaRaftClient updates the cluster id in-memory
> > (basically,
> > > > > > > auto-join
> > > > > > > > is off until it discovers the leader's clusterId). If the
> > > observer
> > > > > has
> > > > > > > > clusterId, it needs to match the leader's to perform a
> > successful
> > > > > > fetch,
> > > > > > > > which is required for successfully adding a voter via
> > auto-join.
> > > > > > > >
> > > > > > > > RE JS6: Apologies, I meant to say a MetadataPublisher
> > registered
> > > to
> > > > > the
> > > > > > > > MetadataLoader. Although, looking at this again, maybe this
> > > > > discovery +
> > > > > > > > persistence of clusterId can be handled by a new RaftListener
> > > > > instead.
> > > > > > I
> > > > > > > > don't think we need the overhead of the MetadataImage +
> > > > MetadataDelta
> > > > > > for
> > > > > > > > this feature since a RaftListener's `handleCommit()` and
> > > > > > > > `handleLoadSnapshot()` contain `ClusterIdRecord`. However,
> this
> > > > means
> > > > > > > > needing a third listener besides the MetadataLoader and
> > > > > > > QuorumMetaListener,
> > > > > > > > and therefore an additional call to log#read() when handling
> > > KRaft
> > > > > > > commits
> > > > > > > > + snapshots. From my reading, it seems like the Kafka log
> layer
> > > > does
> > > > > > not
> > > > > > > > attempt any caching, and instead we rely on the OS page
> cache.
> > > > > Because
> > > > > > of
> > > > > > > > this, I think we should be using MetadataPublisher, but let
> me
> > > know
> > > > > > what
> > > > > > > > you think.
> > > > > > > >
> > > > > > > > I am thinking of using an AtomicReference<String> to
> represent
> > > the
> > > > > > > > clusterId in-memory. This RaftListener/MetadataPublisher will
> > be
> > > > the
> > > > > > only
> > > > > > > > writer to this value if it is not already defined by
> > > > meta.properties,
> > > > > > but
> > > > > > > > there are many readers of this value. The initial value of
> this
> > > > > > reference
> > > > > > > > is null or the cluster.id from meta.properties. Upon reading
> > > > > > > > `ClusterIdRecord`, the listener will throw an exception if it
> > > has a
> > > > > > > > non-null clusterId and reads a ClusterIdRecord with a
> different
> > > ID.
> > > > > If
> > > > > > it
> > > > > > > > does not have cluster.id set and reads a ClusterIdRecord, it
> > > will
> > > > > > update
> > > > > > > > the AtomicReference and persist cluster.id to
> meta.properties.
> > > Let
> > > > > me
> > > > > > > know
> > > > > > > > if this approach sounds reasonable to you.
> > > > > > > >
> > > > > > > > RE JS7: From what I understand about MetaPropertiesEnsemble
> and
> > > its
> > > > > > > > verify() method, I think it is reasonable to say our
> > > > > > > > RaftListener/MetadataPublisher will know how many (if any)
> > > > > > > > `meta.properties` files it is responsible for persisting
> > > > cluster.id
> > > > > to
> > > > > > > > during the current process incarnation when it starts up.
> > > Currently
> > > > > we
> > > > > > > only
> > > > > > > > validate the MetaPropertiesEnsemble in two places: during
> > > > formatting,
> > > > > > and
> > > > > > > > during node startup. From what I understand, scenarios 1 and
> 2
> > > > should
> > > > > > > only
> > > > > > > > occur alongside a restart of the kafka process (to generate a
> > new
> > > > > > > > directory-id and/or update log.dirs), but please correct me
> if
> > > this
> > > > > > > > assumption is wrong. I'm not sure if scenario 3 is referring
> > to a
> > > > > > partial
> > > > > > > > write of a given meta.properties (i.e. it does not contain
> > > > > cluster.id
> > > > > > ),
> > > > > > > or
> > > > > > > > not writing the discovered cluster.id to all meta.properties
> > > files
> > > > > on
> > > > > > > the
> > > > > > > > node before a crash. If a meta.properties does not exist in a
> > > > > > > log/metadata
> > > > > > > > log directory during startup, we need to write a V2 one
> > without a
> > > > > > > > cluster.id,
> > > > > > > > but we would be aware of this. If we succeed writing
> > cluster.id
> > > to
> > > > > at
> > > > > > > > least
> > > > > > > > one meta.properties via the ClusterIdRecord, I believe it is
> > safe
> > > > to
> > > > > > > write
> > > > > > > > that same value to the other meta.properties upon restart if
> > they
> > > > > exist
> > > > > > > > because cluster.id does not change.
> > > > > > > >
> > > > > > > > I may have previously removed this from the KIP, but given
> this
> > > > > > > discussion,
> > > > > > > > I believe it is only safe to update the in-memory cluster.id
> > > > > > > > after writing this to all meta.properties on a node.
> > > > > > > >
> > > > > > > > RE JS8: Okay, maybe I will just rewrite the section. My point
> > was
> > > > to
> > > > > > say
> > > > > > > > something like: a node's discovery of the leader's committed
> > > > > > cluster.id
> > > > > > > > relies on the discovery of a HWM and our
> > > > > RaftListener/MetadataPublisher
> > > > > > > to
> > > > > > > > be registered with the raft client, and that we need to wait
> > for
> > > > > these
> > > > > > > > things before the startup logic in Controller/BrokerServer
> > > > executes.
> > > > > > > > However, if our listener does not see the ClusterIdRecord in
> > > > > > > `handleCommit`
> > > > > > > > or `handleLoadSnapshot`, it can't do anything meaningful, so
> it
> > > is
> > > > > more
> > > > > > > > accurate to say we need to wait until ClusterIdRecord is
> > > committed.
> > > > > > > >
> > > > > > > > On Thu, Mar 19, 2026 at 12:57 AM José Armando García Sancio
> via
> > > > dev <
> > > > > > > > [email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Kevin, Thanks for the KIP and excuse my delayed response.
> > > > > > > > >
> > > > > > > > > JS1: Can you clarify that this KIP removes the need for all
> > > Kafka
> > > > > > > > > nodes to be formatted prior to starting Kafka. However, this
> > > > doesn't
> > > > > > > > > prevent users from formatting their broker with a cluster
> ID
> > if
> > > > > they
> > > > > > > > > prefer. This is especially needed for Kafka nodes formatted
> > > for a
> > > > > > > > > cluster using an MV that doesn't support this feature.
> > > > > > > > >
> > > > > > > > > JS2: How are you planning to implement "kafka-storage
> format
> > > > > > > > > --clusterid YYY --standalone"? Is that going to behave like
> > it
> > > > does
> > > > > > > > > today by writing the cluster id to the meta.properties
> files?
> > > Or
> > > > > are
> > > > > > > > > you planning to write the cluster id using the
> > ClusterIdRecord
> > > to
> > > > > the
> > > > > > > > > bootstrap.checkpoint or 0-0.checkpoint (after KIP-1170)?
> > > > > > > > >
> > > > > > > > > JS3: In one of your replies you say "Discovering the
> cluster
> > id
> > > > > value
> > > > > > > > > for the first time would only require a single
> FetchSnapshot
> > > or a
> > > > > > > > > Fetch of the bootstrap metadata records." This is not
> > entirely
> > > > > > > > > accurate. The best we can say is that brokers need to catch
> > up
> > > to
> > > > > the
> > > > > > > > > HWM before they can send registration requests to the
> > active
> > > > > > > > > controller or start a few internal components. However,
> However,
> > > the
> > > > > > > > > broker already had this requirement prior to this KIP, so
> it
> > is
> > > > not
> > > > > > > > > new.
> > > > > > > > >
> > > > > > > > > JS4: In the KIP you mention "if meta.properties does not
> > exist
> > > > and
> > > > > > the
> > > > > > > > > node is a bootstrap controller, throw a runtime exception."
> > Can
> > > > you
> > > > > > > > > explain how you plan to implement this? One important
> aspect
> > to
> > > > > > > > > consider is that in KRaft voters (controllers) are
> identified
> > > by
> > > > > the
> > > > > > > > > node ID and directory ID. A node can recover from a disk
> > > failure
> > > > by
> > > > > > > > > coming back with the same node ID but a different directory
> > ID.
> > > > In
> > > > > > > > > this case, the controller should auto-recover if the
> > auto-join
> > > > > > feature
> > > > > > > > > is enabled.
> > > > > > > > >
> > > > > > > > > JS5: In the KIP you mention "One detail here is that
> observer
> > > > > > > > > controllers with auto-join must wait until they have a
> > cluster
> > > id
> > > > > > > > > before trying to add or remove themselves." I understand
> the
> > > > reason
> > > > > > > > > for this requirement. If a node auto-joins the controller
> > > > cluster,
> > > > > > you
> > > > > > > > > must guarantee that it knows the cluster id in case it
> > becomes
> > > > the
> > > > > > > > > leader and needs to write the ClusterIdRecord. Can you
> > > elaborate
> > > > on
> > > > > > > > > your implementation plan?
> > > > > > > > >
> > > > > > > > > JS6: In the KIP you mention "This can be implemented as a
> > > > > > > > > MetadataPublisher that registers to the raft client
> alongside
> > > the
> > > > > > > > > MetadataLoader." Metadata publishers don't register with the
> > > > > > > > > KRaft client. RaftClient.Listeners register with the KRaft
> > > > > > > > > client. Metadata publishers register with the metadata loader
> > > > > > > > > instead.
> > > > > > > > >
> > > > > > > > > JS7: One complexity is that there is a meta.properties per
> > log
> > > > > > > > > directory and metadata log directory. This means that in
> the
> > > > stable
> > > > > > > > > case the cluster ID exists in all the meta.properties
> files.
> > > > > > > > > Unfortunately, this may not be the case for several
> reasons:
> > 1)
> > > > the
> > > > > > > > > disk was replaced, 2) a new disk was added, or 3) the write
> > > > > operation
> > > > > > > > > was only partially successful. How do you plan to handle
> this
> > > > case?
> > > > > > > > > Consider that the controller and the broker can run on the
> > same
> > > > JVM
> > > > > > > > > and use a log directory different from the metadata log
> > > > directory.
> > > > > > > > > Controllers only read and write to the metadata log
> > directory.
> > > > > > > > >
> > > > > > > > > JS8: In the KIP you mention "Learning of a HWM from the
> > leader,
> > > > > which
> > > > > > > > > the leader allows for because it will send valid fetch
> > > responses
> > > > > back
> > > > > > > > > to nodes that do not have a cluster id." One implementation
> > > > > > complexity
> > > > > > > > > is that KRaft can discover the HWM and send a handleCommit
> > > event
> > > > > > > > > without having fetched all data up to the HWM. What KRaft
> > > > > guarantees
> > > > > > > > > is that the active leader will not receive a
> > handleLeaderChange
> > > > > event
> > > > > > > > > until it has caught up to the leader's epoch. How do you
> plan
> > > to
> > > > > > > > > implement this?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > --
> > > > > > > > > -José
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
