Hi Lijun,

Thanks for the update! I'm still not clear on this.

The RemoteLogSegmentMetadataUpdateRecord does not contain the following fields
that RemoteLogSegmentMetadataRecord has:

- StartOffset
- EndOffset (will be added as a tagged field)
- MaxTimestampMs
- SegmentLeaderEpochs
- SegmentSizeInBytes
- TxnIndexEmpty

When a broker is restarted, will it be able to rebuild
the RemoteLogMetadataCache? Assume that there are 10 remote
segments and that, after compaction, the __remote_log_metadata topic
contains only the COPY_SEGMENT_FINISHED events; the COPY_SEGMENT_STARTED
events get compacted away because they share the same key.

Do we need a separate key for the COPY_SEGMENT_STARTED event and another
key for the remaining states?

Current key format: TopicIdPartition:EndOffset:BrokerLeaderEpoch
Proposed key format: TopicIdPartition:EndOffset:BrokerLeaderEpoch:x/y, where
x denotes an identifier for COPY_SEGMENT_STARTED and y denotes an identifier
for all the other events.
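
To make the concern concrete, here is a minimal Python sketch (not Kafka
code) that models log compaction as last-write-wins per key. The record
fields, the dict-based "topic", the sample key value, and the ":started" /
":other" suffixes standing in for x and y are all illustrative assumptions,
not the KIP's actual schema or key encoding.

```python
# Toy model of log compaction: only the latest record per key survives.
# Field names and key suffixes are illustrative assumptions, not the real schema.

def compact(log):
    """Return the records that survive compaction (last write wins per key)."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return latest


def rebuild_segment(compacted):
    """Merge all surviving records into one view of the segment, as a cache might."""
    segment = {}
    for value in compacted.values():
        segment.update(value)
    return segment


started = {"state": "COPY_SEGMENT_STARTED", "startOffset": 0, "endOffset": 99,
           "segmentSizeInBytes": 1024}
finished = {"state": "COPY_SEGMENT_FINISHED", "endOffset": 99}

base_key = "topicIdPartition:99:5"  # TopicIdPartition:EndOffset:BrokerLeaderEpoch

# Current key format: both events share one key, so STARTED is compacted away.
current = compact([(base_key, started), (base_key, finished)])
current_view = rebuild_segment(current)

# Proposed key format: a suffix keeps the STARTED record under its own key.
proposed = compact([(base_key + ":started", started),
                    (base_key + ":other", finished)])
proposed_view = rebuild_segment(proposed)

print("startOffset" in current_view)   # False: the field is lost after compaction
print("startOffset" in proposed_view)  # True: the STARTED record survives
```

With the shared key, the rebuilt view has no startOffset or segment size;
with the suffixed key, both records survive and the full segment metadata
can be reconstructed.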

Thanks,
Kamal



On Tue, Mar 31, 2026 at 8:23 AM Lijun Tong <[email protected]> wrote:

> Hi Kamal,
>
> The scenario you described only happens with the old version of the
> RemoteLogSegmentUpdateMetadata message, since the endOffset will be added
> in the new RemoteLogSegmentUpdateMetadata schema. For the existing
> RemoteLogSegmentUpdateMetadata messages, we rely on the time-based
> retention policy to clean up. Does that make sense?
>
> Best,
> Lijun Tong
>
> Kamal Chandraprakash <[email protected]> wrote on Mon, Mar 30, 2026 at 18:14:
>
> > Hi Lijun,
> >
> > RemoteLogSegmentUpdateMetadata event does not have all the
> > fields/attributes similar to RemoteLogSegmentMetadata event.
> >
> > Assume that after compaction, for a segment, we have only
> > COPY_SEGMENT_FINISHED records. How do you plan to retrieve the other
> > fields after broker restart?
> >
> > Thanks,
> > Kamal
> >
> > On Mon, Mar 30, 2026, 23:22 Lijun Tong <[email protected]> wrote:
> >
> > > Hi Kamal,
> > >
> > > Thanks for taking another look at the KIP.
> > > 1. I have removed the left-over line about using another new topic from
> > > the KIP.
> > > 2.
> > >
> > > > 2. Assume that the topic is enabled with compaction and only one event
> > > > is maintained per segment. If there is a transient error in the remote
> > > > log deletion, then the COPY_SEGMENT started / finished events might be
> > > > compacted by the DELETE_SEGMENT_STARTED events. If the broker is
> > > > restarted during this time, will there be dangling remote log segments?
> > > > Currently, during restart, the broker discards the events if it does
> > > > not see the COPY_SEGMENT_STARTED events.
> > >
> > >
> > > I am glad you asked this question. I didn't mention this part in my
> > > current design to avoid distracting from the main design, but I plan to
> > > add another background thread to clean up all the stale messages by
> > > comparing the message's endOffset with the topic partition's log start
> > > offset. I believe this would help remove all the dangling messages.
> > >
> > > Thanks,
> > > Lijun Tong
> > >
> > > Kamal Chandraprakash <[email protected]> wrote on Sun, Mar 29, 2026 at 22:48:
> > >
> > > > Hi Lijun,
> > > >
> > > > Sorry for the late reply. Went over the KIP again. Overall LGTM. Few
> > > > points:
> > > >
> > > > > This KIP aims to solve this issue through introducing another
> > > > > compacted topic for the brokers to bootstrap the state from
> > > >
> > > > 1. Shall we update the motivation section to mention that another topic
> > > > is not introduced?
> > > > 2. Assume that the topic is enabled with compaction and only one event
> > > > is maintained per segment. If there is a transient error in the remote
> > > > log deletion, then the COPY_SEGMENT started / finished events might be
> > > > compacted by the DELETE_SEGMENT_STARTED events. If the broker is
> > > > restarted during this time, will there be dangling remote log segments?
> > > > Currently, during restart, the broker discards the events if it does
> > > > not see the COPY_SEGMENT_STARTED events.
> > > >
> > > > Thanks,
> > > > Kamal
> > > >
> > > > On Thu, Mar 26, 2026 at 5:08 AM Lijun Tong <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have started a Vote thread for this KIP, considering all questions
> > > > > raised so far have been answered. I am happy to continue the
> > > > > discussion if needed, otherwise, this is a friendly reminder on the
> > > > > vote for this KIP.
> > > > >
> > > > > Thanks,
> > > > > Lijun Tong
> > > > >
> > > > >
> > > > >
> > > > > Lijun Tong <[email protected]> wrote on Mon, Jan 19, 2026 at 17:59:
> > > > >
> > > > > > Hey Kamal,
> > > > > >
> > > > > > Thanks for raising these questions. Here are my responses to your
> > > > > > questions:
> > > > > > Q1 and Q2:
> > > > > > I think both questions boil down to how to release this new
> > > > > > feature; both questions are valid concerns. The solution I have in
> > > > > > mind is that this feature is *gated by the metadata version*. The
> > > > > > new tombstone semantics and the additional fields (for example in
> > > > > > RemoteLogSegmentUpdateRecord) are only enabled once the cluster
> > > > > > metadata version is upgraded to the version that introduces this
> > > > > > feature. As long as the cluster metadata version is not bumped, the
> > > > > > system will not produce tombstone records. Therefore, during
> > > > > > rolling upgrades (mixed 4.2/4.3 brokers), the feature remains
> > > > > > effectively disabled. Tombstones will only start being produced
> > > > > > after the metadata version is upgraded, at which point all brokers
> > > > > > are already required to support the new behavior.
> > > > > >
> > > > > > Since Kafka does not support metadata version downgrades at the
> > > > > > moment, once a metadata version that supports this feature is
> > > > > > enabled, it cannot be downgraded to a version that does not support
> > > > > > it. I will add these details to the KIP later.
> > > > > > Q3. This is an *editing mistake* in the KIP. Thanks for pointing it
> > > > > > out; the enum value has already been corrected in the latest
> > > > > > revision to remove the unused placeholder and keep the state values
> > > > > > contiguous and consistent.
> > > > > > Q4. I don't foresee the quota mechanism interfering with the state
> > > > > > transition in any way so far; let me know if you have any concerns.
> > > > > >
> > > > > > Thanks,
> > > > > > Lijun
> > > > > >
> > > > > > Kamal Chandraprakash <[email protected]> wrote on Sun, Jan 18, 2026 at 00:40:
> > > > > >
> > > > > >> Hi Lijun,
> > > > > >>
> > > > > >> Thanks for updating the KIP!
> > > > > >>
> > > > > >> The updated migration plan looks clean to me. Few questions:
> > > > > >>
> > > > > >> 1. The ConsumerTask in the 4.2 Kafka build does not handle the
> > > > > >> tombstone records. Should the tombstone records be sent only when
> > > > > >> all the brokers are upgraded to the 4.3 version?
> > > > > >>
> > > > > >> 2. Once all the brokers are upgraded and the __remote_log_metadata
> > > > > >> topic cleanup policy is changed to compact, downgrading the brokers
> > > > > >> is not allowed, as the records without a key will throw an error
> > > > > >> while producing to the compacted topic. Shall we mention this in
> > > > > >> the compatibility section?
> > > > > >>
> > > > > >> 3. In the RemoteLogSegmentState Enum, why is the value 1 marked as
> > > > > >> unused?
> > > > > >>
> > > > > >> 4. Regarding the key (TopicIdPartition:EndOffset:BrokerLeaderEpoch),
> > > > > >> we may have to check for scenarios where there is segment lag due
> > > > > >> to remote log write quota. Will the state transition work
> > > > > >> correctly? Will come back to this later.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Kamal
> > > > > >>
> > > > > >> On Fri, Jan 16, 2026 at 4:50 AM jian fu <[email protected]> wrote:
> > > > > >>
> > > > > >> > Hi Lijun and Kamal,
> > > > > >> > I also think we don't need to keep the delete policy in the final
> > > > > >> > config. If we did, we would always have to remember that all of
> > > > > >> > our topic retention times must be less than that value, right?
> > > > > >> > It is a protection that carries its own risk.
> > > > > >> > Regards
> > > > > >> > Jian
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > Lijun Tong <[email protected]> wrote on Fri, Jan 16, 2026 at 06:45:
> > > > > >> >
> > > > > >> > > Hey Kamal,
> > > > > >> > >
> > > > > >> > > Some additional points about the Q4,
> > > > > >> > >
> > > > > >> > > > The user can decide when to change their internal topic
> > > > > >> > > > cleanup policy to compact. If someone retains the data in
> > > > > >> > > > the remote storage for 3 months, then they can migrate to
> > > > > >> > > > the compacted topic after 3 months post rolling out this
> > > > > >> > > > change. And, update their cleanup policy to [compact,
> > > > > >> > > > delete].
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > I don't think it's a good idea to keep delete in the final
> > > > > >> > > cleanup policy for the topic `__remote_log_metadata`, as this
> > > > > >> > > still requires the user to keep track of the max retention
> > > > > >> > > hours of topics that have remote storage enabled, and it's an
> > > > > >> > > operational burden. It's also hard to reason about what will
> > > > > >> > > happen if the user configures the wrong retention.ms. I hope
> > > > > >> > > this makes sense.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > > Lijun Tong
> > > > > >> > >
> > > > > >> > > Lijun Tong <[email protected]> wrote on Thu, Jan 15, 2026 at 11:43:
> > > > > >> > >
> > > > > >> > > > Hey Kamal,
> > > > > >> > > >
> > > > > >> > > > Thanks for your reply! I am glad we are now on the same page
> > > > > >> > > > about making compaction of the __remote_log_metadata topic
> > > > > >> > > > optional for the user. I will update the KIP with this
> > > > > >> > > > change.
> > > > > >> > > >
> > > > > >> > > > For the Q2:
> > > > > >> > > > With the key designed as
> > > > > >> > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch, even if the
> > > > > >> > > > same broker retries the upload multiple times for the same
> > > > > >> > > > log segment, the latest retry attempt with the latest
> > > > > >> > > > segment UUID will overwrite the previous attempts' value
> > > > > >> > > > since they share the same key, so we don't need to
> > > > > >> > > > explicitly track the failed upload metadata, because it has
> > > > > >> > > > already been replaced by the later attempt. That's my
> > > > > >> > > > understanding of the RLMCopyTask; correct me if I am wrong.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Lijun Tong
> > > > > >> > > >
> > > > > >> > > > Kamal Chandraprakash <[email protected]> wrote on Wed, Jan 14, 2026 at 21:18:
> > > > > >> > > >
> > > > > >> > > >> Hi Lijun,
> > > > > >> > > >>
> > > > > >> > > >> Thanks for the reply!
> > > > > >> > > >>
> > > > > >> > > >> Q1: Sounds good. Could you clarify it in the KIP that the
> > > > > >> > > >> same partitioner will be used?
> > > > > >> > > >>
> > > > > >> > > >> Q2: With TopicId:Partition:EndOffset:BrokerLeaderEpoch key,
> > > > > >> > > >> if the same broker retries the upload due to intermittent
> > > > > >> > > >> issues in object storage (or) RLMM, then those failed
> > > > > >> > > >> upload metadata also need to be cleared.
> > > > > >> > > >>
> > > > > >> > > >> Q3: We may have to skip the null value records in the
> > > > > >> > > >> ConsumerTask.
> > > > > >> > > >>
> > > > > >> > > >> Q4a: The idea is to keep the cleanup policy as "delete" and
> > > > > >> > > >> also send the tombstone markers to the existing
> > > > > >> > > >> `__remote_log_metadata` topic. And, handle the tombstone
> > > > > >> > > >> records in the ConsumerTask.
> > > > > >> > > >>
> > > > > >> > > >> The user can decide when to change their internal topic
> > > > > >> > > >> cleanup policy to compact. If someone retains the data in
> > > > > >> > > >> the remote storage for 3 months, then they can migrate to
> > > > > >> > > >> the compacted topic after 3 months post rolling out this
> > > > > >> > > >> change. And, update their cleanup policy to [compact,
> > > > > >> > > >> delete].
> > > > > >> > > >>
> > > > > >> > > >> Thanks,
> > > > > >> > > >> Kamal
> > > > > >> > > >>
> > > > > >> > > >> On Thu, Jan 15, 2026 at 4:12 AM Lijun Tong <[email protected]> wrote:
> > > > > >> > > >>
> > > > > >> > > >> > Hey Jian,
> > > > > >> > > >> >
> > > > > >> > > >> > Thanks for taking the time to review this KIP. I
> > > > > >> > > >> > appreciate that you proposed a simpler migration
> > > > > >> > > >> > solution to onboard the new feature.
> > > > > >> > > >> >
> > > > > >> > > >> > There are 2 points that I think can be further refined:
> > > > > >> > > >> >
> > > > > >> > > >> > 1). Make compaction of the topic optional. The new
> > > > > >> > > >> > feature will continue to emit tombstone messages for
> > > > > >> > > >> > expired log segments even when the topic is still in
> > > > > >> > > >> > time-based retention mode, so once the user switches to
> > > > > >> > > >> > the compacted topic, those expired messages can still be
> > > > > >> > > >> > deleted even though the topic is no longer retention
> > > > > >> > > >> > based.
> > > > > >> > > >> > 2). We need to expose a flag to the user that indicates
> > > > > >> > > >> > whether the topic can be flipped to compacted, by
> > > > > >> > > >> > checking whether all the old-format key-less messages
> > > > > >> > > >> > have expired, and allow the user to flip to compacted
> > > > > >> > > >> > only when the flag is true.
> > > > > >> > > >> >
> > > > > >> > > >> > Thanks for sharing your idea. I will update the KIP
> > > > > >> > > >> > later with this new idea.
> > > > > >> > > >> >
> > > > > >> > > >> > Best,
> > > > > >> > > >> > Lijun Tong
> > > > > >> > > >> >
> > > > > >> > > >> >
> > > > > >> > > >> > jian fu <[email protected]> wrote on Mon, Jan 12, 2026 at 04:55:
> > > > > >> > > >> >
> > > > > >> > > >> > > Hi  Lijun Tong:
> > > > > >> > > >> > >
> > > > > >> > > >> > > Thanks for your KIP, which raises this critical issue.
> > > > > >> > > >> > >
> > > > > >> > > >> > > What about keeping just one topic instead of
> > > > > >> > > >> > > introducing another topic? For migrating the existing
> > > > > >> > > >> > > topic data, maybe we can solve the issue this way:
> > > > > >> > > >> > > (1) set the retention time to be greater than the
> > > > > >> > > >> > > retention time of every topic that has remote storage
> > > > > >> > > >> > > enabled
> > > > > >> > > >> > > (2) deploy the new Kafka version with the feature that
> > > > > >> > > >> > > sends messages with a key
> > > > > >> > > >> > > (3) wait until all the old messages have expired and
> > > > > >> > > >> > > new messages with keys are arriving in the topic
> > > > > >> > > >> > > (4) convert the topic to compact
> > > > > >> > > >> > >
> > > > > >> > > >> > > I haven't tested it. I am just proposing this solution
> > > > > >> > > >> > > based on a code review, for your reference.
> > > > > >> > > >> > > The steps may be a little complex, but they avoid
> > > > > >> > > >> > > adding a new topic.
> > > > > >> > > >> > >
> > > > > >> > > >> > > Regards
> > > > > >> > > >> > > Jian
> > > > > >> > > >> > >
> > > > > >> > > >> > > Lijun Tong <[email protected]> wrote on Thu, Jan 8, 2026 at 09:17:
> > > > > >> > > >> > >
> > > > > >> > > >> > > > Hey Kamal,
> > > > > >> > > >> > > >
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Thanks for your time for the review.
> > > > > >> > > >> > > >
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Here is my response to your questions:
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Q1 At this point, I don’t see a need to change
> > > > > >> > > >> > > > RemoteLogMetadataTopicPartitioner for this design.
> > > > > >> > > >> > > > Nothing in the current approach appears to require a
> > > > > >> > > >> > > > partitioner change, but I’m open to revisiting if a
> > > > > >> > > >> > > > concrete need arises.
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Q2 I have some reservations about using
> > > > > >> > > >> > > > SegmentId:State as the key. A practical challenge we
> > > > > >> > > >> > > > see today is that the same logical segment can be
> > > > > >> > > >> > > > retried multiple times with different SegmentIds
> > > > > >> > > >> > > > across brokers. If the key is SegmentId-based, it
> > > > > >> > > >> > > > becomes harder to discover and tombstone all related
> > > > > >> > > >> > > > attempts when the segment eventually expires. The
> > > > > >> > > >> > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch key is
> > > > > >> > > >> > > > deterministic for a logical segment attempt and
> > > > > >> > > >> > > > helps group retries by epoch, which simplifies
> > > > > >> > > >> > > > cleanup and reasoning about state. I’d love to
> > > > > >> > > >> > > > understand the benefits you’re seeing with
> > > > > >> > > >> > > > SegmentId:State compared to the offset/epoch-based
> > > > > >> > > >> > > > key so we can weigh the trade-offs.
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > On partitioning: with this proposal, all states for
> > > > > >> > > >> > > > a given user topic-partition still map to the same
> > > > > >> > > >> > > > metadata partition. That remains true for the
> > > > > >> > > >> > > > existing __remote_log_metadata (unchanged
> > > > > >> > > >> > > > partitioner) and for the new
> > > > > >> > > >> > > > __remote_log_metadata_compacted, preserving the
> > > > > >> > > >> > > > properties RemoteMetadataCache relies on.
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Q3 It should be fine for ConsumerTask to ignore
> > > > > >> > > >> > > > tombstone records (null values) and no-op.
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Q4 Although TBRLMM is a sample RLMM implementation,
> > > > > >> > > >> > > > it’s currently the only OSS option and is widely
> > > > > >> > > >> > > > used. The new __remote_log_metadata_compacted topic
> > > > > >> > > >> > > > offers clear operational benefits in that context.
> > > > > >> > > >> > > > We can also provide a configuration to let users
> > > > > >> > > >> > > > choose whether they want to keep the audit topic
> > > > > >> > > >> > > > (__remote_log_metadata) in their cluster.
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Q4a Enabling compaction on __remote_log_metadata
> > > > > >> > > >> > > > alone may not fully address the unbounded growth,
> > > > > >> > > >> > > > since we also need to emit tombstones for expired
> > > > > >> > > >> > > > keys to delete them. Deferring compaction and
> > > > > >> > > >> > > > tombstoning to user configuration could make the
> > > > > >> > > >> > > > code flow complicated, also add operational
> > > > > >> > > >> > > > complexity and make outcomes less predictable. The
> > > > > >> > > >> > > > proposal aims to provide a consistent experience by
> > > > > >> > > >> > > > defining deterministic keys and emitting tombstones
> > > > > >> > > >> > > > as part of the broker’s responsibilities, while
> > > > > >> > > >> > > > still allowing users to opt out of the audit topic
> > > > > >> > > >> > > > if they prefer. But I am open to more discussion if
> > > > > >> > > >> > > > there is any concrete need I don't foresee.
> > > > > >> > > >> > > >
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Thanks,
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Lijun Tong
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > Kamal Chandraprakash <[email protected]> wrote on Tue, Jan 6, 2026 at 01:01:
> > > > > >> > > >> > > >
> > > > > >> > > >> > > > > Hi Lijun,
> > > > > >> > > >> > > > >
> > > > > >> > > >> > > > > Thanks for the KIP! Went over the first pass.
> > > > > >> > > >> > > > >
> > > > > >> > > >> > > > > Few Questions:
> > > > > >> > > >> > > > >
> > > > > >> > > >> > > > > 1. Are we going to maintain the same
> > > > > >> > > >> > > > > RemoteLogMetadataTopicPartitioner
> > > > > >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataTopicPartitioner.java>
> > > > > >> > > >> > > > > for both the topics? It is not clear in the KIP,
> > > > > >> > > >> > > > > could you clarify it?
> > > > > >> > > >> > > > > 2. Can the key be changed to SegmentId:State
> > > > > >> > > >> > > > > instead of
> > > > > >> > > >> > > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch if
> > > > > >> > > >> > > > > the same partitioner is used? It is good to
> > > > > >> > > >> > > > > maintain all the segment states for a
> > > > > >> > > >> > > > > user-topic-partition in the same metadata
> > > > > >> > > >> > > > > partition.
> > > > > >> > > >> > > > > 3. Should we have to handle the records with null
> > > > > >> > > >> > > > > value (tombstone) in the ConsumerTask
> > > > > >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/ConsumerTask.java?L166>?
> > > > > >> > > >> > > > > 4. TBRLMM
> > > > > >> > > >> > > > > <https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java>
> > > > > >> > > >> > > > > is a sample plugin implementation of RLMM. Not
> > > > > >> > > >> > > > > sure whether the community will agree to add one
> > > > > >> > > >> > > > > more internal topic for this plugin impl.
> > > > > >> > > >> > > > > 4a. Can we modify the new messages to the
> > > > > >> > > >> > > > > __remote_log_metadata topic to contain the key and
> > > > > >> > > >> > > > > leave it to the user to enable compaction for this
> > > > > >> > > >> > > > > topic if they need?
> > > > > >> > > >> > > > >
> > > > > >> > > >> > > > > Thanks,
> > > > > >> > > >> > > > > Kamal
> > > > > >> > > >> > > > >
> > > > > >> > > >> > > > > On Tue, Jan 6, 2026 at 7:35 AM Lijun Tong <[email protected]> wrote:
> > > > > >> > > >> > > > >
> > > > > >> > > >> > > > > > Hey Henry,
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > > Thank you for your time and response! I really
> > > > > >> > > >> > > > > > like your KIP-1248 about offloading the
> > > > > >> > > >> > > > > > consumption of remote log away from the broker,
> > > > > >> > > >> > > > > > and I think with that change, the topic that
> > > > > >> > > >> > > > > > enables the tiered storage can also have longer
> > > > > >> > > >> > > > > > retention configurations and would benefit from
> > > > > >> > > >> > > > > > this KIP too.
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > > > Some suggestions: In your example scenarios,
> > > > > >> > > >> > > > > > > it would also be good to add an example of
> > > > > >> > > >> > > > > > > remote log segment deletion triggered by
> > > > > >> > > >> > > > > > > retention policy which will trigger generation
> > > > > >> > > >> > > > > > > of tombstone event into metadata topic and
> > > > > >> > > >> > > > > > > trigger log compaction/deletion 24 hour later,
> > > > > >> > > >> > > > > > > I think this is the key event to cap the
> > > > > >> > > >> > > > > > > metadata topic size.
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > > Regarding this suggestion, I am not sure whether
> > > > > >> > > >> > > > > > Scenario 4
> > > > > >> > > >> > > > > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618613#KIP1266:BoundingTheNumberOfRemoteLogMetadataMessagesviaCompactedTopic-Scenario4:SegmentDeletion>
> > > > > >> > > >> > > > > > has covered it. I can add more rows in the
> > > > > >> > > >> > > > > > Timeline Table, like T5+24hour, to indicate that
> > > > > >> > > >> > > > > > the messages are gone by then, to explicitly
> > > > > >> > > >> > > > > > show that messages are deleted and thus the
> > > > > >> > > >> > > > > > number of messages in the topic is capped.
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > > Regarding whether the topic
> > > > > >> > > >> > > > > > __remote_log_metadata is still necessary, I am
> > > > > >> > > >> > > > > > inclined to continue to have this topic at least
> > > > > >> > > >> > > > > > for debugging purposes so we can build
> > > > > >> > > >> > > > > > confidence about the compacted topic change; we
> > > > > >> > > >> > > > > > can always choose to remove this topic in the
> > > > > >> > > >> > > > > > future once we all agree it provides limited
> > > > > >> > > >> > > > > > value for the users.
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > > Thanks,
> > > > > >> > > >> > > > > > Lijun Tong
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > > Henry Haiying Cai via dev <[email protected]> wrote on Mon, Jan 5, 2026 at 16:19:
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > > > > Lijun,
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > Thanks for the proposal and I liked your idea
> > > > > >> > > >> > > > > > > of using a compacted topic for tiered storage
> > > > > >> > > >> > > > > > > metadata topic.
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > In our setup, we have set a shorter retention
> > > > > >> > > >> > > > > > > (3 days) for the tiered storage metadata topic
> > > > > >> > > >> > > > > > > to control the size growth. We can do that
> > > > > >> > > >> > > > > > > since we control all topic's retention policy
> > > > > >> > > >> > > > > > > in our clusters and we set a uniform
> > > > > >> > > >> > > > > > > retention.policy for all our tiered storage
> > > > > >> > > >> > > > > > > topics. I can see other users/companies will
> > > > > >> > > >> > > > > > > not be able to enforce that retention policy
> > > > > >> > > >> > > > > > > to all tiered storage topics.
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > Some suggestions: In your example scenarios,
> > > > > >> > > >> > > > > > > it would also be good to add an example of
> > > > > >> > > >> > > > > > > remote log segment deletion triggered by
> > > > > >> > > >> > > > > > > retention policy which will trigger generation
> > > > > >> > > >> > > > > > > of tombstone event into metadata topic and
> > > > > >> > > >> > > > > > > trigger log compaction/deletion 24 hour later,
> > > > > >> > > >> > > > > > > I think this is the key event to cap the
> > > > > >> > > >> > > > > > > metadata topic size.
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > For the original unbounded remote_log_metadata
> > > > > >> > > >> > > > > > > topic, I am not sure whether we still need it
> > > > > >> > > >> > > > > > > or not. If it is left only for audit trail
> > > > > >> > > >> > > > > > > purpose, people can set up a data ingestion
> > > > > >> > > >> > > > > > > pipeline to ingest the content of metadata
> > > > > >> > > >> > > > > > > topic into a separate storage location. I
> > > > > >> > > >> > > > > > > think we can have a flag to have only one
> > > > > >> > > >> > > > > > > metadata topic (the compacted version).
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > >
> > > > > > >> > > >> > > > > > > On Monday, January 5, 2026 at 01:22:42 PM PST, Lijun Tong
> > > > > > >> > > >> > > > > > > <[email protected]> wrote:
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > Hello Kafka Community,
> > > > > >> > > >> > > > > > >
> > > > > > >> > > >> > > > > > > I would like to start a discussion on KIP-1266, which proposes adding
> > > > > > >> > > >> > > > > > > a new compacted remote log metadata topic for tiered storage, to
> > > > > > >> > > >> > > > > > > limit the number of messages that need to be iterated over to build
> > > > > > >> > > >> > > > > > > the remote metadata state.
> > > > > >> > > >> > > > > > >
> > > > > > >> > > >> > > > > > > KIP link: KIP-1266 Bounding The Number Of RemoteLogMetadata Messages
> > > > > > >> > > >> > > > > > > via Compacted RemoteLogMetadata Topic
> > > > > > >> > > >> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1266%3A+Bounding+The+Number+Of+RemoteLogMetadata+Messages+via+Compacted+Topic>
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > Background:
> > > > > > >> > > >> > > > > > > The current Tiered Storage implementation uses a __remote_log_metadata
> > > > > > >> > > >> > > > > > > topic with infinite retention and a delete-based cleanup policy,
> > > > > > >> > > >> > > > > > > causing unbounded growth, slow broker bootstrap, no mechanism to clean
> > > > > > >> > > >> > > > > > > up expired segment metadata, and inefficient re-reading from offset 0
> > > > > > >> > > >> > > > > > > during leadership changes.
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > Proposal:
> > > > > > >> > > >> > > > > > > A dual-topic approach that introduces a new
> > > > > > >> > > >> > > > > > > __remote_log_metadata_compacted topic using log compaction with
> > > > > > >> > > >> > > > > > > deterministic offset-based keys, while preserving the existing topic
> > > > > > >> > > >> > > > > > > for audit history. This allows brokers to build their metadata cache
> > > > > > >> > > >> > > > > > > exclusively from the compacted topic, enables cleanup of expired
> > > > > > >> > > >> > > > > > > segment metadata through tombstones, and includes a migration
> > > > > > >> > > >> > > > > > > strategy to populate the new topic during upgrade, delivering bounded
> > > > > > >> > > >> > > > > > > metadata growth and faster broker startup while maintaining backward
> > > > > > >> > > >> > > > > > > compatibility.
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > More details are in the attached KIP link.
> > > > > >> > > >> > > > > > > Looking forward to your thoughts.
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > Thank you for your time!
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > > > Best,
> > > > > >> > > >> > > > > > > Lijun Tong
> > > > > >> > > >> > > > > > >
> > > > > >> > > >> > > > > >
> > > > > >> > > >> > > > >
> > > > > >> > > >> > > >
> > > > > >> > > >> > >
> > > > > >> > > >> >
> > > > > >> > > >>
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
