[
https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828247#comment-17828247
]
Divij Vaidya commented on KAFKA-16385:
--------------------------------------
[~showuon] I must be missing something here but the current behaviour looks
correct to me.
Let's consider a use case from an Apache Kafka user:
I have set max segment size to be 1 GB and I have a topic with low ingress
traffic. I want to expire data in my log every 1 day due to compliance
requirement. But the partition doesn't receive 1GB of data in one day and
hence, my active segment will never become eligible for expiration.
Now, the user can set segment.ms = 1 day to force a rotation even when the
segment is not full. This should satisfy the use case. But how do we define the
behaviour when the expiration configuration is less than the roll configuration?
We have two options:
Option 1: Ignore expiration config if it is less than rotation config
Option 2: Expiration config overrides rotation config
Option 1 prioritizes an internal configuration (ideally a user shouldn't know
about segments etc. in a log) over a functional config (user wants to expire
data). This requires users to know about inner details of logs such as presence
of a segment or index etc.
At Apache Kafka, we have chosen option 2, i.e. prioritize a user facing
functionality config (expiration config) over an internal config (rotation
config).
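
To make option 2 concrete, here is a minimal sketch of the decision as I
understand it. It is not the broker's actual code: the class, the Segment type
and the plan method are hypothetical names used only for illustration, and the
oldest-first ordering of segments is an assumption. The point it shows is that
a retention breach on the active segment forces a roll first, regardless of
segment.ms or segment.bytes, and the rolled segment is then deleted.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these types and method names are hypothetical
// and are not Kafka's actual classes or APIs.
final class RetentionSketch {

    /** Minimal stand-in for a log segment. */
    static final class Segment {
        final long baseOffset;
        final long largestRecordTimestampMs;
        final boolean active; // true for the segment currently being written

        Segment(long baseOffset, long largestRecordTimestampMs, boolean active) {
            this.baseOffset = baseOffset;
            this.largestRecordTimestampMs = largestRecordTimestampMs;
            this.active = active;
        }
    }

    /** True when the newest record in the segment is older than retention.ms. */
    static boolean retentionBreached(Segment s, long nowMs, long retentionMs) {
        // A negative retention.ms is treated as "no time-based retention".
        return retentionMs >= 0 && nowMs - s.largestRecordTimestampMs > retentionMs;
    }

    /**
     * Option 2: the expiration config overrides the rotation config.
     * Segments are assumed to be ordered oldest first. An expired active
     * segment is rolled first (ignoring segment.ms and segment.bytes) and
     * then deleted like any other expired segment.
     */
    static List<String> plan(List<Segment> segments, long nowMs, long retentionMs) {
        List<String> actions = new ArrayList<>();
        for (Segment s : segments) {
            if (!retentionBreached(s, nowMs, retentionMs)) {
                break; // stop at the first segment that has not expired
            }
            if (s.active) {
                actions.add("roll@" + s.baseOffset);
            }
            actions.add("delete@" + s.baseOffset);
        }
        return actions;
    }
}
```

With the single active segment from the report (base offset 0, largest record
timestamp 1710832992125) and retention.ms=1000, any check made more than one
second after that timestamp plans a roll followed by a delete, which matches
the log output quoted below.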
Thoughts?
> Segment is rolled before segment.ms or segment.bytes breached
> -------------------------------------------------------------
>
> Key: KAFKA-16385
> URL: https://issues.apache.org/jira/browse/KAFKA-16385
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.5.1, 3.7.0
> Reporter: Luke Chen
> Assignee: Kuan Po Tseng
> Priority: Major
>
> Steps to reproduce:
> 0. Start up a broker with `log.retention.check.interval.ms=1000` to speed up
> the test.
> 1. Create a topic with the config: segment.ms=7days, retention.ms=1sec.
> 2. Send a record "aaa" to the topic
> 3. Wait for 1 second
> Will this segment be rolled? I thought not.
> But in my test, it does roll:
> {code:java}
> [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1,
> dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms.
> (kafka.log.LocalLog)
> [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote
> producer snapshot at offset 1 with 1 producer ids in 1 ms.
> (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1,
> dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71,
> lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to
> log retention time 1000ms breach based on the largest record timestamp in the
> segment (kafka.log.UnifiedLog)
> {code}
> The segment is rolled because the log retention time of 1000ms was breached,
> which is unexpected.
> Tested in v3.5.1, it has the same issue.