[
https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828319#comment-17828319
]
Luke Chen commented on KAFKA-16385:
-----------------------------------
[~divijvaidya], I was thinking about the use case you mentioned:
_I have set max segment size to be 1 GB and I have a topic with low ingress
traffic. I want to expire data in my log every 1 day due to compliance
requirement. But the partition doesn't receive 1GB of data in one day and
hence, my active segment will never become eligible for expiration._
OK, so even if we adopt option 2, we still cannot guarantee that all data
past the 1-day limit gets deleted. Say a new record arrives right before the
retention thread starts its check. In that case, the segment won't be
eligible for expiration even though it contains data older than 1 day, which
breaks the contract of retention.ms.
Again, I don't know which behavior we actually want, so I'd like to hear
more comments from the community/experts.
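The race described above can be sketched as a small standalone model. This is
only an illustration, not Kafka's code: the `Segment` class and the helper
methods are hypothetical, and the expiry rule (a segment expires only when its
largest record timestamp is older than retention.ms) is an assumption taken
from the wording of the log message quoted in the issue below.

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionRaceSketch {
    static final long DAY_MS = 24L * 60 * 60 * 1000;

    // Hypothetical stand-in for a log segment; NOT Kafka's LogSegment class.
    static class Segment {
        final List<Long> recordTimestamps = new ArrayList<>();

        void append(long timestampMs) {
            recordTimestamps.add(timestampMs);
        }

        long largestRecordTimestamp() {
            long max = Long.MIN_VALUE;
            for (long t : recordTimestamps) {
                max = Math.max(max, t);
            }
            return max;
        }
    }

    // Assumed rule: the whole segment expires only when its NEWEST record
    // is older than retention.ms. An empty segment never expires here.
    static boolean isExpired(Segment s, long nowMs, long retentionMs) {
        if (s.recordTimestamps.isEmpty()) {
            return false;
        }
        return nowMs - s.largestRecordTimestamp() > retentionMs;
    }

    // Counts records that are individually older than retention.ms.
    static long recordsPastRetention(Segment s, long nowMs, long retentionMs) {
        return s.recordTimestamps.stream()
                .filter(t -> nowMs - t > retentionMs)
                .count();
    }

    public static void main(String[] args) {
        long retentionMs = DAY_MS;
        long now = 10 * DAY_MS;

        Segment active = new Segment();
        active.append(now - 2 * DAY_MS); // a record that is two days old

        System.out.println("expired before new record: "
                + isExpired(active, now, retentionMs));

        // A new record arrives just before the retention check runs.
        active.append(now - 1);

        System.out.println("expired after new record: "
                + isExpired(active, now, retentionMs));
        System.out.println("records past retention: "
                + recordsPastRetention(active, now, retentionMs));
    }
}
```

In this model, the late record flips the segment from expired to not expired
while one record older than the limit is still retained, which is the gap in
the retention.ms guarantee I mean.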
> Segment is rolled before segment.ms or segment.bytes breached
> -------------------------------------------------------------
>
> Key: KAFKA-16385
> URL: https://issues.apache.org/jira/browse/KAFKA-16385
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.5.1, 3.7.0
> Reporter: Luke Chen
> Assignee: Kuan Po Tseng
> Priority: Major
>
> Steps to reproduce:
> 0. Start up a broker with `log.retention.check.interval.ms=1000` to speed up
> the test.
> 1. Create a topic with the config: segment.ms=7days, retention.ms=1sec.
> 2. Send a record "aaa" to the topic
> 3. Wait for 1 second
> Will this segment be rolled? I thought not.
> But in my test, it is rolled:
> {code:java}
> [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1,
> dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms.
> (kafka.log.LocalLog)
> [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote
> producer snapshot at offset 1 with 1 producer ids in 1 ms.
> (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1,
> dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71,
> lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to
> log retention time 1000ms breach based on the largest record timestamp in the
> segment (kafka.log.UnifiedLog)
> {code}
> The segment is rolled because the 1000ms log retention time was breached,
> which is unexpected.
> I also tested v3.5.1, and it has the same issue.