[ 
https://issues.apache.org/jira/browse/KAFKA-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877743#comment-17877743
 ] 

Federico Valeri edited comment on KAFKA-17428 at 8/29/24 4:34 PM:
------------------------------------------------------------------

[~showuon] we could introduce a new RemoteLogSegmentState.DANGLING terminal 
state that we can set before attempting to delete the segment that failed the 
copy (COPY_SEGMENT_STARTED -> DANGLING). That way, it would clearly reflect the 
actual segment state, and it won't count against the retention size 
calculation. Additionally, we can use this new state in 
cleanupExpiredRemoteLogSegments to periodically try to delete all dangling 
segments. Wdyt?
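
To make the proposal concrete, here is a rough, self-contained sketch (not the actual Kafka code). Everything in it is an assumption for illustration: the SegmentState enum is a simplified stand-in for RemoteLogSegmentState with the proposed DANGLING value added, the transitions listed for the existing states are my guess and would need to be checked against the real enum, and Segment, retentionSizeInBytes and danglingSegments are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DanglingStateSketch {

    // Simplified stand-in for RemoteLogSegmentState; DANGLING is the proposed addition.
    enum SegmentState {
        COPY_SEGMENT_STARTED,
        COPY_SEGMENT_FINISHED,
        DELETE_SEGMENT_STARTED,
        DELETE_SEGMENT_FINISHED,
        DANGLING;

        // Assumed transition rules; only COPY_SEGMENT_STARTED -> DANGLING comes from the proposal.
        static boolean isValidTransition(SegmentState from, SegmentState to) {
            switch (from) {
                case COPY_SEGMENT_STARTED:
                    return to == COPY_SEGMENT_FINISHED
                        || to == DELETE_SEGMENT_STARTED
                        || to == DANGLING;
                case COPY_SEGMENT_FINISHED:
                    return to == DELETE_SEGMENT_STARTED;
                case DELETE_SEGMENT_STARTED:
                    return to == DELETE_SEGMENT_FINISHED;
                default:
                    // DELETE_SEGMENT_FINISHED and DANGLING are treated as terminal here.
                    return false;
            }
        }
    }

    // Hypothetical, minimal view of a remote segment's metadata.
    static final class Segment {
        final String id;
        final long sizeInBytes;
        final SegmentState state;

        Segment(String id, long sizeInBytes, SegmentState state) {
            this.id = id;
            this.sizeInBytes = sizeInBytes;
            this.state = state;
        }
    }

    // Dangling segments are skipped, so they do not count against retention size.
    static long retentionSizeInBytes(List<Segment> segments) {
        long total = 0;
        for (Segment s : segments) {
            if (s.state != SegmentState.DANGLING) {
                total += s.sizeInBytes;
            }
        }
        return total;
    }

    // Segments that a periodic cleanup pass could try to delete again.
    static List<Segment> danglingSegments(List<Segment> segments) {
        List<Segment> result = new ArrayList<>();
        for (Segment s : segments) {
            if (s.state == SegmentState.DANGLING) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment("seg-0", 100L, SegmentState.COPY_SEGMENT_FINISHED),
            new Segment("seg-1", 50L, SegmentState.DANGLING));
        System.out.println("retention size: " + retentionSizeInBytes(segments));
        System.out.println("dangling count: " + danglingSegments(segments).size());
    }
}
```

The point of keeping DANGLING terminal in this sketch is that the state keeps describing what actually happened to the segment, while the cleanup pass can still find these segments by state and retry the delete.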


was (Author: fvaleri):
[~showuon] we could introduce a new RemoteLogSegmentState.DANGLING state that 
we can set before attempting to delete the segment that failed the copy. That 
way, it would clearly reflect the actual segment state, and it won't count 
against the retention size calculation. Additionally, we can use this new state 
in cleanupExpiredRemoteLogSegments to periodically try to delete all dangling 
segments. Wdyt?

> remote segments deleted in RLMCopyTask stays `COPY_SEGMENT_START` state
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-17428
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17428
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Luke Chen
>            Assignee: Federico Valeri
>            Priority: Major
>
> Currently, copyLogSegment in RLMCopyTask deletes segments whose upload failed 
> and segments whose custom metadata size exceeded the limit. But after 
> deletion, these segments are still in the COPY_SEGMENT_STARTED state. That 
> "might" cause unexpected issues in the future. We should move the state 
> through {{COPY_SEGMENT_STARTED}} -> {{DELETE_SEGMENT_STARTED}} -> 
> {{DELETE_SEGMENT_FINISHED}}.
>  
> updated:
> I thought about this when I first had a look at it, and one thing that 
> bothered me is that {{DELETE_SEGMENT_STARTED}} means to me that we're now in 
> a state where we attempt deletion. However, if the remote store is down and 
> both the copy and the delete fail, we will leave that segment in 
> {{DELETE_SEGMENT_STARTED}} and not attempt to delete it until the segment 
> itself breaches retention.ms/bytes.
> We can probably just make it clearer, but that was my thought at the time.
> So, maybe in the deletion loop, we can add {{DELETE_SEGMENT_STARTED}} 
> segments to the deletion directly, but that also needs to take the 
> retention size calculation into account.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
