[
https://issues.apache.org/jira/browse/KAFKA-7074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Phil Mikhailov updated KAFKA-7074:
----------------------------------
Attachment: LogCleanerError.log
> Race condition between Kafka log deletion and compaction
> --------------------------------------------------------
>
> Key: KAFKA-7074
> URL: https://issues.apache.org/jira/browse/KAFKA-7074
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.10.2.1
> Reporter: Phil Mikhailov
> Priority: Major
> Attachments: LogCleanerError.log
>
>
> We have one of standard options of deploying Kafka-based EventSourcing
> microservice - deployment with reset. It includes running
> {{kafka-reset-tool}} before deploying and running microservice which should
> reset offsets for input topics and delete intermediate topics to allow
> microservice rebuild its internal state.
> We have faced a problem several times (its not 100% reproducible) that
> {{LogCleaner}} crashed with {{NoSuchFileException}} during compaction routine
> after such deployment. See detailed log:
> {code}
> Disable possible compaction: [2018-06-14 18:25:01,464] INFO
> The cleaning for partition
> reports-app-ReportsContractExpansionStateStore-changelog-7 is aborted
> Deletion happended: [2018-06-14 18:25:56,761] INFO
> Deleting index
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
> (kafka.log.TimeIndex)
> Compaction failed 'cause it can't find file: [2018-06-14 18:25:57,402] ERROR
> [kafka-log-cleaner-thread-0], Error due to (kafka.log.LogCleaner)
>
> kafka.common.KafkaStorageException: Failed to change the timeindex file
> suffix from to .deleted for log segment 0
> at
> kafka.log.LogSegment.kafkaStorageException$1(LogSegment.scala:340)
> at
> kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:350)
> at
> kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:981)
> at
> kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1027)
> at
> kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1022)
> at
> scala.collection.immutable.List.foreach(List.scala:381)
> at
> kafka.log.Log.replaceSegments(Log.scala:1022)
> at
> kafka.log.Cleaner.cleanSegments(LogCleaner.scala:426)
> at
> kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
> at
> kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
> at
> scala.collection.immutable.List.foreach(List.scala:381)
> at
> kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at
> kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at
> kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> Caused by:
> java.nio.file.NoSuchFileException:
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
> sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
> at
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
> at
> java.nio.file.Files.move(Files.java:1395)
> at
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:711)
> at
> kafka.log.AbstractIndex.renameTo(AbstractIndex.scala:126)
> ... 14 more
> Suppressed:
> java.nio.file.NoSuchFileException:
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex
> ->
>
> /var/lib/kafka/data/reports-app-ReportsContractExpansionStateStore-changelog-7/00000000000000000000.timeindex.deleted
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
> at
> sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
> at
> java.nio.file.Files.move(Files.java:1395)
> at
> org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:708)
> ... 15
> more
> {code}
> It looks like disabling compaction while deleting log doesn't work.
> After that compaction stops working and space consumption grows and there is
> only one solution to this problem is to restart broker.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)