[
https://issues.apache.org/jira/browse/KAFKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859975#comment-17859975
]
lkgen edited comment on KAFKA-1194 at 6/26/24 4:17 AM:
-------------------------------------------------------
Hi, I have been trying the pull request code from
[https://github.com/apache/kafka/pull/12331] on Windows with Kafka Streams and
a test program, and the test program seems to crash the Kafka broker.
I built Kafka from the pull request branch and made the following
server.properties modifications so that the crash happens after a short time:
{code:java}
# comment out
#log.retention.hours=168
# and add instead
log.retention.ms=30000
# change the existing value
log.retention.check.interval.ms=5000
# and add
log.roll.ms=20000{code}
The test program I use with Kafka Streams is attached as
SendFor1194StreamCrash.zip; you can compile it with mvn package.
Then copy kafkaStreamTest-1.0-SNAPSHOT-jar-with-dependencies.jar to the
Windows machine running Kafka and run:
{code:java}
java -jar kafkaStreamTest-1.0-SNAPSHOT-jar-with-dependencies.jar
{code}
In my test environment, the broker crashed after several minutes with the following error:
{code:java}
[2024-06-25 19:13:20,749] ERROR Failed to clean up log for
__transaction_state-48 in dir C:\tmp\kafka-logs due to IOException
(org.apache.kafka.storage.internals.log.LogDirFailureChannel)
java.nio.file.FileAlreadyExistsException:
C:\tmp\kafka-logs\__transaction_state-48\00000000000000000000.log ->
C:\tmp\kafka-logs\__transaction_state-48\00000000000000000000.log.deleted
at
java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:87)
at
java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:395)
at
java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292)
at java.base/java.nio.file.Files.move(Files.java:1422)
at
org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:951)
at
org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:247)
{code}
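For what it's worth, the exception itself is easy to reproduce in isolation: java.nio.file.Files.move throws FileAlreadyExistsException on any platform when the target path already exists and REPLACE_EXISTING is not requested, which appears to be what happens here when a stale .deleted file is left behind. A minimal, self-contained sketch (the segment file names are illustrative, not taken from a real broker):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MoveDemo {
    // Rename a segment file to its ".deleted" name when a stale target already
    // exists. Returns true if the plain move throws FileAlreadyExistsException.
    static boolean moveOntoExistingFails() throws IOException {
        Path dir = Files.createTempDirectory("kafka-1194-demo");
        Path log = Files.createFile(dir.resolve("00000000000000000000.log"));
        Path deleted = Files.createFile(dir.resolve("00000000000000000000.log.deleted"));
        try {
            Files.move(log, deleted);  // target exists, no REPLACE_EXISTING -> throws
            return false;
        } catch (FileAlreadyExistsException e) {
            // Retrying with REPLACE_EXISTING succeeds on every platform
            Files.move(log, deleted, StandardCopyOption.REPLACE_EXISTING);
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("plain move throws: " + moveOntoExistingFails());
    }
}
```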
> The kafka broker cannot delete the old log files after the configured time
> --------------------------------------------------------------------------
>
> Key: KAFKA-1194
> URL: https://issues.apache.org/jira/browse/KAFKA-1194
> Project: Kafka
> Issue Type: Bug
> Components: log
> Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
> Environment: Windows
> Reporter: Tao Qin
> Priority: Critical
> Labels: features, patch, windows
> Attachments: KAFKA-1194.patch, RetentionExpiredWindows.txt,
> SendFor1194StreamCrash.zip, Untitled.jpg, image-2018-09-12-14-25-52-632.png,
> image-2018-11-26-10-18-59-381.png, kafka-1194-v1.patch, kafka-1194-v2.patch,
> kafka-bombarder.7z, kafka370fixwin.patch, screenshot-1.png
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> We tested this in a Windows environment with log.retention.hours set to 24
> hours:
> # The minimum age of a log file to be eligible for deletion
> log.retention.hours=24
> After several days, the Kafka broker still could not delete the old log
> files, and we got the following exception:
> [2013-12-19 01:57:38,528] ERROR Uncaught exception in scheduled task
> 'kafka-log-retention' (kafka.utils.KafkaScheduler)
> kafka.common.KafkaStorageException: Failed to change the log file suffix from
> to .deleted for log segment 1516723
> at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:249)
> at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:638)
> at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:629)
> at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418)
> at kafka.log.Log$$anonfun$deleteOldSegments$1.apply(Log.scala:418)
> at
> scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
> at scala.collection.immutable.List.foreach(List.scala:76)
> at kafka.log.Log.deleteOldSegments(Log.scala:418)
> at
> kafka.log.LogManager.kafka$log$LogManager$$cleanupExpiredSegments(LogManager.scala:284)
> at
> kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:316)
> at
> kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:314)
> at
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)
> at scala.collection.Iterator$class.foreach(Iterator.scala:772)
> at
> scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:573)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:73)
> at
> scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:615)
> at
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:742)
> at kafka.log.LogManager.cleanupLogs(LogManager.scala:314)
> at
> kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:143)
> at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> I think this error happens because Kafka tries to rename the log file while
> it is still open, so we should close the file before renaming it.
> The index file uses a special data structure, the MappedByteBuffer. Its
> Javadoc describes it as:
> A mapped byte buffer and the file mapping that it represents remain valid
> until the buffer itself is garbage-collected.
> Fortunately, I found a forceUnmap function in the Kafka code, and perhaps it
> can be used to free the MappedByteBuffer.
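A hedged sketch of the explicit-unmap idea described above: this is not Kafka's actual forceUnmap helper, and it relies on the internal sun.misc.Unsafe.invokeCleaner API available since Java 9, so treat it as illustrative only. The point is that releasing the mapping eagerly, rather than waiting for the buffer to be garbage-collected, is what makes the file renamable/deletable on Windows.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class UnmapDemo {
    // Release the mapping eagerly instead of waiting for GC.
    // Uses sun.misc.Unsafe.invokeCleaner, available since Java 9.
    static void unmap(MappedByteBuffer buffer) throws ReflectiveOperationException {
        Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
        Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
        invokeCleaner.invoke(theUnsafe.get(null), buffer);
    }

    // Map a file, write through the mapping, unmap, then delete the file.
    // Returns true if the delete succeeds after the explicit unmap.
    static boolean mapWriteUnmapDelete() throws Exception {
        Path file = Files.createTempFile("index-demo", ".index");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
            mapped.putInt(0, 42);
            mapped.force();  // flush the mapped page to disk
            unmap(mapped);   // release the mapping before any rename/delete
            // NOTE: touching `mapped` after unmap would crash the JVM
        }
        Files.delete(file);  // on Windows this fails while the mapping is still alive
        return !Files.exists(file);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("deleted after unmap: " + mapWriteUnmapDelete());
    }
}
```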
--
This message was sent by Atlassian Jira
(v8.20.10#820010)