[
https://issues.apache.org/jira/browse/KAFKA-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16700692#comment-16700692
]
Hans Schuell commented on KAFKA-6188:
-------------------------------------
I also have a Confluent 5.0.0 (Kafka 2.0.0) test system on Windows 10, and the
broker stopped working after I deleted a topic and restarted the broker.
The stack trace from server.log is below. The German system message from the
Windows operating system says that the process cannot access the file because
it is still in use; the English equivalent is "*the process cannot access the
file because it is being used by another process*".
This is one major difference between *nix and Windows: on Windows a file
cannot be moved (as in the case below) or deleted while there is still an open
file handle to it, whether in the current process or in another one. I know
that Windows is not supported, but even under Linux it seems a bit
questionable that the "scheduler" renames the file xxx.log to xxx.log.deleted
while the file is not yet closed.
{noformat}
[2018-11-27 17:11:04,811] INFO [Log partition=test-events-1-4, dir=C:\tmp\confluent\kafka-logs] Scheduling log segment [baseOffset 0, size 219591] for deletion. (kafka.log.Log)
[2018-11-27 17:11:04,814] ERROR Error while deleting segments for test-events-1-4 in dir C:\tmp\confluent\kafka-logs (kafka.server.LogDirFailureChannel)
java.nio.file.FileSystemException: C:\tmp\confluent\kafka-logs\test-events-1-4\00000000000000000000.log -> C:\tmp\confluent\kafka-logs\test-events-1-4\00000000000000000000.log.deleted: Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird.
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
	at java.nio.file.Files.move(Files.java:1395)
	at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
	at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:211)
	at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
	at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1751)
	at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1738)
	at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1309)
	at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1309)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1309)
	at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1300)
	at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1300)
	at kafka.log.Log.maybeHandleIOException(Log.scala:1837)
	at kafka.log.Log.deleteSegments(Log.scala:1300)
	at kafka.log.Log.deleteOldSegments(Log.scala:1295)
	at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1367)
	at kafka.log.Log.deleteOldSegments(Log.scala:1361)
	at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:874)
	at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:872)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
	at kafka.log.LogManager.cleanupLogs(LogManager.scala:872)
	at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:395)
	at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:114)
	at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: java.nio.file.FileSystemException: C:\tmp\confluent\kafka-logs\test-events-1-4\00000000000000000000.log -> C:\tmp\confluent\kafka-logs\test-events-1-4\00000000000000000000.log.deleted: Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird.
		at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
		at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
		at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
		at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
		at java.nio.file.Files.move(Files.java:1395)
		at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
		... 32 more
{noformat}
{noformat}
[2018-11-27 17:11:04,818] INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\tmp\confluent\kafka-logs (kafka.server.ReplicaManager)
[2018-11-27 17:11:04,819] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for test-events-1-4 in dir C:\tmp\confluent\kafka-logs
Caused by: java.nio.file.FileSystemException: C:\tmp\confluent\kafka-logs\test-events-1-4\00000000000000000000.log -> C:\tmp\confluent\kafka-logs\test-events-1-4\00000000000000000000.log.deleted: Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird.
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387)
	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
	at java.nio.file.Files.move(Files.java:1395)
	at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:786)
	at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:211)
	at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
	at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1751)
	at kafka.log.Log.kafka$log$Log$$deleteSegment(Log.scala:1738)
	at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1309)
	at kafka.log.Log$$anonfun$deleteSegments$1$$anonfun$apply$mcI$sp$1.apply(Log.scala:1309)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at kafka.log.Log$$anonfun$deleteSegments$1.apply$mcI$sp(Log.scala:1309)
	at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1300)
	at kafka.log.Log$$anonfun$deleteSegments$1.apply(Log.scala:1300)
	at kafka.log.Log.maybeHandleIOException(Log.scala:1837)
	at kafka.log.Log.deleteSegments(Log.scala:1300)
	at kafka.log.Log.deleteOldSegments(Log.scala:1295)
	at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1367)
	at kafka.log.Log.deleteOldSegments(Log.scala:1361)
	at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:874)
	at kafka.log.LogManager$$anonfun$cleanupLogs$3.apply(LogManager.scala:872)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
	at kafka.log.LogManager.cleanupLogs(LogManager.scala:872)
	at kafka.log.LogManager$$anonfun$startup$1.apply$mcV$sp(LogManager.scala:395)
	at kafka.utils.KafkaScheduler$$anonfun$1.apply$mcV$sp(KafkaScheduler.scala:114)
	at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: java.nio.file.FileSystemException: C:\tmp\confluent\kafka-logs\test-events-1-4\00000000000000000000.log -> C:\tmp\confluent\kafka-logs\test-events-1-4\00000000000000000000.log.deleted: Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird.
		at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
		at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
		at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:301)
		at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287)
		at java.nio.file.Files.move(Files.java:1395)
		at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:783)
		... 32 more
{noformat}
{noformat}
[2018-11-27 17:11:04,824] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,test-events-1-2,test-events-1-4,__consumer_offsets-8,__consumer_offsets-21,test-events-2-3,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,test-events-2-2,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,test-events-1-1,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test-events-3,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,test-events-0,test-events-4,test-events-1-0,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,test-events-2,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,test-events-2-4,test-events-2-0,test-events-1-3,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,test-events-1,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,test-events-2-1,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaFetcherManager)
[2018-11-27 17:11:04,825] INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,test-events-1-2,test-events-1-4,__consumer_offsets-8,__consumer_offsets-21,test-events-2-3,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,test-events-2-2,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,test-events-1-1,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test-events-3,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,test-events-0,test-events-4,test-events-1-0,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,test-events-2,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,test-events-2-4,test-events-2-0,test-events-1-3,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,test-events-1,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,test-events-2-1,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 (kafka.server.ReplicaAlterLogDirsManager)
[2018-11-27 17:11:04,866] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-22,__consumer_offsets-30,test-events-1-2,test-events-1-4,__consumer_offsets-8,__consumer_offsets-21,test-events-2-3,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-9,__consumer_offsets-46,__consumer_offsets-25,__consumer_offsets-35,__consumer_offsets-41,test-events-2-2,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,test-events-1-1,__consumer_offsets-47,__consumer_offsets-16,__consumer_offsets-28,__consumer_offsets-31,__consumer_offsets-36,__consumer_offsets-42,__consumer_offsets-3,__consumer_offsets-18,test-events-3,__consumer_offsets-37,__consumer_offsets-15,__consumer_offsets-24,test-events-0,test-events-4,test-events-1-0,__consumer_offsets-38,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-19,__consumer_offsets-11,test-events-2,__consumer_offsets-13,__consumer_offsets-2,__consumer_offsets-43,__consumer_offsets-6,__consumer_offsets-14,test-events-2-4,test-events-2-0,test-events-1-3,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-44,__consumer_offsets-39,__consumer_offsets-12,test-events-1,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-5,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-34,test-events-2-1,__consumer_offsets-10,__consumer_offsets-32,__consumer_offsets-40 and stopped moving logs for partitions because they are in the failed log directory C:\tmp\confluent\kafka-logs. (kafka.server.ReplicaManager)
[2018-11-27 17:11:04,892] INFO Stopping serving logs in dir C:\tmp\confluent\kafka-logs (kafka.log.LogManager)
[2018-11-27 17:11:04,905] ERROR Shutdown broker because all log dirs in C:\tmp\confluent\kafka-logs have failed (kafka.log.LogManager)
{noformat}
> Broker fails with FATAL Shutdown - log dirs have failed
> -------------------------------------------------------
>
> Key: KAFKA-6188
> URL: https://issues.apache.org/jira/browse/KAFKA-6188
> Project: Kafka
> Issue Type: Bug
> Components: clients, log
> Affects Versions: 1.0.0, 1.0.1
> Environment: Windows 10
> Reporter: Valentina Baljak
> Assignee: Dong Lin
> Priority: Blocker
> Labels: windows
> Attachments: Segments are opened before deletion,
> kafka_2.10-0.10.2.1.zip, output.txt
>
>
> Just started with version 1.0.0 after 4-5 months of using 0.10.2.1. The
> test environment is very simple, with only one producer and one consumer.
> Initially, everything started fine and stand-alone tests worked as expected.
> However, when running my code, the Kafka clients fail after approximately 10
> minutes. Kafka won't start after that and fails with the same error.
> Deleting the logs allows it to start again, but the same problem recurs.
> Here is the error traceback:
> [2017-11-08 08:21:57,532] INFO Starting log cleanup with a period of 300000
> ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,548] INFO Starting log flusher with a default period of
> 9223372036854775807 ms. (kafka.log.LogManager)
> [2017-11-08 08:21:57,798] INFO Awaiting socket connections on 0.0.0.0:9092.
> (kafka.network.Acceptor)
> [2017-11-08 08:21:57,813] INFO [SocketServer brokerId=0] Started 1 acceptor
> threads (kafka.network.SocketServer)
> [2017-11-08 08:21:57,829] INFO [ExpirationReaper-0-Produce]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-DeleteRecords]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [ExpirationReaper-0-Fetch]: Starting
> (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
> [2017-11-08 08:21:57,845] INFO [LogDirFailureHandler]: Starting
> (kafka.server.ReplicaManager$LogDirFailureHandler)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Stopping serving
> replicas in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs
> (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaManager broker=0] Partitions are
> offline due to failure on log directory C:\Kafka\kafka_2.12-1.0.0\kafka-logs
> (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,860] INFO [ReplicaFetcherManager on broker 0] Removed
> fetcher for partitions (kafka.server.ReplicaFetcherManager)
> [2017-11-08 08:21:57,892] INFO [ReplicaManager broker=0] Broker 0 stopped
> fetcher for partitions because they are in the failed log dir
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.server.ReplicaManager)
> [2017-11-08 08:21:57,892] INFO Stopping serving logs in dir
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)
> [2017-11-08 08:21:57,892] FATAL Shutdown broker because all log dirs in
> C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)