[
https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243730#comment-17243730
]
Wenbing Shen commented on KAFKA-10672:
--------------------------------------
* We increased the number of batches read per IO operation
* Across many tests, startup speed improved by 50% on average
* See the attached files for the detailed code
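
A minimal sketch of the idea in the first bullet, not the attached patch. It assumes a hypothetical record layout (a 4-byte length prefix followed by the payload), which is not Kafka's on-disk format, and the class and method names (BatchedReadSketch, scanOneByOne, scanChunked) are made up for illustration. It compares issuing two reads per record with reading one large chunk and parsing several records out of memory.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class BatchedReadSketch {
    static final int HEADER_BYTES = 4; // hypothetical length prefix, not Kafka's header

    // Writes `count` records: a 4-byte length followed by `payloadBytes` zero bytes.
    static Path writeSample(int count, int payloadBytes) throws IOException {
        Path file = Files.createTempFile("segment-sketch", ".log");
        file.toFile().deleteOnExit();
        ByteBuffer buf = ByteBuffer.allocate(count * (HEADER_BYTES + payloadBytes));
        for (int i = 0; i < count; i++) {
            buf.putInt(payloadBytes);
            buf.position(buf.position() + payloadBytes);
        }
        Files.write(file, buf.array());
        return file;
    }

    // Fills dst starting at file offset pos; returns how many read calls it took.
    static int readFully(FileChannel ch, ByteBuffer dst, long pos) throws IOException {
        int calls = 0;
        while (dst.hasRemaining()) {
            int n = ch.read(dst, pos);
            if (n < 0) throw new EOFException("unexpected end of file");
            pos += n;
            calls++;
        }
        return calls;
    }

    // One read for the header and one for the payload of every record.
    // Returns {records seen, read calls issued}.
    static long[] scanOneByOne(Path file) throws IOException {
        long records = 0, reads = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0, size = ch.size();
            ByteBuffer header = ByteBuffer.allocate(HEADER_BYTES);
            while (pos + HEADER_BYTES <= size) {
                header.clear();
                reads += readFully(ch, header, pos);
                int len = header.getInt(0);
                if (pos + HEADER_BYTES + len > size) break; // truncated tail
                reads += readFully(ch, ByteBuffer.allocate(len), pos + HEADER_BYTES);
                pos += HEADER_BYTES + len;
                records++;
            }
        }
        return new long[] {records, reads};
    }

    // Reads up to chunkBytes at a time and parses every complete record in the chunk.
    // Returns {records seen, read calls issued}.
    static long[] scanChunked(Path file, int chunkBytes) throws IOException {
        long records = 0, reads = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0, size = ch.size();
            ByteBuffer chunk = ByteBuffer.allocate(chunkBytes);
            while (pos + HEADER_BYTES <= size) {
                chunk.clear();
                chunk.limit((int) Math.min(chunkBytes, size - pos));
                reads += readFully(ch, chunk, pos);
                chunk.flip();
                long consumed = 0;
                while (chunk.remaining() >= HEADER_BYTES) {
                    int len = chunk.getInt(chunk.position());
                    if (chunk.remaining() < HEADER_BYTES + len) break; // record continues past this chunk
                    chunk.position(chunk.position() + HEADER_BYTES + len);
                    consumed += HEADER_BYTES + len;
                    records++;
                }
                // A record larger than the chunk (or a truncated tail) ends this sketch;
                // a real reader would grow the buffer instead.
                if (consumed == 0) break;
                pos += consumed; // a partial record at the chunk end is re-read next time
            }
        }
        return new long[] {records, reads};
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample(1000, 100);
        long[] oneByOne = scanOneByOne(file);
        long[] chunked = scanChunked(file, 8192);
        System.out.println("one-by-one: " + oneByOne[0] + " records, " + oneByOne[1] + " reads");
        System.out.println("chunked:    " + chunked[0] + " records, " + chunked[1] + " reads");
    }
}
```

Both scans visit the same records. The chunked scan issues far fewer read calls, at the cost of a larger buffer and of re-reading the partial record at the end of each chunk.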
> Restarting Kafka always takes a lot of time
> -------------------------------------------
>
> Key: KAFKA-10672
> URL: https://issues.apache.org/jira/browse/KAFKA-10672
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.0.0
> Environment: A cluster of 21 Kafka nodes;
> Each node has 12 disks;
> Each node has about 1500 partitions;
> There are approximately 700 leader partitions per node;
> Slow-loading partitions have about 1000 log segments;
> Reporter: Wenbing Shen
> Priority: Major
> Attachments: AbstractIterator.java, AbstractIteratorOfRestart.java,
> AbstractLegacyRecordBatch.java, ByteBufferLogInputStream.java,
> DefaultRecordBatch.java, FileLogInputStream.java, FileRecords.java,
> LazyDownConversionRecords.java, Log.scala, LogInputStream.java,
> LogManager.scala, LogSegment.scala, MemoryRecords.java,
> RecordBatchIterator.java, RecordBatchIteratorOfRestart.java, Records.java,
> server.log
>
>
> When the snapshot file does not exist, or the latest snapshot file predates
> the current active segment, restoring producer state traverses the log
> segments and reads every batch in them. When a single broker hosts many
> partitions, and therefore many logs, this causes a very large number of IO
> operations, because each IO loads only one batch and a log usually contains
> tens of thousands of batches. I found that the code performs at least two IO
> operations per batch. With the default batch size of 16 KB and a log segment
> of 1 GB, a segment contains 65,536 batches, so at least 65,536 * 2 = 131,072
> IO operations are needed, and the Kafka startup process spends a lot of time
> on them. We configured 15 log recovery threads in the production
> environment, and it still took more than 2 hours to load a partition. Can
> the community suggest proposals or improvements for this situation? For
> detailed logs, see the entries for the test-perf-18 partitions in the
> attached server.log.
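
The estimate in the description can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes every batch is exactly 16 KiB, the segment is exactly 1 GiB, and each batch costs two reads, as the reporter states. The class and method names are made up for illustration.

```java
class RecoveryIoEstimate {
    // Batches in one segment, assuming every batch is exactly batchBytes long.
    static long batchesPerSegment(long segmentBytes, long batchBytes) {
        return segmentBytes / batchBytes;
    }

    // Lower bound on read calls for one segment at readsPerBatch reads per batch.
    static long readsPerSegment(long segmentBytes, long batchBytes, long readsPerBatch) {
        return batchesPerSegment(segmentBytes, batchBytes) * readsPerBatch;
    }

    public static void main(String[] args) {
        long segmentBytes = 1L << 30; // 1 GiB segment
        long batchBytes = 16L << 10;  // 16 KiB batch
        long batches = batchesPerSegment(segmentBytes, batchBytes);
        long reads = readsPerSegment(segmentBytes, batchBytes, 2);
        // prints "65536 batches, at least 131072 reads per segment"
        System.out.println(batches + " batches, at least " + reads + " reads per segment");
    }
}
```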
--
This message was sent by Atlassian Jira
(v8.3.4#803005)