[
https://issues.apache.org/jira/browse/KAFKA-19603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18013645#comment-18013645
]
ally heev commented on KAFKA-19603:
-----------------------------------
[~proggga] you might want to create a KIP and add details there. Then, you can
start a discussion on the mailing list. Guidelines here:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> Change log.segment.bytes configuration type from int to long to support
> segments larger than 2GB
> -------------------------------------------------------------------------------------------------
>
> Key: KAFKA-19603
> URL: https://issues.apache.org/jira/browse/KAFKA-19603
> Project: Kafka
> Issue Type: Improvement
> Components: core, log
> Reporter: Mikhail Fesenko
> Priority: Major
>
> h2. Description
> h3. Summary
> Change the data type of *{{log.segment.bytes}}* configuration from *{{int}}*
> to *{{long}}* to allow segment sizes beyond the current 2GB limit imposed by
> the integer maximum value.
> h3. Current Limitation
> The {{*log.segment.bytes*}} configuration currently uses an *{{int}}* data
> type, which limits the maximum segment size to ~2GB (2,147,483,647 bytes).
> This constraint becomes problematic for modern high-capacity storage
> deployments.
> h3. Background: Kafka Log Segment Structure
> Each Kafka topic partition consists of multiple log segments stored as
> separate files on disk. For each segment, Kafka maintains three core files:
> * {*}{{.log}} files{*}: Contain the actual message data
> * {*}{{.index}} files{*}: Store mappings between message offsets and their
> physical positions within the log file, allowing Kafka to quickly locate
> messages by their offset without scanning the entire log file
> * {*}{{.timeindex}} files{*}: Store mappings between message timestamps and
> their corresponding offsets, enabling efficient time-based retrieval of
> messages
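The segment-file layout above can be sketched in a few lines. This is an illustrative model, not Kafka's actual on-disk code: it assumes (as the ticket's Technical Considerations section states) that each {{.index}} entry stores a 4-byte offset, relative to the segment's base offset, plus a 4-byte physical position in the {{.log}} file.

```python
import struct

# Assumed .index entry layout: 4-byte relative offset + 4-byte file position.
# Storing the offset relative to the segment's base offset is what lets it
# fit in 4 bytes even though absolute Kafka offsets are 64-bit.
ENTRY_FORMAT = ">ii"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 8 bytes per entry

def pack_entry(base_offset: int, message_offset: int, position: int) -> bytes:
    """Encode one index entry for a segment whose file starts at base_offset."""
    return struct.pack(ENTRY_FORMAT, message_offset - base_offset, position)

def unpack_entry(base_offset: int, entry: bytes) -> tuple[int, int]:
    """Decode an entry back to (absolute message offset, .log file position)."""
    rel, pos = struct.unpack(ENTRY_FORMAT, entry)
    return base_offset + rel, pos

entry = pack_entry(base_offset=1_000_000, message_offset=1_000_500, position=123_456)
assert unpack_entry(1_000_000, entry) == (1_000_500, 123_456)
```

Because the position field is a signed 4-byte integer, this layout is exactly what ties the segment size to the 2GB ceiling discussed below.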
> h3. Motivation
> # {*}Modern Hardware Capabilities{*}: Current deployments often use
> high-capacity storage (e.g., EPYC servers with 4×15TB drives) where 2GB
> segments are inefficiently small
> # {*}File Handle Optimization{*}: Large Kafka deployments with many topics
> can have 50-100k open files across all segment types (.log, .index,
> .timeindex files). Each segment requires open file handles, and larger
> segments would reduce the total number of files and improve caching efficiency
> # {*}Performance Benefits{*}: Fewer segment rotations in high-traffic
> scenarios would reduce I/O overhead and improve overall performance.
> Sequential disk operations are much faster than random access patterns
> # {*}Storage Efficiency{*}: Reducing segment file proliferation improves
> filesystem metadata performance and reduces inode usage on high-volume
> deployments
> # {*}Community Interest{*}: Similar requests have been raised in community
> forums (see [Confluent forum
> discussion|https://forum.confluent.io/t/what-happens-if-i-increase-log-segment-bytes/5845])
> h3. Proposed Solution
> Change *{{log.segment.bytes}}* from *{{int}}* to *{{long}}* data type,
> allowing segment sizes of 3-4GB or larger to better align with modern storage
> capabilities.
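For concreteness, a broker configuration like the following is what the change would enable. The value shown (4GB) is hypothetical and is rejected by current brokers, since it exceeds {{Integer.MAX_VALUE}}:

```
# Hypothetical: only valid once log.segment.bytes is widened to long.
# 4 GiB = 4294967296 bytes; today's ceiling is 2147483647 (~2GB).
log.segment.bytes=4294967296
```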
> h3. Technical Considerations (Raised by Community)
> Based on dev mailing list discussion:
> # {*}Index File Format Limitation{*}: Current index files use 4 bytes to
> represent file positions within segments, assuming a 2GB cap (Jun Rao). This
> means:
> ** {{.index}} files store offset-to-position mappings using 4-byte integers
> for file positions
> ** If segments exceed 2GB, position values would overflow the 4-byte limit
> ** Index format may need to be updated to support 8-byte positions
> # {*}RemoteLogSegmentMetadata Interface{*}: Public interface currently uses
> {{int}} for {{segmentSizeInBytes}} and may need updates (Jun Rao)
> # {*}Segment File Ecosystem Impact{*}: Need to evaluate impact on all three
> file types (.log, .index, .timeindex) and their interdependencies
> # {*}Impact Assessment{*}: Need to evaluate all components that assume 2GB
> segment limit
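The overflow in consideration 1 can be demonstrated directly. The sketch below reinterprets a file position as a 4-byte signed field, the width the index format is described as using above; the helper name and the simulation are illustrative, not Kafka code:

```python
import struct

INT_MAX = 2**31 - 1  # 2,147,483,647: the current log.segment.bytes ceiling

def as_int32(position: int) -> int:
    """Reinterpret a file position as a 4-byte signed index field."""
    return struct.unpack(">i", struct.pack(">I", position & 0xFFFFFFFF))[0]

# Positions up to 2GB - 1 round-trip safely through a 4-byte field...
assert as_int32(INT_MAX) == INT_MAX
# ...but a position just past 2GB wraps to a negative value. This is why
# widening log.segment.bytes alone is not enough: the index format's
# position field would also need to grow to 8 bytes.
assert as_int32(INT_MAX + 1) == -(2**31)
assert as_int32(3 * 1024**3) < 0  # a 3GB position is unrepresentable
```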
> h3. Questions for Discussion
> # What would be a reasonable maximum segment size limit?
> # Should this change be backward compatible or require a protocol/format
> version bump?
> # Are there any other components beyond index files and
> RemoteLogSegmentMetadata that need updates?
> h3. Expected Benefits
> * Reduced number of segment files for high-volume topics
> * Improved file handle utilization and caching efficiency
> * Better alignment with modern storage hardware capabilities
> * Reduced segment rotation overhead in high-traffic scenarios
> h3. Acceptance Criteria
> * {{log.segment.bytes}} accepts long values > 2GB
> * Index file format supports larger segments (if needed)
> * RemoteLogSegmentMetadata interface updated (if needed)
> * Backward compatibility maintained
> * Documentation updated
> * Unit and integration tests added
> *Disclaimer*
> I'm relatively new to Kafka internals and the JIRA contribution process. The
> original idea and motivation came from my experience with large-scale
> deployments, but I used Claude AI to help make this ticket more detailed and
> technically structured. There may be technical inaccuracies or missing
> implementation details that I haven't considered.
> This ticket is open for community discussion and feedback before
> implementation.
> *Expert review and guidance would be greatly appreciated.*
--
This message was sent by Atlassian Jira
(v8.20.10#820010)