knoxy5467 opened a new pull request, #20358:
URL: https://github.com/apache/kafka/pull/20358
### Summary
This PR fixes two critical issues related to producer batch splitting that
can cause infinite retry loops and stack overflow errors when batch sizes are
significantly larger than broker-configured message size limits.
### Issues Addressed
- **KAFKA-8350**: When `batch.size` is much larger than the topic-level
`max.message.bytes`, the producer keeps re-splitting and resending the batch,
and every attempt fails with `MESSAGE_TOO_LARGE`
- **KAFKA-8202**: Stack overflow errors in `FutureRecordMetadata.chain()`
due to excessive recursive splitting attempts
### Root Cause
The existing batch splitting logic in
`RecordAccumulator.splitAndReenqueue()` always used the configured `batchSize`
parameter for splitting, regardless of whether the batch had already been split
before. This caused:
1. **Infinite loops**: When `batch.size` (e.g., 8MB) >> `message.max.bytes`
(e.g., 1MB), splits would never succeed since the split size was still too large
2. **Stack overflow**: Repeated splitting attempts created deep call chains
in the metadata chaining logic
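
The first failure mode can be illustrated with a small model. This is only a sketch under simplifying assumptions (fixed-size records, no batch header or compression overhead), and `FixedTargetSplitSketch` and its methods are hypothetical names, not Kafka classes:

```java
// Hypothetical model of the old behaviour; not Kafka code.
final class FixedTargetSplitSketch {

    /** Size of the largest batch left after one split when the target is always batchSize. */
    static int largestBatchAfterSplit(int batchSize, int recordSize, int recordCount) {
        // Each new batch takes as many whole records as fit in the target, at least one.
        int recordsPerBatch = Math.max(1, Math.min(recordCount, batchSize / recordSize));
        return recordsPerBatch * recordSize;
    }

    /** Returns true if the largest batch fits under brokerLimit within maxRounds splits. */
    static boolean converges(int batchSize, int brokerLimit, int recordSize,
                             int recordCount, int maxRounds) {
        int size = recordSize * recordCount;
        for (int round = 0; round < maxRounds; round++) {
            if (size <= brokerLimit) {
                return true;
            }
            // The target never shrinks, so a batch below batchSize comes back unchanged.
            size = largestBatchAfterSplit(batchSize, recordSize, size / recordSize);
        }
        return size <= brokerLimit;
    }
}
```

In this model, a 4 MB batch of 1 KB records with an 8 MB target and a 1 MB limit never converges, because every split reproduces the same 4 MB batch.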
### Solution
Implemented progressive batch splitting logic:
```java
int maxBatchSize = this.batchSize;
if (bigBatch.isSplitBatch()) {
    maxBatchSize = Math.max(bigBatch.maxRecordSize,
                            bigBatch.estimatedSizeInBytes() / 2);
}
```
**Key improvements:**
- **First split**: Uses original `batchSize` (maintains backward
compatibility)
- **Subsequent splits**: Uses the larger of:
  - `maxRecordSize`: Ensures we can always split down to individual records
  - `estimatedSizeInBytes() / 2`: Provides geometric reduction for faster
    convergence
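
To make the convergence concrete, here is a minimal simulation of the rule above. It assumes fixed-size records and ignores batch header and compression overhead; `ProgressiveSplitSketch` and its methods are hypothetical names used for illustration, not part of the Kafka code base:

```java
// Hypothetical model of the progressive rule; not Kafka code.
final class ProgressiveSplitSketch {

    /** Split target: batchSize on the first split, then max(largest record, half the batch). */
    static int splitTarget(int batchSize, boolean alreadySplit,
                           int maxRecordSize, int estimatedSizeInBytes) {
        if (!alreadySplit) {
            return batchSize;
        }
        return Math.max(maxRecordSize, estimatedSizeInBytes / 2);
    }

    /**
     * Counts the splits needed until the largest batch fits under brokerLimit.
     * Returns -1 if it still does not fit after roundLimit splits.
     */
    static int roundsUntilAccepted(int batchSize, int brokerLimit, int recordSize,
                                   int recordCount, int roundLimit) {
        int size = recordSize * recordCount;
        boolean alreadySplit = false;
        for (int round = 0; round <= roundLimit; round++) {
            if (size <= brokerLimit) {
                return round;
            }
            int target = splitTarget(batchSize, alreadySplit, recordSize, size);
            // Largest resulting batch: as many whole records as fit in the target, at least one.
            recordCount = Math.max(1, Math.min(recordCount, target / recordSize));
            size = recordCount * recordSize;
            alreadySplit = true;
        }
        return -1;
    }
}
```

Under these assumptions, a 4 MB batch of 1 KB records with `batchSize` of 8 MB and a 1 MB limit is accepted after 3 splits: the first split changes nothing, and the next two halve the batch to 2 MB and then 1 MB. A single record larger than the limit still cannot be made to fit, so the model returns -1 for it.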
### Testing
Added a test, `testSplitAndReenqueuePreventInfiniteRecursion()`, that:
- Creates oversized batches with 100 records of 1 KB each
- Verifies splitting can reduce batches to single-record size
- Ensures no infinite recursion (safety limit of 100 operations)
- Validates no data loss or duplication during splitting
- Confirms all original records are preserved with correct keys
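
The shape of these checks can be sketched with plain collections. This is not the test added in the PR; it is a hypothetical stand-in (`SplitTestShapeSketch` is an invented name) that models batches as lists of keys, assumes 1 KB per record, and applies the halving rule with an operation limit:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the test's structure; not the PR's test code.
final class SplitTestShapeSketch {
    static final int RECORD_SIZE = 1024; // assumed size of every record, in bytes

    /** Splits batches until each holds one key; throws if more than opLimit splits are needed. */
    static List<String> splitToSingles(List<String> keys, int opLimit) {
        Deque<List<String>> queue = new ArrayDeque<>();
        queue.add(new ArrayList<>(keys));
        List<String> delivered = new ArrayList<>();
        int ops = 0;
        while (!queue.isEmpty()) {
            List<String> batch = queue.poll();
            if (batch.size() <= 1) {
                delivered.addAll(batch); // single-record batch: nothing left to split
                continue;
            }
            if (++ops > opLimit) {
                throw new IllegalStateException("exceeded " + opLimit + " split operations");
            }
            int size = batch.size() * RECORD_SIZE;
            int target = Math.max(RECORD_SIZE, size / 2);
            int perBatch = Math.max(1, target / RECORD_SIZE);
            // Re-enqueue the pieces so they can be split again if needed.
            for (int i = 0; i < batch.size(); i += perBatch) {
                int end = Math.min(i + perBatch, batch.size());
                queue.add(new ArrayList<>(batch.subList(i, end)));
            }
        }
        return delivered;
    }
}
```

Calling `splitToSingles` with 100 distinct keys and a limit of 100 finishes without throwing, and the returned list holds each key exactly once, though not necessarily in the original order.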
### Backward Compatibility
- No breaking changes to public APIs
- First split attempt still uses original `batchSize` configuration
- Progressive splitting only engages when a batch that has already been split
must be split again