fresh-borzoni opened a new pull request, #443: URL: https://github.com/apache/fluss-rust/pull/443
## Summary

Closes #431.

Arrow log batches were previously capped at a hard 256-record limit regardless of actual data size, so a batch of 256 tiny ints (~1 KB) was treated the same as 256 large rows (~10 MB). This PR replaces that fixed cap with byte-size-based fullness matching Java's `ArrowWriter`, so batches now fill to the configured `writer_batch_size` (default 2 MB).

- **Fullness check**: a threshold-based optimization avoids computing sizes on every append. It estimates how many records should fit, skips checks until that count is reached, then recalculates.
- **Size estimation**: buffer lengths are read directly from the Arrow builders (O(num_columns), zero allocation), plus a pre-computed IPC overhead constant that captures metadata and body framing for the schema.
- **Compression**: accounted for via an adaptive `ArrowCompressionRatioEstimator` shared across batches for the same table. It starts at 1.0 (assume no compression) and adjusts asymmetrically after each batch is serialized: quick to increase when compression worsens, slow to decrease when it improves. This matches Java's conservative approach and avoids underestimating batch sizes.
- **Constant alignment**: `VARIABLE_WIDTH_AVG_BYTES` (64 -> 8) and `INITIAL_ROW_CAPACITY` (256 -> 1024) now match the Java Arrow defaults.

Writer pooling (`ArrowWriterPool`) and `DynamicWriteBatchSizeEstimator` are out of scope; TODOs added.
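The threshold-based fullness check could be sketched roughly as follows. This is a minimal illustration of the skip-until-threshold idea only; the type, field names, and arithmetic are hypothetical, not the PR's actual code:

```rust
/// Hypothetical sketch of a threshold-based fullness check: size
/// estimation is skipped until the record count reaches an estimated
/// threshold, then the threshold is recalculated.
struct BatchSizeTracker {
    batch_size_limit: usize, // e.g. writer_batch_size, default 2 MB
    records: usize,          // records appended so far
    next_check_at: usize,    // skip size estimation until this count
}

impl BatchSizeTracker {
    fn new(batch_size_limit: usize) -> Self {
        Self { batch_size_limit, records: 0, next_check_at: 1 }
    }

    /// Called once per appended record. `estimate_bytes` stands in for
    /// the O(num_columns) scan over Arrow builder buffer lengths and is
    /// only invoked when the threshold is reached.
    fn is_full(&mut self, estimate_bytes: impl Fn() -> usize) -> bool {
        self.records += 1;
        if self.records < self.next_check_at {
            return false; // skip the size scan entirely
        }
        let bytes = estimate_bytes();
        if bytes >= self.batch_size_limit {
            return true;
        }
        // Estimate how many more records fit at the current average
        // record size, and don't re-check until that many are appended.
        let avg = (bytes / self.records).max(1);
        let remaining = (self.batch_size_limit - bytes) / avg;
        self.next_check_at = self.records + remaining.max(1);
        false
    }
}
```

With uniform record sizes this computes the estimated size only a handful of times per batch instead of once per append; skewed rows trigger an earlier or later recheck but never an overshoot past the next threshold.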

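The asymmetric update rule of the compression ratio estimator could look roughly like this. The step constants (0.05 up, 0.005 down) are assumptions borrowed from the analogous estimator in Apache Kafka's producer, not values taken from this PR:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed step sizes: react quickly when compression worsens, slowly
// when it improves, so batch sizes are never badly underestimated.
const DETERIORATE_STEP: f64 = 0.05;
const IMPROVE_STEP: f64 = 0.005;

/// Sketch of an adaptive compression ratio estimator in the spirit of
/// the one described above. The ratio (compressed / uncompressed size)
/// is stored as raw f64 bits in an atomic so one instance can be
/// shared across batches of the same table without locking.
pub struct CompressionRatioEstimator {
    ratio_bits: AtomicU64,
}

impl CompressionRatioEstimator {
    pub fn new() -> Self {
        // Start at 1.0: assume no compression until observed otherwise.
        Self { ratio_bits: AtomicU64::new(1.0f64.to_bits()) }
    }

    pub fn estimation(&self) -> f64 {
        f64::from_bits(self.ratio_bits.load(Ordering::Relaxed))
    }

    /// Record the ratio observed after serializing one batch.
    pub fn update(&self, observed: f64) {
        let current = self.estimation();
        let next = if observed > current {
            // Compression got worse: jump up quickly (capped at 1.0).
            (current + DETERIORATE_STEP).min(1.0)
        } else {
            // Compression got better: drift down slowly, never below
            // the latest observation.
            (current - IMPROVE_STEP).max(observed)
        };
        self.ratio_bits.store(next.to_bits(), Ordering::Relaxed);
    }
}
```

The estimated batch size would then be the raw builder-buffer estimate multiplied by `estimation()`, which errs toward too large rather than too small.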