David Mollitor created AVRO-4240:
------------------------------------
Summary: Size DataFileWriter output buffer to fit entire block
frame to reduce write syscalls
Key: AVRO-4240
URL: https://issues.apache.org/jira/browse/AVRO-4240
Project: Apache Avro
Issue Type: Improvement
Components: java
Affects Versions: 1.12.1, 1.11.5, 1.10.2
Reporter: David Mollitor
Assignee: David Mollitor
Fix For: 1.13.0
DataFileStream.DataBlock#writeBlockTo writes four pieces sequentially through a
DirectBinaryEncoder into a BufferedFileOutputStream that uses the default 8KB
BufferedOutputStream buffer:
# Entry count (varint-encoded long, 1-10 bytes)
# Block size (varint-encoded long, 1-10 bytes)
# Compressed block data (~64KB at the default sync interval)
# Sync marker (16 bytes)
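
As a rough illustration of the frame layout above, the sketch below adds up the
four pieces for one block. It assumes Avro's standard zig-zag varint encoding
for longs; the class and method names (BlockFrameSize, varintLength, frameSize)
are hypothetical and not part of the Avro API, and the 64KB block size is only
an example value.

```java
final class BlockFrameSize {
    /** Bytes needed to encode n as a zig-zag varint long (always 1 to 10). */
    static int varintLength(long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: map signed to unsigned
        int len = 1;
        while ((z & ~0x7FL) != 0) {    // more than 7 significant bits remain
            z >>>= 7;
            len++;
        }
        return len;
    }

    /** Total bytes in one block frame: two varint prefixes, data, sync marker. */
    static long frameSize(long entryCount, long blockBytes, int syncLength) {
        return varintLength(entryCount) + varintLength(blockBytes)
                + blockBytes + syncLength;
    }

    public static void main(String[] args) {
        // Hypothetical block: 1000 entries, 64KB of block data, 16-byte sync.
        System.out.println(frameSize(1000, 64 * 1024, 16)); // prints 65557
    }
}
```

In this example the two prefixes take only 5 bytes, but each can take up to 10,
so a worst-case frame carries 20 bytes of prefix plus the 16-byte sync marker.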
The default sync interval was increased from 16KB to 64KB in AVRO-1398, but the
BufferedFileOutputStream buffer size was never adjusted. Because the block data
far exceeds the 8KB buffer, BufferedOutputStream first flushes the buffered
entry count and block size bytes, then writes the block data directly to the
underlying stream. The sync marker then goes into the buffer and is flushed at
the end of the block. The result is at least 3 write syscalls per block instead
of 1.
This change sizes the BufferedFileOutputStream buffer to maxBlockSize() + 20 +
sync.length, where 20 covers the two varint-encoded longs at up to 10 bytes
each, so that a complete block frame fits in the buffer. All four writes then
accumulate in the buffer and are flushed once.
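
The effect can be reproduced outside Avro with a plain
java.io.BufferedOutputStream. The sketch below is not the DataFileWriter code:
CountingSink and writesPerBlock are hypothetical names, the sink merely counts
write calls that reach the underlying stream (which for a file stream would
typically each become a write syscall), and 64KB stands in for maxBlockSize().

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class BlockWriteCount {
    /** Stand-in for a file stream: counts the write calls that reach it. */
    static final class CountingSink extends OutputStream {
        int writes;
        @Override public void write(int b) { writes++; }
        @Override public void write(byte[] b, int off, int len) { writes++; }
    }

    /** Writes one simulated block frame and returns the underlying write count. */
    static int writesPerBlock(int bufferSize, int blockBytes) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            out.write(new byte[2]);          // entry count (varint placeholder)
            out.write(new byte[3]);          // block size (varint placeholder)
            out.write(new byte[blockBytes]); // compressed block data
            out.write(new byte[16]);         // sync marker
            out.flush();                     // end of block
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writes;
    }

    public static void main(String[] args) {
        int block = 64 * 1024; // assumed stand-in for maxBlockSize()
        System.out.println(writesPerBlock(8192, block));            // prints 3
        System.out.println(writesPerBlock(block + 20 + 16, block)); // prints 1
    }
}
```

With the 8KB buffer the prefix bytes, the block data, and the sync marker each
reach the sink separately; with the larger buffer the whole frame is handed
over in one call.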
--
This message was sent by Atlassian Jira
(v8.20.10#820010)