[ 
https://issues.apache.org/jira/browse/AVRO-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned AVRO-4240:
------------------------------------


> Size DataFileWriter output buffer to fit entire block frame to reduce write 
> syscalls
> ------------------------------------------------------------------------------------
>
>                 Key: AVRO-4240
>                 URL: https://issues.apache.org/jira/browse/AVRO-4240
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.12.1, 1.11.5, 1.10.2
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>             Fix For: 1.13.0
>
>
> DataFileStream.DataBlock#writeBlockTo writes four pieces sequentially through 
> a DirectBinaryEncoder into a BufferedFileOutputStream that uses the default 
> 8KB BufferedOutputStream buffer:
>  # Entry count (varint-encoded long, 1-10 bytes)
>  # Block size (varint-encoded long, 1-10 bytes)
>  # Compressed block data (~64KB at the default sync interval)
>  # Sync marker (16 bytes)
> The default sync interval was increased from 16KB to 64KB in AVRO-1398, but 
> the BufferedFileOutputStream buffer size was never adjusted. Because the 
> block data far exceeds the 8KB buffer, BufferedOutputStream first flushes 
> the buffered entry count and block size bytes, then writes the block data 
> directly to the underlying stream. The sync marker then goes into the 
> buffer and is flushed again at the end. The result is at least 3 write 
> syscalls per block instead of 1.
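> This change sizes the BufferedFileOutputStream buffer to maxBlockSize() + 20 
> + sync.length so that a complete block frame fits in a single buffer: all 
> four writes accumulate there and are flushed once. The extra 20 bytes cover 
> the entry count and block size at their maximum of 10 bytes each.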



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
