Re: RFR: 8361018: Re-examine buffering and encoding conversion in BufferedWriter [v6]

Brett Okken Tue, 01 Jul 2025 07:37:10 -0700

On Tue, 1 Jul 2025 00:01:21 GMT, Shaojin Wen <s...@openjdk.org> wrote:


>> BufferedWriter -> OutputStreamWriter -> StreamEncoder
>> 
>> In this call chain, BufferedWriter has a char[] buffer, and StreamEncoder 
>> has a ByteBuffer. There are two layers of cache here, or the BufferedWriter 
>> layer can be removed. And when charset is UTF8, if the content of 
>> write(String) is LATIN1, a conversion from LATIN1 to UTF16 and then to 
>> LATIN1 will occur here.
>> 
>> LATIN1 -> UTF16 -> UTF8
>> 
>> We can improve BufferedWriter. When the parameter Writer instanceof 
>> OutputStreamWriter is passed in, remove the cache and call it directly. In 
>> addition, improve write(String) in StreamEncoder to avoid unnecessary 
>> encoding conversion.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Revert "BufferedWriter buffer use StringBuilder"
>   
>   This reverts commit da902ca0b0bd6acc003deb8ad1ca0d6485a29a27.

My initial impression was that the point of this PR was that the BufferedWriter 
was forcing the conversion to utf-16 and bypassing that would avoid a 
conversion. However, it seems that it is actually the 
StreamEncoder/CharsetEncoder that is really forcing that - and the conversion 
to utf-16 is required for optimal encoder performance.

The result (of this PR), then, seems to be that for OutputStreamWriter as a 
target (and maybe for specific character encodings) BufferedWriter no longer 
buffers, but delegates that responsibility to the OutputStreamWriter (and its 
StreamEncoder).

Are there any scenarios where wrapping an OutputStreamWriter with a 
BufferedWriter makes sense? Is it only to control the buffer size? If so, 
should OutputStreamWriter itself just allow consumers to control the buffer 
size? (And then just change the doc of OutputStreamWriter to discourage the use 
of BufferedWriter - and change PrintWriter to not [create this 
combo](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/io/PrintWriter.java#L167).)
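
For reference, the combination being questioned looks roughly like the sketch below. It uses only the public java.io API; the class name `WriterCombo` and both helper names are made up for illustration. It only shows that the wrapped and unwrapped shapes produce the same bytes, and says nothing about their relative performance.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterCombo {

    // BufferedWriter (char[] buffer) in front of an OutputStreamWriter,
    // which has its own byte buffer inside its encoder.
    static byte[] viaBufferedWriter(String s, int charBufferSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new BufferedWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8), charBufferSize)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // The same write without the BufferedWriter layer.
    static byte[] viaOutputStreamWriter(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 au lait";
        byte[] buffered = viaBufferedWriter(text, 8192);
        byte[] direct = viaOutputStreamWriter(text);
        // Both paths yield identical UTF-8 output.
        System.out.println(Arrays.equals(buffered, direct)); // prints true
    }
}
```

The only knob the outer layer adds here is `charBufferSize`, which is the basis for the question of whether OutputStreamWriter could expose a buffer size itself.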

Should the various encoders be optimized to work with a StringCharBuffer? 
Perhaps only if backed by a String or AbstractStringBuilder? It seems that 
there could be more target character encodings beyond utf-8 and utf-16 (e.g. 
ascii, iso-8859-1, cp1252) which could benefit from knowing up front whether 
the source is latin 1. 
It feels strange to place the optimizations for specific character encodings 
directly in StreamEncoder.
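
To make that last point a bit more concrete: for pure-ASCII content, ASCII, ISO-8859-1 and UTF-8 all produce identical bytes, so an encoder that knew its source was latin 1 (and ASCII-only) would not, in principle, need a UTF-16 intermediate for any of them. The sketch below only demonstrates that byte equivalence through the public `String.getBytes(Charset)` method. `Latin1Paths` and `isAscii` are hypothetical names, not JDK APIs, and nothing here reflects how StreamEncoder or the CharsetEncoders are actually implemented. cp1252 is left out simply to stay within `StandardCharsets`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1Paths {

    // Hypothetical helper: true if every char is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 0x80) {
                return false;
            }
        }
        return true;
    }

    // True if the three target encodings yield byte-for-byte identical output.
    static boolean sameBytesAcrossCharsets(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, latin1) && Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        String plain = "hello";
        String accented = "h\u00e9llo"; // latin 1, but not ASCII

        System.out.println(isAscii(plain) + " " + sameBytesAcrossCharsets(plain));
        System.out.println(isAscii(accented) + " " + sameBytesAcrossCharsets(accented));
    }
}
```

For the accented string the encodings diverge (one byte in ISO-8859-1, two in UTF-8, a replacement byte in ASCII), which is why a per-charset fast path would still need to distinguish ASCII-only from general latin 1 input.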

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26022#issuecomment-3024309269
