If we use StringBuilder as the buffer of BufferedWriter, we need to make 
StreamEncoder and all CharsetEncoders support write(StringBuilder) to achieve 
better performance. There are many CharsetEncoder implementations, so this 
change would be too large.
There is also a scenario where performance would degrade. When 
BufferedWriter::write(char[]) is called with LATIN1 content and the encoding of 
OutputStreamWriter is UTF16, it causes an unnecessary conversion of
`UTF16 char[] -> LATIN1 byte[] -> UTF16 ByteBuffer`.
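The round trip in that scenario can be illustrated with public APIs only. This is a rough sketch, not the proposed change: the class name `RoundTripDemo` and its helper methods are made up for illustration, and the internal LATIN1 storage of StringBuilder (compact strings) is assumed rather than observable from this code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Copies the readable part of a ByteBuffer into a fresh array.
    static byte[] toBytes(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // char[] buffer: the UTF16 chars go straight to the UTF-16 encoder.
    static byte[] encodeDirect(char[] chars) {
        return toBytes(StandardCharsets.UTF_16BE.encode(CharBuffer.wrap(chars)));
    }

    // StringBuilder buffer: with compact strings enabled (an assumption here),
    // all-LATIN1 content is stored as one byte per char and has to be widened
    // again when it is encoded to UTF-16.
    static byte[] encodeViaStringBuilder(char[] chars) {
        StringBuilder sb = new StringBuilder(chars.length);
        sb.append(chars);
        return toBytes(StandardCharsets.UTF_16BE.encode(CharBuffer.wrap(sb)));
    }

    public static void main(String[] args) {
        char[] latin1Content = "hello".toCharArray();
        byte[] direct = encodeDirect(latin1Content);
        byte[] viaSb = encodeViaStringBuilder(latin1Content);
        System.out.println("same bytes: " + Arrays.equals(direct, viaSb));
    }
}
```

Both paths produce the same bytes; the difference is only the extra intermediate representation on the StringBuilder path.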
-
Shaojin Wen
------------------------------------------------------------------
From: Chen Liang <liangchenb...@gmail.com>
Sent: Monday, June 30, 2025 14:07
To: "温绍锦(高铁)"<shaojin.we...@alibaba-inc.com>
Cc: Brett Okken<brett.okken...@gmail.com>; 
"core-libs-dev"<core-libs-dev@openjdk.org>
Subject: Re: Eliminate unnecessary buffering and encoding conversion in 
BufferedWriter
Brainstorming time:
I think Brett's suggestion makes sense: BufferedWriter can now handle a mix of 
String and char[] inputs. So if the input is all LATIN1 Strings, we can allow 
them to stay LATIN1 in the StringBuilder and let OutputStreamWriter (OSW) 
decide whether it has a fast path for LATIN1 when the StringBuilder is passed 
to it through append(CharSequence).
At first glance, this proposal might be broken into a few actions:
1. BufferedWriter buffers to a StringBuilder (which could reset its coder when 
the length is reset to 0).
2. OutputStreamWriter gets a fast path for append(CharSequence csq) when this 
method receives a StringBuilder: if the LATIN1 or UTF16 bytes are compatible, 
we can trivially perform a write.
3. (Optional) StringBuilder resets its coder to LATIN1 when setLength(0) is 
called.
4. (Alternative) StreamEncoder defines append(CharSequence) itself to handle 
StringBuilder, or even makes use of the recently added CharSequence.getChars 
(but we need to defend against untrusted arrays). We have the ArrayEncoder 
interface that we might use.
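A minimal sketch of action 1, assuming nothing about how the JDK would actually implement it: `SbBufferedWriter` is a hypothetical class, it ignores the Writer lock (so it is not thread-safe), and whether the delegate has any fast path for a StringBuilder is left entirely to the delegate.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical buffered writer that accumulates into a StringBuilder and
// hands the whole builder to the delegate through append(CharSequence).
public class SbBufferedWriter extends Writer {
    private final Writer out;
    private final int limit;
    private final StringBuilder sb;

    public SbBufferedWriter(Writer out, int limit) {
        this.out = out;
        this.limit = limit;
        this.sb = new StringBuilder(limit);
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        sb.append(cbuf, off, len);
        if (sb.length() >= limit) flushBuffer();
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        // No String -> char[] copy here; the builder keeps its own representation.
        sb.append(str, off, off + len);
        if (sb.length() >= limit) flushBuffer();
    }

    private void flushBuffer() throws IOException {
        if (sb.length() > 0) {
            out.append(sb);   // the delegate sees a CharSequence, not a char[]
            sb.setLength(0);  // reuse the builder for the next batch
        }
    }

    @Override
    public void flush() throws IOException {
        flushBuffer();
        out.flush();
    }

    @Override
    public void close() throws IOException {
        try {
            flushBuffer();
        } finally {
            out.close();
        }
    }

    // Small usage helper: mixes String and char[] input and returns the sink content.
    static String demo() {
        StringWriter sink = new StringWriter();
        try (SbBufferedWriter w = new SbBufferedWriter(sink, 8)) {
            w.write("abc");
            w.write("defghij".toCharArray());
            w.write("k");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that this sketch calls append(CharSequence) on the delegate, which is exactly the kind of behavioral change that can cause incompatibilities for delegates that only expect write(char[], int, int).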
However, this implies potential incompatibilities:
1. BufferedWriter previously called only write(char[], int, int) on the 
delegate writer. Calling other methods may introduce incompatibilities.
2. Performing checks on mutable structures, such as a StringBuilder and/or its 
array, is inherently prone to TOCTOU (time-of-check to time-of-use) races.
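The first risk can be made visible with a delegate that records which of its methods BufferedWriter invokes. This is only an illustrative sketch: `DelegateCallRecorder` is a made-up class, and what it records depends on the BufferedWriter implementation of the JDK it runs on.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical delegate that records how it is called instead of writing anything.
public class DelegateCallRecorder extends Writer {
    final List<String> calls = new ArrayList<>();

    @Override
    public void write(char[] cbuf, int off, int len) {
        calls.add("write(char[],int,int)");
    }

    @Override
    public void write(String str, int off, int len) {
        calls.add("write(String,int,int)");
    }

    @Override
    public Writer append(CharSequence csq) {
        calls.add("append(CharSequence)");
        return this;
    }

    @Override
    public void flush() { }

    @Override
    public void close() { }

    // Feeds a mix of String, char[] and CharSequence input through a
    // BufferedWriter and returns the calls the delegate received.
    static List<String> record() {
        DelegateCallRecorder recorder = new DelegateCallRecorder();
        try {
            BufferedWriter bw = new BufferedWriter(recorder, 16);
            bw.write("hello");
            bw.write(new char[] {'a', 'b'});
            bw.append("xyz");
            bw.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return recorder.calls;
    }

    public static void main(String[] args) {
        System.out.println(record());
    }
}
```

A delegate written like this one, with different behavior per method, would observe the change if BufferedWriter started calling append(CharSequence) or write(String, int, int).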
These are just the opportunities and the risks I have discovered on a first 
quick glance. I believe the whole situation is much more than what I have here, 
but we might start from some small, uncontroversial, and low-risk gains. For 
example, I would consider resetting StringBuilder coder to LATIN1 upon 
setLength(0) as a small gain with lower risk, though it does not directly 
relate to this effort of speeding up the write buffering.
Chen
On Sun, Jun 29, 2025 at 11:51 PM wenshao <shaojin.we...@alibaba-inc.com> wrote:
Both Writer and CharsetEncoder are designed for char[]. Changing 
BufferedWriter to use a byte[] value plus a byte coder, as StringBuilder does, 
would still require a redundant encoding conversion for LATIN1 Strings, so the 
performance would not be good.
------------------------------------------------------------------
From: Brett Okken <brett.okken...@gmail.com>
Sent: Monday, June 30, 2025 11:39
To: "温绍锦(高铁)"<shaojin.we...@alibaba-inc.com>
Cc: "core-libs-dev"<core-libs-dev@openjdk.org>
Subject: Re: Eliminate unnecessary buffering and encoding conversion in 
BufferedWriter
Maybe another option would be to implement BufferedWriter with a StringBuilder 
rather than a char[]. This would avoid forcing the content to UTF-16.
On Sun, Jun 29, 2025 at 10:36 PM Brett Okken <brett.okken...@gmail.com> wrote:
Is StreamEncoder buffering content to only write to the underlying OutputStream 
when some threshold is hit? While the layers of conversions are unfortunate, it 
seems there could be negative performance implications of having many extremely 
small writes (such as 1 character/byte) at a time to the underlying 
OutputStream.
Presumably this is a common pattern, as it is recommended:
https://github.com/openjdk/jdk/blob/4dd1b3a6100f9e379c7cee3c699d63d0d01144a7/src/java.base/share/classes/java/io/OutputStreamWriter.java#L45
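One way to check the buffering question empirically is to count how often the underlying OutputStream is actually written to. The sketch below is illustrative only: `WriteCallCounter` is a made-up class, and the counts it prints depend on the JDK implementation it runs on.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical OutputStream that counts write calls and keeps the bytes.
public class WriteCallCounter extends OutputStream {
    final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    int calls;

    @Override
    public void write(int b) {
        calls++;
        sink.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        calls++;
        sink.write(b, off, len);
    }

    // Writes `chars` single characters, with or without the recommended
    // BufferedWriter wrapper, then flushes once.
    static WriteCallCounter run(boolean buffered, int chars) {
        WriteCallCounter counter = new WriteCallCounter();
        try {
            Writer w = new OutputStreamWriter(counter, StandardCharsets.UTF_8);
            if (buffered) {
                w = new BufferedWriter(w);
            }
            for (int i = 0; i < chars; i++) {
                w.write('a');
            }
            w.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("OutputStreamWriter only: " + run(false, 1000).calls + " write call(s)");
        System.out.println("with BufferedWriter:     " + run(true, 1000).calls + " write call(s)");
    }
}
```

If the two counts are similar, the byte-level buffer in the encoder layer is already absorbing the small writes, and the BufferedWriter mainly saves per-character encoder invocations rather than OutputStream calls.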
On Sun, Jun 29, 2025 at 11:04 AM wenshao <shaojin.we...@alibaba-inc.com> wrote:
BufferedWriter -> OutputStreamWriter -> StreamEncoder
In this call chain, BufferedWriter has a char[] buffer, and StreamEncoder has a 
ByteBuffer. That makes two layers of buffering, so the BufferedWriter layer 
could be removed.
LATIN1 (byte[]) -> UTF16 (char[]) -> UTF8 (byte[])
And when the charset is UTF8, if the content of write(String) is LATIN1, a 
conversion from LATIN1 to UTF16 and then to UTF8 will occur here.
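The conversion chain can be spelled out with public APIs. This is a sketch for illustration only: `ConversionChainDemo` and its methods are made-up names, and the LATIN1 internal storage of the String (compact strings) is assumed, not observable here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConversionChainDemo {
    // The chain described above: LATIN1 String -> UTF16 char[] -> UTF8 byte[].
    static byte[] viaCharArray(String s) {
        char[] utf16 = s.toCharArray();  // copies the content out as UTF16 chars
        ByteBuffer bb = StandardCharsets.UTF_8.encode(CharBuffer.wrap(utf16));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Encoding straight from the String, without an explicit char[] in between.
    static byte[] direct(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String latin1 = "caf\u00e9";  // every char fits in LATIN1
        System.out.println("same bytes: " + Arrays.equals(viaCharArray(latin1), direct(latin1)));
    }
}
```

Both methods return the same UTF8 bytes; the first one simply makes the intermediate char[] step explicit.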
We can improve BufferedWriter: when the Writer passed in is an instance of 
OutputStreamWriter, skip the buffer and call it directly. In addition, improve 
write(String) in StreamEncoder to avoid unnecessary encoding conversion.
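From the caller's side, the bypass idea could be approximated as follows. This is a hypothetical helper, not a JDK API and not the proposed patch (which would live inside BufferedWriter itself); the name `Writers.buffered` is invented for this sketch.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Writers {
    // Hypothetical helper: skip the char[] buffer when the target is an
    // OutputStreamWriter, since its encoder already buffers bytes.
    static Writer buffered(Writer out) {
        if (out instanceof OutputStreamWriter) {
            return out;  // write through, no second buffer
        }
        return new BufferedWriter(out);
    }

    public static void main(String[] args) {
        Writer osw = new OutputStreamWriter(new ByteArrayOutputStream(), StandardCharsets.UTF_8);
        System.out.println("OutputStreamWriter wrapped: " + (buffered(osw) != osw));
        System.out.println("StringWriter wrapped:       " + (buffered(new StringWriter()) instanceof BufferedWriter));
    }
}
```

Since FileWriter extends OutputStreamWriter, an instanceof check like this one would bypass the buffer for FileWriter as well.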
-
Shaojin Wen
