On Tue, 1 Jul 2025 00:01:21 GMT, Shaojin Wen <s...@openjdk.org> wrote:
>> BufferedWriter -> OutputStreamWriter -> StreamEncoder
>>
>> In this call chain, BufferedWriter has a char[] buffer, and StreamEncoder
>> has a ByteBuffer. There are two layers of buffering here, and the
>> BufferedWriter layer can be removed. And when charset is UTF8, if the
>> content of write(String) is LATIN1, a conversion from LATIN1 to UTF16 and
>> then to UTF8 will occur here.
>>
>> LATIN1 -> UTF16 -> UTF8
>>
>> We can improve BufferedWriter. When the Writer passed in is an instance of
>> OutputStreamWriter, remove the buffer and call it directly. In addition,
>> improve write(String) in StreamEncoder to avoid unnecessary encoding
>> conversion.
>
> Shaojin Wen has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Revert "BufferedWriter buffer use StringBuilder"
>
>   This reverts commit da902ca0b0bd6acc003deb8ad1ca0d6485a29a27.

Following the suggestions from liach and xx, I explored using a StringBuilder as the BufferedWriter buffer together with ArrayEncoder, on top of the current PR. This gives a good performance improvement in non-UTF8 scenarios. The code is in this branch:

https://github.com/wenshao/jdk/tree/utf8_writer_202506_x4

The code changes are extensive, so they should go into a separate PR.

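For readers who want to see the writer chain from the quoted PR description in code, here is a minimal sketch using only public java.io APIs. The class name WriterChainSketch and the helper encode are hypothetical names chosen for illustration; they are not part of the PR. OutputStreamWriter delegates to the internal StreamEncoder, so the sketch never touches that class directly.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not code from the PR.
class WriterChainSketch {
    // Pushes text through BufferedWriter -> OutputStreamWriter (-> StreamEncoder internally)
    // and returns the encoded bytes.
    public static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // BufferedWriter keeps a char[] buffer; the encoder below it keeps a ByteBuffer,
        // which is the double buffering the quoted description refers to.
        try (Writer w = new BufferedWriter(new OutputStreamWriter(out, charset))) {
            w.write(text);
        } // close() flushes both layers into the ByteArrayOutputStream
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9" fits in LATIN1 inside the String, yet is encoded to UTF-8 on output.
        byte[] utf8 = encode("caf\u00e9", StandardCharsets.UTF_8);
        byte[] latin1 = encode("caf\u00e9", StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes: " + utf8.length);
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
    }
}
```

The sketch only shows the shape of the chain being benchmarked; it says nothing about how the PR changes the internals.
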
git remote add wenshao g...@github.com:wenshao/jdk.git
git fetch wenshao

## Baseline
# https://github.com/wenshao/jdk/tree/utf8_writer_202506_test
git checkout 2758d6ad7767832db004d28f10cc764f33fa438e
make test TEST="micro:java.io.BufferedWriterBench" MICRO="OPTIONS=-p charset=ISO_8859_1,ASCII,UTF8,UTF16,GB18030"

## Current (PR 26022 & BufferedWriter using StringBuilder as buffer)
# https://github.com/wenshao/jdk/tree/utf8_writer_202506_x4
git checkout 77c5996b6a7b7ea74d03b64c4c8e827a7d76f05a
make test TEST="micro:java.io.BufferedWriterBench" MICRO="OPTIONS=-p charset=ISO_8859_1,ASCII,UTF8,UTF16,GB18030"

## Benchmark Numbers on Aliyun ECS c9i (Intel x64 CPU)

Benchmark       (charType)    (charset)   Units  Base_Score  Current_Score  Improvement(%)
writeCharArray  ascii         ISO_8859_1  us/op       3.128          3.027          +3.23%
writeCharArray  ascii         ASCII       us/op       3.126          3.351          -6.88%
writeCharArray  ascii         UTF8        us/op       3.125          3.716         -18.91%
writeCharArray  ascii         UTF16       us/op      32.469         11.404         +64.87%
writeCharArray  ascii         GB18030     us/op       9.642          7.296         +24.34%
writeCharArray  utf8_2_bytes  ISO_8859_1  us/op       3.137          3.016          +3.86%
writeCharArray  utf8_2_bytes  ASCII       us/op      96.779          8.725         +90.99%
writeCharArray  utf8_2_bytes  UTF8        us/op      17.346         12.966         +25.25%
writeCharArray  utf8_2_bytes  UTF16       us/op      32.407         11.267         +65.19%
writeCharArray  utf8_2_bytes  GB18030     us/op      82.994         12.401         +85.14%
writeCharArray  utf8_3_bytes  ISO_8859_1  us/op     100.063          7.486         +92.51%
writeCharArray  utf8_3_bytes  ASCII       us/op      96.061          9.236         +90.40%
writeCharArray  utf8_3_bytes  UTF8        us/op      28.340         13.358         +52.86%
writeCharArray  utf8_3_bytes  UTF16       us/op      32.468         11.785         +63.70%
writeCharArray  utf8_3_bytes  GB18030     us/op      40.864         37.012          +9.66%
writeCharArray  emoji         ISO_8859_1  us/op     190.547         10.149         +94.67%
writeCharArray  emoji         ASCII       us/op     187.803         12.774         +93.17%
writeCharArray  emoji         UTF8        us/op      41.493         23.473         +43.49%
writeCharArray  emoji         UTF16       us/op      48.248         16.227         +66.36%
writeCharArray  emoji         GB18030     us/op     147.360         63.437         +57.01%
writeString     ascii         ISO_8859_1  us/op       3.340          2.770         +17.09%
writeString     ascii         ASCII       us/op       3.340          3.069          +8.11%
writeString     ascii         UTF8        us/op       3.324          2.944         +11.43%
writeString     ascii         UTF16       us/op      32.503         11.214         +65.49%
writeString     ascii         GB18030     us/op       9.023          6.999         +22.43%
writeString     utf8_2_bytes  ISO_8859_1  us/op       3.338          2.827         +15.31%
writeString     utf8_2_bytes  ASCII       us/op      95.964          8.542         +91.10%
writeString     utf8_2_bytes  UTF8        us/op      17.660         10.155         +42.44%
writeString     utf8_2_bytes  UTF16       us/op      32.516         11.173         +65.63%
writeString     utf8_2_bytes  GB18030     us/op      82.369         12.231         +85.14%
writeString     utf8_3_bytes  ISO_8859_1  us/op     100.280          7.363         +92.66%
writeString     utf8_3_bytes  ASCII       us/op      95.279          9.060         +90.48%
writeString     utf8_3_bytes  UTF8        us/op      28.344         18.366         +35.19%
writeString     utf8_3_bytes  UTF16       us/op      32.672         11.284         +65.43%
writeString     utf8_3_bytes  GB18030     us/op      43.798         37.145         +15.16%
writeString     emoji         ISO_8859_1  us/op     189.574          9.904         +94.75%
writeString     emoji         ASCII       us/op     187.021         12.427         +93.35%
writeString     emoji         UTF8        us/op      41.775         25.875         +37.98%
writeString     emoji         UTF16       us/op      48.240         15.696         +67.10%
writeString     emoji         GB18030     us/op     147.097         63.587         +56.78%

## Benchmark Numbers on MacBook M1 Pro (aarch64)

Benchmark                           (charType)    (charset)   Units  Base_Score  Current_Score  Improvement(%)
BufferedWriterBench.writeCharArray  ascii         ISO_8859_1  us/op       2.815          2.133         +24.20%
BufferedWriterBench.writeCharArray  ascii         ASCII       us/op       2.742          2.352         +14.22%
BufferedWriterBench.writeCharArray  ascii         UTF8        us/op       2.704          2.616          +3.25%
BufferedWriterBench.writeCharArray  ascii         UTF16       us/op      31.294          8.489         +72.87%
BufferedWriterBench.writeCharArray  ascii         GB18030     us/op       8.932          3.820         +57.20%
BufferedWriterBench.writeCharArray  utf8_2_bytes  ISO_8859_1  us/op       2.828          2.210         +21.85%
BufferedWriterBench.writeCharArray  utf8_2_bytes  ASCII       us/op     109.255          5.669         +94.80%
BufferedWriterBench.writeCharArray  utf8_2_bytes  UTF8        us/op      22.353         14.039         +37.15%
BufferedWriterBench.writeCharArray  utf8_2_bytes  UTF16       us/op      31.268          8.349         +73.28%
BufferedWriterBench.writeCharArray  utf8_2_bytes  GB18030     us/op      90.835          6.816         +92.50%
BufferedWriterBench.writeCharArray  utf8_3_bytes  ISO_8859_1  us/op     109.734          7.834         +92.88%
BufferedWriterBench.writeCharArray  utf8_3_bytes  ASCII       us/op     106.981          7.906         +92.60%
BufferedWriterBench.writeCharArray  utf8_3_bytes  UTF8        us/op      21.453         16.076         +25.07%
BufferedWriterBench.writeCharArray  utf8_3_bytes  UTF16       us/op      31.294          6.945         +77.75%
BufferedWriterBench.writeCharArray  utf8_3_bytes  GB18030     us/op      49.007         27.891         +43.02%
BufferedWriterBench.writeCharArray  emoji         ISO_8859_1  us/op     223.538         11.189         +94.54%
BufferedWriterBench.writeCharArray  emoji         ASCII       us/op     264.875         11.384         +95.69%
BufferedWriterBench.writeCharArray  emoji         UTF8        us/op      35.704         21.672         +39.29%
BufferedWriterBench.writeCharArray  emoji         UTF16       us/op      45.979         11.255         +75.51%
BufferedWriterBench.writeCharArray  emoji         GB18030     us/op     148.829         57.625         +61.33%
BufferedWriterBench.writeString     ascii         ISO_8859_1  us/op       2.898          2.159         +25.49%
BufferedWriterBench.writeString     ascii         ASCII       us/op       2.876          2.591          +9.91%
BufferedWriterBench.writeString     ascii         UTF8        us/op       2.894          2.466         +14.79%
BufferedWriterBench.writeString     ascii         UTF16       us/op      31.236          8.759         +71.82%
BufferedWriterBench.writeString     ascii         GB18030     us/op       9.010          3.899         +56.70%
BufferedWriterBench.writeString     utf8_2_bytes  ISO_8859_1  us/op       2.894          2.178         +24.71%
BufferedWriterBench.writeString     utf8_2_bytes  ASCII       us/op     108.426          5.611         +94.82%
BufferedWriterBench.writeString     utf8_2_bytes  UTF8        us/op      22.206         12.225         +44.93%
BufferedWriterBench.writeString     utf8_2_bytes  UTF16       us/op      31.305          8.773         +71.98%
BufferedWriterBench.writeString     utf8_2_bytes  GB18030     us/op      90.820          6.907         +92.40%
BufferedWriterBench.writeString     utf8_3_bytes  ISO_8859_1  us/op     108.983          7.931         +92.66%
BufferedWriterBench.writeString     utf8_3_bytes  ASCII       us/op     107.064          7.836         +92.66%
BufferedWriterBench.writeString     utf8_3_bytes  UTF8        us/op      21.664         13.102         +39.47%
BufferedWriterBench.writeString     utf8_3_bytes  UTF16       us/op      31.546          6.930         +78.00%
BufferedWriterBench.writeString     utf8_3_bytes  GB18030     us/op      52.688         27.698         +47.17%
BufferedWriterBench.writeString     emoji         ISO_8859_1  us/op     221.930         11.160         +94.95%
BufferedWriterBench.writeString     emoji         ASCII       us/op     236.791         11.116         +95.30%
BufferedWriterBench.writeString     emoji         UTF8        us/op      35.025         23.210         +33.73%
BufferedWriterBench.writeString     emoji         UTF16       us/op      45.988         11.334         +75.32%
BufferedWriterBench.writeString     emoji         GB18030     us/op     148.202         57.472         +61.23%

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26022#issuecomment-3027011273