[Bug 69552] Performance regression in MessageBytes.toBytes() when using non-default charset

bugzilla Thu, 06 Feb 2025 13:01:03 -0800

https://bz.apache.org/bugzilla/show_bug.cgi?id=69552


--- Comment #7 from John Engebretson <jeng...@amazon.com> ---
Looks like the right move here is to add an extra pointer on MessageBytes and
reuse the utf8 payload.  For example, modify the last part of toBytes() with an
extra branch:

        ByteBuffer bb;
        if (type == T_CHARS) {
            bb = getCharset().encode(CharBuffer.wrap(charC));
        } else if (getCharset() == StandardCharsets.UTF_8) {
            // New branch: utf8_bytes is a new ByteBuffer field on MessageBytes
            // that caches the UTF-8 encoding of strValue.
            if (utf8_bytes == null) {
                utf8_bytes = StandardCharsets.UTF_8.encode(strValue);
            }
            bb = utf8_bytes;
        } else {
            // Must be T_STR
            bb = getCharset().encode(strValue);
        }

This eliminates the redundant re-encoding of the string and the repeated
allocation and population of the ByteBuffer, which should mean a large
reduction in object allocation and duplicate work.
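
To make the caching idea concrete, here is a minimal, self-contained sketch.
It is not Tomcat code: the class Utf8CacheSketch, its fields, and the
encodeCount counter are hypothetical stand-ins for the relevant part of
MessageBytes. It also departs from the snippet above in two ways that are my
assumptions: it compares charsets with equals() rather than ==, and it returns
a duplicate() of the cached buffer so that callers cannot disturb its position
or limit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the string-to-bytes path of MessageBytes.
class Utf8CacheSketch {
    private final String strValue;
    private final Charset charset;
    // Cached UTF-8 encoding of strValue; null until first requested.
    private ByteBuffer utf8Bytes;
    // Counts how many times the string was actually encoded (demo only).
    int encodeCount;

    Utf8CacheSketch(String strValue, Charset charset) {
        this.strValue = strValue;
        this.charset = charset;
    }

    ByteBuffer toBytes() {
        if (StandardCharsets.UTF_8.equals(charset)) {
            if (utf8Bytes == null) {
                utf8Bytes = StandardCharsets.UTF_8.encode(strValue);
                encodeCount++;
            }
            // duplicate() shares the bytes but has its own position and limit.
            return utf8Bytes.duplicate();
        }
        // Any other charset: encode on every call, as before.
        encodeCount++;
        return charset.encode(strValue);
    }

    public static void main(String[] args) {
        Utf8CacheSketch utf8 = new Utf8CacheSketch("h\u00e9llo", StandardCharsets.UTF_8);
        utf8.toBytes();
        utf8.toBytes();
        System.out.println("UTF-8 encodes: " + utf8.encodeCount);

        Utf8CacheSketch utf16 = new Utf8CacheSketch("h\u00e9llo", StandardCharsets.UTF_16);
        utf16.toBytes();
        utf16.toBytes();
        System.out.println("UTF-16 encodes: " + utf16.encodeCount);
    }
}
```

In this sketch the UTF-8 instance encodes once no matter how many times
toBytes() is called, while the other charset encodes on every call.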

The tradeoff is that it extends the lifetime of the currently-created
ByteBuffers from this method to the lifetime of the MessageBytes instance,
which is typically the duration of the current request.  That seems acceptable
to me.
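
One way to keep that lifetime bounded, sketched under assumptions: drop the
cached buffer whenever the value changes or the instance is recycled. The
class RecycleSketch and its methods below are hypothetical and only mirror
the general shape of a reusable holder; how the real MessageBytes would wire
this in is not settled by this comment.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reusable holder; illustrates cache invalidation only.
class RecycleSketch {
    private String strValue;
    private ByteBuffer utf8Bytes; // cached encoding, valid only for strValue

    void setString(String s) {
        strValue = s;
        utf8Bytes = null; // value changed, so the old bytes are stale
    }

    ByteBuffer utf8() {
        if (strValue == null) {
            throw new IllegalStateException("no value set");
        }
        if (utf8Bytes == null) {
            utf8Bytes = StandardCharsets.UTF_8.encode(strValue);
        }
        return utf8Bytes.duplicate();
    }

    // Called when the holder is returned for reuse, e.g. at end of request.
    void recycle() {
        strValue = null;
        utf8Bytes = null; // release the buffer so it can be collected
    }

    boolean hasCachedBytes() {
        return utf8Bytes != null;
    }
}
```

With this shape the cached bytes never outlive the value they were encoded
from, so the extra retention is limited to one buffer per live value.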

Also, this optimization applies only to applications outputting UTF-8.
Various sources on the web say UTF-8 is by far the most common charset, so we
should be okay; the second-most common is ISO_8859_1, which is already
optimized.  Any other character set should see net-neutral performance from
this change.

Thoughts?

