Re: [PR] XmlLayout: replace invalid XML characters with U+FFFD [logging-log4j2]

via GitHub Tue, 24 Mar 2026 12:17:16 -0700


Copilot commented on code in PR #4077:
URL: https://github.com/apache/logging-log4j2/pull/4077#discussion_r2983764216



##########
log4j-core/pom.xml:
##########
@@ -66,7 +66,7 @@
       org.apache.commons.compress.*;resolution:=optional,
       org.apache.commons.csv;resolution:=optional,
       org.apache.kafka.*;resolution:=optional,

Review Comment:
   The BND import pattern `org.codehaus.stax2.*` does not include the base 
package `org.codehaus.stax2`, but this module uses 
`org.codehaus.stax2.XMLStreamWriter2` (and other types in the base package). 
This can cause OSGi resolution failures. Include both `org.codehaus.stax2` and 
`org.codehaus.stax2.*` (or use a pattern that matches both) in 
`bnd-extra-package-options`.
   ```suggestion
         org.apache.kafka.*;resolution:=optional,
         org.codehaus.stax2;resolution:=optional,
   ```



##########
log4j-core/src/main/java/org/apache/logging/log4j/core/jackson/Log4jXmlObjectMapper.java:
##########
@@ -41,7 +51,144 @@ public Log4jXmlObjectMapper() {
      * Create a new instance using the {@link Log4jXmlModule}.
      */
     public Log4jXmlObjectMapper(final boolean includeStacktrace, final boolean 
stacktraceAsString) {
-        super(new Log4jXmlModule(includeStacktrace, stacktraceAsString));
+        super(new SanitizingXmlFactory(), new 
Log4jXmlModule(includeStacktrace, stacktraceAsString));
         this.setSerializationInclusion(JsonInclude.Include.NON_EMPTY);
     }
+
+    /**
+     * Writer that sanitizes text to be valid XML 1.0 by replacing disallowed 
code points with the replacement character (U+FFFD).
+     */
+    private static final class SanitizingWriter extends StreamWriter2Delegate {
+
+        private static final char REPLACEMENT_CHAR = '\uFFFD';
+
+        SanitizingWriter(final XMLStreamWriter2 delegate) {
+            super(delegate);
+            setParent(delegate);
+        }
+
+        @Override
+        public void writeAttribute(final String localName, final String value) 
throws XMLStreamException {
+            super.writeAttribute(localName, sanitizeXml10(value));
+        }
+
+        @Override
+        public void writeAttribute(final String namespaceURI, final String 
localName, final String value)
+                throws XMLStreamException {
+            super.writeAttribute(namespaceURI, localName, 
sanitizeXml10(value));
+        }
+
+        @Override
+        public void writeAttribute(
+                final String prefix, final String namespaceURI, final String 
localName, final String value)
+                throws XMLStreamException {
+            super.writeAttribute(prefix, namespaceURI, localName, 
sanitizeXml10(value));
+        }
+
+        @Override
+        public void writeCData(String text) throws XMLStreamException {
+            super.writeCData(sanitizeXml10(text));
+        }
+
+        @Override
+        public void writeCData(char[] text, int start, int len) throws 
XMLStreamException {
+            super.writeCData(sanitizeXml10(text, start, len));
+        }
+
+        @Override
+        public void writeCharacters(final String text) throws 
XMLStreamException {
+            super.writeCharacters(sanitizeXml10(text));
+        }
+
+        @Override
+        public void writeCharacters(final char[] text, final int start, final 
int len) throws XMLStreamException {
+            super.writeCharacters(sanitizeXml10(text, start, len));
+        }
+
+        @Override
+        public void writeComment(String text) throws XMLStreamException {
+            super.writeComment(sanitizeXml10(text));
+        }
+
+        private static String sanitizeXml10(final String input) {
+            if (input == null) {
+                return null;
+            }
+            final int length = input.length();
+            // Only create a new string if we find an invalid code point.
+            // In the common case, this should avoid unnecessary allocations.
+            for (int i = 0; i < length; ) {
+                final int cp = input.codePointAt(i);
+                if (!isValidXml10(cp)) {
+                    final StringBuilder out = new StringBuilder(length);
+                    out.append(input, 0, i);
+                    appendSanitized(input, i, length, out);
+                    return out.toString();
+                }
+                i += Character.charCount(cp);
+            }
+            return input;
+        }
+
+        private static String sanitizeXml10(final char[] input, final int 
start, final int len) {
+            return sanitizeXml10(new String(input, start, len));
+        }
+
+        private static void appendSanitized(final String input, int i, final 
int length, final StringBuilder out) {
+            while (i < length) {
+                final int cp = input.codePointAt(i);
+                out.appendCodePoint(isValidXml10(cp) ? cp : REPLACEMENT_CHAR);
+                i += Character.charCount(cp);
+            }
+        }
+
+        /**
+         * Checks if a code point is valid
+         *
+         * @param codePoint a code point between {@code 0} and {@link 
Character#MAX_CODE_POINT}
+         * @return {@code true} if it is a valid XML 1.0 code point
+         */
+        private static boolean isValidXml10(final int codePoint) {
+            assert codePoint >= 0 && codePoint <= Character.MAX_CODE_POINT;
+            // XML 1.0 valid characters (Fifth Edition):
+            //   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | 
[#x10000-#x10FFFF]
+
+            // [#x20–#xD7FF] (placed early as a fast path for the most common 
case)
+            return (codePoint >= ' ' && codePoint < Character.MIN_SURROGATE)
+                    // #x9
+                    || codePoint == '\t'
+                    // #xA
+                    || codePoint == '\n'
+                    // #xD
+                    || codePoint == '\r'
+                    // [#xE000-#xFFFD]
+                    || (codePoint > Character.MAX_SURROGATE && codePoint <= 
0xFFFD)
+                    // [#x10000-#x10FFFF]
+                    || codePoint >= Character.MIN_SUPPLEMENTARY_CODE_POINT;
+        }
+    }
+
+    /**
+     * Factory that creates {@link SanitizingWriter} instances to ensure that 
all text written to the XML output is valid XML 1.0.
+     */
+    private static final class SanitizingXmlFactory extends XmlFactory {
+
+        private static final long serialVersionUID = 1L;
+
+        @Override
+        protected XMLStreamWriter _createXmlWriter(final IOContext ctxt, final 
Writer w) throws IOException {
+            return new 
SanitizingWriter(Stax2WriterAdapter.wrapIfNecessary(super._createXmlWriter(ctxt,
 w)));
+        }
+
+        @Override
+        protected XMLStreamWriter _createXmlWriter(final IOContext ctxt, final 
OutputStream out) throws IOException {
+            return new 
SanitizingWriter(Stax2WriterAdapter.wrapIfNecessary(super._createXmlWriter(ctxt,
 out)));
+        }
+
+        @Override
+        public XmlFactory copy() {
+            _checkInvalidCopy(SanitizingXmlFactory.class);
+            return new SanitizingXmlFactory();
+        }

Review Comment:
   `SanitizingXmlFactory.copy()` currently returns a brand new factory without 
copying configuration/state from the original `XmlFactory`. Jackson calls 
`JsonFactory/XmlFactory.copy()` when copying mappers/writers, and returning a 
default instance can silently drop feature flags and factory configuration. 
Implement `copy()` so it copies from `this` (typically by adding a copy 
constructor and delegating to `super(src, codec)`/equivalent) instead of 
returning a fresh default factory.



##########
log4j-core/src/main/java/org/apache/logging/log4j/core/jackson/Log4jXmlObjectMapper.java:
##########
@@ -41,7 +51,144 @@ public Log4jXmlObjectMapper() {
      * Create a new instance using the {@link Log4jXmlModule}.
      */
     public Log4jXmlObjectMapper(final boolean includeStacktrace, final boolean 
stacktraceAsString) {
-        super(new Log4jXmlModule(includeStacktrace, stacktraceAsString));
+        super(new SanitizingXmlFactory(), new 
Log4jXmlModule(includeStacktrace, stacktraceAsString));
         this.setSerializationInclusion(JsonInclude.Include.NON_EMPTY);
     }
+
+    /**
+     * Writer that sanitizes text to be valid XML 1.0 by replacing disallowed 
code points with the replacement character (U+FFFD).
+     */
+    private static final class SanitizingWriter extends StreamWriter2Delegate {
+
+        private static final char REPLACEMENT_CHAR = '\uFFFD';
+
+        SanitizingWriter(final XMLStreamWriter2 delegate) {
+            super(delegate);
+            setParent(delegate);
+        }
+
+        @Override
+        public void writeAttribute(final String localName, final String value) 
throws XMLStreamException {
+            super.writeAttribute(localName, sanitizeXml10(value));
+        }
+
+        @Override
+        public void writeAttribute(final String namespaceURI, final String 
localName, final String value)
+                throws XMLStreamException {
+            super.writeAttribute(namespaceURI, localName, 
sanitizeXml10(value));
+        }
+
+        @Override
+        public void writeAttribute(
+                final String prefix, final String namespaceURI, final String 
localName, final String value)
+                throws XMLStreamException {
+            super.writeAttribute(prefix, namespaceURI, localName, 
sanitizeXml10(value));
+        }
+
+        @Override
+        public void writeCData(String text) throws XMLStreamException {
+            super.writeCData(sanitizeXml10(text));
+        }
+
+        @Override
+        public void writeCData(char[] text, int start, int len) throws 
XMLStreamException {
+            super.writeCData(sanitizeXml10(text, start, len));
+        }
+
+        @Override
+        public void writeCharacters(final String text) throws 
XMLStreamException {
+            super.writeCharacters(sanitizeXml10(text));
+        }
+
+        @Override
+        public void writeCharacters(final char[] text, final int start, final 
int len) throws XMLStreamException {
+            super.writeCharacters(sanitizeXml10(text, start, len));
+        }
+
+        @Override
+        public void writeComment(String text) throws XMLStreamException {
+            super.writeComment(sanitizeXml10(text));
+        }
+
+        private static String sanitizeXml10(final String input) {
+            if (input == null) {
+                return null;
+            }
+            final int length = input.length();
+            // Only create a new string if we find an invalid code point.
+            // In the common case, this should avoid unnecessary allocations.
+            for (int i = 0; i < length; ) {
+                final int cp = input.codePointAt(i);
+                if (!isValidXml10(cp)) {
+                    final StringBuilder out = new StringBuilder(length);
+                    out.append(input, 0, i);
+                    appendSanitized(input, i, length, out);
+                    return out.toString();
+                }
+                i += Character.charCount(cp);
+            }
+            return input;
+        }
+
+        private static String sanitizeXml10(final char[] input, final int 
start, final int len) {
+            return sanitizeXml10(new String(input, start, len));
+        }

Review Comment:
   `sanitizeXml10(char[], start, len)` always allocates a new `String` even 
when the input contains only valid XML characters, which undermines the “avoid 
unnecessary allocations” intent and can add overhead on hot logging paths. 
Consider implementing a char[]/code-point scan that only allocates when an 
invalid code point is encountered (mirroring the `String` path).



##########
log4j-core-test/src/test/java/org/apache/logging/log4j/core/layout/XmlLayoutTest.java:
##########
@@ -397,4 +403,87 @@ public void testIncludeNullDelimiterFalse() {
         final String str = 
layout.toSerializable(LogEventFixtures.createLogEvent());
         assertFalse(str.endsWith("\0"));
     }
+
+    // In Java 11 this can be replaced with Character.toString
+    private static String codePointToString(final int codePoint) {
+        return new String(Character.toChars(codePoint));
+    }

Review Comment:
   `codePointToString` is unused in this test class. If it’s not needed for the 
assertions/data providers, please remove it to avoid dead code in the test 
suite.
   ```suggestion
   
   ```



##########
log4j-core-test/src/test/java/org/apache/logging/log4j/core/layout/XmlLayoutTest.java:
##########
@@ -46,13 +47,18 @@
 import org.apache.logging.log4j.core.util.KeyValuePair;
 import org.apache.logging.log4j.message.SimpleMessage;
 import org.apache.logging.log4j.spi.AbstractLogger;
+import org.apache.logging.log4j.spi.DefaultThreadContextStack;
 import org.apache.logging.log4j.test.junit.ThreadContextRule;
+import org.apache.logging.log4j.util.StringMap;
+import org.assertj.core.api.Assertions;
 import org.junit.AfterClass;
 import org.junit.Assert;
 import org.junit.BeforeClass;
 import org.junit.Rule;
 import org.junit.Test;
 import org.junit.experimental.categories.Category;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;
 

Review Comment:
   This test class mixes JUnit 4 (`@Test`, `@BeforeClass`, `@Rule`) with JUnit 
Jupiter parameterized tests. Under the JUnit Platform, the Vintage engine will 
run the JUnit 4 tests (including `@BeforeClass/@AfterClass`), while the Jupiter 
engine will run the new `@ParameterizedTest` methods *without* executing the 
JUnit 4 lifecycle methods or `@Rule`, which can lead to global state/setup 
differences and flaky behavior. Prefer either (a) rewriting the new tests as 
JUnit 4 tests, or (b) moving them to a separate Jupiter-only test class and 
using Jupiter lifecycle/extension equivalents (`@BeforeAll/@AfterAll`, 
`@RegisterExtension`) as needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] XmlLayout: replace invalid XML characters with U+FFFD [logging-log4j2]

Reply via email to