Copilot commented on code in PR #4078:
URL: https://github.com/apache/logging-log4j2/pull/4078#discussion_r2983761335


##########
log4j-core/src/main/java/org/apache/logging/log4j/core/util/Transform.java:
##########
@@ -185,4 +166,69 @@ public static String escapeJsonControlCharacters(final 
String input) {
         }
         return buf.toString();
     }
+
+    private static void appendEscapingHtmlTags(final String input, int i, 
final int length, final StringBuilder buf) {
+        while (i < length) {
+            final int ch = input.codePointAt(i);
+            switch (ch) {
+                case '<':
+                    buf.append("&lt;");
+                    break;
+                case '>':
+                    buf.append("&gt;");
+                    break;
+                case '&':
+                    buf.append("&amp;");
+                    break;
+                case '"':
+                    buf.append("&quot;");
+                    break;
+                case '\'':
+                    buf.append("&apos;");

Review Comment:
   `escapeHtmlTags` now escapes apostrophes as `&apos;`, but `HtmlLayout` 
declares an HTML 4.01 Transitional doctype where `&apos;` is not a predefined 
entity. This can produce invalid HTML (and may render `&apos;` literally in 
some consumers). Consider using a numeric character reference (e.g., 
`&#39;`/`&#x27;`) or avoid escaping `'` here, and keep XML-specific escaping in 
the XML layouts/utilities.
   ```suggestion
                       buf.append("&#39;");
   ```



##########
log4j-core/src/main/java/org/apache/logging/log4j/core/util/Transform.java:
##########
@@ -185,4 +166,69 @@ public static String escapeJsonControlCharacters(final 
String input) {
         }
         return buf.toString();
     }
+
+    private static void appendEscapingHtmlTags(final String input, int i, 
final int length, final StringBuilder buf) {
+        while (i < length) {
+            final int ch = input.codePointAt(i);
+            switch (ch) {
+                case '<':
+                    buf.append("&lt;");
+                    break;
+                case '>':
+                    buf.append("&gt;");
+                    break;
+                case '&':
+                    buf.append("&amp;");
+                    break;
+                case '"':
+                    buf.append("&quot;");
+                    break;
+                case '\'':
+                    buf.append("&apos;");
+                    break;
+                default:
+                    buf.appendCodePoint(isValidXml10(ch) ? ch : 
REPLACEMENT_CHAR);
+                    break;
+            }
+            i += Character.charCount(ch);
+        }
+    }
+
+    private static boolean isHtmlTagCharacter(final int cp) {
+        return cp == '<' || cp == '>' || cp == '&' || cp == '"' || cp == '\'';
+    }
+
+    private static void appendSanitizedXml10(
+            final String input, final int start, final int end, final 
StringBuilder out) {
+        for (int i = start; i < end; ) {
+            final int cp = input.codePointAt(i);
+            out.appendCodePoint(isValidXml10(cp) ? cp : REPLACEMENT_CHAR);
+            i += Character.charCount(cp);
+        }
+    }
+
+    /**
+     * Checks if a code point is a valid in XML 1.0

Review Comment:
   Minor Javadoc grammar: "Checks if a code point is a valid in XML 1.0" should 
be "valid in XML 1.0" (or "a valid XML 1.0 code point").
   ```suggestion
        * Checks if a code point is valid in XML 1.0.
   ```



##########
log4j-core/src/main/java/org/apache/logging/log4j/core/util/Transform.java:
##########
@@ -41,47 +43,26 @@ private Transform() {}
      * @return The input string with the special characters replaced.
      */
     public static String escapeHtmlTags(final String input) {

Review Comment:
   The `escapeHtmlTags` Javadoc is now out of date: it says only `<`, `>`, `&`, 
and `"` are escaped, but the implementation also (a) escapes `'` and (b) 
replaces XML 1.0–invalid code points with U+FFFD. Please update the Javadoc to 
describe the new behavior so callers (including `HtmlLayout` vs XML layouts) 
understand the sanitization/escaping semantics.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to