jira-importer opened a new issue, #585: URL: https://github.com/apache/maven-invoker-plugin/issues/585
**[Mikkel Kjeldsen](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=JIRAUSER303566)** opened **[MINVOKER-351](https://issues.apache.org/jira/browse/MINVOKER-351?redirect=false)** and commented Neither the Maven Invoker plugin's implementation of `<writeJunitReport>` nor the underlying XML infrastructure directly protect against the presence of character literals prohibited by the XML specification, meaning such literals can appear in the JUnit report and render it unreadable. **I would appreciate if the Maven Invoker plugin could learn to strip prohibited literals to protect its users from creative plugins.** I argue that this is a safe and expected transformation that is not materially lossy. --- ## Background MINVOKER-196 added the `<writeJunitReport>` option [back in maven-invoker-plugin-3.2.1](https://github.com/apache/maven-invoker-plugin/blob/maven-invoker-plugin-3.2.1/src/main/java/org/apache/maven/plugins/invoker/AbstractInvokerMojo.java#L1878-L1946). As of [maven-invoker-plugin-3.6.0 the effective implementation of the JUnit report remains effectively unchanged](https://github.com/apache/maven-invoker-plugin/blob/maven-invoker-plugin-3.6.0/src/main/java/org/apache/maven/plugins/invoker/AbstractInvokerMojo.java#L1695-L1754). The JUnit report includes a `<system-out>` element ([example documentation](https://github.com/testmoapp/junitxml)) whose value Maven Invoker populates with the raw build log contents. I've observed that this value is XML-escaped, which I imagine is well understood in the implementation, although I can't immediately find documentation to support that. However, escaping notwithstanding, a number of character literals are outright prohibited by the XML specifications. These literals cannot be escaped, and their presence renders an XML document not well formed. The exact set of prohibited characters varies by XML version; the report produced by the Maven Invoker plugin is XML version 1.0. When the Maven Invoker plugin reads in the build log it does not strip these character literals and neither does the XML writer the Maven Invoker plugin relies on. Consequently, if a build log ends up including a prohibited character the resulting JUnit report will not be well formed. The set of prohibited characters is the complement of [the XML specification's definition of `Char`](https://www.w3.org/TR/xml/#NT-Char). ## Example Among the literals prohibited by XML version 1.0 is `^H` (backspace). When [pitest runs via Maven](https://pitest.org/quickstart/maven/) it prints a spinner to standard out, and the implementation uses backspace to render the spinner in place. I have used the Maven Invoker plugin with `<writeJunitReport>` to verify a pitest configuration, whereby I discovered this limitation. ## Remediation ### Blame plugins Perhaps pitest should not behave this way but we can't change pitest, and even if pitest could be changed that offers no protection against any other plugin, so blaming plugins is an ineffective course of action. ### Work-arounds The user can manually clean the build log in-place via `<postBuildHookScript>`. This is technically fairly easy to do, and makes the transformation very explicit, but it requires considerable local work to address an issue many would find obscure and the transformation is permanently lossy unless the user also backs up the raw log to another file name. ### Strip prohibited literals inside Maven Invoker plugin If the Maven Invoker plugin learns to strip offending character literals in-between reading the build log and writing to the `<system-out>` value then `<writeJunitReport>` will Just Workâ˘, which I assert is what a user will typically expect. Although the `<system-out>` value will no longer exactly match the build log contents, this lossy translation is acceptable: the prohibited characters are overwhelmingly unprintable to begin with and therefore cannot be meaningfully rendered in a static context, and the raw build log remains unchanged in the event that the user needs to investigate or assert against the raw output. This change would be backwards compatible, because any existing user that would be affected by it would already have unparseable JUnit reports. * I _believe_ that Java's `j.u.r.Pattern` can trivially express the complement of allowed characters but there may exist more efficient solutions. * Consider also applying this transformation to the 2 uses of `buildJob.getFailureMessage()`. #### Replace prohibited literals inside Maven Invoker plugin As a variation of stripping prohibited character literals, the Maven Invoker plugin could substitute sentinel values for prohibited character literals. This approach has the downside that it requires additional decision making for determining suitable substitution(s) but is otherwise comparable. --- **Attachments:** - [minvoker-351.tar.gz](https://issues.apache.org/jira/secure/attachment/13068809/minvoker-351.tar.gz) (_2.16 kB_) **Issue Links:** - [MINVOKER-196](https://issues.apache.org/jira/browse/MINVOKER-196) Support for JUnit report style (_**"is caused by"**_) - [MINVOKER-348](https://issues.apache.org/jira/browse/MINVOKER-348) Build job report is truncated **Remote Links:** - [GitHub Pull Request #242 ](https://github.com/apache/maven-invoker-plugin/pull/242) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@maven.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org