My initial impression of the code itself is that it's overly complex and likely falls prey to some common Java antipatterns. You don't need multiple packages or internal and spi packages. One package with far fewer public classes and methods is sufficient.

On Fri, Apr 24, 2026 at 2:35 AM Piotr P. Karwasz <[email protected]> wrote:
>
> Hi all,
>
> I finally pushed an initial draft of the Commons XML Factory project I
> proposed back in December [1]:
>
> https://github.com/copernik-eu/commons-xml-factory
>
> The library is a single `XmlFactories` class with factory methods that
> return hardened JAXP factories for:
>
> - DocumentBuilderFactory
> - SAXParserFactory
> - XMLInputFactory
> - TransformerFactory
> - SchemaFactory
> - XPathFactory
>
> Internally, each factory method dispatches to a per-implementation
> `XmlProvider` that applies the correct hardening for that
> implementation. The SPI is open via `ServiceLoader`, but providers for
> the JDK, Xerces, Woodstox and Saxon are bundled.
>
> It's fair to ask whether this is worth a library at all: a per-factory
> hardening recipe is only a handful of lines, and most projects wrote
> their own years ago. Two observations:
>
> First, those handful of lines are exactly the lines people forget or get
> subtly wrong. The 2025 Java XXE CVEs bear this out: Apache Tika
> (CVE-2025-54988, CVE-2025-66516), WebDriverManager (CVE-2025-4641),
> CycloneDX (CVE-2025-64518), GeoServer (CVE-2025-58360).
>
> Second, the correct recipe depends on which JAXP implementation is
> actually on the classpath, and that's often not what the developer
> thinks. A library author tests against the JDK, observes that
> FEATURE_SECURE_PROCESSING transitively restricts ACCESS_EXTERNAL_*
> (JEP 185), and writes a minimal hardening block. The library is then
> deployed in an application that pulls in external Xerces transitively:
> JEP 185 no longer applies, ACCESS_EXTERNAL_* is not honored, and the
> minimal block is no longer sufficient.
>
> The draft intentionally offers no configuration: it hardens at one
> level and fails fast if it encounters an implementation it doesn't
> recognize. Before extending it, I'd like feedback on whether the
> proposed direction makes sense.
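
For concreteness, the "handful of lines" recipe discussed above might look roughly like the following for DocumentBuilderFactory at the strictest level (no DOCTYPE allowed). This is only a sketch and not code from the draft: the class name `HardenedDom` and its helper methods are hypothetical. It assumes the parser recognizes the Xerces `disallow-doctype-decl` feature, which the JDK's built-in parser and external Xerces both do. Any implementation that rejects the feature causes an exception, which is one simple way to get the fail-fast behavior described.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical illustration; not the API of the proposed library.
final class HardenedDom {

    // Xerces feature URI, also honored by the JDK's built-in parser.
    private static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    static DocumentBuilderFactory newHardenedFactory() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Reject any document that contains a DOCTYPE declaration.
            dbf.setFeature(DISALLOW_DOCTYPE, true);
        } catch (ParserConfigurationException e) {
            // Fail fast: the implementation on the classpath does not
            // support the features this recipe relies on.
            throw new IllegalStateException("Cannot harden " + dbf.getClass().getName(), e);
        }
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    // Returns true if the hardened parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = newHardenedFactory().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // rejected by the parser, e.g. because of a DOCTYPE
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<a/>"));
        System.out.println(parses("<!DOCTYPE a [<!ENTITY x \"y\">]><a>&x;</a>"));
    }
}
```

With the JDK's default parser, the first document should be accepted and the second rejected. The same four settings are not sufficient for the other factory types, each of which needs its own recipe.
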
>
> I see three plausible hardening levels worth supporting:
>
> 1. No DOCTYPE allowed. Eliminates the entire class of DTD-based
>    attacks. This is what the draft implements.
>
> 2. DOCTYPE allowed, no external resources loaded. Internal entities
>    work (for users who need HTML-style named entities, for example),
>    entity expansion limits are enforced, but nothing is fetched from
>    outside the document.
>
> 3. DOCTYPE allowed, user-supplied resolver. The caller provides an
>    EntityResolver; we wrap it so that if the resolver returns null for
>    an unknown reference, we throw rather than falling through to the
>    parser's default URL-fetching behavior. This closes SAX's most
>    common footgun while letting integrators implement classpath-scoped
>    loading, XML catalogs, and similar.
>
> The draft also addresses the secondary-source problem for
> TransformerFactory (stylesheet loading) and SchemaFactory (schema
> imports). Currently both are locked down as tightly as primary input,
> but this is probably a place where two distinct levels make sense:
> users often have trusted stylesheets or schemas they want to load via
> xsl:import or xs:include, separate from the question of what to allow
> in the document being transformed or validated.
>
> Two things I'd particularly appreciate feedback on:
>
> - Does the three-level model above cover the use cases you'd want to
>   bring to this library?
>
> - For the secondary-source question, is there appetite for a separate
>   axis, or should primary and secondary be tied together under a
>   single level?
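
The resolver wrapping described in level 3 above can be sketched in a few lines. This is an illustration under assumptions, not the draft's implementation: the class name `StrictEntityResolver` and the `resolves` helper are hypothetical. The only behavior it relies on is the documented SAX contract that a null return from `EntityResolver.resolveEntity` tells the parser to open the system identifier itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Objects;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical illustration; not the API of the proposed library.
final class StrictEntityResolver implements EntityResolver {

    private final EntityResolver delegate;

    StrictEntityResolver(EntityResolver delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        InputSource source = delegate.resolveEntity(publicId, systemId);
        if (source == null) {
            // A null return would let the parser fetch systemId on its own,
            // so turn "unknown reference" into a hard failure instead.
            throw new SAXException("Unresolved external reference: publicId="
                    + publicId + ", systemId=" + systemId);
        }
        return source;
    }

    // Small helper for demonstration: true if the wrapped resolver
    // supplies a source for the given system identifier.
    static boolean resolves(EntityResolver delegate, String systemId) {
        try {
            return new StrictEntityResolver(delegate).resolveEntity(null, systemId) != null;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A resolver that knows every reference, and one that knows none.
        EntityResolver known = (publicId, systemId) -> new InputSource(new StringReader(""));
        EntityResolver unknown = (publicId, systemId) -> null;
        System.out.println(resolves(known, "urn:example:known"));
        System.out.println(resolves(unknown, "http://example.invalid/a.dtd"));
    }
}
```

Such a wrapper would be installed with `DocumentBuilder.setEntityResolver` or `XMLReader.setEntityResolver`. It covers only the SAX `EntityResolver` hook; StAX's `XMLResolver`, `URIResolver` for transformers, and `LSResourceResolver` for schemas would each need an analogous wrapper.
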
>
> Piotr
>
> [1] https://lists.apache.org/thread/b2tjc15vjkgsrxxkc8phlnt6801hx4xz
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

--
Elliotte Rusty Harold
[email protected]
