This all looks great Piotr, thank you for putting it together. I would 100% support and help maintain this library.

I have minor comments for now:

The name XmlFactories reads oddly to me. It's a factory that produces different kinds of XML-related objects, so I'd just call it XmlFactory.

I would put everything in one package and let as much as possible be package private.

Thank you again!
Gary

On Fri, Apr 24, 2026, 03:35 Piotr P. Karwasz <[email protected]> wrote:

> Hi all,
>
> I finally pushed an initial draft of the Commons XML Factory project I
> proposed back in December [1]:
>
> https://github.com/copernik-eu/commons-xml-factory
>
> The library is a single `XmlFactories` class with factory methods that
> return hardened JAXP factories for:
>
> - DocumentBuilderFactory
> - SAXParserFactory
> - XMLInputFactory
> - TransformerFactory
> - SchemaFactory
> - XPathFactory
>
> Internally, each factory method dispatches to a per-implementation
> `XmlProvider` that applies the correct hardening for that
> implementation. The SPI is open via `ServiceLoader`, but providers for
> the JDK, Xerces, Woodstox and Saxon are bundled.
>
> It's fair to ask whether this is worth a library at all: a per-factory
> hardening recipe is only a handful of lines, and most projects wrote
> their own years ago. Two observations:
>
> First, those handful of lines are exactly the lines people forget or get
> subtly wrong. The 2025 Java XXE CVEs bear this out: Apache Tika
> (CVE-2025-54988, CVE-2025-66516), WebDriverManager (CVE-2025-4641),
> CycloneDX (CVE-2025-64518), GeoServer (CVE-2025-58360).
>
> Second, the correct recipe depends on which JAXP implementation is
> actually on the classpath, and that's often not what the developer
> thinks. A library author tests against the JDK, observes that
> FEATURE_SECURE_PROCESSING transitively restricts ACCESS_EXTERNAL_*
> (JEP 185), and writes a minimal hardening block. The library is then
> deployed in an application that pulls in external Xerces transitively:
> JEP 185 no longer applies, ACCESS_EXTERNAL_* is not honored, and the
> minimal block is no longer sufficient.
>
> The draft intentionally offers no configuration: it hardens at one
> level and fails fast if it encounters an implementation it doesn't
> recognize. Before extending it, I'd like feedback on whether the
> proposed direction makes sense.
>
> I see three plausible hardening levels worth supporting:
>
> 1. No DOCTYPE allowed. Eliminates the entire class of DTD-based
>    attacks. This is what the draft implements.
>
> 2. DOCTYPE allowed, no external resources loaded. Internal entities
>    work (for users who need HTML-style named entities, for example),
>    entity expansion limits are enforced, but nothing is fetched from
>    outside the document.
>
> 3. DOCTYPE allowed, user-supplied resolver. The caller provides an
>    EntityResolver; we wrap it so that if the resolver returns null for
>    an unknown reference, we throw rather than falling through to the
>    parser's default URL-fetching behavior. This closes SAX's most
>    common footgun while letting integrators implement classpath-scoped
>    loading, XML catalogs, and similar.
>
> The draft also addresses the secondary-source problem for
> TransformerFactory (stylesheet loading) and SchemaFactory (schema
> imports). Currently both are locked down as tightly as primary input,
> but this is probably a place where two distinct levels make sense:
> users often have trusted stylesheets or schemas they want to load via
> xsl:import or xs:include, separate from the question of what to allow
> in the document being transformed or validated.
>
> Two things I'd particularly appreciate feedback on:
>
> - Does the three-level model above cover the use cases you'd want to
>   bring to this library?
>
> - For the secondary-source question, is there appetite for a separate
>   axis, or should primary and secondary be tied together under a
>   single level?
>
> Piotr
>
> [1] https://lists.apache.org/thread/b2tjc15vjkgsrxxkc8phlnt6801hx4xz
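
For anyone following the thread, the wrapped resolver described under level 3 could look roughly like the sketch below. This is only an illustration of the idea and is not taken from the draft: the class name StrictResolverSketch and the methods strict, knownDtdOnly and parses are hypothetical, and the factory here is deliberately left at its defaults so that only the resolver behavior is shown.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch, not part of the proposed library.
public final class StrictResolverSketch {

    // Wraps a caller-supplied resolver so that a null result becomes an
    // error instead of a silent fallback to the parser's default fetching.
    public static EntityResolver strict(EntityResolver delegate) {
        return (publicId, systemId) -> {
            InputSource source = delegate.resolveEntity(publicId, systemId);
            if (source == null) {
                throw new SAXException("Unresolved external reference: " + systemId);
            }
            return source;
        };
    }

    // Example delegate: serves an empty DTD for one known name and
    // returns null for everything else.
    public static InputSource knownDtdOnly(String publicId, String systemId) {
        if (systemId != null && systemId.endsWith("known.dtd")) {
            return new InputSource(new StringReader(""));
        }
        return null;
    }

    // Parses the document with the wrapped resolver installed.
    // Returns false if parsing failed with a SAXException.
    public static boolean parses(String xml, EntityResolver callerResolver) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(strict(callerResolver));
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String known = "<!DOCTYPE r SYSTEM \"known.dtd\"><r/>";
        String unknown = "<!DOCTYPE r SYSTEM \"http://example.invalid/x.dtd\"><r/>";
        System.out.println(parses(known, StrictResolverSketch::knownDtdOnly));
        System.out.println(parses(unknown, StrictResolverSketch::knownDtdOnly));
    }
}
```

The point of throwing from the wrapper is that the parser never gets the chance to open the URL itself. A real provider would presumably combine this with the per-implementation hardening the proposal describes.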
