stevedlawrence commented on code in PR #1514:
URL: https://github.com/apache/daffodil/pull/1514#discussion_r2246581922


##########
daffodil-schematron/src/main/scala/org/apache/daffodil/validation/schematron/SchematronValidator.scala:
##########
@@ -20,25 +20,52 @@ package org.apache.daffodil.validation.schematron
 import java.io.File
 import java.io.FileOutputStream
 import java.io.InputStream
+import java.io.StringWriter
 import java.net.URI
+import javax.xml.transform.Templates
+import javax.xml.transform.Transformer
+import javax.xml.transform.sax.SAXSource
+import javax.xml.transform.stream.StreamResult
 import scala.util.Try
 import scala.xml.Elem
 import scala.xml.XML
 
 import org.apache.daffodil.api
+import org.apache.daffodil.lib.xml.DaffodilSAXParserFactory
+
+import org.xml.sax.InputSource
+import org.xml.sax.XMLReader
 
 /**
  * Daffodil Validator implementation for ISO schematron
  */
 final class SchematronValidator(
-  engine: Schematron,
+  templates: Templates,
   svrlPath: Option[URI]
 ) extends api.validation.Validator {
+
+  // XMLReader and Transformer are not thread safe so are ThreadLocals. 
Alternatively, they could be
+  // created every time validateXML, but ThreadLocal allows reuse of objects
+  // that are potentially expensive to create
+  val readerTransformerTL = new ThreadLocal[(XMLReader, Transformer)] {
+    override def initialValue(): (XMLReader, Transformer) = {
+      val factory = DaffodilSAXParserFactory()
+      factory.setValidating(false)
+      val reader = factory.newSAXParser.getXMLReader
+      val transformer = templates.newTransformer
+      (reader, transformer)
+    }
+  }
+
   def validateXML(
     document: InputStream,
     handler: api.validation.ValidationHandler
   ): Unit = {
-    val svrl = XML.loadString(engine.validate(document))
+    val (reader, transformer) = readerTransformerTL.get
+    val writer = new StringWriter
+    transformer
+      .transform(new SAXSource(reader, new InputSource(document)), new 
StreamResult(writer))
+    val svrl = XML.loadString(writer.toString)
     svrl.child.collect { case f @ Elem("svrl", "failed-assert", _, _, msg @ 
_*) =>
       handler.validationError(msg.text.trim, (f \ "@location").text)
     }

Review Comment:
   Combining the Schematron logic here made me notice that we do some 
unnecessary work. Below this we do `val svrlString = svrl.mkString`, but 
`svrl.mkString` is the same as `writer.toString` that already do above. So 
we're essentially creating the string twice. Instead we should just do 
something like this:
   
   ```scala
   transformer.transform(...)
   val svrlString = writer.toString
   val svrl = XML.loadString(svrlString)
   svrl.child.collect { ... handler.valdiationError(...) }
   svrlPath.foreach { uri => ... 
     val writer = FileWriter(new (File(uri)) }
     writer.write(svrlString)
     ...
   }
   ```
   
   So we avoid the duplicate string creation. We can also avoid the .getBytes 
to prevent converting the string to a giant byte array by using a FileWriter 
instead of a FileOutputStream.
   
   I think those changes would be an easy worthwhile performance improvement 
while we're already fixing the validator.
   
   I can imagine even more performance, e.g. create a custom SAXResult instance 
that directly calls handler.validationError, completely avoiding the need to 
create a string at all, but we probably shouldn't worry about that unless 
people run into performance limitations with the schematron validator.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to