Hello,
I want to transform an XML document, but I can't use XSLT because
I need to invoke Java code inside the transformation. If I understand
correctly, Xalan is not an option. I don't need to keep the whole XML
document in memory for the transformation, so I decided to use a SAX
parser instead of DOM. I need to create new elements in the
transformation.
I read the sample xerces-2_10_0/samples/sax/Writer.java and it
generates the output document manually:
    public void startElement(String uri, String local, String raw,
                             Attributes attrs) throws SAXException {
        //...
        fOut.print('<');
        fOut.print(raw);
        //...
        fOut.print('>');
        fOut.flush();
    }
I don't want to generate the output manually.
I wrote my own example, which uses
org.cyberneko.html.parsers.SAXParser from the NekoHTML parser:
    package saxtransformexample;

    import org.apache.xerces.util.AugmentationsImpl;
    import org.apache.xerces.util.XMLAttributesImpl;
    import org.apache.xerces.xni.Augmentations;
    import org.apache.xerces.xni.QName;
    import org.apache.xerces.xni.XMLAttributes;
    import org.apache.xerces.xni.XNIException;
    import org.apache.xerces.xni.parser.XMLDocumentFilter;
    import org.cyberneko.html.filters.DefaultFilter;
    import org.cyberneko.html.parsers.SAXParser;
    import org.xml.sax.InputSource;

    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import java.io.StringReader;
    import java.io.StringWriter;

    public class SAXTransformExample {

        public static void main(String[] args) throws Exception {
            String inputString = "<div></div>";
            StringWriter out = new StringWriter();
            StreamResult result = new StreamResult(out);

            SAXTransformerFactory transformerFactory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "no");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");

            XMLDocumentFilter[] filters = {new DefaultFilter() {
                @Override
                public void startElement(QName element, XMLAttributes attributes,
                                         Augmentations augs) throws XNIException {
                    if (!element.localpart.toLowerCase().equals("div")) {
                        super.startElement(element, attributes, augs);
                    } else {
                        super.startElement(element, attributes, augs);
                        super.startElement(new QName("", "p", "p", null),
                            new XMLAttributesImpl(), new AugmentationsImpl());
                    }
                }

                @Override
                public void endElement(QName element, Augmentations augs)
                        throws XNIException {
                    if (!element.localpart.toLowerCase().equals("div")) {
                        super.endElement(element, augs);
                    } else {
                        super.endElement(new QName("", "p", "p", null),
                            new AugmentationsImpl());
                        super.endElement(element, augs);
                    }
                }
            }};

            SAXParser parser = new SAXParser();
            parser.setFeature("http://xml.org/sax/features/namespaces", false);
            parser.setFeature(
                "http://cyberneko.org/html/features/balance-tags/document-fragment",
                true);
            parser.setProperty("http://cyberneko.org/html/properties/filters",
                filters);
            parser.setProperty("http://cyberneko.org/html/properties/names/elems",
                "lower");

            transformer.transform(
                new SAXSource(parser, new InputSource(new StringReader(inputString))),
                result);
            System.out.println("RESULT:" + out.getBuffer().toString() + ":");
        }
    }
It prints out:
RESULT:<div><p></p></div>
The problem is that it uses XNI, and since I'm not writing a parser,
I think I shouldn't use XNI at all.
There is an example:
http://book.javanb.com/xml-and-java-developing-web-applications-2nd/0201770040_ch05lev1sec2.html
    OutputFormat format = new OutputFormat("xml", "UTF-8", false);
    format.setPreserveSpace(true);
    ContentHandler handler = new XMLSerializer(System.out, format);
    XMLReader parser =
        XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
    XMLReader filter = new MailFilter(parser);
    filter.setContentHandler(handler);
    filter.parse(argv[0]);
MailFilter extends org.xml.sax.helpers.XMLFilterImpl.
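The book's MailFilter source is not reproduced here, so the following is only a minimal sketch of what an XMLFilterImpl subclass of this kind might look like. ParagraphFilter and FilterSketch are hypothetical names, the parser comes from the JDK's SAXParserFactory rather than from Xerces directly, and a plain DefaultHandler records the events instead of serializing them. The filter does the same thing as the XNI filter above: it wraps the content of each div in a new p element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter (not the book's MailFilter): emits an extra <p>
// element around the content of every <div>.
class ParagraphFilter extends XMLFilterImpl {
    ParagraphFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String local, String raw, Attributes attrs)
            throws SAXException {
        // Forward the original event, then open the new element inside a div.
        super.startElement(uri, local, raw, attrs);
        if ("div".equals(local)) {
            super.startElement("", "p", "p", new AttributesImpl());
        }
    }

    @Override
    public void endElement(String uri, String local, String raw) throws SAXException {
        // Close the new element before the div itself is closed.
        if ("div".equals(local)) {
            super.endElement("", "p", "p");
        }
        super.endElement(uri, local, raw);
    }
}

class FilterSketch {
    // Parses the input through ParagraphFilter and returns a trace of the
    // element events that reach the downstream ContentHandler.
    static String trace(String xml) {
        StringBuilder events = new StringBuilder();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String raw, Attributes attrs) {
                events.append("start:").append(local).append(' ');
            }

            @Override
            public void endElement(String uri, String local, String raw) {
                events.append("end:").append(local).append(' ');
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that local names are populated
            XMLReader reader = factory.newSAXParser().getXMLReader();
            XMLReader filter = new ParagraphFilter(reader);
            filter.setContentHandler(recorder);
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("SAX filtering failed", e);
        }
        return events.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(trace("<div></div>"));
    }
}
```

For the input `<div></div>` the recorded trace is `start:div start:p end:p end:div`, so the new element is created without any XNI classes. The sketch leaves open how the filtered events are turned back into text.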
The example uses org.apache.xml.serialize.XMLSerializer, which is
deprecated as of Xerces 2.9.0. The API documentation says:
    Deprecated. This class was deprecated in Xerces 2.9.0. It is
    recommended that new applications use the DOM Level 3 LSSerializer
    or JAXP's Transformation API for XML (TrAX) for serializing XML.
    See the Xerces documentation for more information.
http://xerces.apache.org/xerces2-j/javadocs/other/org/apache/xml/serialize/XMLSerializer.html
If I don't want to use DOM, I assume I can't use the DOM Level 3
LSSerializer. If I don't want to use XSLT, I assume I can't use JAXP's
Transformation API for XML (TrAX) for serializing XML.
What is the proper way to transform the stream of SAX events into
another stream of SAX events, so that I don't need to write my own
parser or my own serializer?
Best regards,
Dawid Chodura