I had been assuming that, when using the ExtractingRequestHandler in extract-only mode, I could choose among the possible Tika output formats, just as when running the Tika jar from the CLI:
  -x or --xml        Output XHTML content (default)
  -h or --html       Output HTML content
  -t or --text       Output plain text content
  -m or --metadata   Output only metadata

However, looking at the docs and source, it seems that only the XML option is available, because it is hard-coded in ExtractingDocumentLoader:

  serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));

In addition, it seems that the metadata is always appended to the response.

Are there any open issues relating to this, or opinions on whether adding more flexibility in the response format would be of interest for 1.4?

Thanks,

Peter

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia, Inc.
peter.wola...@acquia.com
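For illustration only, here is one JDK-only way the output format could be made selectable: instead of a fixed XMLSerializer, SAX events are sent to a javax.xml.transform TransformerHandler whose output method ("xml", "html" or "text") is chosen per request. The class FormatSketch, its helpers, and the hand-written sample events are hypothetical stand-ins. No Tika or Solr classes are used, and in a real handler the returned ContentHandler would be passed to the Tika parser instead. Metadata-only output is not covered.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class FormatSketch {

    // Hypothetical helper: map a requested format name to a SAX handler that
    // serializes events with the matching JAXP output method.
    static ContentHandler handlerFor(String format, Writer out)
            throws TransformerConfigurationException {
        if (!format.equals("xml") && !format.equals("html") && !format.equals("text")) {
            throw new IllegalArgumentException("Unknown format: " + format);
        }
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, format);
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Stand-in for a parser: emits a tiny document as SAX events
    // (no namespace, to keep the serialized output simple).
    static void emitSample(ContentHandler h) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        char[] text = "Hello".toCharArray();
        h.startDocument();
        h.startElement("", "html", "html", none);
        h.startElement("", "body", "body", none);
        h.startElement("", "p", "p", none);
        h.characters(text, 0, text.length);
        h.endElement("", "p", "p");
        h.endElement("", "body", "body");
        h.endElement("", "html", "html");
        h.endDocument();
    }

    // Serialize the sample document in the requested format.
    static String render(String format) {
        StringWriter out = new StringWriter();
        try {
            emitSample(handlerFor(format, out));
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String f : new String[] {"xml", "html", "text"}) {
            System.out.println(f + ": " + render(f));
        }
    }
}
```

With "text", only the character content is written, which roughly corresponds to the CLI's -t option. Whether it matches Tika's own text output exactly has not been checked here.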