I had been assuming that I could choose among the possible Tika output
formats when using the extracting request handler in extract-only mode,
just as I can from the CLI with the Tika jar:

    -x or --xml        Output XHTML content (default)
    -h or --html       Output HTML content
    -t or --text       Output plain text content
    -m or --metadata   Output only metadata

However, judging from the docs and the source, only the XML option seems
to be available; it is hard-coded in ExtractingDocumentLoader:

    serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));

In addition, it seems that the metadata is always appended to the response.
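
To make the idea concrete, here is a rough sketch of how the output method could be chosen per request instead of being fixed. It is not the Solr code: it swaps the XMLSerializer for a plain JAXP TransformerHandler, uses only JDK classes, and the names ExtractFormatSketch, handlerFor and render are hypothetical ones I made up for illustration. It only covers the xml/html/text choice, not a metadata-only mode.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch, not part of Solr or Tika.
final class ExtractFormatSketch {

    // Returns a SAX handler that serializes events to 'out' using the
    // requested output method: "xml", "html" or "text".
    static ContentHandler handlerFor(String format, Writer out) throws Exception {
        if (!format.equals("xml") && !format.equals("html") && !format.equals("text")) {
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        // An identity transformer that simply serializes the SAX events it receives.
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, format);
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));
        return handler;
    }

    // Feeds a tiny stand-in document (<p>hello</p>) through the handler;
    // in a real handler these events would come from the Tika parser.
    static String render(String format) throws Exception {
        StringWriter out = new StringWriter();
        ContentHandler handler = handlerFor(format, out);
        char[] text = "hello".toCharArray();
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("xml:  " + render("xml"));
        System.out.println("text: " + render("text"));
    }
}
```

In the request handler, the format string would presumably come from a request parameter; I have not checked how that would best fit with the existing extract-only parameters.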

Are there any open issues relating to this, or opinions on whether
adding additional flexibility to the response format would be of
interest for 1.4?

Thanks,

Peter

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia, Inc.
peter.wola...@acquia.com
