All,
I finally got around to documenting Apache Tika's MockParser[1]. As of Tika
1.15 (unreleased), you can add tika-core-tests.jar to your classpath and
simulate:
1. Regular catchable exceptions
2. OutOfMemoryErrors (OOMs)
3. Permanent hangs
This will allow you to determine if your ingest framework is robust against
these issues.
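
For anyone wondering what "robust" might look like on the ingest side, here is
a minimal sketch in plain Java. It is not Tika code and does not use MockParser:
the class GuardedParse, the Outcome enum and the runGuarded method are
hypothetical names made up for illustration, and the three failure modes are
faked with simple lambdas. It assumes the parse runs in a worker thread guarded
by a timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika: runs one parse task with a timeout.
class GuardedParse {

    enum Outcome { OK, EXCEPTION, ERROR, TIMEOUT }

    static Outcome runGuarded(Callable<String> parseTask, long timeoutMillis) {
        // Daemon thread so a truly stuck task cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-parse");
            t.setDaemon(true);
            return t;
        });
        Future<String> future = executor.submit(parseTask);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // Interrupts the worker; a real hang may ignore the interrupt.
            future.cancel(true);
            return Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            // Errors such as OutOfMemoryError arrive wrapped as the cause.
            return (e.getCause() instanceof Error) ? Outcome.ERROR : Outcome.EXCEPTION;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.EXCEPTION;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a normal parse and the three simulated failure modes.
        System.out.println(runGuarded(() -> "extracted text", 1000));
        System.out.println(runGuarded(() -> { throw new java.io.IOException("simulated"); }, 1000));
        System.out.println(runGuarded(() -> { throw new OutOfMemoryError("simulated"); }, 1000));
        System.out.println(runGuarded(() -> { Thread.sleep(Long.MAX_VALUE); return "never"; }, 200));
    }
}
```

Note that this only fakes an OOM by throwing the error. After a real
OutOfMemoryError the JVM may be in an unreliable state, and a real hang may not
respond to interrupts, so running parses in a separate child process that can be
killed and restarted is likely the safer design for large-scale ingest.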
As always, we fix Tika when we can, but if history is any indicator, you'll
want to make sure your ingest code can handle these issues if you are
processing millions or billions of files from the wild.
Cheers,
Tim
[1] https://wiki.apache.org/tika/MockParser