Tim Barrett created TIKA-2995:
---------------------------------
Summary: markLimit too small in org.apache.tika.parser.microsoft.POIFSContainerDetector
Key: TIKA-2995
URL: https://issues.apache.org/jira/browse/TIKA-2995
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.22
Reporter: Tim Barrett
Tika fails to parse large .msg files (files larger than 16 MB). This is because
the markLimit property in POIFSContainerDetector is set to 16 MB. Although the
class has a public setter for this property, it is never called within Tika, and
because we use the DefaultDetector, which encapsulates POIFSContainerDetector, we
cannot easily call it ourselves.
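To illustrate why a fixed markLimit caps the size of file that can be handled, here is a minimal Java sketch of a size-gated detector. It is a hypothetical model, not Tika's code: it assumes the container detector must buffer the whole file before it can inspect it, and that it falls back to a generic OLE2 type when the stream is longer than markLimit. The two MIME type strings are assumptions chosen for illustration. The sketch relies only on the JDK InputStream mark/reset contract, under which reset() rewinds the stream as long as no more bytes than the mark's read limit have been consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical model of a size-gated container detector. It does NOT parse
// OLE2 data and is not Tika's POIFSContainerDetector.
class MarkLimitSketch {
    // Assumed results: a specific .msg type vs. a generic OLE2 fallback.
    static final String SPECIFIC_TYPE = "application/vnd.ms-outlook";
    static final String GENERIC_TYPE = "application/x-tika-msoffice";

    // Buffers up to markLimit (+1) bytes, then resets so a parser can re-read them.
    static String detect(InputStream in, int markLimit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        // Read one byte past the limit to tell "exactly markLimit" from "longer".
        // (At Integer.MAX_VALUE there is no room for the extra byte.)
        int readLimit = markLimit == Integer.MAX_VALUE ? markLimit : markLimit + 1;
        try {
            in.mark(readLimit);
            try {
                return countUpTo(in, readLimit) > markLimit ? GENERIC_TYPE : SPECIFIC_TYPE;
            } finally {
                in.reset(); // rewind to where detection started
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads at most limit bytes and returns how many were actually available.
    private static long countUpTo(InputStream in, int limit) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        while (total < limit) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, limit - total));
            if (n == -1) {
                break; // end of stream reached within the limit
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] file = new byte[100]; // stands in for a 100-byte .msg file
        System.out.println(detect(new ByteArrayInputStream(file), 64));  // limit too small
        System.out.println(detect(new ByteArrayInputStream(file), 128)); // limit large enough
    }
}
```

With the 64-byte limit the sketch reports the generic type, and with the 128-byte limit the specific type: a miniature version of the difference between the 16 MB default and the 128 MB workaround.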
As a workaround, we have made the following change in POIFSContainerDetector:
@Field
// private int markLimit = 16 * 1024 * 1024;
private int markLimit = 128 * 1024 * 1024;
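For reference, the two settings work out as follows in bytes. Because markLimit is declared as an int, the largest value it can hold is Integer.MAX_VALUE (2,147,483,647 bytes, one byte short of 2 GiB), so a limit of 2048 MiB or more cannot be expressed; this small sketch uses Math.multiplyExact to catch that overflow. The helper names and the 40 MiB file size are hypothetical examples.

```java
// Byte values of the markLimit settings above, computed with overflow checking
// because markLimit is an int. Class and method names are hypothetical.
class MarkLimitMath {
    // Converts MiB to bytes; throws ArithmeticException if the result overflows int.
    static int mebibytes(int mib) {
        return Math.multiplyExact(mib, 1024 * 1024);
    }

    // True if a file of fileSize bytes fits within the given mark limit.
    static boolean fits(long fileSize, int markLimit) {
        return fileSize <= markLimit;
    }

    public static void main(String[] args) {
        System.out.println(mebibytes(16));   // 16777216
        System.out.println(mebibytes(128));  // 134217728
        long attachmentHeavyMsg = 40L * 1024 * 1024; // hypothetical 40 MiB .msg file
        System.out.println(fits(attachmentHeavyMsg, mebibytes(16)));  // false
        System.out.println(fits(attachmentHeavyMsg, mebibytes(128))); // true
    }
}
```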
Would a better fix be to have the DefaultDetector call setMarkLimit with a higher
value? .msg files with attachments are often larger than 16 MB.
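One way the suggestion could look is a composite detector that forwards a mark limit to every child that accepts one. This is a hypothetical sketch: SketchDetector, MarkLimited, ContainerDetectorStub, MagicDetectorStub and CompositeSketch are illustrative names, not Tika classes, and the sketch does not reflect how DefaultDetector is actually built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout: none of these types exist in Tika.
interface SketchDetector {
    String name();
}

// Implemented by detectors whose buffering limit can be tuned.
interface MarkLimited {
    void setMarkLimit(int markLimit);
}

// Stand-in for a container detector with the 16 MiB default from the report.
class ContainerDetectorStub implements SketchDetector, MarkLimited {
    private int markLimit = 16 * 1024 * 1024;

    public String name() { return "container"; }

    public void setMarkLimit(int markLimit) { this.markLimit = markLimit; }

    int getMarkLimit() { return markLimit; }
}

// Stand-in for a detector that only looks at leading magic bytes.
class MagicDetectorStub implements SketchDetector {
    public String name() { return "magic"; }
}

// Composite that forwards a mark limit to every child able to accept one.
class CompositeSketch {
    private final List<SketchDetector> children;

    CompositeSketch(SketchDetector... children) {
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    void setMarkLimit(int markLimit) {
        for (SketchDetector child : children) {
            if (child instanceof MarkLimited) {
                ((MarkLimited) child).setMarkLimit(markLimit);
            }
        }
    }

    public static void main(String[] args) {
        ContainerDetectorStub container = new ContainerDetectorStub();
        CompositeSketch composite = new CompositeSketch(new MagicDetectorStub(), container);
        composite.setMarkLimit(128 * 1024 * 1024);
        System.out.println(container.getMarkLimit()); // 134217728
    }
}
```

Forwarding through a small interface keeps the composite unaware of concrete detector classes, so children without a tunable limit are simply skipped.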
--
This message was sent by Atlassian Jira
(v8.3.4#803005)