Tim Barrett created TIKA-2995:
---------------------------------

             Summary: markLimit too small in org.apache.tika.parser.microsoft.POIFSContainerDetector
                 Key: TIKA-2995
                 URL: https://issues.apache.org/jira/browse/TIKA-2995
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.22
            Reporter: Tim Barrett


Tika fails to parse large msg files (larger than 16 MB). This is because the 
markLimit field in POIFSContainerDetector is set to 16 MB. Although the class 
has a public setter for it, that setter is never called within Tika when we use 
the DefaultDetector, which encapsulates the use of POIFSContainerDetector.
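
For readers less familiar with mark limits, here is a minimal, generic sketch of java.io mark/reset behaviour. It is not Tika's detection code, and the exact failure path inside POIFSContainerDetector is not shown in this report. The class name MarkLimitDemo, the helper canRewind, and the tiny buffer sizes are assumptions chosen only to keep the numbers readable.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of Tika.
public class MarkLimitDemo {

    // Returns true if the stream can still be rewound after reading
    // bytesToRead bytes past a mark that was set with the given limit.
    static boolean canRewind(int markLimit, int bytesToRead) {
        byte[] data = new byte[64];
        // A 4-byte internal buffer keeps the example small; real streams
        // use much larger buffers, but the principle is the same.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), 4)) {
            in.mark(markLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            // Throws IOException if the mark was invalidated by reading too far.
            in.reset();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canRewind(16, 10)); // prints true: 10 bytes fit within the limit
        System.out.println(canRewind(4, 10));  // prints false: the mark was dropped
    }
}
```

The point is only that a stream marked with a limit smaller than the amount of data read may not be rewindable afterwards, which is why the limit matters for large files.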

As a workaround we have made the following change in POIFSContainerDetector:

    @Field
    // private int markLimit = 16 * 1024 * 1024;
    private int markLimit = 128 * 1024 * 1024;

Could a better fix be to have the DefaultDetector call setMarkLimit with a 
higher value? msg files with attachments are often larger than 16 MB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
