[ 
https://issues.apache.org/jira/browse/TIKA-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979298#comment-16979298
 ] 

Tim Allison commented on TIKA-2995:
-----------------------------------

I'm happy to bump the markLimit.  What do others think?

You _should_ be able to configure it via a tika_config.xml along these lines:

{noformat}
<properties>
    <detectors>
        <detector class="org.apache.tika.detect.OverrideDetector"/>
        <detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector">
            <params>
                <param name="markLimit" type="int">134217728</param>
            </params>
        </detector>
        <detector class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
        <detector class="org.gagravarr.tika.OggDetector"/>
        <detector class="org.apache.tika.mime.MimeTypes"/>
    </detectors>
</properties>
{noformat}
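
For reference, 134217728 is 128 * 1024 * 1024 bytes, i.e. 128MB. The sketch below is a generic illustration of Java's mark/reset semantics and is not taken from Tika's code: it assumes the detector depends on being able to rewind the stream after reading ahead, and uses a plain BufferedInputStream to show what happens when more bytes are read than the mark limit allows. The names MarkLimitDemo and canRewind are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkLimitDemo {

    // 128 MB in bytes; the same number as the markLimit value in the config above.
    static final int MARK_LIMIT_BYTES = 128 * 1024 * 1024;

    /**
     * Marks a stream with the given limit, reads bytesToRead bytes one at a
     * time, then tries to rewind. Returns true if reset() succeeded.
     */
    static boolean canRewind(int markLimit, int bytesToRead) throws IOException {
        byte[] data = new byte[64];
        // The buffer size is set equal to markLimit so that a larger internal
        // buffer cannot hide a read past the limit.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), markLimit)) {
            in.mark(markLimit);
            for (int i = 0; i < bytesToRead; i++) {
                if (in.read() < 0) {
                    break; // end of stream
                }
            }
            try {
                in.reset();
                return true;
            } catch (IOException e) {
                // The mark was invalidated because we read past the limit.
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(MARK_LIMIT_BYTES);  // prints 134217728
        System.out.println(canRewind(16, 8));  // prints true
        System.out.println(canRewind(8, 16));  // prints false
    }
}
```

If the detector behaves similarly, a file larger than the mark limit cannot be rewound after the read-ahead, which would be consistent with the 16MB threshold in the report.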

>   markLimit too small in org.apache.tika.parser.microsoft.POIFSContainerDetector
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-2995
>                 URL: https://issues.apache.org/jira/browse/TIKA-2995
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.22
>            Reporter: Tim Barrett
>            Priority: Major
>
> Tika fails to parse large msg files (msg files > 16MB in size). This is 
> because the property markLimit in POIFSContainerDetector is set to 16MB. 
> Although there is a public set method in the class, this is not called within 
> Tika as we use the DefaultDetector, which encapsulates the use of 
> POIFSContainerDetector.
> As a workaround we have made the following change in POIFSContainerDetector:
>
>     @Field
>     // private int markLimit = 16 * 1024 * 1024;
>     private int markLimit = 128 * 1024 * 1024;
>
> Could a better fix be to have the DefaultDetector call setMarkLimit with a 
> higher value? msg files with attachments are often larger than 16MB.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
