[ https://issues.apache.org/jira/browse/TIKA-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17137864#comment-17137864 ]

Hudson commented on TIKA-3106:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #341 (See 
[https://builds.apache.org/job/tika-branch-1x/341/])
TIKA-3106 Magic header detection for emails starting with an ARC- (tallison: 
[https://github.com/apache/tika/commit/9e53cec0601d7db7dbc49969a78922c708cdafdf])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
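The commit message says the fix adds magic (content-based) detection for emails whose first line is an ARC- header, so that detection no longer depends on the .eml extension. The sketch below is a hypothetical, self-contained illustration of that idea only. It is not Tika's code, and the exact magic pattern added to tika-mimetypes.xml is not shown in this message. The class name ArcMagicSketch, the method looksLikeArcEmail, and the choice of the three RFC 8617 ARC header names as prefixes are all assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: NOT Tika's implementation. The real fix is a
// magic entry in tika-mimetypes.xml whose exact pattern is not reproduced here.
final class ArcMagicSketch {

    // ARC header field names from RFC 8617, assumed here as the prefixes to match.
    private static final String[] ARC_PREFIXES = {
        "ARC-Seal:",
        "ARC-Message-Signature:",
        "ARC-Authentication-Results:"
    };

    private ArcMagicSketch() {}

    /** Returns true if the leading bytes start with one of the ARC header names. */
    static boolean looksLikeArcEmail(byte[] header) {
        // Header field names are ASCII; decode at most the first 64 bytes.
        // Like a simple magic string, this comparison is case-sensitive.
        int len = Math.min(header.length, 64);
        String start = new String(header, 0, len, StandardCharsets.US_ASCII);
        for (String prefix : ARC_PREFIXES) {
            if (start.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] eml = "ARC-Seal: i=1; a=rsa-sha256; cv=none\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] txt = "Hello, world\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeArcEmail(eml)); // true
        System.out.println(looksLikeArcEmail(txt)); // false
    }
}
```

Because the check looks only at the leading bytes, it gives the same answer whatever the file is named. That is the property the bug report below asks for.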


> Tika Fails to detect some EML files if extension is not .eml
> ------------------------------------------------------------
>
>                 Key: TIKA-3106
>                 URL: https://issues.apache.org/jira/browse/TIKA-3106
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, mime
>    Affects Versions: 1.24
>            Reporter: Xiaohong Yang
>            Priority: Critical
>         Attachments: EmlFile.txt
>
>
> I have an EML file that is detected as message/rfc822 only if the file
> extension is .eml; otherwise it is detected as text/plain. The following
> is the code I use to detect the file type and extension:
>        TikaConfig config = TikaConfigFactory.getTikaConfig();
>        Detector detector = config.getDetector();
>        Metadata metadata = new Metadata();
>        TikaInputStream stream = TikaInputStream.get(fis = new FileInputStream(filePath));
>        metadata.add(Metadata.RESOURCE_NAME_KEY, filePath);
>        MediaType mediaType = detector.detect(stream, metadata);
>        MimeType mimeType = config.getMimeRepository().forName(mediaType.toString());
>        String tikaExtension = mimeType.getExtension();
>  
> When the sample file has the .eml extension, mimeType is message/rfc822 and
> tikaExtension is .eml. When I change the extension to .txt, mimeType is
> text/plain and tikaExtension is .txt.
>  
> The same mimeType and tikaExtension should be detected regardless of the file
> extension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
