[ https://issues.apache.org/jira/browse/TIKA-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125567#comment-17125567 ]
Hudson commented on TIKA-3106:
------------------------------
SUCCESS: Integrated in Jenkins build Tika-trunk #1820 (See
[https://builds.apache.org/job/Tika-trunk/1820/])
TIKA-3106 Magic header detection for emails starting with an ARC- (nick:
[https://github.com/apache/tika/commit/1e02f01819ef44bc99a5fe81e271e67be79b98ad])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
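
For readers unfamiliar with magic header detection, here is a minimal, self-contained sketch of the idea. It is not Tika's implementation: the actual rules live in tika-mimetypes.xml, and the class name MagicSniffer, the method detect, and the abbreviated prefix list are all assumptions made for illustration. The sketch looks only at the first bytes of the content, never at the file name, so the result does not depend on the extension.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Tika.
final class MagicSniffer {
    // Assumed, abbreviated list of header names that may open an email message.
    // "ARC-" covers headers such as ARC-Seal and ARC-Message-Signature.
    private static final String[] EMAIL_MAGICS = {
        "ARC-", "Received:", "Return-Path:", "Message-ID:", "From:", "Date:"
    };

    // Returns a media type name based purely on the leading bytes of the content.
    static String detect(byte[] head) {
        // Only the first few bytes matter for a prefix match at offset 0.
        int len = Math.min(head.length, 64);
        String start = new String(head, 0, len, StandardCharsets.US_ASCII);
        for (String magic : EMAIL_MAGICS) {
            // Case-insensitive prefix comparison; false if start is shorter than magic.
            if (start.regionMatches(true, 0, magic, 0, magic.length())) {
                return "message/rfc822";
            }
        }
        // Fallback used by this sketch when no email header prefix matches.
        return "text/plain";
    }

    public static void main(String[] args) {
        byte[] eml = "ARC-Seal: i=1; a=rsa-sha256\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] txt = "Hello, world\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(eml));
        System.out.println(detect(txt));
    }
}
```

A real detector would combine such content checks with file name hints and would use many more patterns; the point here is only that a message beginning with an ARC- header can be recognised from its content alone.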
> Tika Fails to detect some EML files if extension is not .eml
> ------------------------------------------------------------
>
> Key: TIKA-3106
> URL: https://issues.apache.org/jira/browse/TIKA-3106
> Project: Tika
> Issue Type: Bug
> Components: metadata, mime
> Affects Versions: 1.24
> Reporter: Xiaohong Yang
> Priority: Critical
> Attachments: EmlFile.txt
>
>
> I have an EML file that is detected as message/rfc822 only if the file
> extension is .eml; otherwise it is detected as text/plain. The following
> is the code that I use to detect the file type and extension.
> TikaConfig config = TikaConfigFactory.getTikaConfig();
> Detector detector = config.getDetector();
> Metadata metadata = new Metadata();
> TikaInputStream stream = TikaInputStream.get(fis = new FileInputStream(filePath));
> metadata.add(Metadata.RESOURCE_NAME_KEY, filePath);
> MediaType mediaType = detector.detect(stream, metadata);
> MimeType mimeType = config.getMimeRepository().forName(mediaType.toString());
> String tikaExtension = mimeType.getExtension();
>
> When the sample file has the .eml extension, mimeType is message/rfc822 and
> tikaExtension is .eml. When I change the extension to .txt, mimeType is
> text/plain and tikaExtension is .txt.
>
> The same mimeType and tikaExtension should be detected regardless of the
> file extension.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)