[
https://issues.apache.org/jira/browse/TIKA-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank Refol updated TIKA-2148:
------------------------------
Attachment: This is password protected (Created with MS 2010).ppt
This is password protected (Created with MS 2007).ppt
This is password protected (Created with MS 2003).ppt
> Tika app is unable to parse a password protected PowerPoint (97-2003) document
> -------------------------------------------------------------------------------
>
> Key: TIKA-2148
> URL: https://issues.apache.org/jira/browse/TIKA-2148
> Project: Tika
> Issue Type: Bug
> Components: cli
> Affects Versions: 1.13
> Environment: Windows console.
> Reporter: Frank Refol
> Labels: Office, PowerPoint
> Attachments: This is password protected (Created with MS 2003).ppt,
> This is password protected (Created with MS 2007).ppt,
> This is password protected (Created with MS 2010).ppt
>
>
> Using the Tika command-line application to extract text from a
> password-protected PowerPoint 97-2003 document fails. Here's the basic
> command that was used:
> {quote}
> java -jar tika-app-1.13.jar -t --password=password "This is password protected (Created with MS 2003).ppt"
> {quote}
> The following exception is thrown on the console:
> {noformat}
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@62204612
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
> Caused by: org.apache.poi.hslf.exceptions.EncryptedPowerPointFileException: PowerPoint file is encrypted. The correct password needs to be set via Biff8EncryptionKey.setCurrentUserPassword()
> at org.apache.poi.hslf.usermodel.HSLFSlideShowEncrypted.<init>(HSLFSlideShowEncrypted.java:106)
> at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:284)
> at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:275)
> at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:179)
> at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:182)
> at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 5 more
> {noformat}
> Note that this happens whether the PPT file was created with Office 2010,
> Office 2007, or Office 2003.
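
For anyone trying to reproduce the report from a script rather than by hand, the command quoted above could be launched along these lines. This is only a sketch: the class name TikaCliCommand and the helper build are hypothetical, it assumes java is on the PATH and tika-app-1.13.jar is in the working directory, and it does not work around the exception. Passing the arguments as a list avoids any shell quoting of the spaces and parentheses in the attachment names.

```java
import java.util.List;

public class TikaCliCommand {

    // Hypothetical helper: builds the argument vector for the command quoted
    // in the report. The -t and --password= options are taken from the report;
    // each list element is passed to the JVM as one argument, so the file name
    // needs no extra quoting.
    static List<String> build(String jarPath, String password, String document) {
        return List.of("java", "-jar", jarPath, "-t", "--password=" + password, document);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the jar and the attachment sit in the current directory.
        List<String> cmd = build(
                "tika-app-1.13.jar",
                "password",
                "This is password protected (Created with MS 2003).ppt");

        // inheritIO() forwards Tika's stdout/stderr, so the stack trace from
        // the report (if it occurs) shows up on the console unchanged.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

Swapping the file name for one of the other two attachments should be enough to check the 2007 and 2010 variants in the same way.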