[ 
https://issues.apache.org/jira/browse/TIKA-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank Refol updated TIKA-2148:
------------------------------
    Attachment: This is password protected (Created with MS 2010).ppt
                This is password protected (Created with MS 2007).ppt
                This is password protected (Created with MS 2003).ppt

> Tika app is unable to parse a password protected PowerPoint (97-2003) document
> -------------------------------------------------------------------------------
>
>                 Key: TIKA-2148
>                 URL: https://issues.apache.org/jira/browse/TIKA-2148
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.13
>         Environment: Windows console.
>            Reporter: Frank Refol
>              Labels: Office, PowerPoint
>         Attachments: This is password protected (Created with MS 2003).ppt,
>                      This is password protected (Created with MS 2007).ppt,
>                      This is password protected (Created with MS 2010).ppt
>
>
> Extracting text from a password-protected PowerPoint 97-2003 (.ppt) document 
> with the Tika command-line application fails, even though the password is 
> passed with --password. The command used was:
> {quote}
> java -jar tika-app-1.13.jar -t --password=password "This is password protected (Created with MS 2003).ppt"
> {quote}
> The following exception is thrown on the console:
> {noformat}
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@62204612
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>       at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:191)
>       at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:480)
>       at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
> Caused by: org.apache.poi.hslf.exceptions.EncryptedPowerPointFileException: PowerPoint file is encrypted. The correct password needs to be set via Biff8EncryptionKey.setCurrentUserPassword()
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowEncrypted.<init>(HSLFSlideShowEncrypted.java:106)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.read(HSLFSlideShowImpl.java:284)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.buildRecords(HSLFSlideShowImpl.java:275)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.<init>(HSLFSlideShowImpl.java:179)
>       at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:182)
>       at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
>       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>       ... 5 more
> {noformat}
> The same failure occurs with each of the attached PPT files, whether the file 
> was created with Office 2003, Office 2007, or Office 2010.
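
To re-run the reported command against each attachment without shell-quoting problems (the file names contain spaces and parentheses), a small reproduction helper might look like the sketch below. It is not part of Tika. The function names are hypothetical, and it assumes that `java` is on the PATH and that the jar and the .ppt files are in the working directory. The only flags used are the `-t` and `--password=` options shown in the report, and the exception class name is copied from the stack trace.

```python
import subprocess

# Exception class name copied from the "Caused by" line of the stack trace.
ENCRYPTED_MARKER = "org.apache.poi.hslf.exceptions.EncryptedPowerPointFileException"


def build_tika_command(jar, document, password):
    """Return the argument list for the invocation quoted in the report."""
    # One list element per argument, so the spaces and parentheses in the
    # file name need no shell quoting.
    return ["java", "-jar", jar, "-t", "--password=" + password, document]


def is_encrypted_ppt_failure(stderr_text):
    """Return True if the output contains the encrypted-PowerPoint exception."""
    return ENCRYPTED_MARKER in stderr_text


def run_tika(jar, document, password):
    """Run Tika (hypothetical helper) and return (exit_code, stdout, stderr)."""
    result = subprocess.run(
        build_tika_command(jar, document, password),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr
```

Calling `run_tika("tika-app-1.13.jar", name, "password")` for each of the three attachments and passing the returned stderr to `is_encrypted_ppt_failure` would show whether every file fails in the same way.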



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
