[ https://issues.apache.org/jira/browse/PDFBOX-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15698101#comment-15698101 ]

Tilman Hausherr commented on PDFBOX-3595:
-----------------------------------------

You're converting binary data to a Java String and back. Why do you expect
that to work?
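
To illustrate the point, here is a minimal sketch that uses only the JDK, not PDFBox. The class name `BinaryRoundTrip`, its two helper methods, and the sample bytes are made up for this example. It also assumes UTF-8 so that the result is reproducible; the reported program uses the platform default charset (both `IOUtils.toString(InputStream)` and `String.getBytes()` do), so the exact damage there may differ. Any byte sequence that the charset cannot decode is replaced during decoding and does not come back when the string is encoded again, which would explain why the compressed streams no longer inflate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of PDFBox or Commons IO.
class BinaryRoundTrip {

    // Reads the stream as raw bytes; no charset decoding is involved.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Mimics the reported program: bytes -> String -> bytes.
    // UTF-8 is an assumption made here for reproducibility.
    static byte[] roundTripThroughString(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "%PDF" followed by a few bytes that are not valid UTF-8,
        // standing in for compressed stream data.
        byte[] original = {0x25, 0x50, 0x44, 0x46,
                (byte) 0x9C, (byte) 0xFF, (byte) 0xC3, 0x28};

        byte[] viaString = roundTripThroughString(original);
        byte[] viaBytes = readAllBytes(new ByteArrayInputStream(original));

        System.out.println("via String intact: " + Arrays.equals(original, viaString));
        System.out.println("via bytes intact:  " + Arrays.equals(original, viaBytes));
    }
}
```

In the reported program, the equivalent change would be to fetch the content with `IOUtils.toByteArray(InputStream)` from Commons IO and wrap that byte array in the `ByteArrayInputStream`, so the data never passes through a `String`.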

> For a PDF - Loading from URL works. Loading from BAIS does not.
> ---------------------------------------------------------------
>
>                 Key: PDFBOX-3595
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3595
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.12, 2.0.3
>         Environment: Windows
>            Reporter: David Medinets
>            Priority: Minor
>
> I've found several PDF files at 
> https://www.supremecourt.gov/opinions/boundvolumes.aspx that throw an 
> exception when using PDDocument.load with a ByteArrayInputStream but do not 
> throw an exception when the same PDF is loaded using a URL.
> v1.8.12 is the last version in which the load method takes a URL object. I 
> mention it here in case that reference point of 'working' code helps diagnose 
> this issue.
>  
> Below is the complete program that shows the two approaches. The first works. 
> The second does not.
> ```
> package com.affy.wildtuna.adrivers;
> import java.io.ByteArrayInputStream;
> import java.net.URL;
> import org.apache.commons.io.IOUtils;
> import org.apache.pdfbox.pdmodel.PDDocument;
> public class ShowInvalidDistancesSetException {
>     public static void main(final String[] args) throws Exception {
>         String url = "https://www.supremecourt.gov/opinions/boundvolumes/545bv.pdf";
>         PDDocument doc01 = PDDocument.load(new URL(url));
>         doc01.close();
>         System.out.println("Loading from URL works.");
>         
>         String contents = IOUtils.toString(new URL(url).openStream());
>         try (ByteArrayInputStream bais = new ByteArrayInputStream(contents.getBytes())) {
>             PDDocument doc = PDDocument.load(bais);
>             doc.close();
>         }
>     }
> }
> ```
> Here is the program's output:
> ```
> WARNING: Specified stream length 6845 is wrong. Fall back to reading stream until 'endstream'.
> Loading from URL works.
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Nov 26, 2016 10:24:01 AM org.apache.pdfbox.filter.FlateFilter decode
> SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
> Exception in thread "main" java.io.IOException
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:138)
>       at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:301)
>       at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:221)
>       at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:156)
>       at org.apache.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:64)
>       at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:574)
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1071)
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1038)
>       at com.affy.wildtuna.adrivers.ShowInvalidDistancesSetException.main(ShowInvalidDistancesSetException.java:18)
> Caused by: java.util.zip.DataFormatException: invalid distances set
>       at java.util.zip.Inflater.inflateBytes(Native Method)
>       at java.util.zip.Inflater.inflate(Inflater.java:259)
>       at java.util.zip.Inflater.inflate(Inflater.java:280)
>       at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:169)
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
>       ... 9 more
> ```


