Hello,
First of all, thank you for PDFBox, this wonderful Swiss Army knife for PDFs.
Java version: 21
PDFBox version: 3.0.4
Description:
When a PDF is saved with the option CompressParameters.NO_COMPRESSION, useless
free-entry lines like
nnnnnnnnnn 65535 f
are added to the xref section.
When splitting a PDF, this side effect seems to accumulate with each part that
is saved. It is not really significant when saving a single PDF, but when
splitting a PDF into 5000 parts, the overhead becomes huge.
You can reproduce the issue with any PDF.
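To measure the effect on a given file, here is a minimal stdlib-only sketch that counts the "nnnnnnnnnn 65535 f" entries in the classic xref sections of a PDF. It is not part of PDFBox, and the class name XrefFreeEntryCounter is hypothetical. It assumes the file uses a classic cross-reference table (not a cross-reference stream) and relies on the fixed 20-byte entry layout from the PDF specification (10-digit offset, 5-digit generation, "n" or "f", 2-byte end of line). Note that object 0 always has a "0000000000 65535 f" entry, so a count of 1 is normal.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not a PDFBox API.
public class XrefFreeEntryCounter {

    // One classic xref entry: offset, generation, type, then SP CR, SP LF or CR LF.
    private static final Pattern ENTRY =
            Pattern.compile("(\\d{10}) (\\d{5}) ([nf])(?: \\r| \\n|\\r\\n)");

    // Text between an "xref" keyword (but not "startxref") and the next "trailer".
    private static final Pattern SECTION =
            Pattern.compile("(?<!start)xref\\s(.*?)trailer", Pattern.DOTALL);

    /** Counts free entries with generation 65535 in every classic xref section. */
    public static int countFree65535(String pdfText) {
        int count = 0;
        Matcher section = SECTION.matcher(pdfText);
        while (section.find()) {
            Matcher entry = ENTRY.matcher(section.group(1));
            while (entry.find()) {
                if (entry.group(3).equals("f") && entry.group(2).equals("65535")) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Reads the file byte for byte (ISO-8859-1 maps each byte to one char) and counts. */
    public static int countFree65535(Path pdf) throws IOException {
        String text = new String(Files.readAllBytes(pdf), StandardCharsets.ISO_8859_1);
        return countFree65535(text);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + countFree65535(Path.of(arg)));
        }
    }
}
```

Running it on the saved parts (java XrefFreeEntryCounter part1.pdf part2.pdf ...) makes it easy to compare the count from one part to the next.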
Current workaround: opening and re-saving the produced PDF(s) with itextpdf
5.5.13.4 removes the useless "nnnnnnnnnn 65535 f" lines.
Regards,
Yannick
Test class:
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdfwriter.compress.CompressParameters;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.util.List;

public class Test_NO_COMPRESSION {

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Test_NO_COMPRESSION usage : [absolute path of the pdf to save and split]");
            System.exit(-1);
        }
        try {
            File pdf = new File(args[0]);
            String targetPath = pdf.getAbsolutePath() + "." + System.currentTimeMillis() + ".pdf";

            // Save the whole document without compression
            try (PDDocument doc = Loader.loadPDF(pdf)) {
                System.out.println("Saving NO_COMPRESSION to file " + targetPath);
                doc.save(targetPath, CompressParameters.NO_COMPRESSION);
            }

            // Split the document into one-page parts and save each one without compression
            try (PDDocument doc = Loader.loadPDF(pdf)) {
                for (int i = 1; i <= doc.getNumberOfPages(); i++) {
                    Splitter splitter = new Splitter();
                    splitter.setStartPage(i);
                    splitter.setEndPage(i);
                    splitter.setSplitAtPage(1);
                    List<PDDocument> documents = splitter.split(doc);
                    PDDocument tempDoc = documents.getFirst();
                    String splitFilePath = targetPath + ".part." + i + ".pdf";
                    System.out.println("Saving page #" + i + " NO_COMPRESSION to file " + splitFilePath);
                    tempDoc.save(splitFilePath, CompressParameters.NO_COMPRESSION);
                    // Close every document returned by the splitter (only one is expected here)
                    for (PDDocument part : documents) {
                        part.close();
                    }
                }
            }
            System.out.println("Done");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}