The -sort option did not solve the problem. I tried the alpha release of PDFBox 3.0 and it produced the same results as the 2.0 version.
Note: Command line parameters are different in PDFBox 3.0.

Bob

________________________________
From: Tilman Hausherr <[email protected]>
Sent: Friday, May 19, 2023 9:22 AM
To: [email protected] <[email protected]>
Subject: Re: Text sequence of ExtractText utility

Hi,

You can try the "-sort" option. Sometimes this helps.

Tilman

On 19.05.2023 15:17, Robert Rodini wrote:

Hi,

I have successfully used the PDFBox ExtractText utility to process PDFs produced by a third party. The text comes out of a multicolumn PDF in the left-to-right order of the columns, from top to bottom.

I now have to process PDFs produced by another third party, which also produces multicolumn PDFs. This time the text comes out in an unpredictable order.

I've read the FAQ https://pdfbox.apache.org/2.0/faq.html regarding "Why does the extracted text appear in the wrong sequence?"

Is there a command line switch (or something else) I can use to get the text extracted in the right order? Can I request a CLI switch for the ExtractText utility? How would I do this?

Thanks,
Bob Rodini
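The gap between a plain position sort and the column-by-column order Bob describes can be illustrated with a small, stdlib-only sketch. It does not call PDFBox. The `Fragment` record, the coordinates, and the column boundaries are all hypothetical, and a real page would need its column boundaries detected or configured somehow. A sort that reads top-to-bottom and then left-to-right (similar in spirit to sorting by position) interleaves lines from adjacent columns. That may explain why "-sort" does not help with some multicolumn layouts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ColumnOrderSketch {
    // Hypothetical text fragment: x grows to the right, y grows downward.
    record Fragment(double x, double y, String text) {}

    // Row-major order: top-to-bottom, then left-to-right.
    // On a two-column page this interleaves the columns line by line.
    static List<String> rowMajor(List<Fragment> fragments) {
        List<Fragment> copy = new ArrayList<>(fragments);
        copy.sort(Comparator.comparingDouble(Fragment::y).thenComparingDouble(Fragment::x));
        return copy.stream().map(Fragment::text).toList();
    }

    // Index of the column a given x falls into. columnStarts must be sorted ascending
    // (assumed to be known in advance for this sketch).
    static int columnOf(double x, double[] columnStarts) {
        int col = 0;
        for (int i = 0; i < columnStarts.length; i++) {
            if (x >= columnStarts[i]) {
                col = i;
            }
        }
        return col;
    }

    // Column-major order: columns left-to-right, each column read top-to-bottom.
    static List<String> columnMajor(List<Fragment> fragments, double[] columnStarts) {
        List<Fragment> copy = new ArrayList<>(fragments);
        copy.sort(Comparator.comparingInt((Fragment f) -> columnOf(f.x(), columnStarts))
                .thenComparingDouble(Fragment::y)
                .thenComparingDouble(Fragment::x));
        return copy.stream().map(Fragment::text).toList();
    }

    public static void main(String[] args) {
        // Hypothetical two-column page: left column near x=50, right column near x=320.
        List<Fragment> page = List.of(
                new Fragment(320, 100, "B1"),
                new Fragment(50, 100, "A1"),
                new Fragment(50, 120, "A2"),
                new Fragment(320, 120, "B2"));
        System.out.println(rowMajor(page));                            // [A1, B1, A2, B2]
        System.out.println(columnMajor(page, new double[] {0, 300}));  // [A1, A2, B1, B2]
    }
}
```

The fixed `columnStarts` array is the weak point of this approach. It works only when every page shares the same layout, which matches the first third party's PDFs better than the second's.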

