On 4/15/25 07:19, Richard Owlett wrote:
I don't know how to approach the problem.
What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106 & 107) of [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].

Suggestions?

TIA


I normally open the document in Atril Document Viewer, select the content I want, copy the selection to the clipboard, open LibreOffice Calc (opens with a new spreadsheet), and paste. The crux is whatever file structure the author's software used to generate the PDF vs. Atril's ability to parse it vs. my ability to use the "Text Import" dialog.


In this case, I selected the content in Atril from the table title through the last value in the last row, then in "Text Import" checked "Separator Options" -> "Space" and "Trim spaces". The PDF content does appear to land in the spreadsheet, but the formatting is a mess and will require a lot of manual correction. Experimenting with different options in "Text Import" may help. Using a different PDF viewer and/or a different spreadsheet may help. YMMV.


In this case, the table is small enough that the fastest route for me on the above platform would be to transcribe it into a new spreadsheet by hand.


If you need to convert many tables or to convert repeatedly, and the encoding is consistent across your input documents, then I suggest looking for a PDF parsing library for your favorite programming/scripting language and coding a solution.
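
As a rough illustration of the "code a solution" route, here is a minimal Python sketch using only the standard library. It does not parse the PDF itself. It assumes you have already extracted the table as layout-preserved plain text (for example with pdftotext -layout from poppler-utils, or by saving the clipboard text to a file), and that columns in that text are separated by two or more spaces. Both assumptions may not hold for this PDF. The function names first_two_columns and to_csv are made up for this example.

```python
import csv
import io
import re

# ASSUMPTION: in the extracted text, a run of two or more whitespace
# characters marks a column boundary. Single spaces are treated as part
# of a cell (e.g. a multi-word label). Adjust if your text differs.
COLUMN_GAP = re.compile(r"\s{2,}")


def first_two_columns(text):
    """Return the two leftmost columns of each table-like line in text."""
    rows = []
    for line in text.splitlines():
        fields = COLUMN_GAP.split(line.strip())
        # Lines that do not split into at least two fields (blank lines,
        # titles, footnotes) are skipped rather than guessed at.
        if len(fields) >= 2:
            rows.append(fields[:2])
    return rows


def to_csv(rows):
    """Serialize rows as CSV text, quoting cells that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

To use it, read the extracted text file into a string, pass it to first_two_columns, and write the result of to_csv to a .csv file. Expect to check the output by eye: header rows, wrapped labels, and footnotes will likely need manual cleanup, and the page numbers printed in the document may not match the PDF's own page indices.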


Alternatively, ask the author for the table in CSV format.


David
