On 4/15/25 12:56 PM, David Christensen wrote:
On 4/15/25 07:19, Richard Owlett wrote:
I don't know how to approach the problem.
What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106 & 107) of [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].

Suggestions?

TIA


I normally open the document in Atril Document Viewer, select the content I want, copy the selection to the clipboard, open LibreOffice Calc (opens with a new spreadsheet), and paste.  The crux is whatever file structure the author's software used to generate the PDF vs. Atril's ability to parse it vs. my ability to use the "Text Import" dialog.


In this case, selecting content in Atril from the table title through the last value in the last row, and in "Text Import" checking the options "Separator Options" -> "Space" and "Trim spaces", it appears the PDF content is placed into the spreadsheet.  But the formatting is a mess and will require a lot of manual correction.  Experimenting with different options in "Text Import" may help.  Using a different PDF viewer and/or a different spreadsheet may help.  YMMV.

I'll try the pdftotext route first.
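
For anyone following along, here is a rough sketch of what the pdftotext route might look like when driven from Python. It assumes pdftotext from poppler-utils is installed, and that its -layout output separates table columns by runs of two or more spaces, which may or may not hold for this particular PDF. The function names are made up for illustration, and the page numbers passed to -f and -l are PDF page indices, which may differ from the printed page numbers.

```python
import csv
import io
import re
import subprocess


def pdf_pages_to_text(pdf_path, first_page, last_page):
    """Run pdftotext (poppler-utils) in -layout mode and return its output.

    Assumption: pdftotext is on PATH. "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout",
         "-f", str(first_page), "-l", str(last_page),
         pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def two_left_columns(layout_text):
    """Split each line on runs of 2+ spaces and keep the first two cells.

    Lines that do not split into at least two cells (titles, blank
    lines) are skipped. This is a heuristic, not a real table parser.
    """
    rows = []
    for line in layout_text.splitlines():
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) >= 2 and cells[0]:
            rows.append(cells[:2])
    return rows


def rows_to_csv(rows):
    """Return the rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Possible usage (page range is a guess, adjust after looking at the output):
# text = pdf_pages_to_text("TFP2021.pdf", 106, 107)
# print(rows_to_csv(two_left_columns(text)))
```

Header rows and wrapped cell text will likely still need hand cleanup afterwards.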



In this case, the table is small enough that the fastest route for myself on the above platform would be to transcribe it into a new spreadsheet by hand.

As my immediate need is only for the one table, I've been considering that. But several other tables are of possible interest. Besides, what else is retirement for than learning to use new tools ;}



If you need to convert many tables or to convert repeatedly, and there is encoding consistency across your input documents, then I suggest looking for PDF parsing libraries for your favorite programming/scripting language and coding a solution.
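
As one illustration of the library approach, here is a minimal sketch using the third-party pdfplumber package for Python. This is only one possible choice, not something recommended in the thread. The helper names and the page index are hypothetical, and whether extract_table() finds the table boundaries correctly depends on how the PDF was generated.

```python
import csv


def first_two_columns(table):
    """Keep the two leftmost cells of each row, skipping empty rows.

    `table` is a list of rows (lists of strings or None), which is the
    shape pdfplumber's extract_table() returns; None means no table.
    """
    rows = []
    for row in table or []:
        cells = [(cell or "").strip() for cell in row[:2]]
        if any(cells):
            rows.append(cells)
    return rows


def table_to_csv(pdf_path, page_index, csv_path):
    """Extract the first table found on one page and write two columns.

    Assumption: pdfplumber is installed (pip install pdfplumber).
    page_index is zero-based and refers to the PDF page, which may not
    match the page number printed in the document.
    """
    import pdfplumber  # imported here so the helper above has no dependency

    with pdfplumber.open(pdf_path) as pdf:
        table = pdf.pages[page_index].extract_table()
    with open(csv_path, "w", newline="") as fh:
        csv.writer(fh).writerows(first_two_columns(table))


# Possible usage (index is a guess and needs checking against the PDF):
# table_to_csv("TFP2021.pdf", 105, "table_a4_14.csv")
```

For a table that spans two pages, the same call would be repeated per page and the rows concatenated.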

Any favorite tutorials?



Alternatively, ask the author for the table in CSV format.

Chuckle. This is a USDA publication.
Thanks.



David



