On 4/15/25 12:56 PM, David Christensen wrote:
On 4/15/25 07:19, Richard Owlett wrote:
I don't know how to approach the problem.
What I would like to end up with is a CSV formatted file containing
the two left columns of Table A4.14 (pages 106 & 107) of
[ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].
Suggestions?
TIA
I normally open the document in Atril Document Viewer, select the
content I want, copy the selection to the clipboard, open LibreOffice
Calc (opens with a new spreadsheet), and paste. The crux is the interplay
between whatever file structure the author's software used to generate the
PDF, Atril's ability to parse it, and my ability to use the "Text Import" dialog.
In this case, selecting content in Atril from the table title through
the last value in the last row, and checking the "Text Import" options
"Separator Options" -> "Space" and "Trim spaces", it appears the PDF
content is placed into the spreadsheet. But the formatting is a mess and
will require a lot of manual correction. Experimenting with different
options in "Text Import" may help. Using a different PDF viewer and/or
using a different spreadsheet may help. YMMV.
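Much of the mess comes from the "Space" separator splitting multi-word labels across several cells. A minimal Python sketch of one way around that, assuming the leftmost column is a text label and the second column is the first number on each row (not verified against Table A4.14): save the copied selection to a text file instead of pasting it into Calc, then re-join every word that comes before the first number. The names `NUMBER`, `label_and_first_value` and `pasted_text_to_csv` are hypothetical.

```python
import csv
import io
import re

# Assumption: a value looks like 7, 1,234 or 0.56 (optionally negative).
NUMBER = re.compile(r"^-?\d[\d,]*(?:\.\d+)?$")


def label_and_first_value(line):
    """Split one pasted row into (label, first numeric value).

    Tokens before the first number are re-joined into the label, so a
    multi-word label that the "Space" separator split apart is rebuilt.
    Returns None for lines without a label followed by a number
    (titles, column headers, blank lines).
    """
    tokens = line.split()
    for i, tok in enumerate(tokens):
        if NUMBER.match(tok):
            if i == 0:
                return None  # no label text before the number
            return " ".join(tokens[:i]), tok
    return None


def pasted_text_to_csv(text):
    """Convert pasted table text to CSV holding only the two left columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        row = label_and_first_value(line)
        if row is not None:
            writer.writerow(row)  # labels containing commas get quoted
    return out.getvalue()
```

With made-up input, `pasted_text_to_csv("Made-up item one  1.23  4.56\n")` gives `Made-up item one,1.23`. A label that itself contains a bare number (e.g. a package size) would be cut short, so spot-check the result.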
I'll try the pdftotext route first.
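For the pdftotext route, the poppler-utils options `-layout` (keep the physical column layout), `-f`/`-l` (first and last page) and an output name of `-` (write to stdout) do most of the work. The Python sketch below wraps that and then splits each line on runs of two or more spaces, assuming that is how the columns end up separated in the layout output. The function names and the output file name are hypothetical, and `-f`/`-l` count physical PDF pages, which may be offset from the printed page numbers 106 and 107.

```python
import csv
import re
import subprocess


def pdftotext_cmd(pdf_path, first_page, last_page):
    """Build a pdftotext (poppler-utils) command line.

    -layout keeps the physical column layout, -f/-l select the page range,
    and an output name of "-" sends the text to stdout.
    """
    return ["pdftotext", "-layout", "-f", str(first_page),
            "-l", str(last_page), pdf_path, "-"]


def layout_rows(text, ncols=2):
    """Split -layout text into cells and keep the first ncols of each row.

    Assumes columns are separated by two or more spaces while words inside
    a cell are separated by one; lines with fewer than ncols cells
    (titles, blank lines, wrapped text) are skipped.
    """
    rows = []
    for line in text.splitlines():
        cells = re.split(r" {2,}", line.strip())
        if len(cells) >= ncols:
            rows.append(cells[:ncols])
    return rows


def main(pdf_path, first_page, last_page, csv_path):
    """Run pdftotext on the page range and write the left columns to CSV."""
    result = subprocess.run(pdftotext_cmd(pdf_path, first_page, last_page),
                            capture_output=True, text=True, check=True)
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(layout_rows(result.stdout))
```

Usage would be something like `main("TFP2021.pdf", 106, 107, "table_a4_14.csv")`. Running `pdftotext -layout -f 106 -l 107 TFP2021.pdf -` by hand first shows whether the two-space assumption holds for this table.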
In this case, the table is small enough that the fastest route for
myself on the above platform would be to transcribe it into a new
spreadsheet by hand.
As my immediate need is only for the one table, I've been considering
that. But several other tables are of possible interest. Besides, what
else is retirement for than learning to use new tools? ;}
If you need to convert many tables or to convert repeatedly, and there
is encoding consistency across your input documents, then I suggest
looking for PDF parsing libraries for your favorite programming/scripting
language and coding a solution.
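As one example of the library route, here is a Python sketch using the third-party pdfplumber package (`pdfplumber.open`, `pdf.pages`, `page.extract_tables()`), which returns each detected table as a list of rows of cell strings (or `None` for empty cells). The helper names are hypothetical, and whether pdfplumber detects this particular table is an assumption; its default detection relies on ruling lines, and for tables without them its `table_settings` (e.g. the `"text"` strategies) may need tuning. The import sits inside the function so the row-cleaning helper works without the package installed.

```python
import csv


def left_columns(table, ncols=2):
    """Keep the first ncols cells of each row, stripping whitespace.

    None cells become "", and rows whose kept cells are all empty are
    dropped.
    """
    rows = []
    for row in table:
        cells = [(cell or "").strip() for cell in row[:ncols]]
        if any(cells):
            rows.append(cells)
    return rows


def pdf_tables_to_csv(pdf_path, page_numbers, csv_path, ncols=2):
    """Write the left ncols columns of every table pdfplumber detects on
    the given 1-based pages to a single CSV file."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf, open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for n in page_numbers:
            page = pdf.pages[n - 1]  # pdf.pages is a 0-based list
            for table in page.extract_tables():
                writer.writerows(left_columns(table, ncols))
```

Because the page list and column count are parameters, the same function can be pointed at the other tables of interest once their page numbers are known.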
Any favorite tutorials?
Alternatively, ask the author for the table in CSV format.
Chuckle. This is a USDA publication.
Thanks.
David