On 4/16/25 7:21 AM, Richard Owlett wrote:
On 4/15/25 11:01 AM, Kent West wrote:
$ pdftotext -f 106 -l 107 TFP2021.pdf TFP2021.txt
As I replied to Nicolas I'll try both that and also a run with the
"-layout" option.
I typed the wrong line here (I should have copied/pasted); it should be:
$ pdftotext -f 106 -l 107 -layout TFP2021.pdf TFP2021.txt
Without the "-layout", your data is not going to be as "columnized" as
it is in the original PDF, and you probably won't be able to easily use
the data. I apologize for missing that switch in my first email.
The "-f" sets the first page to convert (page 106), and the "-l" sets
the last page (page 107). You'll get all of both pages, which will need
to be manually cleaned up in LibreCalc (or some other spreadsheet app).
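
If you end up pulling several page ranges, a small wrapper might save
some retyping. This is only a sketch: the function name "extract_pages"
and its argument order are made up for illustration; the "-f", "-l" and
"-layout" switches are the same pdftotext ones used above.

```shell
# Hypothetical helper: extract_pages FIRST LAST INPUT.pdf OUTPUT.txt
# Converts pages FIRST through LAST, keeping the column layout.
extract_pages() {
    first=$1 last=$2 pdf=$3 out=$4
    # Refuse an inverted range before calling pdftotext.
    if [ "$first" -gt "$last" ]; then
        echo "first page ($first) is after last page ($last)" >&2
        return 1
    fi
    pdftotext -f "$first" -l "$last" -layout "$pdf" "$out"
}
```

For the case in this thread that would be called as:
extract_pages 106 107 TFP2021.pdf TFP2021.txt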
Then open LibreCalc, and File/Open this file. When the import options
window appears, change the selection criteria to "Fixed width", and
then in the "ruler" bar above the text, click where you want a column
divider (like at Columns 39, 60, and 76; just eyeball it). Finish
importing the document, and now you have a spreadsheet with the info
you want that should be pretty easy to massage into the form you want.
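
For anyone who would rather stay on the command line, roughly the same
fixed-width split can be sketched with awk. This assumes a divider at
column N means a field ends after character N, so the fields are
characters 1-39, 40-60, 61-76 and 77 onward; those widths are only the
eyeballed example values and will need adjusting for the real file.
The function name "fixed_to_csv" is made up for illustration.

```shell
# Hypothetical helper: split fixed-width lines into four CSV fields.
# Reads the named files, or standard input if none are given.
fixed_to_csv() {
    awk '
    # Strip leading and trailing blanks from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Wrap a field in double quotes, doubling any embedded quotes.
    function quote(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
    {
        a = trim(substr($0, 1, 39))    # characters 1-39
        b = trim(substr($0, 40, 21))   # characters 40-60
        c = trim(substr($0, 61, 16))   # characters 61-76
        d = trim(substr($0, 77))       # character 77 to end of line
        print quote(a) "," quote(b) "," quote(c) "," quote(d)
    }' "$@"
}
```

Something like "fixed_to_csv TFP2021.txt > TFP2021.csv" would then give
a file that opens in any spreadsheet program without the fixed-width
dialog. The stray header and footer text still comes along and still
needs cleaning up by hand.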
Any particularly relevant tutorials?
No, not really. Just open the "TFP2021.txt" in LibreCalc (or any
spreadsheet program). From the command line, you can do:
$ libreoffice --calc TFP2021.txt &
The "Text Import" window should open. Set the "Separator Options" to
"Fixed width", and set the columns where you need them. Then click on
"OK". That should import the data into a spreadsheet.
Your data should have four columns (assuming you set three column
dividers as I mentioned above). You can highlight columns C and D one
at a time by clicking at the top of the column, on (or close to) the
actual "C" or "D". Once a column is highlighted, just press BACKSPACE,
select "Delete all", and click "OK". Do that for both the "C" and the
"D" columns, and now you have your two columns of wanted data. You'll
also have some text spread throughout your data, and above and below
it, that you'll have to delete manually.
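
If the two unwanted columns are always the right-hand ones, they could
also be trimmed off before the file ever reaches the spreadsheet. A
rough sketch, assuming the first two columns fit in characters 1-60
(the eyeballed divider from above) and that the text is plain ASCII,
since some versions of cut count bytes rather than characters; the
function name "keep_two_columns" is made up for illustration.

```shell
# Hypothetical helper: keep characters 1-60 of each line, strip
# trailing blanks, and drop lines that end up empty.
keep_two_columns() {
    cut -c1-60 "$@" | sed -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

For example: keep_two_columns TFP2021.txt > TFP2021-2col.txt
(the output filename is just a placeholder). This does not remove the
stray text mixed in with the data; that still has to be deleted
manually.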
--
Kent West <")))><
IT Support / Client Support
Abilene Christian University
Westing Peacefully - http://kentwest.blogspot.com