On 4/16/25 7:21 AM, Richard Owlett wrote:
On 4/15/25 11:01 AM, Kent West wrote:
$ pdftotext -f 106 -l 107 TFP2021.pdf TFP2021.txt

As I replied to Nicolas, I'll try both that and a run with the "-layout" option.


I typed the wrong line here (I should have copied/pasted); it should be:

$ pdftotext -f 106 -l 107 -layout TFP2021.pdf TFP2021.txt

Without the "-layout", your data is not going to be as "columnized" as it is in the original PDF, and you probably won't be able to easily use the data. I apologize for missing that switch in my first email.

The "-f 106" tells pdftotext to start converting at page 106, and the "-l 107" says that the last page to convert should be page 107. You'll get all of both pages, which will need to be manually cleaned up in LibreOffice Calc (or some other spreadsheet app).


Then open LibreOffice Calc, and File/Open this file. When the import options window appears, change the selection criteria to "Fixed width", and then in the "ruler" bar above the text, click where you want a column divider (like at columns 39, 60, and 76; just eyeball it). Finish importing the document, and now you have a spreadsheet with the info you want that should be pretty easy to massage into the form you want.

Any particularly relevant tutorials?


No, not really. Just open "TFP2021.txt" in LibreOffice Calc (or any spreadsheet program). From the command line, you can do:

$ libreoffice --calc TFP2021.txt &

The "Text Import" window should open. Set the "Separator Options" to "Fixed width", and set the column dividers where you need them. Then click on "OK". That should import the data into a spreadsheet.

Your data should have four columns (assuming you set three column dividers as I mentioned above). You only want the first two, so clear columns C and D one at a time: click at the top of the column, on (or close to) the actual "C" or "D", to highlight the whole column. Once the column is highlighted, press BACKSPACE, select "Delete all", and click "OK". Do that for both the "C" and the "D" columns, and you are left with your two columns of wanted data. You'll also have some stray text spread throughout your data, and above and below it, that you'll have to delete manually.
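
If you would rather skip the spreadsheet clicks, the same split can be roughed out on the command line. This is only a sketch under assumptions: the helper name "split_fixed_width" is made up for illustration, and the boundaries (column A is characters 1-39, column B is characters 40-60, everything after that is dropped) are the same eyeballed guesses as the divider positions above, so adjust them to your actual file.

```shell
# Hypothetical helper: read fixed-width text on stdin, keep the first two
# columns, and write them out tab-separated. The widths below are assumed
# from the eyeballed dividers at 39 and 60; change them to fit your data.
split_fixed_width() {
    awk '
    {
        a = substr($0, 1, 39)     # characters 1-39  become column A
        b = substr($0, 40, 21)    # characters 40-60 become column B
        # characters 61 onward (columns C and D) are simply ignored

        # trim leading and trailing blanks from each field
        gsub(/^[ \t]+|[ \t]+$/, "", a)
        gsub(/^[ \t]+|[ \t]+$/, "", b)

        # skip lines that are empty in both kept columns
        if (a != "" || b != "") print a "\t" b
    }'
}
```

You could then run something like "split_fixed_width < TFP2021.txt > TFP2021.tsv" and open the result in any spreadsheet program as a tab-separated file. The headers, footers, and other stray text from the PDF pages will still come through and need to be deleted by hand.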


--
Kent West                    <")))><
IT Support / Client Support
Abilene Christian University
Westing Peacefully - http://kentwest.blogspot.com
