On 4/16/25 8:35 AM, David Wright wrote:
On Wed 16 Apr 2025 at 07:21:07 (-0500), Richard Owlett wrote:
On 4/15/25 11:01 AM, Kent West wrote:
On Tue, Apr 15, 2025 at 10:32 AM Nicolas George wrote:
Richard Owlett (HE12025-04-15):
I don't know how to approach the problem.
What I would like to end up with is a CSV formatted file containing the two
left columns of Table A4.14 (pages 106&107) of
[https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf].

Suggestions?

Have you tried starting with pdftotext -layout and then adding the CSV
delimiters using a powerful editor? The rectangle selection of Vim might
be useful.

Riffing off of Nicolas' suggestion, here's what I would do:

$ pdftotext -f 106 -l 107 TFP2021.pdf TFP2021.txt

As I replied to Nicolas I'll try both that and also a run with the
"-layout" option.

BTW I would add 10 to those page numbers (physical vs logical
pages). Otherwise you get the wrong table.
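
A minimal sketch of that correction, combined with the -layout option
mentioned earlier. The helper name physical_page is hypothetical, and the
offset of 10 is assumed to hold for the pages of interest; check it
against the PDF before relying on it.

```shell
# Hypothetical helper: convert a printed (logical) page number into the
# PDF's physical page number. The default offset of 10 comes from the
# remark above and is an assumption about this particular document.
physical_page() {
    echo $(( $1 + ${2:-10} ))
}

first=$(physical_page 106)
last=$(physical_page 107)

# Only run the extraction when the tool and the file are actually present.
if command -v pdftotext >/dev/null 2>&1 && [ -f TFP2021.pdf ]; then
    pdftotext -layout -f "$first" -l "$last" TFP2021.pdf TFP2021.txt
fi
```

Dropping -layout from the last command gives the plain-text variant for
comparison.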

OOPS! Was so focused on format I missed the content problem ;/


Ironically, a copy/paste from xpdf seems to do a better job
than -layout at preserving the column widths over the page break.
(Perhaps the text at the bottom of the second page messes with -layout.)

I liked the text file you attached. Was that the default output of xpdf itself? [Intend to experiment with it this weekend.]


Then open LibreOffice Calc, and File/Open this file. When the import options
window appears, change the selection criteria to "Fixed width", and then in
the "ruler" bar above the text, click where you want a column divider (like
at Columns 39, 60, and 76; just eyeball it). Finish importing the document,
and now you have a spreadsheet with the info you want that should be pretty
easy to massage into the form you want.
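
For a command-line variant of the same fixed-width split, something along
these lines might work. It is only a sketch: the function name fixed_to_csv
is made up, and the break positions 39 and 60 are the eyeballed values
above, so they will almost certainly need adjusting to the real text file.
It keeps just the two leftmost columns.

```shell
# Sketch: cut fixed-width text into the two leftmost columns and emit CSV.
# $1 = column where the second field starts (assumed 39)
# $2 = column where the third field starts (assumed 60)
fixed_to_csv() {
    awk -v c1="${1:-39}" -v c2="${2:-60}" '
        # Strip leading and trailing blanks from a field.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Wrap a field in double quotes, doubling any embedded quotes.
        function quote(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
        NF == 0 { next }                          # skip blank lines
        {
            a = trim(substr($0, 1, c1 - 1))       # characters 1 .. c1-1
            b = trim(substr($0, c1, c2 - c1))     # characters c1 .. c2-1
            print quote(a) "," quote(b)
        }
    '
}

# Example use (file names as in the pdftotext command earlier):
#   fixed_to_csv 39 60 < TFP2021.txt > TFP2021.csv
```

Every field is quoted so that category names containing commas survive
the import. Header and footnote lines are not filtered out and would
still need removing by hand.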

Any particularly relevant tutorials?

Perhaps your own thread at:

   https://lists.debian.org/debian-user/2025/02/msg00493.html

is worth rereading. It seems to be the same operation on the
same report from 15 years earlier.

Yes, BUT NOT in the way you may be expecting ;/
Someone recalled you saying xpdf was your default PDF viewer.
So I installed it from the Debian repository via Synaptic.
[ I'm running Debian 12.8 with MATE 2.53.20 desktop. ]
In Caja I right-click on TFP2021.pdf & choose "open with xpdf".
So far so good ;)
I navigate to Table A4.14 without problem.
No problem selecting a rectangular area of interest.

BUT how do I copy it somewhere useful?

http://www.xpdfreader.com/xpdf-man.html#CONTROLS says in part


Toolbar
toggle sidebar button

Toggles (i.e., shows or hides) the sidebar.

status indicator

This icon is animated while Xpdf is rendering a page. It turns red when an 
error or warning has been issued. Clicking on it opens the error dialog.

selection mode

This icon is an "I-beam" in linear selection mode, and an arrow in block 
selection mode. Clicking on it toggles between the two selection modes.


FURTHER DOWN it says
Text selection
In block selection mode, dragging the mouse with the left button held down will 
highlight an arbitrary rectangle. Shift-clicking will extend the selection.

In linear selection mode, dragging with the left button will highlight text in 
reading order. Double-clicking or triple-clicking will select a word or a line, 
respectively. Shift-clicking will extend the selection.

Selected text can be copied to the clipboard (with the edit/copy menu item). On 
X11, selected text will be available in the X selection buffer.


Where is a Toolbar with a sidebar button?
Where is "edit/copy menu"?

Does xpdf have illustrations somewhere?

I suspect xpdf is itself the tool I'm looking for.

TIA

