On Wed 16 Apr 2025 at 07:21:07 (-0500), Richard Owlett wrote:
> On 4/15/25 11:01 AM, Kent West wrote:
> > On Tue, Apr 15, 2025 at 10:32 AM Nicolas George wrote:
> > > Richard Owlett (HE12025-04-15):
> > > > I don't know how to approach the problem.
> > > > What I would like to end up with is a CSV formatted file containing the two
> > > > left columns of Table A4.14 (pages 106&107) of
> > > > [https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf].
> > > >
> > > > Suggestions?
> > >
> > > Have you tried starting with pdftotext -layout and then adding the CSV
> > > delimiters using a powerful editor. The rectangle selection of Vim might
> > > be useful.
> >
> > Riffing off of Nicolas' suggestion, here's what I would do:
> >
> > $ pdftotext -f 106 -l 107 TFP2021.pdf TFP2021.txt
>
> As I replied to Nicolas I'll try both that and also a run with the
> "-layout" option.

BTW I would add 10 to those page numbers (physical vs logical pages).
Otherwise you get the wrong table. Ironically, a copy/paste from xpdf
seems to do a better job than -layout at preserving the column widths
over the page break. (Perhaps the text at the bottom of the second
page messes with -layout.)

> > Then open LibreCalc, and File/Open this file. When the import options
> > window appears, change the selection criteria to "Fixed width", and then in
> > the "ruler" bar above the text, click where you want a column divider (like
> > at Columns 39, 60, and 76; just eyeball it). Finish importing the document,
> > and now you have a spreadsheet with the info you want that should be pretty
> > easy to massage into the form you want.
>
> Any particularly relevant tutorials?

Perhaps your own thread at:
https://lists.debian.org/debian-user/2025/02/msg00493.html
is worth rereading. It seems to be the same operation on the same
report from 15 years earlier.

As for tutorials, no, I can't think of any in particular. It's just
a matter of thinking how to apply tools that you've got or can
gain access to. With luck, you'll find it easier than I made it
for myself last time. (I used a longer edit sequence to avoid
depending on the use of delimiters, which wasn't necessary in
that case.)

Attached is a raw paste from xpdf.

Cheers,
David.

                                      Quantity b of each  Cost of each    Cost share of each
Market Basket Categories              Market Basket       Market Basket   Market Basket
                                      Category (lbs)      Category ($) c  Category (%) d
Vegetables                              9.13               10.37           20.67
Dark-green vegetables                   0.82                1.49           14.38
Red and orange vegetables               1.95                2.71           26.12
Beans, peas, lentils e                  1.72                1.53           14.78
Starchy vegetables                      2.92                2.36           22.76
Other vegetables                        1.72                2.28           21.97
Fruits                                  6.74                6.76           13.46
Whole fruit                             4.62                5.05           74.82
100% fruit juice                        2.12                1.70           25.18
Grains                                  3.59                7.43           14.81
Whole-grain staple grains (e.g.,        2.06                4.93           66.34
  rice, pasta, breads, tortillas)
Whole-grain cereals (e.g.,             <0.01               <0.01           <0.01
  oatmeal, ready-to-eat cereal) f
Refined-grain staple grains (e.g.,      1.45                2.25           30.31
  rice, pasta, breads, tortillas)
Refined-grain other (e.g., cereals,     0.09                0.25            3.34
  crackers, snacks)
Dairy                                  12.36                7.35           14.65
Low- and non-fat milk, yogurt,         12.27                7.23           98.31
  soy alternatives g
Higher fat milk, yogurt, soy            0.08                0.12            1.69
  alternatives h
Cheese                                  0.00                0.00            0.00
Protein foods                           5.60               15.86           31.60
Meats                                   0.68                2.79           17.60
Poultry                                 2.33                5.85           36.87
Eggs                                    0.85                1.32            8.33
Seafood                                 0.79                3.44           21.68
Nuts, seeds, soy products               0.96                2.46           15.53
Miscellaneous                           0.96                2.41            4.81
Pre-prepared entrees and side           0.09                0.23            9.47
  dishes (e.g., soups, frozen
  entrees, pizza)
Coffee and tea                          0.17                0.87           36.23
Table fats and oils                     0.45                0.99           41.05
Sauces, condiments, jams,               0.25                0.32           13.24
  honey, sugars, spices
Other foods and beverages (e.g.,       <0.01               <0.01           <0.01
  soft drinks, fruit drinks, ice
  cream, pudding, cookies, candy
  bars)
Total (Vegetables, Fruits, Grains,     38.37               50.18          100.00
  Dairy, Protein Foods,
  Miscellaneous)
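
For the end goal stated at the top of the thread (a CSV of the two left columns), one possible approach is sketched below in Python. It is only a sketch, not something anyone in the thread actually ran. It assumes each table row sits on its own line, as in pdftotext -layout output or a paste from xpdf, and that every data row ends in three numeric fields (with "<0.01" allowed). The names ROW, two_left_columns and to_csv are made up for illustration. Fragments of wrapped labels that carry no numbers are simply skipped and would still need merging by hand.

```python
import csv
import io
import re

# Assumption: a data row is "label  qty  cost  share", where each of the
# three trailing fields looks like 9.13 or <0.01.
NUM = r"<?\d+\.\d+"
ROW = re.compile(
    rf"^(?P<label>.*?)\s+(?P<qty>{NUM})\s+(?P<cost>{NUM})\s+(?P<share>{NUM})\s*$"
)


def two_left_columns(lines):
    """Yield (label, quantity) for each line that ends in three numbers."""
    for line in lines:
        match = ROW.match(line)
        if match:
            yield match.group("label").strip(), match.group("qty")


def to_csv(lines):
    """Return CSV text with a header row and the two left columns."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["category", "quantity_lbs"])
    writer.writerows(two_left_columns(lines))
    return out.getvalue()


# A few rows copied from the paste above; the last line is a wrapped
# label fragment with no numbers, so it is skipped.
sample = [
    "Vegetables                      9.13    10.37    20.67",
    "Dark-green vegetables           0.82     1.49    14.38",
    "Whole-grain cereals (e.g.,     <0.01    <0.01    <0.01",
    "oatmeal, ready-to-eat cereal) f",
]

if __name__ == "__main__":
    print(to_csv(sample), end="")
```

Matching on the three trailing numbers, rather than on fixed column offsets, avoids depending on the column widths surviving the page break. The csv module quotes any label that contains a comma, so labels such as "Sauces, condiments, jams," stay in a single field.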