Re: Alternative to Debian Repository - extract CSV formatted data from PDF

Richard Owlett Thu, 20 Feb 2025 11:52:35 -0800

On 2/20/25 11:20 AM, debian-u...@howorth.org.uk wrote:

> Richard Owlett <rowl...@access.net> wrote:
>
>> I wish to extract CSV formatted data from a PDF document. [1]
>> Page ES-7 has a weekly grocery list for males grouped by age.
>> I need only the first and last columns.
>>
>> Can someone point me in a suitable direction?
>>
>> TIA
>>
>> [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006
>>       Table ES-1. Thrifty Food Plan market baskets, quantities of food
>>        purchased for a week, by age-gender group, 2006
>
> If you look at
> https://www.fns.usda.gov/cnpp/thrifty-food-plan-2021 instead, you can
> find the underlying data in spreadsheet form (.xlsx). Perhaps that will
> be an adequate substitute?


You just demonstrated that "Murphy's Law" holds ;<

I click on the link you quoted in my default browser and a PDF is displayed [actually my original starting point months ago].

If I use my alternate browser {Firefox instead of SeaMonkey} I get to choose which of several files to view. {one of them is an .xlsx file}


Murphy gets a second jab in.

The 2006 version has the data I want in a slightly different layout than the 2021 version. The first is a better match for how I do things ;/

Also the PDFs behind the two links react slightly differently when selecting with mouse movements/clicks. The 2006 version seems to allow me to select only what I want. [2021 version grabs everything between first and last click. 2006 appears to select only the columns of interest]

Can't spend time right now to verify first impression. Will know more this weekend.
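
For the case where a selection or text dump grabs every column, here is a minimal sketch of trimming it down to the first and last columns and writing CSV. It assumes the table text has already been saved as plain lines (for example with `pdftotext -layout` from poppler-utils) and that columns are separated by runs of two or more spaces, which may not hold for every row of the real table. The function names `first_last_columns` and `to_csv` are made up for illustration.

```python
import csv
import io
import re


def first_last_columns(lines):
    """Return (first cell, last cell) for each line that has at least two cells.

    Assumption: cells are separated by two or more whitespace characters,
    so a single space inside a food-group name stays part of that cell.
    """
    rows = []
    for line in lines:
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) < 2:
            continue  # blank line or a line without column gaps
        rows.append((cells[0], cells[-1]))
    return rows


def to_csv(rows):
    """Serialize the (first, last) pairs as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Usage would be along the lines of `to_csv(first_last_columns(open("table.txt").read().splitlines()))`, where `table.txt` is a hypothetical file holding the copied or dumped table text. Header lines and wrapped rows would still need checking by eye.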


*THANK YOU*
