Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-25 Thread David Wright
On Sun 23 Feb 2025 at 22:13:55 (+0700), Max Nikulin wrote: > On 22/02/2025 05:02, David Wright wrote: > > > > With mupdf, I don't even > > know how to copy, as the mouse just drags the page around. > > I have not tried it, but... > https://manpages.debian.org/bookworm/mupdf/mupdf.1.en.html#Right~

2025-02-23 Thread Max Nikulin
On 22/02/2025 05:02, David Wright wrote: On Fri 21 Feb 2025 at 09:53:46 (+0700), Max Nikulin wrote: P.S. "pdftotext -layout" in some cases is better than without "-layout". I think the results are roughly comparable with my scrapings, for this document at least. Perhaps both pdftotext and xpd

2025-02-23 Thread Greg
On 2025-02-23, Max Nikulin wrote: > > I am sure there should be ready-to-use tools that extract tables from > PDF and from aligned text. Out of curiosity I tried to create a small > Python script to process text you attached earlier. It does not try to For previously created Python wheels ther
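Max's script itself is not shown in the thread. As a rough sketch of one way such a script might work (an assumption, not his code), column boundaries in aligned text, whether from `pdftotext -layout` or a mouse-copied dump, can be guessed from character positions that are blank on every line. The names `column_spans` and `split_row` are hypothetical.

```python
def column_spans(lines, min_gap=2):
    """Guess (start, end) character slices of the columns in fixed-width text.

    A position is "occupied" if any line has a non-space character there.
    Occupied runs separated by fewer than `min_gap` blank positions (such as
    the single space in "Whole milk") are merged into one column.
    """
    rows = [ln.rstrip() for ln in lines if ln.strip()]  # drop blank lines
    if not rows:
        return []
    width = max(len(r) for r in rows)
    occupied = [any(i < len(r) and r[i] != " " for r in rows) for i in range(width)]
    spans = []
    i = 0
    while i < width:
        if not occupied[i]:
            i += 1
            continue
        start = end = i
        j = i
        while j < width:
            if occupied[j]:
                end = j + 1
                j += 1
                continue
            k = j  # measure the blank run starting at j
            while k < width and not occupied[k]:
                k += 1
            if k - j >= min_gap:
                break  # a wide enough gap: this column ends at `end`
            j = k  # a narrow gap: keep scanning the same column
        spans.append((start, end))
        i = j
    return spans


def split_row(line, spans):
    """Cut one line into stripped cell values using the guessed spans."""
    return [line[s:e].strip() for s, e in spans]
```

Single-space gaps are kept inside a cell so multi-word food names stay together; a header that spans several lines, or cells that wrap, would still need manual cleanup.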

2025-02-23 Thread Max Nikulin
On 22/02/2025 05:02, David Wright wrote: With mupdf, I don't even know how to copy, as the mouse just drags the page around. I have not tried it, but... https://manpages.debian.org/bookworm/mupdf/mupdf.1.en.html#Right~2 On Fri 21 Feb 2025 at 09:53:46 (+0700), Max Nikulin wrote: When text fi

2025-02-22 Thread David Wright
On Fri 21 Feb 2025 at 17:13:17 (-0500), Cindy Sue Causey wrote: > On Fri, 2025-02-21 at 21:20 +, debian-u...@howorth.org.uk wrote: > > For me, FF opens a normal web page and tries to download a PDF file as > > well. Cheeky thing! For both the 2006 and 2021 pages. I can't be > > bothered trying

2025-02-22 Thread songbird
fxkl4...@protonmail.com wrote: > In discussions about PDF utilities, I don't recall atril being mentioned. > It's become my go-to viewer. perhaps because it is normally a part of the MATE desktop? i've been using it for years and so far no major issues that i've noticed, but i'm also not doing

2025-02-22 Thread Greg
On 2025-02-21, David Wright wrote: >> > >> > I get: >> > >> > Access Denied >> > You don't have permission to access >> > "http://www.fns.usda.gov/cnpp/thrifty-food-plan-2006" on this server. >> > Reference #18.dd831002.1740148075.35e89c97 >> > >> > https://errors.edgesuite.net/18.dd831002.174

2025-02-21 Thread tomas
On Fri, Feb 21, 2025 at 03:59:55PM -0600, David Wright wrote: > On Fri 21 Feb 2025 at 21:20:45 (+), debian-u...@howorth.org.uk wrote: [...] > > > I get: > > > > > > Access Denied > > > You don't have permission to access > > > "http://www.fns.usda.gov/cnpp/thrifty-food-plan-2006" on this se

2025-02-21 Thread fxkl47BF
In discussions about PDF utilities, I don't recall atril being mentioned. It's become my go-to viewer.

2025-02-21 Thread Cindy Sue Causey
On Fri, 2025-02-21 at 21:20 +, debian-u...@howorth.org.uk wrote: > Greg wrote: > > On 2025-02-21, David Wright wrote: > > >   > > > > > > [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 > > > > > >   Table ES-1. Thrifty Food Plan market baskets, > > > > > > quantities > > > > > >

2025-02-21 Thread David Wright
On Fri 21 Feb 2025 at 21:20:45 (+), debian-u...@howorth.org.uk wrote: > On Fri 21 Feb 2025 at 14:30:08 (-), Greg wrote: > > On 2025-02-21, David Wright wrote: > > > > > >> > > [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 > > >> > > Table ES-1. Thrifty Food Plan market ba

2025-02-21 Thread David Wright
On Fri 21 Feb 2025 at 09:53:46 (+0700), Max Nikulin wrote: > On 21/02/2025 08:00, David Wright wrote: > > I dragged the mouse > > across the Males table and dumped it in a file. > > David, I recall you mentioned xpdf in your messages. It allows you to > select rectangular regions. Sometimes it is conv

2025-02-21 Thread debian-user
Greg wrote: > On 2025-02-21, David Wright wrote: > > > >> > > [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 > >> > > Table ES-1. Thrifty Food Plan market baskets, quantities > >> > > of food purchased for a week, by age-gender group, 2006 > > > > I don't read PDFs /in/ the br

2025-02-21 Thread Greg
On 2025-02-21, David Wright wrote: > >> > > [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 >> > > Table ES-1. Thrifty Food Plan market baskets, quantities of food >> > >purchased for a week, by age-gender group, 2006 > > I don't read PDFs /in/ the browser: it downloads it i

2025-02-20 Thread Max Nikulin
On 21/02/2025 08:00, David Wright wrote: I dragged the mouse across the Males table and dumped it in a file. David, I recall you mentioned xpdf in your messages. It allows you to select rectangular regions. Sometimes this is convenient, since this strategy does not depend on the order of objects inside
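The rectangular-selection idea can also be scripted. poppler's pdftotext (Debian package poppler-utils) accepts a crop area through -x, -y, -W and -H, in pixels at the resolution set by -r (72 DPI by default). A minimal sketch follows; the file name, page number and coordinates are hypothetical and would have to be found by trial, and the top-left origin is my understanding of pdftotext's convention rather than something stated in the thread.

```python
import shutil
import subprocess


def crop_cmd(pdf_path, page, x, y, width, height, resolution=72, layout=True):
    """Build a pdftotext command that extracts one rectangle of one page.

    x, y, width and height are pixels at `resolution` DPI; at the default
    72 DPI one pixel equals one PDF point. The origin is assumed to be the
    top-left corner of the page.
    """
    cmd = ["pdftotext", "-f", str(page), "-l", str(page), "-r", str(resolution),
           "-x", str(x), "-y", str(y), "-W", str(width), "-H", str(height)]
    if layout:
        cmd.append("-layout")  # keep columns aligned inside the crop area
    cmd += [pdf_path, "-"]  # "-" as the output file means stdout
    return cmd


def extract_region(pdf_path, page, x, y, width, height, resolution=72, layout=True):
    """Run the cropped extraction and return its text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; on Debian it is in poppler-utils")
    cmd = crop_cmd(pdf_path, page, x, y, width, height, resolution, layout)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Like a rectangular selection in xpdf, the crop depends only on where text sits on the page, not on the order in which objects are stored in the PDF.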

2025-02-20 Thread David Wright
On Thu 20 Feb 2025 at 13:52:06 (-0600), Richard Owlett wrote: > On 2/20/25 11:20 AM, debian-u...@howorth.org.uk wrote: > > Richard Owlett wrote: > > > I wish to extract CSV formatted data from a PDF document. [1] > > > Page ES-7 has a weekly grocery list for males grouped by age. > > > I need only

2025-02-20 Thread Richard Owlett
On 2/20/25 11:20 AM, debian-u...@howorth.org.uk wrote: Richard Owlett wrote: I wish to extract CSV formatted data from a PDF document. [1] Page ES-7 has a weekly grocery list for males grouped by age. I need only the first and last columns. Can someone point me in a suitable direction? TIA [

2025-02-20 Thread Hans
On Thursday, 20 February 2025, 15:08:27 CET, Richard Owlett wrote: > I wish to extract CSV formatted data from a PDF document. [1] > Page ES-7 has a weekly grocery list for males grouped by age. > I need only the first and last columns. > > Can someone point me in a suitable direction? > > TIA

2025-02-20 Thread debian-user
Richard Owlett wrote: > I wish to extract CSV formatted data from a PDF document. [1] > Page ES-7 has a weekly grocery list for males grouped by age. > I need only the first and last columns. > > Can someone point me in a suitable direction? > > TIA > > [1] https://www.fns.usda.gov/cnpp/thrifty

2025-02-20 Thread John Hasler
Try pdftotext. -- John Hasler j...@sugarbit.com Elmwood, WI USA
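For reference, a minimal sketch of calling pdftotext from Python for a single page. The page argument is the physical page number, which probably differs from the printed label ES-7; the file name used below is hypothetical. The -layout flag asks pdftotext to keep the physical layout, which tends to keep table columns aligned; passing `layout=False` makes it easy to compare both modes.

```python
import shutil
import subprocess


def pdftotext_cmd(pdf_path, page, layout=True):
    """Build a poppler pdftotext command for one page, writing to stdout."""
    cmd = ["pdftotext", "-f", str(page), "-l", str(page)]  # first and last page
    if layout:
        cmd.append("-layout")  # keep the physical layout so columns stay aligned
    cmd += [pdf_path, "-"]  # "-" as the output file means stdout
    return cmd


def pdftotext_page(pdf_path, page, layout=True):
    """Run pdftotext and return the extracted text of one page."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; on Debian it is in poppler-utils")
    result = subprocess.run(pdftotext_cmd(pdf_path, page, layout),
                            capture_output=True, text=True, check=True)
    return result.stdout
```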

2025-02-20 Thread mick.crane
On 2025-02-20 14:08, Richard Owlett wrote: I wish to extract CSV formatted data from a PDF document. [1] Page ES-7 has a weekly grocery list for males grouped by age. I need only the first and last columns. Can someone point me in a suitable direction? TIA [1] https://www.fns.usda.gov/cnpp/thr
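For the original request (first and last columns only, as CSV), a minimal sketch. It assumes the table text has already been extracted, e.g. with `pdftotext -layout` or by copying from a viewer, and that two or more spaces separate columns while single spaces occur only inside a value. The function name `first_last_to_csv` is hypothetical.

```python
import csv
import io
import re

# Two or more spaces separate columns in aligned text; a single space is
# assumed to belong to a value such as "Whole milk".
COLUMN_GAP = re.compile(r" {2,}")


def first_last_to_csv(text, min_fields=2):
    """Keep the first and last column of each aligned text row, as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = COLUMN_GAP.split(line.strip())
        if len(fields) < min_fields:
            continue  # skip blank lines, titles and other non-table text
        writer.writerow([fields[0], fields[-1]])
    return out.getvalue()
```

Writing through the csv module means any value containing a comma is quoted automatically. Rows whose food name wraps onto a second line, or names that happen to contain a double space, would still need fixing by hand or a smarter column detector.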