xpdf IS solution to "How to extract TABULAR data from a PDF document?"

2025-04-18 Thread Richard Owlett
t a scanned image or otherwise obscured. With xpdf, the contents of the rectangle are copied, and I've always found the boundaries quite precise. LARGE snip The mechanical steps I followed for my test: 1. open PDF with xpdf 2. navigate to first desired page 3. highlight the desired data my p

Re: How to extract TABULAR data from a PDF document?

2025-04-18 Thread jeremy ardley
On 15/4/25 22:19, Richard Owlett wrote: I don't know how to approach the problem. What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106&107) of [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ]. Sug

Re: How to extract TABULAR data from a PDF document?

2025-04-18 Thread Richard Owlett
On 4/17/25 10:09 PM, jeremy ardley wrote: On 15/4/25 22:19, Richard Owlett wrote: I don't know how to approach the problem. What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106&107) of [ https://fns-prod.azureedge.us/sites/default/

Re: How to extract TABULAR data from a PDF document?

2025-04-18 Thread Richard Owlett
On 4/17/25 9:45 PM, David Wright wrote: On Thu 17 Apr 2025 at 14:24:35 (-0500), Richard Owlett wrote: On 4/16/25 8:35 AM, David Wright wrote: Ironically, a copy/paste from xpdf seems to do a better job than -layout at preserving the columns widths over the page break. (Perhaps the text at the b

Re: How to extract TABULAR data from a PDF document?

2025-04-18 Thread jeremy ardley
On 18/4/25 15:43, to...@tuxteam.de wrote: I see my colleagues now writing programs with LLMs. I don't look forward to the day I'll have to debug a larger corpus of this mess. Obviously you've never had to herd junior developers. I have had to. It sucks and productivity is woeful due to all

Re: How to extract TABULAR data from a PDF document?

2025-04-18 Thread tomas
On Fri, Apr 18, 2025 at 01:35:19PM +0800, jeremy ardley wrote: > > On 18/4/25 13:10, to...@tuxteam.de wrote: > > > I'm not sure if it is mentioned but just take a picture of each page and > > > ask > > > a good Large Language Model to give you a table. > > After this, I'd double-check each indivi

Re: How to extract TABULAR data from a PDF document?

2025-04-17 Thread jeremy ardley
eading and once as a secondary heading All other numerical values appear to match between the two versions. Only the dark-green vegetables cost figure ($1.06 vs $1.86) is a true data discrepancy. The original image shows $1.06, so my value is correct for that entry.

Re: How to extract TABULAR data from a PDF document?

2025-04-17 Thread jeremy ardley
e. I've been doing this for a couple of years now scanning bank statements etc. I had previously tried the online pdf to text servers with varying results. I also tried using PDF to text programs that often got poor results for tabular data as what you see is not necessarily how it is store

Re: How to extract TABULAR data from a PDF document?

2025-04-17 Thread tomas
On Fri, Apr 18, 2025 at 11:09:52AM +0800, jeremy ardley wrote: [...] > I'm not sure if it is mentioned but just take a picture of each page and ask > a good Large Language Model to give you a table. After this, I'd double-check each individual number. You'll never know if they are being made up,

Re: How to extract TABULAR data from a PDF document?

2025-04-17 Thread David Wright
On Thu 17 Apr 2025 at 14:24:35 (-0500), Richard Owlett wrote: > On 4/16/25 8:35 AM, David Wright wrote: > > Ironically, a copy/paste from xpdf seems to do a better job > > than -layout at preserving the columns widths over the page break. > > (Perhaps the text at the bottom of the second page messe

Re: How to extract TABULAR data from a PDF document?

2025-04-17 Thread Detlef Vollmann
On 4/17/25 21:24, Richard Owlett wrote: Selected text can be copied to the clipboard (with the edit/copy menu item). On X11, selected text will be available in the X selection buffer. Where is a Toolbar with a sidebar button? I've never seen such a "sidebar button". However, on the left ma

Re: How to extract TABULAR data from a PDF document?

2025-04-17 Thread Richard Owlett
On 4/16/25 8:35 AM, David Wright wrote: On Wed 16 Apr 2025 at 07:21:07 (-0500), Richard Owlett wrote: On 4/15/25 11:01 AM, Kent West wrote: On Tue, Apr 15, 2025 at 10:32 AM Nicolas George wrote: Richard Owlett (HE12025-04-15): I don't know how to approach the problem. What I would like to end

Re: How to extract TABULAR data from a PDF document?

2025-04-16 Thread David Wright
On Wed 16 Apr 2025 at 07:21:07 (-0500), Richard Owlett wrote: > On 4/15/25 11:01 AM, Kent West wrote: > > On Tue, Apr 15, 2025 at 10:32 AM Nicolas George wrote: > > > Richard Owlett (HE12025-04-15): > > > > I don't know how to approach the problem. > > > > What I would like to end up with is a CSV

Re: How to extract TABULAR data from a PDF document?

2025-04-16 Thread Kent West
); it should be: $ pdftotext -f 106 -l 107 -layout TFP2021.pdf TFP2021.txt Without the "-layout", your data is not going to be as "columnized" as it is in the original PDF, and you probably won't be able to easily use the data. I apologize for missing that switch in my
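A minimal sketch of the step that follows "pdftotext -layout": turning column-aligned text into CSV. It assumes that columns are separated by runs of two or more spaces, which may not hold for every table. The function names and the sample rows are hypothetical and are not taken from TFP2021.pdf.

```python
import csv
import io
import re

# Assumption: two or more consecutive whitespace characters separate columns.
COLUMN_GAP = re.compile(r"\s{2,}")


def layout_text_to_rows(text, keep=2):
    """Split aligned text into rows, keeping only the first `keep` columns."""
    rows = []
    for line in text.splitlines():
        cells = COLUMN_GAP.split(line.strip())
        # Skip blank lines and lines with fewer columns than requested.
        if len(cells) >= keep and all(cells[:keep]):
            rows.append(cells[:keep])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text; the csv module quotes cells where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample in the shape "-layout" output tends to have.
SAMPLE = (
    "Item name            Unit     Cost\n"
    "\n"
    "Example item one     lb       1.23\n"
    "Example item, two    oz       4.56\n"
)

if __name__ == "__main__":
    print(rows_to_csv(layout_text_to_rows(SAMPLE)), end="")
```

In practice the input would be the contents of TFP2021.txt, and the result should still be checked by eye: a cell containing a double space would be split in two.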

Re: How to extract TABULAR data from a PDF document?

2025-04-16 Thread Richard Owlett
On 4/15/25 12:56 PM, David Christensen wrote: On 4/15/25 07:19, Richard Owlett wrote: I don't know how to approach the problem. What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106&107) of [ https://fns-prod.azureedge.us/sites/defaul

Re: How to extract TABULAR data from a PDF document?

2025-04-16 Thread Richard Owlett
On 4/15/25 11:01 AM, Kent West wrote: On Tue, Apr 15, 2025 at 10:32 AM Nicolas George wrote: Richard Owlett (HE12025-04-15): I don't know how to approach the problem. What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106&107) of [

Re: How to extract TABULAR data from a PDF document?

2025-04-16 Thread Richard Owlett
On 4/15/25 10:31 AM, Nicolas George wrote: Richard Owlett (HE12025-04-15): I don't know how to approach the problem. What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106&107) of [ https://fns-prod.azureedge.us/sites/default/files/reso

Re: How to extract TABULAR data from a PDF document?

2025-04-15 Thread David Christensen
On 4/15/25 07:19, Richard Owlett wrote: I don't know how to approach the problem. What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106&107) of [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ]. Sugge

Re: How to extract TABULAR data from a PDF document?

2025-04-15 Thread Nicolas George
Richard Owlett (HE12025-04-15): > I don't know how to approach the problem. > What I would like to end up with is a CSV formatted file containing the two > left columns of Table A4.14 (pages 106&107) of > [ > https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf > ]. > > Sug

How to extract TABULAR data from a PDF document?

2025-04-15 Thread Richard Owlett
I don't know how to approach the problem. What I would like to end up with is a CSV formatted file containing the two left columns of Table A4.14 (pages 106&107) of [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ]. Suggestions? TIA

Re: How to extract TABULAR data from a PDF document?

2025-04-15 Thread Kent West
On Tue, Apr 15, 2025 at 10:32 AM Nicolas George wrote: > Richard Owlett (HE12025-04-15): > > I don't know how to approach the problem. > > What I would like to end up with is a CSV formatted file containing the > two > > left columns of Table A4.14 (pages 106&107) of > > [ > > > https://fns-prod.

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-25 Thread David Wright
On Sun 23 Feb 2025 at 22:13:55 (+0700), Max Nikulin wrote: > On 22/02/2025 05:02, David Wright wrote: > > > > With mupdf, I don't even > > know how to copy, as the mouse just drags the page around. > > I have not tried it, but... > https://manpages.debian.org/bookworm/mupdf/mupdf.1.en.html#Right~

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-23 Thread Max Nikulin
On 22/02/2025 05:02, David Wright wrote: On Fri 21 Feb 2025 at 09:53:46 (+0700), Max Nikulin wrote: P.S. "pdftotext -layout" in some cases is better than without "-layout". I think the results are roughly comparable with my scrapings, for this document at least. Perhaps both pdftotext and xpd

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-23 Thread Greg
On 2025-02-23, Max Nikulin wrote: > > I am sure there should be ready to use tools that extract tables from > PDF and from aligned text. Out of curiosity I tried to create a small > python script to process text you attached earlier. It does not try to For previously created python wheels ther

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-23 Thread Max Nikulin
On 22/02/2025 05:02, David Wright wrote: With mupdf, I don't even know how to copy, as the mouse just drags the page around. I have not tried it, but... https://manpages.debian.org/bookworm/mupdf/mupdf.1.en.html#Right~2 On Fri 21 Feb 2025 at 09:53:46 (+0700), Max Nikulin wrote: When text fi

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-22 Thread David Wright
On Fri 21 Feb 2025 at 17:13:17 (-0500), Cindy Sue Causey wrote: > On Fri, 2025-02-21 at 21:20 +, debian-u...@howorth.org.uk wrote: > > For me, FF opens a normal web page and tries to download a PDF file as > > well. Cheeky thing! For both the 2006 and 2021 pages. I can't be > > bothered trying

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-22 Thread songbird
fxkl4...@protonmail.com wrote: > in discussions about pdf utilities i don't recall atril being mentioned > it's become my goto viewer perhaps because it is normally a part of the MATE desktop? i've been using it for years and so far no major issues that i've noticed, but i'm also not doing

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-22 Thread Greg
On 2025-02-21, David Wright wrote: >> > >> > I get: >> > >> > Access Denied >> > You don't have permission to access >> > "http://www.fns.usda.gov/cnpp/thrifty-food-plan-2006" on this server. >> > Reference #18.dd831002.1740148075.35e89c97 >> > >> > https://errors.edgesuite.net/18.dd831002.174

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-21 Thread tomas
On Fri, Feb 21, 2025 at 03:59:55PM -0600, David Wright wrote: > On Fri 21 Feb 2025 at 21:20:45 (+), debian-u...@howorth.org.uk wrote: [...] > > > I get: > > > > > > Access Denied > > > You don't have permission to access > > > "http://www.fns.usda.gov/cnpp/thrifty-food-plan-2006" on this se

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-21 Thread fxkl47BF
in discussions about pdf utilities i don't recall atril being mentioned it's become my goto viewer

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-21 Thread Cindy Sue Causey
On Fri, 2025-02-21 at 21:20 +, debian-u...@howorth.org.uk wrote: > Greg wrote: > > On 2025-02-21, David Wright wrote: > > >   > > > > > > [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 > > > > > >   Table ES-1. Thrifty Food Plan market baskets, > > > > > > quantities > > > > > >

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-21 Thread David Wright
On Fri 21 Feb 2025 at 21:20:45 (+), debian-u...@howorth.org.uk wrote: > On Fri 21 Feb 2025 at 14:30:08 (-), Greg wrote: > > On 2025-02-21, David Wright wrote: > > > > > >> > > [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 > > >> > > Table ES-1. Thrifty Food Plan market ba

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-21 Thread David Wright
On Fri 21 Feb 2025 at 09:53:46 (+0700), Max Nikulin wrote: > On 21/02/2025 08:00, David Wright wrote: > > I dragged the mouse > > across the Males table and dumped it in a file. > > David, I recall you mentioned xpdf in your messages. It allows > selecting rectangular regions. Sometimes it is conv

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-21 Thread debian-user
Greg wrote: > On 2025-02-21, David Wright wrote: > > > >> > > [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 > >> > > Table ES-1. Thrifty Food Plan market baskets, quantities > >> > > of food purchased for a week, by age-gender group, 2006 > > > > I don't read PDFs /in/ the br

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-21 Thread Greg
On 2025-02-21, David Wright wrote: > >> > > [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 >> > > Table ES-1. Thrifty Food Plan market baskets, quantities of food >> > >purchased for a week, by age-gender group, 2006 > > I don't read PDFs /in/ the browser: it downloads it i

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-20 Thread Max Nikulin
On 21/02/2025 08:00, David Wright wrote: I dragged the mouse across the Males table and dumped it in a file. David, I recall you mentioned xpdf in your messages. It allows selecting rectangular regions. Sometimes it is convenient since this strategy does not depend on the order of objects inside

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-20 Thread David Wright
On Thu 20 Feb 2025 at 13:52:06 (-0600), Richard Owlett wrote: > On 2/20/25 11:20 AM, debian-u...@howorth.org.uk wrote: > > Richard Owlett wrote: > > > I wish to extract CSV formatted data from a PDF document. [1] > > > Page ES-7 has a weekly grocery list for males g

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-20 Thread Richard Owlett
On 2/20/25 11:20 AM, debian-u...@howorth.org.uk wrote: Richard Owlett wrote: I wish to extract CSV formatted data from a PDF document. [1] Page ES-7 has a weekly grocery list for males grouped by age. I need only the first and last columns. Can someone point me in a suitable direction? TIA

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-20 Thread Hans
Am Donnerstag, 20. Februar 2025, 15:08:27 CET schrieb Richard Owlett: > I wish to extract CSV formatted data from a PDF document. [1] > Page ES-7 has a weekly grocery list for males grouped by age. > I need only the first and last columns. > > Can someone point me in a sui

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-20 Thread debian-user
Richard Owlett wrote: > I wish to extract CSV formatted data from a PDF document. [1] > Page ES-7 has a weekly grocery list for males grouped by age. > I need only the first and last columns. > > Can someone point me in a suitable direction? > > TIA > > [1] ht

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-20 Thread John Hasler
Try pdftotext. -- John Hasler j...@sugarbit.com Elmwood, WI USA

Re: Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-20 Thread mick.crane
On 2025-02-20 14:08, Richard Owlett wrote: I wish to extract CSV formatted data from a PDF document. [1] Page ES-7 has a weekly grocery list for males grouped by age. I need only the first and last columns. Can someone point me in a suitable direction? TIA [1] https://www.fns.usda.gov/cnpp

Alternative to Debian Repository - extract CSV formatted data from PDF

2025-02-20 Thread Richard Owlett
I wish to extract CSV formatted data from a PDF document. [1] Page ES-7 has a weekly grocery list for males grouped by age. I need only the first and last columns. Can someone point me in a suitable direction? TIA [1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006 Table ES-1. Thrifty

Re: Problems with big data transfer and partitions? - suggestion!

2025-01-23 Thread Michael Stone
On Thu, Jan 23, 2025 at 04:16:29PM +0100, Hans wrote: Fourth: exfat (needed for big files) does not have a journal like ext3 or ext4, so data may become corrupt on the hard drive and could not be restored. That's not what a journal is for, and if the copy completes and the disk is unmo

Problems with big data transfer and partitions? - suggestion!

2025-01-23 Thread Hans
Hi folks, in the last weeks there were several issues with data transfer from ext4 to exfat. In most cases the goal was to transfer several terabytes of data to a MS-Windows system. Thinking about it, IMO this is a bad choice. Please let me explain: Besides to host terabytes of important

Re: rsync: source and destination drive data used sizes differ

2025-01-19 Thread Greg Wooledge
On Mon, Jan 20, 2025 at 00:08:54 +, David wrote: > I would have recognised this > echo a{1..5}b > as brace expansion, but I hadn't absorbed the extra glorious > capabilities of its commas. The commas were the original form. The .. range feature was added in bash version 3.0.

Re: rsync: source and destination drive data used sizes differ

2025-01-19 Thread David
On Sun, 19 Jan 2025 at 16:24, Greg Wooledge wrote: > On Sun, Jan 19, 2025 at 12:43:51 -0300, Eduardo M KALINOWSKI wrote: > > Em 19/01/2025 08:57, David escreveu: > > > On Sun, 19 Jan 2025 at 02:51, Default User > > > wrote: > > > > time sudo rsync -aHSxvvv --human-readable --delete --numeric-id

Re: rsync: source and destination drive data used sizes differ

2025-01-19 Thread Greg Wooledge
On Sun, Jan 19, 2025 at 12:43:51 -0300, Eduardo M KALINOWSKI wrote: > Em 19/01/2025 08:57, David escreveu: > > On Sun, 19 Jan 2025 at 02:51, Default User > > wrote: > > > time sudo rsync -aHSxvvv --human-readable --delete --numeric-ids -- > > > info=progress2,stats2,name2 -- > > > exclude={"/dev/

Re: rsync: source and destination drive data used sizes differ

2025-01-19 Thread Eduardo M KALINOWSKI
Em 19/01/2025 08:57, David escreveu: On Sun, 19 Jan 2025 at 02:51, Default User wrote: time sudo rsync -aHSxvvv --human-readable --delete --numeric-ids --info=progress2,stats2,name2 --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"} /media/user/DRIVE1

Re: rsync: source and destination drive data used sizes differ

2025-01-19 Thread Charles Curley
On Sat, 18 Jan 2025 21:01:14 -0700 Charles Curley wrote: > I suggest that instead of using rsync directly you use rsnapshot. You > can set it up so that it only copies if DRIVE2 is there. The cron > entries let it happen automatically. Another advantage to rsnapshot is that you don't have to fid

Re: rsync: source and destination drive data used sizes differ

2025-01-19 Thread David
On Sun, 19 Jan 2025 at 02:51, Default User wrote: > I may just delete everything on DRIVE2 overnight, and then try rsync > with: > > time sudo rsync -aHSxvvv --human-readable --delete --numeric-ids -- > info=progress2,stats2,name2 -- > exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/

Re: rsync: source and destination drive data used sizes differ

2025-01-19 Thread Michel Verdier
On 2025-01-19, e...@gmx.us wrote: > I've never used LUKS before, so we're even. With a non-encrypted > filesystem, you would > unmount the partition > mkfs -t whatever /dev/whatever > mount it again It's the same with luks and the device used is a mapping in /dev/mapper

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread tomas
On Sat, Jan 18, 2025 at 08:27:17PM -0500, Default User wrote: > Hi! [...] > Every night, I have been using rsync to copy from DRIVE1 to DRIVE2, > doing: > > time sudo rsync -avvv --human-readable --delete --numeric-ids > --info=progress2,stats2,name2 -- > exclude={"/dev/*","/proc/*","/sys/*","/t

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread eben
On 1/18/25 22:21, Default User wrote: > Hi, Eben! > > I hate to sound stupid, but how would I do that. I have never used mkfs > before. I've never used LUKS before, so we're even. With a non-encrypted filesystem, you would unmount the partition mkfs -t whatever /dev/whatever mount it again

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread David Christensen
information. The first drive, Drive 1, is my "backup drive". I backup daily using Borgbackup Version 1.2.4 from the Debian Stable repositories, and rsnapshot Version 1.4.5-1, also from the Debian Stable repositories. It also has a whole bunch of other archival programs, data and image f

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread Default User
Hi, Charles! Thanks for the reply. I will have to ponder that.

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread Charles Curley
On Sat, 18 Jan 2025 20:36:42 -0500 Default User wrote: > So, back to the original question: what in the world am I supposed to > do to have rsync copy so that the size change in the two drives is > equal, and DRIVE2 has (theoretically) the same data, taking up the > same space, a

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread Default User
Hi, Eben! I hate to sound stupid, but how would I do that. I have never used mkfs before.

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread eben
On 1/18/25 21:50, Default User wrote: > Hi Andy! > > Thanks for the reply. > > I may just delete everything on DRIVE2 overnight, Might be faster to mkfs than to rm *.

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread Default User
Hi Andy! Thanks for the reply. I may just delete everything on DRIVE2 overnight, and then try rsync with: time sudo rsync -aHSxvvv --human-readable --delete --numeric-ids --info=progress2,stats2,name2 --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found"}

Re: rsync: source and destination drive data used sizes differ

2025-01-18 Thread Andy Smith
Hi Default, On Sat, Jan 18, 2025 at 08:36:42PM -0500, Default User wrote: > So, back to the original question: what in the world am I supposed to > do to have rsync copy so that the size change in the two drives is > equal, and DRIVE2 has (theoretically) the same data, taking up the sam

rsync: source and destination drive data used sizes differ

2025-01-18 Thread Default User
ckup daily using Borgbackup Version 1.2.4 from the Debian Stable repositories, and rsnapshot Version 1.4.5-1, also from the Debian Stable repositories. It also has a whole bunch of other archival programs, data and image files on it as well. sudo df -h /media/user/DRIVE1 Filesystem  Size 

Re: Matching grub data in the MBR with the installed grub-pc package

2024-09-11 Thread Andy Smith
st partition”. These sectors > wouldn't be synchronized by MD RAID, unless you're using it on the whole > drives—as opposed to partition by partition. I don't claim that “this is > it”, but this might explain some difference between your drives' booting > behavior, even

Re: Matching grub data in the MBR with the installed grub-pc package

2024-09-11 Thread Roy J. Tellason, Sr.
On Tuesday 10 September 2024 08:39:59 pm Andy Smith wrote: > This does leave me wondering however, if the boot code in the MBR of > sdb is now set to believe that this is "the second drive", I suppose > (hd1) in grub terms? With the implication that should sda fail or be > removed, this machine may

Re: Matching grub data in the MBR with the installed grub-pc package

2024-09-11 Thread Florent Rougon
st partition”. These sectors wouldn't be synchronized by MD RAID, unless you're using it on the whole drives—as opposed to partition by partition. I don't claim that “this is it”, but this might explain some difference between your drives' booting behavior, even with identical:

Re: Matching grub data in the MBR with the installed grub-pc package

2024-09-10 Thread Andy Smith
Hi, On Wed, Sep 11, 2024 at 12:45:46AM +0200, Florent Rougon wrote: > The partition table indeed starts at offset 446 (decimal), however I'd > still rather run grub-install or “dpkg-reconfigure grub-pc” than copy > the first 446 bytes from one drive to another drive. The reason is that, > AFAIUI,

Re: Matching grub data in the MBR with the installed grub-pc package

2024-09-10 Thread Florent Rougon
Le 10/09/2024, Andy Smith a écrit: > Good point. I understand the bootloader is actually the first 446 > bytes so maybe I should only be looking at these. > > https://unix.stackexchange.com/a/254668/36243 The partition table indeed starts at offset 446 (decimal), however I'd still rather run

Re: Matching grub data in the MBR with the installed grub-pc package

2024-09-10 Thread Andy Smith
Hi, On Tue, Sep 10, 2024 at 03:58:58PM +0200, Florent Rougon wrote: > Le 09/09/2024, Andy Smith a écrit: > > Can I simply copy the first 512 bytes of sdb to the start of sda? > > I would not do this, one of the reasons being that AFAICT, the start > offsets of the (up to 4) primary partitions of

Re: Matching grub data in the MBR with the installed grub-pc package

2024-09-10 Thread Florent Rougon
Hi, Not an expert on this matter, so take this with a grain of salt. Le 09/09/2024, Andy Smith a écrit: > Can I simply copy the first 512 bytes of sdb to the start of sda? I would not do this, one of the reasons being that AFAICT, the start offsets of the (up to 4) primary partitions of each d

Re: Matching grub data in the MBR with the installed grub-pc package

2024-09-09 Thread Andy Smith
MBR of sda wants to do. I'm particularly interested in > seeing if the binary grub data in the MBR actually comes from the > grub that is installed from the grub-pc package in the OS. $ xxd /usr/lib/grub/i386-pc/boot.img > /tmp/img.hex $ sudo dd if=/dev/sda bs=1 count=512 2>/dev/null
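A minimal sketch of the comparison discussed in this thread, restricted to the first 446 bytes (the thread notes that the partition table starts at offset 446). The function names are hypothetical. It only reports which byte offsets differ between two sector images and does not interpret them.

```python
BOOT_CODE_LEN = 446  # the partition table starts at this offset
SECTOR_LEN = 512


def boot_code(sector):
    """Return the bootloader portion of a first-sector image."""
    if len(sector) < BOOT_CODE_LEN:
        raise ValueError("need at least %d bytes" % BOOT_CODE_LEN)
    return sector[:BOOT_CODE_LEN]


def differing_offsets(a, b):
    """List the offsets below 446 at which the two images differ."""
    pairs = zip(boot_code(a), boot_code(b))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


def read_first_sector(path):
    """Read the first 512 bytes of a file or block device."""
    with open(path, "rb") as handle:
        return handle.read(SECTOR_LEN)


if __name__ == "__main__":
    # Paths as used in the thread; reading /dev/sda normally requires root.
    packaged = read_first_sector("/usr/lib/grub/i386-pc/boot.img")
    on_disk = read_first_sector("/dev/sda")
    print(differing_offsets(packaged, on_disk))
```

A non-empty result is not by itself proof of a mismatch, since the installer may write drive-specific values into the boot code; treat that as an assumption to check against the GRUB documentation.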

Matching grub data in the MBR with the installed grub-pc package

2024-09-09 Thread Andy Smith
y "yes, this MBR has grub v and is set to find its grub.cfg on (hdX)", then I might be able to see some difference in what the MBR of sda wants to do. I'm particularly interested in seeing if the binary grub data in the MBR actually comes from the grub that is installed from the grub-pc package in the OS. Thanks, Andy -- https://bitfolk.com/ -- No-nonsense VPS hosting

Re: virt-manager: Cannot recv data: Connection reset by peer

2024-07-02 Thread Ceppo
On Fri, Jun 28, 2024 at 06:13:39PM GMT, Bruno Kleinert wrote: > Am Mittwoch, dem 12.06.2024 um 15:30 + schrieb Ceppo: > FWIW, if you're running firewalld on that machine that seems somewhat > related: > https://libvirt.org/firewall.html#firewalld-and-the-virtual-network-driver > > I'm affected

Telemetry, data hoarding [was: how2 format a flash drive]

2024-07-02 Thread tomas
On Tue, Jul 02, 2024 at 04:09:39AM -0400, Jeffrey Walton wrote: > On Tue, Jul 2, 2024 at 3:53 AM George at Clug wrote: > > > > Is telemetry evil? Are guns evil? Philosophical questions? > > > > I find it objectionable when people gather "telemetry" about "me" and not > > just the causes of the

Re: virt-manager: Cannot recv data: Connection reset by peer

2024-06-28 Thread Bruno Kleinert
rt-manager to manage virtual machine with QEMU/KVM user > session for some months without any issue. Since Monday the session is "Not > Connected" and when I try to connect I get the following message: > > Unable to connect to libvirt qemu:///session. > > Can

virt-manager: Cannot recv data: Connection reset by peer

2024-06-12 Thread Ceppo
t any issue. Since Monday the session is "Not Connected" and when I try to connect I get the following message: Unable to connect to libvirt qemu:///session. Cannot recv data: Connection reset by peer and Details: Unable to connect to libvirt qemu:///session.

Re: HDD long-term data storage with ensured integrity

2024-05-04 Thread Marc SCHAEFER
On Fri, May 03, 2024 at 01:50:52PM -0700, David Christensen wrote: > Thank you for devising a benchmark and posting some data. :-) I did not do the comparison hosted on github. I just wrote the script which tests the dm-integrity on dm-raid error detection and error correction. > FreeBS

Re: HDD long-term data storage with ensured integrity

2024-05-03 Thread David Christensen
t4+dm-integrity+dm-raid layered approach. Thank you for devising a benchmark and posting some data. :-) FreeBSD also offers a layered solution. From the top down: * UFS2 file system, which supports snapshots (requires partitions with soft updates enabled). * gpart(8) for partitions (volum

Re: HDD long-term data storage with ensured integrity

2024-05-03 Thread Michael Kjörling
On 3 May 2024 13:26 +0200, from schae...@alphanet.ch (Marc SCHAEFER): > https://github.com/t13a/dm-integrity-benchmarks > > Contenders are btrfs, zfs, and notably ext4+dm-integrity+dm-raid ZFS' selling point is not performance, _especially_ on rotational drives. In fact, it's fairly widely accept

Re: HDD long-term data storage with ensured integrity

2024-05-03 Thread Marc SCHAEFER
type f -print0 | xargs -0 md5sum > $tmp_dir/MD5SUMS) # corrupting some data in one PV count=5000 blocks=$(blockdev --getsz ${pvs[1]}) if [ $blocks -lt 32767 ]; then factor=1 else factor=$(( ($blocks - 1) / 32767)) fi p=1 for i in $(seq 1 $count) do offset=$(($RANDOM * $factor)) e
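The offset arithmetic visible in the script fragment above can be restated as a short sketch. It assumes that bash's $RANDOM yields integers from 0 to 32767 and that "blockdev --getsz" reports the size in 512-byte sectors. The function names are hypothetical, and the part of the script that actually writes to the device is not reproduced here.

```python
import random

RANDOM_MAX = 32767  # largest value bash's $RANDOM can return


def scale_factor(blocks):
    """Mirror the script: stretch the $RANDOM range across the device."""
    if blocks < RANDOM_MAX:
        return 1
    # Integer division, as in bash arithmetic expansion.
    return (blocks - 1) // RANDOM_MAX


def random_offset(blocks, rng=random):
    """Pick a sector offset the way $RANDOM * $factor would."""
    return rng.randint(0, RANDOM_MAX) * scale_factor(blocks)


if __name__ == "__main__":
    blocks = 1_000_000  # hypothetical device size in sectors
    print(scale_factor(blocks), random_offset(blocks))
```

With this scaling, the largest possible offset is 32767 times the factor, which stays below the sector count for devices larger than 32767 sectors; offsets are always multiples of the factor, so not every sector can be hit.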

Re: HDD long-term data storage with ensured integrity

2024-04-12 Thread David Christensen
On 4/12/24 08:14, piorunz wrote: On 10/04/2024 12:10, David Christensen wrote: Those sound like some compelling features. I believe the last time I tried Btrfs was Debian 9 (?).  I ran into problems because I did not do the required manual maintenance (rebalancing).  Does the Btrfs in Debian 1

Re: HDD long-term data storage with ensured integrity

2024-04-12 Thread piorunz
On 10/04/2024 12:10, David Christensen wrote: Those sound like some compelling features. I believe the last time I tried Btrfs was Debian 9 (?).  I ran into problems because I did not do the required manual maintenance (rebalancing).  Does the Btrfs in Debian 11 or Debian 12 still require manua

Re: HDD long-term data storage with ensured integrity

2024-04-10 Thread David Christensen
On 4/10/24 08:49, Paul Leiber wrote: Am 10.04.2024 um 13:10 schrieb David Christensen: Does the Btrfs in Debian 11 or Debian 12 still require manual maintenance?  If so, what and how often? Scrub and balance are actions which have been recommended. I am using btrfsmaintenance scripts [1][2] t

Re: HDD long-term data storage with ensured integrity

2024-04-10 Thread Paul Leiber
Am 10.04.2024 um 13:10 schrieb David Christensen: On 4/9/24 17:08, piorunz wrote: On 02/04/2024 13:53, David Christensen wrote: Does anyone have any comments or suggestions regarding how to use magnetic hard disk drives, commodity x86 computers, and Debian for long-term data storage with

Re: HDD long-term data storage with ensured integrity

2024-04-10 Thread Curt
igrate drives on the fly while partition is live and heavily used, >> replace them with different sizes and types, mixed capacities, change >> Raid levels, change amount of drives too. I could go from single drive >> to Raid10 on 4 drives and back while my data is 100% available a

Re: HDD long-term data storage with ensured integrity

2024-04-10 Thread David Christensen
On 4/9/24 17:08, piorunz wrote: On 02/04/2024 13:53, David Christensen wrote: Does anyone have any comments or suggestions regarding how to use magnetic hard disk drives, commodity x86 computers, and Debian for long-term data storage with ensured integrity? I use Btrfs, on all my systems

Re: HDD long-term data storage with ensured integrity

2024-04-09 Thread piorunz
On 02/04/2024 13:53, David Christensen wrote: Does anyone have any comments or suggestions regarding how to use magnetic hard disk drives, commodity x86 computers, and Debian for long-term data storage with ensured integrity? I use Btrfs, on all my systems, including some servers, with soft

Re: HDD long-term data storage with ensured integrity

2024-04-08 Thread David Christensen
On 4/8/24 13:04, Marc SCHAEFER wrote: Hello, On Mon, Apr 08, 2024 at 11:28:04AM -0700, David Christensen wrote: So, an ext4 file system on an LVM logical volume? Why LVM? Are you implementing redundancy (RAID)? Is your data larger than a single disk (concatenation/ JBOD)? Something else

Re: Why LVM (was: HDD long-term data storage with ensured integrity)

2024-04-08 Thread DdB
ub), so I set up a /boot outside, but the problems stayed (due to LVM's limitations). I came to use it to gain some flexibility (although it is an experiment) and found myself setting up ZFS for its data integrity + flexibility, just to have a quality backup of the LVM volume(s) on a ZFS pool.

Why LVM (was: HDD long-term data storage with ensured integrity)

2024-04-08 Thread Stefan Monnier
David Christensen [2024-04-08 11:28:04] wrote: > Why LVM? Personally, I've been using LVM everywhere I can (i.e. everywhere except on my OpenWRT router, tho I've also used LVM there back when my router had an HDD. I also use LVM on my 2GB USB rescue image). To me the question is rather the rever

Re: HDD long-term data storage with ensured integrity

2024-04-08 Thread Marc SCHAEFER
Hello, On Mon, Apr 08, 2024 at 11:28:04AM -0700, David Christensen wrote: > So, an ext4 file system on an LVM logical volume? > > Why LVM? Are you implementing redundancy (RAID)? Is your data larger than > a single disk (concatenation/ JBOD)? Something else? For off-site long-

Re: HDD long-term data storage with ensured integrity

2024-04-08 Thread David Christensen
On 4/8/24 02:38, Marc SCHAEFER wrote: For offline storage: On Tue, Apr 02, 2024 at 05:53:15AM -0700, David Christensen wrote: Does anyone have any comments or suggestions regarding how to use magnetic hard disk drives, commodity x86 computers, and Debian for long-term data storage with ensured

Re: HDD long-term data storage with ensured integrity

2024-04-08 Thread Marc SCHAEFER
For offline storage: On Tue, Apr 02, 2024 at 05:53:15AM -0700, David Christensen wrote: > Does anyone have any comments or suggestions regarding how to use magnetic > hard disk drives, commodity x86 computers, and Debian for long-term data > storage with ensured integrity? I use LVM on

Re: HDD long-term data storage with ensured integrity

2024-04-03 Thread Jonathan Dowland
On Tue Apr 2, 2024 at 10:57 PM BST, David Christensen wrote: > AIUI neither LVM nor ext4 have data and metadata checksum and correction > features. But, it should be possible to achieve such by including > dm-integrity (for checksumming) and some form of RAID (for correction) > in

Re: HDD long-term data storage with ensured integrity

2024-04-03 Thread David Christensen
On 4/2/24 14:57, David Christensen wrote: AIUI neither LVM nor ext4 have data and metadata checksum and correction features.  But, it should be possible to achieve such by including dm-integrity (for checksumming) and some form of RAID (for correction) in the storage stack.  I need to explore

Re: HDD long-term data storage with ensured integrity

2024-04-02 Thread David Christensen
On 4/2/24 06:55, Stefan Monnier wrote: The most obvious alternative to ZFS on Debian would be Btrfs. Does anyone have any comments or suggestions regarding Btrfs and data corruption bugs, concurrency, CMM level, PSP, etc.? If you're worried about such things, I'd think "t

Re: HDD long-term data storage with ensured integrity

2024-04-02 Thread Stefan Monnier
> The most obvious alternative to ZFS on Debian would be Btrfs. Does anyone > have any comments or suggestions regarding Btrfs and data corruption bugs, > concurrency, CMM level, PSP, etc.? If you're worried about such things, I'd think "the most obvious alternative"

HDD long-term data storage with ensured integrity

2024-04-02 Thread David Christensen
just let me use an external CD-Drive with the netboot > image. ... all is well. Now you get to solve the same problem I have been stuck on since last November -- how to use those HDDs. ZFS has been my bulk storage solution of choice for the past ~4 years, but the recent data corrupt

Re: Fast Random Data Generation (Was: Re: Unidentified subject!)

2024-02-13 Thread Linux-Fan
found it during the development of another application where I needed a lot of random data for simulation purposes :) My implementation code is here: https://github.com/m7a/bo-big/blob/master/latest/Big4.java See the end of that file to compare with the “Numerical Recipes” RNG linked further

Re: Fast Random Data Generation (Was: Re: Unidentified subject!)

2024-02-12 Thread David Christensen
? (Other than multiple OS processes with one PRNG on one JVM each?) I found it during the development of another application where I needed a lot of random data for simulation purposes :) My implementation code is here: https://github.com/m7a/bo-big/blob/master/latest/Big4.java If I were to do
