Re: ZFS...

Alan Somers Tue, 30 Apr 2019 07:15:37 -0700

On Tue, Apr 30, 2019 at 8:05 AM Michelle Sullivan <[email protected]> wrote:
>
>
>
> Michelle Sullivan
> http://www.mhix.org/
> Sent from my iPad
>
> > On 01 May 2019, at 00:01, Alan Somers <[email protected]> wrote:
> >
> >> On Tue, Apr 30, 2019 at 7:30 AM Michelle Sullivan <[email protected]> 
> >> wrote:
> >>
> >> Karl Denninger wrote:
> >>> On 4/30/2019 05:14, Michelle Sullivan wrote:
> >>>>>> On 30 Apr 2019, at 19:50, Xin LI <[email protected]> wrote:
> >>>>>> On Tue, Apr 30, 2019 at 5:08 PM Michelle Sullivan <[email protected]> 
> >>>>>> wrote:
> >>>>>> but in my recent experience 2 issues colliding at the same time 
> >>>>>> results in disaster
> >>>>> Do we know exactly what kind of corruption happened to your pool?  If you 
> >>>>> see it twice in a row, it might suggest a software bug that should be 
> >>>>> investigated.
> >>>>>
> >>>>> All I know is it’s a checksum error on a meta slab (122) and from what 
> >>>>> I can gather it’s the spacemap that is corrupt... but I am no expert.  
> >>>>> I don’t believe it’s a software fault as such, because this was caused 
> >>>>> by a hard outage (damaged UPSes) whilst resilvering a single (but 
> >>>>> completely failed) drive.  ...and after the first outage a second 
> >>>>> occurred (same as the first but more damaging to the power hardware)... 
> >>>>> the host itself was not damaged nor were the drives or controller.
> >>> .....
> >>>>> Note that ZFS stores multiple copies of its essential metadata, and in 
> >>>>> my experience with my old, consumer grade crappy hardware (non-ECC RAM, 
> >>>>> with several faulty, single hard drive pool: bad enough to crash almost 
> >>>>> monthly and damages my data from time to time),
> >>>> This was a top end consumer grade mb with non ecc ram that had been 
> >>>> running for 8+ years without fault (except for hard drive platter 
> >>>> failures.). Uptime would have been years if it wasn’t for patching.
> >>> Yuck.
> >>>
> >>> I'm sorry, but that may well be what nailed you.
> >>>
> >>> ECC is not just about the random cosmic ray.  It also saves your bacon
> >>> when there are power glitches.
> >>
> >> No. Sorry no.  If the data is only half to disk, ECC isn't going to save
> >> you at all... it's all about power on the drives to complete the write.
> >
> > ECC RAM isn't about saving the last few seconds' worth of data from
> > before a power crash.  It's about not corrupting the data that gets
> > written long before a crash.  If you have non-ECC RAM, then a cosmic
> > ray/alpha ray/row hammer attack/bad luck can corrupt data after it's
> > been checksummed but before it gets DMAed to disk.  Then disk will
> > contain corrupt data and you won't know it until you try to read it
> > back.
>
> I know this... unless I misread Karl’s message he implied that ECC would have 
> prevented the corruption in the crash... which is patently false... I think 
> you’ll agree..


I don't think that's what Karl meant.  I think he meant that the
non-ECC RAM could've caused latent corruption that was only detected
when the crash forced a reboot and resilver.
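
A minimal sketch of that failure mode, for illustration only.  It is not
ZFS code: SHA-256 stands in for the block checksum, and the helper names
(checksum, flip_bit, write_block, read_block) are hypothetical.  It
assumes the checksum is computed while the buffer is still good and that
a single bit then flips in RAM before the buffer reaches the disk.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # Stand-in for the filesystem's block checksum (assumption: SHA-256).
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    # Simulate a single-bit memory error in the write buffer.
    buf = bytearray(block)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def write_block(block: bytes, corrupt_bit=None):
    # The checksum is taken first, while the data is still correct.
    stored_sum = checksum(block)
    if corrupt_bit is not None:
        # The bit flips after checksumming but before the "DMA to disk".
        block = flip_bit(block, corrupt_bit)
    return block, stored_sum


def read_block(on_disk: bytes, stored_sum: bytes) -> bool:
    # The mismatch only shows up when the block is read back.
    return checksum(on_disk) == stored_sum


data = b"important data" * 8
print(read_block(*write_block(data)))                   # True
print(read_block(*write_block(data, corrupt_bit=13)))   # False
```

The write itself reports no error in either case; the damaged block sits
on disk until something (a read, a scrub, a resilver) verifies it.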

>
> Michelle
>
>
> >
> > -Alan
> >
> >>>
> >>> Unfortunately however there is also cache memory on most modern hard
> >>> drives, most of the time (unless you explicitly shut it off) it's on for
> >>> write caching, and it'll nail you too.  Oh, and it's never, in my
> >>> experience, ECC.
> >
> > Fortunately, ZFS never sends non-checksummed data to the hard drive.
> > So an error in the hard drive's cache ram will usually get detected by
> > the ZFS checksum.
> >
> >>
> >> No comment on that - you're right in the first part, I can't comment if
> >> there are drives with ECC.
> >>
> >>>
> >>> In addition, however, and this is something I learned a LONG time ago
> >>> (think Z-80 processors!) is that as in so many very important things
> >>> "two is one and one is none."
> >>>
> >>> In other words without a backup you WILL lose data eventually, and it
> >>> WILL be important.
> >>>
> >>> Raidz2 is very nice, but as the name implies, you have two
> >>> redundancies.  If you take three errors, or if, God forbid, you *write*
> >>> a block that has a bad checksum in it because it got scrambled while in
> >>> RAM, you're dead if that happens in the wrong place.
> >>
> >> Or in my case you write part data therefore invalidating the checksum...
> >>>
> >>>> Yeah.. unlike UFS that has to get really really hosed to restore from 
> >>>> backup with nothing recoverable it seems ZFS can get hosed where issues 
> >>>> occur in just the wrong bit... but mostly it is recoverable (and my 
> >>>> experience has been some nasty shit that always ended up being 
> >>>> recoverable.)
> >>>>
> >>>> Michelle
> >>> Oh that is definitely NOT true.... again, from hard experience,
> >>> including (but not limited to) on FreeBSD.
> >>>
> >>> My experience is that ZFS is materially more-resilient but there is no
> >>> such thing as "can never be corrupted by any set of events."
> >>
> >> The latter part is true - and my blog and my current situation is not
> >> limited to or aimed at FreeBSD specifically,  FreeBSD is my experience.
> >> The former part... it has been very resilient, but I think (based on
> >> this certain set of events) it is easily corruptible and I have just
> >> been lucky.  You just have to hit a certain write to activate the issue,
> >> and whilst that write and issue might be very very difficult (read: hit
> >> and miss) to hit in normal every day scenarios it can and will
> >> eventually happen.
> >>
> >>>   Backup
> >>> strategies for moderately large (e.g. many Terabytes) to very large
> >>> (e.g. Petabytes and beyond) get quite complex but they're also very
> >>> necessary.
> >>>
> >> and therein lies the problem.  If you don't have a backup solution
> >> costing many tens of thousands of dollars, you're either:
> >>
> >> 1/ down for a looooong time.
> >> 2/ losing all data and starting again...
> >>
> >> ..and that's the problem... ufs you can recover most (in most
> >> situations) and providing the *data* is there uncorrupted by the fault
> >> you can get it all off with various tools even if it is a complete
> >> mess....  here I am with the data that is apparently ok, but the
> >> metadata is corrupt (and note: as I had stopped writing to the drive
> >> when it started resilvering the data - all of it - should be intact...
> >> even if a mess.)
> >>
> >> Michelle
> >>
> >> --
> >> Michelle Sullivan
> >> http://www.mhix.org/
> >>
> >> _______________________________________________
> >> [email protected] mailing list
> >> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> >> To unsubscribe, send any mail to "[email protected]"
