On 3/2/25 07:49, Eben King wrote:
I backed up my system on Saturday (yesterday), and pulled a stupid.
I'll explain.
Normally I hibernate, and while it's hibernated, boot off a thumb drive
and back up (either by partition or the whole drive) to a dedicated
drive. The idea is if my main drive takes a dump, I could replace it
with the backup drive, boot, and be on my merry way. After the backup
has completed, I resume and it's right where I left off.
So. This time, while the backup was in progress, I mounted /home
read-only to check something out. Apparently that's not good enough to
keep the filesystem intact, because at the end when I resumed, several
things in $HOME didn't work right. E.g., some widgets in the panel were
misconfigured, T-bird had lost its configuration, and Firefox plugins
didn't work. I fixed the panel widgets, T-bird appears to be *mostly*
fixed (we'll see if this gets sent as text), but the Firefox plugins are
still a mess. The Noscript icon shows up in the row with the hamburger
menu, but doesn't show up in Tools → Addons & Themes. Adblocker for
Youtube, I'm not sure it's doing anything because I still see ads before
a video and I didn't before. So what can I do to fix this, while still
keeping my history, cookies, tabs, etc?
On 3/2/25 09:17, Michael Stone wrote:
> Even mounted read-only, the driver will replay the journal and resolve
> any outstanding actions--but the hibernated system doesn't know that,
> and will proceed without taking any changes into account. You could have
> mounted with "-o ro,norecovery" which will prevent the journal replay
> and make the mount truly read-only. Your best bet at this point is to
> force an fsck to at least ensure that the filesystem is consistent, but
> if there was any data corruption that won't uncorrupt it. To be certain
> that all the data is ok you'll have to restore to the last backup made
> before this happened.
On 3/2/25 09:51, Eben King wrote:
>
> On 3/2/25 12:03, Charles Curley wrote:
>> On Sun, 2 Mar 2025 10:49:41 -0500
>> Eben King <e...@gmx.us> wrote:
>> ...
>> How did you do the backup? Per file (e.g. rsnapshot or amanda), or per
>> block device (e.g. dd if=/dev/sda1 of=…)?
>
>
> Block device. If I've resized / moved a partition since the last backup
> I'll do a full (dd if=/dev/sda of=/dev/sdc), if not I'll do it by
> partition (dd if=/dev/sdaX of=/dev/sdcX). Since only maybe 70% of the
> drive is in partitions, it's faster that way.
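For what it is worth, here is a minimal sketch of the per-partition copy
with a verification pass added. The device names are placeholders, and
the "run" helper is my own invention, not something from Eben's setup:
it only prints each command unless RUN=1 is set. The bs=4M, conv=fsync
and status=progress operands assume GNU dd.

```shell
#!/bin/sh
# Sketch only: SRC and DST are placeholders, not real device names.
SRC=${SRC:-/dev/sdaX}
DST=${DST:-/dev/sdcX}

# Hypothetical helper: print the command by default, execute it if RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Copy the partition block for block and flush to the target before exiting.
run dd if="$SRC" of="$DST" bs=4M conv=fsync status=progress

# Compare source and copy byte for byte.
run cmp "$SRC" "$DST"
```

cmp prints nothing when the two are identical. If the target partition
is larger than the source, cmp reports EOF on the source, which is
expected in that case.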
>
>
> I just shut down and booted off a thumb drive, and ran "fsck -f" on each
> ext4 partition. Most had no errors, but a few had "this inode is too
> wide" errors. There may have been one or two others. Anyhow, I'll
> check again in a few days and see if those (or other) errors recur.
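For the re-check in a few days, a read-only pass might be enough to see
whether the errors recur without changing anything on disk. A rough
sketch, assuming it is run from rescue media with none of the
filesystems mounted; ext4_devices is a made-up helper name, and nothing
runs unless RUN=1 is set:

```shell
#!/bin/sh
# Sketch only: run from rescue media, with these filesystems unmounted.

# Read "NAME FSTYPE" pairs on stdin and print the names whose type is ext4.
ext4_devices() {
    awk '$2 == "ext4" { print $1 }'
}

# lsblk:  -n no header, -r raw output, -p full device paths, -o columns.
# e2fsck: -f forces a check even if the filesystem is marked clean,
#         -n opens it read-only and answers "no" to every question.
if [ "${RUN:-0}" = 1 ]; then
    lsblk -nrpo NAME,FSTYPE | ext4_devices | while read -r dev; do
        e2fsck -f -n "$dev"
    done
fi
```

With -n, e2fsck only reports problems; it does not fix them.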
>
>
>> If you know the relevant files and have a per file backup, restoring
>> them should be a matter of selecting the correct file.
>
>
> The most obnoxious errors are definitely under ~/.mozilla/firefox and
> probably ~/.mozilla/firefox/<profile>/extension* .
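If the backup copy of /home can be mounted somewhere read-only, one way
to narrow this down might be to compare the extension-related entries in
the live profile against the backup copy before restoring anything. A
rough sketch; compare_profiles is a made-up helper, and both paths in
the usage line are placeholders:

```shell
#!/bin/sh
# Report which extension* entries differ between two profile directories.
# $1 = live profile directory, $2 = the same profile in the backup.
# Returns 0 if nothing differs, 1 otherwise.
compare_profiles() {
    live=$1
    backup=$2
    status=0
    for path in "$live"/extension*; do
        name=${path##*/}
        # -r recurses into directories, -q only names the files that differ.
        diff -rq "$path" "$backup/$name" || status=1
    done
    return "$status"
}

# Usage (placeholder paths):
#   compare_profiles ~/.mozilla/firefox/<profile> /mnt/backup/<same profile>
```

Anything it lists is a candidate for copying back from the backup, with
Firefox closed.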
>
>> I am assuming that the corruption is all in /home and that it is
>> included in your backups. If either one of those is false, you may be
>> in deep yogurt.
>
>
> Yeah, I don't know how much I trust the backup of /home right now.
I am glad I do not do hibernation -- I power down before taking images
of the raw device.
At this point, I am uncertain whether the /home ext4 file systems are
correct on either the OS disc or the copied image disc (?). I would
start thinking about doing a backup / wipe / fresh install / restore
procedure.
Needing the "norecovery" option to get a mount(8) that truly does not
write seems like a dangerous design choice. "readonly" is supposed to
mean "do not write to disk". I must remember that land mine if and when
I want to do forensic work.
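As a note to self, a minimal sketch of what Michael describes. The
device and mount point are placeholders, the "run" helper is my own and
only prints the commands unless RUN=1 is set, and the blockdev step is
an extra precaution I am assuming is worthwhile, not something suggested
in this thread:

```shell
#!/bin/sh
# Sketch only: DEV and MNT are placeholders.
DEV=${DEV:-/dev/sdXN}
MNT=${MNT:-/mnt/inspect}

# Hypothetical helper: print the command by default, execute it if RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Mark the block device itself read-only in the kernel.
run blockdev --setro "$DEV"

# "ro" alone still lets ext4 replay the journal; "norecovery" skips that.
run mount -o ro,norecovery "$DEV" "$MNT"
```

Afterwards, umount and "blockdev --setrw" should put things back the
way they were.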
> eben@cerberus:/$ sudo smartctl -a /dev/sda
> smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-31-amd64] (local build)
> Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Family: Seagate BarraCuda 3.5 (SMR)
> Device Model: ST2000DM008-2UB102
AIUI SMR does not work well for OS (e.g. /tmp, swap) and general-purpose
(e.g. /home) disks that see frequent small random write workloads. I
prefer small high-quality 2.5" SSDs (Intel SSD 520 Series 60 GB) for my
OS and /home disks, and put my bulk data on a file server. I would
re-purpose that HDD for images -- SMR should be okay for large
sequential write workloads.
> ...
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
Good.
> ...
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always   -           146369262
>   3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always   -           0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           723
>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           0
>   7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always   -           232382570
>   9 Power_On_Hours          0x0032   093   093   000    Old_age   Always   -           6346h+20m+46.297s
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           541
> 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always   -           0
> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           0 0 0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
> 190 Airflow_Temperature_Cel 0x0022   060   053   040    Old_age   Always   -           40 (Min/Max 27/40)
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always   -           0
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           155
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always   -           1040
> 194 Temperature_Celsius     0x0022   040   047   000    Old_age   Always   -           40 (0 25 0 0 0)
> 195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always   -           146369262
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
> 240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           6320h+12m+26.797s
> 241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline  -           18657342320
> 242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline  -           92379620242
Those statistics look acceptable for a used desktop HDD. All of the
most worrisome attributes have a normalized value of 100 and a raw
value of 0:
Reallocated_Sector_Ct
Reported_Uncorrect
Current_Pending_Sector
Offline_Uncorrectable
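To keep an eye on just those four, something like the following might
do. It is only a sketch: check_critical is a made-up helper, and it
assumes "smartctl -A" prints each attribute on a single line with
RAW_VALUE in the tenth column, as it normally does on a terminal.

```shell
#!/bin/sh
# Read "smartctl -A" output on stdin and report attributes 5, 187, 197
# and 198. A raw value of 0 is reported as "ok", anything else as "CHECK".
check_critical() {
    awk '
        $1 == 5 || $1 == 187 || $1 == 197 || $1 == 198 {
            status = ($10 == 0) ? "ok" : "CHECK"
            printf "%-24s raw=%s %s\n", $2, $10, status
        }'
}

# Usage (needs root; /dev/sda as in the output quoted above):
#   smartctl -A /dev/sda | check_critical
```

A non-zero raw value on any of them would be a reason to look closer,
not proof of failure.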
> SMART Error Log Version: 1
> No Errors Logged
Good.
> SMART Self-test log structure revision number 1
> No self-tests have been logged. [To run self-tests, use: smartctl -t]
Are you running tests periodically?
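If not, a minimal sketch of a manual run. The disk name is a placeholder
and the "run" helper is my own; it only prints the commands unless RUN=1
is set:

```shell
#!/bin/sh
# Sketch only: DISK is a placeholder.
DISK=${DISK:-/dev/sdX}

# Hypothetical helper: print the command by default, execute it if RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Start a short self-test; the drive runs it in the background.
# Use "-t long" instead for a full surface scan.
run smartctl -t short "$DISK"

# Once the test has finished, show the self-test log.
run smartctl -l selftest "$DISK"
```

smartctl prints an estimated completion time when the test starts.
smartd can also schedule self-tests via the -s directive in smartd.conf.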
David