Re: filesystem damage

David Christensen Sun, 02 Mar 2025 11:36:59 -0800

On 3/2/25 07:49, Eben King wrote:

I backed up my system on Saturday (yesterday), and pulled a stupid.
I'll explain.



Normally I hibernate, and while it's hibernated, boot off a thumb drive
and back up (either by partition or the whole drive) to a dedicated
drive.  The idea is if my main drive takes a dump, I could replace it
with the backup drive, boot, and be on my merry way.  After the backup
has completed, I resume and it's right where I left off.


So.  This time, while the backup was in process, I mounted /home
read-only to check something out.  Apparently that's not good enough to
keep the filesystem intact, because at the end when I resumed, several
things in $HOME didn't work right.  E.G., some widgets in the panel were
misconfigured, T-bird had lost its configuration, and Firefox plugins
didn't work.  I fixed the panel widgets, T-bird appears to be *mostly*
fixed (we'll see if this gets sent as text), but the Firefox plugins are
still a mess.  The Noscript icon shows up in the row with the hamburger
menu, but doesn't show up in Tools → Addons & Themes.  Adblocker for
Youtube, I'm not sure it's doing anything because I still see ads before
a video and I didn't before.  So what can I do to fix this, while still
keeping my history, cookies, tabs, etc?



On 3/2/25 09:17, Michael Stone wrote:
> Even mounted read-only, the driver will replay the journal and resolve
> any outstanding actions--but the hibernated system doesn't know that,
> and will proceed without taking any changes into account. You could have
> mounted with "-o ro,norecovery" which will prevent the journal replay
> and make the mount truly read-only. Your best bet at this point is to
> force an fsck to at least ensure that the filesystem is consistent, but
> if there was any data corruption that won't uncorrupt it. To be certain
> that all the data is ok you'll have to restore to the last backup made
> before this happened.


On 3/2/25 09:51, Eben King wrote:
>
> On 3/2/25 12:03, Charles Curley wrote:
>> On Sun, 2 Mar 2025 10:49:41 -0500
>> Eben King <e...@gmx.us> wrote:
>> ...
>> How did you do the backup? Per file (e.g. rsnapshot or amanda), or per
>> block device (e.g. dd if=/dev/sda1 of=…)?
>
>
> Block device.  If I've resized / moved a partition since the last backup
> I'll do a full (dd if=/dev/sda of=/dev/sdc), if not I'll do it by
> partition (dd if=/dev/sdaX of=/dev/sdcX).  Since only maybe 70% of the
> drive is in partitions, it's faster that way.
>
>
> I just shut down and booted off a thumb drive, and ran "fsck -f" on each
> ext4 partition.  Most had no errors, but a few had "this inode is too
> wide" errors.  There may have been one or two others.  Anyhow, I'll
> check again in a few days and see if those (or other) errors recur.
>
>
>> If you know the relevant files and have a per file backup, restoring
>> them should be a matter of selecting the correct file.
>
>
> The most obnoxious errors are definitely under ~/.mozilla/firefox and
> probably ~/.mozilla/firefox/<profile>/extension* .
>
>> I am assuming that the corruption is all in /home and that it is
>> included in your backups. If either one of those is false, you may be
>> in deep yogurt.
>
>
> Yeah, I don't know how much I trust the backup of /home right now.

I am glad I do not do hibernation -- I power down before taking images
of the raw device.

At this point, I am uncertain if the /home ext4 file systems are correct
on either the OS disc or the copied image disc (?). I would start
thinking about doing a backup/ wipe/ fresh install/ restore procedure.
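
Before wiping anything, a check that does not modify either copy might
narrow things down. What follows is only a sketch, not a tested
procedure: the helper names run and check_without_changes are mine, and
the device names are whatever the partitions are called on your rescue
boot. The file systems must be unmounted. With DRY_RUN=1 (the default
here) the commands are printed rather than run.

```shell
#!/bin/sh
# Sketch: check ext4 file systems without changing them.
# DRY_RUN=1 (default) prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: check_without_changes DEVICE...
check_without_changes() {
    for device in "$@"; do
        # -f forces a check even if the file system is marked clean.
        # -n opens it read-only and answers "no" to every question.
        run e2fsck -f -n "$device"
    done
}
```

Run as root with DRY_RUN=0 and pass the source partition and its copy
as arguments. If the two disagree, that at least tells you which one
not to trust.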

The "norecovery" option for mount(8) seems like a dangerous design
choice. "readonly" is supposed to mean "do not write to disk". I must
remember that land mine if and when I want to do forensic work.
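
For my own notes, a minimal sketch of how I would mount an ext4
partition for inspection, assuming Michael's "ro,norecovery" advice is
correct. The helper names run and mount_untouched are mine, and the
device and mount point are placeholders. Setting the block device
read-only with blockdev first is an extra precaution I am adding, not
something from the thread. With DRY_RUN=1 (the default here) the
commands are printed rather than run.

```shell
#!/bin/sh
# Sketch: mount an ext4 partition without replaying its journal.
# DRY_RUN=1 (default) prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: mount_untouched DEVICE MOUNTPOINT
mount_untouched() {
    device=$1
    mountpoint=$2
    # Ask the kernel to refuse writes to the block device itself.
    run blockdev --setro "$device"
    # "ro" alone can still replay the journal; "norecovery" skips that.
    run mount -t ext4 -o ro,norecovery "$device" "$mountpoint"
}
```

Run as root with DRY_RUN=0 once the printed commands look right for
your device and mount point.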



> eben@cerberus:/$ sudo smartctl -a /dev/sda
> smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-31-amd64] (local build)

>
> === START OF INFORMATION SECTION ===
> Model Family:     Seagate BarraCuda 3.5 (SMR)
> Device Model:     ST2000DM008-2UB102

AIUI SMR does not work well for OS (e.g. /tmp, swap) and general-purpose
(e.g. /home) disks that see frequent small random write workloads. I
prefer small high-quality 2.5" SSDs (Intel SSD 520 Series 60 GB) for my
OS and /home disks, and put my bulk data on a file server. I would
re-purpose that HDD for images -- SMR should be okay for large
sequential write workloads.



> ...
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED


Good.


> ...
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always       -       146369262
>   3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       723
>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always       -       232382570
>   9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6346h+20m+46.297s
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       541
> 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
> 184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   060   053   040    Old_age   Always       -       40 (Min/Max 27/40)
> 191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       155
> 193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1040
> 194 Temperature_Celsius     0x0022   040   047   000    Old_age   Always       -       40 (0 25 0 0 0)
> 195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always       -       146369262
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       6320h+12m+26.797s
> 241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       18657342320
> 242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       92379620242

Those statistics look acceptable for a used desktop HDD. All of the
most worrisome attributes show a normalized VALUE of 100 and a raw
value of 0:


Reallocated_Sector_Ct
Reported_Uncorrect
Current_Pending_Sector
Offline_Uncorrectable
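
If you want to keep an eye on just those four, something like the
following could pull them out of the attribute table. It is a sketch
that assumes the usual "smartctl -A" layout with one attribute per
line, the name in column 2 and the raw value in column 10. The function
names are mine.

```shell
#!/bin/sh
# Sketch: watch the four attributes listed above.
# Both functions read "smartctl -A" output on standard input.

# Print "NAME RAW_VALUE" for each watched attribute.
watched_attributes() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Reported_Uncorrect|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2, $10 }'
}

# Exit status 0 if every watched raw value is 0, otherwise 1.
watched_attributes_clean() {
    watched_attributes | awk '$2 != 0 { bad = 1 } END { exit bad }'
}
```

For example, "sudo smartctl -A /dev/sda | watched_attributes" prints
the four lines, and watched_attributes_clean can gate a cron job that
mails you when any of them becomes nonzero.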


> SMART Error Log Version: 1
> No Errors Logged


Good.


> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Are you running tests periodically?
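
If not, here is a minimal sketch of starting a self-test and reading
the log afterwards. The function names and the DRY_RUN switch are mine;
"smartctl -t short", "smartctl -t long", and "smartctl -l selftest" are
the standard smartctl invocations. With DRY_RUN=1 (the default here)
the commands are printed rather than run.

```shell
#!/bin/sh
# Sketch: start a SMART self-test and read the self-test log.
# DRY_RUN=1 (default) prints each command instead of running it.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: start_selftest DEVICE [short|long]
start_selftest() {
    device=$1
    kind=${2:-short}
    case $kind in
        short|long) ;;
        *) echo "unknown test type: $kind" >&2; return 2 ;;
    esac
    # The drive runs the test in the background; smartctl returns at once.
    run smartctl -t "$kind" "$device"
}

# Usage: show_selftest_log DEVICE
show_selftest_log() {
    run smartctl -l selftest "$1"
}
```

A long test on a 2 TB drive takes hours, so check the log later rather
than waiting on it. smartd can also schedule these tests for you.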


David
