On Jan 09 16:40:48, [email protected] wrote:
> I'm running OpenBSD on a Protectli box as a router/firewall. The disk is an
> SSD. Every now and then I reboot it ("sudo shutdown -r now") just to make
> sure it comes back up. Several times it hung on disk errors that the auto
> 'fsck' can't fix. I was able to manually run 'fsck' and answer its prompts
> to clean up the problems, which sometimes were unreferenced inodes or
> similar things. It deleted some files in /var. The system runs OK, so
> perhaps the files aren't used in my minimal setup.
>
> I have two questions:
>
> (1) In "/etc/rc" I changed [fsck -p "$@"] to [fsck -f "$@"] in an attempt to
> get it to force fix problems, so the system could recover without someone
> manually doing it. That didn't work (it still stopped startup with the disk
> errors), so I tried making it [do_fsck -f -y] but that didn't work either.
> How does one make the system recover (e.g., how would an unstaffed/dark
> computer operations center do it)?
>
> (2) Why would the system develop disk problems? Might the SSD be failing?
Of course.
> Should I proactively replace it?
There's hardly anything proactive about it
if it's already showing unrecoverable fsck errors.
> If I do replace it, should I start fresh
> with a clean install versus cloning the current disk?
Definitely a clean install on another disk.
> By the way, the SSD is a Samsung SSD 870 EVO 500GB (only using a tiny bit of
> it). Micromat's Lifespan says it has 100% life left, and their Tech Tools
> Pro found no bad blocks.
Boot from the new clean install
and read the entire old disk with dd if=/dev/sdXc of=/dev/null
to see whether any read errors show up.
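
As a rough sketch of that read test, something like the following could be
used. The function name read_whole_disk and the device name in the example
are my assumptions, not anything standard; substitute whatever name the old
disk gets when attached to the new install. The block size is given in plain
bytes only to avoid suffix differences between dd versions.

```shell
#!/bin/sh
# Sketch only: read every block of a disk and discard the data, so that
# unreadable sectors surface as I/O errors instead of staying hidden.
# read_whole_disk is a hypothetical helper name, not a system tool.
read_whole_disk() {
    dev="$1"
    # 1048576 bytes = 1 MiB per read; output goes to /dev/null.
    if dd if="$dev" of=/dev/null bs=1048576; then
        echo "read of $dev completed without errors"
        return 0
    else
        echo "read of $dev failed; check dmesg for I/O errors" >&2
        return 1
    fi
}

# Example (as root), assuming the old disk attached as sd1:
#   read_whole_disk /dev/rsd1c
```

Without conv=noerror, dd stops at the first read error and exits non-zero,
which is enough to answer the question of whether the disk reads cleanly.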
Jan