I apologize for this being so long, but since the problem occurs
sporadically I wanted to get as much information in this post as
possible because I don't know when it will happen again.
This problem started about two weeks ago. I woke up to find a black
screen and a kernel panic. I rebooted and was presented with many fsck
errors that could not be handled automatically so I ran it manually, as
directed. I took all the defaults. Any time that I was shown a file
name it seemed to be a flash file in my daughter's /home directory or
otherwise related to flash. Afterwards, the only partition where I found
anything in lost+found was /home, and all of the files there did,
indeed, show my daughter as owner. I shut down and rebooted to get
everything clean and it seemed good for a while. Since then, however,
every day or two things just stop working properly. Menus cease to do
anything, pages don't load in the browser, etc. If I exit from X and
work at a console, some commands (like ls) seem to work fine, others do
not, giving me I/O error messages. I can't even do a typescript, or
redirect the output to a file that I could attach here, since I just get
errors. I can't even do a Ctrl-Alt-Del to reboot, as I get an error saying:
INIT: cannot execute "/sbin/shutdown"
I have no choice but to power down with the power button, which I really
don't like to do.
It happened again, today, and I manually copied down the errors so I
hope that I got it all correct. This is what I did before shutting down:
marc@quixote:~$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs
(rw,relatime,size=10240k,nr_inodes=3081484,mode=755)
devpts on /dev/pts type devpts
(rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=2472496k,mode=755)
/dev/sda2 on / type ext3 (ro,relatime,errors=remount-ro,data=ordered)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
pstore on /sys/fs/pstore type pstore (rw,relatime)
tmpfs on /run/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=6622700k)
/dev/mapper/vg1-home on /home type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-tmp--jessie on /tmp type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-usr--jessie on /usr type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-usrlocal on /usr/local type ext3 (ro,relatime,data=ordered)
/dev/mapper/vg1-photos on /usr/local/photos type ext3
(rw,relatime,data=ordered)
/dev/mapper/vg1-vDisks on /usr/local/vdisks type ext3
(rw,relatime,data=ordered)
/dev/mapper/vg1-var--jessie on /var type ext3 (ro,relatime,data=ordered)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc
(rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k)
cgmfs on /run/cgmanager/fs type tmpfs (rw,relatime,size=100k,mode=755)
systemd on /sys/fs/cgroup/systemd type cgroup
(rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/x86_64-linux-gnu/systemd-shim-cgroup-release-agent,name=systemd)
tmpfs on /run/user/1000 type tmpfs
(rw,nosuid,nodev,relatime,size=2472496k,mode=700,uid=1000,gid=1000)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse
(rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
Note that almost all real filesystems are mounted read-only (ro).
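For anyone who wants to pick the read-only entries out of a listing like the one above without reading every line, here is a minimal sketch. It assumes the usual `SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)` layout that mount prints, with each entry on one line and no spaces in the mount point. The function name is my own invention.

```shell
#!/bin/sh
# Print "MOUNTPOINT FSTYPE" for every read-only mount whose source is a
# block device (path under /dev/). Reads mount-style lines on stdin.
# Hypothetical helper name; not part of any package.
list_ro_block_mounts() {
    awk '$1 ~ /^\/dev\// && $2 == "on" && $4 == "type" {
        opts = $6
        gsub(/[()]/, "", opts)            # strip the surrounding parentheses
        n = split(opts, o, ",")
        for (i = 1; i <= n; i++)
            if (o[i] == "ro") {           # exact match, so errors=remount-ro is ignored
                print $3, $5
                break
            }
    }'
}
```

It would be used as `mount | list_ro_block_mounts`. Matching the option exactly matters here because the root filesystem carries errors=remount-ro even when it is mounted rw.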
I logged out and back in as root. From /root I attempted to copy a text
file to /usr/local/photos (which still shows as rw):
cp wheezy1.script /usr/local/photos
[] sd: 0:0:0:0: [sda] Unhandled error code
[] sd: 0:0:0:0: [sda]
[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[] sd: 0:0:0:0: [sda] CDB:
[] Read(10): 28 00 00 3e bc 68 00 00 08 00
[] end_request: I/O error, dev sda, sector 4111464
[] sd: 0:0:0:0: [sda] Unhandled error code
[] sd: 0:0:0:0: [sda]
[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[] sd: 0:0:0:0: [sda] CDB:
[] Read(10): 28 00 00 3e bc 68 00 00 08 00
[] end_request: I/O error, dev sda, sector 4111464
-bash /bin/cp: Input/output error
NOTE: all the empty brackets on the left actually had timestamps in
them. The same is true in all following cases, as well.
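Since I copied these numbers by hand, one sanity check is that the CDB bytes and the reported sector agree. In a 10-byte Read(10) or Write(10) command, bytes 2 through 5 hold the starting block address (big-endian) and bytes 7 and 8 hold the number of blocks. The sketch below decodes that. It assumes 512-byte logical blocks, so that the block address and the kernel's sector number are the same unit. The function name is made up for illustration.

```shell
#!/bin/sh
# Decode a 10-byte Read(10)/Write(10) CDB given as space-separated hex bytes.
# Layout: byte 0 opcode, byte 1 flags, bytes 2-5 LBA (big-endian),
# byte 6 group number, bytes 7-8 transfer length, byte 9 control.
decode_cdb10() {
    set -- $1                                  # split the string into one byte per positional parameter
    [ $# -eq 10 ] || { echo "expected 10 bytes" >&2; return 1; }
    lba=$(( 0x$3$4$5$6 ))                      # bytes 2-5
    len=$(( 0x$8$9 ))                          # bytes 7-8
    echo "opcode=0x$1 lba=$lba blocks=$len"
}

decode_cdb10 "28 00 00 3e bc 68 00 00 08 00"
# prints: opcode=0x28 lba=4111464 blocks=8
```

The decoded address 4111464 matches the sector in the end_request line, so at least that part of the transcription looks right. Eight blocks of 512 bytes is a single 4 KiB read.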
I then changed directory to /usr/local/photos and tried to create a new
file with touch:
touch tempfile
[] Write(10): 2a 00 08 56 9e 0c 00 00 08 00
[] sd: 0:0:0:0: [sda] Unhandled error code
[] sd: 0:0:0:0: [sda]
[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[] sd: 0:0:0:0: [sda] CDB:
[] Read(10): 28 00 08 56 05 1c 00 00 08 00
[] sd: 0:0:0:0: [sda] Unhandled error code
[] sd: 0:0:0:0: [sda]
[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[] sd: 0:0:0:0: [sda] CDB:
Finally, I tried to unmount /home with the intention of remounting it to
see if it would come back as rw:
umount /home
[] sd: 0:0:0:0: [sda] Unhandled error code
[] sd: 0:0:0:0: [sda]
[] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[] sd: 0:0:0:0: [sda] CDB:
[] Write(10): 2a 00 00 75 5e 0c 00 00 08 00
[] end_request: I/O error, dev sda, sector 209018380
[] Buffer I/O error on device dm-6, logical block 0
[] lost page write due to I/O error on dm-6
[] EXT4-fs error (device dm-6): ext4_put_super: 795: Couldn't clean up
the journal
NOTE: NONE of my filesystems are EXT4. They are ALL EXT3.
Due to the errors I did not try to remount /home.
Then I shut down with the power button and rebooted. Everything is
working now. All the filesystems that should be rw are rw, but within
a day or two this will almost certainly happen again.
If anyone can give me a clue how to correct this I would be most
grateful. If further info is necessary it will probably have to wait
until this happens again.
BTW: I have plenty of space available in the LV, so I could create a new
partition for /home and copy everything from the current partition while
it is not giving me errors, if that is likely to fix the problem, but I
would still like to know just what went wrong and how to prevent it in the
future.
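To make the idea concrete, this is roughly what I have in mind, as a sketch only. The volume group vg1 is real; the LV name home_new, the 50G size, and the mount point /mnt/home_new are placeholders I made up. By default the script only prints the commands; setting DRY_RUN=0 would actually run them, as root.

```shell
#!/bin/sh
# Sketch: copy /home onto a freshly created logical volume.
# Assumed placeholders: LV name "home_new", size 50G, mount point /mnt/home_new.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_home() {
    vg=vg1 lv=home_new size=50G mnt=/mnt/home_new
    run lvcreate -L "$size" -n "$lv" "$vg" &&   # new LV in the existing volume group
    run mkfs.ext3 "/dev/$vg/$lv" &&             # same filesystem type as the current /home
    run mkdir -p "$mnt" &&
    run mount "/dev/$vg/$lv" "$mnt" &&
    run cp -a /home/. "$mnt"/                   # -a keeps ownership, modes and timestamps
}

migrate_home
```

After checking the copy, /etc/fstab would still need to point /home at the new volume; that step is left out on purpose.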
Marc