Hi!
I want to keep you updated: the problem still isn't fixed, so I'm running
this simple script via cron to avoid an uncontrolled kernel panic:
---snip---
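#!/usr/bin/sh
# Detect RAM corruption. If detected, log a message and reboot
# to prevent a kernel panic.

# cron jobs need a PATH
PATH=/sbin:/usr/sbin:/usr/bin:/bin

# The redirection must stay on the same line as journalctl; on a line of
# its own it is an empty command that always succeeds.
# Assumes journalctl -g exits non-zero when nothing in this boot matches.
if journalctl -b -g 'Code: Bad RIP value|BUG: Bad rss-counter state mm:' >/dev/null
then
    MSG='RAM corruption detected, starting pro-active reboot'
    logger -t reboot-before-panic -p local0.notice "$MSG"
    shutdown -r +1 "$MSG"
fi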
---
I still suspect it might be related to snapshots being made. After a few
days of running, the problems started again like this:
Mar 26 23:00:01 h19 systemd[1]: Started Timeline of Snapper Snapshots.
Mar 26 23:00:01 h19 dbus-daemon[5700]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.343' (uid=0 pid=11200 comm="/usr/lib/snapper/systemd-helper --timeline ")
Mar 26 23:00:01 h19 systemd[1]: Starting DBus interface for snapper...
Mar 26 23:00:01 h19 dbus-daemon[5700]: [system] Successfully activated service 'org.opensuse.Snapper'
Mar 26 23:00:01 h19 systemd[1]: Started DBus interface for snapper.
Mar 26 23:00:01 h19 systemd[1]: snapper-timeline.service: Succeeded.
Mar 26 23:00:01 h19 systemd[1]: Created slice Slice /system/systemd-coredump.
Mar 26 23:00:01 h19 systemd[1]: Started Process Core Dump (PID 11227/UID 0).
Mar 26 23:00:01 h19 systemd-coredump[11231]: Process 11226 (run-crons) of user 0 dumped core.
    Stack trace of thread 11226:
    #0  0x00007f89ff9dacdb raise (libc.so.6 + 0x4acdb)
    #1  0x00007f89ff9dc324 abort (libc.so.6 + 0x4c324)
    #2  0x00007f89ffa20b07 __libc_message (libc.so.6 + 0x90b07)
    #3  0x00007f89ffa28b8a malloc_printerr (libc.so.6 + 0x98b8a)
    #4  0x00007f89ffa2a634 _int_free (libc.so.6 + 0x9a634)
    #5  0x000055c998de3963 command_substitute (bash + 0x9f963)
    #6  0x000055c998ddb380 n/a (bash + 0x97380)
    #7  0x000055c998ddda57 n/a (bash + 0x99a57)
    #8  0x000055c998ddcb94 n/a (bash + 0x98b94)
    #9  0x000055c998dc8955 n/a (bash + 0x84955)
    #10 0x000055c998dc756d execute_command_internal (bash + 0x8356d)
    #11 0x000055c998dc86e1 execute_command (bash + 0x846e1)
    #12 0x000055c998dc76fd execute_command_internal (bash + 0x836fd)
    #13 0x000055c998dc86e1 execute_command (bash + 0x846e1)
    #14 0x000055c998dc8516 execute_command_internal (bash + 0x84516)
    #15 0x000055c998dc773c execute_command_internal (bash + 0x8373c)
    #16 0x000055c998dc86e1 execute_command (bash + 0x846e1)
    #17 0x000055c998dc8007 execute_command_internal (bash + 0x84007)
    #18 0x000055c998dc86e1 execute_command (bash + 0x846e1)
    #19 0x000055c998dbce2b reader_loop (bash + 0x78e2b)
    #20 0x000055c998dbcabc main (bash + 0x78abc)
    #21 0x00007f89ff9c52bd __libc_start_main (libc.so.6 + 0x352bd)
    #22 0x000055c998df729a _start (bash + 0xb329a)
Mar 26 23:00:01 h19 systemd[1]: [email protected]: Succeeded.
Mar 26 23:00:01 h19 kernel: BUG: Bad rss-counter state mm:00000000acc74328 idx:1 val:14
Mar 26 23:01:01 h19 systemd[1]: snapperd.service: Succeeded.
Mar 26 23:05:01 h19 reboot-before-panic[12356]: RAM corruption detected, starting pro-active reboot
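
To check whether the corruption messages really follow the snapshot runs,
the journal could be reduced to just those two kinds of events. This is
only a rough sketch: the function name tag_events is made up, and the
patterns are simply copied from the script and the log excerpt above, so
they may need adjusting if the messages look different elsewhere.

```shell
#!/bin/sh
# Sketch: read journal text on stdin and keep only snapshot-timeline starts
# and the kernel messages the cron script looks for, each with a tag in
# front so the order of events is easy to see.
# tag_events is a hypothetical helper name, not part of any package.
tag_events() {
    awk '
        /Started Timeline of Snapper Snapshots/ {
            print "SNAPSHOT   " $0
            next
        }
        /Code: Bad RIP value|BUG: Bad rss-counter state mm:/ {
            print "CORRUPTION " $0
        }
    '
}
```

It could be used like "journalctl -b --no-pager | tag_events". If the
CORRUPTION lines only ever show up right after a SNAPSHOT line, that would
support the suspicion; if they also appear at unrelated times, it would not.
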
Regards,
Ulrich