** Changed in: ubuntu-power-systems
       Status: Triaged => Incomplete

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1758273

Title:
  DD2.2 freezes/hangs after 20mins of uptime

Status in The Ubuntu-power-systems project:
  Incomplete
Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  == Comment: #0 - Application Cdeadmin - 2018-03-19 09:30:53 ==

  == Comment: #1 - Application Cdeadmin <> - 2018-03-19 09:30:55 ==

  == Comment: #2 - Application Cdeadmin <> - 2018-03-19 09:30:57 ==
  ------- Comment From brihh 2018-03-19 09:10:30 EDT -------
  Needless to say, machine is pretty unusable. Needed for Performance
  testing for release.

  == Comment: #3 - Application Cdeadmin <> - 2018-03-19 10:10:59 ==
  ------- Comment From vaibhav92 2018-03-19 10:06:17 EDT -------
  @pridhiviraj Any idea who can look into this?

  ------- Comment From dougmill-ibm 2018-03-19 10:09:59 EDT -------
  If the machine is locked up and the console is unresponsive, try
  collecting eSEL data from the BMC.

  == Comment: #4 - Application Cdeadmin <c> - 2018-03-19 10:40:59 ==
  ------- Comment From pridhiviraj 2018-03-19 10:38:47 EDT -------
  @brihh Can you use the latest 03/15 PNOR and re-create the issue?
  Also, before it hangs, please collect OPAL and kernel logs.

  @vaibhav92 If it is re-creatable, the OPAL/EM team needs to look at it.

  == Comment: #5 - Application Cdeadmin <> - 2018-03-19 11:01:58 ==
  ------- Comment From vaibhav92 2018-03-19 10:41:53 EDT -------
  Saw this in the kernel log of the system:

    [ 1247.404962] PM: suspend entry (s2idle)
    [ 1247.404970] PM: Syncing filesystems ... done.

  Looks like it's getting suspended after 20 mins of inactivity. Looking
  at /etc/systemd/logind.conf, I see IdleAction by default is 'ignore':

    #IdleAction=ignore
    #IdleActionSec=30min

  But clearly someone is issuing a suspend to the system. So this
  probably needs to be looked at by the distro/Power-Management/EM team.

  ------- Comment From megcurry 2018-03-19 10:43:14 EDT -------
  Please advise re.
  Assignments and Labels that would get the right team working on
  this..... Mirror label gets a Bz opened, and that is necessary or at
  least useful for some of the LTC teams to look at things, right?

  ------- Comment From brihh 2018-03-19 10:47:45 EDT -------
  odd about inactivity - i once had a test running for 20mins and it
  still froze:
  `08:52:23 up 20 min, 4 users, load average: 64.40, 98.94, 64.32`

  ------- Comment From brihh 2018-03-19 10:49:30 EDT -------
  @pridhiviraj where is the latest 3/15 PNOR that I can load?

  == Comment: #8 - Application Cdeadmin <> - 2018-03-19 12:40:54 ==
  ------- Comment From pridhiviraj 2018-03-19 12:34:18 EDT -------
  ```
  Mar 19 11:35:01 p215n15 rsyslogd-2007: action 'action 13' suspended, next retry is Mon Mar 19 11:35:31 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Mar 19 11:35:01 p215n15 CRON[6464]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
  Mar 19 11:42:39 p215n15 systemd[1]: Starting Cleanup of Temporary Directories...
  Mar 19 11:42:39 p215n15 rsyslogd-2007: action 'action 13' suspended, next retry is Mon Mar 19 11:43:39 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Mar 19 11:42:40 p215n15 systemd-tmpfiles[6504]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
  Mar 19 11:42:40 p215n15 systemd[1]: Started Cleanup of Temporary Directories.
  Mar 19 11:45:01 p215n15 rsyslogd-2007: action 'action 13' suspended, next retry is Mon Mar 19 11:46:01 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Mar 19 11:45:01 p215n15 CRON[6524]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
  Mar 19 11:47:39 p215n15 systemd[1]: Starting Message of the Day...
  Mar 19 11:47:39 p215n15 rsyslogd-2007: action 'action 13' suspended, next retry is Mon Mar 19 11:48:39 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Mar 19 11:47:40 p215n15 50-motd-news[6528]: * Meltdown, Spectre and Ubuntu: What are the attack vectors,
  Mar 19 11:47:40 p215n15 50-motd-news[6528]: how the fixes work, and everything else you need to know
  Mar 19 11:47:40 p215n15 50-motd-news[6528]: - https://ubu.one/u2Know
  Mar 19 11:47:40 p215n15 systemd[1]: Started Message of the Day.
  Mar 19 11:48:02 p215n15 NetworkManager[4662]: <info> [1521474482.7252] manager: sleep: sleep requested (sleeping: no enabled: yes)
  Mar 19 11:48:02 p215n15 NetworkManager[4662]: <info> [1521474482.7259] manager: NetworkManager state is now ASLEEP
  Mar 19 11:48:02 p215n15 gnome-shell[5538]: Screen lock is locked down, not locking
  Mar 19 11:48:02 p215n15 whoopsie[5264]: [11:48:02] offline
  Mar 19 11:48:02 p215n15 systemd[1]: Reached target Sleep.
  Mar 19 11:48:02 p215n15 systemd[1]: Starting Suspend...
  Mar 19 11:48:02 p215n15 systemd-sleep[6576]: Suspending system...
  Mar 19 11:48:02 p215n15 kernel: [ 1246.906320] PM: suspend entry (s2idle)
  ```
  @vaibhav92 You are right. From the above messages, it looks like the
  system is being suspended by some cron job, I guess.

  == Comment: #9 - Application Cdeadmin <> - 2018-03-19 13:30:57 ==
  ------- Comment From brihh 2018-03-19 13:28:36 EDT -------
  @pridhiviraj interesting. know offhand how i can shut that off? seems
  a bit annoying .. :-)

  == Comment: #11 - Application Cdeadmin <> - 2018-03-19 17:10:51 ==
  ------- Comment From mzipse 2018-03-19 16:52:11 EDT -------
  At our daily defect call, it was suggested that we check to see if
  Opal-PRD is running, which is a prereq for the Firmware recovery to
  work properly. Opal-PRD is an app that should be part of the distro
  and should automatically be started. Nevertheless, if you want to
  check on it, here's the command to run at the OS.....

    sudo service opal-prd status

  You'll need root authority to do this.
  The output will look something like this.....

    # sudo service opal-prd status
    ● opal-prd.service - OPAL PRD daemon
       Loaded: loaded (/lib/systemd/system/opal-prd.service; enabled; vendor preset: enabled)
       Active: active (running) since Wed 2018-03-14 19:04:36 CDT; 4 days ago
         Docs: man:opal-prd(8)
     Main PID: 5085 (opal-prd)
        Tasks: 1
       CGroup: /system.slice/opal-prd.service
               └─5085 /usr/sbin/opal-prd --pnor /dev/mtd0

    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00d0] 1d000000 bd021c00 0000bf02 1c000000 *................*
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00e0] c1021b00 0000c302 1d000000 00030000 *................*
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x00f0] 0000ff06 21465245 51000206 16000000 *....!FREQ.......*
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0100] 5d08f600 00006008 f6000000 6308f600 *].....`.....c...*
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0110] 00006608 f6000000 6908f600 00006c08 *..f.....i.....l.*
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0120] f6000000 6f08f600 00007208 *....o.....r. *
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:OCC1 rsp status=0x00, length=0x01CC
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:rsp data: (up to 16 bytes)
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:~[0x0000] 03100200 03000000 00000000 00000000 *................*
    Mar 18 19:52:21 ws003p1 opal-prd[5085]: HBRT: HTMGT:<<processOccError()

  == Comment: #13 - Application Cdeadmin <> - 2018-03-19 17:30:51 ==
  ------- Comment From brihh 2018-03-19 17:22:06 EDT -------
  @mzipse seems to be there and running:
  ```
  root@p215n15:~# service opal-prd status
  ● opal-prd.service - OPAL PRD daemon
     Loaded: loaded (/lib/systemd/system/opal-prd.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2018-03-19 18:19:58 EDT; 1min 35s ago
       Docs: man:opal-prd(8)
   Main PID: 4648 (opal-prd)
      Tasks: 1 (limit: 27033)
     CGroup: /system.slice/opal-prd.service
             └─4648 /usr/sbin/opal-prd

  Mar 19 18:19:58 p215n15 opal-prd[4648]: SCOM: read: chip 0x8, addr 0x8010a50, val 0x0, rc 0
  Mar 19 18:19:58 p215n15 opal-prd[4648]: SCOM: read: chip 0x8, addr 0x8010a54, val 0x0, rc 0
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: PRDF:<<PRDF::noLock_initialize()
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: PRDF:<<PRDF::initialize()
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: ATTN_SLOW:I>Service::enableAttns() enter
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: ATTN_SLOW:I>Service::enableAttns() exit
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: ATTN_SLOW:I><<ATTN_RT::enableAttns rc: 0
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: calling get_ipoll_events
  Mar 19 18:19:58 p215n15 opal-prd[4648]: HBRT: enabling IPOLL events 0x5b90000000000000
  Mar 19 18:19:58 p215n15 opal-prd[4648]: FW: writing init message
  ```

  == Comment: #14 - Application Cdeadmin <> - 2018-03-20 03:50:53 ==
  ------- Comment From vaibhav92 2018-03-20 03:47:08 EDT -------
  Did some kernel tracing and it seems that someone is starting the
  systemd-suspend.service, which invokes systemd-sleep, which then
  forces a suspend by writing to the /sys/power/state file. Further
  investigation is needed as to who is invoking the
  systemd-suspend.service.

  In the meantime, you can issue this command on the host to disable the
  service and prevent the system from getting suspended:
  `systemctl mask systemd-suspend.service`

  Actually, it is also possible that the presence of the suspend config
  indicates user error. If Ubuntu Server was installed, I don't think
  suspend should have been configured. Did someone install "desktop
  Ubuntu" (or some other non-server config) on a server?
  Also, 18.04 should be used, not 17.10 (and never 17.04), on P9.

  == Comment: #25 - Application Cdeadmin <> - 2018-03-23 00:20:53 ==
  ------- Comment From vaibhav92 2018-03-23 00:19:40 EDT -------
  Hi @mzipse @stewart-ibm.

  AFAIK this issue is not related to CAPP or CAPI at all. @brian_horton
  had confirmed that he was seeing this even without enabling/running
  any CAPI workloads. Disabling the systemd-suspend.service made the
  problem go away. Hence asked the bug to be mirrored to canonical.

  @mzipse not sure what CAPP errors PRD team saw. Can you please ask
  them to get in touch with me.

  == Comment: #26 - Vaibhav Jain <> - 2018-03-23 00:39:45 ==
  Summary of the issue:

  Ubuntu 18.04 is forcing a system suspend after 20 mins of system boot.
  Suspend is forced even if system is running a workload or user is
  logged on to the terminal and performing any activity.

  The issue goes away if systemd-suspend service is disabled via:
  "systemctl mask systemd-suspend.service"

  So requesting canonical to look into this issue as a possible bug in
  systemd or user inactivity monitor.

  ~ Vaibhav

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1758273/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
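
As a rough cross-check of the "20 mins" figure in this report: the number in square brackets on a kernel log line is seconds since boot, so the suspend-entry lines quoted in the comments can be converted to minutes of uptime directly. The Python sketch below is not part of the original report. The function name `find_suspend_entries` is hypothetical, and the regular expression assumes the kernel message keeps the exact wording `PM: suspend entry (<state>)` seen in the logs here.

```python
import re

# Matches a kernel timestamp such as "[ 1247.404962]" followed by the
# suspend-entry message quoted in this report. The exact wording is an
# assumption taken from the log excerpts in the comments.
SUSPEND_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+PM: suspend entry \((\w+)\)")


def find_suspend_entries(log_lines):
    """Return (uptime_seconds, sleep_state) for each suspend entry found."""
    entries = []
    for line in log_lines:
        match = SUSPEND_RE.search(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


# Sample lines copied from the comments in this bug.
sample = [
    "[ 1247.404962] PM: suspend entry (s2idle)",
    "[ 1247.404970] PM: Syncing filesystems ... done.",
    "Mar 19 11:48:02 p215n15 kernel: [ 1246.906320] PM: suspend entry (s2idle)",
]

for seconds, state in find_suspend_entries(sample):
    # Kernel timestamps are seconds since boot; convert to minutes.
    print(f"{state} entered at {seconds / 60:.1f} min of uptime")
```

Both sample entries come out at roughly 20.8 minutes after boot, which matches the summary in Comment #26. On a live system the same function could be fed the lines of a saved kernel log file.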