This bug was fixed in the package linux - 4.8.0-49.52

---------------
linux (4.8.0-49.52) yakkety; urgency=low
* linux: 4.8.0-49.52 -proposed tracker (LP: #1684427) * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself (LP: #1682561) - Drivers: hv: util: move waiting for release to hv_utils_transport itself linux (4.8.0-48.51) yakkety; urgency=low * linux: 4.8.0-48.51 -proposed tracker (LP: #1682034) * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg() (LP: #1681893) - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg() linux (4.8.0-47.50) yakkety; urgency=low * linux: 4.8.0-47.50 -proposed tracker (LP: #1679678) * CVE-2017-6353 - sctp: deny peeloff operation on asocs with threads sleeping on it * CVE-2017-5986 - sctp: avoid BUG_ON on sctp_wait_for_sndbuf * vfat: missing iso8859-1 charset (LP: #1677230) - [Config] NLS_ISO8859_1=y * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527) - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs * Regression: KVM modules should be on main kernel package (LP: #1678099) - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial 4.4.0-63.84~14.04.2 (LP: #1664912) - SAUCE: apparmor: fix link auditing failure due to, uninitialized var * regession tests failing after stackprofile test is run (LP: #1661030) - SAUCE: fix regression with domain change in complain mode * Permission denied and inconsistent behavior in complain mode with 'ip netns list' command (LP: #1648903) - SAUCE: fix regression with domain change in complain mode * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt from a unshared mount namespace (LP: #1656121) - SAUCE: apparmor: null profiles should inherit parent control flags * apparmor refcount leak of profile namespace when removing profiles (LP: #1660849) - SAUCE: apparmor: fix ns ref count link when removing profiles from policy * tor in lxd: apparmor="DENIED" operation="change_onexec" 
namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined" name="system_tor" (LP: #1648143) - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked namespaces * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840) - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails * apparmor auditing denied access of special apparmor .null file (LP: #1660836) - SAUCE: apparmor: Don't audit denied access of special apparmor .null file * apparmor label leak when new label is unused (LP: #1660834) - SAUCE: apparmor: fix label leak when new label is unused * apparmor reference count bug in label_merge_insert() (LP: #1660833) - SAUCE: apparmor: fix reference count bug in label_merge_insert() * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996) - SAUCE: apparmor: fix replacement race in reading rawdata * unix domain socket cross permission check failing with nested namespaces (LP: #1660832) - SAUCE: apparmor: fix cross ns perm of unix domain sockets * [Hyper-V][Mellanox] net/mlx4_core: Avoid delays during VF driver device shutdown (LP: #1672785) - Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow" - net/mlx4_core: Avoid delays during VF driver device shutdown * Update ENA driver to 1.1.2 from net-next (LP: #1664312) - net: ena: Remove unnecessary pci_set_drvdata() - net: ena: Fix error return code in ena_device_init() - net: ena: change the return type of ena_set_push_mode() to be void.
- net: ena: use setup_timer() and mod_timer() - net/ena: remove ntuple filter support from device feature list - net/ena: fix queues number calculation - net/ena: fix ethtool RSS flow configuration - net/ena: fix RSS default hash configuration - net/ena: fix NULL dereference when removing the driver after device reset failed - net/ena: refactor ena_get_stats64 to be atomic context safe - net/ena: fix potential access to freed memory during device reset - net/ena: use READ_ONCE to access completion descriptors - net/ena: reduce the severity of ena printouts - net/ena: change driver's default timeouts - net/ena: change condition for host attribute configuration - net/ena: update driver version to 1.1.2 * ISST-LTE:pVM:roselp4:ubuntu16.04.2: number of numa_miss and numa_foreign wrong in numastat (LP: #1672953) - mm: fix remote numa hits statistics - mm: get rid of __GFP_OTHER_NODE * Using an NVMe drive causes huge power drain (LP: #1664602) - nvme/scsi: Remove power management support - nvme: Pass pointers, not dma addresses, to nvme_get/set_features() - nvme: introduce struct nvme_request - nvme: Add a quirk mechanism that uses identify_ctrl - nvme: Enable autonomous power state transitions * POWER9: Additional patches for TTY and CPU_IDLE (LP: #1674325) - tty: Fix ldisc crash on reopened tty - SAUCE: powerpc/powernv/cpuidle: Pass correct drv->cpumask for registration * Ubuntu 16.10: Network checksum fixes needed for IPoIB for Mellanox CX4/CX5 card (LP: #1670247) - Revert "powerpc: port 64 bits pgtable_cache to 32 bits" - powerpc/Makefile: Drop CONFIG_WORD_SIZE for BITS - powerpc: port 64 bits pgtable_cache to 32 bits - [Config] CONFIG_WORD_SIZE disappeared - powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold - powerpc/64: Use optimized checksum routines on little-endian - CONFIG_GENERIC_CSUM=n for ppc64el - powerpc/64: Fix checksum folding in csum_add() * [Hyper-V] Rebase Hyper-V to the upstream 4.10 kernel (LP: #1670544) - PCI: hv: Use 
device serial number as PCI domain - PCI: hv: Fix wslot_to_devfn() to fix warnings on device removal - PCI: hv: Use the correct buffer size in new_pcichild_device() - scsi: storvsc: Payload buffer incorrectly sized for 32 bit kernels. - hv_netvsc: remove excessive logging on MTU change - net: centralize net_device min/max MTU checking - net: deprecate eth_change_mtu, remove usage - net: use core MTU range checking in virt drivers - hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf() - net: use core MTU range checking in virt drivers - tools: hv: fix a compile warning in snprintf - tools: hv: remove unnecessary header files and netlink related code - vmbus: add support for dynamic device id's - Drivers: hv: utils: reduce HV_UTIL_NEGO_TIMEOUT timeout - Drivers: hv: utils: Fix the mapping between host version and protocol to use - Drivers: hv: vss: Improve log messages. - hv: change clockevents unbind tactics - Drivers: hv: balloon: Disable hot add when CONFIG_MEMORY_HOTPLUG is not set - Drivers: hv: balloon: Fix info request to show max page count - Drivers: hv: balloon: Add logging for dynamic memory operations - [Config] CONFIG_UIO_HV_GENERIC=m - uio-hv-generic: new userspace i/o driver for VMBus - hyperv: Fix spelling of HV_UNKOWN - Drivers: hv: ring_buffer: count on wrap around mappings in get_next_pkt_raw() (v2) - ethernet: use net core MTU range checking in more drivers * Kernel linux-image-4.4.0-67-generic prevent the boot on Microsoft Hyper-v 2012r2 Gen2 VM (LP: #1674635) - scsi: storvsc: Workaround for virtual DVD SCSI version * Enable lspcon on i915 (LP: #1676747) - drm: Helper for lspcon in drm_dp_dual_mode - drm/i915: Add lspcon support for I915 driver - drm/i915: Parse VBT data for lspcon - drm/i915: Enable lspcon initialization - drm/i915: Add lspcon resume function * stress_smoke_test passing and exiting rc=9 (linux 4.9.0-12.13 ADT test failure with linux 4.9.0-12.13) (LP: #1658633) - ext4: lock the xattr block before checksuming it * 
ip_rcv_finish() NULL pointer kernel panic (LP: #1672470) - (upstream) bridge: drop netfilter fake rtable unconditionally * dm-queue-length module is not included in installer/initramfs (LP: #1673350) - d-i: Also add dm-queue-length to multipath modules * Broadcom bluetooth modules sometimes fail to initialize (LP: #1483101) - Bluetooth: btbcm: Add a delay for module reset * Need support of Broadcom bluetooth device [413c:8143] (LP: #1166113) - Bluetooth: btusb: Add support for 413c:8143 * Unable to Connect Third HDD via USB Hub (LP: #1663991) - mm/slub.c: fix random_seq offset destruction * POWER9 : Enable Stop 0-2 with ESL=EC=0 (LP: #1666197) - powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro - powernv:stop: Rename pnv_arch300_idle_init to pnv_power9_idle_init - cpuidle:powernv: Add helper function to populate powernv idle states. - powernv: Pass PSSCR value and mask to power9_idle_stop - Documentation:powerpc: Add device-tree bindings for power-mgt - powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop * Nvlink2: Additional patches (LP: #1667081) - mm: enable CONFIG_MOVABLE_NODE on non-x86 arches - of/fdt: mark hotpluggable memory - dt: add documentation of "hotpluggable" memory property - powerpc/mm: Fix memory hotplug BUG() on radix - powerpc/powernv: Initialise nest mmu - powerpc/powernv: Use OPAL call for TCE kill on NVLink2 - powerpc/mm: refactor radix physical page mapping - powerpc/mm: add radix__create_section_mapping() - powerpc/mm: add radix__remove_section_mapping() - powerpc/mm: unstub radix__vmemmap_remove_mapping() - [Config] Update CONFIG_MOVABLE_NODE values and annotations - [Config] CONFIG_MOVABLE_NODE=n for s390x * FC Adapter (LPe32000-based) prints "iotag out of range", goes offline, and delays boot a lot (Ubuntu17.04/Emulex/lpfc)) (LP: #1670490) - scsi: lpfc: Correct WQ creation for pagesize - scsi: lpfc: Add missing memory barrier * CIFS: Call echo service immediately after socket reconnect (LP: #1669941) - Call echo 
service immediately after socket reconnect * Kernel: Fix Transactional memory config typo (LP: #1669023) - powerpc/process: Fix CONFIG_ALIVEC typo in restore_tm_state() * h-prod does not function across cores (LP: #1670726) - KVM: PPC: Book3S HV: Fix H_PROD to actually wake the target vcpu * [Hyper-V] Missing PCI patches breaking SR-IOV hot remove (LP: #1670518) - PCI: hv: Fix hv_pci_remove() for hot-remove - PCI: hv: Delete the device earlier from hbus->children for hot-remove - PCI: hv: Make unnecessarily global IRQ masking functions static - PCI: hv: Allocate physically contiguous hypercall params buffer * move aufs.ko from -extra to linux-image package (LP: #1673498) - [config] aufs.ko moved to linux-image package * POWER9: Improve CAS negotiation (LP: #1671169) - powerpc: Parse the command line before calling CAS - powerpc: Add missing error check to prom_find_boot_cpu() - powerpc/pseries: Advertise HPT resizing support via CAS - powerpc/64: Disable use of radix under a hypervisor - powerpc/pseries: Advertise Hot Plug Event support to firmware - powerpc: Update to new option-vector-5 format for CAS * Power9 kernel: add virtualization patches (LP: #1670800) - powerpc/fadump: Set core e_flags using kernel's ELF ABI version - powerpc/sparse: Add more assembler prototypes - powerpc/pasemi: Fix Nemo SB600 i8259 interrupts. - powerpc/pasemi: Fix device_type of Nemo SB600 node. 
- powerpc/pseries: Use H_CLEAR_HPT to clear MMU hash table during kexec - powerpc/pseries: Move CMO code from plapr_wrappers.h to platforms/pseries - powerpc: Fix old style declaration GCC warnings - powerpc/pseries: add definitions for new H_SIGNAL_SYS_RESET hcall - powerpc/prom: Define structs for client architecture vectors - powerpc/prom: Switch to using structs for ibm_architecture_vec - tracing: Have the reg function allow to fail - powerpc: port 64 bits pgtable_cache to 32 bits - powerpc/64: Don't try to use radix MMU under a hypervisor - powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options - powerpc/64: Enable use of radix MMU under hypervisor on POWER9 * lsattr 32bit does not work on 64bit kernel (Inappropriate ioctl error) (LP: #1619918) - btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls * linux-tools-common should Depends: lsb-release (LP: #1667571) - [Config] linux-tools-common depends on lsb-release * CAPI:Ubuntu: Kernel panic while rebooting (LP: #1667599) - pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot() * Add Use-After-Free Patch for Ubuntu16.10 - EEH on BELL3 adapter fails to recover (serial/tty) (LP: #1669153) - 8250_pci: Fix potential use-after-free in error path * Request to backport cxlflash patches to Xenial SRU stream (LP: #1623750) - scsi: cxlflash: Scan host only after the port is ready for I/O - scsi: cxlflash: Fix to avoid EEH and host reset collisions - scsi: cxlflash: Improve EEH recovery time * FlashGT Integration and Setup: fsbmc30: After 17th reboot of soft bootme, HTX & Linux errors seen with 256 virtual LUNs (LP: #1667239) - cxl: Fix coredump generation when cxl_get_fd() is used * POWER9: Additional patches for 17.04 and 16.04.2 (LP: #1667116) - powerpc/mm: Update PROTFAULT handling in the page fault path - powerpc/mm/radix: Update pte update sequence for pte clear case - powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm - powerpc/mm/radix: Skip ptesync in pte update 
helpers - SAUCE: powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU * [Hyper-V] Ubuntu 14.04.2 LTS Generation 2 SCSI Errors on VSS Based Backups (LP: #1470250) - Drivers: hv: vss: Operation timeouts should match host expectation - SAUCE: Tools: hv: vss: Thaw the filesystem and continue after freeze fails * PowerNV: No rate limit for kernel error "KVM can't copy data from" (LP: #1667416) - SAUCE: KVM: PPC: Book3S: Ratelimit copy data failure error messages * kernel 4.4.0-63 with USB WLAN RTL8192CU freezes desktop (LP: #1666421) - rtlwifi: rtl_usb: Fix missing entry in USB driver's private data * Export symbol "dev_pm_qos_update_user_latency_tolerance" (LP: #1666401) - PM / QoS: Export dev_pm_qos_update_user_latency_tolerance * Linux ZFS port doesn't respect RLIMIT_FSIZE (LP: #1656259) - SAUCE: (noup) Update zfs to 0.6.5.8-0ubuntu4.2 -- Stefan Bader <stefan.ba...@canonical.com> Thu, 20 Apr 2017 09:38:36 +0200 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1667239

Title:
  FlashGT Integration and Setup: fsbmc30: After 17th reboot of soft bootme, HTX & Linux errors seen with 256 virtual LUNs

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:

== Comment: #1 - Application Cdeadmin <cdead...@us.ibm.com> - 2016-06-02 15:28:27 ==

==== State: Open by: anitrap on 01 June 2016 17:36:39 ====

Contact: Anitra Powell (anit...@us.ibm.com)
Backup: Dion Bell (bel...@us.ibm.com)

Primary BMC (1603G):
=====================================================
# cat /proc/ractrends/Helper/FwInfo
FW_VERSION=2.13.91819
FW_DATE=Mar 10 2016
FW_BUILDTIME=10:59:31 CDT
FW_DESC=8335 SRC BUILD RR9 03102016
FW_PRODUCTID=1
FW_RELEASEID=RR9
FW_CODEBASEVERSION=2.X
#

PNOR (1603G):
========================
# ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin fru list 47
Product Name : OpenPOWER Firmware
Product Version : IBM-firestone-ibm-OP8_v1.7_1.62
Product Extra : hostboot-bc98d0b-1a29dff
Product Extra : occ-0362706-16fdfa7
Product Extra : skiboot-5.1.13
Product Extra : hostboot-binaries-43d5a59
Product Extra : firestone-xml-e7b4fa2-c302f0e
Product Extra : capp-ucode-105cb8f

Partition Info:
=================
ver 1.5.4.3 - OS, HTX, Firmware and Machine details
OS: GNU/Linux
OS Version: Ubuntu 16.04 LTS \n \l
Kernel Version: 4.4.8c0ffee0+
HTX Version: htxubuntu-396
Host Name: fsbmc30p1
Machine Serial No: 210995A
Machine Type/Model: 8335-GCA

root@fsbmc30p1:~# uname -a
Linux fsbmc30p1 4.4.8c0ffee0+ #2 SMP Tue May 24 10:50:26 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux

FlashGT NVMe setup:
===================
1 FlashGT card in slot 1 running in superpipe mode with 128 LUNs per port (total of 256 LUNs).

lsscsi [0:0:0:0] disk ATA ST1000NX0313 BE33 /dev/sda [1:0:0:0] disk ATA ST1000NX0313 BE33 /dev/sdb [4:0:0:0] disk NVMe SAMSUNG MZ1LV960 3011 /dev/sdc [4:1:0:0] disk NVMe SAMSUNG MZ1LV960 3011 /dev/sdd [5:0:0:0] cd/dvd AMI Virtual CDROM0 1.00 /dev/sr0 [5:0:0:1] cd/dvd AMI Virtual CDROM1 1.00 /dev/sr1 [5:0:0:2] cd/dvd AMI Virtual CDROM2 1.00 /dev/sr2 [5:0:0:3] cd/dvd AMI Virtual CDROM3 1.00 /dev/sr3 [6:0:0:0] disk AMI Virtual Floppy0 1.00 /dev/sde [6:0:0:1] disk AMI Virtual Floppy1 1.00 /dev/sdf [6:0:0:2] disk AMI Virtual Floppy2 1.00 /dev/sdg [6:0:0:3] disk AMI Virtual Floppy3 1.00 /dev/sdh [7:0:0:0] disk AMI Virtual HDisk0 1.00 /dev/sdi [7:0:0:1] disk AMI Virtual HDisk1 1.00 /dev/sdj [7:0:0:2] disk AMI Virtual HDisk2 1.00 /dev/sdk [7:0:0:3] disk AMI Virtual HDisk3 1.00 /dev/sdl [7:0:0:4] disk AMI Virtual HDisk4 1.00 /dev/sdm lspci | grep -i acc 0004:01:00.0 Processing accelerators: IBM Device 0601 (rev 01) ls -l /sys/class/cxl total 0 lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0 -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0 lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0m -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0/afu0.0m lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0s -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0/afu0.0s lrwxrwxrwx 1 root root 0 May 31 13:27 card0 -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0 lscfg | grep afu + afu0.0 Slot1/card0/afu0.0 + afu0.0m Slot1/card0/afu0.0/afu0.0m + afu0.0s Slot1/card0/afu0.0/afu0.0s /opt/ibm/capikv/bin/cxlfstatus CXL Flash Device Status Found 0601 0004:01:00.0 Slot1 Device: SCSI Block Mode LUN WWID sg2: 4:0:0:0, sdc, superpipe, 60025380025382463300046000000000 sg3: 4:1:0:0, sdd, superpipe, 60025380025382463300052000000000 dpkg -l | grep capi 4el no description given 3.0-1970-3042652 ppc6 4el no description given 3.0-1970-3042652 ppc6 root@fsbmc30p1:/tmp# dpkg -l | grep afu ii afuimage 3.0-1970-3042652 all no description given cat 
/opt/ibm/capikv/version.txt
1970-3042652

/opt/ibm/capikv/afu/cxl_afu_dump /dev/cxl/afu0.0m -v
AFU Version = 160525N1
NVMe0 Version = BTV73011
NVMe0 NEXT = BTV73011
NVMe0 STATUS = 0x702
NVMe1 Version = BTV73011
NVMe1 NEXT = BTV73011
NVMe1 STATUS = 0x702

cat /tmp/test_lun_mode
128

Problem:
===========
While running soft bootme (shutdown -r from the OS every hour), I noticed HTX errors after the 9th and 17th reboots of the partition. At this point they look like different issues, so I am opening two separate defects. I have already opened defect SW354759 for the first set of HTX errors and assigned it to htx_screen. This defect covers the issue that happened after the 17th reboot (Jun 1 @ 6am). On the 18th reboot (Jun 1 @ 7am), the shutdown -r command failed, and I had to power down the system manually. I will open this defect to surelock_screen first, since it seems similar to the one Dion opened while running 128 virtual LUNs per port (defect http://w3.rchland.ibm.com/projects/bestquest/?defect=SW353881). For this failure, other exercisers eventually failed as well.

Test Info:
============
- running soft bootme (shutdown -r every hour)
- mdt.bu + hxecom (GPUs were running). I copied a modified mdt.bu to another mdt file so I would not see any errors in HTX after reboot.

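In the HTX error sample that follows, the err= field appears to be the errno value printed in hexadecimal: err=00000010 accompanies "Device or resource busy", err=00000005 accompanies "Input/output error", and err=0000000b accompanies "errno: 11". The Python sketch below decodes the field under that assumption. The helper name decode_htx_err is hypothetical and is not part of HTX; the symbolic names and messages come from the C library of the machine the script runs on.

```python
import errno
import os


def decode_htx_err(field):
    """Interpret an HTX 'err=' value as a hexadecimal errno.

    Assumption: the field is the raw errno in hex. This is inferred from
    the log lines in this report, not from HTX documentation.
    """
    code = int(field, 16)
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


# The three distinct err= values that appear in the sample of HTX errors.
for field in ("00000010", "00000005", "0000000b"):
    code, name, text = decode_htx_err(field)
    print("err=%s -> errno %d %s (%s)" % (field, code, name, text))
```

On Linux, 0x10 is 16 (EBUSY), 0x05 is 5 (EIO) and 0x0b is 11 (EAGAIN), which lines up with the messages the exercisers logged.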
Sample of HTX errors (for this defect) ============================== /dev/sg2.53 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock READCMP5 numopers= 20000 loop= 4956 blk=0x4eee len= 4096 offset=0 Seed Values= 37882, 44181, 50758 Data Pattern Seed Values = 37882, 44182, 50758 LBA Fencepost = 0xb94a cblk_read error - Device or resource busy /dev/sg2.18 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock READCMP9 numopers= 20000 loop= 1501 blk=0x93f1 len= 4096 offset=0 Seed Values= 37847, 44740, 50780 Data Pattern Seed Values = 37847, 44741, 50780 LBA Fencepost = 0xb275 cblk_read error - Device or resource busy /dev/sg2.98 Jun 1 06:26:53 2016 err=00000010 sev=4 hxesurelock READCMP5 numopers= 20000 loop= 10365 blk=0x86d5 len= 4096 offset=0 Seed Values= 37927, 41320, 50710 Data Pattern Seed Values = 37927, 41321, 50710 LBA Fencepost = 0xbc7c cblk_read error - Device or resource busy /dev/sg2.116 Jun 1 06:30:45 2016 err=00000005 sev=4 hxesurelock RDCMP10 numopers= 20000 loop= 6383 blk=0xc33d len= 4096 offset=0 Seed Values= 37945, 49039, 50726 Data Pattern Seed Values = 37945, 49040, 50726 LBA Fencepost = 0xd0b0 cblk_read error - Input/output error /dev/fpu17 Jun 1 06:30:51 2016 err=0000000b sev=1 hxefpu64 pthread_create call failed with rc: 11, errno: 11, Resource temporarily unavailable /dev/fpu17 Jun 1 06:30:51 2016 err=0000000b sev=1 hxefpu64 Hardware Exerciser stopped on an error /dev/sctu43 Jun 1 06:30:51 2016 err=0000000b sev=1 hxesctu pthread_create call failed with rc: 11, errno: 11, Resource temporarily unavailable /dev/sctu43 Jun 1 06:30:51 2016 err=0000000b sev=1 hxesctu Hardware Exerciser stopped on an error Logs: ====== /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1 /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/htxerr /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/syslog /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/kern.log 
/gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/bootme.log sample of syslog during first htx error: ================================================ Jun 1 06:19:20 fsbmc30p1 systemd[1]: Started Cleanup of Temporary Directories. Jun 1 06:25:01 fsbmc30p1 rsyslogd-2007: action 'action 10' suspended, next retry is Wed Jun 1 06:25:31 2016 [v8.16.0 try http://www.rsyslog.com/e/2007 ] Jun 1 06:25:01 fsbmc30p1 CRON[99327]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )) Jun 1 06:26:53 fsbmc30p1 CXLBLK[37882]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0 Jun 1 06:26:53 fsbmc30p1 rsyslogd-2007: action 'action 10' suspended, next retry is Wed Jun 1 06:27:23 2016 [v8.16.0 try http://www.rsyslog.com/e/2007 ] Jun 1 06:26:53 fsbmc30p1 CXLBLK[37847]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0 Jun 1 06:26:53 fsbmc30p1 CXLBLK[37927]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0 Jun 1 06:26:59 fsbmc30p1 CXLBLK[37961]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg3, chunk index = 0 Jun 1 06:26:59 fsbmc30p1 CXLBLK[37954]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0 Jun 1 06:26:59 fsbmc30p1 CXLBLK[37887]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0 Jun 1 06:26:59 fsbmc30p1 kernel: [ 1378.248405] hrtimer: interrupt took 200250 ns sample from kern.log during fail: ================================= Jun 1 06:08:11 fsbmc30p1 kernel: [ 250.251041] nvidia-uvm: Loaded the UVM driver in lite mode, major device number 241 Jun 1 06:26:59 
fsbmc30p1 kernel: [ 1378.248405] hrtimer: interrupt took 200250 ns Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.764382] hxesurelock[40392]: unhandled signal 11 at 0000000000000024 nip 00003fff84602978 lr 00003fff84602974 code 30001 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868242] Unable to handle kernel paging request for data at address 0x0000000c Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868599] Faulting instruction address: 0xc00000000035e2b0 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868865] Oops: Kernel access of bad area, sig: 11 [#1] Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868928] SMP NR_CPUS=2048 NUMA PowerNV Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.868992] Modules linked in: nvidia_uvm(POE) iptable_filter ip_tables x_tables nvidia(POE) ipmi_devintf joydev input_leds mac_hid opal_prd ofpart cmdlinepart powernv_flash mtd at24 ipmi_powernv ipmi_msghandler uio_pdrv_genirq uio ibmpowernv powernv_rng binfmt_misc nfsd ib_iser auth_rpcgss rdma_cm iw_cm ib_cm nfs_acl ib_sa ib_mad lockd ib_core grace ib_addr sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear mlx4_en hid_generic usbhid hid uas usb_storage cxlflash ast bnx2x i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm cxl vxlan mlx4_core ahci ip6_udp_tunnel udp_tunnel libahci mdio libcrc32c Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870299] CPU: 80 PID: 40392 Comm: hxesurelock Tainted: P OE 4.4.8c0ffee0+ #2 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870379] task: c000007935fe23a0 ti: c000007910810000 task.ti: c000007910810000 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870476] NIP: c00000000035e2b0 LR: c00000000035e280 CTR: 0000000000000000 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870552] REGS: c0000079108135e0 TRAP: 0300 Tainted: P OE (4.4.8c0ffee0+) Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870642] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28053988 XER: 
00000000 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] CFAR: c000000000008468 DAR: 000000000000000c DSISR: 40000000 SOFTE: 1 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR00: c00000000035e280 c000007910813860 c000000001594600 0000000000000000 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR04: c000007823192400 000000000002574f 0000000000000001 0000000000000000 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR08: c0000079241b8a00 0000000000000000 00000000000044fb 65776f702f62696c Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR12: 2d656c3436637072 c00000000fb6f800 00000000464c457f 0000000000010c78 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR16: 0000000000000000 0000000000000039 d000000034fa04c5 0000000000010000 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR20: 00000000000000cd 0000000000000550 0000000000010000 00000000039e0000 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR24: 00003fffffffffff c000007910813af8 c000007823192600 c00000793f57b980 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR28: c00000793f573e80 00003fffffffffff 000000000000001f c000007926f29790 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872149] NIP [c00000000035e2b0] elf_core_dump+0xd60/0x1300 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872277] LR [c00000000035e280] elf_core_dump+0xd30/0x1300 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872351] Call Trace: Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872407] [c000007910813860] [c00000000035e280] elf_core_dump+0xd30/0x1300 (unreliable) Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872527] [c000007910813a60] [c00000000036898c] do_coredump+0xcec/0x11e0 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872625] [c000007910813c20] [c0000000000ce7a0] get_signal+0x540/0x7b0 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872705] [c000007910813d10] [c000000000017344] do_signal+0x54/0x2b0 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872785] [c000007910813e00] [c00000000001776c] do_notify_resume+0xbc/0xd0 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872877] [c000007910813e30] 
[c000000000009838] ret_from_except_lite+0x64/0x68 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.872963] Instruction dump: Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.873004] 60000000 2fa30000 409effa8 e95f0050 39200000 794737e3 4082ffa4 e91f00a0 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.873148] 2fa80000 419e002c e92800f8 e9290000 <8129000c> 79279fe3 41820018 7948efe3 Jun 1 06:28:16 fsbmc30p1 kernel: [ 1454.884655] ---[ end trace f8abb6e0d0322daa ]--- gsave info: ============== GSA Location: /gsa/ausgsa/projects/s/sift/hst/trial_data/Surelock/Ubuntu/flashgt/fsbmc30p1_ubuntu1604_FlashGT_bootme_test5/FAIL201606011024 <===== This is from RTC side description =====> See the Discussion field for the initial comments from CQ. </===== This is from RTC side description =====> ==== State: Open by: mpvageli on 02 June 2016 14:20:06 ==== Oops: Kernel access of bad area, sig: 11 [#1] # ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin fru list 47 Product Name : OpenPOWER Firmware Product Version : IBM-firestone-ibm-OP8_v1.7_1.62 Product Extra : hostboot-bc98d0b-1a29dff Product Extra : occ-0362706-16fdfa7 Product Extra : skiboot-5.1.13 Product Extra : hostboot-binaries-43d5a59 Product Extra : firestone-xml-e7b4fa2-c302f0e Product Extra : capp-ucode-105cb8f == Comment: #9 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-06-07 12:04:49 == root@fsbmc30p1:~# lsb_release -a No LSB modules are available. 
Distributor ID: Ubuntu Description: Ubuntu 16.04 LTS Release: 16.04 Codename: xenial root@fsbmc30p1:~# cat /etc/*release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=16.04 DISTRIB_CODENAME=xenial DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS" NAME="Ubuntu" VERSION="16.04 LTS (Xenial Xerus)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 16.04 LTS" VERSION_ID="16.04" HOME_URL="http://www.ubuntu.com/" SUPPORT_URL="http://help.ubuntu.com/" BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/" UBUNTU_CODENAME=xenial root@fsbmc30p1:~# uname -a Linux fsbmc30p1 4.4.8c0ffee0+ #2 SMP Tue May 24 10:50:26 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux root@fsbmc30p1:~# == Comment: #24 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-07-07 07:14:05 == From kernel logs =========== [ 7087.918089] device enP3p5s0f2 left promiscuous mode [ 8801.190528] cxlflash 0007:00:00.0: send_tmf: TMF timed out! [ 8806.190383] cxlflash 0007:00:00.0: send_tmf: TMF timed out! [ 8816.507485] hxesurelock[14180]: unhandled signal 11 at 0000000000000024 nip 00003fff852c2ee8 lr 00003fff852c2938 code 30001 [ 8816.511368] hxesurelock[13501]: unhandled signal 11 at 0000000000000024 nip 00003fff890b2ee8 lr 00003fff890b2938 code 30001 [ 8816.526807] Unable to handle kernel paging request for data at address 0x0000000c [ 8816.526928] Faulting instruction address: 0xc00000000035e2b0 [ 8816.530233] Unable to handle kernel paging request for data at address 0x0000000c [ 8816.530596] Faulting instruction address: 0xc00000000035e2b0 3f:mon> t [c000000686a13a60] c00000000036898c do_coredump+0xcec/0x11e0 [c000000686a13c20] c0000000000ce7a0 get_signal+0x540/0x7b0 [c000000686a13d10] c000000000017344 do_signal+0x54/0x2b0 [c000000686a13e00] c00000000001776c do_notify_resume+0xbc/0xd0 [c000000686a13e30] c000000000009838 ret_from_except_lite+0x64/0x68 --- Exception: 300 (Data Access) at 00003fff890b2ee8 SP (3fff83c2c490) is in userspace 3f:mon> r R00 = c00000000035e280 R16 = 0000000000000000 R01 = c000000686a13860 R17 = 0000000000000042 R02 = 
c000000001594600 R18 = d000000021b104fa
R03 = 0000000000000000 R19 = 0000000000010000
R04 = c000002fb7463400 R20 = 00000000000000cd
R05 = 00000000000001bf R21 = 0000000000000628
R06 = 0000000000000001 R22 = 0000000000010000
R07 = 0000000000000000 R23 = 0000000000250000
R08 = c00000281af21500 R24 = 00003fffffffffff
R09 = 0000000000000000 R25 = c000000686a13af8
R10 = 00000000000044fb R26 = c000002fb7463800
R11 = 6c2d656c34366370 R27 = c000002ff0e05cc0
R12 = 756e672d78756e69 R28 = c000002ff0e05c40
R13 = c00000000fb65680 R29 = 00003fffffffffff
R14 = 00000000464c457f R30 = 0000000000000016
R15 = 0000000000010e70 R31 = c000002fb94bd3b8
pc = c00000000035e2b0 elf_core_dump+0xd60/0x1300
cfar= c000000000008468 slb_miss_realmode+0x50/0x78
lr = c00000000035e280 elf_core_dump+0xd30/0x1300
msr = 9000000100009033 cr = 28053828
ctr = 0000000000000000 xer = 0000000000000000 trap = 300
dar = 000000000000000c dsisr = 40000000
3f:mon>

hxesurelock process has segfaulted and kernel has crashed while dumping core.

== Comment: #87 - Frederic Barrat <frederic.bar...@fr.ibm.com> - 2017-02-21 11:50:40 ==

Fix is in kernel v4.10:
bdecf76e319a29735d828575f4a9269f0e17c547
"cxl: Fix coredump generation when cxl_get_fd() is used"

We'd like to have it backported to 16.10 and 16.04 LTS.
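
For readers following the oops in the description, the faulting word in the instruction dump is marked <8129000c>, the DAR is 000000000000000c, and GPR09 (R09 in the xmon dump) is zero. The Python sketch below decodes that word as a PowerPC D-form instruction to show how those three values fit together. It is only an illustration: the opcode table is a small hand-picked subset, and the function names are hypothetical rather than taken from any kernel or debugger tool.

```python
# Small subset of PowerPC D-form primary opcodes (not a full disassembler).
D_FORM_OPCODES = {
    32: "lwz",  # load word and zero
    34: "lbz",  # load byte and zero
    36: "stw",  # store word
    38: "stb",  # store byte
    40: "lhz",  # load halfword and zero
}


def decode_d_form(word):
    """Split a 32-bit instruction word into (mnemonic, rt, ra, displacement)."""
    opcode = (word >> 26) & 0x3F
    rt = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:  # the displacement is a signed 16-bit value
        disp -= 0x10000
    mnemonic = D_FORM_OPCODES.get(opcode, "op%d" % opcode)
    return mnemonic, rt, ra, disp


def effective_address(base_value, disp):
    """Base register value plus displacement, wrapped to 64 bits.

    The caller passes 0 as the base when the RA field is 0, because RA=0
    means a literal zero base in D-form loads and stores.
    """
    return (base_value + disp) & 0xFFFFFFFFFFFFFFFF


FAULTING_WORD = 0x8129000C  # marked <8129000c> in the instruction dump
GPR09 = 0x0                 # value of GPR09 / R09 in both register dumps

mnemonic, rt, ra, disp = decode_d_form(FAULTING_WORD)
print("%s r%d, %d(r%d)" % (mnemonic, rt, disp, ra))
print("effective address: 0x%x" % effective_address(GPR09, disp))
```

The word decodes to a 4-byte load at offset 12 from r9, and with r9 holding zero the effective address is 0xc, the same value as the DAR. That is consistent with elf_core_dump() reading a field through a NULL pointer, though the dump alone does not identify which structure was involved.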