This bug was fixed in the package linux - 4.8.0-49.52

---------------
linux (4.8.0-49.52) yakkety; urgency=low

  * linux: 4.8.0-49.52 -proposed tracker (LP: #1684427)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.8.0-48.51) yakkety; urgency=low

  * linux: 4.8.0-48.51 -proposed tracker (LP: #1682034)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.8.0-47.50) yakkety; urgency=low

  * linux: 4.8.0-47.50 -proposed tracker (LP: #1679678)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * CVE-2017-5986
    - sctp: avoid BUG_ON on sctp_wait_for_sndbuf

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null file
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with nested namespaces
    (LP: #1660832)
    - SAUCE: apparmor: fix cross ns perm of unix domain sockets

  * [Hyper-V][Mellanox] net/mlx4_core: Avoid delays during VF driver device
    shutdown (LP: #1672785)
    - Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow"
    - net/mlx4_core: Avoid delays during VF driver device shutdown

  * Update ENA driver to 1.1.2 from net-next (LP: #1664312)
    - net: ena: Remove unnecessary pci_set_drvdata()
    - net: ena: Fix error return code in ena_device_init()
    - net: ena: change the return type of ena_set_push_mode() to be void.
    - net: ena: use setup_timer() and mod_timer()
    - net/ena: remove ntuple filter support from device feature list
    - net/ena: fix queues number calculation
    - net/ena: fix ethtool RSS flow configuration
    - net/ena: fix RSS default hash configuration
    - net/ena: fix NULL dereference when removing the driver after device reset
      failed
    - net/ena: refactor ena_get_stats64 to be atomic context safe
    - net/ena: fix potential access to freed memory during device reset
    - net/ena: use READ_ONCE to access completion descriptors
    - net/ena: reduce the severity of ena printouts
    - net/ena: change driver's default timeouts
    - net/ena: change condition for host attribute configuration
    - net/ena: update driver version to 1.1.2

  * ISST-LTE:pVM:roselp4:ubuntu16.04.2: number of numa_miss and numa_foreign
    wrong in numastat (LP: #1672953)
    - mm: fix remote numa hits statistics
    - mm: get rid of __GFP_OTHER_NODE

  * Using an NVMe drive causes huge power drain (LP: #1664602)
    - nvme/scsi: Remove power management support
    - nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
    - nvme: introduce struct nvme_request
    - nvme: Add a quirk mechanism that uses identify_ctrl
    - nvme: Enable autonomous power state transitions

  * POWER9: Additional patches for TTY and CPU_IDLE (LP: #1674325)
    - tty: Fix ldisc crash on reopened tty
    - SAUCE: powerpc/powernv/cpuidle: Pass correct drv->cpumask for registration

  * Ubuntu 16.10: Network checksum fixes needed for IPoIB for Mellanox CX4/CX5
    card (LP: #1670247)
    - Revert "powerpc: port 64 bits pgtable_cache to 32 bits"
    - powerpc/Makefile: Drop CONFIG_WORD_SIZE for BITS
    - powerpc: port 64 bits pgtable_cache to 32 bits
    - [Config] CONFIG_WORD_SIZE disappeared
    - powerpc/64: Fix checksum folding in csum_tcpudp_nofold and
      ip_fast_csum_nofold
    - powerpc/64: Use optimized checksum routines on little-endian
    - CONFIG_GENERIC_CSUM=n for ppc64el
    - powerpc/64: Fix checksum folding in csum_add()

  * [Hyper-V] Rebase Hyper-V to the upstream 4.10 kernel (LP: #1670544)
    - PCI: hv: Use device serial number as PCI domain
    - PCI: hv: Fix wslot_to_devfn() to fix warnings on device removal
    - PCI: hv: Use the correct buffer size in new_pcichild_device()
    - scsi: storvsc: Payload buffer incorrectly sized for 32 bit kernels.
    - hv_netvsc: remove excessive logging on MTU change
    - net: centralize net_device min/max MTU checking
    - net: deprecate eth_change_mtu, remove usage
    - net: use core MTU range checking in virt drivers
    - hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()
    - net: use core MTU range checking in virt drivers
    - tools: hv: fix a compile warning in snprintf
    - tools: hv: remove unnecessary header files and netlink related code
    - vmbus: add support for dynamic device id's
    - Drivers: hv: utils: reduce HV_UTIL_NEGO_TIMEOUT timeout
    - Drivers: hv: utils: Fix the mapping between host version and protocol to use
    - Drivers: hv: vss: Improve log messages.
    - hv: change clockevents unbind tactics
    - Drivers: hv: balloon: Disable hot add when CONFIG_MEMORY_HOTPLUG is not set
    - Drivers: hv: balloon: Fix info request to show max page count
    - Drivers: hv: balloon: Add logging for dynamic memory operations
    - [Config] CONFIG_UIO_HV_GENERIC=m
    - uio-hv-generic: new userspace i/o driver for VMBus
    - hyperv: Fix spelling of HV_UNKOWN
    - Drivers: hv: ring_buffer: count on wrap around mappings in
      get_next_pkt_raw() (v2)
    - ethernet: use net core MTU range checking in more drivers

  * Kernel linux-image-4.4.0-67-generic prevent the boot on Microsoft Hyper-v
    2012r2 Gen2 VM (LP: #1674635)
    - scsi: storvsc: Workaround for virtual DVD SCSI version

  * Enable lspcon on i915 (LP: #1676747)
    - drm: Helper for lspcon in drm_dp_dual_mode
    - drm/i915: Add lspcon support for I915 driver
    - drm/i915: Parse VBT data for lspcon
    - drm/i915: Enable lspcon initialization
    - drm/i915: Add lspcon resume function

  * stress_smoke_test passing and exiting rc=9 (linux 4.9.0-12.13 ADT test
    failure with linux 4.9.0-12.13) (LP: #1658633)
    - ext4: lock the xattr block before checksuming it

  * ip_rcv_finish() NULL pointer kernel panic (LP: #1672470)
    - (upstream) bridge: drop netfilter fake rtable unconditionally

  * dm-queue-length module is not included in installer/initramfs (LP: #1673350)
    - d-i: Also add dm-queue-length to multipath modules

  * Broadcom bluetooth modules sometimes fail to initialize (LP: #1483101)
    - Bluetooth: btbcm: Add a delay for module reset

  * Need support of Broadcom bluetooth device [413c:8143] (LP: #1166113)
    - Bluetooth: btusb: Add support for 413c:8143

  * Unable to Connect Third HDD via USB Hub (LP: #1663991)
    - mm/slub.c: fix random_seq offset destruction

  * POWER9 : Enable Stop 0-2 with ESL=EC=0 (LP: #1666197)
    - powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macro
    - powernv:stop: Rename pnv_arch300_idle_init to pnv_power9_idle_init
    - cpuidle:powernv: Add helper function to populate powernv idle states.
    - powernv: Pass PSSCR value and mask to power9_idle_stop
    - Documentation:powerpc: Add device-tree bindings for power-mgt
    - powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop

  * Nvlink2: Additional patches (LP: #1667081)
    - mm: enable CONFIG_MOVABLE_NODE on non-x86 arches
    - of/fdt: mark hotpluggable memory
    - dt: add documentation of "hotpluggable" memory property
    - powerpc/mm: Fix memory hotplug BUG() on radix
    - powerpc/powernv: Initialise nest mmu
    - powerpc/powernv: Use OPAL call for TCE kill on NVLink2
    - powerpc/mm: refactor radix physical page mapping
    - powerpc/mm: add radix__create_section_mapping()
    - powerpc/mm: add radix__remove_section_mapping()
    - powerpc/mm: unstub radix__vmemmap_remove_mapping()
    - [Config] Update CONFIG_MOVABLE_NODE values and annotations
    - [Config] CONFIG_MOVABLE_NODE=n for s390x

  * FC Adapter (LPe32000-based) prints "iotag out of range", goes offline, and
    delays boot a lot (Ubuntu17.04/Emulex/lpfc)) (LP: #1670490)
    - scsi: lpfc: Correct WQ creation for pagesize
    - scsi: lpfc: Add missing memory barrier

  * CIFS: Call echo service immediately after socket reconnect (LP: #1669941)
    - Call echo service immediately after socket reconnect

  * Kernel: Fix Transactional memory config typo (LP: #1669023)
    - powerpc/process: Fix CONFIG_ALIVEC typo in restore_tm_state()

  * h-prod does not function across cores (LP: #1670726)
    - KVM: PPC: Book3S HV: Fix H_PROD to actually wake the target vcpu

  * [Hyper-V] Missing PCI patches breaking SR-IOV hot remove (LP: #1670518)
    - PCI: hv: Fix hv_pci_remove() for hot-remove
    - PCI: hv: Delete the device earlier from hbus->children for hot-remove
    - PCI: hv: Make unnecessarily global IRQ masking functions static
    - PCI: hv: Allocate physically contiguous hypercall params buffer

  * move aufs.ko from -extra to linux-image package (LP: #1673498)
    - [config] aufs.ko moved to linux-image package

  * POWER9: Improve CAS negotiation (LP: #1671169)
    - powerpc: Parse the command line before calling CAS
    - powerpc: Add missing error check to prom_find_boot_cpu()
    - powerpc/pseries: Advertise HPT resizing support via CAS
    - powerpc/64: Disable use of radix under a hypervisor
    - powerpc/pseries: Advertise Hot Plug Event support to firmware
    - powerpc: Update to new option-vector-5 format for CAS

  * Power9 kernel: add virtualization patches (LP: #1670800)
    - powerpc/fadump: Set core e_flags using kernel's ELF ABI version
    - powerpc/sparse: Add more assembler prototypes
    - powerpc/pasemi: Fix Nemo SB600 i8259 interrupts.
    - powerpc/pasemi: Fix device_type of Nemo SB600 node.
    - powerpc/pseries: Use H_CLEAR_HPT to clear MMU hash table during kexec
    - powerpc/pseries: Move CMO code from plapr_wrappers.h to platforms/pseries
    - powerpc: Fix old style declaration GCC warnings
    - powerpc/pseries: add definitions for new H_SIGNAL_SYS_RESET hcall
    - powerpc/prom: Define structs for client architecture vectors
    - powerpc/prom: Switch to using structs for ibm_architecture_vec
    - tracing: Have the reg function allow to fail
    - powerpc: port 64 bits pgtable_cache to 32 bits
    - powerpc/64: Don't try to use radix MMU under a hypervisor
    - powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options
    - powerpc/64: Enable use of radix MMU under hypervisor on POWER9

  * lsattr 32bit does not work on 64bit kernel (Inappropriate ioctl error)
    (LP: #1619918)
    - btrfs: fix btrfs_compat_ioctl failures on non-compat ioctls

  * linux-tools-common should Depends: lsb-release (LP: #1667571)
    - [Config] linux-tools-common depends on lsb-release

  * CAPI:Ubuntu: Kernel panic while rebooting (LP: #1667599)
    - pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot()

  * Add Use-After-Free Patch for Ubuntu16.10 - EEH on BELL3 adapter fails to
    recover (serial/tty) (LP: #1669153)
    - 8250_pci: Fix potential use-after-free in error path

  * Request to backport cxlflash patches to Xenial SRU stream (LP: #1623750)
    - scsi: cxlflash: Scan host only after the port is ready for I/O
    - scsi: cxlflash: Fix to avoid EEH and host reset collisions
    - scsi: cxlflash: Improve EEH recovery time

  * FlashGT Integration and Setup: fsbmc30: After 17th reboot of soft bootme,
    HTX & Linux errors seen with 256 virtual LUNs (LP: #1667239)
    - cxl: Fix coredump generation when cxl_get_fd() is used

  * POWER9: Additional patches for 17.04 and 16.04.2 (LP: #1667116)
    - powerpc/mm: Update PROTFAULT handling in the page fault path
    - powerpc/mm/radix: Update pte update sequence for pte clear case
    - powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm
    - powerpc/mm/radix: Skip ptesync in pte update helpers
    - SAUCE: powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting
      up CPU

  * [Hyper-V] Ubuntu 14.04.2 LTS Generation 2 SCSI Errors on VSS Based Backups
    (LP: #1470250)
    - Drivers: hv: vss: Operation timeouts should match host expectation
    - SAUCE: Tools: hv: vss: Thaw the filesystem and continue after freeze fails

  * PowerNV: No rate limit for kernel error "KVM can't copy data from"
    (LP: #1667416)
    - SAUCE: KVM: PPC: Book3S: Ratelimit copy data failure error messages

  * kernel 4.4.0-63 with USB WLAN RTL8192CU freezes desktop (LP: #1666421)
    - rtlwifi: rtl_usb: Fix missing entry in USB driver's private data

  * Export symbol "dev_pm_qos_update_user_latency_tolerance" (LP: #1666401)
    - PM / QoS: Export dev_pm_qos_update_user_latency_tolerance

  * Linux ZFS port doesn't respect RLIMIT_FSIZE (LP: #1656259)
    - SAUCE: (noup) Update zfs to 0.6.5.8-0ubuntu4.2

 -- Stefan Bader <stefan.ba...@canonical.com>  Thu, 20 Apr 2017 09:38:36 +0200

-- 
https://bugs.launchpad.net/bugs/1667239

Title:
  FlashGT Integration and Setup: fsbmc30: After 17th reboot of soft
  bootme, HTX & Linux errors seen with 256 virtual LUNs

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  == Comment: #1 - Application Cdeadmin <cdead...@us.ibm.com> - 2016-06-02 15:28:27 ==
  ==== State: Open by: anitrap on 01 June 2016 17:36:39 ====

  Contact: Anitra Powell (anit...@us.ibm.com)
  Backup: Dion Bell (bel...@us.ibm.com)

  
  Primary BMC (1603G):
  =====================================================
  # cat /proc/ractrends/Helper/FwInfo
  FW_VERSION=2.13.91819
  FW_DATE=Mar 10 2016
  FW_BUILDTIME=10:59:31 CDT
  FW_DESC=8335 SRC BUILD RR9 03102016
  FW_PRODUCTID=1
  FW_RELEASEID=RR9
  FW_CODEBASEVERSION=2.X
  #

  PNOR (1603G):
  ========================
  # ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin fru list 47
  Product Name          : OpenPOWER Firmware
  Product Version       : IBM-firestone-ibm-OP8_v1.7_1.62
  Product Extra         : hostboot-bc98d0b-1a29dff
  Product Extra         : occ-0362706-16fdfa7
  Product Extra         : skiboot-5.1.13
  Product Extra         : hostboot-binaries-43d5a59
  Product Extra         : firestone-xml-e7b4fa2-c302f0e
  Product Extra         : capp-ucode-105cb8f

  Partition Info:
  =================
         ver 1.5.4.3 - OS, HTX, Firmware and Machine details

                             OS: GNU/Linux
                     OS Version: Ubuntu 16.04 LTS \n \l
                 Kernel Version: 4.4.8c0ffee0+
                    HTX Version: htxubuntu-396
                      Host Name: fsbmc30p1
              Machine Serial No: 210995A
             Machine Type/Model: 8335-GCA

  root@fsbmc30p1:~# uname -a
  Linux fsbmc30p1 4.4.8c0ffee0+ #2 SMP Tue May 24 10:50:26 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux

  FlashGT NVMe setup:
  ===================
  One FlashGT card in slot 1, running in superpipe mode with 128 LUNs per port (256 LUNs in total).

  lsscsi
  [0:0:0:0]    disk    ATA      ST1000NX0313     BE33  /dev/sda
  [1:0:0:0]    disk    ATA      ST1000NX0313     BE33  /dev/sdb
  [4:0:0:0]    disk    NVMe     SAMSUNG MZ1LV960 3011  /dev/sdc
  [4:1:0:0]    disk    NVMe     SAMSUNG MZ1LV960 3011  /dev/sdd
  [5:0:0:0]    cd/dvd  AMI      Virtual CDROM0   1.00  /dev/sr0
  [5:0:0:1]    cd/dvd  AMI      Virtual CDROM1   1.00  /dev/sr1
  [5:0:0:2]    cd/dvd  AMI      Virtual CDROM2   1.00  /dev/sr2
  [5:0:0:3]    cd/dvd  AMI      Virtual CDROM3   1.00  /dev/sr3
  [6:0:0:0]    disk    AMI      Virtual Floppy0  1.00  /dev/sde
  [6:0:0:1]    disk    AMI      Virtual Floppy1  1.00  /dev/sdf
  [6:0:0:2]    disk    AMI      Virtual Floppy2  1.00  /dev/sdg
  [6:0:0:3]    disk    AMI      Virtual Floppy3  1.00  /dev/sdh
  [7:0:0:0]    disk    AMI      Virtual HDisk0   1.00  /dev/sdi
  [7:0:0:1]    disk    AMI      Virtual HDisk1   1.00  /dev/sdj
  [7:0:0:2]    disk    AMI      Virtual HDisk2   1.00  /dev/sdk
  [7:0:0:3]    disk    AMI      Virtual HDisk3   1.00  /dev/sdl
  [7:0:0:4]    disk    AMI      Virtual HDisk4   1.00  /dev/sdm

  lspci | grep -i acc
  0004:01:00.0 Processing accelerators: IBM Device 0601 (rev 01)

  ls -l /sys/class/cxl
  total 0
  lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0 -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0
  lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0m -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0/afu0.0m
  lrwxrwxrwx 1 root root 0 May 31 13:27 afu0.0s -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0/afu0.0/afu0.0s
  lrwxrwxrwx 1 root root 0 May 31 13:27 card0 -> ../../devices/pci0004:00/0004:00:00.0/0004:01:00.0/cxl/card0

  lscfg | grep afu
  + afu0.0           Slot1/card0/afu0.0
  + afu0.0m          Slot1/card0/afu0.0/afu0.0m
  + afu0.0s          Slot1/card0/afu0.0/afu0.0s

  /opt/ibm/capikv/bin/cxlfstatus
  CXL Flash Device Status

  Found 0601 0004:01:00.0 Slot1
      Device:       SCSI  Block       Mode                          LUN WWID
         sg2:    4:0:0:0,   sdc, superpipe, 60025380025382463300046000000000
         sg3:    4:1:0:0,   sdd, superpipe, 60025380025382463300052000000000

  dpkg -l | grep capi
  4el      no description given               3.0-1970-3042652                
ppc6
  4el      no description given               3.0-1970-3042652                
ppc6

  root@fsbmc30p1:/tmp# dpkg -l | grep afu
  ii  afuimage                                3.0-1970-3042652                all          no description given

  cat /opt/ibm/capikv/version.txt
  1970-3042652

  /opt/ibm/capikv/afu/cxl_afu_dump /dev/cxl/afu0.0m -v
  AFU Version     = 160525N1

   NVMe0 Version = BTV73011
   NVMe0 NEXT    = BTV73011
   NVMe0 STATUS  = 0x702

   NVMe1 Version = BTV73011
   NVMe1 NEXT    = BTV73011
   NVMe1 STATUS  = 0x702

  cat /tmp/test_lun_mode
  128

  Problem:
  ===========
  While running soft bootme (shutdown -r from the OS every hour), I noticed HTX
  errors after the 9th and 17th reboots of the partition. At this point they
  look like different issues, so I am opening two separate defects. I have
  already opened defect SW354759 for the first set of HTX errors and assigned
  it to htx_screen.

  This defect is for the issue that happened after the 17th reboot (Jun 1 @
  6am). On the 18th reboot (Jun 1 @ 7am), the shutdown -r command failed,
  and I had to power down the system manually.

  I will open this against surelock_screen first, since it looks similar to
  the defect Dion opened while running 128 virtual LUNs per port (defect
  http://w3.rchland.ibm.com/projects/bestquest/?defect=SW353881). In this
  failure, other exercisers eventually failed as well.

  Test Info:
  ============
  - running Soft bootme (shutdown -r every hour)
  - mdt.bu + hxecom (GPUs were running). I copied a modified mdt.bu to another mdt file so that I would not see any errors in HTX after reboot.

  Sample of HTX errors (for this defect)
  ==============================
  /dev/sg2.53       Jun  1 06:26:53 2016 err=00000010 sev=4 hxesurelock   
  READCMP5  numopers=     20000  loop=      4956  blk=0x4eee 
  len=      4096   offset=0   Seed Values= 37882, 44181, 50758 
  Data Pattern Seed Values = 37882, 44182, 50758    LBA Fencepost = 0xb94a
  cblk_read error - Device or resource busy

  /dev/sg2.18       Jun  1 06:26:53 2016 err=00000010 sev=4 hxesurelock   
  READCMP9  numopers=     20000  loop=      1501  blk=0x93f1 
  len=      4096   offset=0   Seed Values= 37847, 44740, 50780 
  Data Pattern Seed Values = 37847, 44741, 50780    LBA Fencepost = 0xb275
  cblk_read error - Device or resource busy

  /dev/sg2.98       Jun  1 06:26:53 2016 err=00000010 sev=4 hxesurelock   
  READCMP5  numopers=     20000  loop=     10365  blk=0x86d5 
  len=      4096   offset=0   Seed Values= 37927, 41320, 50710 
  Data Pattern Seed Values = 37927, 41321, 50710    LBA Fencepost = 0xbc7c
  cblk_read error - Device or resource busy

  /dev/sg2.116      Jun  1 06:30:45 2016 err=00000005 sev=4 hxesurelock   
  RDCMP10  numopers=     20000  loop=      6383  blk=0xc33d 
  len=      4096   offset=0   Seed Values= 37945, 49039, 50726 
  Data Pattern Seed Values = 37945, 49040, 50726    LBA Fencepost = 0xd0b0
  cblk_read error - Input/output error

  /dev/fpu17        Jun  1 06:30:51 2016 err=0000000b sev=1 hxefpu64      
  pthread_create call failed with rc: 11, errno: 11, Resource temporarily unavailable

  /dev/fpu17        Jun  1 06:30:51 2016 err=0000000b sev=1 hxefpu64      
  Hardware Exerciser stopped on an error

  /dev/sctu43       Jun  1 06:30:51 2016 err=0000000b sev=1 hxesctu       
  pthread_create call failed with rc: 11, errno: 11, Resource temporarily unavailable

  /dev/sctu43       Jun  1 06:30:51 2016 err=0000000b sev=1 hxesctu       
  Hardware Exerciser stopped on an error

  Logs:
  ======
  /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1

  /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/htxerr
  /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/syslog
  /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/kern.log
  /gsa/ausgsa/home/a/n/anitrap/web/public/fsbmc30/softbootme_fail_1/bootme.log

  sample of syslog during first htx error:
  ================================================
  Jun  1 06:19:20 fsbmc30p1 systemd[1]: Started Cleanup of Temporary Directories.
  Jun  1 06:25:01 fsbmc30p1 rsyslogd-2007: action 'action 10' suspended, next retry is Wed Jun  1 06:25:31 2016 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Jun  1 06:25:01 fsbmc30p1 CRON[99327]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
  Jun  1 06:26:53 fsbmc30p1 CXLBLK[37882]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
  Jun  1 06:26:53 fsbmc30p1 rsyslogd-2007: action 'action 10' suspended, next retry is Wed Jun  1 06:27:23 2016 [v8.16.0 try http://www.rsyslog.com/e/2007 ]
  Jun  1 06:26:53 fsbmc30p1 CXLBLK[37847]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
  Jun  1 06:26:53 fsbmc30p1 CXLBLK[37927]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0

  Jun  1 06:26:59 fsbmc30p1 CXLBLK[37961]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg3, chunk index = 0
  Jun  1 06:26:59 fsbmc30p1 CXLBLK[37954]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
  Jun  1 06:26:59 fsbmc30p1 CXLBLK[37887]: cflash_block_kern_mc.c,cblk_notify_mc_err,5504,LOG_EVENT reason 7 error_num = 0x607,for chunk->dev_name = /dev/sg2, chunk index = 0
  Jun  1 06:26:59 fsbmc30p1 kernel: [ 1378.248405] hrtimer: interrupt took 200250 ns

  sample from kern.log during fail:
  =================================
  Jun  1 06:08:11 fsbmc30p1 kernel: [  250.251041] nvidia-uvm: Loaded the UVM driver in lite mode, major device number 241
  Jun  1 06:26:59 fsbmc30p1 kernel: [ 1378.248405] hrtimer: interrupt took 200250 ns
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.764382] hxesurelock[40392]: unhandled signal 11 at 0000000000000024 nip 00003fff84602978 lr 00003fff84602974 code 30001
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.868242] Unable to handle kernel paging request for data at address 0x0000000c
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.868599] Faulting instruction address: 0xc00000000035e2b0
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.868865] Oops: Kernel access of bad area, sig: 11 [#1]
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.868928] SMP NR_CPUS=2048 NUMA PowerNV
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.868992] Modules linked in: nvidia_uvm(POE) iptable_filter ip_tables x_tables nvidia(POE) ipmi_devintf joydev input_leds mac_hid opal_prd ofpart cmdlinepart powernv_flash mtd at24 ipmi_powernv ipmi_msghandler uio_pdrv_genirq uio ibmpowernv powernv_rng binfmt_misc nfsd ib_iser auth_rpcgss rdma_cm iw_cm ib_cm nfs_acl ib_sa ib_mad lockd ib_core grace ib_addr sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear mlx4_en hid_generic usbhid hid uas usb_storage cxlflash ast bnx2x i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm cxl vxlan mlx4_core ahci ip6_udp_tunnel udp_tunnel libahci mdio libcrc32c
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870299] CPU: 80 PID: 40392 Comm: hxesurelock Tainted: P           OE   4.4.8c0ffee0+ #2
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870379] task: c000007935fe23a0 ti: c000007910810000 task.ti: c000007910810000
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870476] NIP: c00000000035e2b0 LR: c00000000035e280 CTR: 0000000000000000
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870552] REGS: c0000079108135e0 TRAP: 0300   Tainted: P           OE    (4.4.8c0ffee0+)
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870642] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28053988  XER: 00000000
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] CFAR: c000000000008468 DAR: 000000000000000c DSISR: 40000000 SOFTE: 1
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR00: c00000000035e280 c000007910813860 c000000001594600 0000000000000000
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR04: c000007823192400 000000000002574f 0000000000000001 0000000000000000
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR08: c0000079241b8a00 0000000000000000 00000000000044fb 65776f702f62696c
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR12: 2d656c3436637072 c00000000fb6f800 00000000464c457f 0000000000010c78
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR16: 0000000000000000 0000000000000039 d000000034fa04c5 0000000000010000
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR20: 00000000000000cd 0000000000000550 0000000000010000 00000000039e0000
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR24: 00003fffffffffff c000007910813af8 c000007823192600 c00000793f57b980
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.870852] GPR28: c00000793f573e80 00003fffffffffff 000000000000001f c000007926f29790
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872149] NIP [c00000000035e2b0] elf_core_dump+0xd60/0x1300
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872277] LR [c00000000035e280] elf_core_dump+0xd30/0x1300
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872351] Call Trace:
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872407] [c000007910813860] [c00000000035e280] elf_core_dump+0xd30/0x1300 (unreliable)
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872527] [c000007910813a60] [c00000000036898c] do_coredump+0xcec/0x11e0
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872625] [c000007910813c20] [c0000000000ce7a0] get_signal+0x540/0x7b0
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872705] [c000007910813d10] [c000000000017344] do_signal+0x54/0x2b0
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872785] [c000007910813e00] [c00000000001776c] do_notify_resume+0xbc/0xd0
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872877] [c000007910813e30] [c000000000009838] ret_from_except_lite+0x64/0x68
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.872963] Instruction dump:
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.873004] 60000000 2fa30000 409effa8 e95f0050 39200000 794737e3 4082ffa4 e91f00a0
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.873148] 2fa80000 419e002c e92800f8 e9290000 <8129000c> 79279fe3 41820018 7948efe3
  Jun  1 06:28:16 fsbmc30p1 kernel: [ 1454.884655] ---[ end trace f8abb6e0d0322daa ]---

  gsave info: 
  ==============
  GSA Location: /gsa/ausgsa/projects/s/sift/hst/trial_data/Surelock/Ubuntu/flashgt/fsbmc30p1_ubuntu1604_FlashGT_bootme_test5/FAIL201606011024

  <===== This is from RTC side description =====>
  See the Discussion field for the initial comments from CQ.
  </===== This is from RTC side description =====>
  ==== State: Open by: mpvageli on 02 June 2016 14:20:06 ====

   Oops: Kernel access of bad area, sig: 11 [#1]

  # ipmitool -H 127.0.0.1 -I lanplus -U ADMIN -P admin fru list 47
  Product Name          : OpenPOWER Firmware
  Product Version       : IBM-firestone-ibm-OP8_v1.7_1.62
  Product Extra         : hostboot-bc98d0b-1a29dff
  Product Extra         : occ-0362706-16fdfa7
  Product Extra         : skiboot-5.1.13
  Product Extra         : hostboot-binaries-43d5a59
  Product Extra         : firestone-xml-e7b4fa2-c302f0e
  Product Extra         : capp-ucode-105cb8f

  == Comment: #9 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-06-07 12:04:49 ==
  root@fsbmc30p1:~# lsb_release -a
  No LSB modules are available.
  Distributor ID:       Ubuntu
  Description:  Ubuntu 16.04 LTS
  Release:      16.04
  Codename:     xenial
  root@fsbmc30p1:~# cat /etc/*release
  DISTRIB_ID=Ubuntu
  DISTRIB_RELEASE=16.04
  DISTRIB_CODENAME=xenial
  DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"
  NAME="Ubuntu"
  VERSION="16.04 LTS (Xenial Xerus)"
  ID=ubuntu
  ID_LIKE=debian
  PRETTY_NAME="Ubuntu 16.04 LTS"
  VERSION_ID="16.04"
  HOME_URL="http://www.ubuntu.com/"
  SUPPORT_URL="http://help.ubuntu.com/"
  BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
  UBUNTU_CODENAME=xenial
  root@fsbmc30p1:~# uname -a
  Linux fsbmc30p1 4.4.8c0ffee0+ #2 SMP Tue May 24 10:50:26 CDT 2016 ppc64le ppc64le ppc64le GNU/Linux
  root@fsbmc30p1:~#

  == Comment: #24 - VIPIN K. PARASHAR <vipar...@in.ibm.com> - 2016-07-07 07:14:05 ==
  From kernel logs
  ===========

  [ 7087.918089] device enP3p5s0f2 left promiscuous mode
  [ 8801.190528] cxlflash 0007:00:00.0: send_tmf: TMF timed out!
  [ 8806.190383] cxlflash 0007:00:00.0: send_tmf: TMF timed out!
  [ 8816.507485] hxesurelock[14180]: unhandled signal 11 at 0000000000000024 nip 00003fff852c2ee8 lr 00003fff852c2938 code 30001
  [ 8816.511368] hxesurelock[13501]: unhandled signal 11 at 0000000000000024 nip 00003fff890b2ee8 lr 00003fff890b2938 code 30001
  [ 8816.526807] Unable to handle kernel paging request for data at address 0x0000000c
  [ 8816.526928] Faulting instruction address: 0xc00000000035e2b0
  [ 8816.530233] Unable to handle kernel paging request for data at address 0x0000000c
  [ 8816.530596] Faulting instruction address: 0xc00000000035e2b0
  3f:mon> t
  [c000000686a13a60] c00000000036898c do_coredump+0xcec/0x11e0
  [c000000686a13c20] c0000000000ce7a0 get_signal+0x540/0x7b0
  [c000000686a13d10] c000000000017344 do_signal+0x54/0x2b0
  [c000000686a13e00] c00000000001776c do_notify_resume+0xbc/0xd0
  [c000000686a13e30] c000000000009838 ret_from_except_lite+0x64/0x68
  --- Exception: 300 (Data Access) at 00003fff890b2ee8
  SP (3fff83c2c490) is in userspace
  3f:mon> r
  R00 = c00000000035e280   R16 = 0000000000000000
  R01 = c000000686a13860   R17 = 0000000000000042
  R02 = c000000001594600   R18 = d000000021b104fa
  R03 = 0000000000000000   R19 = 0000000000010000
  R04 = c000002fb7463400   R20 = 00000000000000cd
  R05 = 00000000000001bf   R21 = 0000000000000628
  R06 = 0000000000000001   R22 = 0000000000010000
  R07 = 0000000000000000   R23 = 0000000000250000
  R08 = c00000281af21500   R24 = 00003fffffffffff
  R09 = 0000000000000000   R25 = c000000686a13af8
  R10 = 00000000000044fb   R26 = c000002fb7463800
  R11 = 6c2d656c34366370   R27 = c000002ff0e05cc0
  R12 = 756e672d78756e69   R28 = c000002ff0e05c40
  R13 = c00000000fb65680   R29 = 00003fffffffffff
  R14 = 00000000464c457f   R30 = 0000000000000016
  R15 = 0000000000010e70   R31 = c000002fb94bd3b8
  pc  = c00000000035e2b0 elf_core_dump+0xd60/0x1300
  cfar= c000000000008468 slb_miss_realmode+0x50/0x78
  lr  = c00000000035e280 elf_core_dump+0xd30/0x1300
  msr = 9000000100009033   cr  = 28053828
  ctr = 0000000000000000   xer = 0000000000000000   trap =  300
  dar = 000000000000000c   dsisr = 40000000
  3f:mon> 

  
  The hxesurelock process segfaulted, and the kernel crashed while dumping
  its core.
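
  The faulting address in the oops can be cross-checked by hand. A minimal
  sketch, assuming the standard Power ISA D-form encoding (6-bit primary
  opcode, 5-bit RT, 5-bit RA, 16-bit signed displacement D): decoding the
  faulting word <8129000c> from the "Instruction dump" gives lwz r9,12(r9).
  Both register dumps show r9 = 0, so the load targets 0 + 12 = 0xc, which
  matches the reported DAR; elf_core_dump() dereferenced a NULL pointer at
  field offset 12. The helper name decode_dform is hypothetical and not part
  of any kernel or IBM tool.

```python
# Hypothetical helper: split a 32-bit Power ISA D-form instruction word.
def decode_dform(word):
    """Return (primary opcode, RT, RA, signed D) for a D-form instruction."""
    opcode = (word >> 26) & 0x3F  # bits 0-5 (IBM bit numbering)
    rt = (word >> 21) & 0x1F      # bits 6-10: target register
    ra = (word >> 16) & 0x1F      # bits 11-15: base register
    d = word & 0xFFFF             # bits 16-31: displacement
    if d & 0x8000:                # sign-extend the 16-bit displacement
        d -= 0x10000
    return opcode, rt, ra, d


LWZ_OPCODE = 32  # primary opcode of lwz (load word and zero)

# Faulting word marked <...> in the oops "Instruction dump".
opcode, rt, ra, d = decode_dform(0x8129000C)
assert opcode == LWZ_OPCODE
print(f"lwz r{rt},{d}(r{ra})")  # lwz r9,12(r9)

# r9 was 0 in both register dumps, so EA = r9 + D (64-bit wraparound).
r9 = 0x0
ea = (r9 + d) & 0xFFFFFFFFFFFFFFFF
print(hex(ea))  # 0xc, the reported DAR
```

  The same decoder applied to the preceding word e9290000 yields primary
  opcode 58 (ld), consistent with r9 having been loaded from memory just
  before the faulting lwz.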

  == Comment: #87 - Frederic Barrat <frederic.bar...@fr.ibm.com> - 2017-02-21 11:50:40 ==
  Fix is in kernel v4.10:
  bdecf76e319a29735d828575f4a9269f0e17c547
  "cxl: Fix coredump generation when cxl_get_fd() is used"

  We'd like to have it backported to 16.10 and 16.04 LTS.


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
