I've made some good progress here.

I found that older versions such as 4.19 work, so I ran git bisect. I'm
still doing the final check, but it looks like the series that causes
the issue is the one containing these commits:

d53d2f78cead bpf: Use vmalloc special flag
1a7b7d922081 modules: Use vmalloc special flag
868b104d7379 mm/vmalloc: Add flag for freeing of special permsissions

In particular:

commit 868b104d7379e28013e9d48bdd2db25e0bdcf751 (HEAD)
Author: Rick Edgecombe <rick.p.edgeco...@intel.com>
Date:   Thu Apr 25 17:11:36 2019 -0700

    mm/vmalloc: Add flag for freeing of special permsissions
    
    Add a new flag VM_FLUSH_RESET_PERMS, for enabling vfree operations to
    immediately clear executable TLB entries before freeing pages, and handle
    resetting permissions on the directmap. This flag is useful for any kind
    of memory with elevated permissions, or where there can be related
    permissions changes on the directmap. Today this is RO+X and RO memory.
    
    Although this enables directly vfreeing non-writeable memory now,
    non-writable memory cannot be freed in an interrupt because the allocation
    itself is used as a node on deferred free list. So when RO memory needs to
    be freed in an interrupt the code doing the vfree needs to have its own
    work queue, as was the case before the deferred vfree list was added to
    vmalloc.
    
    For architectures with set_direct_map_ implementations this whole operation
    can be done with one TLB flush when centralized like this. For others with
    directmap permissions, currently only arm64, a backup method using
    set_memory functions is used to reset the directmap. When arm64 adds
    set_direct_map_ functions, this backup can be removed.
    
    When the TLB is flushed to both remove TLB entries for the vmalloc range
    mapping and the direct map permissions, the lazy purge operation could be
    done to try to save a TLB flush later. However today vm_unmap_aliases
    could flush a TLB range that does not include the directmap. So a helper
    is added with extra parameters that can allow both the vmalloc address and
    the direct mapping to be flushed during this operation. The behavior of the
    normal vm_unmap_aliases function is unchanged.

and

commit d53d2f78ceadba081fc7785570798c3c8d50a718
Author: Rick Edgecombe <rick.p.edgeco...@intel.com>
Date:   Thu Apr 25 17:11:38 2019 -0700

    bpf: Use vmalloc special flag
    
    Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special
    permissioned memory in vmalloc and remove places where memory was set RW
    before freeing which is no longer needed. Don't track if the memory is RO
    anymore because it is now tracked in vmalloc.
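
The first commit message notes that non-writable memory cannot be freed
from an interrupt because "the allocation itself is used as a node on
deferred free list". To make that constraint concrete, here is a minimal
userspace sketch in C. It is only an analogy, not kernel code: the names
free_node, deferred_free and drain_deferred are hypothetical, and malloc/free
stand in for vmalloc/vfree. The point is that queuing a buffer writes a
pointer into the buffer itself, so the buffer must still be writable at
that moment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace analogy of a deferred free list; not kernel code. */
struct free_node {
    struct free_node *next;
};

/* Head of the list of buffers waiting to be freed later. */
static struct free_node *deferred_list;

/*
 * Queue a buffer for later freeing. The list node is stored inside the
 * buffer itself, so this function writes to the buffer. If the buffer
 * were mapped read-only, this store would fault.
 */
static void deferred_free(void *buf)
{
    struct free_node *node = buf;

    assert(buf != NULL);
    node->next = deferred_list;   /* write into the memory being freed */
    deferred_list = node;
}

/* Free every queued buffer; returns how many buffers were released. */
static int drain_deferred(void)
{
    int freed = 0;

    while (deferred_list != NULL) {
        struct free_node *node = deferred_list;

        deferred_list = node->next;
        free(node);
        freed++;
    }
    return freed;
}
```

In this sketch, a caller in "interrupt context" would call deferred_free()
and a worker would later call drain_deferred(). Memory that has been made
read-only cannot go through deferred_free() as written, which is why the
commit message says such callers need their own work queue.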


This series is _squarely_ in "subtly breaks under the hash MMU" territory.

Hopefully this is enough to get some Power MMU experts to weigh in. I
will keep working on it.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1927076

Title:
  IPv6 TCP in reuseport_bpf_cpu from ubuntu_kernel_selftests/net crash
  P8 node entei (Oops: Exception in kernel mode, sig: 4 [#1])

Status in ubuntu-kernel-tests:
  New
Status in The Ubuntu-power-systems project:
  Confirmed
Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Focal:
  Confirmed
Status in linux source package in Hirsute:
  Confirmed

Bug description:
  It looks like our P8 node "entei" tends to fail with the IPv6 TCP test
  from reuseport_bpf_cpu in ubuntu_kernel_selftests/net on 5.8 kernels:

   # send cpu 119, receive socket 119
   # send cpu 121, receive socket 121
   # send cpu 123, receive socket 123
   # send cpu 125, receive socket 125
   # send cpu 127, receive socket 127
   # ---- IPv6 TCP ----
  publish-job-status: using request.json

  It failed silently here; this can be reproduced 100% of the time with
  Groovy 5.8 and Focal 5.8.

  This interrupts ubuntu_kernel_selftests, so the results of the other
  tests cannot be processed for our results page.

  Please find attached the complete "net" test result on this node
  with Groovy 5.8.0-52.59.

  Adding the kqa-blocker tag, as this might need to be manually verified.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-kernel-tests/+bug/1927076/+subscriptions
