Hello:

On 06/06/2017 01:09 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jun 06, 2017 at 11:18:36AM -0500, Michael Bringmann wrote:
>> On 05/25/2017 10:30 AM, Michael Bringmann wrote:
>>> I will try that patch shortly.  I also updated my patch to be
>>> conditional on whether the pool's cpumask attribute was empty.  You
>>> should have received V2 of that patch by now.
>>
>> Let's try this again.
>>
>> The hotplug problem goes away with the changes that you provided
>> earlier, and
>
> So, that means we're ending up in situations where NUMA online is a
> proper superset of NUMA possible.
>
>> shown in the patch below.  I kept this change to get_unbound_pool()
>> as a just-in-case, to explain the crash in the event that it occurs
>> again:
>>
>>	if (!cpumask_weight(pool->attrs->cpumask))
>>		cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
>>
>> I could also insert
>>
>>	BUG_ON(cpumask_empty(pool->attrs->cpumask));
>>
>> at that place, but I really prefer not to crash the system if there
>> is a workaround.
>
> I'm not sure because it doesn't make any logical sense and it's not
> right in terms of correctness.  The above would be able to enable CPUs
> which are explicitly excluded from a workqueue.  The only fallback
> which makes sense is falling back to the default pwq.
What would that look like?  Are you sure that would always be valid in
a system that is hot-adding and hot-removing CPUs?

>>> Can you please post the messages with the debug patch from the prev
>>> thread?  In fact, let's please continue on that thread.  I'm having
>>> a hard time following what's going wrong with the code.
>>
>> Are these the failure logs that you requested?
>>
>> Red Hat Enterprise Linux Server 7.3 (Maipo)
>> Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le
>>
>> ltcalpine2-lp20 login: root
>> Password:
>> Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
>> [root@ltcalpine2-lp20 ~]# numactl -H
>> available: 2 nodes (0,6)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>> 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
>> 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
>> node 6 size: 19858 MB
>> node 6 free: 16920 MB
>> node distances:
>> node   0   6
>>   0:  10  40
>>   6:  40  10
>> [root@ltcalpine2-lp20 ~]# numactl -H
>> available: 2 nodes (0,6)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>> 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
>> 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
>> 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
>> 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
>> 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
>> 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155
>> 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174
>> 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
>> node 6 size: 19858 MB
>> node 6 free: 16362 MB
>> node distances:
>> node   0   6
>>   0:  10  40
>>   6:  40  10
>> [root@ltcalpine2-lp20 ~]#
>> [ 321.310943] workqueue:get_unbound_pool has empty cpumask for pool attrs
>> [ 321.310961] ------------[ cut here ]------------
>> [ 321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 alloc_unbound_pwq+0x5c0/0x5e0
>> [ 321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>> [ 321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
>> [ 321.311106] task: c000000408961080 task.stack: c000000406394000
>> [ 321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 0000000000000000
>> [ 321.311121] REGS: c0000004063977b0 TRAP: 0700 Not tainted (4.12.0-rc1.wi91275_debug_03.ppc64le+)
>> [ 321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
>> [ 321.311150] CR: 28000082  XER: 00000000
>> [ 321.311159] CFAR: c000000000a2dc80 SOFTE: 1
>> [ 321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 000000000000003b
>> [ 321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 ffffffffa4a5a483
>> [ 321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 c000000408961a38
>> [ 321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 c00000040a902ac0
>> [ 321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 0000000000000000
>> [ 321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 c0000000015a4c88
>> [ 321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 c0000004fe052800
>> [ 321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
>> [ 321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
>> [ 321.311305] Call Trace:
>> [ 321.311310] [c000000406397a30] [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
>> [ 321.311323] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
>> [ 321.311334] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
>> [ 321.311345] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
>> [ 321.311355] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
>> [ 321.311365] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
>> [ 321.311376] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
>> [ 321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
>> [ 321.311397] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
>> [ 321.311406] Instruction dump:
>> [ 321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c 3863d590
>> [ 321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70 60000000 60420000
>
> The only way offlining can lead to this failure is when wq numa
> possible cpu mask is a proper subset of the matching online mask.  Can
> you please print out the numa online cpu and wq_numa_possible_cpumask
> masks and verify that online stays within the possible for each node?
> If not, the ppc arch init code needs to be updated so that cpu <->
> node binding is established for all possible cpus on boot.  Note that
> this isn't a requirement coming solely from wq.  All node affine (thus
> percpu) allocations depend on that.

The ppc arch init code already records all nodes used by the CPUs
visible in the device tree at boot time into the possible and online
node bindings.  The problem here occurs when we hot-add new CPUs to
the powerpc system -- they may require nodes that are mentioned by the
VPHN hcall, but which were not used at boot time.
I will run a test that dumps these masks later this week to try to
provide the information that you are interested in.  Right now we are
having a discussion on another thread as to how to properly set the
possible node mask at boot, given that there is no mechanism to
hot-add nodes to the system.  The latest idea appears to be adding
another property or two to define the maximum number of nodes that
should be added to the possible / online node masks to allow for
dynamic growth after boot.

> Thanks.

Thanks.

--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:     (512) 466-0650
[email protected]

