Hello:

On 06/06/2017 01:09 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Jun 06, 2017 at 11:18:36AM -0500, Michael Bringmann wrote:
>> On 05/25/2017 10:30 AM, Michael Bringmann wrote:
>>> I will try that patch shortly.  I also updated my patch to be
>>> conditional on whether the pool's cpumask attribute was empty.  You
>>> should have received V2 of that patch by now.
>>
>> Let's try this again.
>>
>> The hotplug problem goes away with the changes that you provided
>> earlier, and
>
> So, that means we're ending up in situations where NUMA online is a
> proper superset of NUMA possible.
>
>> shown in the patch below.  I kept this change to get_unbound_pool()
>> as a just-in-case, to explain the crash in the event that it occurs
>> again:
>>
>>	if (!cpumask_weight(pool->attrs->cpumask))
>>		cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
>>
>> I could also insert
>>
>>	BUG_ON(cpumask_empty(pool->attrs->cpumask));
>>
>> at that place, but I really prefer not to crash the system if there
>> is a workaround.
>
> I'm not sure because it doesn't make any logical sense and it's not
> right in terms of correctness.  The above would be able to enable CPUs
> which are explicitly excluded from a workqueue.  The only fallback
> which makes sense is falling back to the default pwq.
What would that look like?  Are you sure that would always be valid in
a system that is hot-adding and hot-removing CPUs?

>>> Can you please post the messages with the debug patch from the prev
>>> thread?  In fact, let's please continue on that thread.  I'm having
>>> a hard time following what's going wrong with the code.
>>
>> Are these the failure logs that you requested?
>>
>> Red Hat Enterprise Linux Server 7.3 (Maipo)
>> Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le
>>
>> ltcalpine2-lp20 login: root
>> Password:
>> Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
>> [root@ltcalpine2-lp20 ~]# numactl -H
>> available: 2 nodes (0,6)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>> 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
>> 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
>> node 6 size: 19858 MB
>> node 6 free: 16920 MB
>> node distances:
>> node   0   6
>>   0:  10  40
>>   6:  40  10
>> [root@ltcalpine2-lp20 ~]# numactl -H
>> available: 2 nodes (0,6)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>> 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
>> 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
>> 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
>> 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
>> 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
>> 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155
>> 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174
>> 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191
>> node 6 size: 19858 MB
>> node 6 free: 16362 MB
>> node distances:
>> node   0   6
>>   0:  10  40
>>   6:  40  10
>> [root@ltcalpine2-lp20 ~]#
>> [ 321.310943] workqueue:get_unbound_pool has empty cpumask for pool attrs
>> [ 321.310961] ------------[ cut here ]------------
>> [ 321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 alloc_unbound_pwq+0x5c0/0x5e0
>> [ 321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>> [ 321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
>> [ 321.311106] task: c000000408961080 task.stack: c000000406394000
>> [ 321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 0000000000000000
>> [ 321.311121] REGS: c0000004063977b0 TRAP: 0700 Not tainted (4.12.0-rc1.wi91275_debug_03.ppc64le+)
>> [ 321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
>> [ 321.311150] CR: 28000082  XER: 00000000
>> [ 321.311159] CFAR: c000000000a2dc80 SOFTE: 1
>> [ 321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 000000000000003b
>> [ 321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 ffffffffa4a5a483
>> [ 321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 c000000408961a38
>> [ 321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 c00000040a902ac0
>> [ 321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [ 321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 0000000000000000
>> [ 321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 c0000000015a4c88
>> [ 321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 c0000004fe052800
>> [ 321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
>> [ 321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
>> [ 321.311305] Call Trace:
>> [ 321.311310] [c000000406397a30] [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
>> [ 321.311323] [c000000406397ad0] [c000000000116e30] wq_update_unbound_numa+0x190/0x270
>> [ 321.311334] [c000000406397b60] [c000000000118eb0] workqueue_offline_cpu+0xe0/0x130
>> [ 321.311345] [c000000406397bf0] [c0000000000e9f20] cpuhp_invoke_callback+0x240/0xcd0
>> [ 321.311355] [c000000406397cb0] [c0000000000eab28] cpuhp_down_callbacks+0x78/0xf0
>> [ 321.311365] [c000000406397d00] [c0000000000eae6c] cpuhp_thread_fun+0x18c/0x1a0
>> [ 321.311376] [c000000406397d30] [c0000000001251cc] smpboot_thread_fn+0x2fc/0x3b0
>> [ 321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
>> [ 321.311397] [c000000406397e30] [c00000000000b4f4] ret_from_kernel_thread+0x5c/0x68
>> [ 321.311406] Instruction dump:
>> [ 321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c 3863d590
>> [ 321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70 60000000 60420000
>
> The only way offlining can lead to this failure is when wq numa
> possible cpu mask is a proper subset of the matching online mask.  Can
> you please print out the numa online cpu and wq_numa_possible_cpumask
> masks and verify that online stays within the possible for each node?
> If not, the ppc arch init code needs to be updated so that cpu <->
> node binding is established for all possible cpus on boot.  Note that
> this isn't a requirement coming solely from wq.  All node affine (thus
> percpu) allocations depend on that.

The ppc arch init code already records all nodes used by the CPUs
visible in the device tree at boot time into the possible and online
node bindings.  The problem here occurs when we hot-add new CPUs to
the powerpc system -- they may require nodes that are mentioned by the
VPHN hcall, but which were not used at boot time.
I will run a test that dumps these masks later this week to try to
provide the information that you are interested in.  Right now we are
having a discussion on another thread as to how to properly set the
possible node mask at boot, given that there is no mechanism to
hot-add nodes to the system.  The latest idea appears to be adding
another property or two to define the maximum number of nodes that
should be added to the possible / online node masks to allow for
dynamic growth after boot.

> Thanks.

Thanks.

--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:     (512) 466-0650
[email protected]

