https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105739
--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
After inlining I see:

IPA function summary for rcu_tasks_trace_pertask/5350 inlinable
  global time: 13.535950
  self size: 11
  global size: 16
  min size: 11
  self stack: 0
  global stack: 0
  estimated growth:5
    size:8.000000, time:5.807250
    size:3.000000, time:2.000000, executed if:(not inlined)
    size:2.000000, time:2.000000, nonconst if:(op0 changed)
  calls:
    rcu_tasks_trace_pertask.part.0/5788 inlined
      freq:0.62
      Stack frame offset 0, callee self size 0
      trc_wait_for_one_reader/5799 inlined
        freq:0.62
        Stack frame offset 0, callee self size 0
        __builtin_unreachable/5800 unreachable
          freq:0.00 loop depth: 0 size: 0 time: 0 predicate: (false)
          op0 is compile time invariant
          op1 is compile time invariant
        trc_wait_for_one_reader.part.0/5784 --param max-inline-insns-auto limit reached
          freq:0.31 loop depth: 0 size: 3 time: 12 callee size:100 stack: 0
    __builtin_expect/5421 function body not available
      freq:1.00 loop depth: 0 size: 0 time: 0
      op1 is compile time invariant

So it seems that we determine that the call in trc_wait_for_one_reader is unreachable.
It originally calls printk:

IPA function summary for trc_wait_for_one_reader/5348 inlinable
  global time: 13.500000
  self size: 20
  global size: 20
  min size: 1
  self stack: 0
  global stack: 0
    size:1.000000, time:1.000000
    size:3.000000, time:2.000000, executed if:(not inlined)
    size:0.500000, time:0.500000, executed if:(not inlined), nonconst if:(op0[ref offset: 8672] changed) && (not inlined)
    size:2.500000, time:2.500000, nonconst if:(op0[ref offset: 8672] changed)
    size:1.500000, time:0.250000, executed if:(op0[ref offset: 8672] != -1) && (not inlined)
    size:3.500000, time:1.250000, executed if:(op0[ref offset: 8672] != -1)
  calls:
    trc_wait_for_one_reader.part.0/5784 function not considered for inlining
      freq:0.50 loop depth: 0 size: 3 time: 12 callee size:55 stack: 0 predicate: (op0[ref offset: 8672] == -1)
    _printk/5452 function body not available
      freq:0.00 loop depth: 0 size: 5 time: 14 predicate: (op0[ref offset: 8672] != -1)
      op0 is compile time invariant

So we somehow conclude that (op0[ref offset: 8672] != -1) is false here:

Enqueueing calls in rcu_tasks_trace_pertask.part.0/5788.
   Estimating body: trc_wait_for_one_reader/5348
   Known to be false: not inlined, op0[ref offset: 8672] != -1, op0[ref offset: 8672] changed
   size:4 time:7.000000 nonspec time:12.000000

This seems to be based on the jump functions:

Jump functions:
  Jump functions of caller rcu_tasks_trace_pertask.part.0/5788:
    callsite rcu_tasks_trace_pertask.part.0/5788 -> trc_wait_for_one_reader/5348 :
       param 0: PASS THROUGH: 0, op nop_expr
         Aggregate passed by reference:
           offset: 8672, type: int, CONST: -1
         value: 0x0, mask: 0xffffffffffffffff
         Unknown VR
       param 1: PASS THROUGH: 1, op nop_expr
         value: 0x0, mask: 0xffffffffffffffff
         Unknown VR

which seems correct:

rcu_tasks_trace_pertask.part.0 (struct task_struct * t, struct list_head * hop)
{
  struct task_struct * D.58527;
  u64 pfo_val__;
  _Bool _1;
  long int _2;
  long int _3;
  struct task_struct * _14;

  <bb 4> [local count: 1073741824]:

  <bb 2> [local count: 1073741824]:
  __asm__ __volatile__("" : : : "memory");
  MEM[(volatile u8 *)t_1(D) + 1089B] ={v} 0;
  __asm__ __volatile__("lock; addl $0,-4(%%rsp)" : : : "cc", "memory");
  t_1(D)->trc_ipi_to_cpu = -1;
  trc_wait_for_one_reader (t_1(D), hop_2(D));

  <bb 3> [local count: 1073741824]:
  return;

}

So there really is a store to trc_ipi_to_cpu. The callee uses it as:

  _60 ={v} t_59(D)->trc_ipi_to_cpu;
  __asm__ __volatile__("" : : : "memory");
  if (_60 != -1)
    goto <bb 3>; [50.00%]
  else
    goto <bb 6>; [50.00%]

The load is volatile ({v}), while the store in the caller is a plain one. So the bug may be that we ignore the volatile flag when determining IPA predicates? The problem goes away with -fno-partial-inlining.
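The pattern in the two GIMPLE fragments above can be sketched in C roughly as follows. This is a hypothetical reduction, not the kernel source: struct task, read_once_int, wait_for_one_reader, pertask and in_flight_reports are illustrative names, and __asm__ __volatile__ is the GCC extension that appears in the dump. The caller does a plain store of -1 and then calls the callee, which re-reads the field through a volatile access followed by a compiler barrier. Whether a particular GCC version actually miscompiles this exact reduction is not established here; it only shows the shape of the code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct task_struct: only the field from the
 * dump is modelled. -1 means "no IPI in flight". */
struct task {
    int trc_ipi_to_cpu;
};

/* Counts how often the "IPI still in flight" branch runs; it stands in
 * for the _printk() call whose edge became __builtin_unreachable. */
static int in_flight_reports;

/* READ_ONCE()-style load: the volatile access tells the compiler the value
 * may change behind its back, so it must not be replaced by a value that
 * the caller stored earlier. */
static int read_once_int(const int *p)
{
    return *(const volatile int *)p;
}

/* Mirrors the callee's GIMPLE: volatile load, compiler barrier, compare. */
static void wait_for_one_reader(struct task *t)
{
    assert(t != NULL);
    int cpu = read_once_int(&t->trc_ipi_to_cpu);
    __asm__ __volatile__("" : : : "memory");
    if (cpu != -1) {
        /* This branch must remain reachable even after inlining into pertask(). */
        in_flight_reports++;
        printf("IPI to CPU %d still in flight\n", cpu);
        return;
    }
    /* ... remainder of the check (what partial inlining splits into .part.0) ... */
}

/* Mirrors the caller's GIMPLE: plain store of -1, then the call. */
static void pertask(struct task *t)
{
    t->trc_ipi_to_cpu = -1;
    wait_for_one_reader(t);
}
```

Called on a task whose field holds 5, wait_for_one_reader() takes the reporting branch. pertask() stores -1 first, so in a single-threaded run the volatile load reads -1 and nothing is reported. The volatile access exists because the compiler may not assume that: in the kernel another CPU can change the field, so a predicate derived from the plain store should not be allowed to turn the reporting call into __builtin_unreachable.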