https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105739

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
After inlining I see:
IPA function summary for rcu_tasks_trace_pertask/5350 inlinable                 
  global time:     13.535950                                                    
  self size:       11                                                           
  global size:     16                                                           
  min size:       11                                                            
  self stack:      0                                                            
  global stack:    0                                                            
  estimated growth:5                                                            
    size:8.000000, time:5.807250                                                
    size:3.000000, time:2.000000,  executed if:(not inlined)                    
    size:2.000000, time:2.000000,  nonconst if:(op0 changed)                    
  calls:                                                                        
    rcu_tasks_trace_pertask.part.0/5788 inlined                                 
      freq:0.62                                                                 
      Stack frame offset 0, callee self size 0                                  
      trc_wait_for_one_reader/5799 inlined                                      
        freq:0.62                                                               
        Stack frame offset 0, callee self size 0                                
        __builtin_unreachable/5800 unreachable                                  
          freq:0.00 loop depth: 0 size: 0 time:  0 predicate: (false)           
           op0 is compile time invariant                                        
           op1 is compile time invariant                                        
        trc_wait_for_one_reader.part.0/5784 --param max-inline-insns-auto limit reached
          freq:0.31 loop depth: 0 size: 3 time: 12 callee size:100 stack: 0     
    __builtin_expect/5421 function body not available                           
      freq:1.00 loop depth: 0 size: 0 time:  0                                  
       op1 is compile time invariant                                            

So it seems that we determine the call in trc_wait_for_one_reader to be unreachable.
It originally calls printk:
IPA function summary for trc_wait_for_one_reader/5348 inlinable                 
  global time:     13.500000                                                    
  self size:       20                                                           
  global size:     20                                                           
  min size:       1                                                             
  self stack:      0                                                            
  global stack:    0                                                            
    size:1.000000, time:1.000000                                                
    size:3.000000, time:2.000000,  executed if:(not inlined)                    
    size:0.500000, time:0.500000,  executed if:(not inlined),  nonconst if:(op0[ref offset: 8672] changed) && (not inlined)
    size:2.500000, time:2.500000,  nonconst if:(op0[ref offset: 8672] changed)  
    size:1.500000, time:0.250000,  executed if:(op0[ref offset: 8672] != -1) && (not inlined)
    size:3.500000, time:1.250000,  executed if:(op0[ref offset: 8672] != -1)    
  calls:                                                                        
    trc_wait_for_one_reader.part.0/5784 function not considered for inlining    
      freq:0.50 loop depth: 0 size: 3 time: 12 callee size:55 stack: 0 predicate: (op0[ref offset: 8672] == -1)
    _printk/5452 function body not available                                    
      freq:0.00 loop depth: 0 size: 5 time: 14 predicate: (op0[ref offset: 8672] != -1)
       op0 is compile time invariant                                            

So we somehow decide that (op0[ref offset: 8672] != -1) is known to be false here:
Enqueueing calls in rcu_tasks_trace_pertask.part.0/5788.                        
   Estimating body: trc_wait_for_one_reader/5348                                
   Known to be false: not inlined, op0[ref offset: 8672] != -1, op0[ref offset: 8672] changed
   size:4 time:7.000000 nonspec time:12.000000                                  

This seems to be based on:

Jump functions:                                                                 
  Jump functions of caller  rcu_tasks_trace_pertask.part.0/5788:                
    callsite  rcu_tasks_trace_pertask.part.0/5788 -> trc_wait_for_one_reader/5348 :
       param 0: PASS THROUGH: 0, op nop_expr                                    
         Aggregate passed by reference:                                         
           offset: 8672, type: int, CONST: -1                                   
         value: 0x0, mask: 0xffffffffffffffff                                   
         Unknown VR                                                             
       param 1: PASS THROUGH: 1, op nop_expr                                    
         value: 0x0, mask: 0xffffffffffffffff                                   
         Unknown VR                                                             

Which seems correct:
rcu_tasks_trace_pertask.part.0 (struct task_struct * t, struct list_head * hop) 
{                                                                               
  struct task_struct * D.58527;                                                 
  u64 pfo_val__;                                                                
  _Bool _1;                                                                     
  long int _2;                                                                  
  long int _3;                                                                  
  struct task_struct * _14;                                                     

  <bb 4> [local count: 1073741824]:                                             

  <bb 2> [local count: 1073741824]:                                             
  __asm__ __volatile__("" :  :  : "memory");                                    
  MEM[(volatile u8 *)t_1(D) + 1089B] ={v} 0;                                    
  __asm__ __volatile__("lock; addl $0,-4(%%rsp)" :  :  : "cc", "memory");       
  t_1(D)->trc_ipi_to_cpu = -1;                                                  
  trc_wait_for_one_reader (t_1(D), hop_2(D));                                   

  <bb 3> [local count: 1073741824]:                                             
  return;                                                                       

}                                                                               

So there really is a store to trc_ipi_to_cpu. The code uses it as:

  _60 ={v} t_59(D)->trc_ipi_to_cpu;                                             
  __asm__ __volatile__("" :  :  : "memory");                                    
  if (_60 != -1)                                                                
    goto <bb 3>; [50.00%]                                                       
  else                                                                          
    goto <bb 6>; [50.00%]                                                       

So the bug may be that we ignore the volatile flag when determining IPA predicates?
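
For illustration only, the source-level pattern behind the two GIMPLE
snippets might look roughly like the sketch below. This is not the kernel
code and not a testcase known to reproduce the bug: struct task,
READ_ONCE_INT, reader_sees_ipi and pertask are hypothetical names, with
READ_ONCE_INT written in the style of the kernel's READ_ONCE(). The caller
does a plain store of -1 and the callee re-reads the same field through a
volatile access, so the comparison should not be folded from the value the
caller stored.

```c
/* Hypothetical stand-in for struct task_struct; only the field matters. */
struct task {
    int trc_ipi_to_cpu;
};

/* Volatile load in the style of the kernel's READ_ONCE(); illustrative name. */
#define READ_ONCE_INT(x) (*(volatile int *)&(x))

/* Callee: the field is read through a volatile access, so the compiler
 * must perform the load and may not assume the value the caller stored. */
int reader_sees_ipi(struct task *t)
{
    int cpu = READ_ONCE_INT(t->trc_ipi_to_cpu);

    return cpu != -1;
}

/* Caller: plain (non-volatile) store of -1, then the call. This store is
 * what the jump function records as "offset: 8672, CONST: -1". */
int pertask(struct task *t)
{
    t->trc_ipi_to_cpu = -1;
    return reader_sees_ipi(t);
}
```

In a single-threaded run pertask() returns 0, but in the kernel another
CPU may update the field between the store and the volatile load, which is
why treating (op0[ref offset: 8672] != -1) as known false would be wrong.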

The problem goes away with -fno-partial-inlining.
