Sorry, my previous answer was cut off.

The motivation for prepass global code motion is indeed that after register 
allocation, inter-block scheduling is even more restricted due to 
anti-dependencies, including those due to live-out on side exit branches. 
Global code motion is a key performance enabler especially for the non-temporal 
loads (i.e. L1 cache bypass loads), which have an exposed latency close to 20 
cycles on the current kvx cores.

The dataflow issue I encountered with SEL_SCHED in prepass with control
speculation enabled was inconsistent liveness information reported by the
compiler. I am running a test suite to reproduce it (I last saw it 3 months ago).

Here again is a motivating example, where I expect the scheduler to speculate
loads from the second block of the loop into the first block, which dominates
it, so in principle SCHED_RGN should do it:

  typedef struct list_cell_ {
    struct list_cell_ *next;
    float payload;
  } list_cell_, *list_cell;

  float
  list_sum(list_cell_ *list)
  {
    float result = 0.0;
    while (list->next) {
      list = list->next;
      result += 1.0f/list->payload;
      if (!list->next) break;
      list = list->next;
      result += 1.0f/list->payload;
    }
    return result;
  }
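
To make the expected code motion concrete, here is a rough source-level sketch
of what I would like the scheduler to produce. It is only an illustration:
spec_load_payload() is a hypothetical stand-in for a dismissable
(non-trapping) load, emulated with a NULL test, and list_sum_speculated() is
a name made up for this sketch. The scheduler works on RTL, not on C source,
so this is one plausible reading of the transformation, not its actual output.

```c
#include <stddef.h>

typedef struct list_cell_ {
  struct list_cell_ *next;
  float payload;
} list_cell_;

/* Hypothetical stand-in for a dismissable (non-trapping) load: it yields a
   dummy value instead of faulting when the address is invalid.  */
static float
spec_load_payload (const list_cell_ *cell)
{
  return cell ? cell->payload : 0.0f;
}

/* Source-level picture of list_sum after the hoped-for code motion.  */
float
list_sum_speculated (list_cell_ *list)
{
  float result = 0.0f;
  while (list->next)
    {
      list = list->next;
      /* First block: this load already feeds the side-exit test.  */
      list_cell_ *second = list->next;
      /* Hoisted from the second block, above the side exit.  */
      float second_payload = spec_load_payload (second);
      result += 1.0f / list->payload;
      if (!second)
        break;
      /* Second block: only consumes the value that was loaded early.  */
      list = second;
      result += 1.0f / second_payload;
    }
  return result;
}
```

The point is that the latency of the payload load in the second block overlaps
with the division in the first block instead of being exposed after the branch.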

Here is my TARGET_SCHED_SET_SCHED_FLAGS implementation, with comments that
reflect my understanding of what to do. The commented-out line would restrict
SEL_SCHED control speculation to postpass (as in ia64):

  static void
  kvx_sched_set_sched_flags (struct spec_info_def *spec_info)
  {
    unsigned int *flags = &(current_sched_info->flags);
    // Speculative scheduling is enabled by non-zero spec_info->mask.
    spec_info->mask = 0;
    if (*flags & (SEL_SCHED | SCHED_RGN))
      {
        //if (!sel_sched_p () || reload_completed)
          {
            // Must do this in case of speculation.
            *flags |= USE_DEPS_LIST | DO_SPECULATION;
            // Do control speculation only.
            spec_info->mask = BEGIN_CONTROL;
            // Speculative scheduling without CHECK.
            spec_info->flags = SEL_SCHED_SPEC_DONT_CHECK_CONTROL;
            // Dump into the sched_dump.
            spec_info->dump = sched_dump;
          }
      }
  }

TARGET_SCHED_SPECULATE_INSN is implemented as follows (it should memoize and
return 0 if the insn was already speculated with the same ts; I assume that is
not relevant here):

  static int
  kvx_sched_speculate_insn (rtx_insn *insn, ds_t ts, rtx *new_pat)
  {
    rtx pattern = PATTERN (insn);
    if (GET_CODE (pattern) == SET)
      {
        rtx src = SET_SRC (pattern);
        if (GET_CODE (src) == MEM)
          {
            *new_pat = pattern;
            return 1;
          }
      }
    return -1;
  }

And TARGET_SCHED_NEEDS_BLOCK_P always returns false.
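
For completeness, the memoization mentioned above could look roughly like the
standalone model below. Everything in it is hypothetical and only mimics the
return convention used in this post (-1: cannot speculate, 0: already
speculated with the same ts, 1: newly speculated): ds_model_t stands in for
ds_t, the table is indexed by an insn UID, and is_load replaces the test on
the insn pattern. It is not GCC code.

```c
/* Hypothetical stand-ins for GCC internals, for illustration only.  */
typedef unsigned int ds_model_t;   /* plays the role of ds_t */
enum { MAX_UID = 1024 };           /* arbitrary table size */

/* Speculation status already applied to each insn UID; 0 means none.  */
static ds_model_t applied_ts[MAX_UID];

static int
speculate_insn_model (unsigned int uid, int is_load, ds_model_t ts)
{
  /* Only loads can be speculated in this model.  */
  if (!is_load || ts == 0 || uid >= MAX_UID)
    return -1;
  /* Already speculated with the same request: nothing to change.  */
  if (applied_ts[uid] == ts)
    return 0;
  /* Record the request and report a new speculative pattern.  */
  applied_ts[uid] = ts;
  return 1;
}
```

In the real hook the table would have to be keyed and reset consistently with
the scheduler's region handling, which this model does not attempt.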

When I compile the motivating example above for the KVX, 
kvx_sched_speculate_insn() is indeed called with reload_completed==0 (prepass) 
for the two loads of the second block, but no code motion to the first block 
happens. Generated code is the same for SCHED_RGN (default) or SEL_SCHED 
(-fselective-scheduling), up to a renaming of the registers, although SEL_SCHED 
calls kvx_sched_speculate_insn() several times for each load.

For the ia64 on the motivating example, it seems there is no prepass control 
speculation either:

  ./gcc/ia64/gcc/cc1 -fpreprocessed list_sum2.i -quiet -dumpbase list_sum2.c 
-dp -auxbase list_sum2 -O3 -version -ffast-math -o list_sum2.s -da -dp 
-msched-control-spec -msched-in-control-spec
  grep _speculative list_sum2.c.*
  list_sum2.c.298r.mach:            ] UNSPEC_LDS)) 24 {movsf_speculative}
  ...

I noticed that the ia64 target uses the undocumented target hooks 
TARGET_SCHED_GET_INSN_SPEC_DS and TARGET_SCHED_GET_INSN_CHECKED_DS whose code 
is actually executed on this example.

Any recommendation on how to get load control speculation in prepass for any of 
the GCC 7.5 targets?

Best,

Benoît
