Hi Maxim and Vlad,

I just tracked an ICE in a glibc build for ARM down to this patch,
which introduced --param max-sched-extend-regions-iters with a default
of two:

  http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00998.html

The testcase is attached; an arm-linux-gnueabi compiler should be able to
reproduce it with -p -O2.  The failure is an inability to find two
consecutive registers to hold a DImode value.  The cause is roughly like this:

  DImode add;
  if (({complicated asm with many local register variables}))
    return 0;

The register variables get lifted out of the if statement and moved before
the add, thus occupying basically all available hard registers.

If it were just that, I might try to work around it in glibc.  But there's
actually another layer:

  if (DImode compare)
    {
       DImode add;
       if (({complicated asm with many local register variables}))
         return 0;
       ...
    }

The register variables and their initializations get hoisted all the way out
of the first if.  On ia64, with a million execution units to spare and a
fat pipeline, this may make sense.  On targets with a simpler execution
model, though, it's pretty awful.  If the condition (whose likelihood we
know nothing about) is false, we've added lots of cycles for
no gain.  It's not like the scheduler was filling holes; the initializations
were scheduled as early as possible because they had no dependencies.
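
To make the cost concrete, here is a rough source-level analogy of that
motion.  It is only a sketch: the real transformation happens on RTL inside
the scheduler, and every name below (setup, setup_ops, as_written,
as_scheduled, extra_setup_ops) is hypothetical and not taken from glibc or
GCC.  The counter stands in for the cycles spent initializing the register
variables.

```c
#include <assert.h>

/* Hypothetical counter standing in for cycles spent on register setup.  */
static int setup_ops;

/* Models one register-variable initialization.  */
static int
setup (int v)
{
  setup_ops++;
  return v;
}

/* Shape of the source as written: setup happens only inside the branch.  */
static long long
as_written (long long total, long long wakeup)
{
  if (total > wakeup)		/* the DImode compare */
    {
      int a, b, c;
      wakeup += 1;		/* the DImode add */
      a = setup (1);
      b = setup (2);
      c = setup (3);
      return wakeup + a + b + c;
    }
  return wakeup;
}

/* Analogy of the scheduled code: setup hoisted above the compare, so the
   values stay live across both the compare and the add.  */
static long long
as_scheduled (long long total, long long wakeup)
{
  int a = setup (1), b = setup (2), c = setup (3);
  if (total > wakeup)
    {
      wakeup += 1;
      return wakeup + a + b + c;
    }
  return wakeup;
}

/* Returns how many more setup operations the hoisted form performs than
   the original form for the same inputs.  */
static int
extra_setup_ops (long long total, long long wakeup)
{
  long long r1, r2;
  int n1, n2;

  setup_ops = 0;
  r1 = as_written (total, wakeup);
  n1 = setup_ops;

  setup_ops = 0;
  r2 = as_scheduled (total, wakeup);
  n2 = setup_ops;

  assert (r1 == r2);		/* the motion preserves the result */
  return n2 - n1;
}
```

With the condition false, extra_setup_ops (0, 5) reports three wasted
initializations; with it true, extra_setup_ops (5, 0) reports none.  In the
real testcase the hoisted values are additionally pinned to specific hard
registers, which is what leaves no register pair for the DImode value.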

With the parameter turned back down to one, the testcase compiles, and the
code looks sensible again.  No, I wasn't able to work out why profiling was
necessary to trigger this problem; I suspect it makes some register
unavailable, but I'm not sure which.  I didn't look into that further.

What's your opinion?  We could easily change the default of the parameter
for ARM, but I assume there are other affected targets.  I don't know if we
need the extended region scheduling to be smarter, or if it should simply be
turned off for some targets.

-- 
Daniel Jacobowitz
CodeSourcery

typedef union
{
  struct
  {
    int __lock;
    unsigned int __futex;
    __extension__ unsigned long long int __total_seq;
    __extension__ unsigned long long int __wakeup_seq;
  } __data;
}
pthread_cond_t;
__pthread_cond_signal (cond)
     pthread_cond_t *cond;
{
  if (cond->__data.__total_seq > cond->__data.__wakeup_seq)
    {
      ++cond->__data.__wakeup_seq;
      if (!__builtin_expect (({
	do { } while (0);
	long int __ret;
	__ret = ({
	  register int _a1 asm ("r0"), _nr asm ("r7");
	  register int _v2 asm ("v2") = (int)(((4 << 24) | 1));
	  register int _v1 asm ("v1") = (int)((&cond->__data.__lock));
	  register int _a4 asm ("a4") = (int)((1));
	  register int _a3 asm ("a3") = (int)((1));
	  register int _a2 asm ("a2") = (int)(5);
	  _a1 = (int) ((&cond->__data.__futex));
	  _nr = ((0 + 240));
	  asm volatile ("swi	0x0	@ syscall " "SYS_ify(futex)"
			: "=r" (_a1)
			: "r" (_nr), "r" (_a1), "r" (_a2),
			"r" (_a3), "r" (_a4), "r" (_v1), "r" (_v2)
			: "memory");
	  _a1;});
	__ret;}), 0))
	return 0;
    }
}
