Hi Maxim and Vlad,

I just tracked an ICE while building glibc for ARM to this patch, which
introduced --param max-sched-extend-regions-iters with a default of two:

  http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00998.html

The testcase is attached; an arm-linux-gnueabi compiler should be able to
reproduce it with -p -O2. The failure is inability to find two consecutive
registers to hold a DImode value. The cause is roughly like this:

  DImode add;
  if (({complicated asm with many local register variables}))
    return 0;

The register variables get lifted out of the if statement and moved before
the add, thus occupying basically all available hard registers.

If it were just that, I might try to cobble around it in glibc. But there's
actually another layer:

  if (DImode compare)
    {
      DImode add;
      if (({complicated asm with many local register variables}))
        return 0;
      ...
    }

The register variables and their initializations get hoisted all the way out
of the first if. On ia64, with a million execution units to spare and a fat
pipeline, this may make sense. On targets with a simpler execution model,
though, it's pretty awful. If the condition (which we have no information on
the likelihood of) is false, we've added lots of cycles for no gain. It's not
like the scheduler was filling holes; the initializations were scheduled as
early as possible because they had no dependencies.

With the parameter turned back down to one, the testcase compiles, and the
code looks sensible again.

No, I wasn't able to work out why profiling was necessary to trigger this
problem; I suspect it makes some register unavailable, but I'm not sure which.
I didn't look into that further.

What's your opinion? We could easily change the default of the parameter for
ARM, but I assume there are other affected targets. I don't know if we need
the extended region scheduling to be smarter, or if it should simply be turned
off for some targets.

-- 
Daniel Jacobowitz
CodeSourcery

typedef union
{
  struct
  {
    int __lock;
    unsigned int __futex;
    __extension__ unsigned long long int __total_seq;
    __extension__ unsigned long long int __wakeup_seq;
  } __data;
} pthread_cond_t;

__pthread_cond_signal (cond)
     pthread_cond_t *cond;
{
  if (cond->__data.__total_seq > cond->__data.__wakeup_seq)
    {
      ++cond->__data.__wakeup_seq;
      if (!__builtin_expect (({
            do { } while (0);
            long int __ret;
            __ret = ({
              register int _a1 asm ("r0"), _nr asm ("r7");
              register int _v2 asm ("v2") = (int)(((4 << 24) | 1));
              register int _v1 asm ("v1") = (int)((&cond->__data.__lock));
              register int _a4 asm ("a4") = (int)((1));
              register int _a3 asm ("a3") = (int)((1));
              register int _a2 asm ("a2") = (int)(5);
              _a1 = (int) ((&cond->__data.__futex));
              _nr = ((0 + 240));
              asm volatile ("swi 0x0 @ syscall " "SYS_ify(futex)"
                            : "=r" (_a1)
                            : "r" (_nr), "r" (_a1), "r" (_a2), "r" (_a3),
                              "r" (_a4), "r" (_v1), "r" (_v2)
                            : "memory");
              _a1;});
            __ret;}), 0))
        return 0;
    }
}
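
For illustration only, the cost argument in the message can be sketched in portable C. This is not what the scheduler emits. All names here (setup_runs, load_syscall_args, do_syscall, signal_as_written, signal_as_hoisted) are hypothetical. The counter stands in for the cycles spent initializing the register variables, and the "hoisted" function only mimics, at source level, the ordering described above.

```c
/* Hypothetical counter: how many times the syscall arguments were set up. */
static int setup_runs;

/* Stand-in for the seven register-variable initializations in the testcase. */
static int load_syscall_args(void)
{
    ++setup_runs;
    return 7;   /* number of argument registers initialized */
}

/* Stand-in for the asm volatile "swi"; in this sketch it always succeeds. */
static int do_syscall(int nargs)
{
    (void) nargs;
    return 0;
}

/* Source order: the setup runs only if the 64-bit compare succeeds. */
static int signal_as_written(unsigned long long total,
                             unsigned long long *wakeup)
{
    if (total > *wakeup) {
        ++*wakeup;                          /* the DImode add */
        int nargs = load_syscall_args();
        if (do_syscall(nargs) == 0)
            return 0;
    }
    return 1;
}

/* Order after hoisting: the setup runs first, on both paths. */
static int signal_as_hoisted(unsigned long long total,
                             unsigned long long *wakeup)
{
    int nargs = load_syscall_args();        /* moved above the compare */
    if (total > *wakeup) {
        ++*wakeup;
        if (do_syscall(nargs) == 0)
            return 0;
    }
    return 1;
}
```

Calling each function with total equal to *wakeup (so the compare fails) leaves setup_runs at 0 for signal_as_written and at 1 for signal_as_hoisted. Both return the same value, which is the "lots of cycles for no gain" case.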