During Linux kernel development we ran into a few situations showing that indirect calls (through a function pointer) are significantly slower on IA64 than on other platforms. Various ugly workarounds have been added to the kernel to avoid them.
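For illustration only, here is a minimal sketch of the kind of code in question: a plain indirect call, and one possible style of workaround that compares the pointer against the most likely target and calls it directly. The struct and function names (ops, common_handler, rare_handler, dispatch_indirect, dispatch_guarded) are made up for this example and are not taken from the kernel source; the actual kernel workarounds may look quite different.

```c
#include <stddef.h>

/* Hypothetical operations table, loosely in the style of kernel ops structs. */
struct ops {
    int (*handler)(int);
};

/* Hypothetical call targets, used only for this illustration. */
static int common_handler(int x) { return x + 1; }
static int rare_handler(int x)   { return x * 2; }

/* Plain indirect call: on ia64 the target address has to be moved into a
 * branch register before the call can be issued. */
static int dispatch_indirect(const struct ops *o, int x)
{
    return o->handler(x);
}

/* Workaround sketch (an assumption, not the kernel's actual code): test for
 * the expected target and call it directly, so the common path needs no
 * indirect branch. Other targets fall back to the indirect call. */
static int dispatch_guarded(const struct ops *o, int x)
{
    if (o->handler == common_handler)
        return common_handler(x);
    return o->handler(x);
}
```

Both dispatch functions return the same result for any handler; the guarded variant only changes how the common target is reached.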
Some investigation shows that the code gcc generates for indirect calls on ia64 isn't very good. The IA64 optimization manuals recommend loading branch registers as early as possible before an indirect jump, so that the CPU can start fetching the code stream at the target. Otherwise there is a longer stall.

I ran some statistics over a 2.6.19 Linux kernel built with a recent gcc 4.3 snapshot by grepping for indirect calls. In nearly all the cases I looked at, the branch register was loaded in the bundle directly preceding the bundle that contains the jump. Earlier versions (4.1 and 4.0) weren't any better.

From looking at the code, in many cases it would have been possible to load the branch register earlier, since no conditional state was involved.

This is an enhancement request to make the scheduler more aggressive about moving branch register loads earlier, ahead of jumps, on ia64.

--
           Summary: Branch registers loaded too late on ia64
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: ak at muc dot de
GCC target triplet: ia64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30688