Chips derived from the McKinley core (Itanium 2, etc.) have an anomaly that can cause stalls if an F-unit instruction (including a NOP) is issued within a six-cycle window after reading certain application registers (such as ar.bsp). Furthermore, power considerations argue against the use of F-unit instructions unless they're really needed.
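To make the hazard concrete, here is a minimal sketch in C of the check a scheduler could apply before placing an F-unit instruction or NOP. It is not GCC code: the names AR_READ_HAZARD_CYCLES and f_unit_in_hazard_window are hypothetical, and the exact window boundaries (same cycle through five cycles later) are an assumption about how "within a six-cycle window" should be counted.

```c
#include <stdbool.h>

/* Hypothetical constant: window length taken from the description above. */
enum { AR_READ_HAZARD_CYCLES = 6 };

/*
 * Returns true if an F-unit instruction (including nop.f) issued at
 * f_issue_cycle would fall inside the hazard window opened by an
 * application-register read issued at ar_read_cycle.
 *
 * Assumption: the window covers the cycle of the read itself and the
 * five cycles after it. The real boundaries may differ.
 */
bool f_unit_in_hazard_window(int ar_read_cycle, int f_issue_cycle)
{
    int delta = f_issue_cycle - ar_read_cycle;
    return delta >= 0 && delta < AR_READ_HAZARD_CYCLES;
}
```

A scheduler using a check like this would pick an M- or I-unit NOP instead whenever it returns true.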
Similarly, using B-unit NOPs is probably not a great idea for McKinley-derived cores: certain templates with B-unit instructions cause split issue, and the BBB template in particular causes a branch-prediction anomaly (the first two branches are predicted based on the slot 0 hints). Unfortunately, at the moment the GCC scheduler still seems to favor using F- and B-unit NOPs whenever possible. This made sense for Merced (since it didn't have sufficient M and I execution units) but really should be avoided for McKinley and newer cores.

-- 
           Summary: GCC should avoid generating F- and B-unit NOPs
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: davidm at hpl dot hp dot com
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: ia64-linux
  GCC host triplet: ia64-linux
GCC target triplet: ia64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20632