http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46390
Summary: cpuid assembly instruction is optimized away in loop over threads
Product: gcc
Version: 4.4.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ed...@svi.nl

GCC seems to optimize the cpuid assembly instruction (used to retrieve the
current APIC ID) away, because the input for the instruction seems to be
independent of the loop:

    for (i = 0; i < CPU_SETSIZE; i++) {
        /* Add cpu i to the cpu set 'set'. */
        CPU_ZERO(&set);
        CPU_SET(i, &set);

        /* Try to switch the affinity and collect the APIC ID of thread i.
           Sleep gives the OS a chance to switch to the desired thread. */
        err = sched_setaffinity(pid, sizeof(cpu_set_t), &set);
        sleep(0);
        if (err >= 0) {
            asm("cpuid" : "=b" (additional) : "a" (1) : "%ecx", "%edx");
            id = (additional & APIC_ID_MASK) >> 24;
            list[(*cnt)++] = id;
        }
    }

It can be 'fixed' by using the loop variable i as a 'dummy input' for the
assembly instruction:

    asm("cpuid" : "=b" (additional) : "a" (1), "b" (i) : "%ecx", "%edx");

This method to retrieve APIC IDs is common practice:
http://software.intel.com/en-us/articles/optimal-performance-on-multithreaded-software-with-intel-tools/
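
For comparison, here is a minimal sketch of the same query written with the volatile qualifier. The GCC manual describes volatile as the way to stop the optimizers from discarding an asm statement or moving it out of a loop when its inputs do not change. The helper name current_apic_id is hypothetical and does not appear in the report. The sketch also declares EAX, ECX and EDX as operands rather than clobbers, on the assumption that cpuid writes all four registers. It is x86-only and has not been tested against GCC 4.4.3.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): returns the initial APIC ID
   of whichever CPU executes the cpuid instruction. */
static inline uint32_t current_apic_id(void)
{
    uint32_t eax = 1;   /* CPUID leaf 1: processor info and feature bits */
    uint32_t ebx;
    uint32_t ecx = 0;
    uint32_t edx;

    /* 'volatile' tells GCC the statement may give a different result each
       time it runs, so it is executed on every loop iteration instead of
       being hoisted or deleted. All four registers are listed as written. */
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));

    /* For leaf 1, bits 31..24 of EBX hold the initial APIC ID. */
    return ebx >> 24;
}
```

In the loop above, the asm statement and the mask-and-shift line could then be replaced by id = current_apic_id(); without needing the dummy "b" (i) input.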