tbp wrote:
On 8/23/07, Tim Prince <[EMAIL PROTECTED]> wrote:
Note that icc9 has a strong bias toward the Pentium 4, which had no stall
penalty for mistyped fp vectors (on Intel that penalty arrived with the
Pentium M line), so you see a pxor even when generating code for Core 2.
# cat autoicc.cc
float foo(const float *a, int n) {
    float sum = 0.f;
    for (int i = 0; i < n; ++i)
        if (a[i] > 0.f)
            sum += a[i];
    return sum;
}
int main() { return 0; }
# /opt/intel/cce/9.1.051/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3) : (col. 2) remark: LOOP WAS VECTORIZED.
4007a9: pxor %xmm4,%xmm4
4007ad: cmpltps %xmm3,%xmm4
4007b1: andps %xmm3,%xmm4
# /opt/intel/cce/10.0.023/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3): (col. 2) remark: LOOP WAS VECTORIZED.
400b50: xorps %xmm3,%xmm3
400b53: cmpltps %xmm4,%xmm3
400b57: andps %xmm3,%xmm4
For what little it's worth, I found no measurable difference between
these choices on Core 2 Duo. People I know prefer to use the Intel -xW
option, or the gcc default, in the absence of clear evidence that
another option could improve performance without reducing the range of
supported targets.
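
For readers following the listings above, here is a rough hand-written
sketch of the idiom both compilers vectorize to: zero a register, compare
it against the data with cmpltps, and use the resulting mask with andps so
only positive elements reach the sum. The name foo_sse and the loop
structure are my own assumptions for illustration, not what either icc
version actually emits, and the lane-wise summation order differs from the
scalar loop, so results can differ in the last bits for general inputs.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical hand-written counterpart of foo(); illustrative only.
float foo_sse(const float *a, int n) {
    // Zeroed register; compilers usually materialize this with xorps or pxor.
    const __m128 zero = _mm_setzero_ps();
    __m128 acc = _mm_setzero_ps();
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(a + i);          // unaligned load of 4 floats
        __m128 mask = _mm_cmplt_ps(zero, v);     // all-ones lanes where 0 < a[i]
        acc = _mm_add_ps(acc, _mm_and_ps(v, mask)); // add only the positive lanes
    }
    // Horizontal sum of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    // Scalar tail for the remaining 0..3 elements.
    for (; i < n; ++i)
        if (a[i] > 0.f)
            sum += a[i];
    return sum;
}
```

Whether the zero is produced by pxor (integer domain) or xorps (fp domain)
is up to the compiler; the intrinsic itself does not choose between them.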