Thank you John and Vincent for the answers!
Well, I did a little more research and tested a very simple program (attached).
I ran ldd on it and got
+++++++++
[EMAIL PROTECTED]:~$ ldd a.out
libc.so.6 => /lib/libc.so.6 (0x00002b630e4d5000)
/lib64/ld-linux-x86-64.so.2 (0x00002b630e3b8000)
+++++++++
So I think that my gcc is using 64-bit libraries.

My question now points in another direction: the SSE/SSE2 registers can
hold, as Vincent said, 2 x 64-bit floats or 2 x 64-bit
integers. Forgetting about the SIMD instructions (I am not using vectorization
at any level yet), what on a 32-bit processor takes two registers (a 64-bit
double) takes just one register on amd64 (or half of a 128-bit register, for
a 64-bit float), right? Is gcc already using these amd64
registers? Or the PGI compilers, or the Sun Studio Express compilers? To be
clearer, what I am expecting is that, if I need 10 digits, on a
32-bit machine I would need two registers, and so slower code (two cycles per
instruction, etc.), while on a 64-bit machine I would need only one
register, and so my code would be faster.

And we scientists are worried about these roundoff errors ;-)

Thank you!

Ivan





2006/11/22, Vincent Diepeveen <[EMAIL PROTECTED]>:

hi Ivan,

there are different versions of the gcc compiler for your amd64 hardware: one
that is 32 bits and one that is 64 bits.
For 64-bit doubles, however, it doesn't matter which of the two you use,
yet I'd advise going for the 64-bit version.

To define variables just use:
  "double"

double is 64 bits in total when using gcc. intel c++ sometimes uses less,
but I guess not on x64. That kind of cheating happens more for specfp and on
chips that lack certain important instructions, such as itanium, which lacks
a lot.

The mantissa of a double takes 52 bits of the total 64-bit length of the
number, then 1 bit for the sign and the other 11 for the exponent.

The 128-bit SSE/SSE2 registers are split up as vector registers:
either 2 independent, unrelated doubles of 64 bits, 4 floats of 32 bits,
4 integers of 32 bits, or 2 integers of 64 bits.

If you have no professor's degree in spaghetti programming, then better stay
away from hand-written SIMD (SSE/SSE2) and let the compiler generate it
automatically for you.

On AMD hardware you will want to compile with these additional flags (in
addition to the ones you want for SIMD):

gcc -O2 -march=k8 -mtune=k8 -o myprogram myprogram.c

-O3 and higher are potentially buggy for most software I try, but if you have
a deterministic way to test executables you can of course try them out.

Let's not complain about those bugs too loudly. The GCC team are happy
amateurs, like all gnu folks, never having followed any course on QAD (quality
assurance testing), which is the most important part of a product: testing
until it works reliably and bug-free. However, QAD and testing are for
professionals rather than amateurs, we understand that very well, and
instead cheer for having the privilege to work with this freely available
software.

As it seems the gcc 4.1 release was a snapshot lobotomized deliberately
for AMD, pgo no longer helps much for most software on AMD hardware with
gcc. Earlier 4.1 snapshots were, at least for my integer code, a lot faster
with pgo (using -O3 in the speed tests, by the way). So it is not worth the
trouble trying to get pgo working, as you need to check it for deterministic
output too, which with floating point is quite hard, I assume.

Some tricks in compilers indeed lose a few digits of significance, or
cause roundoff errors to show up sooner in the end result (which nearly no
scientist ever notices, saying more about the scientist than about the
different compilers), yet when using the above tips that shouldn't happen too
quickly, and in doubles you should have roughly
52 * log(2) / log(10) = 52 / 3.322 = 15 digits or so.

Good Luck!

Vincent

----- Original Message -----
*From:* Ivan Paganini <[EMAIL PROTECTED]>
*To:* beowulf@beowulf.org
*Sent:* Wednesday, November 22, 2006 12:01 PM
*Subject:* [Beowulf] Question about amd64 architecture and floating point
operations

Hello everybody at beowulf. Sorry about the _really_ newbie question, but
after doing some tests and researching a little, a question arose when
fooling around with amd64 (more precisely, an amd64 Athlon 4200 X2) and gcc
and sun studio 11. The architecture has 64-bit integer registers and 128-bit
floating point registers, but my test programs in C just gave me the
same precision that I got with an old athlon 2400 xp (32 bits), that is, long
double goes only to 1x10^4961, even with the -m64 flag. I always imagined
that I would get the double precision without the long double declaration
(or, maybe, 40 bits of precision). What am I missing here? Is it the compiler
(gcc 4.1, sun studio express 11), the operating system (ubuntu 64 bits edgy),
or just an error in my logic?

Thank you for the patience!

--
-----------------------------------------------------------
Ivan S. P. Marin
Laboratório de Física Computacional
lfc.ifsc.usp.br
Instituto de Física de São Carlos - USP
----------------------------------------------------------

------------------------------
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf