Re: Debian for x86-64 (AMD Opteron)
On Thu, Apr 10, 2003 at 12:43:38PM +0100, Colin Watson wrote:
> On Thu, Apr 10, 2003 at 12:39:25PM +0200, Cyrille Chepelov wrote:
> > On Thu, Apr 10, 2003 at 10:40:47AM +0100, Hamish Marson wrote:
> > > I'm not sure how your logic works out that a 64 bit reg is going to
> > > be faster than a 32bit one. Or do you mean simply you're expecting a
> > > speedup because there are MORE 64 bit registers than 32 bit
> > > registers?
> >
> > Reg pressure is pretty bad on x86; and int is still 32 bit on x86-64
> > (IIRC, long is 64 bit and of course any T* ). So yes, anything which
> > plays with pointers will be larger on x86-64, but it's not an
> > automatic doubling in size of everything. And mapping libraries twice
> > also eats a good deal of memory. OTOH, 16 general-purpose 8,16,32 or
> > 64-bit registers (not even counting a large SSE2 register file as
> > well) should help gcc feel more at home (especially with less code
> > dedicated to handling register<->memory swap-outs)
> >
> > I don't have numbers to back either choice, but it looks to me that a
> > mixed userland with everything duplicated should be a last resort. And
> > I'm sure some people have numbers out there.
>
> Based on the numbers I've seen, the factors mentioned seem to balance
> each other out fairly well. I'm not (yet) allowed to talk about the
> details though ...

No need to be secretive. :)

In Fred Weber's MPF presentation last year he put up some specific numbers about how code generation was affected by going to 64-bit mode. As I recall, he said that the average instruction length increased slightly and static data size increased (larger pointers), but both the static and dynamic instruction counts went down because of the decrease in register spilling, and that generally offset the bloating effect of having larger pointers.

I think the general consensus has been that on x86-64, compiling for 64-bit mode is seldom a performance loss and often a performance win.
This is contrary to other mixed 32/64-bit architectures, usually because in those cases the 32-bit architecture that the 64-bit one was based on wasn't as architecturally hamstrung in the first place as x86 is.

Does what I've said jibe with what you're seeing, in a very high-level sense? I don't want to get you in trouble with the NDA police or anything. :)

--
Anderson MacKay <[EMAIL PROTECTED]>
Green Hills Software -- Hardware Target Connections
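
To put a rough number on the "not an automatic doubling" point quoted above, here is a minimal C sketch that computes the size of a small list node under a 4-byte and an 8-byte pointer size. The names `align_up` and `node_size` and the node layout itself are hypothetical and chosen only for illustration; the sketch assumes int is 4 bytes in both modes and that each member is aligned to its own size, which may not match every ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/*
 * Size in bytes of a hypothetical list node laid out as
 *     struct node { int key; int value; struct node *next; struct node *prev; };
 * for a given pointer size.
 * Assumptions: int is 4 bytes in both modes, and every member is aligned
 * to its own size.
 */
size_t node_size(size_t ptr_bytes)
{
    const size_t int_bytes = 4;
    size_t struct_align = ptr_bytes > int_bytes ? ptr_bytes : int_bytes;
    size_t off = 0;

    off = align_up(off, int_bytes) + int_bytes;  /* key   */
    off = align_up(off, int_bytes) + int_bytes;  /* value */
    off = align_up(off, ptr_bytes) + ptr_bytes;  /* next  */
    off = align_up(off, ptr_bytes) + ptr_bytes;  /* prev  */

    return align_up(off, struct_align);          /* tail padding */
}
```

Under these assumptions `node_size(4)` is 16 and `node_size(8)` is 24, so this pointer-heavy node grows by half rather than doubling, because the two int members stay the same size.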
Re: Debian for x86-64 (AMD Opteron)
On Thu, Apr 10, 2003 at 04:50:59PM +0100, Hamish Marson wrote:
> Ah right. Light dawneth. Yes, you make excellent sense. Basically ia32
> is so hacked about & wacky (In order to be backwardly compatible) as to
> be very slow, yet ia64 is a new instruction set with none of the baggage
> that it had to carry around. Thus you can optimise ia64 architecture
> better than ia32.

Yep, that's pretty much it. But ia64 != x86-64 ... ia64 is the Intel Itanium's instruction set, while x86-64 is the AMD Opteron/Athlon 64, and they're very, very different beasts.

The x86-64 architecture really is just another extension of the x86 architecture (and so retains all the scary stuff that's been in there from the dawn of time for backwards compatibility) ... but it does add in a few nice features that make life easier for the compiler.

IA64 would take some time to explain in full, but in short it's completely incompatible with x86 (of any sort), and is based on an idea called VLIW, where multiple instructions are packed into "bundles" that are issued together and more responsibility is placed on the compiler's instruction scheduler to extract parallelism from the instruction stream.

> Compared to something like PowerPC (Sparc maybe? Although I don't think
> Sparc was conceived as a 64 bit instruction set was it? I could be wrong
> there though) where you start with a 64-bit definition and then cut it
> back to 32-bit & so gain some optimisations which make 32-bit PowerPC
> faster than 64-bit PowerPC (Except where you genuinely need 64bit of
> course).

Really it's that when you're in 64-bit mode you use 64-bit operands for many operations (particularly pointers and pointer arithmetic), and it takes more time to do 64-bit math than 32-bit math (e.g. on Opteron a 32-bit multiply has a 3-cycle latency, but a 64-bit mul has a 5-cycle latency).
Furthermore, in 64-bit mode you put more stress on the memory subsystem because you're loading and storing some non-zero number of 64-bit data chunks that would have been smaller (probably 32 bits in size) in 32-bit mode.

All that to say that if you can do something in 32-bit mode and all other things are equal, 32-bit operations are more efficient than 64-bit operations for many cases ... there are fewer bits to work on. In the case of x86-64, all things are *not* equal :), and the extra registers and such tend to offset the extra overhead of dealing with 64-bit operands. Or so AMD has said.

Also, in 64-bit mode AMD left sizeof(int)==4 so that the overhead of 64-bit integer operations isn't incurred for many code paths that don't need it. They also made some noise about changing the C calling convention around and supposedly it's more efficient now, but I don't know much in the way of details on that.

--
Anderson MacKay <[EMAIL PROTECTED]>
Green Hills Software -- Hardware Target Connections
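
As a small illustration of the sizeof(int)==4 point, the C sketch below contrasts a multiply on 32-bit operands with one on 64-bit operands. The function names `mul32`, `mul64` and `mul32_wide` are hypothetical. The expectation that a compiler targeting x86-64 would normally use 32-bit operand size for the first and 64-bit operand size for the other two is an assumption about typical code generation, not something the code itself checks, and the latency figures mentioned above are not measured here.

```c
#include <stdint.h>

/*
 * Multiply on 32-bit operands. Unsigned arithmetic wraps, so the result is
 * the product modulo 2^32 (assuming int is no wider than 32 bits, as on
 * x86 and x86-64).
 */
uint32_t mul32(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Multiply on 64-bit operands: the result is the product modulo 2^64. */
uint64_t mul64(uint64_t a, uint64_t b)
{
    return a * b;
}

/*
 * Widening multiply: 32-bit inputs, full 64-bit product. The cast makes the
 * multiplication itself happen at 64-bit width, so nothing is truncated.
 */
uint64_t mul32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

For example, `mul32(0x10000u, 0x10000u)` wraps to 0, while `mul64` and `mul32_wide` on the same inputs give 0x100000000. Code that sticks to int-sized values stays on the first kind of operation even when built for 64-bit mode, and only code that really needs the wider result pays for the 64-bit multiply.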