On Tue, 2014-07-29 at 10:11 -0700, Mike Stump wrote:
> On Jul 29, 2014, at 7:56 AM, Peter Bergner <berg...@vnet.ibm.com> wrote:
> > Currently, the IBM long double routines in libgcc use a union to construct
> > a long double from two double values.  This causes horrific code generation
> > that copies the two double from the FP registers over to GPRs and back
> > again, giving us two loads and two stores, which leads to two load-hit-store
> > hazards.
>
> Gosh, it’s too bad we don’t have any sort of technology to optimize moving
> data around.

Well, the problem is that we're trying to move it around at all, when we'd
really like the data to stay in the FP registers the entire time.  Unions and
structs that are the same size as a TImode/TFmode/TDmode value are always
converted to TImode, and that is what ends up causing the whole
fp -> int -> fp shuffle, which leads to crappy code.  On power8, where we have
int <-> fp register copy instructions, it's better than the copy through the
stack frame, but even that is unnecessary.

Peter
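
For illustration, the union pattern under discussion looks roughly like the
sketch below. This is not copied from libgcc: the names long_dbl_union and
pack_via_union are hypothetical, and the layout assumes a target where long
double is the IBM double-double format (two 64-bit doubles). On other targets
the code still compiles, but ldval will not hold a meaningful value.

```c
/* Hypothetical union overlaying a long double on its two double halves.
   On an IBM long double target the whole union is 128 bits wide, which
   is the case where the compiler treats it as a TImode value.  */
typedef union
{
  long double ldval;
  double dval[2];
} long_dbl_union;

/* Build the pair by storing each half through the union.  Both inputs
   start out in FP registers; because the union is handled in an integer
   mode, the halves are routed through GPRs (or the stack) and back.  */
long_dbl_union
pack_via_union (double hi, double lo)
{
  long_dbl_union u;
  u.dval[0] = hi;   /* high-order double */
  u.dval[1] = lo;   /* low-order double */
  return u;
}
```

Reading u.ldval after the two stores is the step that should ideally be free,
since both halves are already sitting in FP registers.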