Implicit altivec vs. linux kernel build
Hi!

There seems to be a problem with gcc 4.0 and implicit generation of altivec instructions when -mcpu=970 is used.

The problem is that the kernel cannot afford to use altivec instructions (nor the FPU) except in a controlled environment. Specifically, things like the RAID6 code have an altivec implementation (and an SSE/2 one, which I think has a similar problem) that runs in the proper environment.

In order to build that, we have -mcpu=970. Unfortunately, with 4.0, that causes gcc to implicitly generate altivec code, which breaks it all.

So what is the proper way or set of options for me to:

 1) optionally have POWER4 optimisations (that must be independent of the rest below)
 2) be able to use altivec instructions in assembly
 3) be able to use altivec in a few selected bits of C code
 4) never have altivec code implicitly generated by the compiler

Regards,
Ben.
Re: Implicit altivec vs. linux kernel build
On Sun, 2005-02-27 at 17:47 -0500, Andrew Pinski wrote:
> > So what is the proper way or set of options for me to:
> >
> >  1) optionally have POWER4 optimisations (that must be independent of
> >     the rest below)
> >  2) be able to use altivec instructions in assembly
> >  3) be able to use altivec in a few selected bits of C code
> >  4) never have altivec code implicitly generated by the compiler
>
> Use -mcpu=970 -mno-altivec; this will cause no altivec instructions to
> be created. -mcpu=970 implies you are compiling for 970 and want the
> best code produced.

Ok, but will the above allow me to explicitly use altivec instructions in the RAID6 code and the assembly?

Also, if I have CONFIG_POWER4 not enabled (which means I need code that will work on POWER3, for example), what should I use so that I can still use altivec instructions in the RAID6 code and assembly?

The problem is that when CONFIG_POWER4 is not set, the kernel should still boot on any machine. When CONFIG_POWER4 is set, it's just an optimisation to generate code for POWER4 and later only, but it should have no impact on the kind of code I am _able_ to generate in selected places.

Ben.
Re: Implicit altivec vs. linux kernel build
On Sun, 2005-02-27 at 17:53 -0500, David Edelsohn wrote:
> >>>>> Benjamin Herrenschmidt writes:
>
> Ben> There seems to be a problem with gcc 4.0 and implicit generation of
> Ben> altivec instructions when -mcpu=970.
>
> Ben> The problem is that the kernel cannot afford to use altivec instructions
> Ben> (nor FPU) except in controlled environment. Specifically, things like
> Ben> the RAID6 code has altivec (and SSE/2, which I think has a similar
> Ben> problem) implementation which runs in the proper environment.
>
> Ben> In order to build that, we have -mcpu=970. Unfortunately, with 4.0, that
> Ben> causes gcc to implicitly generate altivec code, which breaks it all.
>
> Geoff made the decision to enable Altivec implicitly for
> processors that support it and to use Altivec for block moves. We need to
> find out what he had in mind for situations like this.
>
> One workaround is to use -mcpu=power4 instead of -mcpu=970, to
> avoid enabling Altivec when compiling kernel code.

Yes, but as I wrote, that prevents building the RAID6 code, which contains some selected altivec bits, and it causes gas not to be passed the option we need in order to use instructions like "dssall" in the low-level assembly files.

The latter can probably be worked around by adding the proper Wa,-mcpu=any, I suppose... The RAID6 code is a bit more annoying since, currently, it's not a separate file.

Ben.
Re: Implicit altivec vs. linux kernel build
> Yes, but as I wrote, that prevents building the RAID6 code which
> contains some selected altivec bits and causes gas to not get passed the
> proper option so we can have instructions like "dssall" in the low level
> assembly files.
>
> The latter can probably be worked around by adding the proper
> Wa,-mcpu=any I suppose ... The RAID6 code is a bit more annoying since
> currently, it's not a separate file.

Oh, and there are gcc versions that will refuse -mcpu=power4 -maltivec, so I can't even use -mcpu=power4 for the whole kernel and -maltivec just for the file containing the raid6 code.

Ben.
Re: Implicit altivec vs. linux kernel build
On Sun, 2005-02-27 at 18:40 -0500, Andrew Pinski wrote:
> On Feb 27, 2005, at 6:35 PM, David Edelsohn wrote:
>
> > As Andrew Pinski mentioned, you also can use -mcpu=970
> > -mno-altivec. That should allow the assembler to accept Altivec
> > instructions, but GCC will not know about any Altivec registers for
> > inlined assembly parameters.
>
> As I and Ben found out that does not work, altivec is still turned on
> and there is no way to turn it off. Alan has a patch which he is going
> to post (again) on my request to fix the problem with "-mcpu=power4
> -maltivec",
> I don't know if -mno-altivec will work though.

Ok. What I need is -mcpu=power4 -maltivec

Later on, if we end up having another CPU with altivec for which we wish to provide a specific -mcpu=<> option for optimization, we'll have to add -mno-altivec. The issue then will be: can we have -mno-altivec in the main Makefile CFLAGS and "override" this with -maltivec on a single file (the RAID6 code)? The way the kernel makefiles work, this won't be simple unless gcc can grok -mno-altivec -maltivec, with the latter overriding the former...

Ben.
Re: Implicit altivec vs. linux kernel build
On Sun, 2005-02-27 at 18:56 -0500, David Edelsohn wrote:
> >>>>> Benjamin Herrenschmidt writes:
>
> Ben> Ok. What I need is -mcpu=power4 -maltivec
>
> Sorry, no. -maltivec means generate Altivec code, not just enable
> Altivec instructions and registers. The above option is not different
> than -mcpu=970. There is no DWIM option.

No, that's fine. What I need is:

 - the entire kernel being built with -mcpu=power4
 - the raid6 code only being built with the additional -maltivec

That should work fine.

The only problem I see is that the day we have a CPU, let's call it POWER8 for the sake of this demonstration, that has altivec and is different enough to justify a specific "optimize" option, we'll have to use -mcpu=POWER8 -mno-altivec for the whole kernel, which makes it difficult to enable altivec only for the raid6 file since the kernel makefiles, afaik, can only add an option to a specific file. Unless -mcpu=POWER8 -mno-altivec -maltivec is legal...

Ben.
Re: Implicit altivec vs. linux kernel build
On Sun, 2005-02-27 at 19:32 -0500, David Edelsohn wrote:
> >>>>> Benjamin Herrenschmidt writes:
>
> Ben> The only problem I see is that the day we have a CPU, let's call it
> Ben> POWER8 for the sake of this demonstration, that has altivec and is
> Ben> different enough to justify a specific "optimize" option, we'll have to
> Ben> use -mcpu=POWER8 -mno-altivec for the whole kernel, which makes it
> Ben> difficult to enable altivec only for the raid6 file since the kernel
> Ben> makefiles, afaik, can only add an option to a specific file. Unless
> Ben> -mcpu=POWER8 -mno-altivec -maltivec is legal ...
>
> It depends why you are using -mcpu=power8. If one wants to
> generate common PowerPC code tuned for POWER8, one could use
> -mtune=power8. If one specifically wants to generate POWER4, POWER5, etc.
> base architecture instructions, GCC probably should add a PowerPC/AS
> generic cpu type to match the existing "powerpc" and "powerpc64" types so
> that one could enable the instructions common to the architecture and tune
> for the latest processor without enabling processor-specific features.

I'm talking about processor-specific features. For example, we currently use -mcpu=power4 when CONFIG_POWER4 is enabled, to let gcc generate power4-and-later-only instructions. What if we want a similar CONFIG_POWER8 option in the future because those new instructions make an interesting enough difference in performance?

-mcpu=power8 will enable altivec by default, so I'll have to add -mno-altivec to the "generic" CFLAGS. But then, how can I specify -maltivec in the "additional" CFLAGS for the RAID6 code?

Ben.
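
The open question above is how a command line carrying both -mno-altivec (from the generic CFLAGS) and -maltivec (appended for one file) gets resolved. As a minimal sketch of the behaviour Ben is hoping for, the C function below models "the last of two opposing switches wins". It is only a model of that assumption, not GCC's option parser; the name `altivec_enabled` and the `cpu_default` parameter are hypothetical and do not come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model, NOT GCC's real option handling: walk the flags
 * from left to right and let each -maltivec / -mno-altivec overwrite
 * the previous state, so the last one on the command line wins.
 * cpu_default is the state implied by -mcpu=... before any override.
 */
bool altivec_enabled(const char *const *flags, size_t nflags, bool cpu_default)
{
    bool enabled = cpu_default;

    assert(flags != NULL || nflags == 0);

    for (size_t i = 0; i < nflags; i++) {
        if (strcmp(flags[i], "-maltivec") == 0)
            enabled = true;
        else if (strcmp(flags[i], "-mno-altivec") == 0)
            enabled = false;
    }
    return enabled;
}
```

Under this assumption, the generic flags `-mcpu=power8 -mno-altivec` leave altivec off, while the same flags with `-maltivec` appended for the RAID6 file turn it back on, which is what a per-file "add an option" makefile mechanism would need.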
RE: Implicit altivec vs. linux kernel build
> Surely it would be possible to use -ffixed-* options to reserve all the
> altivec registers and get precisely that effect?

Nah, I don't need to be that drastic. The RAID6 code is already in a separate file that can have a specific additional set of compile flags, so I can enable altivec just for this file and make sure the functions it provides are always called from the proper environment (that is, have the enable_kernel_altivec() call etc. be _outside_ of that file).

Ben.
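
The split described above (vector code in one file, environment setup in its callers) can be sketched in plain C. This is a user-space illustration only: `enter_vector_env_stub`, `leave_vector_env_stub`, `vector_xor_stub` and `xor_block` are hypothetical stand-ins invented here, with the first playing the role of the enable_kernel_altivec() call mentioned in the mail. Nothing below is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag standing in for "the vector unit may be used now". */
int vector_unit_enabled = 0;

/* Stand-in for enable_kernel_altivec() and whatever undoes it afterwards. */
static void enter_vector_env_stub(void) { vector_unit_enabled = 1; }
static void leave_vector_env_stub(void) { vector_unit_enabled = 0; }

/*
 * Imagine this function living in the one file built with -maltivec.
 * It never sets up the environment itself; it only relies on it.
 */
static void vector_xor_stub(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    assert(vector_unit_enabled); /* must be called from the proper environment */
    for (size_t i = 0; i < n; i++)
        dst[i] ^= src[i];
}

/*
 * Imagine this wrapper living in a file built WITHOUT -maltivec, so the
 * compiler cannot emit vector instructions before the environment is set up.
 */
void xor_block(unsigned char *dst, const unsigned char *src, size_t n)
{
    enter_vector_env_stub();
    vector_xor_stub(dst, src, n);
    leave_vector_env_stub();
}
```

Keeping the enter and leave calls in the wrapper matters because any code in the -maltivec file, including its function prologues, may be compiled to vector instructions.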
Re: Bad unwinder data for __kernel_sigtramp_rt64 in PPC 64 vDSO corrupts Condition Register
On Wed, 2007-10-17 at 12:58 +0930, Alan Modra wrote:
> On Tue, Oct 16, 2007 at 08:21:55PM +0200, Jakub Jelinek wrote:
> > On Tue, Oct 16, 2007 at 06:02:13PM +0100, Andrew Haley wrote:
> > > The reason is that the unwinder data for CR in the vDSO is wrong. The
> > > line that affects the CR is here in
>
> My fault.
>
> > According to __builtin_init_dwarf_reg_size_table on ppc64-linux
> > r0..r31, fp0..fp31, mq, lr, ctr, ap, vrsave, vscr, spe_acc, spefcsr, sfp
> > are 64-bit, v0..v31 128-bit and cr0..cr7, xer 32-bit.
> > So both kernel and gcc/config/rs6000/linux-unwind.h are wrong.
> >
> > > arch/powerpc/kernel/vdso64/sigtramp.S:
> > >
> > > rsave (70, 38*RSIZE)/* cr */
> >
> > This should just be changed to
> > /* Size of CR regs in DWARF unwind info. */
> > #define CRSIZE 4
> > ...
> > rsave (70, 38*RSIZE + (RSIZE - CRSIZE)) /* cr */
> >
> > and similarly linux-unwind.h should do:
> >
> > fs->regs.reg[R_CR2].loc.offset = (long) &regs->ccr - new_cfa;
> > /* CR? regs are just 32-bit and PPC is big-endian. */
> > fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4;
>
> This looks good to me. I don't think we can change the unwinder to
> use a different size for cr as that would break unwinding through
> normal stack frames that save cr.

So the kernel fix would look like this, right? If you are ok with it, I'll submit it tomorrow.

Index: linux-work/arch/powerpc/kernel/vdso64/sigtramp.S
===
--- linux-work.orig/arch/powerpc/kernel/vdso64/sigtramp.S	2007-10-17 13:32:49.0 +1000
+++ linux-work/arch/powerpc/kernel/vdso64/sigtramp.S	2007-10-17 13:34:18.0 +1000
@@ -134,13 +134,16 @@ V_FUNCTION_END(__kernel_sigtramp_rt64)
 9:
 
 /* This is where the pt_regs pointer can be found on the stack. */
-#define PTREGS 128+168+56
+#define PTREGS	128+168+56
 
 /* Size of regs. */
-#define RSIZE 8
+#define RSIZE	8
+
+/* Size of CR reg in DWARF unwind info. */
+#define CRSIZE	4
 
 /* This is the offset of the VMX reg pointer. */
-#define VREGS 48*RSIZE+33*8
+#define VREGS	48*RSIZE+33*8
 
 /* Describe where general purpose regs are saved. */
 #define EH_FRAME_GEN \
@@ -178,7 +181,7 @@ V_FUNCTION_END(__kernel_sigtramp_rt64)
   rsave (31, 31*RSIZE); \
   rsave (67, 32*RSIZE); /* ap, used as temp for nip */ \
   rsave (65, 36*RSIZE); /* lr */ \
-  rsave (70, 38*RSIZE) /* cr */
+  rsave (70, 38*RSIZE + (RSIZE - CRSIZE)) /* cr */
 
 /* Describe where the FP regs are saved. */
 #define EH_FRAME_FP \
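
To make the arithmetic in the changed rsave line concrete, here is a small worked sketch in C. RSIZE, CRSIZE and the slot index 38 are taken from the patch; the helper names `cr_unwind_offset`, `store_be64`, `load_be32` and `check_cr_slot`, and the sample CR value, are hypothetical and exist only for this illustration. The byte layout is built by hand, so the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

#define RSIZE   8   /* size of one saved register slot (from the patch) */
#define CRSIZE  4   /* size of CR in the DWARF unwind info (from the patch) */
#define CR_SLOT 38  /* slot index used in the rsave (70, ...) line */

/* Offset the unwinder reads CR from: 38*8 + (8 - 4). */
long cr_unwind_offset(void)
{
    return CR_SLOT * RSIZE + (RSIZE - CRSIZE);
}

/* Lay out a 64-bit value in big-endian byte order, as ppc64 does. */
void store_be64(uint8_t slot[RSIZE], uint64_t v)
{
    for (int i = 0; i < RSIZE; i++)
        slot[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Read 4 bytes in big-endian order, as a 32-bit unwinder load would. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Self-check: the 32-bit CR value sits in the second half of its slot. */
void check_cr_slot(void)
{
    uint8_t slot[RSIZE];

    store_be64(slot, UINT64_C(0x28000482));      /* arbitrary sample CR value */
    assert(load_be32(slot) == 0);                /* start of slot: upper half */
    assert(load_be32(slot + (RSIZE - CRSIZE)) == UINT32_C(0x28000482));
}
```

With these values the offset is 304 + 4 = 308: a 32-bit read at the start of the 64-bit slot would see only the upper half of the saved value, which is why the patch adds RSIZE - CRSIZE.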