On Wed, Jun 26, 2013 at 4:51 PM, Andrew Pinski <pins...@gmail.com> wrote: > On Wed, Jun 26, 2013 at 4:41 PM, Yufeng Zhang <yufeng.zh...@arm.com> wrote: >> On 06/27/13 00:04, Andrew Pinski wrote: >>> >>> On Wed, Jun 26, 2013 at 3:39 PM, Yufeng Zhang<yufeng.zh...@arm.com> >>> wrote: >>>> >>>> This patch updates assign_parm_find_data_types to assign passed_mode and >>>> nominal_mode with the mode of the built pointer type instead of the >>>> hard-coded Pmode in the case of pass-by-reference. This is in line with >>>> the >>>> assignment to passed_mode and nominal_mode in other cases inside the >>>> function. >>>> >>>> assign_parm_find_data_types generally uses TYPE_MODE to calculate >>>> passed_mode and nominal_mode: >>>> >>>> /* Find mode of arg as it is passed, and mode of arg as it should be >>>> during execution of this function. */ >>>> passed_mode = TYPE_MODE (passed_type); >>>> nominal_mode = TYPE_MODE (nominal_type); >>>> >>>> this includes the case when the passed argument is a pointer by itself. >>>> >>>> However there is a discrepancy when it deals with argument passed by >>>> invisible reference; it builds the argument's corresponding pointer type, >>>> but sets passed_mode and nominal_mode with Pmode directly. >>>> >>>> This is OK for targets where Pmode == ptr_mode, but on AArch64 with ILP32 >>>> they are different with Pmode as DImode and ptr_mode as SImode. When such >>>> a >>>> reference is passed on stack, the reference is prepared by the caller in >>>> the >>>> lower 4 bytes of an 8-byte slot but is fetched by the callee as an 8-byte >>>> datum, of which the higher 4 bytes may contain junk. It is probably the >>>> combination of Pmode != ptr_mode and the particular ABI specification >>>> that >>>> make the AArch64 ILP32 the first target on which the issue manifests >>>> itself. >>>> >>>> Bootstrapped on x86_64-none-linux-gnu. >>>> >>>> OK for the trunk? >>> >>> >>> >>> IA64-hpux also uses Pmode != ptr_mode, can you provide the testcase >>> which fails without this change? >>> I used a powerpc64 target where Pmode != ptr_mode which did not hit >>> this bug either. >> >> >> The issue was firstly observed in one of the compat tests which passes a >> large number of non-small structures. The following is a trimmed-down >> reproducible code snippet (although not runnable but shall be easy to be >> make runnable): >> >> struct s5 >> { >> double a; >> double b; >> double c; >> double d; >> double e; >> } gS; >> >> double foo (struct s5 p1, struct s5 p2,struct s5 p3,struct s5 p4,struct s5 >> p5,struct s5 p6,struct s5 p7,struct s5 p8, struct s5 p9) >> { >> return p9.c; >> } >> --------------- CUT --------------- >> >> The code-gen (-O2) without the patch is: >> >> .text >> .align 2 >> .global foo >> .type foo, %function >> foo: >> ldr x0, [sp] <<=== here! >> ldr d0, [x0,16] >> ret >> .size foo, .-foo >> >> Where the arrow points is the load of the pointer to 'p9' that is passed on >> stack. The instruction really should be ldr w0, [sp], i.e. the pointer mode >> is SImode rather than DImode. >> >> It needs a number of conditions for the issue to manifest: >> >> 1. pass-by-reference; on aarch64 one example is a struct that is larger than >> 16 bytes. >> 2. the reference is passed on stack; on aarch64, this usually only happens >> after registers x0 - x7 are used. >> 3. the size of stack slot for passing pointer is larger than the pointer >> size; on aarch64, it is 8-byte vs. 4-byte >> 4. the unused part of the stack slot is not zeroed out, i.e. undefined by >> the ABI > > This is the real issue. I think it is better if we change the ABI to > say they are zero'd. It really makes things like this a mess. > >> 5. in the runtime, the unused part of such a stack slot contains junk. >> >> The runtime segmentation fault may only be generated when all the above >> conditions are met. I'm not familiar with IA64-hpux or powerpc64 procedure >> call ABIs, but I guess those targets are just being lucky? > > Or rather their ABIs all say are zero or sign extended for values less > than 8 byte wide.
One more thing, it looks like your change will not work correctly for big-endian ILP32 AARCH64 either as the least significant word is offsetted by 4. Did you test big-endian ILP32 AARCH64? Thanks, Andrew Pinski