On Thu, Oct 31, 2013 at 11:27 AM, Bernd Edlinger <bernd.edlin...@hotmail.de> wrote:
> Hello,
>
> meanwhile, I have added a test case to that patch.
>
> Boot-strapped and regression-tested as usual.
>
> OK for trunk?
Err, well.  This just means that the generic C++ memory model handling
isn't complete.  We do

  if (TREE_CODE (to) == COMPONENT_REF
      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
    get_bit_range (&bitregion_start, &bitregion_end, to, &bitpos, &offset);

and thus restrict ourselves to bitfields when initializing the region we
may touch.  This just lacks computing bitregion_start / bitregion_end
for other fields.

Btw, what does the C++ memory model say for

  struct { char x; short a; short b; } a __attribute__((packed));
  short *p = &a.a;

  *p = 1;

I suppose '*p' may not touch bits outside of a.a either?  Or are memory
accesses via pointers less constrained?  What about array element
accesses?

That is, don't we need

  else
    {
      bitregion_start = 0;
      bitregion_end = bitsize;
    }

?

Richard.

> Bernd.
>
>> Hi,
>>
>> On Fri, 25 Oct 2013 11:26:20, Richard Biener wrote:
>>>
>>> On Fri, Oct 25, 2013 at 10:40 AM, Bernd Edlinger
>>> <bernd.edlin...@hotmail.de> wrote:
>>>> Hello,
>>>>
>>>> this patch fixes the recently discovered data store race on
>>>> arm-eabi-gcc with -fno-strict-volatile-bitfields
>>>> for structures like this:
>>>>
>>>> #define test_type unsigned short
>>>>
>>>> typedef struct s {
>>>>   unsigned char Prefix[1];
>>>>   test_type Type;
>>>> } __attribute((__packed__,__aligned__(4))) ss;
>>>>
>>>> volatile ss v;
>>>>
>>>> void __attribute__((noinline))
>>>> foo (test_type u)
>>>> {
>>>>   v.Type = u;
>>>> }
>>>>
>>>> test_type __attribute__((noinline))
>>>> bar (void)
>>>> {
>>>>   return v.Type;
>>>> }
>>>>
>>>>
>>>> I've manually confirmed the correct code generation using variations
>>>> of the example above on an ARM cross-compiler for
>>>> -fno-strict-volatile-bitfields.
>>>>
>>>> Note that this example still causes ICEs for
>>>> -fstrict-volatile-bitfields,
>>>> but I'd like to fix that separately.
>>>>
>>>> Boot-strapped and regression-tested on x86_64-linux-gnu.
>>>>
>>>> Ok for trunk?
>>>
>>> Isn't it more appropriate to fix it here:
>>>
>>>   if (TREE_CODE (to) == COMPONENT_REF
>>>       && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
>>>     get_bit_range (&bitregion_start, &bitregion_end, to, &bitpos, &offset);
>>>
>>> ?
>>>
>>
>> Honestly, I'd call this a work-around, not a design.
>>
>> Therefore I would not move that work-around to expr.c.
>>
>> Also the block below is only a work-around IMHO:
>>
>>   if (MEM_P (str_rtx) && bitregion_start > 0)
>>     {
>>       enum machine_mode bestmode;
>>       HOST_WIDE_INT offset, size;
>>
>>       gcc_assert ((bitregion_start % BITS_PER_UNIT) == 0);
>>
>>       offset = bitregion_start / BITS_PER_UNIT;
>>       bitnum -= bitregion_start;
>>       size = (bitnum + bitsize + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
>>       bitregion_end -= bitregion_start;
>>       bitregion_start = 0;
>>       bestmode = get_best_mode (bitsize, bitnum,
>>                                 bitregion_start, bitregion_end,
>>                                 MEM_ALIGN (str_rtx), VOIDmode,
>>                                 MEM_VOLATILE_P (str_rtx));
>>       str_rtx = adjust_bitfield_address_size (str_rtx, bestmode, offset,
>>                                               size);
>>     }
>>
>> Here, if bitregion_start = 8, we have a 4-byte-aligned memory context,
>> and whoops, now it is only 1-byte aligned.
>>
>> This example:
>>
>> struct s
>> {
>>   char a;
>>   int b:24;
>> };
>>
>> struct s ss;
>>
>> void foo(int b)
>> {
>>   ss.b = b;
>> }
>>
>> gets compiled (at -O3) to:
>>
>> foo:
>>         @ Function supports interworking.
>>         @ args = 0, pretend = 0, frame = 0
>>         @ frame_needed = 0, uses_anonymous_args = 0
>>         @ link register save eliminated.
>>         ldr     r3, .L2
>>         mov     r1, r0, lsr #8
>>         mov     r2, r0, lsr #16
>>         strb    r1, [r3, #2]
>>         strb    r0, [r3, #1]
>>         strb    r2, [r3, #3]
>>         bx      lr
>>
>> while...
>>
>> struct s
>> {
>>   int b:24;
>>   char a;
>> };
>>
>> struct s ss;
>>
>> void foo(int b)
>> {
>>   ss.b = b;
>> }
>>
>> gets compiled (at -O3) to:
>>
>> foo:
>>         @ Function supports interworking.
>>         @ args = 0, pretend = 0, frame = 0
>>         @ frame_needed = 0, uses_anonymous_args = 0
>>         @ link register save eliminated.
>>         ldr     r3, .L2
>>         mov     r2, r0, lsr #16
>>         strb    r2, [r3, #2]
>>         strh    r0, [r3]        @ movhi
>>         bx      lr
>>
>> which is more efficient, but only because the memory context is still
>> aligned in this case.
>>
>>> Btw, the C++ standard doesn't cover packed or aligned attributes so
>>> we could declare this a non-issue.  Any opinion on that?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks
>>>> Bernd.