On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > On 10/10/16 11:22, Richard Biener wrote: > > On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > > > > Hi Richard, > > > > > > As I mentioned, here is the patch applying to the main store merging patch > > > to > > > re-implement encode_tree_to_bitpos > > > to operate on the bytes directly. > > > > > > This works fine on little-endian but breaks on big-endian, even for > > > merging > > > bitfields within a single byte. > > > Consider the code snippet from gcc.dg/store_merging_6.c: > > > > > > struct bar { > > > int a : 3; > > > unsigned char b : 4; > > > unsigned char c : 1; > > > char d; > > > char e; > > > char f; > > > char g; > > > }; > > > > > > void > > > foo1 (struct bar *p) > > > { > > > p->b = 3; > > > p->a = 2; > > > p->c = 1; > > > p->d = 4; > > > p->e = 5; > > > } > > > > > > The correct GIMPLE for these merged stores on big-endian is: > > > MEM[(voidD.49 *)p_2(D)] = 18180; > > > MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > > > whereas with this patch we emit: > > > MEM[(voidD.49 *)p_2(D)] = 39428; > > > MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > > > The dump for merging the individual stores without this patch (using the > > > correct but costly wide_int approach in the base patch) is: > > > After writing 3 of size 4 at position 3 the merged region contains: > > > 6 0 0 0 0 0 > > > After writing 2 of size 3 at position 0 the merged region contains: > > > 46 0 0 0 0 0 > > > After writing 1 of size 1 at position 7 the merged region contains: > > > 47 0 0 0 0 0 > > > After writing 4 of size 8 at position 8 the merged region contains: > > > 47 4 0 0 0 0 > > > After writing 5 of size 8 at position 16 the merged region contains: > > > 47 4 5 0 0 0 > > > > > > > > > And with this patch it is: > > > After writing 3 of size 4 at position 3 the merged region contains: > > > 18 0 0 0 0 0 > > > After writing 2 of size 3 at position 0 the merged region contains: > > > 1a 0 0 0 0 0 > > > After writing 1 of size 1 at position 7 the merged region contains: > > > 9a 0 0 0 0 0 > > > After writing 4 of size 8 at position 8 the merged region contains: > > > 9a 4 0 0 0 0 > > > After writing 5 of size 8 at position 16 the merged region contains: > > > 9a 4 5 0 0 0 > > > > > > (Note the dump just dumps the byte array from index 0 to <len> so the > > > first > > > thing printed is the lowest numbered byte. > > > Also, each byte is dumped in hex.) > > > > > > The code as included here doesn't do any byte swapping for big-endian but > > > as > > > seen from the dump even writing a sub-byte > > > bitfield goes wrong so it would be nice to resolve that before going > > > forward. > > > Any help with debugging this is hugely appreciated. I've included an ASCII > > > diagram of the steps in the algorithm > > > in the patch itself. > > Ah, I think you need to account for BITS_BIG_ENDIAN in > > shift_bytes_in_array. You have to shift towards MSB which means changing > > left to right shifts for BITS_BIG_ENDIAN. > > Thanks, I'll try it out. But this is on aarch64 where > BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1 > so there's something else bad here.
Maybe I'm confusing all the macros, so maybe it's BYTES_BIG_ENDIAN (vs. WORDS_BIG_ENDIAN -- in theory this approach should work for pdp11 as well). Richard. > > You also seem to miss to account for amnt / BITS_PER_UNIT != 0. > > Independently of BYTES_BIG_ENDIAN it would be > > > > ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt; > > ... > > doh, yes. I'll fix that. > > > (so best use a single load / store and operate on a temporary). > > Thanks, > Kyrill > > > Richard. > > > > > Thanks, > > > Kyrill > > > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)