https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65369
Thomas Preud'homme <thopre01 at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |NEW
Last reconfirmed|2015-03-10 00:00:00 |2015-03-12 0:00
Assignee|thopre01 at gcc dot gnu.org |unassigned at gcc dot gnu.org
--- Comment #27 from Thomas Preud'homme <thopre01 at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #26)
> So, on my version of the testcase with r210843 -O3 -mcpu=power8 there are
> like 49
> 32 bit load in host endianness found at: _105 = MEM[(const unsigned char
> *)load_src_25];
> occurrences, so I've added a quick hack (should have used dbg counters
> perhaps), and
> with BSWAPCNT=16 it works fine, with BSWAPCNT=17 it fails.
> In the *.optimized dump, I've noticed that this single load matters for
> vectorization in md4_update function, with BSWAPCNT=16 a chunk of code isn't
> vectorized, with BSWAPCNT=17 it is.
>
> So very well this might just trigger a latent bug in the vectorizer or
> powerpc backend.
Using trunk I get the following difference for bswap:
@@ -1110,10 +1111,10 @@ nettle_md4_update (struct md4_ctx * ctx,
_100 = MEM[(const uint8_t *)data_149 + 1B];
_101 = (unsigned int) _100;
_102 = _101 << 8;
+ _106 = MEM[(const uint8_t *)data_149];
_104 = *data_149;
_105 = (unsigned int) _104;
_123 = _99 | _105;
- _106 = _102 | _123;
Which looks perfectly fine. So yeah, I guess the problem is at a different
level.
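
For reference, the source pattern behind this hunk is presumably a byte-wise 32-bit read, which the bswap pass replaces with one 32-bit load when the byte order matches the host. Below is a minimal sketch of that equivalence. The helper names (load_le32_bytewise, load_host32, host_is_little_endian, check_loads_agree) are made up for illustration and are not taken from nettle or GCC. The sketch assumes the combined value is a little-endian read, as the "load in host endianness" message suggests for a little-endian target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: assemble a 32-bit value from four bytes, least
   significant byte first.  This is the shift-and-or shape seen in the
   dump (_101 << 8, then the | chain).  */
static uint32_t load_le32_bytewise(const uint8_t *data)
{
    return ((uint32_t) data[3] << 24)
         | ((uint32_t) data[2] << 16)
         | ((uint32_t) data[1] << 8)
         |  (uint32_t) data[0];
}

/* Hypothetical helper: one 32-bit load in host byte order.  memcpy keeps
   the access well defined for unaligned pointers.  */
static uint32_t load_host32(const uint8_t *data)
{
    uint32_t value;
    memcpy(&value, data, sizeof value);
    return value;
}

/* Returns 1 when the lowest-addressed byte of an integer is its least
   significant byte.  */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* The replacement is only valid if both forms agree; on a little-endian
   host they must.  */
static void check_loads_agree(const uint8_t *data)
{
    if (host_is_little_endian())
        assert(load_le32_bytewise(data) == load_host32(data));
}
```

If both forms agree for every input on the target, the transformation itself is sound, which is consistent with the problem being at a different level.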