https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119009
--- Comment #3 from Michal Jireš <mjires at gcc dot gnu.org> ---
Thanks a lot for the script.

I have reproduced it:
# bad3714b - before my patch
BM_UIOVecSink/0       33.8 us         33.8 us        20659 bytes_per_second=2.82508G/s html
# 0895aef0 - my patch
BM_UIOVecSink/0       41.0 us         41.0 us        16890 bytes_per_second=2.32381G/s html

However, current trunk shows the opposite:
# 3605e057 - trunk
BM_UIOVecSink/0       33.7 us         33.7 us        20161 bytes_per_second=2.82955G/s html
# revert patch
BM_UIOVecSink/0       39.9 us         39.9 us        17399 bytes_per_second=2.38832G/s html

Is it still a problem on your machine with current trunk?

Perf record/report of:
snappy_benchmark --benchmark_filter=BM_UIOVecSink/0 --benchmark_min_warmup_time=5 --benchmark_time_unit=us
shows regression in functions:
61.46%  void snappy::SnappyDecompressor::DecompressAllTags<snappy::SnappyIOVecWriter>(snappy::SnappyIOVecWriter*)
25.65%  snappy::(anonymous namespace)::IncrementalCopy(char const*, char*, char*, char*)

The relevant symbols:
_ZN6snappy18SnappyDecompressor17DecompressAllTagsINS_17SnappyIOVecWriterEEEvPT_
_ZN6snappy12_GLOBAL__N_1L15IncrementalCopyEPKcPcS3_S3_
are identical outside of address changes.

Changing the alignment of DecompressAllTags with asm("nop; nop") or
__attribute__((aligned(128))) removes the regression.

19,023,629      branch-misses:u    # bad3714b
53,781,446      branch-misses:u    # 0895aef0

The underlying problem seems to be branch misses caused by different alignment,
but I cannot pinpoint any specific instruction(s) as the source.
I am not sure we can reliably prevent this.
In any case, a reliable solution would be unrelated to my patch.
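
For anyone who wants to repeat the alignment experiment, here is a minimal sketch of the __attribute__((aligned(128))) variant. It is not snappy code: hot_loop is a hypothetical stand-in for DecompressAllTags, and entry_is_aligned is a hypothetical helper that only confirms the attribute took effect. It assumes GCC or Clang on a target where a function pointer converts to its plain entry address (e.g. x86-64). Where exactly the attribute was placed in the snappy sources is not shown above, so treat the placement here as an assumption.

```cpp
#include <cstdint>

// Hypothetical stand-in for a hot function such as DecompressAllTags.
// aligned(128) asks the compiler to place the function entry on a
// 128-byte boundary; noinline keeps it a separate symbol so that the
// requested alignment actually applies to this code.
__attribute__((aligned(128), noinline))
int hot_loop(const char* src, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += src[i];
    }
    return sum;
}

// Hypothetical helper: true if the entry address of fn is a multiple
// of `alignment`. Converting a function pointer to an integer is only
// conditionally supported by the standard; GCC and Clang accept it.
bool entry_is_aligned(int (*fn)(const char*, int), std::uintptr_t alignment) {
    return reinterpret_cast<std::uintptr_t>(fn) % alignment == 0;
}
```

Calling entry_is_aligned(&hot_loop, 128) is a quick sanity check before rerunning the benchmark and comparing branch-misses:u between the two builds. Note that this only pins the entry address of one function; branch targets inside it and in other functions can still move relative to each other.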