https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119009

--- Comment #3 from Michal Jireš <mjires at gcc dot gnu.org> ---
Thanks a lot for the script.    

I have reproduced it:    
# bad3714b - before my patch    
BM_UIOVecSink/0   33.8 us   33.8 us   20659 bytes_per_second=2.82508G/s html    
# 0895aef0 - my patch    
BM_UIOVecSink/0   41.0 us   41.0 us   16890 bytes_per_second=2.32381G/s html    

However, current trunk shows the opposite:
# 3605e057 - trunk    
BM_UIOVecSink/0   33.7 us   33.7 us   20161 bytes_per_second=2.82955G/s html    
# revert patch    
BM_UIOVecSink/0   39.9 us   39.9 us   17399 bytes_per_second=2.38832G/s html    

Is it still a problem on your machine with current trunk?    



Perf record/report of:    
snappy_benchmark --benchmark_filter=BM_UIOVecSink/0 --benchmark_min_warmup_time=5 --benchmark_time_unit=us

shows regression in functions:    
  61.46% void snappy::SnappyDecompressor::DecompressAllTags<snappy::SnappyIOVecWriter>(snappy::SnappyIOVecWriter*)
  25.65% snappy::(anonymous namespace)::IncrementalCopy(char const*, char*, char*, char*)

The relevant symbols:
_ZN6snappy18SnappyDecompressor17DecompressAllTagsINS_17SnappyIOVecWriterEEEvPT_
_ZN6snappy12_GLOBAL__N_1L15IncrementalCopyEPKcPcS3_S3_
are identical apart from address changes.

Changing alignment of DecompressAllTags with asm("nop; nop") or
__attribute__((aligned(128))) removes the regression.    
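
For illustration, here is a minimal sketch of the attribute-based variant. hot_function is a hypothetical stand-in, not the real DecompressAllTags, and entry_is_aligned is a helper made up for this example. It only shows how the entry-point alignment can be forced and checked with GCC or Clang; it does not reproduce the regression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a hot function such as DecompressAllTags.
// The aligned attribute requests a minimum alignment (in bytes) for the
// function's first instruction; noinline keeps it a separate symbol.
__attribute__((aligned(128), noinline))
int hot_function(const unsigned char* data, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Data-dependent branch, the kind of code alignment can affect.
        if (data[i] & 1) sum += data[i];
    }
    return sum;
}

// Hypothetical helper: true if the function's entry address is a
// multiple of `alignment`.
bool entry_is_aligned(int (*fn)(const unsigned char*, std::size_t),
                      std::uintptr_t alignment) {
    return reinterpret_cast<std::uintptr_t>(fn) % alignment == 0;
}

// Example check: aborts if the requested alignment was not applied.
void verify_hot_function_alignment() {
    assert(entry_is_aligned(&hot_function, 128));
}
```

Comparing the symbol address (for example with nm) with and without the attribute should show whether the function start actually moved.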

19,023,629      branch-misses:u # bad3714b    
53,781,446      branch-misses:u # 0895aef0    
The underlying problem seems to be branch misses caused by the different
alignment, but I cannot pinpoint any specific instruction(s) as the source.

I am not sure we can reliably prevent this. In any case, a reliable solution
would be unrelated to my patch.
