https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Created attachment 55101
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55101&action=edit
hottest loop

The jpegxl build machinery adds -fno-vectorize and -fno-slp-vectorize to the
clang flags.  Adding -fno-tree-vectorize -fno-tree-slp-vectorize makes the
GCC-generated code more similar.  With these flags, most of the difference is
caused by FindBestPatchDictionary, or by FindTextLikePatches if that function
is not inlined.

  15.22%  cjxl     libjxl.so.0.7.0       [.] jxl::(anonymous namespace)::FindTextLikePatches
  10.19%  cjxl     libjxl.so.0.7.0       [.] jxl::FindBestPatchDictionary
   5.27%  cjxl     libjxl.so.0.7.0       [.] jxl::N_AVX2::QuantizeBlockAC
   5.06%  cjxl     libjxl.so.0.7.0       [.] jxl::N_AVX2::EstimateEntropy
   4.82%  cjxl     libjxl.so.0.7.0       [.] jxl::N_AVX2::EstimateEntropy
   4.35%  cjxl     libjxl.so.0.7.0       [.] jxl::N_AVX2::QuantizeBlockAC
   4.21%  cjxl     libjxl.so.0.7.0       [.] jxl::N_AVX2::(anonymous namespace)::TransformFromPixels
   3.87%  cjxl     libjxl.so.0.7.0       [.] jxl::N_AVX2::(anonymous namespace)::TransformFromPixels
   3.78%  cjxl     libjxl.so.0.7.0       [.] jxl::N_AVX2::FindBestMultiplier
   3.27%  cjxl     libjxl.so.0.7.0       [.] jxl::N_AVX2::FindBestMultiplier

I think this is mostly register allocation not handling the inner loop quoted
above well.  I am adding preprocessed sources.
