https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119900
Filip Kastl <pheeck at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[16 regression] imagick     |[16 regression] imagick
                   |slowdown with -Ofast        |slowdown with -Ofast
                   |-march=native -fprofile-use |-march=native -fprofile-use
                   |between g:b986ed16c2546674  |since
                   |and g:e1098c7b08d9e601      |r16-39-gf6859fb621179e

--- Comment #1 from Filip Kastl <pheeck at gcc dot gnu.org> ---
Bisected to r16-39-gf6859fb621179e

commit f6859fb621179ec9bf5631eb8902619ab8d4467b
Author: Jan Hubicka <hubi...@ucw.cz>
Date:   Sat Apr 19 18:51:27 2025 +0200

    Add tables for SSE fp conversion costs

    as discussed, I will proceed adding costs for common SSE operations
    which are currently globbed into addss cost, so we do not need to set
    it incorrectly for znver5.  Looking through the stats, there are quite
    a few missing cases, so I am starting with those that I think are more
    common.  I plan to do it in smaller steps so individual changes get
    benchmarked by LNT and can also be bisected to.

    This patch adds costs for various SSE and AVX FP->FP conversions
    (extensions and truncations).  Looking through Agner Fog's tables,
    these are a bit asymmetric so I added cost for CVTSS2SD which is also
    used for CVTSD2SS, CVTPS2PD and CVTPD2PS, cost for 256bit VCVTPS2PS
    (also used for opposite direction) and cost for 512bit one.

    I plan to add int->int conversions next and then int->fp & fp->int
    which are more tricky since they may bundle inter-unit move.

    I also noticed that size tables are wrong for all SSE instructions so
    I updated them.  With some love I think vectorization can work as a
    size optimization, too, but we need more work on that.

    Those values I can find in Agner Fog tables are taken from there,
    others are guesses (especially for yongfeng_cost and shijidadao_cost).

    gcc/ChangeLog:

        * config/i386/i386.cc (vec_fp_conversion_cost): New function.
        (ix86_rtx_costs): Use it for SSE/AVX FP conversions.
        (ix86_builtin_vectorization_cost): Fix indentation; and use
        vec_fp_conversion_cost in vec_promote_demote.
        (fp_conversion_stmt_cost): New function.
        (ix86_vector_costs::add_stmt_cost): Use it to cost NOP_EXPR and
        vec_promote_demote.
        * config/i386/i386.h (struct processor_costs):
        * config/i386/x86-tune-costs.h (struct processor_costs):
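
For readers who have not looked at the patch itself, the idea described in the commit message (one cost shared by the 128-bit conversions, plus separate 256-bit and 512-bit costs) can be sketched roughly as below. This is only an illustration and not the code from i386.cc: the struct, its field names, the function name fp_conversion_cost, and the numeric values are all made up here, and the real vec_fp_conversion_cost may well use different thresholds or extra scaling.

```c
/* Hypothetical, simplified cost table.  Field names and values are
   illustrative assumptions, not copied from GCC's struct processor_costs.  */
struct fp_conv_costs
{
  int cvt_128;  /* CVTSS2SD; per the commit message, also reused for
                   CVTSD2SS, CVTPS2PD and CVTPD2PS.  */
  int cvt_256;  /* 256-bit FP<->FP conversion, either direction.  */
  int cvt_512;  /* 512-bit FP<->FP conversion.  */
};

/* Made-up example values, only so the selection logic can be exercised.  */
static const struct fp_conv_costs example_costs = { 3, 4, 5 };

/* Pick the conversion cost from the width of the vector being converted.
   Scalar and 128-bit operations share one entry; wider vectors get their
   own entries.  */
static int
fp_conversion_cost (const struct fp_conv_costs *costs, int vector_bits)
{
  if (vector_bits <= 128)
    return costs->cvt_128;
  if (vector_bits <= 256)
    return costs->cvt_256;
  return costs->cvt_512;
}
```

With per-width entries like these, a tuning table that underestimates or overestimates one width changes the vectorizer's decisions only for that width, which fits the stated goal of landing the costs in small steps that can each be benchmarked and bisected.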