On Mon, 2025-09-08 at 14:18 -0600, Alex Hung wrote: > > > On 8/25/25 02:52, Xi Ruoyao wrote: > > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually > > inlined by the compiler) populate_dml21_surface_config_from_plane_state > > and populate_dml21_plane_config_from_plane_state which may use FPU. In > > a x86-64 build: > > > > $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \ > > > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o > > | > > > grep %xmm -c > > 63 > > > > Thus it needs to be guarded with DC_FP_START. But we must note that the > > current code quality of the in-kernel FPU use in AMD dml2 is very much > > problematic: we are actually calling DC_FP_START in dml21_wrapper.c > > here, and this translation unit is built with CC_FLAGS_FPU. Strictly > > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is > > allowed to generate FPU uses anywhere in the translated code, perhaps > > out of the DC_FP_START guard. This problematic pattern also occurs in > > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really > > Let me share Austin's comments below: > > " > Both CC_FLAGS_FPU and DC_FP_START are required for FPU usage. > > CC_FLAGS_FPU allows the compiler to generate FPU code whereas > DC_FP_START ensures that the FPU registers don't get tainted during runtime.
But the correct pattern would be: isolating the code using FPU into its own translation unit, say xxx_fp.c, then call DC_FP_START in another translation unit (for which CC_FLAGS_FPU is NOT used) before calling the routines in xxx_fp.c. It's because there's generally no way to guarantee the compiler only generate FP code in the range of DC_FP_START ... DC_FP_END if CC_FLAGS_FPU is used. The compiler can generate FPU code in situations we don't intend to use (and don't anticipate) FPU, for example if a "counting number of ones in a word" pattern is detected when the code is built for LoongArch, the compiler can invoke FPU to use a popcount instruction with CC_FLAGS_FPU. (That doesn't really happen now but it may happen in the future.) And the correct pattern is already used by, for example dcn10_resource.c: DC_FP_START(); voltage_supported = dcn_validate_bandwidth(dc, context, validate_mode); DC_FP_END(); Here dcn_validate_bandwidth is in dcn_calcs.c where CC_FLAGS_FPU is in- effect, but dcn10_resource.c itself is not built with CC_FLAGS_FPU. -- Xi Ruoyao <xry...@xry111.site>