On Mon, 2025-09-08 at 14:18 -0600, Alex Hung wrote:
> 
> 
> On 8/25/25 02:52, Xi Ruoyao wrote:
> > dml21_map_dc_state_into_dml_display_cfg calls (the call is usually
> > inlined by the compiler) populate_dml21_surface_config_from_plane_state
> > and populate_dml21_plane_config_from_plane_state which may use FPU.  In
> > a x86-64 build:
> > 
> >      $ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \
> >      > drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o 
> > |
> >      > grep %xmm -c
> >      63
> > 
> > Thus it needs to be guarded with DC_FP_START.  But we must note that the
> > current code quality of the in-kernel FPU use in AMD dml2 is very much
> > problematic: we are actually calling DC_FP_START in dml21_wrapper.c
> > here, and this translation unit is built with CC_FLAGS_FPU.  Strictly
> > speaking this does not make any sense: with CC_FLAGS_FPU the compiler is
> > allowed to generate FPU uses anywhere in the translated code, perhaps
> > out of the DC_FP_START guard.  This problematic pattern also occurs in
> > at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c.  Thus we really
> 
> Let me share Austin's comments below:
> 
> "
> Both CC_FLAGS_FPU and DC_FP_START are required for FPU usage.
> 
> CC_FLAGS_FPU allows the compiler to generate FPU code whereas 
> DC_FP_START ensures that the FPU registers don't get tainted during runtime.

But the correct pattern would be: isolating the code using FPU into its
own translation unit, say xxx_fp.c, then call DC_FP_START in another
translation unit (for which CC_FLAGS_FPU is NOT used) before calling the
routines in xxx_fp.c.

It's because there's generally no way to guarantee the compiler only
generate FP code in the range of DC_FP_START ... DC_FP_END if
CC_FLAGS_FPU is used.  The compiler can generate FPU code in situations
we don't intend to use (and don't anticipate) FPU, for example if a
"counting number of ones in a word" pattern is detected when the code is
built for LoongArch, the compiler can invoke FPU to use a popcount
instruction with CC_FLAGS_FPU.  (That doesn't really happen now but it
may happen in the future.)

And the correct pattern is already used by, for example
dcn10_resource.c:

    DC_FP_START();
    voltage_supported = dcn_validate_bandwidth(dc, context, validate_mode);
    DC_FP_END();

Here dcn_validate_bandwidth is in dcn_calcs.c where CC_FLAGS_FPU is in-
effect, but dcn10_resource.c itself is not built with CC_FLAGS_FPU.

-- 
Xi Ruoyao <xry...@xry111.site>

Reply via email to