On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
<[email protected]> wrote:
>
> On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <[email protected]> wrote:
> >
> > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <[email protected]> wrote:
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > * config/i386/i386.c (enum x86_64_reg_class): Add
> > > X86_64_SSEHF_CLASS.
> > > (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > (examine_argument): Ditto.
> > > (construct_container): Ditto.
> > > (classify_argument): Ditto, and set HFmode/HCmode to
> > > X86_64_SSEHF_CLASS.
> > > (function_value_32): Return _FLoat16/Complex Float16 by
> > > %xmm0.
> > > (function_value_64): Return _Float16/Complex Float16 by SSE
> > > register.
> > > (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > (ix86_secondary_reload): Require gpr as intermediate register
> > > to store _Float16 from sse register when sse4 is not
> > > available.
> > > (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under
> > > sse2.
> > > (ix86_scalar_mode_supported_p): Ditto.
> > > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > (*pushhf): Ditto.
> > > (*movhf_internal): Ditto.
> > > * doc/extend.texi (Half-Precision Floating Point): Documemt
> > > _Float16 for x86.
> > > * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > which is used by extract_bit_field but not backends.
> > >
> [...]
> >
> > Ping, i'd like to ask for approval for the below codes which is
> > related to generic part.
> >
> > start from ..
> > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > index ff3b4449b37..775ee397836 100644
> > > --- a/gcc/emit-rtl.c
> > > +++ b/gcc/emit-rtl.c
> > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode
> > > imode,
> > > fix them all. */
> > > if (omode == word_mode)
> > > ;
> > > + /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI
> > > (reg:HF))
> > > + here. Though extract_bit_field is the culprit here, not the
> > > backends. */
> > > + else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > + && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > + ;
> > > /* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though
> > > store_bit_field
> > > is the culprit here, and not the backends. */
> > > else if (known_ge (osize, regsize) && known_ge (isize, osize))
> >
> > and end here.
>
> So the main restriction otherwise in place is
>
> /* Subregs involving floating point modes are not allowed to
> change size. Therefore (subreg:DI (reg:DF) 0) is fine, but
> (subreg:SI (reg:DF) 0) isn't. */
> else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> {
> if (! (known_eq (isize, osize)
> /* LRA can use subreg to store a floating point value in
> an integer mode. Although the floating point and the
> integer modes need the same number of hard registers,
> the size of floating point mode can be less than the
> integer mode. LRA also uses subregs for a register
> should be used in different mode in on insn. */
> || lra_in_progress))
> return false;
>
> I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))
After debug, I find (subreg:SI (reg:HF)) is not really needed, it
would be finally handled by below cut
----cut-----
/* Find a correspondingly-sized integer field, so we can apply
shifts and masks to it. */
scalar_int_mode int_mode;
if (!int_mode_for_mode (tmode).exists (&int_mode))
/* If this fails, we should probably push op0 out to memory and then
do a load. */
int_mode = int_mode_for_mode (mode).require ();
target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
bitnum, target, unsignedp, reverse);
-----end----
and generate things like below cut
---cut----
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
(set (reg:HI 86)
(and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
(const_int -1 [0xffffffffffffffff])))
(clobber (reg:CC 17 flags))
])
"../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
-1
(nil))
(insn 7 6 11 2 (set (reg:HF 82 [ <retval> ])
(subreg:HF (reg:HI 86) 0))
"../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
-1
(nil))
(insn 11 7 12 2 (set (reg/i:HF 20 xmm0)
(reg:HF 82 [ <retval> ]))
"../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1
-1
(nil))
----end---
The real problem is here, when validate_subreg doesn't allow subreg
between integer mode and float mode with different sizes. It will hit
gcc_assert in gen_lowpart
----cut-----
/* Don't use LHS paradoxical subreg if explicit truncation is needed
between the mode of the extraction (word_mode) and the target
mode. Instead, create a temporary and use convert_move to set
the target. */
if (REG_P (target)
&& TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
{
target = gen_lowpart (ext_mode, target);
if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
spec_target_subreg = target;
}
----end----
So how about changes like below, remove changes in validate_subreg and
add some guard in extract_bit_field_using_extv.
modified gcc/emit-rtl.c
@@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode imode,
fix them all. */
if (omode == word_mode)
;
- /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
- here. Though extract_bit_field is the culprit here, not the backends. */
- else if (known_gt (regsize, osize) && known_gt (osize, isize)
- && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
- ;
/* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though store_bit_field
is the culprit here, and not the backends. */
else if (known_ge (osize, regsize) && known_ge (isize, osize))
modified gcc/expmed.c
@@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const
extraction_insn *extv, rtx op0,
between the mode of the extraction (word_mode) and the target
mode. Instead, create a temporary and use convert_move to set
the target. */
+ machine_mode tmode = GET_MODE (target);
if (REG_P (target)
- && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
+ && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
+ /* When validate_subreg doesn't allow subreg between integer mode
+ and float mode with different size, It will hit gcc_assert in
+ gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is
+ not really needed, codes like below will be finally generated.
+ (set (reg:SI 1)
+ (and:SI (reg:DI 2) -1))
+ (set (reg:SF 3)
+ (subreg:SF (reg:SI 1))) */
+ && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
+ && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
{
target = gen_lowpart (ext_mode, target);
if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
> to "work around" this restriction. Alternatively one could finally do away
> with all the exceptions and simply allow all such subregs giving them
> semantics as to intermediate same-size subregs to integer modes
> if this definition issue is why we disallow them?
>
> That is, any float-mode source or destination subreg is interpreted as
> wrapping the source operand (if float-mode) in a same size int subreg
> and performing the subreg in an integer mode first if the destination
> mode is a float mode?
>
> Also I detest that validate_subreg list things not allowed as opposed
> to things allowed. Why are FLOAT_MODE special, but
> fractional and accumulating modes not? The subreg documentation
> also doesn't talk about cases not allowed.
>
> Richard.
--
BR,
Hongtao