On Thu, Jul 29, 2021 at 12:53 PM Hongtao Liu <[email protected]> wrote:
>
> On Thu, Jul 29, 2021 at 5:57 AM Joseph Myers <[email protected]> wrote:
> >
> > On Wed, 21 Jul 2021, liuhongt via Gcc-patches wrote:
> >
> > > @@ -23254,13 +23337,15 @@ ix86_get_excess_precision (enum
> > > excess_precision_type type)
> > > provide would be identical were it not for the unpredictable
> > > cases. */
> > > if (!TARGET_80387)
> > > - return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > > + return TARGET_SSE2
> > > + ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
> > > + : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > > else if (!TARGET_MIX_SSE_I387)
> > > {
> > > if (!(TARGET_SSE && TARGET_SSE_MATH))
> > > return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
> > > else if (TARGET_SSE2)
> > > - return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
> > > + return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
> > > }
> > >
> > > /* If we are in standards compliant mode, but we know we will
> >
> > This patch is not changing the default "fast" mode at all; that's
> > promoting to float, unconditionally. But you have a subsequent change
> > there in patch 4 to make the promotions in the default "fast" mode depend
> > on hardware support for the new instructions; it's unhelpful for the
> > documentation not to corresponding exactly to the code changes in the same
> > patch.
> Yes, will change.
> >
> > Rather than using FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 whenever TARGET_SSE2
> > (i.e. whenever the type is available), it might make more sense to follow
> > AArch64 and use it only when the hardware instructions are available. In
> > any case, it seems peculiar to use a different threshold in the "fast"
> We want to provide some debuggability to the software emulation.
> When there's inconsistency between software emulation and hardware
> instructions, users can still debug on non-avx512fp16 processor w/
> software emulation and extra option -fexcess-precision=standard,
> Also since TARGET_C_EXCESS_PRECISION is not related to type, for
> testcase w/o _Float16 and is supposed to be runned on x86 fpu, if gcc
> is built w/ --with-arch=sapphirerapid, it will regress those
> testcases. .i.e. gcc.target/i386/excess-precision-*.c, that's why we
> can't follow AArch64.
> > case from the "standard" case. -fexcess-precision=standard is not "avoid
> > excess precision", it's "implement excess precision in the front end".
> > Whenever "fast" is implementing excess precision in the front end,
> > "standard" should be doing the same thing as "fast".
> >
> > > +Soft-fp keeps the intermediate result of the operation at 32-bit
> > > precision by defaults,
> > > +which may lead to inconsistent behavior between soft-fp and avx512fp16
> > > instructions,
> > > +using @option{-fexcess-precision=standard} will force round back after
> > > every operation.
> >
> > "soft-fp" is, as the name of some code within GCC, an internal
> > implementation detail, which should not be referenced in the user manual.
> > What results in intermediate results being in a wider precision is not
> > soft-fp; it's promotions inserted by the front end as a result of how the
> > above hook is defined (promotions inserted by the optabs/expand code are
> > an implementation detail that should always be followed automatically by a
> > truncation of the result and so not be user-visible).
> Yes, will reorganize the words.
> >
> > As far as I know, the official name of "avx512fp16" is "AVX512-FP16" and
> > text in the manual should use the official capitalization, hyphenation
> > etc. in such names unless literally referring to command-line options
> > inside @option or similar.
> Yes, will change.
> >
Update patch for documents.
> > --
> > Joseph S. Myers
> > [email protected]
>
>
>
> --
> BR,
> Hongtao
Also as a follow up of [1], I merge the below change into the updated patch.
Richard, please comment under this thread.
> > > + /* FIXME: validate_subreg only allows (subreg:WORD_MODE (reg:HF) 0). */
> >
> > I think that needs "fixing" then, or alternatively the caller should care.
> >
> How about this
>
> modified gcc/emit-rtl.c
> @@ -928,6 +928,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
> fix them all. */
> if (omode == word_mode)
> ;
> + /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
> + here. Though extract_bit_field is the culprit here, not the backends.
> */
> + else if (imode == HFmode && omode == SImode)
> + ;
> /* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though store_bit_field
> is the culprit here, and not the backends. */
> else if (known_ge (osize, regsize) && known_ge (isize, osize))
> new file gcc/testsuite/gcc.target/i386/float16-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2" } */
> +_Float16
> +foo (int a)
> +{
> + union {
> + int a;
> + _Float16 b;
> + }c;
> + c.a = a;
> + return c.b;
> +}
>
> If it's ok, I'll merge the upper change to the former commit:
[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576074.html
--
BR,
Hongtao
From a094e41a96ceaefe629fa3ab6a1329860589a535 Mon Sep 17 00:00:00 2001
From: liuhongt <[email protected]>
Date: Mon, 5 Jul 2021 17:05:45 +0800
Subject: [PATCH 2/5] [i386] Enable _Float16 type for TARGET_SSE2 and above.
gcc/ChangeLog:
* config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
* config/i386/i386.c (enum x86_64_reg_class): Add
X86_64_SSEHF_CLASS.
(merge_classes): Handle X86_64_SSEHF_CLASS.
(examine_argument): Ditto.
(construct_container): Ditto.
(classify_argument): Ditto, and set HFmode/HCmode to
X86_64_SSEHF_CLASS.
(function_value_32): Return _FLoat16/Complex Float16 by
%xmm0.
(function_value_64): Return _Float16/Complex Float16 by SSE
register.
(ix86_print_operand): Handle CONST_DOUBLE HFmode.
(ix86_secondary_reload): Require gpr as intermediate register
to store _Float16 from sse register when sse4 is not
available.
(ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under
sse2.
(ix86_scalar_mode_supported_p): Ditto.
(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
(ix86_get_excess_precision): Return
FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 under sse2.
* config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
(VALID_INT_MODE_P): Add HFmode and HCmode.
* config/i386/i386.md (*pushhf_rex64): New define_insn.
(*pushhf): Ditto.
(*movhf_internal): Ditto.
* doc/extend.texi (Half-Precision Floating Point): Documemt
_Float16 for x86.
* emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
which is used by extract_bit_field but not backends.
gcc/lto/ChangeLog:
* lto-lang.c (lto_type_for_mode): Return float16_type_node
when mode == TYPE_MODE (float16_type_node).
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-float16-1.c: New test.
* gcc.target/i386/sse2-float16-2.c: Ditto.
* gcc.target/i386/sse2-float16-3.c: Ditto.
* gcc.target/i386/float16-5.c: Ditto.
---
gcc/config/i386/i386-modes.def | 1 +
gcc/config/i386/i386.c | 97 +++++++++++++-
gcc/config/i386/i386.h | 3 +-
gcc/config/i386/i386.md | 118 +++++++++++++++++-
gcc/doc/extend.texi | 14 +++
gcc/emit-rtl.c | 4 +
gcc/lto/lto-lang.c | 3 +
gcc/testsuite/gcc.target/i386/float16-5.c | 12 ++
.../gcc.target/i386/sse2-float16-1.c | 8 ++
.../gcc.target/i386/sse2-float16-2.c | 16 +++
.../gcc.target/i386/sse2-float16-3.c | 12 ++
11 files changed, 278 insertions(+), 10 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-float16-3.c
diff --git a/gcc/config/i386/i386-modes.def b/gcc/config/i386/i386-modes.def
index 4e7014be034..9232f59a925 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see
FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
FLOAT_MODE (TF, 16, ieee_quad_format);
+FLOAT_MODE (HF, 2, ieee_half_format);
/* In ILP32 mode, XFmode has size 12 and alignment 4.
In LP64 mode, XFmode has size and alignment 16. */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ff96134fb37..597e4d68247 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -387,6 +387,7 @@ enum x86_64_reg_class
X86_64_INTEGER_CLASS,
X86_64_INTEGERSI_CLASS,
X86_64_SSE_CLASS,
+ X86_64_SSEHF_CLASS,
X86_64_SSESF_CLASS,
X86_64_SSEDF_CLASS,
X86_64_SSEUP_CLASS,
@@ -2023,8 +2024,10 @@ merge_classes (enum x86_64_reg_class class1, enum x86_64_reg_class class2)
return X86_64_MEMORY_CLASS;
/* Rule #4: If one of the classes is INTEGER, the result is INTEGER. */
- if ((class1 == X86_64_INTEGERSI_CLASS && class2 == X86_64_SSESF_CLASS)
- || (class2 == X86_64_INTEGERSI_CLASS && class1 == X86_64_SSESF_CLASS))
+ if ((class1 == X86_64_INTEGERSI_CLASS
+ && (class2 == X86_64_SSESF_CLASS || class2 == X86_64_SSEHF_CLASS))
+ || (class2 == X86_64_INTEGERSI_CLASS
+ && (class1 == X86_64_SSESF_CLASS || class1 == X86_64_SSEHF_CLASS)))
return X86_64_INTEGERSI_CLASS;
if (class1 == X86_64_INTEGER_CLASS || class1 == X86_64_INTEGERSI_CLASS
|| class2 == X86_64_INTEGER_CLASS || class2 == X86_64_INTEGERSI_CLASS)
@@ -2178,6 +2181,8 @@ classify_argument (machine_mode mode, const_tree type,
/* The partial classes are now full classes. */
if (subclasses[0] == X86_64_SSESF_CLASS && bytes != 4)
subclasses[0] = X86_64_SSE_CLASS;
+ if (subclasses[0] == X86_64_SSEHF_CLASS && bytes != 2)
+ subclasses[0] = X86_64_SSE_CLASS;
if (subclasses[0] == X86_64_INTEGERSI_CLASS
&& !((bit_offset % 64) == 0 && bytes == 4))
subclasses[0] = X86_64_INTEGER_CLASS;
@@ -2350,6 +2355,12 @@ classify_argument (machine_mode mode, const_tree type,
gcc_unreachable ();
case E_CTImode:
return 0;
+ case E_HFmode:
+ if (!(bit_offset % 64))
+ classes[0] = X86_64_SSEHF_CLASS;
+ else
+ classes[0] = X86_64_SSE_CLASS;
+ return 1;
case E_SFmode:
if (!(bit_offset % 64))
classes[0] = X86_64_SSESF_CLASS;
@@ -2367,6 +2378,15 @@ classify_argument (machine_mode mode, const_tree type,
classes[0] = X86_64_SSE_CLASS;
classes[1] = X86_64_SSEUP_CLASS;
return 2;
+ case E_HCmode:
+ classes[0] = X86_64_SSE_CLASS;
+ if (!(bit_offset % 64))
+ return 1;
+ else
+ {
+ classes[1] = X86_64_SSEHF_CLASS;
+ return 2;
+ }
case E_SCmode:
classes[0] = X86_64_SSE_CLASS;
if (!(bit_offset % 64))
@@ -2481,6 +2501,7 @@ examine_argument (machine_mode mode, const_tree type, int in_return,
(*int_nregs)++;
break;
case X86_64_SSE_CLASS:
+ case X86_64_SSEHF_CLASS:
case X86_64_SSESF_CLASS:
case X86_64_SSEDF_CLASS:
(*sse_nregs)++;
@@ -2580,13 +2601,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
/* First construct simple cases. Avoid SCmode, since we want to use
single register to pass this type. */
- if (n == 1 && mode != SCmode)
+ if (n == 1 && mode != SCmode && mode != HCmode)
switch (regclass[0])
{
case X86_64_INTEGER_CLASS:
case X86_64_INTEGERSI_CLASS:
return gen_rtx_REG (mode, intreg[0]);
case X86_64_SSE_CLASS:
+ case X86_64_SSEHF_CLASS:
case X86_64_SSESF_CLASS:
case X86_64_SSEDF_CLASS:
if (mode != BLKmode)
@@ -2683,6 +2705,14 @@ construct_container (machine_mode mode, machine_mode orig_mode,
GEN_INT (i*8));
intreg++;
break;
+ case X86_64_SSEHF_CLASS:
+ exp [nexps++]
+ = gen_rtx_EXPR_LIST (VOIDmode,
+ gen_rtx_REG (HFmode,
+ GET_SSE_REGNO (sse_regno)),
+ GEN_INT (i*8));
+ sse_regno++;
+ break;
case X86_64_SSESF_CLASS:
exp [nexps++]
= gen_rtx_EXPR_LIST (VOIDmode,
@@ -3903,6 +3933,19 @@ function_value_32 (machine_mode orig_mode, machine_mode mode,
/* Most things go in %eax. */
regno = AX_REG;
+ /* Return _Float16/_Complex _Foat16 by sse register. */
+ if (mode == HFmode)
+ regno = FIRST_SSE_REG;
+ if (mode == HCmode)
+ {
+ rtx ret = gen_rtx_PARALLEL (mode, rtvec_alloc(1));
+ XVECEXP (ret, 0, 0)
+ = gen_rtx_EXPR_LIST (VOIDmode,
+ gen_rtx_REG (SImode, FIRST_SSE_REG),
+ GEN_INT (0));
+ return ret;
+ }
+
/* Override FP return register with %xmm0 for local functions when
SSE math is enabled or for functions with sseregparm attribute. */
if ((fn || fntype) && (mode == SFmode || mode == DFmode))
@@ -3939,6 +3982,8 @@ function_value_64 (machine_mode orig_mode, machine_mode mode,
switch (mode)
{
+ case E_HFmode:
+ case E_HCmode:
case E_SFmode:
case E_SCmode:
case E_DFmode:
@@ -13411,6 +13456,15 @@ ix86_print_operand (FILE *file, rtx x, int code)
(file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
}
+ else if (CONST_DOUBLE_P (x) && GET_MODE (x) == HFmode)
+ {
+ long l = real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (x),
+ REAL_MODE_FORMAT (HFmode));
+ if (ASSEMBLER_DIALECT == ASM_ATT)
+ putc ('$', file);
+ fprintf (file, "0x%04x", (unsigned int) l);
+ }
+
else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
{
long l;
@@ -18928,6 +18982,16 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
return NO_REGS;
}
+ /* Require movement to gpr, and then store to memory. */
+ if (mode == HFmode
+ && !TARGET_SSE4_1
+ && SSE_CLASS_P (rclass)
+ && !in_p && MEM_P (x))
+ {
+ sri->extra_cost = 1;
+ return GENERAL_REGS;
+ }
+
/* This condition handles corner case where an expression involving
pointers gets vectorized. We're trying to use the address of a
stack slot as a vector initializer.
@@ -21555,10 +21619,27 @@ ix86_scalar_mode_supported_p (scalar_mode mode)
return default_decimal_float_supported_p ();
else if (mode == TFmode)
return true;
+ else if (mode == HFmode && TARGET_SSE2)
+ return true;
else
return default_scalar_mode_supported_p (mode);
}
+/* Implement TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P - return TRUE
+ if MODE is HFmode, and punt to the generic implementation otherwise. */
+
+static bool
+ix86_libgcc_floating_mode_supported_p (scalar_float_mode mode)
+{
+ /* NB: Always return TRUE for HFmode so that the _Float16 type will
+ be defined by the C front-end for AVX512FP16 intrinsics. We will
+ issue an error in ix86_expand_move for HFmode if AVX512FP16 isn't
+ enabled. */
+ return ((mode == HFmode && TARGET_SSE2)
+ ? true
+ : default_libgcc_floating_mode_supported_p (mode));
+}
+
/* Implements target hook vector_mode_supported_p. */
static bool
ix86_vector_mode_supported_p (machine_mode mode)
@@ -23254,13 +23335,15 @@ ix86_get_excess_precision (enum excess_precision_type type)
provide would be identical were it not for the unpredictable
cases. */
if (!TARGET_80387)
- return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+ return TARGET_SSE2
+ ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
+ : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
else if (!TARGET_MIX_SSE_I387)
{
if (!(TARGET_SSE && TARGET_SSE_MATH))
return FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE;
else if (TARGET_SSE2)
- return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
+ return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
}
/* If we are in standards compliant mode, but we know we will
@@ -23820,6 +23903,10 @@ ix86_run_selftests (void)
#undef TARGET_SCALAR_MODE_SUPPORTED_P
#define TARGET_SCALAR_MODE_SUPPORTED_P ix86_scalar_mode_supported_p
+#undef TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P
+#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \
+ix86_libgcc_floating_mode_supported_p
+
#undef TARGET_VECTOR_MODE_SUPPORTED_P
#define TARGET_VECTOR_MODE_SUPPORTED_P ix86_vector_mode_supported_p
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 0c2c93daf32..b1e66ee192e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1018,7 +1018,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
#define VALID_SSE2_REG_MODE(MODE) \
((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \
|| (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \
- || (MODE) == V2DImode || (MODE) == DFmode)
+ || (MODE) == V2DImode || (MODE) == DFmode || (MODE) == HFmode)
#define VALID_SSE_REG_MODE(MODE) \
((MODE) == V1TImode || (MODE) == TImode \
@@ -1047,6 +1047,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
|| (MODE) == CQImode || (MODE) == CHImode \
|| (MODE) == CSImode || (MODE) == CDImode \
|| (MODE) == SDmode || (MODE) == DDmode \
+ || (MODE) == HFmode || (MODE) == HCmode \
|| (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \
|| (TARGET_64BIT \
&& ((MODE) == TImode || (MODE) == CTImode \
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8b809c49fe0..d475347172d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1222,6 +1222,9 @@ (define_mode_iterator MODEF [SF DF])
;; All x87 floating point modes
(define_mode_iterator X87MODEF [SF DF XF])
+;; All x87 floating point modes plus HF
+(define_mode_iterator X87MODEFH [SF DF XF HF])
+
;; All SSE floating point modes
(define_mode_iterator SSEMODEF [SF DF TF])
(define_mode_attr ssevecmodef [(SF "V4SF") (DF "V2DF") (TF "TF")])
@@ -3130,6 +3133,32 @@ (define_split
operands[0] = replace_equiv_address (operands[0], stack_pointer_rtx);
})
+(define_insn "*pushhf_rex64"
+ [(set (match_operand:HF 0 "push_operand" "=X,X")
+ (match_operand:HF 1 "nonmemory_no_elim_operand" "r,x"))]
+ "TARGET_64BIT"
+{
+ /* Anything else should be already split before reg-stack. */
+ gcc_assert (which_alternative == 0);
+ return "push{q}\t%q1";
+}
+ [(set_attr "isa" "*,sse4")
+ (set_attr "type" "push,multi")
+ (set_attr "mode" "DI,TI")])
+
+(define_insn "*pushhf"
+ [(set (match_operand:HF 0 "push_operand" "=X,X")
+ (match_operand:HF 1 "general_no_elim_operand" "rmF,x"))]
+ "!TARGET_64BIT"
+{
+ /* Anything else should be already split before reg-stack. */
+ gcc_assert (which_alternative == 0);
+ return "push{l}\t%k1";
+}
+ [(set_attr "isa" "*,sse4")
+ (set_attr "type" "push,multi")
+ (set_attr "mode" "SI,TI")])
+
(define_insn "*pushsf_rex64"
[(set (match_operand:SF 0 "push_operand" "=X,X,X")
(match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,v"))]
@@ -3158,10 +3187,11 @@ (define_insn "*pushsf"
(set_attr "unit" "i387,*,*")
(set_attr "mode" "SF,SI,SF")])
+(define_mode_iterator MODESH [SF HF])
;; %%% Kill this when call knows how to work this out.
(define_split
- [(set (match_operand:SF 0 "push_operand")
- (match_operand:SF 1 "any_fp_register_operand"))]
+ [(set (match_operand:MODESH 0 "push_operand")
+ (match_operand:MODESH 1 "any_fp_register_operand"))]
"reload_completed"
[(set (reg:P SP_REG) (plus:P (reg:P SP_REG) (match_dup 2)))
(set (match_dup 0) (match_dup 1))]
@@ -3209,8 +3239,8 @@ (define_expand "movtf"
"ix86_expand_move (TFmode, operands); DONE;")
(define_expand "mov<mode>"
- [(set (match_operand:X87MODEF 0 "nonimmediate_operand")
- (match_operand:X87MODEF 1 "general_operand"))]
+ [(set (match_operand:X87MODEFH 0 "nonimmediate_operand")
+ (match_operand:X87MODEFH 1 "general_operand"))]
""
"ix86_expand_move (<MODE>mode, operands); DONE;")
@@ -3646,6 +3676,86 @@ (define_insn "*movsf_internal"
]
(const_string "*")))])
+(define_insn "*movhf_internal"
+ [(set (match_operand:HF 0 "nonimmediate_operand"
+ "=?r,?m,v,v,?r,m,?v,v")
+ (match_operand:HF 1 "general_operand"
+ "rmF,rF,C,v, v,v, r,m"))]
+ "!(MEM_P (operands[0]) && MEM_P (operands[1]))
+ && (lra_in_progress
+ || reload_completed
+ || !CONST_DOUBLE_P (operands[1])
+ || (TARGET_SSE && TARGET_SSE_MATH
+ && standard_sse_constant_p (operands[1], HFmode) == 1)
+ || memory_operand (operands[0], HFmode))"
+{
+ switch (get_attr_type (insn))
+ {
+ case TYPE_IMOV:
+ return "mov{w}\t{%1, %0|%0, %1}";
+
+ case TYPE_SSELOG1:
+ return standard_sse_constant_opcode (insn, operands);
+
+ case TYPE_SSEMOV:
+ return ix86_output_ssemov (insn, operands);
+
+ case TYPE_SSELOG:
+ if (SSE_REG_P (operands[0]))
+ return MEM_P (operands[1])
+ ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
+ : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+ else
+ return MEM_P (operands[1])
+ ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
+ : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+
+ default:
+ gcc_unreachable ();
+ }
+}
+ [(set (attr "isa")
+ (cond [(eq_attr "alternative" "2,3,4,6,7")
+ (const_string "sse2")
+ (eq_attr "alternative" "5")
+ (const_string "sse4")
+ ]
+ (const_string "*")))
+ (set (attr "type")
+ (cond [(eq_attr "alternative" "0,1")
+ (const_string "imov")
+ (eq_attr "alternative" "2")
+ (const_string "sselog1")
+ (eq_attr "alternative" "4,5,6,7")
+ (const_string "sselog")
+ ]
+ (const_string "ssemov")))
+ (set (attr "memory")
+ (cond [(eq_attr "alternative" "4,6")
+ (const_string "none")
+ (eq_attr "alternative" "5")
+ (const_string "store")
+ (eq_attr "alternative" "7")
+ (const_string "load")
+ ]
+ (const_string "*")))
+ (set (attr "prefix")
+ (cond [(eq_attr "alternative" "0,1")
+ (const_string "orig")
+ ]
+ (const_string "maybe_vex")))
+ (set (attr "mode")
+ (cond [(eq_attr "alternative" "0,1")
+ (const_string "HI")
+ (eq_attr "alternative" "2")
+ (const_string "V4SF")
+ (eq_attr "alternative" "4,5,6,7")
+ (const_string "TI")
+ (eq_attr "alternative" "3")
+ (const_string "SF")
+ ]
+ (const_string "*")))])
+
(define_split
[(set (match_operand 0 "any_fp_register_operand")
(match_operand 1 "memory_operand"))]
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b83cd4919bb..de4b64fecec 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1102,6 +1102,7 @@ typedef _Complex float __attribute__((mode(IC))) _Complex_ibm128;
@section Half-Precision Floating Point
@cindex half-precision floating point
@cindex @code{__fp16} data type
+@cindex @code{__Float16} data type
On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
point via the @code{__fp16} type defined in the ARM C Language Extensions.
@@ -1150,6 +1151,19 @@ calls.
It is recommended that portable code use the @code{_Float16} type defined
by ISO/IEC TS 18661-3:2015. @xref{Floating Types}.
+On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
+(16-bit) floating point via the @code{_Float16} type which is defined by
+18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
+which contains same data format as C.
+
+Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
+operations will be emulated by software emulation and the @code{float}
+instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
+the intermediate result of the operation as 32-bit precision. This may lead
+to inconsistent behavior between software emulation and AVX512-FP16
+instructions. Using @option{-fexcess-precision=standard} and
+@option{-mfpmath=sse} will force round back after each operation.
+
@node Decimal Float
@section Decimal Floating Types
@cindex decimal floating types
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index ff3b4449b37..681f8d95c9f 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -928,6 +928,10 @@ validate_subreg (machine_mode omode, machine_mode imode,
fix them all. */
if (omode == word_mode)
;
+ /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF))
+ here. Though extract_bit_field is the culprit here, not the backends. */
+ else if (imode == HFmode && omode == SImode)
+ ;
/* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though store_bit_field
is the culprit here, and not the backends. */
else if (known_ge (osize, regsize) && known_ge (isize, osize))
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index c13c7e45ac1..92f499643b5 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -992,6 +992,9 @@ lto_type_for_mode (machine_mode mode, int unsigned_p)
return unsigned_p ? unsigned_intTI_type_node : intTI_type_node;
#endif
+ if (float16_type_node && mode == TYPE_MODE (float16_type_node))
+ return float16_type_node;
+
if (mode == TYPE_MODE (float_type_node))
return float_type_node;
diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c b/gcc/testsuite/gcc.target/i386/float16-5.c
new file mode 100644
index 00000000000..ebc0af1490b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+_Float16
+foo (int a)
+{
+ union {
+ int a;
+ _Float16 b;
+ }c;
+ c.a = a;
+ return c.b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-1.c b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
new file mode 100644
index 00000000000..1b645eb499d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sse2" } */
+
+_Float16/* { dg-error "is not supported on this target" } */
+foo (_Float16 x) /* { dg-error "is not supported on this target" } */
+{
+ return x;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-2.c b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
new file mode 100644
index 00000000000..3da7683fc31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx512f" } */
+
+union flt
+{
+ _Float16 flt;
+ short s;
+};
+
+_Float16
+foo (union flt x)
+{
+ return x.flt;
+}
+
+/* { dg-final { scan-assembler {(?n)pinsrw[\t ].*%xmm0} } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-float16-3.c b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
new file mode 100644
index 00000000000..60ff9d4ab80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-float16-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx512f" } */
+
+#include<complex.h>
+
+_Complex _Float16
+foo (_Complex _Float16 x)
+{
+ return x;
+}
+
+/* { dg-final { scan-assembler {(?n)movd[\t ].*%xmm0} } } */
--
2.27.0