Re: vector _M_start and 0 offset

2018-09-29 Thread Marc Glisse

Hello,

here is a clang-friendly version of the patch (same changelog), tested a 
while ago. Is it ok or do you prefer something like the


+ if(this->_M_impl._M_start._M_offset != 0) __builtin_unreachable();

version suggested by François?
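
For readers comparing the two variants, here is a minimal sketch outside of libstdc++. BitIter and BitVec are hypothetical stand-ins (not the real _Bit_iterator or vector<bool>), and __builtin_unreachable assumes GCC or clang. Both accessors are meant to let the optimizer see that the start offset is zero.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a bit iterator: a word pointer plus a bit
// offset inside that word.
struct BitIter {
  unsigned long* p;
  unsigned offset;
};

// Hypothetical stand-in for the container storage.
// Assumed invariant: start.offset is always 0.
struct BitVec {
  BitIter start;
  BitIter finish;

  // Variant A (the patch): rebuild the iterator with a literal 0 offset.
  BitIter begin_rebuilt() const { return BitIter{start.p, 0}; }

  // Variant B (the alternative): return the member unchanged, but promise
  // the compiler that a non-zero offset cannot happen.
  BitIter begin_hinted() const {
    if (start.offset != 0) __builtin_unreachable();
    return start;
  }
};

// Number of bits between two iterators.
inline std::ptrdiff_t bit_distance(BitIter a, BitIter b) {
  const std::ptrdiff_t bits = static_cast<std::ptrdiff_t>(sizeof(unsigned long) * 8);
  return bits * (b.p - a.p)
         + static_cast<std::ptrdiff_t>(b.offset)
         - static_cast<std::ptrdiff_t>(a.offset);
}
```

With either accessor, bit_distance(begin, finish) no longer needs to load and subtract the start offset once the compiler knows it is zero.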

--
Marc Glisse
Index: include/bits/stl_bvector.h
===
--- include/bits/stl_bvector.h	(revision 264371)
+++ include/bits/stl_bvector.h	(working copy)
@@ -802,25 +802,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #endif
 
 #if __cplusplus >= 201103L
   void
   assign(initializer_list<bool> __l)
   { _M_assign_aux(__l.begin(), __l.end(), random_access_iterator_tag()); }
 #endif
 
   iterator
   begin() _GLIBCXX_NOEXCEPT
-  { return this->_M_impl._M_start; }
+  { return iterator(this->_M_impl._M_start._M_p, 0); }
 
   const_iterator
   begin() const _GLIBCXX_NOEXCEPT
-  { return this->_M_impl._M_start; }
+  { return const_iterator(this->_M_impl._M_start._M_p, 0); }
 
   iterator
   end() _GLIBCXX_NOEXCEPT
   { return this->_M_impl._M_finish; }
 
   const_iterator
   end() const _GLIBCXX_NOEXCEPT
   { return this->_M_impl._M_finish; }
 
   reverse_iterator
@@ -835,21 +835,21 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   rend() _GLIBCXX_NOEXCEPT
   { return reverse_iterator(begin()); }
 
   const_reverse_iterator
   rend() const _GLIBCXX_NOEXCEPT
   { return const_reverse_iterator(begin()); }
 
 #if __cplusplus >= 201103L
   const_iterator
   cbegin() const noexcept
-  { return this->_M_impl._M_start; }
+  { return const_iterator(this->_M_impl._M_start._M_p, 0); }
 
   const_iterator
   cend() const noexcept
   { return this->_M_impl._M_finish; }
 
   const_reverse_iterator
   crbegin() const noexcept
   { return const_reverse_iterator(end()); }
 
   const_reverse_iterator


No a*x+b*x factorization for signed vectors

2018-09-29 Thread Marc Glisse

Hello,

this is a simple patch to remove the wrong-code part of PR 87319. I didn't 
spend much time polishing that code, since it is meant to disappear 
anyway.
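
As a scalar illustration of why the factorization is unsafe for signed types (the vector case applies the same reasoning per element): when x is zero, a*x + b*x cannot overflow, but a + b on its own may. The sketch below is not the GCC code; it only shows the usual workaround of doing the factored arithmetic in an unsigned type, where wraparound is defined. The function names are made up for the example, and the conversion back to int assumes the two's complement wrapping that GCC implements (guaranteed from C++20 on).

```cpp
#include <cassert>
#include <climits>

// a*x + b*x evaluated as written. The caller guarantees that neither
// product nor the sum overflows.
inline int unfactored(int a, int b, int x) { return a * x + b * x; }

// (a + b) * x evaluated in unsigned arithmetic, then converted back.
// The intermediate a + b may wrap, which is well defined for unsigned.
inline int factored_wrapping(int a, int b, int x) {
  unsigned ua = static_cast<unsigned>(a);
  unsigned ub = static_cast<unsigned>(b);
  unsigned ux = static_cast<unsigned>(x);
  return static_cast<int>((ua + ub) * ux);
}
```

For a = INT_MAX, b = 1, x = 0 the unfactored form is valid and yields 0, whereas a signed a + b would overflow; the unsigned version also yields 0.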


We could probably remove the inner == inner2 test in 
signed_or_unsigned_type_for, I hadn't noticed when copy-pasting the code.


bootstrap+regtest on powerpc64le-unknown-linux-gnu.

2018-09-30  Marc Glisse  

PR middle-end/87319
* fold-const.c (fold_plusminus_mult_expr): Handle complex and vectors.
* tree.c (signed_or_unsigned_type_for): Handle complex.

--
Marc Glisse
Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 264371)
+++ gcc/fold-const.c	(working copy)
@@ -7136,21 +7136,21 @@ fold_plusminus_mult_expr (location_t loc
 	  alt1 = arg10;
 	  same = maybe_same;
 	  if (swap)
 	maybe_same = alt0, alt0 = alt1, alt1 = maybe_same;
 	}
 }
 
   if (!same)
 return NULL_TREE;
 
-  if (! INTEGRAL_TYPE_P (type)
+  if (! ANY_INTEGRAL_TYPE_P (type)
   || TYPE_OVERFLOW_WRAPS (type)
   /* We are neither factoring zero nor minus one.  */
   || TREE_CODE (same) == INTEGER_CST)
 return fold_build2_loc (loc, MULT_EXPR, type,
 			fold_build2_loc (loc, code, type,
  fold_convert_loc (loc, type, alt0),
  fold_convert_loc (loc, type, alt1)),
 			fold_convert_loc (loc, type, same));
 
   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
Index: gcc/tree.c
===
--- gcc/tree.c	(revision 264371)
+++ gcc/tree.c	(working copy)
@@ -11202,34 +11202,45 @@ int_cst_value (const_tree x)
   return val;
 }
 
 /* If TYPE is an integral or pointer type, return an integer type with
the same precision which is unsigned iff UNSIGNEDP is true, or itself
if TYPE is already an integer type of signedness UNSIGNEDP.  */
 
 tree
 signed_or_unsigned_type_for (int unsignedp, tree type)
 {
-  if (TREE_CODE (type) == INTEGER_TYPE && TYPE_UNSIGNED (type) == unsignedp)
+  if (ANY_INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) == unsignedp)
 return type;
 
   if (TREE_CODE (type) == VECTOR_TYPE)
 {
   tree inner = TREE_TYPE (type);
   tree inner2 = signed_or_unsigned_type_for (unsignedp, inner);
   if (!inner2)
 	return NULL_TREE;
   if (inner == inner2)
 	return type;
   return build_vector_type (inner2, TYPE_VECTOR_SUBPARTS (type));
 }
 
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+{
+  tree inner = TREE_TYPE (type);
+  tree inner2 = signed_or_unsigned_type_for (unsignedp, inner);
+  if (!inner2)
+	return NULL_TREE;
+  if (inner == inner2)
+	return type;
+  return build_complex_type (inner2);
+}
+
   if (!INTEGRAL_TYPE_P (type)
   && !POINTER_TYPE_P (type)
   && TREE_CODE (type) != OFFSET_TYPE)
 return NULL_TREE;
 
   return build_nonstandard_integer_type (TYPE_PRECISION (type), unsignedp);
 }
 
 /* If TYPE is an integral or pointer type, return an integer type with
the same precision which is unsigned, or itself if TYPE is already an


[libstdc++,doc] Tweak links to FSF web site

2018-09-29 Thread Gerald Pfeifer
Committed.

Gerald

2018-09-29  Gerald Pfeifer  

* doc/xml/gnu/fdl-1.3.xml: The Free Software Foundation web
site now uses https. Also omit the unnecessary trailing slash.
* doc/xml/gnu/gpl-3.0.xml: Ditto.

Index: doc/xml/gnu/fdl-1.3.xml
===
--- doc/xml/gnu/fdl-1.3.xml (revision 264709)
+++ doc/xml/gnu/fdl-1.3.xml (working copy)
@@ -6,7 +6,7 @@
   Version 1.3, 3 November 2008
   
 Copyright © 2000, 2001, 2002, 2007, 2008
-http://www.w3.org/1999/xlink"; 
xlink:href="http://www.fsf.org/";>Free Software Foundation, Inc.
+http://www.w3.org/1999/xlink"; 
xlink:href="https://www.fsf.org";>Free Software Foundation, Inc.
   
   
 Everyone is permitted to copy and distribute verbatim copies of this
Index: doc/xml/gnu/gpl-3.0.xml
===
--- doc/xml/gnu/gpl-3.0.xml (revision 264709)
+++ doc/xml/gnu/gpl-3.0.xml (working copy)
@@ -9,7 +9,7 @@
   
   
 Copyright © 2007 Free Software Foundation, Inc.
-http://www.w3.org/1999/xlink"; 
xlink:href="http://www.fsf.org/";>http://www.fsf.org/
+http://www.w3.org/1999/xlink"; 
xlink:href="https://www.fsf.org";>https://www.fsf.org
   
   
 Everyone is permitted to copy and distribute verbatim copies of this 
license

((X /[ex] A) +- B) * A --> X +- A * B

2018-09-29 Thread Marc Glisse

Hello,

I noticed quite ugly code from both testcases. This transformation does 
not fix either, but it helps a bit.
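
The identity behind the transformation can be checked with a small sketch. It assumes the division is exact (X is a multiple of A) and that nothing overflows; the function names are made up for the example and this is plain C++, not match.pd syntax.

```cpp
#include <cassert>

// Left-hand side as written: exact division, add or subtract B, multiply by A.
inline long lhs_plus(long x, long a, long b) { return (x / a + b) * a; }
inline long lhs_minus(long x, long a, long b) { return (x / a - b) * a; }

// Right-hand side after the transformation: the division is gone.
inline long rhs_plus(long x, long a, long b) { return x + a * b; }
inline long rhs_minus(long x, long a, long b) { return x - a * b; }
```

For instance, with X = 40, A = 4, B = 1 both plus forms give 44 and both minus forms give 36.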


bootstrap+regtest on powerpc64le-unknown-linux-gnu.

2018-09-30  Marc Glisse  

gcc/
* match.pd (((X /[ex] A) +- B) * A): New transformation.

gcc/testsuite/
* gcc.dg/tree-ssa/muldiv-1.c: New file.
* gcc.dg/tree-ssa/muldiv-2.c: Likewise.

--
Marc Glisse
Index: gcc/match.pd
===
--- gcc/match.pd	(revision 264371)
+++ gcc/match.pd	(working copy)
@@ -2637,20 +2637,39 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& operand_equal_p (@1, build_low_bits_mask (TREE_TYPE (@1),
 		TYPE_PRECISION (type)), 0))
(convert @0)))
 
 
 /* (X /[ex] A) * A -> X.  */
 (simplify
   (mult (convert1? (exact_div @0 @@1)) (convert2? @1))
   (convert @0))
 
+/* ((X /[ex] A) +- B) * A  -->  X +- A * B.  */
+(for op (plus minus)
+ (simplify
+  (mult (convert1? (op (convert2? (exact_div @0 INTEGER_CST@@1)) INTEGER_CST@2)) @1)
+  (if (tree_nop_conversion_p (type, TREE_TYPE (@2))
+   && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE (@2)))
+   (with
+ {
+   wi::overflow_type overflow;
+   wide_int mul = wi::mul (wi::to_wide (@1), wi::to_wide (@2),
+			   TYPE_SIGN (type), &overflow);
+ }
+ (if (types_match (type, TREE_TYPE (@2))
+ 	 && types_match (TREE_TYPE (@0), TREE_TYPE (@2)) && !overflow)
+  (op @0 { wide_int_to_tree (type, mul); })
+  (with { tree utype = unsigned_type_for (type); }
+   (convert (op (convert:utype @0)
+		(mult (convert:utype @1) (convert:utype @2))
+
 /* Canonicalization of binary operations.  */
 
 /* Convert X + -C into X - C.  */
 (simplify
  (plus @0 REAL_CST@1)
  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
   (with { tree tem = const_unop (NEGATE_EXPR, type, @1); }
(if (!TREE_OVERFLOW (tem) || !flag_trapping_math)
 (minus @0 { tem; })
 
Index: gcc/testsuite/gcc.dg/tree-ssa/muldiv-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/muldiv-1.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/muldiv-1.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized-raw" } */
+
+// ldist produces (((q-p-4)/4)&...+1)*4
+// Make sure we remove at least the division
+// Eventually this should just be n*4
+
+void foo(int*p, __SIZE_TYPE__ n){
+  for(int*q=p+n;p!=q;++p)*p=0;
+}
+
+/* { dg-final { scan-tree-dump "builtin_memset" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "div" "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/muldiv-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/muldiv-2.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/muldiv-2.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
+
+// 'a' should disappear, but we are not there yet
+
+int* f(int* a, int* b, int* c){
+__PTRDIFF_TYPE__ d = b - a;
+d += 1;
+return a + d;
+}
+
+/* { dg-final { scan-tree-dump-not "div" "optimized" } } */


[wwwdocs,fortran] readings.html -- Fix link to Michel Olagnon's Fortran 90 List

2018-09-29 Thread Gerald Pfeifer
The old link did not work any longer, without any redirect.  This
seems to be a valid replacement.

Committed.

Gerald

Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.302
diff -u -r1.302 readings.html
--- readings.html   24 Sep 2018 21:51:59 -  1.302
+++ readings.html   29 Sep 2018 12:26:56 -
@@ -466,7 +466,7 @@
Other resources: 
   
 
-  http://www.ifremer.fr/ditigo/molagnon/fortran90/engfaq.html";>
+  http://www.fortran-2000.com/MichelList/";>
   Michel Olagnon's Fortran 90 List contains a "Tests and
   Benchmarks" section mentioning commercial testsuites.
 


[PATCH] i386: Replace __m512 with __m512d in _mm512_abs_pd

2018-09-29 Thread H.J. Lu
_mm512_abs_pd takes __m512d, not __m512.

OK for trunk and release branches?

Thanks.


H.J.
--
gcc/

PR target/87467
* config/i386/avx512fintrin.h (_mm512_abs_pd): Replace __m512
with __m512d.

gcc/testsuite/

* gcc.target/i386/pr87467.c: New test.
---
 gcc/config/i386/avx512fintrin.h |  2 +-
 gcc/testsuite/gcc.target/i386/pr87467.c | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87467.c

diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 80525f9fb4d..599701e10b3 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -7798,7 +7798,7 @@ _mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 __A)
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_pd (__m512 __A)
+_mm512_abs_pd (__m512d __A)
 {
   return (__m512d) _mm512_and_epi64 ((__m512i) __A,
 _mm512_set1_epi64 (0x7fffffffffffffffLL));
diff --git a/gcc/testsuite/gcc.target/i386/pr87467.c 
b/gcc/testsuite/gcc.target/i386/pr87467.c
new file mode 100644
index 000..6a298d1746e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87467.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-times "vpandq\[ 
\\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
+
+#include 
+
+__m512d
+avx512f_test (__m512d x)
+{
+  return _mm512_abs_pd (x);
+}
-- 
2.17.1



[committed] Use structure to bubble up information about unterminated strings from c_strlen

2018-09-29 Thread Jeff Law

This patch changes the NONSTR argument to c_strlen to instead be a
little data structure c_strlen can populate with nuggets of information
about the string.

There's clearly a need for the decl related to the non-string argument.
I see an immediate need for the length of a non-terminated string
(c_strlen returns NULL for non-terminated strings).  I also see a need
for the offset within the non-terminated string as well.

We only populate the structure when c_strlen encounters a non-terminated
string.  One could argue we should always fill in the members.  Right
now I think filling it in for unterminated cases makes the most sense,
but I could be convinced otherwise.

I won't be surprised if subsequent warnings from Martin need additional
information about the string.  The idea here is we can add more elements
to the structure without continually adding arguments to c_strlen.
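
To make the shape of the interface concrete, here is a toy analogue in plain C++. It is only a sketch: StrlenData and bounded_strlen are hypothetical names, the real c_strlen works on GCC trees rather than char arrays, and the field set shown is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical analogue of the data structure: filled in only when the
// scan runs off the end of the array without finding a NUL.
struct StrlenData {
  const char* decl = nullptr;  // the array found to be unterminated
  std::size_t len = 0;         // characters seen before the end of the array
  std::size_t off = 0;         // offset at which the scan started
};

// Returns the string length starting at OFF, or -1 if ARR[OFF..SIZE) holds
// no NUL. DATA may be null; a local object is used in that case so the body
// never has to test the pointer again.
inline long bounded_strlen(const char* arr, std::size_t size, std::size_t off,
                           StrlenData* data = nullptr) {
  StrlenData local;
  if (!data)
    data = &local;

  for (std::size_t i = off; i < size; ++i)
    if (arr[i] == '\0')
      return static_cast<long>(i - off);

  // Unterminated: bubble up what we learned.
  data->decl = arr;
  data->len = size > off ? size - off : 0;
  data->off = off;
  return -1;
}
```

Adding another member to StrlenData later does not change the signature of bounded_strlen, which is the point made above about not continually adding arguments.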

Bootstrapped in isolation as well as with Martin's patches for strnlen
and sprintf checking.  Installing on the trunk.

Jeff

* builtins.c (unterminated_array): Pass in c_strlen_data * to
c_strlen rather than just a tree *.
(c_strlen): Change NONSTR argument to a c_strlen_data pointer.
Update recursive calls appropriately.  If caller did not provide a
suitable data pointer, create a local one.  When a non-terminated
string is discovered, bubble up information about the string via the
c_strlen_data object.
* builtins.h (c_strlen): Update prototype.
(c_strlen_data): New structure.
* gimple-fold.c (get_range_strlen): Update calls to c_strlen.
For a type 2 call, if c_strlen indicates a non-terminated string
use the length of the non-terminated string.
(gimple_fold_builtin_stpcpy): Update calls to c_strlen.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7be6ceb3d62..23e0ec7b34d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,19 @@
+2018-09-29  Jeff Law  
+
+   * builtins.c (unterminated_array): Pass in c_strlen_data * to
+   c_strlen rather than just a tree *.
+   (c_strlen): Change NONSTR argument to a c_strlen_data pointer.
+   Update recursive calls appropriately.  If caller did not provide a
+   suitable data pointer, create a local one.  When a non-terminated
+   string is discovered, bubble up information about the string via the
+   c_strlen_data object.
+   * builtins.h (c_strlen): Update prototype.
+   (c_strlen_data): New structure.
+   * gimple-fold.c (get_range_strlen): Update calls to c_strlen.
+   For a type 2 call, if c_strlen indicates a non-terminated string
+   use the length of the non-terminated string.
+   (gimple_fold_builtin_stpcpy): Update calls to c_strlen.
+
 2018-09-28  John David Anglin  
 
* match.pd (simple_comparison): Don't optimize if either operand is
diff --git a/gcc/builtins.c b/gcc/builtins.c
index e655623febd..fe411efd9a9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -570,9 +570,10 @@ warn_string_no_nul (location_t loc, const char *fn, tree 
arg, tree decl)
 tree
 unterminated_array (tree exp)
 {
-  tree nonstr = NULL;
-  c_strlen (exp, 1, &nonstr);
-  return nonstr;
+  c_strlen_data data;
+  memset (&data, 0, sizeof (c_strlen_data));
+  c_strlen (exp, 1, &data);
+  return data.decl;
 }
 
 /* Compute the length of a null-terminated character string or wide
@@ -592,10 +593,12 @@ unterminated_array (tree exp)
accesses.  Note that this implies the result is not going to be emitted
into the instruction stream.
 
-   If a not zero-terminated string value is encountered and NONSTR is
-   non-zero, the declaration of the string value is assigned to *NONSTR.
-   *NONSTR is accumulating, thus not cleared on success, therefore it has
-   to be initialized to NULL_TREE by the caller.
+   Additional information about the string accessed may be recorded
+   in DATA.  For example, if SRC references an unterminated string,
+   then the declaration will be stored in the DECL field.   If the
+   length of the unterminated string can be determined, it'll be
+   stored in the LEN field.  Note this length could well be different
+   than what a C strlen call would return.
 
ELTSIZE is 1 for normal single byte character strings, and 2 or
4 for wide characer strings.  ELTSIZE is by default 1.
@@ -603,8 +606,16 @@ unterminated_array (tree exp)
The value returned is of type `ssizetype'.  */
 
 tree
-c_strlen (tree src, int only_value, tree *nonstr, unsigned eltsize)
+c_strlen (tree src, int only_value, c_strlen_data *data, unsigned eltsize)
 {
+  /* If we were not passed a DATA pointer, then get one to a local
+ structure.  That avoids having to check DATA for NULL before
+ each time we want to use it.  */
+  c_strlen_data local_strlen_data;
+  memset (&local_strlen_data, 0, sizeof (c_strlen_data));
+  if (!data)
+data = &local_strlen_data;
+
   gcc_checking_asse

[committed] Fix _mm512_{,mask_}abs_pd (PR target/87467)

2018-09-29 Thread Jakub Jelinek
Hi!

These two functions were copied from their _mm512_*abs_ps counterparts and
weren't fully adjusted; in addition, the avx512f-abspd-1.c testcase was
identical to avx512f-absps-1.c, so nothing in the testsuite caught this.
Sorry for that.
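
For reference, a scalar sketch of what the two intrinsic families compute per element: the absolute value is obtained by clearing the sign bit, so the mask width has to match the element width. This assumes IEEE 754 float and double (as on x86); the helper names are made up for the example and no AVX-512 is needed to run it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Per-element model of the _pd variant: 64-bit lane, sign bit is bit 63.
inline double abs_pd_scalar(double v) {
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  bits &= 0x7fffffffffffffffULL;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}

// Per-element model of the _ps variant: 32-bit lane, sign bit is bit 31.
inline float abs_ps_scalar(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  bits &= 0x7fffffffU;
  std::memcpy(&v, &bits, sizeof v);
  return v;
}
```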

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk, 8 and 7 branches as obvious.

2018-09-29  Jakub Jelinek  

PR target/87467
* config/i386/avx512fintrin.h (_mm512_abs_pd, _mm512_mask_abs_pd): Use
__m512d type for __A argument rather than __m512.

* gcc.target/i386/avx512f-abspd-1.c (SIZE): Divide by two.
(CALC): Use double instead of float.
(TEST): Adjust to test _mm512_abs_pd and _mm512_mask_abs_pd rather than
_mm512_abs_ps and _mm512_mask_abs_ps.

--- gcc/config/i386/avx512fintrin.h.jj  2018-07-11 22:55:44.660456510 +0200
+++ gcc/config/i386/avx512fintrin.h 2018-09-29 10:29:46.731170222 +0200
@@ -7798,7 +7798,7 @@ _mm512_mask_abs_ps (__m512 __W, __mmask1
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_abs_pd (__m512 __A)
+_mm512_abs_pd (__m512d __A)
 {
   return (__m512d) _mm512_and_epi64 ((__m512i) __A,
 _mm512_set1_epi64 (0x7fffffffffffffffLL));
@@ -7806,7 +7806,7 @@ _mm512_abs_pd (__m512 __A)
 
 extern __inline __m512d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512 __A)
+_mm512_mask_abs_pd (__m512d __W, __mmask8 __U, __m512d __A)
 {
   return (__m512d)
 _mm512_mask_and_epi64 ((__m512i) __W, __U, (__m512i) __A,
--- gcc/testsuite/gcc.target/i386/avx512f-abspd-1.c.jj  2017-04-07 
21:20:58.092845868 +0200
+++ gcc/testsuite/gcc.target/i386/avx512f-abspd-1.c 2018-09-29 
10:25:59.062006644 +0200
@@ -6,11 +6,11 @@
 
 #include "avx512f-helper.h"
 
-#define SIZE (AVX512F_LEN / 32)
+#define SIZE (AVX512F_LEN / 64)
 #include "avx512f-mask-type.h"
 
 static void
-CALC (float *i1, float *r)
+CALC (double *i1, double *r)
 {
   int i;
 
@@ -24,27 +24,27 @@ CALC (float *i1, float *r)
 void
 TEST (void)
 {
-  float ck[SIZE];
+  double ck[SIZE];
   int i;
-  UNION_TYPE (AVX512F_LEN, ) s, d, dm;
+  UNION_TYPE (AVX512F_LEN, d) s, d, dm;
   MASK_TYPE mask = MASK_VALUE;
 
   for (i = 0; i < SIZE; i++)
 {
-  s.a[i] = i * ((i & 1) ? 3.5f : -7.5f);
+  s.a[i] = i * ((i & 1) ? 3.5 : -7.5);
   d.a[i] = DEFAULT_VALUE;
   dm.a[i] = DEFAULT_VALUE;
 }
 
   CALC (s.a, ck);
 
-  d.x = INTRINSIC (_abs_ps) (s.x);
-  dm.x = INTRINSIC (_mask_abs_ps) (dm.x, mask, s.x);
+  d.x = INTRINSIC (_abs_pd) (s.x);
+  dm.x = INTRINSIC (_mask_abs_pd) (dm.x, mask, s.x);
 
-  if (UNION_CHECK (AVX512F_LEN, ) (d, ck))
+  if (UNION_CHECK (AVX512F_LEN, d) (d, ck))
 abort ();
 
-  MASK_MERGE () (ck, mask, SIZE);
-  if (UNION_CHECK (AVX512F_LEN, ) (dm, ck))
+  MASK_MERGE (d) (ck, mask, SIZE);
+  if (UNION_CHECK (AVX512F_LEN, d) (dm, ck))
 abort ();
 }

Jakub


[PATCH] i386: Use TImode for BLKmode values in 2 integer registers

2018-09-29 Thread H.J. Lu
When passing and returning BLKmode values in 2 integer registers, use
1 TImode register instead of 2 DImode registers. Otherwise, V1TImode
may be used to move and store such BLKmode values, which prevents RTL
optimizations.

Tested on x86-64.  OK for trunk?

Thanks.

H.J.
---
gcc/

PR target/87370
* config/i386/i386.c (construct_container): Use TImode for
BLKmode values in 2 integer registers.

gcc/testsuite/

PR target/87370
* gcc.target/i386/pr87370.c: New test.
---
 gcc/config/i386/i386.c  | 17 +--
 gcc/testsuite/gcc.target/i386/pr87370.c | 39 +
 2 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87370.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 176cce521b7..54752513076 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7914,9 +7914,22 @@ construct_container (machine_mode mode, machine_mode 
orig_mode,
   if (n == 2
   && regclass[0] == X86_64_INTEGER_CLASS
   && regclass[1] == X86_64_INTEGER_CLASS
-  && (mode == CDImode || mode == TImode)
+  && (mode == CDImode || mode == TImode || mode == BLKmode)
   && intreg[0] + 1 == intreg[1])
-return gen_rtx_REG (mode, intreg[0]);
+{
+  if (mode == BLKmode)
+   {
+ /* Use TImode for BLKmode values in 2 integer registers.  */
+ exp[0] = gen_rtx_EXPR_LIST (VOIDmode,
+ gen_rtx_REG (TImode, intreg[0]),
+ GEN_INT (0));
+ ret = gen_rtx_PARALLEL (mode, rtvec_alloc (1));
+ XVECEXP (ret, 0, 0) = exp[0];
+ return ret;
+   }
+  else
+   return gen_rtx_REG (mode, intreg[0]);
+}
 
   /* Otherwise figure out the entries of the PARALLEL.  */
   for (i = 0; i < n; i++)
diff --git a/gcc/testsuite/gcc.target/i386/pr87370.c 
b/gcc/testsuite/gcc.target/i386/pr87370.c
new file mode 100644
index 000..c7b6295a33b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87370.c
@@ -0,0 +1,39 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+struct A
+{
+  int b[4];
+};
+struct B
+{
+  char a[12];
+  int b;
+};
+struct C
+{
+  char a[16];
+};
+
+struct A
+f1 (void)
+{
+  struct A x = {};
+  return x;
+}
+
+struct B
+f2 (void)
+{
+  struct B x = {};
+  return x;
+}
+
+struct C
+f3 (void)
+{
+  struct C x = {};
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "xmm" } } */
-- 
2.17.1



[wwwdocs] gcc-6/changes.html - openmp.org has moved to https

2018-09-29 Thread Gerald Pfeifer
Somehow this one apparently fell through when I made updates 
in the last weeks.

Committed.

Gerald

Index: gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.107
diff -u -r1.107 changes.html
--- gcc-6/changes.html  2 Sep 2018 08:14:17 -   1.107
+++ gcc-6/changes.html  29 Sep 2018 16:48:41 -
@@ -186,7 +186,7 @@
 
 C family
   
-Version 4.5 of the http://www.openmp.org/specifications/";
+Version 4.5 of the https://www.openmp.org/specifications/";
>OpenMP specification is now supported in the C and C++ 
compilers.
 
 The C and C++ compilers now support attributes on enumerators.  For 
instance,


[wwwdocs] Move our main page to HTML 5

2018-09-29 Thread Gerald Pfeifer
After my changes in the past weeks (and months) this is the last of 
our regular, not automatically generated, pages that moves to HTML 5.


There are some further simplifications and cleanups possible on this page in
particular and in our style sheet(s), but the majority of the work is behind
us, and for those of you making changes things are pretty much as easy as
it gets. :-)


Committed.

Gerald

Index: index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.1100
diff -u -r1.1100 index.html
--- index.html  23 Sep 2018 18:02:39 -  1.1100
+++ index.html  29 Sep 2018 16:51:24 -
@@ -1,9 +1,6 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";>
-
+
 
+
 
 
 GCC, the GNU Compiler Collection


Re: [PATCH] i386: Replace __m512 with __m512d in _mm512_abs_pd

2018-09-29 Thread Uros Bizjak
On Sat, Sep 29, 2018 at 3:49 PM H.J. Lu  wrote:
>
> _mm512_abs_pd takes __m512d, not __m512.
>
> OK for trunk and release branches?
>
> Thanks.
>
>
> H.J.
> --
> gcc/
>
> PR target/87467
> * config/i386/avx512fintrin.h (_mm512_abs_pd): Replace __m512
> with __m512d.
>
> gcc/testsuite/
>
> * gcc.target/i386/pr87467.c: New test.

OK everywhere.

Thanks,
Uros.

> ---
>  gcc/config/i386/avx512fintrin.h |  2 +-
>  gcc/testsuite/gcc.target/i386/pr87467.c | 11 +++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr87467.c
>
> diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
> index 80525f9fb4d..599701e10b3 100644
> --- a/gcc/config/i386/avx512fintrin.h
> +++ b/gcc/config/i386/avx512fintrin.h
> @@ -7798,7 +7798,7 @@ _mm512_mask_abs_ps (__m512 __W, __mmask16 __U, __m512 
> __A)
>
>  extern __inline __m512d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm512_abs_pd (__m512 __A)
> +_mm512_abs_pd (__m512d __A)
>  {
>return (__m512d) _mm512_and_epi64 ((__m512i) __A,
>  _mm512_set1_epi64 
> (0x7fffffffffffffffLL));
> diff --git a/gcc/testsuite/gcc.target/i386/pr87467.c 
> b/gcc/testsuite/gcc.target/i386/pr87467.c
> new file mode 100644
> index 000..6a298d1746e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr87467.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -O2" } */
> +/* { dg-final { scan-assembler-times "vpandq\[ 
> \\t\]+\[^\{\n\]*%zmm\[0-9\]+(?:\n|\[ \\t\]+#)"  1 } } */
> +
> +#include 
> +
> +__m512d
> +avx512f_test (__m512d x)
> +{
> +  return _mm512_abs_pd (x);
> +}
> --
> 2.17.1
>


Re: [PATCH] i386: Use TImode for BLKmode values in 2 integer registers

2018-09-29 Thread Uros Bizjak
On Sat, Sep 29, 2018 at 6:36 PM H.J. Lu  wrote:
>
> When passing and returning BLKmode values in 2 integer registers, use
> 1 TImode register instead of 2 DImode registers. Otherwise, V1TImode
> may be used to move and store such BLKmode values, which prevents RTL
> optimizations.
>
> Tested on x86-64.  OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> gcc/
>
> PR target/87370
> * config/i386/i386.c (construct_container): Use TImode for
> BLKmode values in 2 integer registers.
>
> gcc/testsuite/
>
> PR target/87370
> * gcc.target/i386/pr87370.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.c  | 17 +--
>  gcc/testsuite/gcc.target/i386/pr87370.c | 39 +
>  2 files changed, 54 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr87370.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 176cce521b7..54752513076 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -7914,9 +7914,22 @@ construct_container (machine_mode mode, machine_mode 
> orig_mode,
>if (n == 2
>&& regclass[0] == X86_64_INTEGER_CLASS
>&& regclass[1] == X86_64_INTEGER_CLASS
> -  && (mode == CDImode || mode == TImode)
> +  && (mode == CDImode || mode == TImode || mode == BLKmode)
>&& intreg[0] + 1 == intreg[1])
> -return gen_rtx_REG (mode, intreg[0]);
> +{
> +  if (mode == BLKmode)
> +   {
> + /* Use TImode for BLKmode values in 2 integer registers.  */
> + exp[0] = gen_rtx_EXPR_LIST (VOIDmode,
> + gen_rtx_REG (TImode, intreg[0]),
> + GEN_INT (0));
> + ret = gen_rtx_PARALLEL (mode, rtvec_alloc (1));
> + XVECEXP (ret, 0, 0) = exp[0];
> + return ret;
> +   }
> +  else
> +   return gen_rtx_REG (mode, intreg[0]);
> +}
>
>/* Otherwise figure out the entries of the PARALLEL.  */
>for (i = 0; i < n; i++)
> diff --git a/gcc/testsuite/gcc.target/i386/pr87370.c 
> b/gcc/testsuite/gcc.target/i386/pr87370.c
> new file mode 100644
> index 000..c7b6295a33b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr87370.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +
> +struct A
> +{
> +  int b[4];
> +};
> +struct B
> +{
> +  char a[12];
> +  int b;
> +};
> +struct C
> +{
> +  char a[16];
> +};
> +
> +struct A
> +f1 (void)
> +{
> +  struct A x = {};
> +  return x;
> +}
> +
> +struct B
> +f2 (void)
> +{
> +  struct B x = {};
> +  return x;
> +}
> +
> +struct C
> +f3 (void)
> +{
> +  struct C x = {};
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "xmm" } } */
> --
> 2.17.1
>


Re: [C++ Patch] PR 84423 ("[6/7/8/9 Regression] [concepts] ICE with invalid using declaration")

2018-09-29 Thread Paolo Carlini

Hi again,

On 9/28/18 9:15 PM, Paolo Carlini wrote:
> Thanks. About the location, you are certainly right, but it doesn't seem
> trivial. Something we can do *now* is using
> declspecs->locations[ds_typedef] and declspecs->locations[ds_alias],
> but that gives us the location of the keyword 'typedef' and 'using',
> respectively, whereas I think that we would like to have the location
> of 'auto' itself. I could look into that as a follow-up piece of work


In fact, completing the work turned out to be easy: ensure that 
cp_parser_alias_declaration saves the location of the defining-type-id 
too and then consistently use locations[ds_type_spec] in the error 
messages. Tested x86_64-linux. Still Ok? ;)


Thanks, Paolo.

///

/cp
2018-09-29  Paolo Carlini  

PR c++/84423
* pt.c (convert_template_argument): Immediately return error_mark_node
if the second argument is erroneous.
* parser.c (cp_parser_alias_declaration): Save the location
of the type-id too. 
* decl.c (grokdeclarator): Improve error message for 'auto' in
alias declaration.

/testsuite
2018-09-29  Paolo Carlini  

PR c++/84423
* g++.dg/concepts/pr84423.C: New.
Index: cp/decl.c
===
--- cp/decl.c   (revision 264687)
+++ cp/decl.c   (working copy)
@@ -11879,6 +11879,7 @@ grokdeclarator (const cp_declarator *declarator,
   /* If this is declaring a typedef name, return a TYPE_DECL.  */
   if (typedef_p && decl_context != TYPENAME)
 {
+  bool alias_p = decl_spec_seq_has_spec_p (declspecs, ds_alias);
   tree decl;
 
   /* This declaration:
@@ -11901,7 +11902,12 @@ grokdeclarator (const cp_declarator *declarator,
 
   if (type_uses_auto (type))
{
- error ("typedef declared %<auto%>");
+ if (alias_p)
+   error_at (declspecs->locations[ds_type_spec],
+ "%<auto%> not allowed in alias declaration");
+ else
+   error_at (declspecs->locations[ds_type_spec],
+ "typedef declared %<auto%>");
  type = error_mark_node;
}
 
@@ -11961,7 +11967,7 @@ grokdeclarator (const cp_declarator *declarator,
  inlinep, friendp, raises != NULL_TREE,
  declspecs->locations);
 
-  if (decl_spec_seq_has_spec_p (declspecs, ds_alias))
+  if (alias_p)
/* Acknowledge that this was written:
 `using analias = atype;'.  */
TYPE_DECL_ALIAS_P (decl) = 1;
Index: cp/parser.c
===
--- cp/parser.c (revision 264687)
+++ cp/parser.c (working copy)
@@ -19073,6 +19073,7 @@ cp_parser_alias_declaration (cp_parser* parser)
G_("types may not be defined in alias template declarations");
 }
 
+  cp_token *type_token = cp_lexer_peek_token (parser->lexer);
   type = cp_parser_type_id (parser);
 
   /* Restore the error message if need be.  */
@@ -19107,6 +19108,9 @@ cp_parser_alias_declaration (cp_parser* parser)
   set_and_check_decl_spec_loc (&decl_specs,
   ds_alias,
   using_token);
+  set_and_check_decl_spec_loc (&decl_specs,
+  ds_type_spec,
+  type_token);
 
   if (parser->num_template_parameter_lists
   && !cp_parser_check_template_parameters (parser,
Index: cp/pt.c
===
--- cp/pt.c (revision 264687)
+++ cp/pt.c (working copy)
@@ -7776,7 +7776,7 @@ convert_template_argument (tree parm,
   tree val;
   int is_type, requires_type, is_tmpl_type, requires_tmpl_type;
 
-  if (parm == error_mark_node)
+  if (parm == error_mark_node || error_operand_p (arg))
 return error_mark_node;
 
   /* Trivially convert placeholders. */
Index: testsuite/g++.dg/concepts/pr84423.C
===
--- testsuite/g++.dg/concepts/pr84423.C (nonexistent)
+++ testsuite/g++.dg/concepts/pr84423.C (working copy)
@@ -0,0 +1,10 @@
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fconcepts" }
+
+template<typename> using A = auto;  // { dg-error "30:.auto. not allowed in alias declaration" }
+
+template<template<typename> class> struct B {};
+
+B<A> b;
+
+typedef auto C;  // { dg-error "9:typedef declared .auto." }


[wwwdocs] style.mhtml - Remove the last traces of setting a DOCTYPE.

2018-09-29 Thread Gerald Pfeifer
This removes the last traces of setting a DOCTYPE.  All regular pages 
now have this included directly and for install/ the conversion from 
texinfo now does that.

David, this should be yet another change that'll help make your script
easier.

Committed.

Gerald

Index: style.mhtml
===
RCS file: /cvs/gcc/wwwdocs/htdocs/style.mhtml,v
retrieving revision 1.156
diff -u -r1.156 style.mhtml
--- style.mhtml 15 Sep 2018 23:03:24 -  1.156
+++ style.mhtml 29 Sep 2018 17:17:18 -
@@ -4,20 +4,12 @@
 
 
 
-;;; The "install/" pages are HTML, not XHTML.
+;;; The pages under install/ are generated from texinfo sources.
 
  "install/.*">
   
 >
 
-
- 
- 
- >
->
-
 ;;; Redefine the  tag so that we can put XHTML attributes inside.
 
 


[PATCH] x86: Add pmovzx/pmovsx patterns with SI/DI operands

2018-09-29 Thread H.J. Lu
Add pmovzx/pmovsx patterns with SI and DI operands for pmovzx/pmovsx
instructions which only read the low 4 or 8 bytes from the source.
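
For readers unfamiliar with these instructions, here is a minimal scalar
sketch of what the byte-to-word forms (pmovzxbw/pmovsxbw) compute.  It is a
model only, not the RTL pattern from the patch: the function names
zeroExtendLow8 and signExtendLow8 are made up for illustration, and the
16-byte array merely stands in for an XMM register.  The point is that only
the low 8 bytes of the source influence the result.

```go
package main

import "fmt"

// zeroExtendLow8 models pmovzxbw (hypothetical helper name): each of the
// low 8 bytes of the source is zero-extended to a 16-bit lane.  Bytes
// 8..15 of the source are never read.
func zeroExtendLow8(src [16]byte) [8]uint16 {
	var dst [8]uint16
	for i := range dst {
		dst[i] = uint16(src[i])
	}
	return dst
}

// signExtendLow8 models pmovsxbw (hypothetical helper name): each of the
// low 8 bytes is reinterpreted as a signed byte and sign-extended.
func signExtendLow8(src [16]byte) [8]int16 {
	var dst [8]int16
	for i := range dst {
		dst[i] = int16(int8(src[i]))
	}
	return dst
}

func main() {
	var src [16]byte
	for i := range src {
		src[i] = byte(0xF8 + i) // wraps past 0xFF for i >= 8
	}
	fmt.Println(zeroExtendLow8(src))
	fmt.Println(signExtendLow8(src))
}
```

Because the upper half of the source is ignored, a DI (8-byte) memory or
register operand is enough for the V8QI->V8HI case, and an SI (4-byte)
operand is enough for the V4QI->V4SI case.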

gcc/

PR target/87317
* config/i386/sse.md (*sse4_1_v8qiv8hi2): New
pattern.
(*sse4_1_v4qiv4si2): Likewise.
(*sse4_1_v4hiv4si2): Likewise.
(*sse4_1_v2hiv2di2): Likewise.
(*sse4_1_v2siv2di2): Likewise.

gcc/testsuite/

PR target/87317
* gcc.target/i386/pr87317-1.c: New file.
* gcc.target/i386/pr87317-2.c: Likewise.
* gcc.target/i386/pr87317-3.c: Likewise.
* gcc.target/i386/pr87317-4.c: Likewise.
* gcc.target/i386/pr87317-5.c: Likewise.
* gcc.target/i386/pr87317-6.c: Likewise.
* gcc.target/i386/pr87317-7.c: Likewise.
* gcc.target/i386/pr87317-8.c: Likewise.
* gcc.target/i386/pr87317-9.c: Likewise.
* gcc.target/i386/pr87317-10.c: Likewise.
---
 gcc/config/i386/sse.md | 98 ++
 gcc/testsuite/gcc.target/i386/pr87317-1.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-10.c | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-2.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-3.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-4.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-5.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-6.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-7.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-8.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-9.c  | 13 +++
 11 files changed, 228 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-9.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d2722fdfcd0..c8ff35b125c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15521,6 +15521,26 @@
(set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v8qiv8hi2"
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
+   (any_extend:V8HI
+ (vec_select:V8QI
+   (subreg:V16QI
+ (vec_concat:V2DI
+   (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+   (const_int 0)) 0)
+   (parallel [(const_int 0) (const_int 1)
+  (const_int 2) (const_int 3)
+  (const_int 4) (const_int 5)
+  (const_int 6) (const_int 7)]]
+  "TARGET_SSE4_1 &&  && "
+  "%vpmovbw\t{%1, %0|%0, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_v16qiv16si2"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
(any_extend:V16SI
@@ -15562,6 +15582,28 @@
(set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v4qiv4si2"
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+   (any_extend:V4SI
+ (vec_select:V4QI
+   (subreg:V16QI
+ (vec_merge:V4SI
+   (vec_duplicate:V4SI
+ (match_operand:SI 1 "nonimmediate_operand" "m,*m,m"))
+   (const_vector:V4SI
+  [(const_int 0) (const_int 0)
+   (const_int 0) (const_int 0)])
+   (const_int 1)) 0)
+   (parallel [(const_int 0) (const_int 1)
+  (const_int 2) (const_int 3)]]
+  "TARGET_SSE4_1 && "
+  "%vpmovbd\t{%1, %0|%0, %k1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_v16hiv16si2"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
(any_extend:V16SI
@@ -15598,6 +15640,24 @@
(set_attr "prefix" "orig,orig,maybe_evex")
(set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v4hiv4si2"
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+   (any_extend:V4SI
+ (vec_select:V4HI
+   (subreg:V8HI
+ (vec_concat:V2DI
+   (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+   (const_int 0)) 0)
+   (parallel [(const_int 0) (const_int 1)
+  (const_int 2) (const_int 3)]]
+  "TARGET_SSE4_1 && "
+  "%vpmovwd\t{%1, %0|%0, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   

[GCC 8] [PATCH] i386: Remove _Unwind_Frames_Increment

2018-09-29 Thread H.J. Lu
Hi,

I'd like to backport r263030:

https://gcc.gnu.org/ml/gcc-cvs/2018-07/msg00756.html

to GCC 8 for CET. Is it OK?

Thanks.

-- 
H.J.


Re: [PATCH] dumpfile.c: use prefixes other than 'note: ' for MSG_{OPTIMIZED_LOCATIONS|MISSED_OPTIMIZATION}

2018-09-29 Thread Andreas Schwab
That produces extra output that breaks a few tests.

g++.dg/vect/pr33426-ivdep-2.cc  -std=c++11 (test for excess errors)
g++.dg/vect/pr33426-ivdep-2.cc  -std=c++14 (test for excess errors)
g++.dg/vect/pr33426-ivdep-2.cc  -std=c++98 (test for excess errors)
g++.dg/vect/pr33426-ivdep-3.cc   (test for excess errors)
g++.dg/vect/pr33426-ivdep-4.cc   (test for excess errors)
g++.dg/vect/pr33426-ivdep.cc  -std=c++11 (test for excess errors)
g++.dg/vect/pr33426-ivdep.cc  -std=c++14 (test for excess errors)
g++.dg/vect/pr33426-ivdep.cc  -std=c++98 (test for excess errors)
gcc.dg/vect/nodump-vect-opt-info-1.c (test for excess errors)
gcc.dg/vect/vect-ivdep-1.c (test for excess errors)
gcc.dg/vect/vect-ivdep-1.c -flto -ffat-lto-objects (test for excess errors)
gcc.dg/vect/vect-ivdep-2.c (test for excess errors)
gcc.dg/vect/vect-ivdep-2.c -flto -ffat-lto-objects (test for excess errors)

FAIL: gcc.dg/vect/vect-ivdep-1.c (test for excess errors)
Excess errors:
/usr/local/gcc/gcc-20180929/gcc/testsuite/gcc.dg/vect/vect-ivdep-1.c:11:3: optimized:  loop versioned for vectorization to enhance alignment

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


PING^4 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-09-29 Thread H.J. Lu
On Tue, Sep 18, 2018 at 9:44 AM H.J. Lu wrote:
>
> On Tue, Sep 11, 2018 at 9:01 AM, H.J. Lu wrote:
> > On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu wrote:
> >> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu wrote:
> >>> With -mavx, for
> >>>
> >>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
> >>> extern float f;
> >>> extern double d;
> >>> extern int i;
> >>>
> >>> void
> >>> foo (void)
> >>> {
> >>>   d = f;
> >>>   f = i;
> >>> }
> >>>
> >>> we need to generate
> >>>
> >>> vxorp[ds]   %xmmN, %xmmN, %xmmN
> >>> ...
> >>> vcvtss2sd   f(%rip), %xmmN, %xmmX
> >>> ...
> >>> vcvtsi2ss   i(%rip), %xmmN, %xmmY
> >>>
> >>> to avoid partial XMM register stall.  This patch adds a pass to generate
> >>> a single
> >>>
> >>> vxorps  %xmmN, %xmmN, %xmmN
> >>>
> >>> at function entry, which is shared by all SF and DF conversions, instead
> >>> of generating one
> >>>
> >>> vxorp[ds]   %xmmN, %xmmN, %xmmN
> >>>
> >>> for each SF/DF conversion.
> >>>
> >>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
> >>>
> >>> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
> >>>
> >>> are
> >>>
> >>> 1. On Broadwell server:
> >>>
> >>> 500.perlbench_r (-0.82%)
> >>> 502.gcc_r (0.73%)
> >>> 505.mcf_r (-0.24%)
> >>> 520.omnetpp_r (-2.22%)
> >>> 523.xalancbmk_r (-1.47%)
> >>> 525.x264_r (0.31%)
> >>> 531.deepsjeng_r (0.27%)
> >>> 541.leela_r (0.85%)
> >>> 548.exchange2_r (-0.11%)
> >>> 557.xz_r (-0.34%)
> >>> Geomean: (-0.23%)
> >>>
> >>> 503.bwaves_r (0.00%)
> >>> 507.cactuBSSN_r (-1.88%)
> >>> 508.namd_r (0.00%)
> >>> 510.parest_r (-0.56%)
> >>> 511.povray_r (0.49%)
> >>> 519.lbm_r (-1.28%)
> >>> 521.wrf_r (-0.28%)
> >>> 526.blender_r (0.55%)
> >>> 527.cam4_r (-0.20%)
> >>> 538.imagick_r (2.52%)
> >>> 544.nab_r (-0.18%)
> >>> 549.fotonik3d_r (-0.51%)
> >>> 554.roms_r (-0.22%)
> >>> Geomean: (0.00%)
> >>>
> >>> 2. On Skylake client:
> >>>
> >>> 500.perlbench_r (-0.29%)
> >>> 502.gcc_r (-0.36%)
> >>> 505.mcf_r (1.77%)
> >>> 520.omnetpp_r (-0.26%)
> >>> 523.xalancbmk_r (-3.69%)
> >>> 525.x264_r (-0.32%)
> >>> 531.deepsjeng_r (0.00%)
> >>> 541.leela_r (-0.46%)
> >>> 548.exchange2_r (0.00%)
> >>> 557.xz_r (0.00%)
> >>> Geomean: (-0.34%)
> >>>
> >>> 503.bwaves_r (0.00%)
> >>> 507.cactuBSSN_r (-0.56%)
> >>> 508.namd_r (0.87%)
> >>> 510.parest_r (0.00%)
> >>> 511.povray_r (-0.73%)
> >>> 519.lbm_r (0.84%)
> >>> 521.wrf_r (0.00%)
> >>> 526.blender_r (-0.81%)
> >>> 527.cam4_r (-0.43%)
> >>> 538.imagick_r (2.55%)
> >>> 544.nab_r (0.28%)
> >>> 549.fotonik3d_r (0.00%)
> >>> 554.roms_r (0.32%)
> >>> Geomean: (0.12%)
> >>>
> >>> 3. On Skylake server:
> >>>
> >>> 500.perlbench_r (-0.55%)
> >>> 502.gcc_r (0.69%)
> >>> 505.mcf_r (0.00%)
> >>> 520.omnetpp_r (-0.33%)
> >>> 523.xalancbmk_r (-0.21%)
> >>> 525.x264_r (-0.27%)
> >>> 531.deepsjeng_r (0.00%)
> >>> 541.leela_r (0.00%)
> >>> 548.exchange2_r (-0.11%)
> >>> 557.xz_r (0.00%)
> >>> Geomean: (0.00%)
> >>>
> >>> 503.bwaves_r (0.58%)
> >>> 507.cactuBSSN_r (0.00%)
> >>> 508.namd_r (0.00%)
> >>> 510.parest_r (0.18%)
> >>> 511.povray_r (-0.58%)
> >>> 519.lbm_r (0.25%)
> >>> 521.wrf_r (0.40%)
> >>> 526.blender_r (0.34%)
> >>> 527.cam4_r (0.19%)
> >>> 538.imagick_r (5.87%)
> >>> 544.nab_r (0.17%)
> >>> 549.fotonik3d_r (0.00%)
> >>> 554.roms_r (0.00%)
> >>> Geomean: (0.62%)
> >>>
> >>> On Skylake client, impacts on 538.imagick_r are
> >>>
> >>> size before:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>> 2555577   10876    5576 2572029  273efd imagick_r.exe
> >>>
> >>> size after:
> >>>
> >>>    text    data     bss     dec     hex filename
> >>> 2511825   10876    5576 2528277  269415 imagick_r.exe
> >>>
> >>> number of vxorp[ds]:
> >>>
> >>> before  after   difference
> >>> 14570   4515    -69%
> >>>
> >>> OK for trunk?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> H.J.
> >>> ---
> >>> gcc/
> >>>
> >>> 2018-08-28  H.J. Lu  
> >>> Sunil K Pandey  
> >>>
> >>> PR target/87007
> >>> * config/i386/i386-passes.def: Add
> >>> pass_remove_partial_avx_dependency.
> >>> * config/i386/i386-protos.h
> >>> (make_pass_remove_partial_avx_dependency): New.
> >>> * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
> >>> New function.
> >>> (pass_data_remove_partial_avx_dependency): New.
> >>> (pass_remove_partial_avx_dependency): Likewise.
> >>> (make_pass_remove_partial_avx_dependency): Likewise.
> >>> * config/i386/i386.md (SF/DF conversion splitters): Disabled
> >>> for TARGET_AVX.
> >>>
> >>> gcc/testsuite/
> >>>
> >>> 2018-08-28  H.J. Lu  
> >>> Sunil K Pandey  
> >>>
> >>> PR target/87007
> >>> * gcc.target/i386/pr87007.c: New file.
> >>
> >>
> >> PING:
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html
> >>
> >
> > PING.
> >
>
> Hi Kirill, Jakub, Jan,
>
> Can you take a look?
>

PING.

-- 
H.J.


[PATCH] libgo: Don't assume sys.GoarchAmd64 == 64-bit pointer

2018-09-29 Thread H.J. Lu
On x86-64, sys.GoarchAmd64 == 1 for -mx32.  But -mx32 has 32-bit
pointers, not 64-bit ones.  The runtime already defines

// _64bit = 1 on 64-bit systems, 0 on 32-bit systems
_64bit = 1 << (^uintptr(0) >> 63) / 2

We should check both _64bit and sys.GoarchAmd64.
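
As a rough illustration of the expression in the patch, the sketch below
evaluates the pointer-width constant on the host and models the patched
product with ordinary uint64 values.  The names ptrIs64 and arenaOffset are
made up for this example, and the 0/1 arguments stand in for _64bit and
sys.GoarchAmd64; in the runtime itself this is a compile-time constant
expression of type uintptr, not a function.

```go
package main

import "fmt"

// ptrIs64 uses the same expression as the runtime's _64bit constant
// (hypothetical name here): 1 when uintptr is 64 bits wide, 0 when it is
// 32 bits wide.  "<<" and "/" have equal precedence in Go and associate
// left to right, so this is (1 << n) / 2 with n being 1 or 0.
const ptrIs64 = 1 << (^uintptr(0) >> 63) / 2

// arenaOffset models the patched expression
//   _64bit * sys.GoarchAmd64 * (1 << 47)
// with plain uint64 inputs so every combination can be tried on one host.
func arenaOffset(is64bit, goarchAmd64 uint64) uint64 {
	return is64bit * goarchAmd64 * (1 << 47)
}

func main() {
	fmt.Println("pointer is 64-bit on this host:", ptrIs64)
	fmt.Printf("amd64, 64-bit pointers: %#x\n", arenaOffset(1, 1))
	fmt.Printf("x32 (amd64, 32-bit pointers): %#x\n", arenaOffset(0, 1))
	fmt.Printf("other 64-bit target: %#x\n", arenaOffset(1, 0))
}
```

Only the combination of 64-bit pointers and GoarchAmd64 == 1 yields the
1 << 47 offset; for -mx32 the extra _64bit factor forces the offset to 0,
which fits in a 32-bit uintptr.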

PR go/87470
* go/runtime/malloc.go (arenaBaseOffset): Also check _64bit.
---
 libgo/go/runtime/malloc.go | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgo/go/runtime/malloc.go b/libgo/go/runtime/malloc.go
index ac4759ffbf1..c3445387057 100644
--- a/libgo/go/runtime/malloc.go
+++ b/libgo/go/runtime/malloc.go
@@ -306,7 +306,7 @@ const (
//
// On other platforms, the user address space is contiguous
// and starts at 0, so no offset is necessary.
-   arenaBaseOffset uintptr = sys.GoarchAmd64 * (1 << 47)
+   arenaBaseOffset uintptr = _64bit * sys.GoarchAmd64 * (1 << 47)
 
// Max number of threads to run garbage collection.
// 2, 3, and 4 are all plausible maximums depending
-- 
2.17.1



Re: [PATCH] libgo: Don't assume sys.GoarchAmd64 == 64-bit pointer

2018-09-29 Thread Ian Lance Taylor
"H.J. Lu"  writes:

> On x86-64, sys.GoarchAmd64 == 1 for -mx32.  But -mx32 has 32-bit
> pointer, not 64-bit.  There is
>
> // _64bit = 1 on 64-bit systems, 0 on 32-bit systems
> _64bit = 1 << (^uintptr(0) >> 63) / 2
>
> We should check both _64bit and sys.GoarchAmd64.

Thanks, but I think the correct fix is to set GOARCH to amd64p32 when
using x32.  I'm trying that to see if it will work out.

Ian