Re: [PATCH] RTEMS/EPIPHANY: Add RTEMS support

2018-01-05 Thread Sebastian Huber

On 04/01/18 18:37, Jeff Law wrote:

On 01/04/2018 01:51 AM, Sebastian Huber wrote:

gcc/
* config.gcc (epiphany-*-elf*): Add (epiphany-*-rtems*)
configuration.
* config/epiphany/rtems.h: New file.

libgcc/
* config.host (epiphany-*-elf*): Add (epiphany-*-rtems*)
configuration.

Seems to me like this would fall under maintainership of the RTEMS bits.  So
you could self-approve.  I gave it a looksie and didn't see anything
objectionable.


I would like to back port this to GCC 7 before the GCC 7.3 release.




You might consider not including dbxelf in tm_file.  I realize many
other targets include it, but embedded stabs is something we'd really
like to get rid of.


For RTEMS we don't need this embedded stabs support at all, but I would 
like to stay as close to the *-*-elf configuration as possible. Would it 
make sense to deprecate dbxelf in general?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a commercial communication within the meaning of the EHUG.



Re: [PATCH] Punt on possibly throwing assignments in SLSR (PR tree-optimization/83605)

2018-01-05 Thread Richard Biener
On Thu, 4 Jan 2018, Jakub Jelinek wrote:

> Hi!
> 
> While the testcase could perhaps be handled with some extra effort (the
> issue there is just CSE of an early possibly throwing trapping addition,
> so maybe_cleanup_or_replace_eh_stmt + gimple_purge_dead_eh_edges +
> TODO_cleanup_cfg might do the job), I'm afraid the pass wouldn't know
> what to do if it rewrites some arithmetic as a different operation where
> EH edges would need to be added etc.
> 
> So, instead this patch just punts with -fnon-call-exceptions and possibly
> throwing statements.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.
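
For reference: with -ftrapv -fexceptions -fnon-call-exceptions, even a
plain signed addition is a possibly throwing statement, which is what the
stmt_could_throw_p check in the patch below skips (a standalone
illustration, not part of the patch):

/* Compiled with -ftrapv -fexceptions -fnon-call-exceptions, the
   addition below may trap on overflow, and that trap can be caught
   as an exception, so the statement gets EH edges.  */
int
add (int x, int y)
{
  return x + y;
}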

> If later on we find something where it will be worthwhile to spend time on
> this, it can be easily reverted together with the full support for that.
> 
> 2018-01-04  Jakub Jelinek  
> 
>   PR tree-optimization/83605
>   * gimple-ssa-strength-reduction.c: Include tree-eh.h.
>   (find_candidates_dom_walker::before_dom_children): Ignore stmts that
>   can throw.
> 
>   * gcc.dg/pr83605.c: New test.
> 
> --- gcc/gimple-ssa-strength-reduction.c.jj	2018-01-03 10:19:54.0 +0100
> +++ gcc/gimple-ssa-strength-reduction.c	2018-01-04 10:39:48.273512779 +0100
> @@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.
>  #include "params.h"
>  #include "tree-ssa-address.h"
>  #include "tree-affine.h"
> +#include "tree-eh.h"
>  #include "builtins.h"
>  
>  /* Information about a strength reduction candidate.  Each statement
> @@ -1747,6 +1748,9 @@ find_candidates_dom_walker::before_dom_c
>  {
>gimple *gs = gsi_stmt (gsi);
>  
> +  if (stmt_could_throw_p (gs))
> + continue;
> +
>if (gimple_vuse (gs) && gimple_assign_single_p (gs))
>   slsr_process_ref (gs);
>  
> --- gcc/testsuite/gcc.dg/pr83605.c.jj	2018-01-04 10:35:20.874481057 +0100
> +++ gcc/testsuite/gcc.dg/pr83605.c	2018-01-04 10:35:01.972479109 +0100
> @@ -0,0 +1,20 @@
> +/* PR tree-optimization/83605 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -ftrapv -fexceptions -fnon-call-exceptions" } */
> +
> +int a;
> +
> +int
> +foo (int x)
> +{
> +  int b = a;
> +  {
> +int c;
> +int *d = (x == 0) ? &c : &b;
> +
> +for (a = 0; a < 2; ++a)
> +  c = (x + b) < a;
> +
> +return *d;
> +  }
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Revert DECL_USER_ALIGN part of r241959

2018-01-05 Thread Richard Biener
On Wed, Jan 3, 2018 at 4:06 PM, Richard Sandiford
 wrote:
> r241959 included code to stop us increasing the alignment of a
> "user-aligned" variable.  This wasn't the main purpose of the patch,
> and I think it was just there to make the testcase work.
>
> The documentation for the aligned attribute says:
>
>   This attribute specifies a minimum alignment for the variable or
>   structure field, measured in bytes.
>
> The DECL_USER_ALIGN code seemed to be treating it as a sort of maximum
> instead, but there's not really such a thing as a maximum here: the
> variable might still end up at the start of a section that has a higher
> alignment, or might end up by chance on a "very aligned" boundary at
> link or load time.
>
> I think people who add alignment attributes want to ensure that
> accesses to that variable are fast, so it seems counter-intuitive
> for it to make the access slower.  The vect-align-4.c test is an
> example of this: for targets with 128-bit vectors, we get better
> code without the aligned attribute than we do with it.
>
> Tested on aarch64-linux-gnu so far, will test more widely if OK.

Works for me - I think I did this to copy behavior of code elsewhere
(pass_increase_alignment::increase_alignment).

Richard.
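
As a reminder of the documented semantics (a standalone illustration, not
part of the patch), the attribute only ever raises the guaranteed minimum:

/* a is guaranteed to sit on an 8-byte boundary, but nothing stops the
   compiler, linker, or loader from placing it with a stricter
   alignment, e.g. at the start of a highly aligned section.  */
__attribute__((aligned (8))) int a[2048];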

> Thanks,
> Richard
>
>
> 2018-01-03  Richard Sandiford  
>
> gcc/
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't
> punt for user-aligned variables.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-align-4.c: New test.
> * gcc.dg/vect/vect-nb-iter-ub-2.c (cc): Remove alignment attribute
> and redefine as a structure with an unaligned member "b".
> (foo): Update accordingly.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-01-03 15:03:14.301330558 +
> +++ gcc/tree-vect-data-refs.c   2018-01-03 15:03:14.454324422 +
> @@ -920,19 +920,6 @@ vect_compute_data_ref_alignment (struct
>   return true;
> }
>
> -  if (DECL_USER_ALIGN (base))
> -   {
> - if (dump_enabled_p ())
> -   {
> - dump_printf_loc (MSG_NOTE, vect_location,
> -  "not forcing alignment of user-aligned "
> -  "variable: ");
> - dump_generic_expr (MSG_NOTE, TDF_SLIM, base);
> - dump_printf (MSG_NOTE, "\n");
> -   }
> - return true;
> -   }
> -
>/* Force the alignment of the decl.
>  NOTE: This is the only change to the code we make during
>  the analysis phase, before deciding to vectorize the loop.  */
> Index: gcc/testsuite/gcc.dg/vect/vect-align-4.c
> ===
> --- /dev/null	2018-01-03 08:32:43.873058927 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-align-4.c	2018-01-03 15:03:14.453324462 +0000
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-add-options bind_pic_locally } */
> +
> +__attribute__((aligned (8))) int a[2048] = {};
> +
> +void
> +f1 (void)
> +{
> +  for (int i = 0; i < 2048; i++)
> +a[i]++;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "Vectorizing an unaligned access" "vect" 
> } } */
> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using 
> peeling" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
> ===
> --- gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c	2018-01-03 15:03:14.301330558 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c	2018-01-03 15:03:14.454324422 +0000
> @@ -3,18 +3,19 @@
>  #include "tree-vect.h"
>
>  int ii[32];
> -char cc[66] __attribute__((aligned(1))) =
> +struct { char a; char b[66]; } cc = { 0,
>{ 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0,
>  10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0,
>  20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0,
> -30, 0, 31, 0 };
> +30, 0, 31, 0 }
> +};
>
>  void __attribute__((noinline,noclone))
>  foo (int s)
>  {
>int i;
> for (i = 0; i < s; i++)
> - ii[i] = (int) cc[i*2];
> + ii[i] = (int) cc.b[i*2];
>  }
>
>  int main (int argc, const char **argv)


Re: Revert DECL_USER_ALIGN part of r241959

2018-01-05 Thread Richard Biener
On Fri, Jan 5, 2018 at 9:47 AM, Richard Biener
 wrote:
> On Wed, Jan 3, 2018 at 4:06 PM, Richard Sandiford
>  wrote:
>> r241959 included code to stop us increasing the alignment of a
>> "user-aligned" variable.  This wasn't the main purpose of the patch,
>> and I think it was just there to make the testcase work.
>>
>> The documentation for the aligned attribute says:
>>
>>   This attribute specifies a minimum alignment for the variable or
>>   structure field, measured in bytes.
>>
>> The DECL_USER_ALIGN code seemed to be treating it as a sort of maximum
>> instead, but there's not really such a thing as a maximum here: the
>> variable might still end up at the start of a section that has a higher
>> alignment, or might end up by chance on a "very aligned" boundary at
>> link or load time.
>>
>> I think people who add alignment attributes want to ensure that
>> accesses to that variable are fast, so it seems counter-intuitive
>> for it to make the access slower.  The vect-align-4.c test is an
>> example of this: for targets with 128-bit vectors, we get better
>> code without the aligned attribute than we do with it.
>>
>> Tested on aarch64-linux-gnu so far, will test more widely if OK.
>
> Works for me - I think I did this to copy behavior of code elsewhere
> (pass_increase_alignment::increase_alignment).

Note I had the impression that using the aligned attribute with a "low"
alignment is a hint to the compiler to not increase .data by overaligning
variables.  I think there were PRs asking for that.

Richard.

> Richard.
>
>> Thanks,
>> Richard
>>
>>
>> 2018-01-03  Richard Sandiford  
>>
>> gcc/
>> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't
>> punt for user-aligned variables.
>>
>> gcc/testsuite/
>> * gcc.dg/vect/vect-align-4.c: New test.
>> * gcc.dg/vect/vect-nb-iter-ub-2.c (cc): Remove alignment attribute
>> and redefine as a structure with an unaligned member "b".
>> (foo): Update accordingly.
>>
>> Index: gcc/tree-vect-data-refs.c
>> ===
>> --- gcc/tree-vect-data-refs.c   2018-01-03 15:03:14.301330558 +
>> +++ gcc/tree-vect-data-refs.c   2018-01-03 15:03:14.454324422 +
>> @@ -920,19 +920,6 @@ vect_compute_data_ref_alignment (struct
>>   return true;
>> }
>>
>> -  if (DECL_USER_ALIGN (base))
>> -   {
>> - if (dump_enabled_p ())
>> -   {
>> - dump_printf_loc (MSG_NOTE, vect_location,
>> -  "not forcing alignment of user-aligned "
>> -  "variable: ");
>> - dump_generic_expr (MSG_NOTE, TDF_SLIM, base);
>> - dump_printf (MSG_NOTE, "\n");
>> -   }
>> - return true;
>> -   }
>> -
>>/* Force the alignment of the decl.
>>  NOTE: This is the only change to the code we make during
>>  the analysis phase, before deciding to vectorize the loop.  */
>> Index: gcc/testsuite/gcc.dg/vect/vect-align-4.c
>> ===
>> --- /dev/null	2018-01-03 08:32:43.873058927 +0000
>> +++ gcc/testsuite/gcc.dg/vect/vect-align-4.c	2018-01-03 15:03:14.453324462 +0000
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-add-options bind_pic_locally } */
>> +
>> +__attribute__((aligned (8))) int a[2048] = {};
>> +
>> +void
>> +f1 (void)
>> +{
>> +  for (int i = 0; i < 2048; i++)
>> +a[i]++;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-not "Vectorizing an unaligned access" "vect" 
>> } } */
>> +/* { dg-final { scan-tree-dump-not "Alignment of access forced using 
>> peeling" "vect" } } */
>> Index: gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c
>> ===
>> --- gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c	2018-01-03 15:03:14.301330558 +0000
>> +++ gcc/testsuite/gcc.dg/vect/vect-nb-iter-ub-2.c	2018-01-03 15:03:14.454324422 +0000
>> @@ -3,18 +3,19 @@
>>  #include "tree-vect.h"
>>
>>  int ii[32];
>> -char cc[66] __attribute__((aligned(1))) =
>> +struct { char a; char b[66]; } cc = { 0,
>>{ 0, 0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0, 9, 0,
>>  10, 0, 11, 0, 12, 0, 13, 0, 14, 0, 15, 0, 16, 0, 17, 0, 18, 0, 19, 0,
>>  20, 0, 21, 0, 22, 0, 23, 0, 24, 0, 25, 0, 26, 0, 27, 0, 28, 0, 29, 0,
>> -30, 0, 31, 0 };
>> +30, 0, 31, 0 }
>> +};
>>
>>  void __attribute__((noinline,noclone))
>>  foo (int s)
>>  {
>>int i;
>> for (i = 0; i < s; i++)
>> - ii[i] = (int) cc[i*2];
>> + ii[i] = (int) cc.b[i*2];
>>  }
>>
>>  int main (int argc, const char **argv)


Re: C++ PATCH to fix ICE with vector expr folding (PR c++/83659)

2018-01-05 Thread Richard Biener
On Wed, Jan 3, 2018 at 5:31 PM, Marek Polacek  wrote:
> Here we are crashing because cxx_fold_indirect_ref got a POINTER_PLUS_EXPR
> with an offset larger than a signed HOST_WIDE_INT can hold, and we tried
> to convert it to an sHWI.
>
> The matching code in fold_indirect_ref_1 uses uHWIs so I've followed suit.
> But that code now also uses poly_uint64 and I'm not sure if any of the
> constexpr.c code should use it, too.  But this patch fixes the ICE.

POINTER_PLUS_EXPR offsets are to be interpreted as signed (ptrdiff_t),
so using uhwi and then performing an unsigned division is wrong code.
See mem_ref_offset for how to deal with this (ugh - poly-ints...).  Basically
you have to force the thing to signed.  Like just use

  HOST_WIDE_INT offset = TREE_INT_CST_LOW (op01);

Richard.
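
To see why the unsigned interpretation goes wrong, consider a negative
offset (a standalone sketch, not GCC code):

#include <stdio.h>

int
main (void)
{
  /* A POINTER_PLUS_EXPR offset of -16 read as an unsigned 64-bit value
     becomes 0xfffffffffffffff0; unsigned division then yields a huge
     "element index" instead of -1.  */
  unsigned long long uoff = (unsigned long long) -16;
  printf ("%llu\n", uoff / 16);             /* 1152921504606846975 */
  printf ("%lld\n", (long long) -16 / 16);  /* -1 */
  return 0;
}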

> Bootstrapped/regtested on x86_64-linux, ok for trunk/7?
>
> 2018-01-03  Marek Polacek  
>
> PR c++/83659
> * constexpr.c (cxx_fold_indirect_ref): Use unsigned HOST_WIDE_INT
> when computing offsets.
>
> * g++.dg/torture/pr83659.C: New test.
>
> diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
> index 1aeacd51810..cf7c994b381 100644
> --- gcc/cp/constexpr.c
> +++ gcc/cp/constexpr.c
> @@ -3109,9 +3109,10 @@ cxx_fold_indirect_ref (location_t loc, tree type, tree op0, bool *empty_base)
>   && (same_type_ignoring_top_level_qualifiers_p
>   (type, TREE_TYPE (op00type
> {
> - HOST_WIDE_INT offset = tree_to_shwi (op01);
> + unsigned HOST_WIDE_INT offset = tree_to_uhwi (op01);
>   tree part_width = TYPE_SIZE (type);
> - unsigned HOST_WIDE_INT part_widthi = tree_to_shwi (part_width)/BITS_PER_UNIT;
> + unsigned HOST_WIDE_INT part_widthi
> +   = tree_to_uhwi (part_width) / BITS_PER_UNIT;
>   unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
>   tree index = bitsize_int (indexi);
>
> diff --git gcc/testsuite/g++.dg/torture/pr83659.C gcc/testsuite/g++.dg/torture/pr83659.C
> index e69de29bb2d..d9f709bb520 100644
> --- gcc/testsuite/g++.dg/torture/pr83659.C
> +++ gcc/testsuite/g++.dg/torture/pr83659.C
> @@ -0,0 +1,11 @@
> +// PR c++/83659
> +// { dg-do compile }
> +
> +typedef int V __attribute__ ((__vector_size__ (16)));
> +V a;
> +
> +int
> +main ()
> +{
> +  reinterpret_cast  (&a)[-1] += 1;
> +}
>
> Marek


Re: Allow VEC_PERM_EXPR folding to fail

2018-01-05 Thread Richard Biener
On Thu, Jan 4, 2018 at 6:46 PM, Jeff Law  wrote:
> On 01/04/2018 03:02 AM, Richard Sandiford wrote:
>> tree-ssa-forwprop.c was asserting that a VEC_PERM_EXPR fold on three
>> VECTOR_CSTs would always succeed, but it's possible for it to fail
>> with variable-length vectors.
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> Also tested by comparing the before and after assembly output for at
>> least one target per CPU directory.  OK to install?
>>
>> Richard
>>
>>
>> 2018-01-04  Richard Sandiford  
>>
>> gcc/
>>   * tree-ssa-forwprop.c (is_combined_permutation_identity): Allow
>>   the VEC_PERM_EXPR fold to fail.
> OK.  Ideally we'd have a test which failed here.

Yeah, can't make up an example myself that would fail.

Richard.

>
> jeff


Re: Add tree_fits_uhwi_p tests to BIT_FIELD_REF folder

2018-01-05 Thread Richard Biener
On Thu, Jan 4, 2018 at 11:06 AM, Richard Sandiford
 wrote:
> The first BIT_FIELD_REF folding pattern assumed without checking that
> operands satisfy tree_fits_uhwi_p.  The second pattern does check this:
>
>   /* On constants we can use native encode/interpret to constant
>  fold (nearly) all BIT_FIELD_REFs.  */
>   if (CONSTANT_CLASS_P (arg0)
>   && can_native_interpret_type_p (type)
>   && BITS_PER_UNIT == 8
>   && tree_fits_uhwi_p (op1)
>   && tree_fits_uhwi_p (op2))
>
> so this patch adds the checks to the first pattern too.  This is needed
> for POLY_INT_CST bit positions.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the before and after assembly output for at
> least one target per CPU directory.  OK to install?

Ok.  It's of course because verify_expr verified that.

Richard.

> Richard
>
>
> 2018-01-04  Richard Sandiford  
>
> gcc/
> * fold-const.c (fold_ternary_loc): Check tree_fits_uhwi_p before
> using tree_to_uhwi.
>
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c2018-01-03 21:42:34.349039784 +
> +++ gcc/fold-const.c2018-01-04 10:06:02.165809202 +
> @@ -11643,7 +11643,9 @@ fold_ternary_loc (location_t loc, enum t
>if (TREE_CODE (arg0) == VECTOR_CST
>   && (type == TREE_TYPE (TREE_TYPE (arg0))
>   || (TREE_CODE (type) == VECTOR_TYPE
> - && TREE_TYPE (type) == TREE_TYPE (TREE_TYPE (arg0)))))
> + && TREE_TYPE (type) == TREE_TYPE (TREE_TYPE (arg0))))
> + && tree_fits_uhwi_p (op1)
> + && tree_fits_uhwi_p (op2))
> {
>   tree eltype = TREE_TYPE (TREE_TYPE (arg0));
>   unsigned HOST_WIDE_INT width = tree_to_uhwi (TYPE_SIZE (eltype));


Re: [PATCH] Fix PR80846, change vectorizer reduction epilogue (on x86)

2018-01-05 Thread Richard Biener
On Tue, 28 Nov 2017, Richard Biener wrote:

> 
> The following adds a new target hook, targetm.vectorize.split_reduction,
> which allows the target to specify a preferred mode to perform the
> final reduction on using either vector shifts or scalar extractions.
> Up to that mode the vector reduction result is reduced by combining
> lowparts and highparts recursively.  This avoids lane-crossing operations
> when doing AVX256 on Zen and Bulldozer and also speeds up things on
> Haswell (I verified ~20% speedup on Broadwell).
> 
> Thus the patch implements the target hook on x86 to _always_ prefer
> SSE modes for the final reduction.
> 
> For the testcase in the bugzilla
> 
> int sumint(const int arr[]) {
> arr = __builtin_assume_aligned(arr, 64);
> int sum=0;
> for (int i=0 ; i<1024 ; i++)
>   sum+=arr[i];
> return sum;
> }
> 
> this changes -O3 -mavx512f code from
> 
> sumint:
> .LFB0:
> 	.cfi_startproc
> 	vpxord	%zmm0, %zmm0, %zmm0
> 	leaq	4096(%rdi), %rax
> 	.p2align 4,,10
> 	.p2align 3
> .L2:
> 	vpaddd	(%rdi), %zmm0, %zmm0
> 	addq	$64, %rdi
> 	cmpq	%rdi, %rax
> 	jne	.L2
> 	vpxord	%zmm1, %zmm1, %zmm1
> 	vshufi32x4	$78, %zmm1, %zmm0, %zmm2
> 	vpaddd	%zmm2, %zmm0, %zmm0
> 	vmovdqa64	.LC0(%rip), %zmm2
> 	vpermi2d	%zmm1, %zmm0, %zmm2
> 	vpaddd	%zmm2, %zmm0, %zmm0
> 	vmovdqa64	.LC1(%rip), %zmm2
> 	vpermi2d	%zmm1, %zmm0, %zmm2
> 	vpaddd	%zmm2, %zmm0, %zmm0
> 	vmovdqa64	.LC2(%rip), %zmm2
> 	vpermi2d	%zmm1, %zmm0, %zmm2
> 	vpaddd	%zmm2, %zmm0, %zmm0
> 	vmovd	%xmm0, %eax
> 
> to
> 
> sumint:
> .LFB0:
> 	.cfi_startproc
> 	vpxord	%zmm0, %zmm0, %zmm0
> 	leaq	4096(%rdi), %rax
> 	.p2align 4,,10
> 	.p2align 3
> .L2:
> 	vpaddd	(%rdi), %zmm0, %zmm0
> 	addq	$64, %rdi
> 	cmpq	%rdi, %rax
> 	jne	.L2
> 	vextracti64x4	$0x1, %zmm0, %ymm1
> 	vpaddd	%ymm0, %ymm1, %ymm1
> 	vmovdqa	%xmm1, %xmm0
> 	vextracti128	$1, %ymm1, %xmm1
> 	vpaddd	%xmm1, %xmm0, %xmm0
> 	vpsrldq	$8, %xmm0, %xmm1
> 	vpaddd	%xmm1, %xmm0, %xmm0
> 	vpsrldq	$4, %xmm0, %xmm1
> 	vpaddd	%xmm1, %xmm0, %xmm0
> 	vmovd	%xmm0, %eax
> 
> and for -O3 -mavx2 from
> 
> sumint:
> .LFB0:
> 	.cfi_startproc
> 	vpxor	%xmm0, %xmm0, %xmm0
> 	leaq	4096(%rdi), %rax
> 	.p2align 4,,10
> 	.p2align 3
> .L2:
> 	vpaddd	(%rdi), %ymm0, %ymm0
> 	addq	$32, %rdi
> 	cmpq	%rdi, %rax
> 	jne	.L2
> 	vpxor	%xmm1, %xmm1, %xmm1
> 	vperm2i128	$33, %ymm1, %ymm0, %ymm2
> 	vpaddd	%ymm2, %ymm0, %ymm0
> 	vperm2i128	$33, %ymm1, %ymm0, %ymm2
> 	vpalignr	$8, %ymm0, %ymm2, %ymm2
> 	vpaddd	%ymm2, %ymm0, %ymm0
> 	vperm2i128	$33, %ymm1, %ymm0, %ymm1
> 	vpalignr	$4, %ymm0, %ymm1, %ymm1
> 	vpaddd	%ymm1, %ymm0, %ymm0
> 	vmovd	%xmm0, %eax
> 
> to
> 
> sumint:
> .LFB0:
> 	.cfi_startproc
> 	vpxor	%xmm0, %xmm0, %xmm0
> 	leaq	4096(%rdi), %rax
> 	.p2align 4,,10
> 	.p2align 3
> .L2:
> 	vpaddd	(%rdi), %ymm0, %ymm0
> 	addq	$32, %rdi
> 	cmpq	%rdi, %rax
> 	jne	.L2
> 	vmovdqa	%xmm0, %xmm1
> 	vextracti128	$1, %ymm0, %xmm0
> 	vpaddd	%xmm0, %xmm1, %xmm0
> 	vpsrldq	$8, %xmm0, %xmm1
> 	vpaddd	%xmm1, %xmm0, %xmm0
> 	vpsrldq	$4, %xmm0, %xmm1
> 	vpaddd	%xmm1, %xmm0, %xmm0
> 	vmovd	%xmm0, %eax
> 	vzeroupper
> 	ret
> 
> which besides being faster is also smaller (less prefixes).
> 
> SPEC 2k6 results on Haswell (thus AVX2) are neutral.  As it merely
> affects reduction vectorization epilogues I didn't expect big effects
> except for loops that do not run much (more likely with AVX512).
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Ok for trunk?

Ping?

Richard.

> The PR mentions some more tricks to optimize the sequence but
> those look like backend only optimizations.
> 
> Thanks,
> Richard.
> 
> 2017-11-28  Richard Biener  
> 
>   PR tree-optimization/80846
>   * target.def (split_reduction): New target hook.
>   * targhooks.c (default_split_reduction): New function.
>   * targhooks.h (default_split_reduction): Declare.
>   * tree-vect-loop.c (vect_create_epilog_for_reduction): If the
>   target requests first reduce vectors by combining low and high
>   parts.
>   * tree-vect-stmts.c (vect_gen_perm_mask_any): Adjust.
>   (get_vectype_for_scalar_type_and_size): Export.
>   * tree-vectorizer.h (get_vectype_for_scalar_type_and_size): Declare.
> 
>   * doc/tm.texi.in (TARGET_VECTORIZE_SPLIT_REDUCTION): Document.
>   * doc/tm.texi: Regenerate.
> 
>
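
The low/high combining described above corresponds to something like this
sketch using GNU C vector extensions (illustrative only, not the patch
itself; the lane arithmetic mirrors the AVX2 epilogue shown earlier):

typedef int v8si __attribute__ ((vector_size (32)));
typedef int v4si __attribute__ ((vector_size (16)));

int
reduce_v8si (v8si x)
{
  /* Step 1: split the 256-bit accumulator into two 128-bit halves and
     add them, so the rest of the reduction stays within an SSE
     register and avoids lane-crossing operations.  */
  v4si lo = { x[0], x[1], x[2], x[3] };
  v4si hi = { x[4], x[5], x[6], x[7] };
  v4si s = lo + hi;
  /* Step 2: finish with the usual shift-and-add ladder (vpsrldq).  */
  s += (v4si) { s[2], s[3], 0, 0 };
  s += (v4si) { s[1], 0, 0, 0 };
  return s[0];
}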

Re: [PATCH] Simplify floating point comparisons

2018-01-05 Thread Richard Biener
On Thu, Jan 4, 2018 at 10:27 PM, Marc Glisse  wrote:
> (just a log of my thoughts while reading the patch)
>
> On Thu, 4 Jan 2018, Wilco Dijkstra wrote:
>
>> Richard Biener wrote:
>>>
>>> On Tue, Oct 17, 2017 at 6:28 PM, Wilco Dijkstra 
>>> wrote:
>>
>>
 +(if (flag_unsafe_math_optimizations)
 +  /* Simplify (C / x op 0.0) to x op 0.0 for C > 0.  */
 +  (for op (lt le gt ge)
 +   neg_op (gt ge lt le)
 +(simplify
 +  (op (rdiv REAL_CST@0 @1) real_zerop@2)
 +  (switch
 +   (if (real_less (&dconst0, TREE_REAL_CST_PTR (@0)))
>>>
>>>
>>> Note that real_less (0., +Inf) so I think you either need to check C is
>>> 'normal'
>>> or ! HONOR_INFINITIES.
>>
>>
>> Yes, it was missing an explicit check for infinity, now added.
>
>
> I don't understand how the check you added helps.
>
>>> There's also the underflow issue I guess this is what
>>> -funsafe-math-optimizations
>>> is for.  I think ignoring underflows is dangerous though.
>>
>>
>> We could change C / x > 0 to x >= 0 so the underflow case is included.
>> However that still means x == 0.0 would behave differently - so the
>> question is
>> what exactly does -funsafe-math-optimization allow?
>
>
> That is indeed unclear. HONOR_SIGNED_ZEROS, HONOR_NANS, HONOR_INFINITIES are
> better defined and could also see some use.
>
> 1/X>=0: if X can be zero (and the inverse can be infinite), the result
> depends on the sign of that zero. If !HONOR_SIGNED_ZEROS ||
> !HONOR_INFINITIES, it looks like we can simplify to either X>0 or X>=0,
> doesn't matter which.
> 1/X>=0 is true iff X is not NaN and the sign bit is not set, so with
> !HONOR_NANS it could also be turned into a bit test.
>
> It works the same for C/X>=0 with 0 < C < infinity; the difference is
> that X=infinity now gives false (if HONOR_NANS).
>
> C/X>0 is more tricky because small/big may round to zero. For C large enough
> (assuming denormals, 1 is large enough for instance), the only new issue is
> X=infinity. For smaller C, we start getting wrong results for finite values
> of X, and presumably that's where flag_unsafe_math_optimizations comes into
> play.
>
> If we consider flag_unsafe_math_optimizations as a kitchen sink that implies
> !HONOR_SIGNED_ZEROS && !HONOR_INFINITIES, then you don't even need your
> REAL_VALUE_ISINF test.

Without looking at the latest patch -funsafe-math-optimizations is a
kitchen-sink
we should avoid at all cost.  In the past we somewhat decided to break it up
properly (-fassociative-math and friends have been added for example),
and we really
don't want to add more stuff under this umbrella (there's too much
under it already).
I'm not sure if C / x op 0.0 to x op 0.0 is association though (I'm
quite sure it isn't),
so citing association examples doesn't work for me here ;)

It might be we need some new flag like -fassume-infinite-precision-math to
cover the underflow issue?  We do have -ffinite-math-only but that
shouldn't allow
GCC to possibly introduce +-Inf for overflow - GCC just can assume there are
no overflows in the original user expression.

Richard.

> --
> Marc Glisse
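
To make the underflow point concrete (a standalone illustration, not part
of the patch): with a small enough C and a large x, C/x rounds to zero,
so C/x > 0 and x > 0 disagree.

#include <stdio.h>

int
main (void)
{
  double c = 2.2250738585072014e-308;  /* DBL_MIN */
  double x = 1.0e300;
  /* c / x is roughly 2e-608, far below the smallest denormal, so it
     underflows to +0.0 and the two predicates differ.  */
  printf ("%d %d\n", c / x > 0.0, x > 0.0);  /* prints "0 1" */
  return 0;
}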


Re: [PATCH] lto, testsuite: Fix ICE in -Wodr (PR lto/83121)

2018-01-05 Thread Richard Biener
On Thu, Jan 4, 2018 at 10:52 PM, David Malcolm  wrote:
> PR lto/83121 reports an ICE deep inside the linemap code when -Wodr
> reports on a type mismatch.
>
> The root cause is that the warning can access the DECL_SOURCE_LOCATION
> of a streamed-in decl before the lto_location_cache has been applied.
>
> lto_location_cache::input_location stores RESERVED_LOCATION_COUNT (==2)
> as a poison value until the cache is applied:
> 250   /* Keep value RESERVED_LOCATION_COUNT in *loc as linemap lookups 
> will
> 251  ICE on it.  */
>
> The fix is relatively simple: apply the cache before reading the
> DECL_SOURCE_LOCATION.
>
> (I wonder if we should instead have a INVALID_LOCATION value to handle
> this case more explicitly?  e.g. 0x?  or reserve 2 in libcpp for
> that purpose, and have the non-reserved locations start at 3?  Either
> would be more invasive, though)
>
> Triggering the ICE was fiddly: it seems to be affected by many things,
> including the order of files, and (I think) by filenames.  My theory is
> that it's affected by the ordering of the tree nodes in the LTO stream:
> for the ICE to occur, the types in question need to be compared before
> some other operation flushes the lto_location_cache.  This ordering
> is affected by the hash-based ordering in DFS in lto-streamer-out.c, which
> might explain why r255066 seemed to trigger the bug; the only relevant
> change to LTO there seemed to be:
>   * lto-streamer-out.c (hash_tree): Hash TYPE_EMPTY_P and DECL_PADDING_P.
> If so, then the bug was presumably already present, but hidden.
>
> The patch also adds regression test coverage for the ICE, which is more
> involved - as far as I can tell, we don't have an existing way to verify
> diagnostics emitted during link-time optimization.
>
> Hence the patch adds some machinery to lib/lto.exp to support two new
> directives: dg-lto-warning and dg-lto-message, corresponding to
> dg-warning and dg-message respectively, where the diagnostics are
> expected to be emitted at link-time.
>
> The test case includes examples of LTO warnings and notes in both the
> primary and secondary source files
>
> Doing so required reusing the logic from DejaGnu for handling diagnostics.
> Unfortunately the pertinent code is a 50 line loop within a ~200 line Tcl
> function in dg.exp (dg-test), so I had to copy it from DejaGnu, making
> various changes as necessary (see lto_handle_diagnostics_for_file in the
> patch; for example the LTO version supports multiple source files,
> identifying which source file emitted a diagnostic).
>
> For non-LTO diagnostics we currently ignore surplus "note" diagnostics.
> This patch updates lto_prune_warns to follow this behavior (since
> otherwise we'd need numerous dg-lto-message directives for the motivating
> test case).
>
> The patch adds these PASS results to g++.sum:
>
> PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o assemble, -O0 -flto
> PASS: g++.dg/lto/pr83121 cp_lto_pr83121_1.o assemble, -O0 -flto
> PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_0.C line 6)
> PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_0.C line 8)
> PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_1.C line 2)
> PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_1.C line 3)
> PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o-cp_lto_pr83121_1.o link, -O0 -flto
>
> The output for dg-lto-message above refers to "warnings", rather than
> "messages" but that's the same as for the non-LTO case, where dg-message
> also refers to "warnings".
>
> Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

Hmm, but we do this in warn_odr already?  How's that not enough?

At least it seems the place you add this isn't ideal (not at the "root cause").

Richard.
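
For context, the fix described in the ChangeLog below boils down to
flushing the streamed-location cache before the warning code reads
locations; roughly (a sketch of the idea, not the exact committed hunk):

/* In add_type_duplicate, before warn_odr reads the
   DECL_SOURCE_LOCATIONs of streamed-in decls, resolve any
   still-cached locations.  */
if (lto_location_cache::current_cache)
  lto_location_cache::current_cache->apply_location_cache ();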


> gcc/ChangeLog:
> PR lto/83121
> * ipa-devirt.c (add_type_duplicate): When comparing memory layout,
> call the lto_location_cache before reading the
> DECL_SOURCE_LOCATION of the types.
>
> gcc/testsuite/ChangeLog:
> PR lto/83121
> * g++.dg/lto/pr83121_0.C: New test case.
> * g++.dg/lto/pr83121_1.C: New test case.
> * lib/lto.exp (lto_handle_diagnostics_for_file): New procedure,
> adapted from DejaGnu's dg-test.
> (lto_handle_diagnostics): New procedure.
> (lto_prune_warns): Ignore informational notes.
> (lto-link-and-maybe-run): Add "messages_by_file" param.
> Call lto_handle_diagnostics.  Avoid issuing "unresolved" for
> "execute" when "link" fails if "execute" was not specified.
> (lto-can-handle-directive): New procedure.
> (lto-get-options-main): Call lto-can-handle-directive.  Add a
> dg-messages local, using it to set the caller's
> dg-messages-by-file for the given source file.
> (lto-get-options): Likewise.
> (lto-execute): Add dg-messages-by-file local, and pass it to
> lto-link-and-maybe-run.
> ---
>  gcc/ipa-devirt.c 

Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Biener
On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
 wrote:
>
> Recently, Google Project Zero disclosed several classes of attack
> against speculative execution. One of these, known as variant-1
> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
> speculation, providing an arbitrary read gadget. Further details can
> be found on the GPZ blog [1] and the documentation that is included
> with the first patch.
>
> This patch set adds a new builtin function for GCC to provide a
> mechanism for limiting speculation by a CPU after a bounds-checked
> memory access.  I've tried to design this in such a way that it can be
> used for any target where this might be necessary.  The patch set
> provides a generic implementation of the builtin and then
> target-specific support for Arm and AArch64.  Other architectures can
> utilize the internal infrastructure as needed.
>
> Most of the details of the builtin and the hooks that need to be
> implemented to support it are described in the updates to the manual,
> but a short summary is given below.
>
> TYP __builtin_load_no_speculate
> (const volatile TYP *ptr,
>  const volatile void *lower,
>  const volatile void *upper,
>  TYP failval,
>  const volatile void *cmpptr)
>
> Where TYP can be any integral type (signed or unsigned char, int,
> short, long, etc) or any pointer type.
>
> The builtin implements the following logical behaviour:
>
> inline TYP __builtin_load_no_speculate
>  (const volatile TYP *ptr,
>   const volatile void *lower,
>   const volatile void *upper,
>   TYP failval,
>   const volatile void *cmpptr)
> {
>   TYP result;
>
>   if (cmpptr >= lower && cmpptr < upper)
> result = *ptr;
>   else
> result = failval;
>   return result;
> }
>
> in addition the specification of the builtin ensures that future
> speculation using *ptr may only continue iff cmpptr lies within the
> bounds specified.

I fail to see how the vulnerability doesn't affect aggregate copies
or bitfield accesses.  The vulnerability doesn't cause the loaded
value to leak but (part of) the address by populating the CPU cache
side-channel.

So why isn't this just

 T __builtin_load_no_speculate (T *);

?  Why that "fallback" and why the lower/upper bounds?

Did you talk with other compiler vendors (intel, llvm, ...?)?

Richard.

> Some optimizations are permitted to make the builtin easier to use.
> The final two arguments can both be omitted (c++ style): failval will
> default to 0 in this case and if cmpptr is omitted ptr will be used
> for expansions of the range check.  In addition either lower or upper
> (but not both) may be a literal NULL and the expansion will then
> ignore that boundary condition when expanding.
>
> The patch set is constructed as follows:
> 1 - generic modifications to GCC providing the builtin function for all
> architectures and expanding to an implementation that gives the
> logical behaviour of the builtin only.  A warning is generated if
> this expansion path is used that code will execute correctly but
> without providing protection against speculative use.
> 2 - AArch64 support
> 3 - AArch32 support (arm) for A32 and thumb2 states.
>
> These patches can be used with the header file that Arm recently
> published here: https://github.com/ARM-software/speculation-barrier.
>
> Kernel patches are also being developed, eg:
> https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
> code like this will be able to use support directly from the compiler
> in a portable manner.
>
> Similar patches are also being developed for LLVM and will be posted
> to their development lists shortly.
>
> [1] More information on the topic can be found here:
> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
> Arm specific information can be found here: 
> https://www.arm.com/security-update
>
>
>
> Richard Earnshaw (3):
>   [builtins] Generic support for __builtin_load_no_speculate()
>   [aarch64] Implement support for __builtin_load_no_speculate.
>   [arm] Implement support for the de-speculation intrinsic
>
>  gcc/builtin-types.def |  16 +
>  gcc/builtins.c|  99 +
>  gcc/builtins.def  |  22 ++
>  gcc/c-family/c-common.c   | 164 
> ++
>  gcc/c-family/c-cppbuiltin.c   |   5 +-
>  gcc/config/aarch64/aarch64.c  |  92 
>  gcc/config/aarch64/aarch64.md |  28 
>  gcc/config/arm/arm.c  | 107 +++
>  gcc/config/arm/arm.md |  40 ++-
>  gcc/config/arm/unspecs.md |   1 +
>  gcc/doc/cpp.texi  |   4 ++
>  gcc/doc/extend.texi   |  53 ++
>  gcc/doc/tm.texi   |   6 ++
>  gcc/doc/tm.texi.in|   2 +
>  gcc/target.def|  20 ++
>  gcc/targhooks.c   |  69 ++

Re: Revert DECL_USER_ALIGN part of r241959

2018-01-05 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Jan 5, 2018 at 9:47 AM, Richard Biener
>  wrote:
>> On Wed, Jan 3, 2018 at 4:06 PM, Richard Sandiford
>>  wrote:
>>> r241959 included code to stop us increasing the alignment of a
>>> "user-aligned" variable.  This wasn't the main purpose of the patch,
>>> and I think it was just there to make the testcase work.
>>>
>>> The documentation for the aligned attribute says:
>>>
>>>   This attribute specifies a minimum alignment for the variable or
>>>   structure field, measured in bytes.
>>>
>>> The DECL_USER_ALIGN code seemed to be treating it as a sort of maximum
>>> instead, but there's not really such a thing as a maximum here: the
>>> variable might still end up at the start of a section that has a higher
>>> alignment, or might end up by chance on a "very aligned" boundary at
>>> link or load time.
>>>
>>> I think people who add alignment attributes want to ensure that
>>> accesses to that variable are fast, so it seems counter-intuitive
>>> for it to make the access slower.  The vect-align-4.c test is an
>>> example of this: for targets with 128-bit vectors, we get better
>>> code without the aligned attribute than we do with it.
>>>
>>> Tested on aarch64-linux-gnu so far, will test more widely if OK.
>>
>> Works for me - I think I did this to copy behavior of code elsewhere
>> (pass_increase_alignment::increase_alignment).

OK.  I think the difference is that there we're increasing the alignment
of variables speculatively, without knowing whether it's going to help
vectorisation or not, whereas here we're more sure.

That's actually the reason I care (forgot to say).  increase_alignment
shouldn't start giving things 256-byte alignment just because we're
running the vectoriser on a target with 256-byte vectors.  We need
some kind of proof that it's going to help vectorise a particular access.

Another reason is to support vectorising search loops, where the
bound isn't known in advance.  (That's queued for GCC 9.)  If we have a
loop that operates on both a wide and a narrow type, the wide type needs
N vectors for each vector of the narrow type.  We then need to align to
N times the vector size to ensure that the accesses don't cross a page
boundary.

Part of the problem is that we don't distinguish true user alignments
from ones that have been set by GCC itself.  E.g. if increase_alignment
does set an alignment, it records that as a "user" alignment, which stops
the vectoriser from increasing it further.  But since the docs say that
the value's only a minimum, the current behaviour still feels wrong for
user attributes.

> Note I had the impression that using the aligned attribute with a "low"
> alignment is a hint to the compiler to not increase .data by overaligning
> variables.  I think there were PRs asking for that.

That's overloading the attribute so that sometimes it's a hint
to optimise accesses and sometimes it's a hint not to.  Maybe we should
support aligned+packed for variables, if people really want a fixed
alignment?  (Although that's only going to work reliably for variables
that go in their own section.)

E.g. the current behaviour triggers even for __attribute__((aligned)),
which doesn't specify a particular value.

Is the patch OK as a compromise for GCC 8?  We don't speculatively
increase the user alignment in increase_alignment, but do still increase
it if it helps to vectorise a particular loop access?

Thanks,
Richard


Re: [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate.

2018-01-05 Thread Richard Biener
On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
 wrote:
>
> This patch implements support for __builtin_load_no_speculate on
> AArch64.  On this architecture we inhibit speculation by emitting a
> combination of CSEL and a hint instruction that ensures the CSEL is
> fully resolved when the operands to the CSEL may involve a speculative
> load.

Missing a high-level explanation here but it looks like you do sth like

  if (access is not in bounds)
no-speculate-access;
  else
 regular access;

with the upper/lower bounds.  Is this because 'no-speculate-access' is
prohibitively expensive even when factoring in the condition computation
and the condjump?  (Trying to reverse engineer how the actual assembly
will look like from the patch, there isn't even a testcase :/)

> * config/aarch64/aarch64.c (aarch64_print_operand): Handle zero passed
> to 'H' operand qualifier.
> (aarch64_inhibit_load_speculation): New function.
> (TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
> * config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New unspec_volatile
> code.
> (nospeculate, nospeculateti): New patterns.
> ---
>  gcc/config/aarch64/aarch64.c  | 92 
> +++
>  gcc/config/aarch64/aarch64.md | 28 +
>  2 files changed, 120 insertions(+)
>


Re: Revert DECL_USER_ALIGN part of r241959

2018-01-05 Thread Jakub Jelinek
On Fri, Jan 05, 2018 at 09:49:56AM +, Richard Sandiford wrote:
> Is the patch OK as a compromise for GCC 8?  We don't speculatively
> increase the user alignment in increase_alignment, but do still increase
> it if it helps to vectorise a particular loop access?

I'd be a little bit worried about code that puts some variables into user
sections with specific alignment, i.e.
__attribute__((section ("whatever"), aligned(N)))
where data is collected from different TUs into the user section and
any padding added there breaks this.  E.g. Linux kernel and other programs
use this technique heavily.

Jakub


Re: Revert DECL_USER_ALIGN part of r241959

2018-01-05 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Fri, Jan 05, 2018 at 09:49:56AM +, Richard Sandiford wrote:
>> Is the patch OK as a compromise for GCC 8?  We don't speculatively
>> increase the user alignment in increase_alignment, but do still increase
>> it if it helps to vectorise a particular loop access?
>
> I'd be a little bit worried about code that puts some variables into user
> sections with specific alignment, i.e.
> __attribute__((section ("whatever"), aligned(N)))
> where data is collected from different TUs into the user section and
> any padding added there breaks this.  E.g. Linux kernel and other programs
> use this technique heavily.

Looking again, it seems we already prevent increasing alignment for
the "used" attribute (with or without "aligned").  Is that good enough?

That kind of construct is used without "aligned" too, and I think it
should have "used" to stop it being removed as dead.

Thanks,
Richard


Re: Revert DECL_USER_ALIGN part of r241959

2018-01-05 Thread Jakub Jelinek
On Fri, Jan 05, 2018 at 10:25:35AM +, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Fri, Jan 05, 2018 at 09:49:56AM +, Richard Sandiford wrote:
> >> Is the patch OK as a compromise for GCC 8?  We don't speculatively
> >> increase the user alignment in increase_alignment, but do still increase
> >> it if it helps to vectorise a particular loop access?
> >
> > I'd be a little bit worried about code that puts some variables into user
> > sections with specific alignment, i.e.
> > __attribute__((section ("whatever"), aligned(N)))
> > where data is collected from different TUs into the user section and
> > any padding added there breaks this.  E.g. Linux kernel and other programs
> > use this technique heavily.
> 
> Looking again, it seems we already prevent increasing alignment for
> the "used" attribute (with or without "aligned").  Is that good enough?
> 
> That kind of construct is used without "aligned" too, and I think it
> should have "used" to stop it being removed as dead.

Yeah, that should be likely good enough.

Jakub


Re: [PATCH] PR libstdc++/83626 Don't throw for remove("") and remove_all("")

2018-01-05 Thread Jonathan Wakely

On 04/01/18 21:02 -0500, Tim Song wrote:

What if the file to be removed is externally removed between the
symlink_status and the ::remove call? This is probably QoI because of the
filesystem race, but it seems reasonable to double-check errno if
::remove fails and not fail if the failure is due to the file not
existing.


Yes, the race makes it undefined, but checking for ENOENT seems
reasonable. Thanks for the suggestion.



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 09:44, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>  wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
>>
>> This patch set adds a new builtin function for GCC to provide a
>> mechanism for limiting speculation by a CPU after a bounds-checked
>> memory access.  I've tried to design this in such a way that it can be
>> used for any target where this might be necessary.  The patch set
>> provides a generic implementation of the builtin and then
>> target-specific support for Arm and AArch64.  Other architectures can
>> utilize the internal infrastructure as needed.
>>
>> Most of the details of the builtin and the hooks that need to be
>> implemented to support it are described in the updates to the manual,
>> but a short summary is given below.
>>
>> TYP __builtin_load_no_speculate
>> (const volatile TYP *ptr,
>>  const volatile void *lower,
>>  const volatile void *upper,
>>  TYP failval,
>>  const volatile void *cmpptr)
>>
>> Where TYP can be any integral type (signed or unsigned char, int,
>> short, long, etc) or any pointer type.
>>
>> The builtin implements the following logical behaviour:
>>
>> inline TYP __builtin_load_no_speculate
>>  (const volatile TYP *ptr,
>>   const volatile void *lower,
>>   const volatile void *upper,
>>   TYP failval,
>>   const volatile void *cmpptr)
>> {
>>   TYP result;
>>
>>   if (cmpptr >= lower && cmpptr < upper)
>> result = *ptr;
>>   else
>> result = failval;
>>   return result;
>> }
>>
>> in addition the specification of the builtin ensures that future
>> speculation using *ptr may only continue iff cmpptr lies within the
>> bounds specified.
> 
> I fail to see how the vulnerability doesn't affect aggregate copies
> or bitfield accesses.  The vulnerability doesn't cause the loaded
> value to leak but (part of) the address by populating the CPU cache
> side-channel.
> 

It's not quite as simple as that.  You'll need to read the white paper
on Arm's web site to get a full grasp of this (linked from
https://www.arm.com/security-update).

> So why isn't this just
> 
>  T __builtin_load_no_speculate (T *);
> 
> ?  Why that "fallback" and why the lower/upper bounds?

Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
to restrict subsequent speculation, the CSEL needs a condition and the
compiler must not be able to optimize it away based on logical
reachability.  The fallback is used for the other operand of the CSEL
instruction.

> 
> Did you talk with other compiler vendors (intel, llvm, ...?)?

As stated we'll shortly be submitting similar patches for LLVM.

R.

> 
> Richard.
> 
>> Some optimizations are permitted to make the builtin easier to use.
>> The final two arguments can both be omitted (c++ style): failval will
>> default to 0 in this case and if cmpptr is omitted ptr will be used
>> for expansions of the range check.  In addition either lower or upper
>> (but not both) may be a literal NULL and the expansion will then
>> ignore that boundary condition when expanding.
>>
>> The patch set is constructed as follows:
>> 1 - generic modifications to GCC providing the builtin function for all
>> architectures and expanding to an implementation that gives the
>> logical behaviour of the builtin only.  A warning is generated if
>> this expansion path is used that code will execute correctly but
>> without providing protection against speculative use.
>> 2 - AArch64 support
>> 3 - AArch32 support (arm) for A32 and thumb2 states.
>>
>> These patches can be used with the header file that Arm recently
>> published here: https://github.com/ARM-software/speculation-barrier.
>>
>> Kernel patches are also being developed, eg:
>> https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
>> code like this will be able to use support directly from the compiler
>> in a portable manner.
>>
>> Similar patches are also being developed for LLVM and will be posted
>> to their development lists shortly.
>>
>> [1] More information on the topic can be found here:
>> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
>> Arm specific information can be found here: 
>> https://www.arm.com/security-update
>>
>>
>>
>> Richard Earnshaw (3):
>>   [builtins] Generic support for __builtin_load_no_speculate()
>>   [aarch64] Implement support for __builtin_load_no_speculate.
>>   [arm] Implement support for the de-speculation intrinsic
>>
>>  gcc/builtin-types.def |  16 +
>>  gcc/builtins.c|  99 +
>>  gcc/bu

Re: [PATCH PR82439][simplify-rtx] Simplify (x | y) == x -> (y & ~x) == 0

2018-01-05 Thread Sudakshina Das

Hi Jeff

On 04/01/18 18:30, Jeff Law wrote:

On 01/03/2018 06:57 AM, Sudakshina Das wrote:

Hi

This patch adds support for the missing transformation of (x | y) == x ->
(y & ~x) == 0.
The transformation for (x & y) == x case already exists in
simplify-rtx.c since 2014 as of r218503 and this patch only adds a
couple of extra patterns for the IOR case.

Citing the example given in PR82439 :

Simple testcase (f1 should generate the same as f2):

int f1(int x, int y) { return (x | y) == x; }
int f2(int x, int y) { return (y & ~x) == 0; }

f1:
 orr    w1, w0, w1
 cmp    w1, w0
 cset    w0, eq
 ret
f2:
 bics    wzr, w1, w0
 cset    w0, eq
 ret

This benefits targets that have the BICS instruction to generate better
code. Wilco helped me in showing that even in targets that do not have
the BICS instruction, this is no worse and gives out 2 instructions.
For example in x86:

0000000000000000 <f1>:
    0:    09 fe    or %edi,%esi
    2:    31 c0    xor    %eax,%eax
    4:    39 fe    cmp    %edi,%esi
    6:    0f 94 c0 sete   %al
    9:    c3   retq

0000000000000010 <f2>:
   10:    f7 d7    not    %edi
   12:    31 c0    xor    %eax,%eax
   14:    85 f7    test   %esi,%edi
   16:    0f 94 c0 sete   %al
   19:    c3   retq
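
The identity itself is easy to sanity-check exhaustively over small
operands (a standalone program, not part of the patch):

#include <assert.h>

int
main (void)
{
  /* (x | y) == x holds exactly when y has no bits outside x,
     i.e. (y & ~x) == 0.  Check all 8-bit pairs.  */
  for (unsigned int x = 0; x < 256; x++)
    for (unsigned int y = 0; y < 256; y++)
      assert (((x | y) == x) == ((y & ~x) == 0));
  return 0;
}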

Testing done: Checked for regressions on bootstrapped
aarch64-none-linux-gnu and arm-none-linux-gnueabihf and added new test
cases.
Is this ok for trunk?

Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2018-01-03  Sudakshina Das  

 PR target/82439
 * simplify-rtx.c (simplify_relational_operation_1): Add
 simplifications of (x|y) == x for BICS pattern.

*** gcc/testsuite/ChangeLog ***

2018-01-03  Sudakshina Das  

 PR target/82439
 * gcc.target/aarch64/bics_5.c: New test.
 * gcc.target/arm/bics_5.c: Likewise.

OK.


Thanks! Committed as r256275.

Sudi


jeff





Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Biener
On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
 wrote:
> On 05/01/18 09:44, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>  wrote:
>>>
>>> Recently, Google Project Zero disclosed several classes of attack
>>> against speculative execution. One of these, known as variant-1
>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>> speculation, providing an arbitrary read gadget. Further details can
>>> be found on the GPZ blog [1] and the documentation that is included
>>> with the first patch.
>>>
>>> This patch set adds a new builtin function for GCC to provide a
>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>> memory access.  I've tried to design this in such a way that it can be
>>> used for any target where this might be necessary.  The patch set
>>> provides a generic implementation of the builtin and then
>>> target-specific support for Arm and AArch64.  Other architectures can
>>> utilize the internal infrastructure as needed.
>>>
>>> Most of the details of the builtin and the hooks that need to be
>>> implemented to support it are described in the updates to the manual,
>>> but a short summary is given below.
>>>
>>> TYP __builtin_load_no_speculate
>>> (const volatile TYP *ptr,
>>>  const volatile void *lower,
>>>  const volatile void *upper,
>>>  TYP failval,
>>>  const volatile void *cmpptr)
>>>
>>> Where TYP can be any integral type (signed or unsigned char, int,
>>> short, long, etc) or any pointer type.
>>>
>>> The builtin implements the following logical behaviour:
>>>
>>> inline TYP __builtin_load_no_speculate
>>>  (const volatile TYP *ptr,
>>>   const volatile void *lower,
>>>   const volatile void *upper,
>>>   TYP failval,
>>>   const volatile void *cmpptr)
>>> {
>>>   TYP result;
>>>
>>>   if (cmpptr >= lower && cmpptr < upper)
>>> result = *ptr;
>>>   else
>>> result = failval;
>>>   return result;
>>> }
>>>
>>> in addition the specification of the builtin ensures that future
>>> speculation using *ptr may only continue iff cmpptr lies within the
>>> bounds specified.
>>
>> I fail to see how the vulnerability doesn't affect aggregate copies
>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>> value to leak but (part of) the address by populating the CPU cache
>> side-channel.
>>
>
> It's not quite as simple as that.  You'll need to read the white paper
> on Arm's web site to get a full grasp of this (linked from
> https://www.arm.com/security-update).
>
>> So why isn't this just
>>
>>  T __builtin_load_no_speculate (T *);
>>
>> ?  Why that "fallback" and why the lower/upper bounds?
>
> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
> to restrict subsequent speculation, the CSEL needs a condition and the
> compiler must not be able to optimize it away based on logical
> reachability.  The fallback is used for the other operand of the CSEL
> instruction.

But the condition could be just 'true' in the instruction encoding?  That is,
why do you do both the jump-around and the csel?  Maybe I misunderstood
the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
the compiler messing up things.

>>
>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>
> As stated we'll shortly be submitting similar patches for LLVM.

How do you solve the aggregate load issue?  I would imagine
that speculating stores (by pre-fetching the affected cache line!)
could be another attack vector?  Not sure if any CPU speculatively
does that (but I see no good reason why a CPU shouldn't do that).

Does ARM have a barrier like instruction suitable for use in the
kernel patches that are floating around?

Richard.

> R.
>
>>
>> Richard.
>>
>>> Some optimizations are permitted to make the builtin easier to use.
>>> The final two arguments can both be omitted (c++ style): failval will
>>> default to 0 in this case and if cmpptr is omitted ptr will be used
>>> for expansions of the range check.  In addition either lower or upper
>>> (but not both) may be a literal NULL and the expansion will then
>>> ignore that boundary condition when expanding.
>>>
>>> The patch set is constructed as follows:
>>> 1 - generic modifications to GCC providing the builtin function for all
>>> architectures and expanding to an implementation that gives the
>>> logical behaviour of the builtin only.  A warning is generated if
>>> this expansion path is used that code will execute correctly but
>>> without providing protection against speculative use.
>>> 2 - AArch64 support
>>> 3 - AArch32 support (arm) for A32 and thumb2 states.
>>>
>>> These patches can be used with the header file that Arm recently
>>> published here: https://github.com/ARM-software/speculation-barrier.
>>>
>>> Kernel patches are also being developed, eg:
>>> https://lkml.org/lkml/2018/1/3/754.  The intent is that eventual

Re: [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate.

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 09:51, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>  wrote:
>>
>> This patch implements support for __builtin_load_no_speculate on
>> AArch64.  On this architecture we inhibit speclation by emitting a
>> combination of CSEL and a hint instruction that ensures the CSEL is
>> full resolved when the operands to the CSEL may involve a speculative
>> load.
> 
> Missing a high-level explanation here but it looks like you do sth like
> 
>   if (access is not in bounds)
> no-speculate-access;
>   else
>  regular access;

For aarch64 we end up with a sequence like

	CMP	'bounds'
	B<cond>	out-of-range
	LDR	x1, [ptr]
out-of-range:
	CSEL	x1, FAILVAL, x1, <cond>
	CSDB

The CSEL+CSDB combination ensures that even if we do fetch x1 from an
out-of-range location the value won't persist beyond the end of the code
sequence and thus can't be used for further speculation.
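
In source terms, a bounds-guarded table load like the following sketch
(using the builtin as specified at the start of the thread; the table
and its size are illustrative) is what would expand to that sequence:

extern int table[256];

int
lookup (unsigned int idx)
{
  /* Loads table[idx] only if the pointer lies in [table, table + 256);
     otherwise yields the default failval of 0.  Either way, further
     speculation based on the loaded value is inhibited.  */
  return __builtin_load_no_speculate (&table[idx], &table[0], &table[256]);
}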

> 
> with the upper/lower bounds.  Is this because 'no-speculate-access' is
> prohibitively expensive even when factoring in the condition computation
> and the condjump?
> (trying to reverse engineer how the actual assembly will look like
> from the patch,

It's all in the white paper.

> there isn't even a testcase :/)
> 
>> * config/aarch64/aarch64.c (aarch64_print_operand): Handle zero 
>> passed
>> to 'H' operand qualifier.
>> (aarch64_inhibit_load_speculation): New function.
>> (TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
>> * config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New 
>> unspec_volatile
>> code.
>> (nospeculate, nospeculateti): New patterns.
>> ---
>>  gcc/config/aarch64/aarch64.c  | 92 
>> +++
>>  gcc/config/aarch64/aarch64.md | 28 +
>>  2 files changed, 120 insertions(+)
>>



[PATCH] RX pragma address

2018-01-05 Thread Sebastian Perta
Hello,

The following patch adds a new pragma, "pragma address", for RX.
The patch updates extend.texi and adds a test case to the regression
suite as well. For the test case I checked that the test is getting
picked up in gcc.log; unfortunately, for the .texi part I don't know
where to look or what to do to get the documentation generated.

This is similar to the pragma address implemented for M32C.

Regression test is OK, tested with the following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim

Please let me know if this is OK, Thank you!
Sebastian
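
For reference, usage looks like this (inferred from the parser below; the
variable name and address are illustrative):

/* Place port0 at absolute address 0x8000; the pragma also makes the
   variable volatile.  */
#pragma address port0 0x8000
unsigned char port0;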

Index: ChangeLog
===
--- ChangeLog(revision 256076)
+++ ChangeLog(working copy)
@@ -1,3 +1,21 @@
+2018-01-03  Sebastian Perta  
+
+	* config/rx/rx.c (rx_get_pragma_address): New function.
+	* config/rx/rx.c (rx_note_pragma_address): New function.
+	* config/rx/rx.c (rx_output_aligned_common): New function; use .set
+	instead of .comm for pragma address variables.
+	* config/rx/rx.c (rx_insert_attributes): New function which makes
+	pragma address variables volatile.
+	* config/rx/rx-pragma.c: New file with two functions, rx_pragma_address
+	and rx_register_pragmas, to implement and register the new pragma.
+	* config/rx/rx-protos.h: New declarations rx_note_pragma_address,
+	rx_register_pragmas and rx_output_aligned_common.
+	* config/rx/t-rx: Added rx-pragma.o.
+	* config/rx/rx.h: Defined ASM_OUTPUT_ALIGNED_DECL_COMMON and
+	REGISTER_TARGET_PRAGMAS.
+	* doc/extend.texi: Added documentation for RX pragmas.
+	* testsuite/gcc.target/rx/test_pragma_address.c: New file.
+	* config.gcc: Added rx-pragma.o to c_target_objs and cxx_target_objs.
+
 2018-01-02  Richard Biener  

 * ipa-inline.c (big_speedup_p): Fix expression.
Index: config.gcc
===
--- config.gcc	(revision 256076)
+++ config.gcc	(working copy)
@@ -2661,6 +2661,8 @@
 rx-*-elf*)
 tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
 tmake_file="${tmake_file} rx/t-rx"
+c_target_objs="rx-pragma.o"
+cxx_target_objs="rx-pragma.o"
 ;;
 s390-*-linux*)
 tm_file="s390/s390.h dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h 
s390/linux.h"
Index: config/rx/rx-pragma.c
===
--- config/rx/rx-pragma.c	(nonexistent)
+++ config/rx/rx-pragma.c	(working copy)
@@ -0,0 +1,67 @@
+/* Subroutines used for code generation on Renesas RX processors.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Sebastian Perta.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "c-family/c-common.h"
+#include "c-family/c-pragma.h"
+#include "rx-protos.h"
+
+/* Implements the "pragma ADDRESS" pragma.  This pragma takes a
+   variable name and an address, and arranges for that variable to be
+   "at" that address.  The variable is also made volatile.  */
+static void
+rx_pragma_address (cpp_reader * reader ATTRIBUTE_UNUSED)
+{
+  tree var, addr;
+  enum cpp_ttype type;
+  type = pragma_lex (&var);
+  if (type == CPP_NAME)
+    {
+      type = pragma_lex (&addr);
+      if (type == CPP_NUMBER)
+	{
+	  if (var != error_mark_node)
+	    {
+	      unsigned uaddr = tree_to_uhwi (addr);
+	      rx_note_pragma_address (IDENTIFIER_POINTER (var), uaddr);
+	    }
+
+	  type = pragma_lex (&var);
+	  if (type != CPP_EOF)
+	    {
+	      error ("junk at end of #pragma ADDRESS");
+	    }
+	  return;
+	}
+    }
+  error ("malformed #pragma ADDRESS variable address");
+}
+
+void
+rx_register_pragmas (void)
+{
+  c_register_pragma (NULL, "ADDRESS", rx_pragma_address);
+  c_register_pragma (NULL, "address", rx_pragma_address);
+}
+
Index: config/rx/rx-protos.h
===
--- config/rx/rx-protos.h	(revision 256076)
+++ config/rx/rx-protos.h	(working copy)
@@ -25,7 +25,11 @@
 extern void	rx_expand_epilogue (bool);
 extern void	rx_expand_prologue (void);
 extern int	rx_initial_elimination_offset (int, int);
+extern void	rx_register_pragmas (void);
 
+extern void	rx_note_pragma_address (const char *varname, unsigned address);
+extern void	rx_output_aligned_common (FILE *stream, tree decl ATTRIBUTE_UNUSED,
+					  const char *name, int size, int align);
+
 bool is_interrupt_func (const_tree decl);
 bool is_fast_interrupt_func (const_tree decl);

Inde

Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 10:47, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
>  wrote:
>> On 05/01/18 09:44, Richard Biener wrote:
>>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>>  wrote:

 Recently, Google Project Zero disclosed several classes of attack
 against speculative execution. One of these, known as variant-1
 (CVE-2017-5753), allows explicit bounds checks to be bypassed under
 speculation, providing an arbitrary read gadget. Further details can
 be found on the GPZ blog [1] and the documentation that is included
 with the first patch.

 This patch set adds a new builtin function for GCC to provide a
 mechanism for limiting speculation by a CPU after a bounds-checked
 memory access.  I've tried to design this in such a way that it can be
 used for any target where this might be necessary.  The patch set
 provides a generic implementation of the builtin and then
 target-specific support for Arm and AArch64.  Other architectures can
 utilize the internal infrastructure as needed.

 Most of the details of the builtin and the hooks that need to be
 implemented to support it are described in the updates to the manual,
 but a short summary is given below.

 TYP __builtin_load_no_speculate
 (const volatile TYP *ptr,
  const volatile void *lower,
  const volatile void *upper,
  TYP failval,
  const volatile void *cmpptr)

 Where TYP can be any integral type (signed or unsigned char, int,
 short, long, etc) or any pointer type.

 The builtin implements the following logical behaviour:

 inline TYP __builtin_load_no_speculate
  (const volatile TYP *ptr,
   const volatile void *lower,
   const volatile void *upper,
   TYP failval,
   const volatile void *cmpptr)
 {
   TYP result;

   if (cmpptr >= lower && cmpptr < upper)
 result = *ptr;
   else
 result = failval;
   return result;
 }

 in addition the specification of the builtin ensures that future
 speculation using *ptr may only continue iff cmpptr lies within the
 bounds specified.
>>>
>>> I fail to see how the vulnerability doesn't affect aggregate copies
>>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>>> value to leak but (part of) the address by populating the CPU cache
>>> side-channel.
>>>
>>
>> It's not quite as simple as that.  You'll need to read the white paper
>> on Arm's web site to get a full grasp of this (linked from
>> https://www.arm.com/security-update).
>>
>>> So why isn't this just
>>>
>>>  T __builtin_load_no_speculate (T *);
>>>
>>> ?  Why that "fallback" and why the lower/upper bounds?
>>
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
> 
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.
> 

That's why the intrinsic takes explicit bounds not a simple bool
expressing the conditions.  It prevents the optimizers from replacing
the condition by value range propagation.  That does restrict the
flexibility of the builtin somewhat but it does allow it to work
correctly at all optimization levels.
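
To make the value-range-propagation hazard concrete (my sketch, using a
hypothetical bool-taking variant of the builtin, not anything proposed
in the patches):

  /* Hypothetical variant taking a _Bool condition.  */
  if (idx < 16)
    x = load_no_speculate_bool (&a[idx], idx < 16);

  /* Inside the guarded block the compiler can prove 'idx < 16' is
     true and fold the second argument to 1, reducing the CSEL
     condition to a constant and defeating the mitigation.  Explicit
     lower/upper bounds keep a real comparison alive at expansion
     time.  */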

>>>
>>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>>
>> As stated we'll shortly be submitting similar patches for LLVM.
> 
> How do you solve the aggregate load issue?  I would imagine
> that speculating stores (by pre-fetching the affected cache line!)
> could be another attack vector?  Not sure if any CPU speculatively
> does that (but I see no good reason why a CPU shouldn't do that).
> 
> Does ARM have a barrier like instruction suitable for use in the
> kernel patches that are floating around?
> 
> Richard.
> 
>> R.
>>
>>>
>>> Richard.
>>>
 Some optimizations are permitted to make the builtin easier to use.
 The final two arguments can both be omitted (c++ style): failval will
 default to 0 in this case and if cmpptr is omitted ptr will be used
 for expansions of the range check.  In addition either lower or upper
 (but not both) may be a literal NULL and the expansion will then
 ignore that boundary condition when expanding.

 The patch set is constructed as follows:
 1 - generic modifications to GCC providing the builtin function for all
 architectures and expanding to an implementation that gives the
 logical behaviour of the builtin only.  A warning is generated if
>>

Re: Non-INTEGER_CST CHREC_RIGHTs in analyze_*_subscript

2018-01-05 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Nov 6, 2017 at 9:12 PM, Richard Sandiford
>  wrote:
>> initialize_matrix_A requires the CHREC_RIGHT to be an INTEGER_CST:
>>
>>   switch (TREE_CODE (chrec))
>> {
>> case POLYNOMIAL_CHREC:
>>   A[index][0] = mult * int_cst_value (CHREC_RIGHT (chrec));
>>   return initialize_matrix_A (A, CHREC_LEFT (chrec), index + 1, mult);
>>
>> and isn't able to back out if it isn't.  This patch instead
>> checks for an appropriate CHREC_RIGHT before calling the parent
>> function analyze_subscript_affine_affine.
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>> OK to install?
>
> Hmm.  I think this papers over some underlying issues.  The calls are
> guarded already with predicates that are supposed to verify what you do.
> Namely chrec_contains_symbols (a poly_int _does_ contain a "symbol").

Yeah, I guess.  "Symbol" seems to mean different things in different
places: elsewhere it's !is_gimple_min_invariant, while in rtl it's
specifically things that would have an assembler label.  And "if it's
not a symbol then it must be an INTEGER_CST" seems a bit dangerous in
general, which I thought was why we also had
evolution_function_right_is_integer_cst.

E.g. tree-ssa-loop-ivcanon.c uses chrec_contains_symbols to test whether
something is likely to be folded to a constant, and N * POLY_INT_CST
would be folded to a POLY_INT_CST.  So in that kind of usage we'd
want chrec_contains_symbols to be false.

But...

> So can you please fix that by adding TREE_CODE (chrec) == POLY_INT
> to the return true case?

...I don't have any proof that that causes problems in practice,
so is the following OK?

Thanks,
Richard


2018-01-05  Richard Sandiford  

gcc/
* tree-chrec.c (chrec_contains_symbols): Return true for
POLY_INT_CST.

Index: gcc/tree-chrec.c
===
--- gcc/tree-chrec.c	2018-01-03 11:12:56.820720031 +
+++ gcc/tree-chrec.c	2018-01-05 11:01:00.520570935 +
@@ -961,6 +961,7 @@ chrec_contains_symbols (const_tree chrec
 
   if (TREE_CODE (chrec) == SSA_NAME
   || VAR_P (chrec)
+  || TREE_CODE (chrec) == POLY_INT_CST
   || TREE_CODE (chrec) == PARM_DECL
   || TREE_CODE (chrec) == FUNCTION_DECL
   || TREE_CODE (chrec) == LABEL_DECL


Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-05 Thread Sudakshina Das

Hi Kyrill

On 04/01/18 16:36, Kyrill Tkachov wrote:

Hi Sudi,

On 04/01/18 15:35, Sudakshina Das wrote:

Hi

The bug reported a particular test di-longlong64-sync-1.c failing when
run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3]
and -mthumb -march=armv6 -O[g,1,2,3].

According to what I could see, the crash was caused by the explicit
VOIDmode argument that was passed to emit_store_flag_force ().  Since
the operand being compared was a long long, it was being forced into a
VOIDmode register before the comparison was done (in prepare_cmp_insn ()).

As pointed out by Kyrill, there is a comment on emit_store_flag() which
says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
  If it is VOIDmode, they cannot both be CONST_INT". This condition is
not true in this case and thus I think it is suitable to change the
argument.

Testing done: Checked for regressions on bootstrapped
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test
cases.

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2018-01-04  Sudakshina Das  

    PR target/82096
    * optabs.c (expand_atomic_compare_and_swap): Change argument
    to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2018-01-04  Sudakshina Das  

    PR target/82096
    * gcc.c-torture/compile/pr82096-1.c: New test.
    * gcc.c-torture/compile/pr82096-2.c: Likewise.


diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c

new file mode 100644
index 000..07eb5f6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { arm*-*-* } } } */
+/* { dg-options "-march=armv5t -mthumb -mfloat-abi=soft" } */

The tests in gcc.c-torture/compile/ are supposed to be target-independent,
so it's best not to gate it on target arm*-*-*.  Best to add the
arm-specific options that you want using a dg-additional-options directive
gated on target arm*-*-*.


+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c 
b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c

new file mode 100644
index 000..2570b16
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { arm*-*-* } } } */
+/* { dg-options "-march=armv6 -mthumb -mfloat-abi=soft" } */
+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}

I don't think we need an armv6 test here as the root cause is the same as
on armv5t AFAICT, so this won't give us any extra code coverage.



I have made the changes in the test files as requested.

Thanks
Sudi


Thanks,
Kyrill



diff --git a/gcc/optabs.c b/gcc/optabs.c
index 225e955..45b018e 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6295,7 +6295,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
   if (cc_reg)
 	{
 	  target_bool = emit_store_flag_force (target_bool, EQ, cc_reg,
-	   const0_rtx, VOIDmode, 0, 1);
+	   const0_rtx, mode, 0, 1);
 	  goto success;
 	}
   goto success_bool_from_val;
@@ -6323,7 +6323,7 @@ expand_atomic_compare_and_swap (rtx *ptarget_bool, rtx *ptarget_oval,
 
  success_bool_from_val:
target_bool = emit_store_flag_force (target_bool, EQ, target_oval,
-	expected, VOIDmode, 1, 1);
+	expected, mode, 1, 1);
  success:
   /* Make sure that the oval output winds up where the caller asked.  */
   if (ptarget_oval)
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr82096.c b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
new file mode 100644
index 000..9fed28c
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr82096.c
@@ -0,0 +1,9 @@
+/* { dg-additional-options "-march=armv5t -mthumb -mfloat-abi=soft" { target arm*-*-* } } */
+
+static long long AL[24];
+
+int
+check_ok (void)
+{
+  return (__sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll));
+}


RE: [PATCH] RX pragma address

2018-01-05 Thread Sebastian Perta
Sorry, the spaces got removed from the previous email.

-Original Message-
From: Sebastian Perta 
Sent: 05 January 2018 10:59
To: 'gcc-patches@gcc.gnu.org' 
Subject: [PATCH] RX pragma address

Hello, 

The following patch adds a new pragma, "pragma address", for RX.
The patch updates extend.texi and adds a test case to the regression
suite as well.  For the test case I checked that the test is getting
picked up in gcc.log; unfortunately, for the .texi part I don't know
where to look/what to do to get the documentation generated.

This is similar to the pragma address implemented for M32C.

Regression test is OK, tested with the following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim

Please let me know if this is OK, Thank you!
Sebastian

Index: ChangeLog
===
--- ChangeLog   (revision 256076)
+++ ChangeLog   (working copy)
@@ -1,3 +1,21 @@
+2018-01-03  Sebastian Perta  
+
+	* config/rx/rx.c (rx_get_pragma_address): New function.
+	* config/rx/rx.c (rx_note_pragma_address): New function.
+	* config/rx/rx.c (rx_output_aligned_common): New function; use .set
+	instead of .comm for pragma address variables.
+	* config/rx/rx.c (rx_insert_attributes): New function which makes
+	pragma address variables volatile.
+	* config/rx/rx-pragma.c: New file with 2 functions, rx_pragma_address
+	and rx_register_pragmas, to implement and register the new pragma.
+	* config/rx/rx-protos.h: New declarations rx_note_pragma_address,
+	rx_register_pragmas, rx_output_aligned_common.
+	* config/rx/t-rx: Added rx-pragma.o.
+	* config/rx/rx.h: Defined ASM_OUTPUT_ALIGNED_DECL_COMMON and
+	REGISTER_TARGET_PRAGMAS.
+	* doc/extend.texi: Added documentation for RX pragmas.
+	* testsuite/gcc.target/rx/test_pragma_address.c: New file.
+	* config.gcc: Added "rx-pragma.o" to c_target_objs and cxx_target_objs.
+
 2018-01-02  Richard Biener  
 
* ipa-inline.c (big_speedup_p): Fix expression.
Index: config.gcc
===
--- config.gcc  (revision 256076)
+++ config.gcc  (working copy)
@@ -2661,6 +2661,8 @@
 rx-*-elf*)
tm_file="dbxelf.h elfos.h newlib-stdint.h ${tm_file}"
tmake_file="${tmake_file} rx/t-rx"
+   c_target_objs="rx-pragma.o"
+   cxx_target_objs="rx-pragma.o"
;;
 s390-*-linux*)
tm_file="s390/s390.h dbxelf.h elfos.h gnu-user.h linux.h
glibc-stdint.h s390/linux.h"
Index: config/rx/rx-pragma.c
===
--- config/rx/rx-pragma.c   (nonexistent)
+++ config/rx/rx-pragma.c   (working copy)
@@ -0,0 +1,67 @@
+/* Subroutines used for code generation on Renesas RX processors.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Sebastian Perta.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+   
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "c-family/c-common.h"
+#include "c-family/c-pragma.h"
+#include "rx-protos.h"
+
+/* Implements the "pragma ADDRESS" pragma.  This pragma takes a
+   variable name and an address, and arranges for that variable to be
+   "at" that address.  The variable is also made volatile.  */
+static void
+rx_pragma_address (cpp_reader * reader ATTRIBUTE_UNUSED)
+{
+  tree var, addr;
+  enum cpp_ttype type;
+  type = pragma_lex (&var);
+  if (type == CPP_NAME)
+    {
+      type = pragma_lex (&addr);
+      if (type == CPP_NUMBER)
+	{
+	  if (var != error_mark_node)
+	    {
+	      unsigned uaddr = tree_to_uhwi (addr);
+	      rx_note_pragma_address (IDENTIFIER_POINTER (var), uaddr);
+	    }
+
+	  type = pragma_lex (&var);
+	  if (type != CPP_EOF)
+	    {
+	      error ("junk at end of #pragma ADDRESS");
+	    }
+	  return;
+	}
+    }
+  error ("malformed #pragma ADDRESS variable address");
+}
+
+void
+rx_register_pragmas (void)
+{
+  c_register_pragma (NULL, "ADDRESS", rx_pragma_address);
+  c_register_pragma (NULL, "address", rx_pragma_address);
+}
+
Index: config/rx/rx-protos.h
=

PING: [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2

2018-01-05 Thread Richard Sandiford
Ping.  Here's the patch updated to apply on top of the v8.4 and
__builtin_load_no_speculate support.

Richard Sandiford  writes:
> This patch switches the AArch64 port to use 2 poly_int coefficients
> and updates code as necessary to keep it compiling.
>
> One potentially-significant change is to
> aarch64_hard_regno_caller_save_mode.  The old implementation
> was written in a pretty conservative way: it changed the default
> behaviour for single-register values, but used the default handling
> for multi-register values.
>
> I don't think that's necessary, since the interesting cases for this
> macro are usually the single-register ones.  Multi-register modes take
> up the whole of the constituent registers and the move patterns for all
> multi-register modes should be equally good.
>
> Using the original mode for multi-register cases stops us from using
> SVE modes to spill multi-register NEON values.  This was caught by
> gcc.c-torture/execute/pr47538.c.
>
> Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
> GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
> (which are all scalars), and I think it's more obvious, since if we ever
> do use this for elementwise shifts of vector modes, the mask will depend
> on the number of bits in each element rather than the number of bits in
> the whole vector.


2018-01-05  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-modes.def (NUM_POLY_INT_COEFFS): Set to 2.
* config/aarch64/aarch64-protos.h (aarch64_initial_elimination_offset):
Return a poly_int64 rather than a HOST_WIDE_INT.
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64
rather than a HOST_WIDE_INT.
* config/aarch64/aarch64.h (aarch64_frame): Protect with
HAVE_POLY_INT_H rather than HOST_WIDE_INT.  Change locals_offset,
hard_fp_offset, frame_size, initial_adjust, callee_offset and
final_offset from HOST_WIDE_INT to poly_int64.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
to_constant when getting the number of units in an Advanced SIMD
mode.
(aarch64_builtin_vectorized_function): Check for a constant number
of units.
* config/aarch64/aarch64-simd.md (mov): Handle polynomial
GET_MODE_SIZE.
(aarch64_ld_lane): Use the nunits
attribute instead of GET_MODE_NUNITS.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
(aarch64_class_max_nregs): Use the constant_lowest_bound of the
GET_MODE_SIZE for fixed-size registers.
(aarch64_const_vec_all_same_in_range_p): Use const_vec_duplicate_p.
(aarch64_hard_regno_call_part_clobbered, aarch64_classify_index)
(aarch64_mode_valid_for_sched_fusion_p, aarch64_classify_address)
(aarch64_legitimize_address_displacement, aarch64_secondary_reload)
(aarch64_print_operand, aarch64_print_address_internal)
(aarch64_address_cost, aarch64_rtx_costs, aarch64_register_move_cost)
(aarch64_short_vector_p, aapcs_vfp_sub_candidate)
(aarch64_simd_attr_length_rglist, aarch64_operands_ok_for_ldpstp):
Handle polynomial GET_MODE_SIZE.
(aarch64_hard_regno_caller_save_mode): Likewise.  Return modes
wider than SImode without modification.
(tls_symbolic_operand_type): Use strip_offset instead of split_const.
(aarch64_pass_by_reference, aarch64_layout_arg, aarch64_pad_reg_upward)
(aarch64_gimplify_va_arg_expr): Assert that we don't yet handle
passing and returning SVE modes.
(aarch64_function_value, aarch64_layout_arg): Use gen_int_mode
rather than GEN_INT.
(aarch64_emit_probe_stack_range): Take the size as a poly_int64
rather than a HOST_WIDE_INT, but call sorry if it isn't constant.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_layout_frame): Cope with polynomial offsets.
(aarch64_save_callee_saves, aarch64_restore_callee_saves): Take the
start_offset as a poly_int64 rather than a HOST_WIDE_INT.  Track
polynomial offsets.
(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p)
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a
poly_int64 rather than a HOST_WIDE_INT.
(aarch64_get_separate_components, aarch64_process_components)
(aarch64_expand_prologue, aarch64_expand_epilogue)
(aarch64_use_return_insn_p): Handle polynomial frame offsets.
(aarch64_anchor_offset): New function, split out from...
(aarch64_legitimize_address): ...here.
(aarch64_builtin_vectorization_cost): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(aarch64_simd_check_vect_par_cnst_half): Handle polynomial
GET_MODE_NUNITS.
(aarch64_simd_make_constant, aarch64_expand_vector_init): Get the
number of elements from 

Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jakub Jelinek
On Fri, Jan 05, 2018 at 10:59:21AM +, Richard Earnshaw (lists) wrote:
> > But the condition could be just 'true' in the instruction encoding?  That 
> > is,
> > why do you do both the jump-around and the csel?  Maybe I misunderstood
> > the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> > the compiler messing up things.
> > 
> 
> That's why the intrinsic takes explicit bounds not a simple bool
> expressing the conditions.  It prevents the optimizers from replacing
> the condition by value range propagation.  That does restrict the

Preventing optimizers from replacing the condition can be done in many ways,
IFN, UNSPEC, ...
The question is if you really need it at the assembly level, or if you can
just do an unconditional branch or branch conditional on say comparison of
two constants, without any dynamic bounds.

Jakub


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Alexander Monakov
I believe the proposed behavior of the new builtin is over-specialized.
In principle the following snippet may be open to exploitation too:

  if (predicate)
foo = arr[(secret >> untrusted) & 64];

assuming the attacker has a means to observe which part of 'arr' is brought into
cache, but cannot set 'predicate' to true (so has to rely on the speculative
population of the cache); and likewise if a store is predicated-off rather than
a load.

So shouldn't, for generality, the new builtin work "as if" by cleansing the
address rather than the result of the load, like the following? It would also be
applicable to stores then.

On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> inline TYP __builtin_load_no_speculate
>  (const volatile TYP *ptr,
>   const volatile void *lower,
>   const volatile void *upper,
>   TYP failval,
>   const volatile void *cmpptr)
> {
>   TYP result;
> 
>   if (cmpptr >= lower && cmpptr < upper)
> result = *ptr;
>   else
> result = failval;
>   return result;
> }

{
  if (!(cmpptr >= lower && cmpptr < upper))
ptr = NULL;

  return *ptr;
}

Alexander


Re: Non-INTEGER_CST CHREC_RIGHTs in analyze_*_subscript

2018-01-05 Thread Richard Biener
On Fri, Jan 5, 2018 at 12:02 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Nov 6, 2017 at 9:12 PM, Richard Sandiford
>>  wrote:
>>> initialize_matrix_A requires the CHREC_RIGHT to be an INTEGER_CST:
>>>
>>>   switch (TREE_CODE (chrec))
>>> {
>>> case POLYNOMIAL_CHREC:
>>>   A[index][0] = mult * int_cst_value (CHREC_RIGHT (chrec));
>>>   return initialize_matrix_A (A, CHREC_LEFT (chrec), index + 1, mult);
>>>
>>> and isn't able to back out if it isn't.  This patch instead
>>> checks for an appropriate CHREC_RIGHT before calling the parent
>>> function analyze_subscript_affine_affine.
>>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>> OK to install?
>>
>> Hmm.  I think this papers over some underlying issues.  The calls are
>> guarded already with predicates that are supposed to verify what you do.
>> Namely chrec_contains_symbols (a poly_int _does_ contain a "symbol").
>
> Yeah, I guess.  "Symbol" seems to mean different things in different
> places: elsewhere it's !is_gimple_min_invariant, while in rtl it's
> specifically things that would have an assembler label.  And "if it's
> not a symbol then it must be an INTEGER_CST" seems a bit dangerous in
> general, which I thought was why we also had
> evolution_function_right_is_integer_cst.

Yeah, well ... we do have quite some (too) many predicates here.

> E.g. tree-ssa-loop-ivcanon.c uses chrec_contains_symbols to test whether
> something is likely to be folded to a constant, and N * POLY_INT_CST
> would be folded to a POLY_INT_CST.  So in that kind of usage we'd
> want chrec_contains_symbols to be false.
>
> But...
>
>> So can you please fix that by adding TREE_CODE (chrec) == POLY_INT
>> to the return true case?
>
> ...I don't have any proof that that causes problems in practice,
> so is the following OK?

Yes.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> 2018-01-05  Richard Sandiford  
>
> gcc/
> * tree-chrec.c (chrec_contains_symbols): Return true for
> POLY_INT_CST.
>
> Index: gcc/tree-chrec.c
> ===
> --- gcc/tree-chrec.c	2018-01-03 11:12:56.820720031 +
> +++ gcc/tree-chrec.c	2018-01-05 11:01:00.520570935 +
> @@ -961,6 +961,7 @@ chrec_contains_symbols (const_tree chrec
>
>if (TREE_CODE (chrec) == SSA_NAME
>|| VAR_P (chrec)
> +  || TREE_CODE (chrec) == POLY_INT_CST
>|| TREE_CODE (chrec) == PARM_DECL
>|| TREE_CODE (chrec) == FUNCTION_DECL
>|| TREE_CODE (chrec) == LABEL_DECL


Re: [1/4] [AArch64] SVE backend support

2018-01-05 Thread Richard Sandiford
Here's the patch updated to apply on top of the v8.4 and
__builtin_load_no_speculate support.  It also handles the new
vec_perm_indices and CONST_VECTOR encoding and uses VNx... names
for the SVE modes.

Richard Sandiford  writes:
> This patch adds support for ARM's Scalable Vector Extension.
> The patch just contains the core features that work with the
> current vectoriser framework; later patches will add extra
> capabilities to both the target-independent code and AArch64 code.
> The patch doesn't include:
>
> - support for unwinding frames whose size depends on the vector length
> - modelling the effect of __tls_get_addr on the SVE registers
>
> These are handled by later patches instead.
>
> Some notes:
>
> - The copyright years for aarch64-sve.md start at 2009 because some of
>   the code is based on aarch64.md, which also starts from then.
>
> - The patch inserts spaces between items in the AArch64 section
>   of sourcebuild.texi.  This matches at least the surrounding
>   architectures and looks a little nicer in the info output.
>
> - aarch64-sve.md includes a pattern:
>
> while_ult
>
>   A later patch adds a matching "while_ult" optab, but the pattern
>   is also needed by the predicate vec_duplicate expander.

2018-01-05  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes.  Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate registers.
(V_ALIASES): Add a "z"-prefixed alias.
(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
(REG_CLASS_NAMES): Add entries for them.
(REG_CLASS_CONTENTS): Likewise.  Update ALL_REGS to include VG
and the predicate registers.
(aarch64_sve_vg): Declare.
(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
(REGMODE_NATURAL_SIZE): Define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
SVE macros.
* config/aarch64/aarch64.c: Include cfgrtl.h.
(simd_immediate_info): Add a constructor for series vectors,
and an associated step field.
(aarch64_sve_vg): New variable.
(aarch64_dbx_register_number): Handle VG and the predicate registers.
(aarch64_vect_struct_mode_p, aarch64_vecto

Re: [PATCH PR81647][AARCH64] PING Fix handling of Unordered Comparisons in aarch64-simd.md

2018-01-05 Thread Sudakshina Das

PING

On 15/12/17 11:57, Sudakshina Das wrote:

Hi

This patch fixes the inconsistent behavior observed at -O3 for the 
unordered comparisons. According to the online docs 
(https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gccint/Unary-and-Binary-Expressions.html), 
all of the following should not raise an FP exception:

- UNGE_EXPR
- UNGT_EXPR
- UNLE_EXPR
- UNLT_EXPR
- UNEQ_EXPR
Also ORDERED_EXPR and UNORDERED_EXPR should only return zero or one.

The aarch64-simd.md handling of these was generating exception-raising 
instructions such as fcmgt.  This patch changes the instructions that are 
emitted so as not to raise the exceptions.  We first check each operand 
for NaNs and force any elements containing NaN to zero before using them 
in the compare.


Example: UN<op> (a, b) -> UNORDERED (a, b) | (cm<op> (isnan (a) ? 0.0 : a, isnan (b) ? 0.0 : b))



The ORDERED_EXPR is now handled as (cmeq (a, a) & cmeq (b, b)) and 
UNORDERED_EXPR as ~ORDERED_EXPR and UNEQ as (~ORDERED_EXPR | cmeq (a,b)).
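
For reference, the quiet (exception-free) scalar semantics being targeted
can be sketched in C as follows (my sketch of the documented tree-code
semantics, not the patch's vector code):

  /* __builtin_isunordered/__builtin_isless are quiet: they raise no
     FE_INVALID for quiet NaNs, and == is likewise quiet.  */
  int unlt (double a, double b)
  { return __builtin_isunordered (a, b) || __builtin_isless (a, b); }

  int ordered (double a, double b)	/* cmeq (a, a) & cmeq (b, b) */
  { return a == a && b == b; }

  int uneq (double a, double b)		/* ~ORDERED | cmeq (a, b) */
  { return __builtin_isunordered (a, b) || a == b; }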


Testing done: Checked for regressions on bootstrapped 
aarch64-none-linux-gnu and added a new test case.


Is this ok for trunk? This will probably need a back-port to 
gcc-7-branch as well.


Thanks
Sudi

ChangeLog Entries:

*** gcc/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
 * config/aarch64/aarch64-simd.md (vec_cmp): Modify instructions
 for UNLT, UNLE, UNGT, UNGE, UNEQ, UNORDERED and ORDERED.

*** gcc/testsuite/ChangeLog ***

2017-12-15  Sudakshina Das  

 PR target/81647
 * gcc.target/aarch64/pr81647.c: New.




Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 11:37, Alexander Monakov wrote:
> I believe the proposed behavior of the new builtin is over-specialized.
> In principle the following snippet may be open to exploitation too:
> 
>   if (predicate)
>     foo = arr[(secret >> untrusted) & 64];
> 
> assuming the attacker has a means to observe which part of 'arr' is
> brought into cache, but cannot set 'predicate' to true (so has to rely
> on the speculative population of the cache); and likewise if a store
> is predicated-off rather than a load.
> 
> So shouldn't, for generality, the new builtin work "as if" by cleansing
> the address rather than the result of the load, like the following?
> It would also be applicable to stores then.

This is quite tricky.  For ARM we have to have a speculated load.  So
one way to recast this might be to push 'untrusted' into a stack slot
that was then speculatively loaded back by the builtin.  To do that
you'd need to find a way of expressing predicate as a range check, so
something like

  if (predicate)
    {
      foo = arr[(secret >> __builtin_load_no_speculate (&untrusted,
							predicate_base,
							predicate_upper, 0,
							predicate_val)) & 64];
    }
You could use similar techniques with stores as well.

I'm open to suggestions as to how we might express the conditions
better.  It's certainly the least pleasant part of this proposal, but
the seemingly obvious '_Bool condition' parameter won't work because the
compiler will just substitute true in too many situations where it
thinks it has proved that it knows the conditions.

R.


> 
> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>> inline TYP __builtin_load_no_speculate
>>  (const volatile TYP *ptr,
>>   const volatile void *lower,
>>   const volatile void *upper,
>>   TYP failval,
>>   const volatile void *cmpptr)
>> {
>>   TYP result;
>> 
>>   if (cmpptr >= lower && cmpptr < upper)
>> result = *ptr;
>>   else
>> result = failval;
>>   return result;
>> }
> 
> {
>   if (!(cmpptr >= lower && cmpptr < upper))
>     ptr = NULL;
> 
>   return *ptr;
> }
> 
> Alexander



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Richard Earnshaw (lists)
On 05/01/18 11:35, Jakub Jelinek wrote:
> On Fri, Jan 05, 2018 at 10:59:21AM +, Richard Earnshaw (lists) wrote:
>>> But the condition could be just 'true' in the instruction encoding?  That 
>>> is,
>>> why do you do both the jump-around and the csel?  Maybe I misunderstood
>>> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
>>> the compiler messing up things.
>>>
>>
>> That's why the intrinsic takes explicit bounds not a simple bool
>> expressing the conditions.  It prevents the optimizers from replacing
>> the condition by value range propagation.  That does restrict the
> 
> Preventing optimizers from replacing the condition can be done in many ways,
> IFN, UNSPEC, ...
> The question is if you really need it at the assembly level, or if you can
> just do an unconditional branch or branch conditional on say comparison of
> two constants, without any dynamic bounds.
> 
>   Jakub
> 

The bounds /have/ to reflect the real speculation condition.  Otherwise
the CSEL becomes ineffective.

R.


Re: [PATCH] RX pragma address

2018-01-05 Thread Oleg Endo
Hi,

On Fri, 2018-01-05 at 11:03 +, Sebastian Perta wrote:
> 
> Hello, 
> 
> The following patch adds a new pragma, "pragma address" for RX.
> The patch updates extend.texi and add a test case to the regression
> as well.
> For the test case I checked than test is getting picked up in gcc.log
> unfortunately for the .texi part I don't know where to look/what to
> do to
> get the
> documentation generated.
> 
> This is similar to the pragma address implemented for M32C.

Is this for some kind of legacy backwards compatibility or something?
There are ways to achieve the same with standard C and C++ language
features ... so I was wondering what's the purpose of this?

Cheers,
Oleg


RE: [PATCH] RX pragma address

2018-01-05 Thread Sebastian Perta
Hi,

Thank you for your comment.

>>Is this for some kind of legacy backwards compatibility or something?
Sort of, this is required by the following tool
https://www.renesas.com/en-eu/products/software-tools/tools/code-generator/ap4-application-leading-tool-applilet.html
There are not many plans to improve this tool and other solutions (which might 
be more complex) might not be possible to implement in this tool.

>>There are ways how to achieve the same with standard C and C++ language
The only way I can think of is to put the variable in a separate section
(using the section attribute) and then, in the linker script, place that
section at the desired address.
The problem is AP4 does not touch the linker script; it only generates
source code.

Do you have any other ideas/suggestions? I'm very happy to listen.

Best Regards,
Sebastian

-Original Message-
From: Oleg Endo [mailto:oleg.e...@t-online.de] 
Sent: 05 January 2018 11:59
To: Sebastian Perta ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] RX pragma address

Hi,

On Fri, 2018-01-05 at 11:03 +, Sebastian Perta wrote:
> 
> Hello, 
> 
> The following patch adds a new pragma, "pragma address" for RX.
> The patch updates extend.texi and add a test case to the regression
> as well.
> For the test case I checked than test is getting picked up in gcc.log
> unfortunately for the .texi part I don't know where to look/what to
> do to
> get the
> documentation generated.
> 
> This is similar to the pragma address implemented for M32C.

Is this for some kind of legacy backwards compatibility or something?
There are ways to achieve the same with standard C and C++ language
features ... so I was wondering what's the purpose of this?

Cheers,
Oleg



[PATCH][AArch64] Use LDP/STP in shrinkwrapping

2018-01-05 Thread Wilco Dijkstra
The shrinkwrap optimization added late in GCC 7 allows each callee-save to
be delayed and done only across blocks which need a particular callee-save.
Although this reduces unnecessary memory traffic on code paths that need
few callee-saves, it typically uses LDR/STR rather than LDP/STP.  The
number of LDP/STP instructions is reduced by ~7%. This means more memory
accesses and increased codesize, ~1.0% on average.

To improve this, if a particular callee-save must be saved/restored, also
add the adjacent callee-save to allow use of LDP/STP.  This significantly
reduces codesize (for example gcc_r, povray_r, parest_r, xalancbmk_r are
1% smaller).  This is a simple fix which can be backported.  A more advanced
approach would scan blocks for pairs of callee-saves, but that requires a
rewrite of all the callee-save code which is too late at this stage.

An example epilog in a shrinkwrapped function before:

	ldp	x21, x22, [sp,#16]
	ldr	x23, [sp,#32]
	ldr	x24, [sp,#40]
	ldp	x25, x26, [sp,#48]
	ldr	x27, [sp,#64]
	ldr	x28, [sp,#72]
	ldr	x30, [sp,#80]
	ldr	d8, [sp,#88]
	ldp	x19, x20, [sp],#96
	ret

And after this patch:

	ldr	d8, [sp,#88]
	ldp	x21, x22, [sp,#16]
	ldp	x23, x24, [sp,#32]
	ldp	x25, x26, [sp,#48]
	ldp	x27, x28, [sp,#64]
	ldr	x30, [sp,#80]
	ldp	x19, x20, [sp],#96
	ret

Passes bootstrap, OK for commit (and backport to GCC7)?

ChangeLog:
2018-01-05  Wilco Dijkstra  

* config/aarch64/aarch64.c (aarch64_components_for_bb):
Increase LDP/STP opportunities by adding adjacent callee-saves.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9735fc18402dd8fe2fa4022eef4c0522814a0552..da21032b19413d0361b8d30b51a31124eaaa31a1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3503,7 +3503,22 @@ aarch64_components_for_bb (basic_block bb)
 	&& (bitmap_bit_p (in, regno)
 	    || bitmap_bit_p (gen, regno)
 	    || bitmap_bit_p (kill, regno)))
-      bitmap_set_bit (components, regno);
+      {
+	unsigned regno2, offset, offset2;
+	bitmap_set_bit (components, regno);
+
+	/* If there is a callee-save at an adjacent offset, add it too
+	   to increase the use of LDP/STP.  */
+	offset = cfun->machine->frame.reg_offset[regno];
+	regno2 = ((offset & 8) == 0) ? regno + 1 : regno - 1;
+
+	if (regno2 <= LAST_SAVED_REGNUM)
+	  {
+	    offset2 = cfun->machine->frame.reg_offset[regno2];
+	    if ((offset & ~8) == (offset2 & ~8))
+	      bitmap_set_bit (components, regno2);
+	  }
+      }
 
   return components;
 }


Re: [PATCH][AArch64] Improve register allocation of fma

2018-01-05 Thread Wilco Dijkstra
Andrew Pinski wrote:

> Seems like you should do something similar to the integer madd/msub
> instructions too (aarch64_mla is already correct but aarch64_mla_elt
> needs this too).

Integer madd/msub may benefit too; however, it wouldn't make a difference
for a 3-operand mla since the register allocator already forces the same
register for the destination and accumulator.

Wilco


Re: [PATCH] RX pragma address

2018-01-05 Thread Oleg Endo
On Fri, 2018-01-05 at 12:12 +, Sebastian Perta wrote:
> 
> > > 
> > > Is this for some kind of legacy backwards compatibility or
> > > something?

> Sort of, this is required by the following tool
> https://www.renesas.com/en-eu/products/software-tools/tools/code-
> generator/ap4-application-leading-tool-applilet.html

> There are not many plans to improve this tool and other solutions
> (which might be more complex) might not be possible to implement in
> this tool.

I see.

> 
> The only way I can think of is to put the variable in a separate
> section (using section attribute) and then in the linker script put
> that section at the desired address.
> The problem is AP4 does not touch the linker script it only generates
> source code.
> 
> Do you have any other ideas/suggestions? I'm very happy to listen.

If you can compile the code only as plain C, you can, for example, do:

#define const_addr_var(addr, type) (*(volatile type*)(addr))

#define DTCCR const_addr_var (0x00082400, uint8_t)
#define DTCSTS const_addr_var (0x0008240E, uint16_t)


If you can compile the code as C++11 there are certainly more options,
albeit probably not useful for generated code.  For example I do those
things this way:

// read-write hardware register with constant address
static constexpr hw_reg_rw<...> DTCCR = { };

// read-only hardware register with constant address
static constexpr hw_reg_r<...> DTCSTS = { };


In both cases (C and C++) the following will compile to the same code:

void test_wr (void)
{
  DTCCR = 123;
}

int test_rd (void)
{
  return DTCSTS;
}

volatile void* get_reg_addr (void)
{
  return &DTCCR;
}

For a possible implementation of the hw_reg thingy see
https://github.com/shared-ptr/bits/blob/master/hw_reg.hpp
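
The rough shape of such a wrapper (a minimal sketch of my own with
made-up names, not necessarily the API of the linked hw_reg.hpp) is:

  #include <cstdint>

  // Empty literal type: the value type and address are baked into the
  // template arguments, so constexpr objects of this type carry no data.
  template <typename T, std::uintptr_t Addr>
  struct hw_reg_sketch
  {
    void operator= (T v) const { *reinterpret_cast<volatile T*> (Addr) = v; }
    operator T () const { return *reinterpret_cast<volatile T*> (Addr); }
    volatile void* addr () const { return reinterpret_cast<volatile T*> (Addr); }
  };

  static constexpr hw_reg_sketch<std::uint8_t, 0x00082400> DTCCR_S = { };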

But really, if that is just for some code generator, why not simply
adjust the code generator to spit out normal C code instead of
cluttering the compiler with non-portable pragmas?  You have also
posted a similar thing for RL78 a while ago, so in the end the same
pragma thing would be re-implemented in the compiler three times (M32C,
RL78, RX)?  In that case, maybe better move it out from the M32C target
and make it available for every target?

Cheers,
Oleg


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Alexander Monakov
On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
> This is quite tricky.  For ARM we have to have a speculated load.

Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
wouldn't work (applying CSEL to the address rather than loaded value), and
if it wouldn't, then ARM-specific lowering of the builtin can handle that
anyhow, right? (by spilling the pointer)

(on x86 the current Intel's recommendation is to emit LFENCE prior to the load)
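
A sketch of that recommendation in intrinsics form (my illustration, not
part of this proposal; _mm_lfence is from <emmintrin.h>):

  if (idx < len)
    {
      _mm_lfence ();	/* later loads do not execute, even
			   speculatively, until the fence completes */
      val = arr[idx];
    }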

Is the main issue expressing the CSEL condition in the source code?
Perhaps it is possible to introduce

  int guard = __builtin_nontransparent(predicate);

  if (predicate)
foo = __builtin_load_no_speculate(&arr[addr], guard);

... or maybe even

  if (predicate)
foo = arr[__builtin_loadspecbarrier(addr, guard)];

where internally __builtin_nontransparent is the same as

  guard = predicate;
  asm volatile("" : "+g"(guard));

although admittedly this is not perfect since it forces evaluation of 'guard'
before the branch.

Alexander


RE: [PATCH] RX pragma address

2018-01-05 Thread Sebastian Perta
Hi Oleg,

Thank you very much for those suggestions; they are definitely the better
way to go.
From my point of view I would like to drop both patches (RX and RL78);
however, I contacted the AP4 tool team to see if they agree with this
change and are willing to update their tool.
If not, I will follow your suggestion and move it out from the M32C target
and make it available for every target.

Best Regards,
Sebastian

-Original Message-
From: Oleg Endo [mailto:oleg.e...@t-online.de] 
Sent: 05 January 2018 12:50
To: Sebastian Perta ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] RX pragma address

On Fri, 2018-01-05 at 12:12 +, Sebastian Perta wrote:
> 
> > > 
> > > Is this for some kind of legacy backwards compatibility or
> > > something?

> Sort of, this is required by the following tool
> https://www.renesas.com/en-eu/products/software-tools/tools/code-
> generator/ap4-application-leading-tool-applilet.html

> There are not many plans to improve this tool and other solutions
> (which might be more complex) might not be possible to implement in
> this tool.

I see.

> 
> The only way I can think of is to put the variable in a separate
> section (using section attribute) and then in the linker script put
> that section at the desired address.
> The problem is AP4 does not touch the linker script it only generates
> source code.
> 
> Do you have any other ideas/suggestions? I'm very happy to listen.

If you can compile the code only as plain C, for example

#define const_addr_var(addr, type) (*(volatile type*)addr)

#define DTCCR const_addr_var (0x00082400, uint8_t)
#define DTCSTS const_addr_var (0x0008240E, uint16_t)


If you can compile the code as C++11 there are certainly more options,
albeit probably not useful for generated code.  For example I do those
things this way:

// read-write hardware register with constant address
static constexpr hw_reg_rw> DTCCR = { };

// ready-only hardware register with constant address
static constexpr hw_reg_r> DTCSTS = { };


In both cases (C and C++) the following will compile to the same code:

void test_wr (void)
{
  DTCCR = 123;
}

int test_rd (void)
{
  return DTCSTS;
}

volatile void* get_reg_addr (void)
{
  return &DTCCR;
}

For a possible implementation of the hw_reg thingy see
https://github.com/shared-ptr/bits/blob/master/hw_reg.hpp

But really, if that is just for some code generator, why not simply
adjust the code generator to spit out normal C code instead of
cluttering the compiler with non-portable pragmas?  You have also
posted a similar thing for RL78 a while ago, so in the end the same
pragma thing would be re-implemented in the compiler three times (M32C,
RL78, RX)?  In that case, maybe better move it out from the M32C target
and make it available for every target?

Cheers,
Oleg



Re: Fix Bug 83566 - cyl_bessel_j returns wrong result for x>1000 for high orders

2018-01-05 Thread Ed Smith-Rowland

On 01/04/2018 03:54 PM, Michele Pezzutti wrote:



On 01/04/2018 06:16 PM, Ed Smith-Rowland wrote:

On 01/03/2018 02:49 PM, Michele Pezzutti wrote:


Hi.

On 01/02/2018 05:43 PM, Michele Pezzutti wrote:


On 01/02/2018 02:28 AM, Ed Smith-Rowland wrote:

I like the patch.

I have a similar one in the tr29124 branch.

Anyway, I got held up and I think it's good to have new folks 
looking into this.


It looks good except that you need to uglify k.


I looked at the GSL implementation, based on the same reference, and
their loop is cleaner.  What about porting that implementation here?
Possible?


My implementation is also using one more term for P than for Q, 
which is discouraged in GSL, according to their comments.


Ed, do you have any comment about this point?
Regarding the 2nd line, from my observations the term usually stops
contributing to P and Q before k > nu/2, so an offset by one is most
likely without consequences.

GSL implementation is nevertheless more elegant.


I've seen their implementation.  It's tight.  Feel free to port it.
I assume this is it:
int
gsl_sf_bessel_Jnu_asympx_e(const double nu, const double x, 
gsl_sf_result * result);


You do want to give Q or b one more term so as to make that last 
factor as small as possible.  They're right.




Here it is.

diff --git a/libstdc++-v3/include/tr1/bessel_function.tcc b/libstdc++-v3/include/tr1/bessel_function.tcc
index 7ac733d..7842350 100644
--- a/libstdc++-v3/include/tr1/bessel_function.tcc
+++ b/libstdc++-v3/include/tr1/bessel_function.tcc
@@ -353,21 +353,47 @@ namespace tr1
  *   @param  __x   The argument of the Bessel functions.
  *   @param  __Jnu  The output Bessel function of the first kind.
  *   @param  __Nnu  The output Neumann function (Bessel function 
of the second kind).

+ *
+ *  Adapted for libstdc++ from GNU GSL version 2.4 specfunc/bessel_j.c
+ *  Copyright (C) 1996,1997,1998,1999,2000,2001,2002,2003 Gerard Jungman

  */
 template 
 void
 __cyl_bessel_jn_asymp(_Tp __nu, _Tp __x, _Tp & __Jnu, _Tp & __Nnu)
 {
   const _Tp __mu   = _Tp(4) * __nu * __nu;
-  const _Tp __mum1 = __mu - _Tp(1);
-  const _Tp __mum9 = __mu - _Tp(9);
-  const _Tp __mum25 = __mu - _Tp(25);
-  const _Tp __mum49 = __mu - _Tp(49);
-  const _Tp __xx = _Tp(64) * __x * __x;
-  const _Tp __P = _Tp(1) - __mum1 * __mum9 / (_Tp(2) * __xx)
-    * (_Tp(1) - __mum25 * __mum49 / (_Tp(12) * __xx));
-  const _Tp __Q = __mum1 / (_Tp(8) * __x)
-    * (_Tp(1) - __mum9 * __mum25 / (_Tp(6) * __xx));
+  const _Tp __8x = _Tp(8) * __x;
+
+  _Tp __P = _Tp(0);
+  _Tp __Q = _Tp(0);
+
+  _Tp k = _Tp(0);
+  _Tp __term = _Tp(1);
+
+  int __epsP = 0;
+  int __epsQ = 0;
+
+  _Tp __eps = std::numeric_limits<_Tp>::epsilon();
+
+  do
+    {
+      __term *= (k == 0) ? _Tp(1)
+			 : -(__mu - (2 * k - 1) * (2 * k - 1)) / (k * __8x);
+      __epsP = std::abs(__term) < std::abs(__eps * __P);
+      __P += __term;
+
+      k++;
+
+      __term *= (__mu - (2 * k - 1) * (2 * k - 1)) / (k * __8x);
+      __epsQ = std::abs(__term) < std::abs(__eps * __Q);
+      __Q += __term;
+
+      if (__epsP && __epsQ && k > __nu / 2.)
+	break;
+
+      k++;
+    }
+  while (k < 1000);
+

   const _Tp __chi = __x - (__nu + _Tp(0.5L))
 * __numeric_constants<_Tp>::__pi_2();
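
For context, the series being summed is the standard Hankel asymptotic
expansion (stated here for reference, cf. Abramowitz & Stegun 9.2.5):

  \[ J_\nu(x) \sim \sqrt{\frac{2}{\pi x}}
     \bigl[ P(\nu,x)\cos\chi - Q(\nu,x)\sin\chi \bigr], \qquad
     \chi = x - \Bigl(\nu + \tfrac{1}{2}\Bigr)\frac{\pi}{2}, \]
  \[ P \sim \sum_{k \ge 0} (-1)^k \frac{a_{2k}(\nu)}{x^{2k}}, \quad
     Q \sim \sum_{k \ge 0} (-1)^k \frac{a_{2k+1}(\nu)}{x^{2k+1}}, \quad
     a_k(\nu) = \frac{(4\nu^2 - 1)(4\nu^2 - 9)\cdots(4\nu^2 - (2k-1)^2)}
		     {k!\,8^k}, \]

so the running __term above is exactly a_k(nu)/x^k with sign
(-1)^{floor(k/2)}, built from the ratio -(__mu - (2k-1)^2)/(k * __8x)
with __mu = 4 nu^2.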



In principle, these ideas should be ported to I and K, but I think (and
IIRC GSL agrees) these under- or over-flow before they have much effect.
I think I put the full series in there.  They could use a similar
clean-up.


But let's get this in there first.

Ed





This looks good to me.  The only nit is that you have to uglify the k 
loop variable.


(I hope we can un-uglify our variables when modules come!)

Do you have copyright assignment in place and all that?

Ed





Re: [PATCH] Fix a vbmi2 ICE (PR target/83604)

2018-01-05 Thread Kirill Yukhin
Hello Jakub!
On 28 Dec 10:10, Jakub Jelinek wrote:
> Hi!
> 
> These insns don't really need AVX512BW in any way themselves, only their
> masked variants might need it for reloading of the mask register, but that
> should be covered in builtins.def, doesn't need duplication in sse.md.
> For non-masked it causes ICEs, because the builtins properly aren't guarded
> with AVX512BW, but the insns incorrectly require that.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
Your patch is OK.

> 
> 2017-12-28  Jakub Jelinek  
> 
>   PR target/83604
>   * config/i386/sse.md (VI248_VLBW): Rename to ...
>   (VI248_AVX512VL): ... this.  Don't guard V32HI with TARGET_AVX512BW.
>   (vpshrd_, vpshld_,
>   vpshrdv_, vpshrdv__mask, vpshrdv__maskz,
>   vpshrdv__maskz_1, vpshldv_, vpshldv__mask,
>   vpshldv__maskz, vpshldv__maskz_1): Use VI248_AVX512VL
>   mode iterator instead of VI248_VLBW.
> 
>   * gcc.target/i386/pr83604.c: New test.

--
Thanks, K


Re: [PATCH] GFNI and misc other fixes (PR target/83604)

2018-01-05 Thread Kirill Yukhin
Hello Jakub!
On 28 Dec 11:07, Jakub Jelinek wrote:
> Hi!
> 
> Martin reported sse-13.c ICEs without all the options it has in dg-options.
> The problem is that the GFNI builtins used incorrect ISA masks and the
> headers too.  GFNI has one SSE encoded instruction (but that really needs
> SSE2 rather than SSE, because it uses V16QImode which is not enabled just
> for SSE), 2 AVX VEC encoded ones (without masking) and then EVEX encoded
> ones with masking where for *_mask we sometimes also need AVX512BW in
> addition to GFNI + {AVX512VL,AVX512F} in the CPUID column in the pdf.
> 
> Of course, such combinations don't really work properly with the current
> handling of the builtin masks, so I've finally analyzed all cases where we
> combine multiple ISA options in masks and found out that these days
> what actually is an exception is where we require one isa or another isa,
> rather than both or all 3.  So instead of adding further and further
> exceptions, this patch changes the general rule, by default we require
> all the listed ISAs to be enabled and have 3 exceptions to that rule
> (SSE | 3DNOW_A), (FMA | FMA4) and (SSE4_2 | CRC32), where we are looking for
> at least one of those enabled rather than both (but, if these are ored with
> other ISA masks, we require one of the two from the pair and all others).
> 
> Another thing is that 3 intrinsic headers were missing the boilerplate,
> some were using incorrect macros etc.
> 
> The new testcase unfortunately still has to require -msse2 -mmmx, because
> some intrin headers aren't fully correct in their pragmas (don't enable
> MMX where required etc.).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.

--
Thanks, K


Re: [PATCH] RTEMS/EPIPHANY: Add RTEMS support

2018-01-05 Thread Jeff Law
On 01/05/2018 01:06 AM, Sebastian Huber wrote:
> On 04/01/18 18:37, Jeff Law wrote:
>> On 01/04/2018 01:51 AM, Sebastian Huber wrote:
>>> gcc/
>>> config.gcc (epiphany-*-elf*): Add (epiphany-*-rtems*)
>>> configuration.
>>> config/epiphany/rtems.h: New file.
>>>
>>> libgcc/
>>> config.host (epiphany-*-elf*): Add (epiphany-*-rtems*)
>>> configuration.
>> Seems to me like this would fall maintainership of the RTEMS bits.  SO
>> you could self-approve.  I gave it a looksie and didn't see anything
>> objectionable.
> 
> I would like to back port this to GCC 7 before the GCC 7.3 release.
Contact Richi directly -- I believe he's in charge of the 7.3 release.

> 
>>
>>
>> You might consider not including dbxelf in tm_file.  I realize many
>> other targets include it, but embedded stabs is something we're really
>> like to get rid of.
> 
> For RTEMS we don't need this embedded stabs support at all, but I would
> like to stay as close to the *-*-elf configuration as possible. Would it
> make sense to deprecate the dbxelf in general?
That's certainly the direction things are going (deprecating embedded
stabs).  Having fewer ports that even include it makes it easier to pull
the trigger.

Your call.  It's not a huge deal.

jeff


Re: [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate.

2018-01-05 Thread Jeff Law
On 01/05/2018 03:48 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 09:51, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>  wrote:
>>>
>>> This patch implements support for __builtin_load_no_speculate on
>>> AArch64.  On this architecture we inhibit speculation by emitting a
>>> combination of CSEL and a hint instruction that ensures the CSEL is
>>> fully resolved when the operands to the CSEL may involve a speculative
>>> load.
>>
>> Missing a high-level explanation here but it looks like you do sth like
>>
>>   if (access is not in bounds)
>> no-speculate-access;
>>   else
>>  regular access;
> 
> For aarch64 we end up with a sequence like
> 
>   CMP 'bounds'
>   B<cond> out-of-range
>   LDR x1, [ptr]
> out-of-range:
>   CSEL x1, FAILVAL, x1, <cond>
>   CSDB
> 
> The CSEL+CSDB combination ensures that even if we do fetch x1 from an
> out-of-range location the value won't persist beyond the end of the code
> sequence and thus can't be used for further speculation.
If I try to look at this architecture independently the code starting at
out-of-range is just a conditional move and fence.  Given those
primitives any target ought to be able to define
__builtin_load_no_speculate -- which is obviously important :-)
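
As a rough sketch of that shape in plain C (illustrative only: the
__speculation_barrier () below is a placeholder declaration for
whatever fence the target provides, e.g. the CSDB hint on AArch64,
not a real builtin):

/* Placeholder for a target-specific speculation fence.  */
extern void __speculation_barrier (void);

static inline long
load_no_speculate_sketch (const long *ptr, const long *lower,
                          const long *upper, long failval)
{
  long value = failval;
  if (ptr >= lower && ptr < upper)
    value = *ptr;               /* expected to become a conditional move */
  __speculation_barrier ();     /* fence: forces the selection to resolve */
  return value;
}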

Jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/05/2018 02:44 AM, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>  wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
>>
>> This patch set adds a new builtin function for GCC to provide a
>> mechanism for limiting speculation by a CPU after a bounds-checked
>> memory access.  I've tried to design this in such a way that it can be
>> used for any target where this might be necessary.  The patch set
>> provides a generic implementation of the builtin and then
>> target-specific support for Arm and AArch64.  Other architectures can
>> utilize the internal infrastructure as needed.
>>
>> Most of the details of the builtin and the hooks that need to be
>> implemented to support it are described in the updates to the manual,
>> but a short summary is given below.
>>
>> TYP __builtin_load_no_speculate
>> (const volatile TYP *ptr,
>>  const volatile void *lower,
>>  const volatile void *upper,
>>  TYP failval,
>>  const volatile void *cmpptr)
>>
>> Where TYP can be any integral type (signed or unsigned char, int,
>> short, long, etc) or any pointer type.
>>
>> The builtin implements the following logical behaviour:
>>
>> inline TYP __builtin_load_no_speculate
>>  (const volatile TYP *ptr,
>>   const volatile void *lower,
>>   const volatile void *upper,
>>   TYP failval,
>>   const volatile void *cmpptr)
>> {
>>   TYP result;
>>
>>   if (cmpptr >= lower && cmpptr < upper)
>> result = *ptr;
>>   else
>> result = failval;
>>   return result;
>> }
>>
>> in addition the specification of the builtin ensures that future
>> speculation using *ptr may only continue iff cmpptr lies within the
>> bounds specified.
> 
> I fail to see how the vulnerability doesn't affect aggregate copies
> or bitfield accesses.  The vulnerability doesn't cause the loaded
> value to leak but (part of) the address by populating the CPU cache
> side-channel.
> 
> So why isn't this just
> 
>  T __builtin_load_no_speculate (T *);
> 
> ?  Why that "fallback" and why the lower/upper bounds?
> 
> Did you talk with other compiler vendors (intel, llvm, ...?)?
I think "fallback" could potentially be any value, I don't think it's
actual value matters much -- I could just as easily be "0" across the
board or some indeterminate value (as long as the indeterminate doesn't
itself present an information leak).

The bounds are used to build a condition for the csel to select between
the loaded value and the fail value.  You need some way to select
between them.
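
As a concrete (hypothetical) illustration, guarding a table lookup with
the proposed builtin -- the table/idx/size names are made up, but the
call follows the specification quoted above:

unsigned char
guarded_lookup (unsigned char *table, unsigned long idx, unsigned long size)
{
  /* Yields table[idx] when &table[idx] is within [table, table + size),
     else 0; speculation past the bounds check is inhibited either way.  */
  return __builtin_load_no_speculate (&table[idx], table, table + size,
                                      0, &table[idx]);
}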

If I've got anything wrong, I'm sure Richard E. will correct me.

Jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/05/2018 03:40 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 09:44, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>  wrote:
>>>
>>> Recently, Google Project Zero disclosed several classes of attack
>>> against speculative execution. One of these, known as variant-1
>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>> speculation, providing an arbitrary read gadget. Further details can
>>> be found on the GPZ blog [1] and the documentation that is included
>>> with the first patch.
>>>
>>> This patch set adds a new builtin function for GCC to provide a
>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>> memory access.  I've tried to design this in such a way that it can be
>>> used for any target where this might be necessary.  The patch set
>>> provides a generic implementation of the builtin and then
>>> target-specific support for Arm and AArch64.  Other architectures can
>>> utilize the internal infrastructure as needed.
>>>
>>> Most of the details of the builtin and the hooks that need to be
>>> implemented to support it are described in the updates to the manual,
>>> but a short summary is given below.
>>>
>>> TYP __builtin_load_no_speculate
>>> (const volatile TYP *ptr,
>>>  const volatile void *lower,
>>>  const volatile void *upper,
>>>  TYP failval,
>>>  const volatile void *cmpptr)
>>>
>>> Where TYP can be any integral type (signed or unsigned char, int,
>>> short, long, etc) or any pointer type.
>>>
>>> The builtin implements the following logical behaviour:
>>>
>>> inline TYP __builtin_load_no_speculate
>>>  (const volatile TYP *ptr,
>>>   const volatile void *lower,
>>>   const volatile void *upper,
>>>   TYP failval,
>>>   const volatile void *cmpptr)
>>> {
>>>   TYP result;
>>>
>>>   if (cmpptr >= lower && cmpptr < upper)
>>> result = *ptr;
>>>   else
>>> result = failval;
>>>   return result;
>>> }
>>>
>>> in addition the specification of the builtin ensures that future
>>> speculation using *ptr may only continue iff cmpptr lies within the
>>> bounds specified.
>>
>> I fail to see how the vulnerability doesn't affect aggregate copies
>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>> value to leak but (part of) the address by populating the CPU cache
>> side-channel.
>>
> 
> It's not quite as simple as that.  You'll need to read the white paper
> on Arm's web site to get a full grasp of this (linked from
> https://www.arm.com/security-update).
I actually recommend reading the original papers referenced by Google in
addition to the ARM whitepaper.  This stuff is far from intuitive.  I've
been loosely involved for a month or so, but it really didn't start to
click for me until right before the holiday break.



jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/05/2018 04:57 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 11:35, Jakub Jelinek wrote:
>> On Fri, Jan 05, 2018 at 10:59:21AM +, Richard Earnshaw (lists) wrote:
 But the condition could be just 'true' in the instruction encoding?  That is,
 why do you do both the jump-around and the csel?  Maybe I misunderstood
 the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
 the compiler messing up things.

>>>
>>> That's why the intrinsic takes explicit bounds not a simple bool
>>> expressing the conditions.  It prevents the optimizers from replacing
>>> the condition by value range propagation.  That does restrict the
>>
>> Preventing optimizers from replacing the condition can be done in many ways,
>> IFN, UNSPEC, ...
>> The question is if you really need it at the assembly level, or if you can
>> just do an unconditional branch or branch conditional on say comparison of
>> two constants, without any dynamic bounds.
>>
>>  Jakub
>>
> 
> The bounds /have/ to reflect the real speculation condition.  Otherwise
> the CSEL becomes ineffective.
I think this is probably the key that Jakub and Richard B. were missing.
 It certainly hadn't clicked for me when we spoke earlier.

Jeff


Re: Fix Bug 83566 - cyl_bessel_j returns wrong result for x>1000 for high orders

2018-01-05 Thread Jonathan Wakely

On 05/01/18 09:54 -0500, Ed Smith-Rowland wrote:

On 01/04/2018 03:54 PM, Michele Pezzutti wrote:



On 01/04/2018 06:16 PM, Ed Smith-Rowland wrote:

On 01/03/2018 02:49 PM, Michele Pezzutti wrote:


Hi.

On 01/02/2018 05:43 PM, Michele Pezzutti wrote:


On 01/02/2018 02:28 AM, Ed Smith-Rowland wrote:

I like the patch.

I have a similar one in the tr29124 branch.

Anyway, I got held up and I think it's good to have new 
folks looking into this.


It looks good except that you need to uglify k.


I looked at the GSL implementation, based on the same reference,
and their loop is cleaner. What about porting that implementation
here? Possible?


My implementation is also using one more term for P than for 
Q, which is discouraged in GSL, according to their comments.


Ed, do you have any comment about this point?
Regarding the 2nd line, from my observations, usually the term
stops contributing to P, Q before k > nu/2, so actually, an
offset by one is most likely without consequences.

GSL implementation is nevertheless more elegant.


I've seen their implementation.  It's tight.  Feel free to port it.
I assume this is it:
int
gsl_sf_bessel_Jnu_asympx_e(const double nu, const double x, gsl_sf_result * result);


You do want to give Q or b one more term so as to make that last 
factor as small as possible.  They're right.
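
For reference, the series being truncated here is the standard Hankel
asymptotic expansion (Abramowitz & Stegun 9.2.5 ff.); in LaTeX, with
\mu = 4\nu^2 matching the code's __mu:

  J_\nu(x) \sim \sqrt{\tfrac{2}{\pi x}}
    \bigl( P(\nu,x)\cos\chi - Q(\nu,x)\sin\chi \bigr),
  \qquad \chi = x - \bigl(\tfrac{\nu}{2} + \tfrac{1}{4}\bigr)\pi,

  P(\nu,x) \sim \sum_{k \ge 0} (-1)^k \frac{a_{2k}(\nu)}{x^{2k}}, \qquad
  Q(\nu,x) \sim \sum_{k \ge 0} (-1)^k \frac{a_{2k+1}(\nu)}{x^{2k+1}},

  a_k(\nu) = \frac{\prod_{j=1}^{k} \bigl(\mu - (2j-1)^2\bigr)}{k!\,8^k},

so each step multiplies the running term by (\mu - (2k-1)^2)/(k \cdot 8x)
with alternating sign, exactly as in the loop in the patch below.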




Here it is.

diff --git a/libstdc++-v3/include/tr1/bessel_function.tcc b/libstdc++-v3/include/tr1/bessel_function.tcc
index 7ac733d..7842350 100644
--- a/libstdc++-v3/include/tr1/bessel_function.tcc
+++ b/libstdc++-v3/include/tr1/bessel_function.tcc
@@ -353,21 +353,47 @@ namespace tr1
  *   @param  __x   The argument of the Bessel functions.
  *   @param  __Jnu  The output Bessel function of the first kind.
  *   @param  __Nnu  The output Neumann function (Bessel function of the second kind).
+ *
+ *  Adapted for libstdc++ from GNU GSL version 2.4 specfunc/bessel_j.c
+ *  Copyright (C) 1996,1997,1998,1999,2000,2001,2002,2003 Gerard Jungman
  */
 template<typename _Tp>
 void
 __cyl_bessel_jn_asymp(_Tp __nu, _Tp __x, _Tp & __Jnu, _Tp & __Nnu)
 {
   const _Tp __mu   = _Tp(4) * __nu * __nu;
-  const _Tp __mum1 = __mu - _Tp(1);
-  const _Tp __mum9 = __mu - _Tp(9);
-  const _Tp __mum25 = __mu - _Tp(25);
-  const _Tp __mum49 = __mu - _Tp(49);
-  const _Tp __xx = _Tp(64) * __x * __x;
-  const _Tp __P = _Tp(1) - __mum1 * __mum9 / (_Tp(2) * __xx)
-    * (_Tp(1) - __mum25 * __mum49 / (_Tp(12) * __xx));
-  const _Tp __Q = __mum1 / (_Tp(8) * __x)
-    * (_Tp(1) - __mum9 * __mum25 / (_Tp(6) * __xx));
+  const _Tp __8x = _Tp(8) * __x;
+
+  _Tp __P = _Tp(0);
+  _Tp __Q = _Tp(0);
+
+  _Tp k = _Tp(0);
+  _Tp __term = _Tp(1);
+
+  int __epsP = 0;
+  int __epsQ = 0;
+
+  _Tp __eps = std::numeric_limits<_Tp>::epsilon();
+
+  do
+    {
+  __term *= (k == 0) ? _Tp(1) : -(__mu - (2 * k - 1) * (2 * k - 1)) / (k * __8x);
+  __epsP = std::abs(__term) < std::abs(__eps * __P);
+  __P += __term;
+
+  k++;
+
+  __term *= (__mu - (2 * k - 1) * (2 * k - 1)) / (k * __8x);
+  __epsQ = std::abs(__term) < std::abs(__eps * __Q);
+  __Q += __term;
+
+  if (__epsP && __epsQ && k > __nu / 2.)
+    break;
+
+  k++;
+    }
+  while (k < 1000);
+

   const _Tp __chi = __x - (__nu + _Tp(0.5L))
 * __numeric_constants<_Tp>::__pi_2();



In principle, these ideas should be ported to I and K, but I think (and
IIRC GSL agrees) those under-/over-flow before they have much
effect.
I think I put the full series in there.  They could use a similar 
clean-up.


But let's get this in there first.

Ed





This looks good to me.  The only nit is that you have to uglify the k 
loop variable.


Right, it needs to be __k.

I'm also not sure we can put the copyright notice there, rather than
at the top of the file. We can collapse the years to 1996-2003 so I
suggest adding to the top of the file something like:

/* __cyl_bessel_jn_asymp adapted from GNU GSL version 2.4 specfunc/bessel_j.c
 * Copyright (C) 1996-2003 Gerard Jungman
 */



(I hope we can un-uglify our variables when modules come!)

Do you have copyright assignment in place and all that?


That appears to be underway.

Thanks for working on this.



Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/05/2018 03:47 AM, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
>  wrote:
>> On 05/01/18 09:44, Richard Biener wrote:
>>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>>  wrote:

 Recently, Google Project Zero disclosed several classes of attack
 against speculative execution. One of these, known as variant-1
 (CVE-2017-5753), allows explicit bounds checks to be bypassed under
 speculation, providing an arbitrary read gadget. Further details can
 be found on the GPZ blog [1] and the documentation that is included
 with the first patch.

 This patch set adds a new builtin function for GCC to provide a
 mechanism for limiting speculation by a CPU after a bounds-checked
 memory access.  I've tried to design this in such a way that it can be
 used for any target where this might be necessary.  The patch set
 provides a generic implementation of the builtin and then
 target-specific support for Arm and AArch64.  Other architectures can
 utilize the internal infrastructure as needed.

 Most of the details of the builtin and the hooks that need to be
 implemented to support it are described in the updates to the manual,
 but a short summary is given below.

 TYP __builtin_load_no_speculate
 (const volatile TYP *ptr,
  const volatile void *lower,
  const volatile void *upper,
  TYP failval,
  const volatile void *cmpptr)

 Where TYP can be any integral type (signed or unsigned char, int,
 short, long, etc) or any pointer type.

 The builtin implements the following logical behaviour:

 inline TYP __builtin_load_no_speculate
  (const volatile TYP *ptr,
   const volatile void *lower,
   const volatile void *upper,
   TYP failval,
   const volatile void *cmpptr)
 {
   TYP result;

   if (cmpptr >= lower && cmpptr < upper)
 result = *ptr;
   else
 result = failval;
   return result;
 }

 in addition the specification of the builtin ensures that future
 speculation using *ptr may only continue iff cmpptr lies within the
 bounds specified.
>>>
>>> I fail to see how the vulnerability doesn't affect aggregate copies
>>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>>> value to leak but (part of) the address by populating the CPU cache
>>> side-channel.
>>>
>>
>> It's not quite as simple as that.  You'll need to read the white paper
>> on Arm's web site to get a full grasp of this (linked from
>> https://www.arm.com/security-update).
>>
>>> So why isn't this just
>>>
>>>  T __builtin_load_no_speculate (T *);
>>>
>>> ?  Why that "fallback" and why the lower/upper bounds?
>>
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
> 
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.
> 
>>>
>>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>>
>> As stated we'll shortly be submitting similar patches for LLVM.
> 
> How do you solve the aggregate load issue?  I would imagine
> that speculating stores (by pre-fetching the affected cache line!)
> could be another attack vector?  Not sure if any CPU speculatively
> does that (but I see no good reason why a CPU shouldn't do that).
Without going into details, I've speculated to the appropriate folks
that there's other vectors for these kinds of attacks, some more
exploitable than others.  Attacking the cache via loads/stores is just
one instance of a class of problems IMHO.

I fully expect we'll be fighting this class of problems for a while.
This is just the tip of the iceberg.

Jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
> 
> Recently, Google Project Zero disclosed several classes of attack
> against speculative execution. One of these, known as variant-1
> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
> speculation, providing an arbitrary read gadget. Further details can
> be found on the GPZ blog [1] and the documentation that is included
> with the first patch.
So I think it's important for anyone reading this stuff to read
Richard's paragraph carefully --  "an arbitrary read gadget".

I fully expect that over the course of time we're going to see other
arbitrary read gadgets to be found.  Those gadgets may have lower
bandwidth, have a higher degree of jitter or be harder to exploit, but
they're out there just waiting to be discovered.

So I don't expect this to be the only mitigation we have to put into place.

Anyway...


> 
> Some optimizations are permitted to make the builtin easier to use.
> The final two arguments can both be omitted (c++ style): failval will
> default to 0 in this case and if cmpptr is omitted ptr will be used
> for expansions of the range check.  In addition either lower or upper
> (but not both) may be a literal NULL and the expansion will then
> ignore that boundary condition when expanding.
So what are the cases where FAILVAL is useful rather than just using
zero (or some other constant) all the time?

Similarly under what conditions would one want to use CMPPTR rather than
PTR?

I wandered down through the LKML thread but didn't see anything which
would shed light on those two questions.

jeff


Re: [PATCH 0/3] Add __builtin_load_no_speculate

2018-01-05 Thread Jeff Law
On 01/04/2018 11:20 AM, Joseph Myers wrote:
> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> 
>> 1 - generic modifications to GCC providing the builtin function for all
>> architectures and expanding to an implementation that gives the
>> logical behaviour of the builtin only.  A warning is generated if
>> this expansion path is used that code will execute correctly but
>> without providing protection against speculative use.
> 
> Presumably it would make sense to have a standard way for architectures 
> with no speculative execution to just use the generic version, but without 
> the warning?  E.g., split up default_inhibit_load_speculation so that it 
> generates the warning then calls another function with the same prototype, 
> so such architectures can just define the hook to use that other function?
> 
Seems wise.  There's still architectures that don't speculate or don't
speculate enough to trigger these problems.
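
A shape-only sketch of that split in plain C -- the names and the
simplified prototype are hypothetical, not the hook's real signature:

#include <stdio.h>

/* Generic logical behaviour only: no speculation barrier is emitted.  */
static long
generic_inhibit_load_speculation (const long *ptr, long failval,
                                  int in_bounds)
{
  return in_bounds ? *ptr : failval;
}

/* Default hook: warn, then delegate.  A target with no (problematic)
   speculation would point its hook at the helper above directly,
   skipping the warning.  */
static long
default_inhibit_load_speculation (const long *ptr, long failval,
                                  int in_bounds)
{
  fprintf (stderr, "warning: this target gives no protection against "
                   "speculative use of the load\n");
  return generic_inhibit_load_speculation (ptr, failval, in_bounds);
}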

Jeff


Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Joseph Myers
On Thu, 4 Jan 2018, Michael Meissner wrote:

>   (CVT_FLOAT128_TO_IBM128): Use TFtype instead of __float128 to
>   accommodate -mabi=ieeelongdouble multilibs.

Why is that correct in the -mabi=ibmlongdouble case?

As I understand it, TFtype is always of mode TFmode (it would certainly be 
confusing if it sometimes had a different mode), but TFmode's format 
depends on the selected ABI.  Where this code uses __float128, it requires 
something that is always of IEEE binary128 format.  In the 
-mabi=ieeelongdouble case, that's indeed TFtype, but not in the 
-mabi=ibmlongdouble case when TFmode has ibm128 format.

In C code, _Float128 and _Complex _Float128 should always be available 
when the binary128 format is supported, so I think the code needing that 
format can safely use those.
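
For instance (a C-only sketch; the function names are made up, the
types are the TS 18661-3 ones mentioned above):

/* Always IEEE binary128, independent of -mabi={ibm,ieee}longdouble.  */
_Float128
half_sum (_Float128 a, _Float128 b)
{
  return (a + b) / 2.0f128;
}

_Complex _Float128
scale (_Complex _Float128 z)
{
  return 2.0f128 * z;
}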

>   (__mulkc3): Use TFmode/TCmode for float128 scalar/complex types.
>   (__divkc3): Use TFmode/TCmode for float128 scalar/complex types.
>   * config/rs6000/extendkftf2-sw.c (__extendkftf2_sw): Likewise.
>   * config/rs6000/trunctfkf2-sw.c (__trunctfkf2_sw): Likewise.

And likewise here.

(My understanding is that each of the libgcc functions is meant to have a 
particular ABI that does not depend on the format of long double - so the 
"tf" and "tc" functions refer to ibm128 format, unconditionally, and the 
"kf" and "kc" ones to binary128 format, unconditionally.  I have not 
checked whether the libfuncs handling in init_float128_ibm and 
init_float128_ieee is sufficient to achieve this in all cases.  In 
particular, does complex arithmetic always use the correct one of "kc" and 
"tc" depending on the long double format?  What about __builtin_powil - 
does that properly use __powitf2 and __powikf2 as appropriate, or do you 
have the ABI of __powitf2 depending on the multilib?)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Jakub Jelinek
On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote:
> This patch is the beginning step to switching the PowerPC long double support
> from IBM extended double to IEEE 128-bit floating point on PowerPC servers.  
> It
> will be necessary to have this patch or a similar patch to allow the GLIBC 
> team
> to begin their modifications in GLIBC 2.28, so that by the time GCC 9 comes
> out, we can decide to switch the default.  It is likely, the default will only
> be switched on the 64-bit little endian PowerPC systems, when a distribution
> goes through a major level, such that they can contemplate major changes.

When you're mentioning multilibs, does that mean that we'll have two ABI
incompatible libgcc.so.* libraries, two ABI incompatible libstdc++.so.*
libraries etc.?  Just one or 2 libc.so.*/libm.so.*?

At least when doing the long double format double to doubledouble transition
(or on other targets to quad) last time we've managed to make it ABI
compatible.

Jakub


PING Re: [v3 of PATCH 13/14] c-format.c: handle location wrappers

2018-01-05 Thread David Malcolm
Ping:
  https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01494.html


(FWIW, the only other patch still needing review in the kit is:
  "[v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing
(minimal impl)"
 https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01594.html  )

On Fri, 2017-12-22 at 14:10 -0500, David Malcolm wrote:
> On Thu, 2017-12-21 at 00:00 -0500, Jason Merrill wrote:
> > > On Wed, Dec 20, 2017 at 12:33 PM, David Malcolm
> > wrote:
> > > On Tue, 2017-12-19 at 14:55 -0500, Jason Merrill wrote:
> > > > On 12/17/2017 11:29 AM, David Malcolm wrote:
> > > > > On Mon, 2017-12-11 at 18:45 -0500, Jason Merrill wrote:
> > > > > > On 11/10/2017 04:45 PM, David Malcolm wrote:
> > > > > > > +  STRIP_ANY_LOCATION_WRAPPER (format_tree);
> > > > > > > +
> > > > > > >   if (VAR_P (format_tree))
> > > > > > > {
> > > > > > >   /* Pull out a constant value if the front end
> > > > > > > didn't.  */
> > > > > > 
> > > > > > It seems like we want fold_for_warn here instead of the
> > > > > > special
> > > > > > variable
> > > > > > handling.  That probably makes sense for the other places
> > > > > > you
> > > > > > change in
> > > > > > this patch, too.
> > > > > 
> > > > > Here's an updated version of the patch which uses
> > > > > fold_for_warn,
> > > > > rather than STRIP_ANY_LOCATION_WRAPPER.
> > > > > 
> > > > > In one place it was necessary to add a STRIP_NOPS, since the
> > > > > fold_for_warn can add a cast around a:
> > > > >ADDR_EXPR  (
> > > > >  STRING_CST)
> > > > > turning it into a:
> > > > >NOP_EXPR (
> > > > >  ADDR_EXPR  (
> > > > >STRING_CST))
> > > > 
> > > > Hmm, that seems like a bug.  fold_for_warn shouldn't change the
> > > > type of
> > > > the result.
> > > 
> > > In a similar vein, is the result of decl_constant_value (decl)
> > > required
> > > to be the same type as that of the decl?
> > > 
> > > What's happening for this testcase (g++.dg/warn/Wformat-1.C) is
> > > that we
> > > have a VAR_DECL with a DECL_INITIAL, but the types of the two
> > > don't
> > > quite match up.
> > > 
> > > decl_constant_value returns an unshare_expr clone of the
> > > DECL_INITIAL,
> > > and this is what's returned from fold_for_warn.
> > > 
> > > Am I right in thinking
> > > 
> > > (a) that the bug here is that a DECL_INITIAL ought to have the
> > > same
> > > type as its decl, and so there's a missing cast where that
> > > gets
> > > set up, or
> > 
> > This seems right.
> > 
> > > (b) should decl_constant_value handle such cases, and introduce a
> > > cast
> > > if it uncovers them?
> > 
> > decl_constant_value should probably assert that the types match
> > closely enough.
> 
> Thanks.
> 
> You describe the types needing to match "closely enough" (as opposed
> to *exactly*), and that may be the key here: what I didn't say in my
> message above is that the decl is "const" whereas the value isn't.
> 
> To identify where a DECL_INITIAL can have a different type to its
> decl
> (and thus fold_for_warn "changing type"), I tried adding this
> assertion:
> 
>   --- a/gcc/cp/typeck2.c
>   +++ b/gcc/cp/typeck2.c
>   @@ -857,6 +857,7 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** cleanups, int flags)
>  /* If the value is a constant, just put it in DECL_INITIAL.  If
> DECL
> is an automatic variable, the middle end will turn this into
> a
> dynamic initialization later.  */
>   +  gcc_assert (TREE_TYPE (value) == TREE_TYPE (decl));
>  DECL_INITIAL (decl) = value;
>  return NULL_TREE;
>}
> 
> The assertion fires when a "const" decl is initialized with a value
> of
> a non-const type; e.g. in g++.dg/warn/Wformat-1.C:
> 
>   const char *const msg = "abc";
> 
> but I can also reproduce it with:
> 
>   const int i = 42;
> 
> What happens is that the type of the decl has the "readonly_flag"
> set,
> whereas the type of the value has that flag clear.
> 
> For the string case, convert_for_initialization finishes here:
> 
> 8894        return convert_for_assignment (type, rhs, errtype, fndecl, parmnum,
> 8895                                       complain, flags);
> 
> and convert_for_assignment finishes by calling:
> 
> 8801        return perform_implicit_conversion_flags (strip_top_quals (type), rhs,
> 8802                                                  complain, flags);
> 
> The "strip_top_quals" in that latter call strips away the
> "readonly_flag" from type (the decl's type), giving the type of the
> rhs, and hence this does nothing.
> 
> So we end up with a decl of type
>   const char * const
> being initialized with a value of type
>   const char *
> (or a decl of type "const int" being initialized with a value of
> type "int").
> 
> So the types are slightly inconsistent (const vs non-const), but
> given
> your wording of types needing to match "closely enough" it's not
> clear
> to me that it's a problem - if I'm reading things right, it's coming
> from
> that strip_top_quals in the implicit conversion in
> c

Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Joseph Myers
On Fri, 5 Jan 2018, Jakub Jelinek wrote:

> On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote:
> > This patch is the beginning step to switching the PowerPC long double 
> > support
> > from IBM extended double to IEEE 128-bit floating point on PowerPC servers. 
> >  It
> > will be necessary to have this patch or a similar patch to allow the GLIBC 
> > team
> > to begin their modifications in GLIBC 2.28, so that by the time GCC 9 comes
> > out, we can decide to switch the default.  It is likely, the default will 
> > only
> > be switched on the 64-bit little endian PowerPC systems, when a distribution
> > goes through a major level, such that they can contemplate major changes.
> 
> When you're mentioning multilibs, does that mean that we'll have two ABI
> incompatible libgcc.so.* libraries, two ABI incompatible libstdc++.so.*
> libraries etc.?  Just one or 2 libc.so.*/libm.so.*?

The existing GCC code tries to fix the libgcc function names so *tf* 
always means IBM long double and *kf* always means IEEE long double.  As 
noted in my reply to this patch, I suspect it might fail to handle 
__div?c3, __mul?c3 and __powi?f2 (at least) properly to avoid two 
different libgcc ABIs there.

I don't know the intent for libstdc++, libgfortran etc. - but it's been 
stated the intent is a single set of glibc libraries.  (Of course a 
distribution probably has other libraries influenced by the long double 
format in some way.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PING Re: [v3 of PATCH 13/14] c-format.c: handle location wrappers

2018-01-05 Thread Joseph Myers
On Fri, 5 Jan 2018, David Malcolm wrote:

> Ping:
>   https://gcc.gnu.org/ml/gcc-patches/2017-12/msg01494.html

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] PR libstdc++/83626 Don't throw for remove("") and remove_all("")

2018-01-05 Thread Jonathan Wakely

On 05/01/18 10:37 +, Jonathan Wakely wrote:

On 04/01/18 21:02 -0500, Tim Song wrote:

What if the file to be removed is externally removed between the
symlink_status and the ::remove call? This is probably QoI because
filesystem race, but it seems reasonable to double check errno if
::remove fails and not fail if the failure is due to the file not
existing.


Yes, the race makes it undefined, but checking for ENOENT seems
reasonable. Thanks for the suggestion.


This makes remove and remove_all handle (some) filesystem races, by
ignoring ENOENT errors when removing entries or while iterating over
sub-directories.

It also avoids redundant stat() calls in remove_all, and fixes a bug
in the return value of the throwing version of remove_all.

Tested powerpc64le-linux, committed to trunk.

I've attached a multithreaded test I used to test the handling of
filesystem races, but I'm not committing that test.



commit 64a3b77dd050d16069061185c3efef020698fca4
Author: Jonathan Wakely 
Date:   Fri Jan 5 16:39:46 2018 +

PR libstdc++/83626 handle ENOENT due to filesystem race

PR libstdc++/83626
* src/filesystem/ops.cc (remove(const path&, error_code&)): Do not
report an error for ENOENT.
(remove_all(const path&)): Fix type of result variable.
(remove_all(const path&, error_code&)): Use non-throwing increment
for directory iterator. Call POSIX remove directly to avoid redundant
calls to symlink_status. Do not report errors for ENOENT.
* src/filesystem/std-ops.cc: Likewise.
* testsuite/27_io/filesystem/operations/remove_all.cc: Test throwing
overload.
* testsuite/experimental/filesystem/operations/remove_all.cc:
Likewise.

diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index 5a7cb599bb6..3690fb94d63 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -1025,13 +1025,16 @@ fs::remove(const path& p, error_code& ec) noexcept
   ec.clear();
   return false; // Nothing to do, not an error.
 }
-  if (::remove(p.c_str()) != 0)
+  if (::remove(p.c_str()) == 0)
 {
-  ec.assign(errno, std::generic_category());
-  return false;
+  ec.clear();
+  return true;
 }
-  ec.clear();
-  return true;
+  else if (errno == ENOENT)
+ec.clear();
+  else
+ec.assign(errno, std::generic_category());
+  return false;
 }
 
 
@@ -1039,7 +1042,7 @@ std::uintmax_t
 fs::remove_all(const path& p)
 {
   error_code ec;
-  bool result = remove_all(p, ec);
+  const auto result = remove_all(p, ec);
   if (ec)
 _GLIBCXX_THROW_OR_ABORT(filesystem_error("cannot remove all", p, ec));
   return result;
@@ -1051,15 +1054,29 @@ fs::remove_all(const path& p, error_code& ec) noexcept
   const auto s = symlink_status(p, ec);
   if (!status_known(s))
 return -1;
+
   ec.clear();
+  if (s.type() == file_type::not_found)
+return 0;
+
   uintmax_t count = 0;
   if (s.type() == file_type::directory)
-for (directory_iterator d(p, ec), end; !ec && d != end; ++d)
-  count += fs::remove_all(d->path(), ec);
-  if (!ec && fs::remove(p, ec))
+{
+  for (directory_iterator d(p, ec), end; !ec && d != end; d.increment(ec))
+	count += fs::remove_all(d->path(), ec);
+  if (ec.value() == ENOENT)
+	ec.clear();
+  else if (ec)
+	return -1;
+}
+
+  if (::remove(p.c_str()) == 0)
 ++count;
-  if (ec)
-return -1;
+  else if (errno != ENOENT)
+{
+  ec.assign(errno, std::generic_category());
+  return -1;
+}
   return count;
 }
 
diff --git a/libstdc++-v3/src/filesystem/std-ops.cc b/libstdc++-v3/src/filesystem/std-ops.cc
index 6ff280a9c76..2411bbb7977 100644
--- a/libstdc++-v3/src/filesystem/std-ops.cc
+++ b/libstdc++-v3/src/filesystem/std-ops.cc
@@ -1274,13 +1274,16 @@ fs::remove(const path& p, error_code& ec) noexcept
   ec.clear();
   return false; // Nothing to do, not an error.
 }
-  if (::remove(p.c_str()) != 0)
+  if (::remove(p.c_str()) == 0)
 {
-  ec.assign(errno, std::generic_category());
-  return false;
+  ec.clear();
+  return true;
 }
-  ec.clear();
-  return true;
+  else if (errno == ENOENT)
+ec.clear();
+  else
+ec.assign(errno, std::generic_category());
+  return false;
 }
 
 
@@ -1288,7 +1291,7 @@ std::uintmax_t
 fs::remove_all(const path& p)
 {
   error_code ec;
-  bool result = remove_all(p, ec);
+  const auto result = remove_all(p, ec);
   if (ec)
 _GLIBCXX_THROW_OR_ABORT(filesystem_error("cannot remove all", p, ec));
   return result;
@@ -1300,15 +1303,29 @@ fs::remove_all(const path& p, error_code& ec)
   const auto s = symlink_status(p, ec);
   if (!status_known(s))
 return -1;
+
   ec.clear();
+  if (s.type() == file_type::not_found)
+return 0;
+
   uintmax_t count = 0;
   if (s.type() == file_type::directory)
-for (directory_iterator d(p, ec), end; !ec && d != end; 

Re: [PATCH] avoid using %lli et al.

2018-01-05 Thread Martin Sebor

On 01/04/2018 08:53 AM, Jakub Jelinek wrote:

On Thu, Jan 04, 2018 at 08:50:30AM -0700, Martin Sebor wrote:

On 01/04/2018 01:30 AM, Jakub Jelinek wrote:

On Wed, Jan 03, 2018 at 05:00:41PM -0700, Martin Sebor wrote:

This is an example where having a solution for bug 78014 would
be helpful.  I.e., a -Wformat checker to help enforce the use


No, because %zu isn't portable enough for all the hosts we support.


GCC could mitigate all these portability pitfalls by providing
its own versions of the functions (I suggested gcc_printf the
last time we discussed this subject).  libiberty already does
this for a small subset of the portability problems in some of
the standard functions (including vsprintf), so this would be
a natural extension of that approach.


Ugh, no.

Jakub


It took me a while to process this well-articulated argument
but I think I finally got it.

An alternative to using %zu that I mentioned in my earlier
post is to cast the argument to the type the directive
expects:

> I.e., a -Wformat checker to help enforce the use ... (or
> an explicit cast from size_t) even on targets size_t is
> unsigned or unsigned long, respectively.
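
For instance, a minimal illustration of the cast approach (names
hypothetical):

#include <stdio.h>
#include <stddef.h>

void
report (size_t nbytes)
{
  /* Portable to hosts lacking %zu: cast to a type whose directive is
     universally supported.  */
  printf ("%lu bytes\n", (unsigned long) nbytes);
}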

If providing interfaces to make portability to non-C99 hosts
easier is not palatable for some unknown reason, then -Wformat
could be enhanced to warn for uses of %zu and other similar
conversions to help prevent introducing them into GCC sources
regardless of the host GCC is being developed on.

The common idea of all of these solutions is to enhance our
tools and APIs to make it easier to avoid regressions like
the one you so kindly (and quickly) fixed for me.  But if
you prefer to spend your time and energy cleaning up the
mistakes of others rather than focus on adding improvements
then I understand if you don't agree with these suggestions.

Martin


Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-05 Thread Jeff Law
On 01/04/2018 08:35 AM, Sudakshina Das wrote:
> Hi
> 
> The bug reported a particular test di-longlong64-sync-1.c failing when
> run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3]
> and -mthumb -march=armv6 -O[g,1,2,3].
> 
> According to what I could see, the crash was caused because of the
> explicit VOIDmode argument that was sent to emit_store_flag_force ().
> Since the comparing argument was a long long, it was being forced into a
> VOID type register before the comparison (in prepare_cmp_insn()) is done.
> 
> As pointed out by Kyrill, there is a comment on emit_store_flag() which
> says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
>  If it is VOIDmode, they cannot both be CONST_INT". This condition is
> not true in this case and thus I think it is suitable to change the
> argument.
> 
> Testing done: Checked for regressions on bootstrapped
> arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test
> cases.
> 
> Sudi
> 
> ChangeLog entries:
> 
> *** gcc/ChangeLog ***
> 
> 2017-01-04  Sudakshina Das  
> 
> PR target/82096
> * optabs.c (expand_atomic_compare_and_swap): Change argument
> to emit_store_flag_force.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2017-01-04  Sudakshina Das  
> 
> PR target/82096
> * gcc.c-torture/compile/pr82096-1.c: New test.
> * gcc.c-torture/compile/pr82096-2.c: Likewise.
In the case where both (op0/op1) to
emit_store_flag/emit_store_flag_force are constants, don't we know the
result of the comparison and shouldn't we have optimized the store flag
to something simpler?

I feel like I must be missing something here.

Jeff
> 


Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Michael Meissner
On Fri, Jan 05, 2018 at 05:28:03PM +, Joseph Myers wrote:
> On Thu, 4 Jan 2018, Michael Meissner wrote:
> 
> > (CVT_FLOAT128_TO_IBM128): Use TFtype instead of __float128 to
> > accommodate -mabi=ieeelongdouble multilibs.
> 
> Why is that correct in the -mabi=ibmlongdouble case?

The PowerPC TFmode has always been 'special'.  It would mirror the 128-bit long
double format.  For the one port that actually used IEEE 128-bit floating
point, TFmode was IEEE.  For the the majority of ports TFmode is IBM extended
double.

So when I began this journey to add IEEE 128-bit support, I had to fit it into
this framework.

If long double is IBM extended double:
TFmode is IBM extended double scalar
TCmode is IBM extended double complex
KFmode is IEEE scalar
KCmode is IEEE complex
IFmode is not used
ICmode is not used

If long double is IEEE 128-bit (i.e. -mabi=ieeelongdouble):
TFmode is IEEE scalar
TCmode is IEEE complex
KFmode is not used
KCmode is not used
IFmode is IBM extended double scalar
ICmode is IBM extended double complex

We are just starting to look at the implications of switching the long double
type from IBM extended to IEEE.

While I would hope that eventually GLIBC can do something with versioning so
that we avoid having to do multilibs, I suspect during initial development, we
will need multilibs just to get things right.  Then perhaps GLIBC can do
something like the original support for -mlong-double-64 and -mlong-double-128
went in.  However, since I don't work on GLIBC, I don't know how hard it is.  I
figured we would need multilibs as a crutch during initial development.

> As I understand it, TFtype is always of mode TFmode (it would certainly be 
> confusing if it sometimes had a different mode), but TFmode's format 
> depends on the selected ABI.  Where this code uses __float128, it requires 
> something that is always of IEEE binary128 format.  In the 
> -mabi=ieeelongdouble case, that's indeed TFtype, but not in the 
> -mabi=ibmlongdouble case when TFmode has ibm128 format.
> 
> In C code, _Float128 and _Complex _Float128 should always be available 
> when the binary128 format is supported, so I think the code needing that 
> format can safely use those.

Yes, in C code _Float128 _Complex works.  The trouble is compiling
libstdc++-v3.  In C++, we don't have _Float128, and __float128 _Complex does
not work for either x86 or PowerPC.  So on PowerPC the code from bits/floatn.h
is:


/* Defined to a complex binary128 type if __HAVE_FLOAT128 is 1.  */
#if __HAVE_FLOAT128
# if !__GNUC_PREREQ (7, 0) || defined __cplusplus
/* Add a typedef for older GCC compilers which don't natively support
   _Complex _Float128.  */
typedef _Complex float __cfloat128 __attribute__ ((__mode__ (__KC__)));
#  define __CFLOAT128 __cfloat128
# else
#  define __CFLOAT128 _Complex _Float128
# endif
#endif

Obviously, in a future release of GLIBC (via AT 11.0-3) we will fix that, but
it made the job of doing the initial port easier.

The problem is we need to get stuff into GCC so that GLIBC can start to add the
appropriate changes.


> > (__mulkc3): Use TFmode/TCmode for float128 scalar/complex types.
> > (__divkc3): Use TFmode/TCmode for float128 scalar/complex types.
> > * config/rs6000/extendkftf2-sw.c (__extendkftf2_sw): Likewise.
> > * config/rs6000/trunctfkf2-sw.c (__trunctfkf2_sw): Likewise.
> 
> And likewise here.
> 
> (My understanding is that each of the libgcc functions is meant to have a 
> particular ABI that does not depend on the format of long double - so the 
> "tf" and "tc" functions refer to ibm128 format, unconditionally, and the 
> "kf" and "kc" ones to binary128 format, unconditionally.  I have not 
> checked whether the libfuncs handling in init_float128_ibm and 
> init_float128_ieee is sufficient to achieve this in all cases.  In 
> particular, does complex arithmetic always use the correct one of "kc" and 
> "tc" depending on the long double format?  What about __builtin_powil - 
> does that properly use __powitf2 and __powikf2 as appropriate, or do you 
> have the ABI of __powitf2 depending on the multilib?)

Libgcc does not need to be multilibed.  It uses separate functions.  What does
currently need to be multilibed is libstdc++-v3 and possibly GLIBC.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Michael Meissner
On Fri, Jan 05, 2018 at 06:33:50PM +0100, Jakub Jelinek wrote:
> On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote:
> > This patch is the beginning step to switching the PowerPC long double 
> > support
> > from IBM extended double to IEEE 128-bit floating point on PowerPC servers. 
> >  It
> > will be necessary to have this patch or a similar patch to allow the GLIBC 
> > team
> > to begin their modifications in GLIBC 2.28, so that by the time GCC 9 comes
> > out, we can decide to switch the default.  It is likely, the default will 
> > only
> > be switched on the 64-bit little endian PowerPC systems, when a distribution
> > goes through a major level, such that they can contemplate major changes.
> 
> When you're mentioning multilibs, does that mean that we'll have two ABI
> incompatible libgcc.so.* libraries, two ABI incompatible libstdc++.so.*
> libraries etc.?  Just one or 2 libc.so.*/libm.so.*?
> 
> At least when doing the long double format double to doubledouble transition
> (or on other targets to quad) last time we've managed to make it ABI
> compatible.

We are hoping that it eventually will not need multilibs.  But during the
initial development, it is likely that we will need to use multilibs.  That is
what the patch is for -- to allow building the compiler that supports
multilibs, but it is not on by default.

Part of the issue is GCC and GLIBC rev on different cycles and are done by
different people.  I have to guess what GLIBC will need, so that we have the
GLIBC solution by the time GCC 9 is ready.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Michael Meissner
On Fri, Jan 05, 2018 at 05:47:39PM +, Joseph Myers wrote:
> On Fri, 5 Jan 2018, Jakub Jelinek wrote:
> 
> > On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote:
> > > This patch is the beginning step to switching the PowerPC long double 
> > > support
> > > from IBM extended double to IEEE 128-bit floating point on PowerPC 
> > > servers.  It
> > > will be necessary to have this patch or a similar patch to allow the 
> > > GLIBC team
> > > to begin their modifications in GLIBC 2.28, so that by the time GCC 9 
> > > comes
> > > out, we can decide to switch the default.  It is likely, the default will 
> > > only
> > > be switched on the 64-bit little endian PowerPC systems, when a 
> > > distribution
> > > goes through a major level, such that they can contemplate major changes.
> > 
> > When you're mentioning multilibs, does that mean that we'll have two ABI
> > incompatible libgcc.so.* libraries, two ABI incompatible libstdc++.so.*
> > libraries etc.?  Just one or 2 libc.so.*/libm.so.*?
> 
> The existing GCC code tries to fix the libgcc function names so *tf* 
> always means IBM long double and *kf* always means IEEE long double.  As 
> noted in my reply to this patch, I suspect it might fail to handle 
> __div?c3, __mul?c3 and __powi?c2 (at least) properly to avoid two 
> different libgcc ABIs there.
> 
> I don't know the intent for libstdc++, libgfortran etc. - but it's been 
> stated the intent is a single set of glibc libraries.  (Of course a 
> distribution probably has other libraries influenced by the long double 
> format in some way.)

It would have been much simpler if we didn't already have a 128-bit floating
point type.  But we had the existing long double support.

In any case, even if you switch default long double format to IEEE, the kf/kc
functions in libgcc will be called, even though the type is now TFmode/TCmode.
It is unfortunate that the IBM extended double support did not define all of
the interfaces to be __gcc_q* style, but there were some tf/tc functions defined
that are called, and I needed to pick unique names.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Jakub Jelinek
On Fri, Jan 05, 2018 at 02:07:51PM -0500, Michael Meissner wrote:
> Yes, in C code _Float128 _Comples works.  The trouble is compiling
> libstdc++-v3.  In C++, we don't have _Float128, and __float128 _Complex does
> not work for either x86 or PowerPC.  So on PowerPC the code from bits/floatn.h
> is:

Well, libstdc++-v3 should be certainly easier than glibc, at least assuming
that ICmode mangles differently from KCmode, because it should be just a
matter of compiling the subset of symbols referring to long double twice.

Jakub


Re: [PATCH, #2], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Michael Meissner
On Thu, Jan 04, 2018 at 06:05:55PM -0500, Michael Meissner wrote:
> This patch is the beginning step to switching the PowerPC long double support
> from IBM extended double to IEEE 128-bit floating point on PowerPC servers.  
> It
> will be necessary to have this patch or a similar patch to allow the GLIBC 
> team
> to begin their modifications in GLIBC 2.28, so that by the time GCC 9 comes
> out, we can decide to switch the default.  It is likely, the default will only
> be switched on the 64-bit little endian PowerPC systems, when a distribution
> goes through a major level, such that they can contemplate major changes.

In doing some testing on a big endian system, it is getting hairy to add
support for optional ieee/ibm multilibs.  On the BE system, the default cpu is
a power4, and so the IEEE emulator is not built unless you configure to make
power7, power8, or power9 the default cpu.  In addition, the BE system already
has multilibs for 64 vs. 32 bit.  While these things can be handled, it is not
currently planned to ultimately switch the long double format in BE.  So, this
patch replaces the previous patch, and it only allows you to switch the long
double format for powerpc64le-*-linux* systems.

Except for configure.ac and configure, all of the other changes are the same.

I have tested the patch on BE systems, and it builds normally providing you do
not use the --with-long-double-format option.

[gcc]
2018-01-05  Michael Meissner  

* configure.ac (--with-long-double-format): Add support for
configuration option to change the default long double format on
little endian PowerPC Linux systems.
* configure: Regenerate.
* config.gcc (powerpc*-linux*-*): Add support for
--with-long-double-format={ieee,ibm}.  If the format is explicit
set, also set up IBM and IEEE multilibs.
* config/rs6000/rs6000.h (FLOAT128_IEEE_P): Explicitly check for
128-bit long doubles before checking TFmode or TCmode.
(FLOAT128_IBM_P): Likewise.
(TARGET_IEEEQUAD_MULTILIB): Set to 0 if not already defined.
* config/rs6000/rs6000.c (rs6000_option_override_internal): If we
have IBM/IEEE multilibs, don't give a warning if the user changes
the long double format.
(is_complex_IBM_long_double): Explicitly check for 128-bit long
doubles before checking TCmode.
* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): If long
double is IEEE, define __KC__ and __KF__ to allow floatn.h to be
used without modification.
* config/rs6000/linux64.h (MULTILIB_DEFAULTS_IEEE): Specify the
-mabi={ieee,ibm}longdouble default for multilibs.
(MULTILIB_DEFAULTS): Likewise.
* config/rs6000/t-ldouble: New file, add IEEE/IBM long double
multilibs.

[libgcc]
2018-01-05  Michael Meissner  

* config/rs6000/quad-float128.h (IBM128_TYPE): Explicitly use
__ibm128, instead of trying to use long double.
(CVT_FLOAT128_TO_IBM128): Use TFtype instead of __float128 to
accommodate -mabi=ieeelongdouble multilibs.
(CVT_IBM128_TO_FLOAT128): Likewise.
* config/rs6000/ibm-ldouble.c (IBM128_TYPE): New macro to define
the appropriate IBM extended double type.
(__gcc_qadd): Change all occurrences of long double to IBM128_TYPE.
(__gcc_qsub): Likewise.
(__gcc_qmul): Likewise.
(__gcc_qdiv): Likewise.
(pack_ldouble): Likewise.
(__gcc_qneg): Likewise.
(__gcc_qeq): Likewise.
(__gcc_qne): Likewise.
(__gcc_qge): Likewise.
(__gcc_qle): Likewise.
(__gcc_stoq): Likewise.
(__gcc_dtoq): Likewise.
(__gcc_itoq): Likewise.
(__gcc_utoq): Likewise.
(__gcc_qunord): Likewise.
* config/rs6000/_mulkc3.c (toplevel): Include soft-fp.h and
quad-float128.h for the definitions.
(COPYSIGN): Use the f128 version instead of the q version.
(INFINITY): Likewise.
(__mulkc3): Use TFmode/TCmode for float128 scalar/complex types.
* config/rs6000/_divkc3.c (toplevel): Include soft-fp.h and
quad-float128.h for the definitions.
(COPYSIGN): Use the f128 version instead of the q version.
(INFINITY): Likewise.
(FABS): Likewise.
(__divkc3): Use TFmode/TCmode for float128 scalar/complex types.
* config/rs6000/extendkftf2-sw.c (__extendkftf2_sw): Likewise.
* config/rs6000/trunctfkf2-sw.c (__trunctfkf2_sw): Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH PR82096] Fix ICE in int_mode_for_mode, at stor-layout.c:403 with arm-linux-gnueabi

2018-01-05 Thread Sudakshina Das

Hi Jeff

On 05/01/18 18:44, Jeff Law wrote:

On 01/04/2018 08:35 AM, Sudakshina Das wrote:

Hi

The bug reported a particular test di-longlong64-sync-1.c failing when
run on arm-linux-gnueabi with options -mthumb -march=armv5t -O[g,1,2,3]
and -mthumb -march=armv6 -O[g,1,2,3].

According to what I could see, the crash was caused because of the
explicit VOIDmode argument that was sent to emit_store_flag_force ().
Since the comparing argument was a long long, it was being forced into a
VOID type register before the comparison (in prepare_cmp_insn()) is done.

As pointed out by Kyrill, there is a comment on emit_store_flag() which
says "MODE is the mode to use for OP0 and OP1 should they be CONST_INTs.
  If it is VOIDmode, they cannot both be CONST_INT". This condition is
not true in this case and thus I think it is suitable to change the
argument.

Testing done: Checked for regressions on bootstrapped
arm-none-linux-gnueabi and arm-none-linux-gnueabihf and added new test
cases.

Sudi

ChangeLog entries:

*** gcc/ChangeLog ***

2017-01-04  Sudakshina Das  

 PR target/82096
 * optabs.c (expand_atomic_compare_and_swap): Change argument
 to emit_store_flag_force.

*** gcc/testsuite/ChangeLog ***

2017-01-04  Sudakshina Das  

 PR target/82096
 * gcc.c-torture/compile/pr82096-1.c: New test.
 * gcc.c-torture/compile/pr82096-2.c: Likewise.

In the case where both (op0/op1) to
emit_store_flag/emit_store_flag_force are constants, don't we know the
result of the comparison and shouldn't we have optimized the store flag
to something simpler?

I feel like I must be missing something here.



emit_store_flag_force () is comparing a register to op0.

The 2 constant arguments are to the expand_atomic_compare_and_swap ()
function. emit_store_flag_force () is used in the case when this function
is called for the bool variant of the built-in function, where the bool
return value is computed by comparing the result register with the
expected op0.
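
For reference, the bool variant being described is e.g. (illustrative
snippet, not taken from the failing test):

long long v;

int
try_update (void)
{
  long long expected = 0;
  /* Returns whether the stored value matched 'expected'; computing
     that truth value is where emit_store_flag_force () comes in.  */
  return __atomic_compare_exchange_n (&v, &expected, 42LL, 0,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}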


Sudi


Jeff






Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Michael Meissner
On Fri, Jan 05, 2018 at 08:22:57PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 05, 2018 at 02:07:51PM -0500, Michael Meissner wrote:
> > Yes, in C code _Float128 _Comples works.  The trouble is compiling
> > libstdc++-v3.  In C++, we don't have _Float128, and __float128 _Complex does
> > not work for either x86 or PowerPC.  So on PowerPC the code from 
> > bits/floatn.h
> > is:
> 
> Well, libstdc++-v3 should be certainly easier than glibc, at least assuming
> that ICmode mangles differently from KCmode, because it should be just a
> matter of compiling the subset of symbols refering to long double twice.

Yep.  But as I said, I think we will need the crutch of having multilibs
initially.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [v3 of PATCH 13/14] c-format.c: handle location wrappers

2018-01-05 Thread Jason Merrill

On 12/22/2017 02:10 PM, David Malcolm wrote:

You describe the types needing to match "closely enough" (as opposed
to *exactly*), and that may be the key here: what I didn't say in my
message above is that the decl is "const" whereas the value isn't.


Right, differences in top-level qualifiers don't matter.  A scalar 
prvalue can't be const.


Jason


Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2018-01-05 Thread Jason Merrill

On 12/29/2017 12:06 PM, David Malcolm wrote:

One issue I ran into was that fold_for_warn doesn't eliminate
location wrappers when processing_template_decl, leading to
failures of the template-based cases in
g++.dg/warn/Wmemset-transposed-args-1.C.

This is due to the early bailout when processing_template_decl
within cp_fold:

2078      if (processing_template_decl
2079          || (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE (x) == error_mark_node)))
2080        return x;

which dates back to the merger of the C++ delayed folding branch.

I've fixed that in this version of the patch by removing that
"processing_template_decl ||" condition from that cp_fold early
bailout.


Hmm, that makes me nervous.  We might want to fold in templates when 
called from fold_for_warn, but not for other occurrences.  But I see 
that we check processing_template_decl in cp_fully_fold and in the call 
to cp_fold_function, so I guess this is fine.



+case VIEW_CONVERT_EXPR:
+case NON_LVALUE_EXPR:
 case CAST_EXPR:
 case REINTERPRET_CAST_EXPR:
 case CONST_CAST_EXPR:
@@ -14937,6 +14940,15 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 case CONVERT_EXPR:
 case NOP_EXPR:
   {
+   if (location_wrapper_p (t))
+ {
+   /* Handle location wrappers by substituting the wrapped node
+  first, *then* reusing the resulting type.  Doing the type
+  first ensures that we handle template parameters and
+  parameter pack expansions.  */
+   tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, 
in_decl);
+   return build1 (code, TREE_TYPE (op0), op0);
+ }


I'd rather handle location wrappers separately, and abort if 
VIEW_CONVERT_EXPR or NON_LVALUE_EXPR appear other than as wrappers.



@@ -24998,6 +25018,8 @@ build_non_dependent_expr (tree expr)
   && !expanding_concept ())
 fold_non_dependent_expr (expr);
 
+  STRIP_ANY_LOCATION_WRAPPER (expr);


Why is this needed?

Jason


[PR/middle-end 81897] make tree-ssa-uninit.c handle longer sequences

2018-01-05 Thread Aldy Hernandez
This fixes the code that verifies that none of the uninitialized paths 
reaching a PHI are actually taken (uninit_uses_cannot_happen).


As discussed in the PR, this is done by fixing the predicate analysis in
tree-ssa-uninit.c so that the set of predicates reaching the use of a
PHI takes into account the entire path from the entry block.  Previously
we were starting at the immediate dominator of the PHI (for some obscure
reason, or brain fart on my part).


Fixing the predicate analysis to go all the way back to ENTRY, uncovers 
some latent limitations in compute_control_dep_chain().  I've fixed 
those as well.


Note, I don't know why we were excluding basic blocks with a small (< 2) 
number of successors, but the exclusion was keeping us from looking at 
the ENTRY block.  If this was by design, I can special case the ENTRY 
block.  Similarly, convert_control_dep_chain_into_preds() was ignoring 
basic blocks with no instructions.  This was making us skip the ENTRY 
block, which is just an empty forwarder block.  Again, if this was by 
design, I can just special case the ENTRY block.


Finally, I've split up the dumping routine into member functions so we 
can get finer grained debugging help.


Tested on x86-64 Linux.

OK for trunk?
gcc/

	PR middle-end/81897
	* tree-ssa-uninit.c (compute_control_dep_chain): Do not bail on
	basic blocks with a small number of successors.
	(convert_control_dep_chain_into_preds): Special case the entry
	block.
	(dump_predicates): Split apart into...
	(dump_pred_chain): ...here...
	(dump_pred_info): ...and here.
	(can_one_predicate_be_invalidated_p): Add debugging printfs.
	(can_chain_union_be_invalidated_p): Return TRUE if any predicate
	was invalidated.
	(uninit_uses_cannot_happen): Avoid unnecessary if
	convert_control_dep_chain_into_preds yielded nothing.

diff --git a/gcc/testsuite/gcc.dg/uninit-pr81897.c b/gcc/testsuite/gcc.dg/uninit-pr81897.c
new file mode 100644
index 000..0323050839d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/uninit-pr81897.c
@@ -0,0 +1,24 @@
+/* { dg-do compile }  */
+/* { dg-options "-O2 -Wuninitialized" } */
+
+int f(void);
+static inline void rcu_read_unlock(void)
+{
+  static _Bool __warned;
+  if (f() && !__warned && !f()) {
+    __warned = 1;
+  }
+}
+int inet6_rtm_getroute(void)
+{
+  int dst;
+  int fibmatch = f();
+
+  if (!fibmatch)
+    dst = f();
+  rcu_read_unlock();
+  if (fibmatch)
+    dst = 0;
+
+  return dst;
+}
diff --git a/gcc/tree-ssa-uninit.c b/gcc/tree-ssa-uninit.c
index 6930a243deb..58590a7763b 100644
--- a/gcc/tree-ssa-uninit.c
+++ b/gcc/tree-ssa-uninit.c
@@ -543,9 +543,6 @@ compute_control_dep_chain (basic_block bb, basic_block dep_bb,
   bool found_cd_chain = false;
   size_t cur_chain_len = 0;
 
-  if (EDGE_COUNT (bb->succs) < 2)
-return false;
-
   if (*num_calls > PARAM_VALUE (PARAM_UNINIT_CONTROL_DEP_ATTEMPTS))
 return false;
   ++*num_calls;
@@ -671,11 +668,9 @@ convert_control_dep_chain_into_preds (vec *dep_chains,
 	  e = one_cd_chain[j];
 	  guard_bb = e->src;
 	  gsi = gsi_last_bb (guard_bb);
+	  /* Ignore empty BBs as they're basically forwarder blocks.  */
 	  if (gsi_end_p (gsi))
-	{
-	  has_valid_pred = false;
-	  break;
-	}
+	continue;
 	  cond_stmt = gsi_stmt (gsi);
 	  if (is_gimple_call (cond_stmt) && EDGE_COUNT (e->src->succs) >= 2)
 	/* Ignore EH edge.  Can add assertion on the other edge's flag.  */
@@ -916,38 +911,49 @@ find_def_preds (pred_chain_union *preds, gphi *phi)
   return has_valid_pred;
 }
 
+/* Dump a pred_info.  */
+
+static void
+dump_pred_info (pred_info one_pred)
+{
+  if (one_pred.invert)
+fprintf (dump_file, " (.NOT.) ");
+  print_generic_expr (dump_file, one_pred.pred_lhs);
+  fprintf (dump_file, " %s ", op_symbol_code (one_pred.cond_code));
+  print_generic_expr (dump_file, one_pred.pred_rhs);
+}
+
+/* Dump a pred_chain.  */
+
+static void
+dump_pred_chain (pred_chain one_pred_chain)
+{
+  size_t np = one_pred_chain.length ();
+  for (size_t j = 0; j < np; j++)
+{
+  dump_pred_info (one_pred_chain[j]);
+  if (j < np - 1)
+	fprintf (dump_file, " (.AND.) ");
+  else
+	fprintf (dump_file, "\n");
+}
+}
+
 /* Dumps the predicates (PREDS) for USESTMT.  */
 
 static void
 dump_predicates (gimple *usestmt, pred_chain_union preds, const char *msg)
 {
-  size_t i, j;
-  pred_chain one_pred_chain = vNULL;
   fprintf (dump_file, "%s", msg);
-  print_gimple_stmt (dump_file, usestmt, 0);
-  fprintf (dump_file, "is guarded by :\n\n");
+  if (usestmt)
+{
+  print_gimple_stmt (dump_file, usestmt, 0);
+  fprintf (dump_file, "is guarded by :\n\n");
+}
   size_t num_preds = preds.length ();
-  /* Do some dumping here:  */
-  for (i = 0; i < num_preds; i++)
+  for (size_t i = 0; i < num_preds; i++)
 {
-  size_t np;
-
-  one_pred_chain = preds[i];
-  np = one_pred_chain.length ();
-
-  for (j = 0; j < np; j++)
-	{
-	  pred_

[PATCH] Add VEC_ORDERED_REMOVE_IF

2018-01-05 Thread Tom de Vries

[ was: Re: [RFC] Add vec::ordered_remove_if ]

On 01/02/2018 03:08 PM, Richard Biener wrote:

On Fri, Dec 29, 2017 at 2:55 PM, Tom de Vries  wrote:

[ was: Re: [libgomp, openacc, openmp, PR83046] Prune removed funcs from
offload table ]

On 12/28/2017 05:06 PM, Jakub Jelinek wrote:


On Thu, Dec 28, 2017 at 04:53:29PM +0100, Tom de Vries wrote:


--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -,6 +,16 @@ output_offload_tables (void)
 struct lto_simple_output_block *ob
   = lto_create_simple_output_block (LTO_section_offload_table);
   +  for (unsigned i = 0; i < vec_safe_length (offload_funcs);)
+{
+  if (!cgraph_node::get ((*offload_funcs)[i]))
+   {
+ offload_funcs->ordered_remove (i);
+ continue;
+   }
+  i++;
+}




This has O(n^2) complexity for n == vec_safe_length (offload_funcs).
Can't you instead just have 2 IVs, one for where we read the vector elt
and
one for where we write it if the 2 are different, then truncate the vector
if needed at the end?



I wonder if it makes sense to add this function to the vec interface.

Any comments?


Hmm.  We do have quite some cases in the tree doing this kind of
operation - can you try finding one or two and see if it fits?



Done.

[ I wonder though whether skipping element 0 in connect_traces is 
intentional or not. ]



If we only could use lambdas here...



I found having to write a separate function cumbersome, so I rewrote the 
function into macros VEC_ORDERED_REMOVE_IF and 
VEC_ORDERED_REMOVE_IF_FROM_TO, which gives a syntax somewhat similar to 
a lambda.
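
For illustration, the resulting usage pattern (sketch; v is assumed to
be a vec<int>, cf. the prune_offload_funcs hunk below):

  unsigned ix, ix2;
  int *elem_ptr;
  /* Remove all negative elements in a single O(n) pass, preserving
     the order of the remaining elements.  */
  VEC_ORDERED_REMOVE_IF (v, ix, ix2, elem_ptr, *elem_ptr < 0);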



Oh, and I somehow miss a start/end index for operating on a vector part?



Done.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom
Add VEC_ORDERED_REMOVE_IF

2018-01-03  Tom de Vries  

	* vec.h	(VEC_ORDERED_REMOVE_IF, VEC_ORDERED_REMOVE_IF_FROM_TO): Define.
	* vec.c (test_ordered_remove_if): New function.
	(vec_c_tests): Call test_ordered_remove_if.
	* dwarf2cfi.c (connect_traces): Use VEC_ORDERED_REMOVE_IF_FROM_TO.
	* lto-streamer-out.c (prune_offload_funcs): Use VEC_ORDERED_REMOVE_IF.
	* tree-vect-patterns.c (vect_pattern_recog_1): Use
	VEC_ORDERED_REMOVE_IF.

---
 gcc/dwarf2cfi.c  | 19 +++
 gcc/lto-streamer-out.c   | 26 --
 gcc/tree-vect-patterns.c | 10 ++
 gcc/vec.c| 46 ++
 gcc/vec.h| 34 ++
 5 files changed, 101 insertions(+), 34 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 7a70639..7732ab2 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -2703,7 +2703,7 @@ before_next_cfi_note (rtx_insn *start)
 static void
 connect_traces (void)
 {
-  unsigned i, n = trace_info.length ();
+  unsigned i, n;
   dw_trace_info *prev_ti, *ti;
 
   /* ??? Ideally, we should have both queued and processed every trace.
@@ -2714,20 +2714,15 @@ connect_traces (void)
  these are not "real" instructions, and should not be considered.
  This could be generically useful for tablejump data as well.  */
   /* Remove all unprocessed traces from the list.  */
-  for (i = n - 1; i > 0; --i)
-{
-  ti = &trace_info[i];
-  if (ti->beg_row == NULL)
-	{
-	  trace_info.ordered_remove (i);
-	  n -= 1;
-	}
-  else
-	gcc_assert (ti->end_row != NULL);
-}
+  unsigned ix, ix2;
+  VEC_ORDERED_REMOVE_IF_FROM_TO (trace_info, ix, ix2, ti, 1,
+ trace_info.length (), ti->beg_row == NULL);
+  FOR_EACH_VEC_ELT (trace_info, ix, ti)
+gcc_assert (ti->end_row != NULL);
 
   /* Work from the end back to the beginning.  This lets us easily insert
  remember/restore_state notes in the correct order wrt other notes.  */
+  n = trace_info.length ();
   prev_ti = &trace_info[n - 1];
   for (i = n - 1; i > 0; --i)
 {
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index ef17083..b0993c7 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -2355,24 +2355,14 @@ prune_offload_funcs (void)
   if (!offload_funcs)
 return;
 
-  unsigned int write_index = 0;
-  for (unsigned read_index = 0; read_index < vec_safe_length (offload_funcs);
-   read_index++)
-{
-  tree fn_decl = (*offload_funcs)[read_index];
-  bool remove_p = cgraph_node::get (fn_decl) == NULL;
-  if (remove_p)
-	continue;
-
-  DECL_PRESERVE_P (fn_decl) = 1;
-
-  if (write_index != read_index)
-	(*offload_funcs)[write_index] = (*offload_funcs)[read_index];
-
-  write_index++;
-}
-
-  offload_funcs->truncate (write_index);
+  unsigned ix, ix2;
+  tree *elem_ptr;
+  VEC_ORDERED_REMOVE_IF (*offload_funcs, ix, ix2, elem_ptr,
+			 cgraph_node::get (*elem_ptr) == NULL);
+
+  tree fn_decl;
+  FOR_EACH_VEC_ELT (*offload_funcs, ix, fn_decl)
+DECL_PRESERVE_P (fn_decl) = 1;
 }
 
 /* Main entry point from the pass manager.  */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index a2c6293..2a5d056 100644
--- a/gcc/t

Fix folding of Inf/NaN comparisons for -ftrapping-math (PR tree-optimization/64811)

2018-01-05 Thread Joseph Myers
The folding of comparisons against Inf (to constants or comparisons
with the maximum finite value) has various cases where it introduces
or loses "invalid" exceptions for comparisons with NaNs.

Folding x > +Inf to 0 should not be about HONOR_SNANS - ordered
comparisons of both quiet and signaling NaNs should raise invalid.

x <= +Inf is not the same as x == x, because again that loses an
exception (equality comparisons don't raise exceptions except for
signaling NaNs).

x == +Inf is not the same as x > DBL_MAX, and a similar issue applies
with the x != +Inf case - that transformation causes a spurious
exception.

This patch fixes the conditionals on the folding to avoid such
introducing or losing exceptions.
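
The difference is easy to observe directly; a self-contained sketch
(not part of the patch; assumes default -ftrapping-math and that the
volatile operand defeats constant folding):

#include <cfenv>
#include <cmath>
#include <cstdio>

int
main ()
{
  volatile double x = std::nan ("");
  std::feclearexcept (FE_INVALID);
  (void) (x <= INFINITY);   /* ordered compare: raises "invalid" */
  std::printf ("%d\n", std::fetestexcept (FE_INVALID) != 0);   /* 1 */
  std::feclearexcept (FE_INVALID);
  (void) (x == x);          /* equality on a quiet NaN: no exception */
  std::printf ("%d\n", std::fetestexcept (FE_INVALID) != 0);   /* 0 */
}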

Bootstrapped with no regressions on x86_64-pc-linux-gnu (where the
cases involving spurious exceptions wouldn't have failed anyway before
GCC 8 because of unordered comparisons wrongly always having formerly
been used by the back end).  Also tested for powerpc-linux-gnu
soft-float that this fixes many glibc math/ test failures that arose
in that configuration because this folding affected the IBM long
double support in libgcc (no such failures appeared for hard-float
because of the bug of powerpc hard-float always using unordered
comparisons) - some failures remain, but I believe them to be
unrelated.  OK to commit?

gcc:
2018-01-05  Joseph Myers  

PR tree-optimization/64811
* match.pd: When optimizing comparisons with Inf, avoid
introducing or losing exceptions from comparisons with NaN.

gcc/testsuite:
2018-01-05  Joseph Myers  

PR tree-optimization/64811
* gcc.dg/torture/inf-compare-1.c, gcc.dg/torture/inf-compare-2.c,
gcc.dg/torture/inf-compare-3.c, gcc.dg/torture/inf-compare-4.c,
gcc.dg/torture/inf-compare-5.c, gcc.dg/torture/inf-compare-6.c,
gcc.dg/torture/inf-compare-7.c, gcc.dg/torture/inf-compare-8.c:
New tests.
* gcc.c-torture/execute/ieee/fp-cmp-7.x: New file.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 256279)
+++ gcc/match.pd(working copy)
@@ -3050,18 +3050,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  code = swap_tree_comparison (code);
  }
  (switch
-  /* x > +Inf is always false, if with ignore sNANs.  */
+  /* x > +Inf is always false, if we ignore NaNs or exceptions.  */
   (if (code == GT_EXPR
-  && ! HONOR_SNANS (@0))
+  && !(HONOR_NANS (@0) && flag_trapping_math))
{ constant_boolean_node (false, type); })
   (if (code == LE_EXPR)
-   /* x <= +Inf is always true, if we don't case about NaNs.  */
+   /* x <= +Inf is always true, if we don't care about NaNs.  */
(if (! HONOR_NANS (@0))
{ constant_boolean_node (true, type); }
-   /* x <= +Inf is the same as x == x, i.e. !isnan(x).  */
-   (eq @0 @0)))
-  /* x == +Inf and x >= +Inf are always equal to x > DBL_MAX.  */
-  (if (code == EQ_EXPR || code == GE_EXPR)
+   /* x <= +Inf is the same as x == x, i.e. !isnan(x), but this loses
+  an "invalid" exception.  */
+   (if (!flag_trapping_math)
+(eq @0 @0
+  /* x == +Inf and x >= +Inf are always equal to x > DBL_MAX, but
+for == this introduces an exception for x a NaN.  */
+  (if ((code == EQ_EXPR && !(HONOR_NANS (@0) && flag_trapping_math))
+  || code == GE_EXPR)
(with { real_maxval (&max, neg, TYPE_MODE (TREE_TYPE (@0))); }
(if (neg)
 (lt @0 { build_real (TREE_TYPE (@0), max); })
@@ -3072,7 +3076,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (neg)
 (ge @0 { build_real (TREE_TYPE (@0), max); })
 (le @0 { build_real (TREE_TYPE (@0), max); }
-  /* x != +Inf is always equal to !(x > DBL_MAX).  */
+  /* x != +Inf is always equal to !(x > DBL_MAX), but this introduces
+an exception for x a NaN so use an unordered comparison.  */
   (if (code == NE_EXPR)
(with { real_maxval (&max, neg, TYPE_MODE (TREE_TYPE (@0))); }
(if (! HONOR_NANS (@0))
@@ -3080,10 +3085,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (ge @0 { build_real (TREE_TYPE (@0), max); })
  (le @0 { build_real (TREE_TYPE (@0), max); }))
 (if (neg)
- (bit_xor (lt @0 { build_real (TREE_TYPE (@0), max); })
-  { build_one_cst (type); })
- (bit_xor (gt @0 { build_real (TREE_TYPE (@0), max); })
-  { build_one_cst (type); }))
+ (unge @0 { build_real (TREE_TYPE (@0), max); })
+ (unle @0 { build_real (TREE_TYPE (@0), max); }))
 
  /* If this is a comparison of a real constant with a PLUS_EXPR
 or a MINUS_EXPR of a real constant, we can convert it into a
Index: gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-7.x
===
--- gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-7.x (nonexistent)
+++ gcc/tes

Re: [PATCH], Add optional IEEE/IBM long double multilib support

2018-01-05 Thread Joseph Myers
On Fri, 5 Jan 2018, Michael Meissner wrote:

> On Fri, Jan 05, 2018 at 05:28:03PM +, Joseph Myers wrote:
> > On Thu, 4 Jan 2018, Michael Meissner wrote:
> > 
> > >   (CVT_FLOAT128_TO_IBM128): Use TFtype instead of __float128 to
> > >   accommodate -mabi=ieeelongdouble multilibs.
> > 
> > Why is that correct in the -mabi=ibmlongdouble case?
> 
> The PowerPC TFmode has always been 'special'.  It would mirror the 128-bit 
> long
> double format.  For the one port that actually used IEEE 128-bit floating
> point, TFmode was IEEE.  For the majority of ports TFmode is IBM extended
> double.

I don't think this answers my question.

CVT_FLOAT128_TO_IBM128 has an argument VALUE that is always of IEEE 
format.  You're doing "TFtype __value = (VALUE);".  In the case where 
TFmode is IBM long double, and so therefore is TFtype, this will do an 
unwanted conversion.  I'd expect it to result in __extendkftf2 recursively 
calling itself, in fact.  If it does not, what's wrong with my reasoning, 
and what exactly does that line do in the case where long double is 
ibm128?

> Yes, in C code _Float128 _Complex works.  The trouble is compiling
> libstdc++-v3.  In C++, we don't have _Float128, and __float128 _Complex does

My comments are about changes to code in libgcc, not about libstdc++-v3.

> > (My understanding is that each of the libgcc functions is meant to have a 
> > particular ABI that does not depend on the format of long double - so the 
> > "tf" and "tc" functions refer to ibm128 format, unconditionally, and the 
> > "kf" and "kc" ones to binary128 format, unconditionally.  I have not 
> > checked whether the libfuncs handling in init_float128_ibm and 
> > init_float128_ieee is sufficient to achieve this in all cases.  In 
> > particular, does complex arithmetic always use the correct one of "kc" and 
> > "tc" depending on the long double format?  What about __builtin_powil - 
> > does that properly use __powitf2 and __powikf2 as appropriate, or do you 
> > have the ABI of __powitf2 depending on the multilib?)
> 
> Libgcc does not need to be multilibed.  It uses separate functions.  What does
> currently need to be multilibed is libstdc++-v3 and possibly GLIBC.

I don't see how you ensure __mulkc3, for example, always uses IEEE format, 
given that you're making it use TFtype (= IBM long double in the default 
case).

I don't see how you ensure that, when IEEE long double is the default, 
__multc3 is still built to use IBM long double, which is the ABI for that 
function name.

Likewise for __divkc3, __divtc3, __powitf2, __powikf2.

For that matter, when building with IEEE long double as the default, how 
do you ensure that libgcc functions such as __fixtfti (built from generic 
libgcc2.c sources, in that case) are built with the proper ABI, meaning 
use of IFmode for the arguments or results corresponding to the "tf" in 
the name?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] PR libstdc++/83279 handle sendfile not copying entire file

2018-01-05 Thread Jonathan Wakely

On 14/12/17 21:53 +, Jonathan Wakely wrote:

I failed to notice that the man page for sendfile(3) says it won't
copy more than 2GiB. This refactors the code to first try sendfile
(because it's fast if it works) and if that fails, or stops before the
end of the file, then use filebufs to copy what's left over.

I'm not adding a test because creating two files larger than 2G isn't
a good idea, but I've tested it locally.

PR libstdc++/83279
* src/filesystem/std-ops.cc (do_copy_file): Handle sendfile not
copying entire file.



This restores the non-null offset argument to sendfile, which is
required by Solaris.

Tested x86_64-linux, committed to trunk.


commit d448551ca3cbbfd35796609128d54c8a0030a032
Author: Jonathan Wakely 
Date:   Fri Jan 5 21:04:54 2018 +

PR libstdc++/83279 Use non-null offset argument for sendfile

PR libstdc++/83279
* src/filesystem/std-ops.cc  (do_copy_file): Use non-null offset with
sendfile.

diff --git a/libstdc++-v3/src/filesystem/std-ops.cc b/libstdc++-v3/src/filesystem/std-ops.cc
index 2411bbb7977..bed1ad1fe27 100644
--- a/libstdc++-v3/src/filesystem/std-ops.cc
+++ b/libstdc++-v3/src/filesystem/std-ops.cc
@@ -382,10 +382,10 @@ fs::do_copy_file(const char* from, const char* to,
   return false;
 }
 
-  ssize_t n = 0;
   size_t count = from_st->st_size;
 #ifdef _GLIBCXX_USE_SENDFILE
-  n = ::sendfile(out.fd, in.fd, nullptr, count);
+  off_t offset = 0;
+  ssize_t n = ::sendfile(out.fd, in.fd, &offset, count);
   if (n < 0 && errno != ENOSYS && errno != EINVAL)
 {
   ec.assign(errno, std::generic_category());
@@ -405,35 +405,32 @@ fs::do_copy_file(const char* from, const char* to,
 count -= n;
 #endif // _GLIBCXX_USE_SENDFILE
 
-  __gnu_cxx::stdio_filebuf<char> sbin(in.fd, std::ios::in);
-  __gnu_cxx::stdio_filebuf<char> sbout(out.fd, std::ios::out);
+  using std::ios;
+  __gnu_cxx::stdio_filebuf<char> sbin(in.fd, ios::in|ios::binary);
+  __gnu_cxx::stdio_filebuf<char> sbout(out.fd, ios::out|ios::binary);
 
   if (sbin.is_open())
 in.fd = -1;
   if (sbout.is_open())
 out.fd = -1;
 
-  const std::streampos errpos(std::streamoff(-1));
-
-  if (n < 0)
+#ifdef _GLIBCXX_USE_SENDFILE
+  if (n != 0)
 {
-  auto p1 = sbin.pubseekoff(0, std::ios_base::beg, std::ios_base::in);
-  auto p2 = sbout.pubseekoff(0, std::ios_base::beg, std::ios_base::out);
+  if (n < 0)
+	n = 0;
+
+  const auto p1 = sbin.pubseekoff(n, ios::beg, ios::in);
+  const auto p2 = sbout.pubseekoff(n, ios::beg, ios::out);
+
+  const std::streampos errpos(std::streamoff(-1));
   if (p1 == errpos || p2 == errpos)
 	{
 	  ec = std::make_error_code(std::errc::io_error);
 	  return false;
 	}
 }
-  else if (n > 0)
-{
-  auto p = sbout.pubseekoff(n, std::ios_base::beg, std::ios_base::out);
-  if (p == errpos)
-	{
-	  ec = std::make_error_code(std::errc::io_error);
-	  return false;
-	}
-}
+#endif
 
   if (count && !(std::ostream(&sbout) << &sbin))
 {


Re: [PATCH 1/2] nios2: Enable Ada run-time build

2018-01-05 Thread Jeff Law
On 01/04/2018 11:59 PM, Sebastian Huber wrote:
> gcc/
>   * config/nios2/nios2.h (nios2_section_threshold): Guard by not
>   USED_FOR_TARGET.
So I'm not familiar with *why* the section bits need to be
conditionalized at all.  But there's certainly evidence that this is the
way they're supposed to be handled for a long time (see 2005 changes to
darwin.h).

So, OK.  Similarly for the epiphany changes.


jeff


[PATCH] Fix yet another expand_debug_expr ICE (PR middle-end/83694)

2018-01-05 Thread Jakub Jelinek
Hi!

The recently added testcase for PR83666 ICEs on powerpc*/sparc*
and perhaps other targets, where get_inner_reference gives us
VOIDmode for a 512-bit field and smallest_int_mode_for_size
ICEs on it because there is no such integer mode.

Fixed by giving up above MAX_BITSIZE_MODE_ANY_INT, which would have been
BLKmode and not really useful for debug info generation anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-05  Jakub Jelinek  

PR middle-end/83694
* cfgexpand.c (expand_debug_expr): Punt if mode1 is VOIDmode
and bitsize might be greater than MAX_BITSIZE_MODE_ANY_INT.

--- gcc/cfgexpand.c.jj  2018-01-04 12:43:38.199537090 +0100
+++ gcc/cfgexpand.c 2018-01-05 12:10:46.960037601 +0100
@@ -4534,8 +4534,12 @@ expand_debug_expr (tree exp)
if (MEM_P (op0))
  {
if (mode1 == VOIDmode)
- /* Bitfield.  */
- mode1 = smallest_int_mode_for_size (bitsize);
+ {
+   if (maybe_gt (bitsize, MAX_BITSIZE_MODE_ANY_INT))
+ return NULL;
+   /* Bitfield.  */
+   mode1 = smallest_int_mode_for_size (bitsize);
+ }
poly_int64 bytepos = bits_to_bytes_round_down (bitpos);
if (maybe_ne (bytepos, 0))
  {

Jakub


[C++ PATCH] -Wclass-memaccess fixes and improvements (PR c++/81327)

2018-01-05 Thread Jakub Jelinek
Hi!

Apparently LLVM allows a warning similar to -Wclass-memaccess (is it just
similar or the same? If the latter, perhaps we should use the same option for
that) to be disabled not just by casting the destination pointer to
e.g. (char *) or some other pointer to non-class, but also to (void *),
which is something that Qt uses or wants to use.
The problem is that the warning is diagnosed too late and whether we had
explicit or implicit conversion to void * is lost at that point.
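
A sketch of the behavior being enabled (hypothetical test, modeled on
the g++.dg/Wclass-memaccess-2.C adjustment below):

struct S { S (); S &operator= (const S &); int i; };

void
f (S *d, const S *s)
{
  __builtin_memcpy (d, s, sizeof (S));           // -Wclass-memaccess
  __builtin_memcpy ((void *) d, s, sizeof (S));  // suppressed by the cast
}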

This patch moves the warning tiny bit earlier (from build_cxx_call to the
caller) where we still have information about the original parsed arguments
before conversions to the builtin argument types.

In addition it fixes various small issues I ran into when reading the code,
e.g. STRIP_NOPS was changing the argument in the argument array passed to
it, or e.g. the if (!warned && !warnfmt) return; was redundant with the
following code doing just
  if (warnfmt)
something;
  if (warned)
somethingelse;
  return;
or not verifying the INTEGER_CSTs fit into uhwi before calling tree_to_uhwi
etc.
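
For reference, the STRIP_NOPS point is that the macro assigns through
its lvalue argument (sketch of the semantics, names hypothetical):

  tree t = build1 (NOP_EXPR, type, expr);
  STRIP_NOPS (t);   /* the macro rewrites T in place: t == expr now */

so stripping directly from the argument array mutated the caller's
vector as a side effect.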

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-05  Jakub Jelinek  

PR c++/81327
* call.c (maybe_warn_class_memaccess): Add forward declaration.
Change last argument from tree * to const vec *, adjust
args uses and check number of operands too.  Don't strip away any
nops.  Use maybe_constant_value when looking for INTEGER_CST args.
Deal with src argument not having pointer type.  Check
tree_fits_uhwi_p before calling tree_to_uhwi.  Remove useless
test.
(build_over_call): Call maybe_warn_class_memaccess here on the
original arguments.
(build_cxx_call): Rather than here on converted arguments.

* g++.dg/Wclass-memaccess-2.C: Don't expect a warning when explicitly
cast to void *.

--- gcc/cp/call.c.jj2018-01-04 00:43:16.109702768 +0100
+++ gcc/cp/call.c   2018-01-05 15:03:40.255312975 +0100
@@ -147,6 +147,8 @@ static int equal_functions (tree, tree);
 static int joust (struct z_candidate *, struct z_candidate *, bool,
  tsubst_flags_t);
 static int compare_ics (conversion *, conversion *);
+static void maybe_warn_class_memaccess (location_t, tree,
+   const vec *);
 static tree build_over_call (struct z_candidate *, int, tsubst_flags_t);
 #define convert_like(CONV, EXPR, COMPLAIN) \
   convert_like_real ((CONV), (EXPR), NULL_TREE, 0, \
@@ -8180,6 +8182,12 @@ build_over_call (struct z_candidate *can
   && !mark_used (fn, complain))
 return error_mark_node;
 
+  /* Warn if the built-in writes to an object of a non-trivial type.  */
+  if (warn_class_memaccess
+  && vec_safe_length (args) >= 2
+  && DECL_BUILT_IN_CLASS (fn) == BUILT_IN_NORMAL)
+maybe_warn_class_memaccess (input_location, fn, args);
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0
   /* Don't mess with virtual lookup in instantiate_non_dependent_expr;
 virtual functions can't be constexpr.  */
@@ -8360,7 +8368,8 @@ has_trivial_copy_p (tree type, bool acce
assignments.  */
 
 static void
-maybe_warn_class_memaccess (location_t loc, tree fndecl, tree *args)
+maybe_warn_class_memaccess (location_t loc, tree fndecl,
+   const vec *args)
 {
   /* Except for bcopy where it's second, the destination pointer is
  the first argument for all functions handled here.  Compute
@@ -8368,15 +8377,10 @@ maybe_warn_class_memaccess (location_t l
   unsigned dstidx = DECL_FUNCTION_CODE (fndecl) == BUILT_IN_BCOPY;
   unsigned srcidx = !dstidx;
 
-  tree dest = args[dstidx];
-  if (!dest || !TREE_TYPE (dest) || !POINTER_TYPE_P (TREE_TYPE (dest)))
+  tree dest = (*args)[dstidx];
+  if (!TREE_TYPE (dest) || !POINTER_TYPE_P (TREE_TYPE (dest)))
 return;
 
-  /* Remove the outermost (usually implicit) conversion to the void*
- argument type.  */
-  if (TREE_CODE (dest) == NOP_EXPR)
-dest = TREE_OPERAND (dest, 0);
-
   tree srctype = NULL_TREE;
 
   /* Determine the type of the pointed-to object and whether it's
@@ -8436,7 +8440,7 @@ maybe_warn_class_memaccess (location_t l
   switch (DECL_FUNCTION_CODE (fndecl))
 {
 case BUILT_IN_MEMSET:
-  if (!integer_zerop (args[1]))
+  if (!integer_zerop (maybe_constant_value ((*args)[1])))
{
  /* Diagnose setting non-copy-assignable or non-trivial types,
 or types with a private member, to (potentially) non-zero
@@ -8497,8 +8501,11 @@ maybe_warn_class_memaccess (location_t l
 case BUILT_IN_MEMMOVE:
 case BUILT_IN_MEMPCPY:
   /* Determine the type of the source object.  */
-  srctype = STRIP_NOPS (args[srcidx]);
-  srctype = TREE_TYPE (TREE_TYPE (srctype));
+  srctype = TREE_TYPE ((*args)[srcidx]);
+  if (!srctype || !POINTER_TYPE_P (srctype))
+   srctype = void_type_node;
+  else
+ 

[PATCH] Fix *vec_duplicate_p (PR rtl-optimization/83682)

2018-01-05 Thread Jakub Jelinek
Hi!

All the vec_duplicate_p uses in simplify-rtx.c assume that it returns vector
element if it returns true, and while that is the case for CONST_VECTOR
which can't contain anything but scalars, VEC_DUPLICATE is documented:
"This operation converts a scalar into a vector or a small vector into a
larger one by duplicating the input values.  The output vector mode must have
the same submodes as the input vector mode or the scalar modes, and the
number of output parts must be an integer multiple of the number of input
parts." and e.g. on x86 that is heavily used in the backend.
So, simplify-rtx.c checks something and expects to be given a scalar
element, but it can return something different with a different (vector)
mode instead.
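
For illustration (not from the PR), this is the documented form that
must not be treated as a scalar broadcast:

(vec_duplicate:V4SF (reg:V2SF 100)) ;; widens a vector: XEXP (x, 0)
                                    ;; has V2SF mode, not a scalar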

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-01-05  Jakub Jelinek  

PR rtl-optimization/83682
* rtl.h (const_vec_duplicate_p): Only return true for VEC_DUPLICATE
if it has non-VECTOR_MODE element mode.
(vec_duplicate_p): Likewise.

* gcc.target/i386/pr83682.c: New test.

--- gcc/rtl.h.jj2018-01-04 00:43:14.704702340 +0100
+++ gcc/rtl.h   2018-01-05 16:03:52.420823342 +0100
@@ -2969,7 +2969,9 @@ const_vec_duplicate_p (T x, T *elt)
   *elt = CONST_VECTOR_ENCODED_ELT (x, 0);
   return true;
 }
-  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
+  if (GET_CODE (x) == CONST
+  && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE
+  && !VECTOR_MODE_P (GET_MODE (XEXP (XEXP (x, 0), 0))))
 {
   *elt = XEXP (XEXP (x, 0), 0);
   return true;
@@ -2984,7 +2986,8 @@ template 
 inline bool
 vec_duplicate_p (T x, T *elt)
 {
-  if (GET_CODE (x) == VEC_DUPLICATE)
+  if (GET_CODE (x) == VEC_DUPLICATE
+  && !VECTOR_MODE_P (GET_MODE (XEXP (x, 0))))
 {
   *elt = XEXP (x, 0);
   return true;
--- gcc/testsuite/gcc.target/i386/pr83682.c.jj  2018-01-05 16:06:57.299885887 +0100
+++ gcc/testsuite/gcc.target/i386/pr83682.c 2018-01-05 16:06:26.630875486 +0100
@@ -0,0 +1,17 @@
+/* PR rtl-optimization/83682 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef float V __attribute__((__vector_size__(16)));
+typedef double W __attribute__((__vector_size__(16)));
+V b;
+W c;
+
+void
+foo (void *p)
+{
+  V e = __builtin_ia32_cvtsd2ss (b, c);
+  V g = e;
+  float f = g[0];
+  __builtin_memcpy (p, &f, sizeof (f));
+}

Jakub


[PATCH] -gstatement-frontiers vs. -fselective-scheduling* (PR debug/83480)

2018-01-05 Thread Jakub Jelinek
Hi!

As the testcase shows, the sel-sched.c framework can't deal with debug
insns without causing -fcompare-debug failures, and I presume the same
holds for debug markers, with which it currently even ICEs.

The following patch just follows what we do for -fvar-tracking-assignments,
disable it by default if -fselective-scheduling*.  It isn't a complete fix,
the testcase will still ICE with explicit -fselective-scheduling{,2}
-gstatement-frontiers, but at least it won't ICE that often and even if the
ICE is fixed, it is quite unlikely that debug marker handling would
work with -fcompare-debug when other debug insns don't work with that.
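
(For reference, -fcompare-debug compiles the unit twice, the second
time with -gtoggle appended, and errors out if the final assembly
differs; a hypothetical invocation for this PR:

  gcc -O2 -fselective-scheduling2 -fcompare-debug -c pr83480.c

with the remaining flags as in the testcase below.)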

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-05  Jakub Jelinek  

PR debug/83480
* toplev.c (process_options): Don't enable debug_nonbind_markers_p
by default if flag_selective_schedling{,2}.  Formatting fixes.

* gcc.dg/pr83480.c: New test.

--- gcc/toplev.c.jj 2018-01-03 10:19:54.453533845 +0100
+++ gcc/toplev.c2018-01-05 16:20:56.130195110 +0100
@@ -1535,8 +1535,9 @@ process_options (void)
 flag_var_tracking_uninit = flag_var_tracking;
 
   if (flag_var_tracking_assignments == AUTODETECT_VALUE)
-flag_var_tracking_assignments = flag_var_tracking
-  && !(flag_selective_scheduling || flag_selective_scheduling2);
+flag_var_tracking_assignments
+  = (flag_var_tracking
+&& !(flag_selective_scheduling || flag_selective_scheduling2));
 
   if (flag_var_tracking_assignments_toggle)
 flag_var_tracking_assignments = !flag_var_tracking_assignments;
@@ -1550,8 +1551,12 @@ process_options (void)
"var-tracking-assignments changes selective scheduling");
 
   if (debug_nonbind_markers_p == AUTODETECT_VALUE)
-debug_nonbind_markers_p = optimize && debug_info_level >= DINFO_LEVEL_NORMAL
-  && (write_symbols == DWARF2_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG);
+debug_nonbind_markers_p
+  = (optimize
+&& debug_info_level >= DINFO_LEVEL_NORMAL
+&& (write_symbols == DWARF2_DEBUG
+|| write_symbols == VMS_AND_DWARF2_DEBUG)
+&& !(flag_selective_scheduling || flag_selective_scheduling2));
 
   if (flag_tree_cselim == AUTODETECT_VALUE)
 {
--- gcc/testsuite/gcc.dg/pr83480.c.jj   2018-01-05 16:24:14.444290629 +0100
+++ gcc/testsuite/gcc.dg/pr83480.c  2018-01-05 16:23:40.631274338 +0100
@@ -0,0 +1,32 @@
+/* PR debug/83480 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g -fselective-scheduling2 -ftree-vectorize -freorder-blocks-algorithm=simple -fnon-call-exceptions -fno-guess-branch-probability -fno-peephole2 -fno-tree-sink -fno-tree-scev-cprop" } */
+
+signed char a, b;
+
+void
+foo (int x, int y)
+{
+  for (a = 1; a != 0; ++a)
+    ;
+
+  for (;;)
+    {
+      int c;
+
+      b %= (y != 0 && a != 0) + 1;
+      if (a != 0)
+        y = b;
+
+      for (c = 0; c < 50; ++c)
+        ++x;
+
+      if (a < 1)
+        {
+          while (x != 0)
+            ;
+
+          a /= 0;   /* { dg-warning "division by zero" } */
+        }
+    }
+}

Jakub


[C++ PATCH] Fix some -Wused-but-set-variable regressions (PR c++/82728, PR c++/82799, PR c++/83690)

2018-01-05 Thread Jakub Jelinek
Hi!

Jason's recent change removed a mark_rvalue_use call from constant_value_1,
which unfortunately regressed quite a few cases where
-Wunused-but-set-variable now has false positives.

The easiest fix seems to be just deal with the -Wunused-but-set-variable
issue at that point.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-05  Jakub Jelinek  

PR c++/82728
PR c++/82799
PR c++/83690
* init.c (constant_value_1): Call mark_exp_read on decl.

* g++.dg/warn/Wunused-var-27.C: New test.
* g++.dg/warn/Wunused-var-28.C: New test.
* g++.dg/warn/Wunused-var-29.C: New test.

--- gcc/cp/init.c.jj2018-01-03 10:20:17.296537499 +0100
+++ gcc/cp/init.c   2018-01-05 16:48:27.914911810 +0100
@@ -2211,6 +2211,7 @@ constant_value_1 (tree decl, bool strict
 initializer for the static data member is not processed
 until needed; we need it now.  */
   mark_used (decl, tf_none);
+  mark_exp_read (decl);
   init = DECL_INITIAL (decl);
   if (init == error_mark_node)
{
--- gcc/testsuite/g++.dg/warn/Wunused-var-27.C.jj   2018-01-05 16:44:15.647814313 +0100
+++ gcc/testsuite/g++.dg/warn/Wunused-var-27.C  2018-01-05 16:44:05.774810218 +0100
@@ -0,0 +1,14 @@
+// PR c++/82728
+// { dg-do compile }
+// { dg-options "-Wunused-but-set-variable" }
+
+void
+foo ()
+{
+  const int i = 1; // { dg-bogus "set but not used" }
+  switch (0)
+{
+case i:
+  break;
+}
+}
--- gcc/testsuite/g++.dg/warn/Wunused-var-28.C.jj   2018-01-05 16:44:27.158819090 +0100
+++ gcc/testsuite/g++.dg/warn/Wunused-var-28.C  2018-01-05 16:42:18.766765787 +0100
@@ -0,0 +1,15 @@
+// PR c++/82799
+// { dg-do compile }
+// { dg-options "-Wunused-but-set-variable" }
+
+enum E { b }; 
+struct C {
+  template <E>
+  int foo ()
+  {
+const bool i = 0;  // { dg-bogus "set but not used" }
+const int r = i ? 7 : 9;
+return r;
+  }
+  void bar () { foo <b> (); }
+};
--- gcc/testsuite/g++.dg/warn/Wunused-var-29.C.jj   2018-01-05 16:45:59.576857460 +0100
+++ gcc/testsuite/g++.dg/warn/Wunused-var-29.C  2018-01-05 16:45:42.956850557 +0100
@@ -0,0 +1,10 @@
+// PR c++/83690
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-but-set-variable" }
+
+void
+foo ()
+{
+  constexpr bool foo = true;   // { dg-bogus "set but not used" }
+  static_assert (foo, "foo");
+}

Jakub


[Patch][aarch64][PR target/83335] Fix regression, ICE on gcc.target/aarch64/asm-2.c

2018-01-05 Thread Steve Ellcey
This is a fix for PR target/83335.  We are asserting in
aarch64_print_address_internal because we have a non-Pmode
address coming from an asm instruction.  My fix is to 
just allow this by checking this_is_asm_operands.  This is
what it was doing before the assert was added that caused
the ICE.

Verified that it fixed gcc.target/aarch64/asm-2.c in ILP32
mode and that it caused no regressions.

Steve Ellcey
sell...@cavium.com


2018-01-05  Steve Ellcey  

PR target/83335
* config/aarch64/aarch64.c (aarch64_print_address_internal):
Allow non Pmode address in asm statements.


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a189605..af74212 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5670,8 +5670,9 @@ aarch64_print_address_internal (FILE *f, machine_mode 
mode, rtx x,
 {
   struct aarch64_address_info addr;
 
-  /* Check all addresses are Pmode - including ILP32.  */
-  gcc_assert (GET_MODE (x) == Pmode);
+  /* Check all addresses are Pmode - including ILP32,
+ unless this is coming from an asm statement.  */
+  gcc_assert (GET_MODE (x) == Pmode || this_is_asm_operands);
 
   if (aarch64_classify_address (&addr, x, mode, true, type))
 switch (addr.type)


Re: [v2 of PATCH 03/14] C++: add location_t wrapper nodes during parsing (minimal impl)

2018-01-05 Thread David Malcolm
On Fri, 2018-01-05 at 15:29 -0500, Jason Merrill wrote:
> On 12/29/2017 12:06 PM, David Malcolm wrote:
> > One issue I ran into was that fold_for_warn doesn't eliminate
> > location wrappers when processing_template_decl, leading to
> > failures of the template-based cases in
> > g++.dg/warn/Wmemset-transposed-args-1.C.
> > 
> > This is due to the early bailout when processing_template_decl
> > within cp_fold:
> > 
> > 2078	  if (processing_template_decl
> > 2079	      || (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE (x) == error_mark_node)))
> > 2080	    return x;
> > 
> > which dates back to the merger of the C++ delayed folding branch.
> > 
> > I've fixed that in this version of the patch by removing that
> > "processing_template_decl ||" condition from that cp_fold early
> > bailout.
> 
> Hmm, that makes me nervous.  We might want to fold in templates when 
> called from fold_for_warn, but not for other occurrences.  But I see 
> that we check processing_template_decl in cp_fully_fold and in the
> call 
> to cp_fold_function, so I guess this is fine.

(I wondered if it would be possible to add a new flag to the various
fold* calls to request folding in templates, but given that the API is
partially shared with C doing so doesn't seem to make sense)

> > +case VIEW_CONVERT_EXPR:
> > +case NON_LVALUE_EXPR:
> >  case CAST_EXPR:
> >  case REINTERPRET_CAST_EXPR:
> >  case CONST_CAST_EXPR:
> > @@ -14937,6 +14940,15 @@ tsubst_copy (tree t, tree args,
> > tsubst_flags_t complain, tree in_decl)
> >  case CONVERT_EXPR:
> >  case NOP_EXPR:
> >{
> > +   if (location_wrapper_p (t))
> > + {
> > +   /* Handle location wrappers by substituting the
> > wrapped node
> > +  first, *then* reusing the resulting type.  Doing
> > the type
> > +  first ensures that we handle template parameters
> > and
> > +  parameter pack expansions.  */
> > +   tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args,
> > complain, in_decl);
> > +   return build1 (code, TREE_TYPE (op0), op0);
> > + }
> 
> I'd rather handle location wrappers separately, and abort if 
> VIEW_CONVERT_EXPR or NON_LVALUE_EXPR appear other than as wrappers.

OK.  I'm testing an updated version which does so.

> > @@ -24998,6 +25018,8 @@ build_non_dependent_expr (tree expr)
> >&& !expanding_concept ())
> >  fold_non_dependent_expr (expr);
> >  
> > +  STRIP_ANY_LOCATION_WRAPPER (expr);
> 
> Why is this needed?

I added this to fix a failure:

template argument deduction/substitution failed:
note:   cannot convert ‘0’ (type ‘int’) to type ‘char**’

seen building the precompiled header, where build_non_dependent_expr
adds a NON_DEPENDENT_EXPR around a location wrapper around an
INTEGER_CST, whereas without location wrappers we'd just have an
INTEGER_CST.

build_non_dependent_expr performs various tests for exact TREE_CODEs,
special-casing constants; without the stripping these TREE_CODE tests
don't match when the constants have location wrappers, leading to the
failure.

Here are the gory details:

In file included from ./src/libstdc++-
v3/include/precompiled/extc++.h:82:
./build/x86_64-pc-linux-gnu/libstdc++-
v3/include/ext/codecvt_specializations.h: In member function ‘virtual
std::codecvt_base::result std::codecvt<_InternT, _ExternT,
__gnu_cxx::__cxx11::encoding_state>::do_unshift(std::codecvt<_InternT,
_ExternT, __gnu_cxx::__cxx11::encoding_state>::state_type&,
std::codecvt<_InternT, _ExternT,
__gnu_cxx::__cxx11::encoding_state>::extern_type*,
std::codecvt<_InternT, _ExternT,
__gnu_cxx::__cxx11::encoding_state>::extern_type*,
std::codecvt<_InternT, _ExternT,
__gnu_cxx::__cxx11::encoding_state>::extern_type*&) const’:
./build/x86_64-pc-linux-gnu/libstdc++-
v3/include/ext/codecvt_specializations.h:392:58: error: no matching
function for call to ‘__iconv_adaptor(size_t (&)(iconv_t, char**,
size_t*, char**, size_t*), void* const&, int, int, char**,
std::size_t*)’
   &__cto, &__tlen);
  ^
./build/x86_64-pc-linux-gnu/libstdc++-
v3/include/ext/codecvt_specializations.h:301:5: note: candidate:
‘template<class _Tp> std::size_t std::__iconv_adaptor(std::size_t
(*)(iconv_t, _Tp, std::size_t*, char**, std::size_t*), iconv_t, char**,
std::size_t*, char**, std::size_t*)’
 __iconv_adaptor(size_t(*__func)(iconv_t, _Tp, size_t*, char**,
size_t*),
 ^~~
./build/x86_64-pc-linux-gnu/libstdc++-
v3/include/ext/codecvt_specializations.h:301:5: note:   template
argument deduction/substitution failed:
./build/x86_64-pc-linux-gnu/libstdc++-
v3/include/ext/codecvt_specializations.h:392:58: note:   cannot convert
‘0’ (type ‘int’) to type ‘char**’
   &__cto, &__tlen);
  ^



(gdb) call debug_tree (arg)
 

Re: Fix Bug 83566 - cyl_bessel_j returns wrong result for x>1000 for high orders

2018-01-05 Thread Michele Pezzutti

On 01/05/2018 05:50 PM, Jonathan Wakely wrote:

On 05/01/18 09:54 -0500, Ed Smith-Rowland wrote:
This looks good to me.  The only nit is that you have to uglify the k 
loop variable.


Right, it needs to be __k.

I'm also not sure we can put the copyright notice there, rather than
at the top of the file. We can collapse the years to 1996-2003 so I
suggest adding to the top of the file something like:

/* __cyl_bessel_jn_asymp adapted from GNU GSL version 2.4 specfunc/bessel_j.c

* Copyright (C) 1996-2003 Gerard Jungman
*/

diff --git a/libstdc++-v3/include/tr1/bessel_function.tcc b/libstdc++-v3/include/tr1/bessel_function.tcc

index 7ac733d..6f5ff34 100644
--- a/libstdc++-v3/include/tr1/bessel_function.tcc
+++ b/libstdc++-v3/include/tr1/bessel_function.tcc
@@ -27,6 +27,10 @@
  *  Do not attempt to use it directly. @headername{tr1/cmath}
  */

+/* __cyl_bessel_jn_asymp adapted from GNU GSL version 2.4 specfunc/bessel_j.c
+ * Copyright (C) 1996-2003 Gerard Jungman
+ */
+
 //
 // ISO C++ 14882 TR1: 5.2  Special functions
 //
@@ -358,16 +362,40 @@ namespace tr1
 void
 __cyl_bessel_jn_asymp(_Tp __nu, _Tp __x, _Tp & __Jnu, _Tp & __Nnu)
 {
-  const _Tp __mu   = _Tp(4) * __nu * __nu;
-  const _Tp __mum1 = __mu - _Tp(1);
-  const _Tp __mum9 = __mu - _Tp(9);
-  const _Tp __mum25 = __mu - _Tp(25);
-  const _Tp __mum49 = __mu - _Tp(49);
-  const _Tp __xx = _Tp(64) * __x * __x;
-  const _Tp __P = _Tp(1) - __mum1 * __mum9 / (_Tp(2) * __xx)
-    * (_Tp(1) - __mum25 * __mum49 / (_Tp(12) * __xx));
-  const _Tp __Q = __mum1 / (_Tp(8) * __x)
-    * (_Tp(1) - __mum9 * __mum25 / (_Tp(6) * __xx));
+  const _Tp __mu = _Tp(4) * __nu * __nu;
+  const _Tp __8x = _Tp(8) * __x;
+
+  _Tp __P = _Tp(0);
+  _Tp __Q = _Tp(0);
+
+  _Tp __k = _Tp(0);
+  _Tp __term = _Tp(1);
+
+  int __epsP = 0;
+  int __epsQ = 0;
+
+  _Tp __eps = std::numeric_limits<_Tp>::epsilon();
+
+  do
+    {
+  __term *= (__k == 0) ? _Tp(1) : -(__mu - (2 * __k - 1) * (2 * __k - 1))
+    / (__k * __8x);
+  __epsP = std::abs(__term) < std::abs(__eps * __P);
+  __P += __term;
+
+  __k++;
+
+  __term *= (__mu - (2 * __k - 1) * (2 * __k - 1)) / (__k * __8x);
+  __epsQ = std::abs(__term) < std::abs(__eps * __Q);
+  __Q += __term;
+
+  if (__epsP && __epsQ && __k > __nu / 2.)
+    break;
+
+  __k++;
+    }
+  while (__k < 1000);
+

   const _Tp __chi = __x - (__nu + _Tp(0.5L))
 * __numeric_constants<_Tp>::__pi_2();
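
(For reference, the series being summed is the standard Hankel
asymptotic expansion, cf. DLMF 10.17:

  J_\nu(x) \sim \sqrt{2/(\pi x)}\,[P(\nu,x)\cos\chi - Q(\nu,x)\sin\chi],
  \qquad \chi = x - (\tfrac{\nu}{2} + \tfrac14)\pi,

  a_k(\nu) = \frac{(4\nu^2-1^2)(4\nu^2-3^2)\cdots(4\nu^2-(2k-1)^2)}{k!\,8^k},

with P summing the even-k terms and Q the odd-k terms, alternating in
sign.  The __term update above multiplies by (__mu - (2k-1)^2)/(k * 8x)
with __mu = 4\nu^2, which is exactly the term ratio of this series.)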

(I hope we can un-uglify our variables when modules come!)

Do you have copyright assignment in place and all that?


That appears to be underway.

No news since I sent the questionnaire, though.



Thanks for working on this.


Welcome.



Re: [PATCH] lto, testsuite: Fix ICE in -Wodr (PR lto/83121)

2018-01-05 Thread David Malcolm
On Fri, 2018-01-05 at 10:36 +0100, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 10:52 PM, David Malcolm 
> wrote:
> > PR lto/83121 reports an ICE deep inside the linemap code when -Wodr
> > reports on a type mismatch.
> > 
> > The root cause is that the warning can access the
> > DECL_SOURCE_LOCATION
> > of a streamed-in decl before the lto_location_cache has been
> > applied.
> > 
> > lto_location_cache::input_location stores RESERVED_LOCATION_COUNT
> > (==2)
> > as a poison value until the cache is applied:
> > 250   /* Keep value RESERVED_LOCATION_COUNT in *loc as linemap
> > lookups will
> > 251  ICE on it.  */
> > 
> > The fix is relatively simple: apply the cache before reading the
> > DECL_SOURCE_LOCATION.
> > 
> > (I wonder if we should instead have a INVALID_LOCATION value to
> > handle
> > this case more explicitly?  e.g. 0x?  or reserve 2 in
> > libcpp for
> > that purpose, and have the non-reserved locations start at
> > 3?  Either
> > would be more invasive, though)
> > 
> > Triggering the ICE was fiddly: it seems to be affected by many
> > things,
> > including the order of files, and (I think) by filenames.  My
> > theory is
> > that it's affected by the ordering of the tree nodes in the LTO
> > stream:
> > for the ICE to occur, the types in question need to be compared
> > before
> > some other operation flushes the lto_location_cache.  This ordering
> > is affected by the hash-based ordering in DFS in lto-streamer-
> > out.c, which
> > might explain why r255066 seemed to trigger the bug; the only
> > relevant
> > change to LTO there seemed to be:
> >   * lto-streamer-out.c (hash_tree): Hash TYPE_EMPTY_P and
> > DECL_PADDING_P.
> > If so, then the bug was presumably already present, but hidden.
> > 
> > The patch also adds regression test coverage for the ICE, which is
> > more
> > involved - as far as I can tell, we don't have an existing way to
> > verify
> > diagnostics emitted during link-time optimization.
> > 
> > Hence the patch adds some machinery to lib/lto.exp to support two
> > new
> > directives: dg-lto-warning and dg-lto-message, corresponding to
> > dg-warning and dg-message respectively, where the diagnostics are
> > expected to be emitted at link-time.
> > 
> > The test case includes examples of LTO warnings and notes in both
> > the
> > primary and secondary source files
> > 
> > Doing so required reusing the logic from DejaGnu for handling
> > diagnostics.
> > Unfortunately the pertinent code is a 50 line loop within a ~200
> > line Tcl
> > function in dg.exp (dg-test), so I had to copy it from DejaGnu,
> > making
> > various changes as necessary (see lto_handle_diagnostics_for_file
> > in the
> > patch; for example the LTO version supports multiple source files,
> > identifying which source file emitted a diagnostic).
> > 
> > For non-LTO diagnostics we currently ignore surplus "note"
> > diagnostics.
> > This patch updates lto_prune_warns to follow this behavior (since
> > otherwise we'd need numerous dg-lto-message directives for the
> > motivating
> > test case).
> > 
> > The patch adds these PASS results to g++.sum:
> > 
> > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o assemble, -O0 -flto
> > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_1.o assemble, -O0 -flto
> > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_0.C line
> > 6)
> > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_0.C line
> > 8)
> > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_1.C line
> > 2)
> > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_1.C line
> > 3)
> > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o-cp_lto_pr83121_1.o
> > link, -O0 -flto
> > 
> > The output for dg-lto-message above refers to "warnings", rather
> > than
> > "messages" but that's the same as for the non-LTO case, where dg-
> > message
> > also refers to "warnings".
> > 
> > Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> 
> Hmm, but we do this in warn_odr already?  How's that not enough?
> 
> At least it seems the place you add this isn't ideal (not at the
> "root cause").
> 
> Richard.

[CCing Honza]

Yes, warn_odr does apply the cache, but this warning is coming from
warn_types_mismatch.

Of the various calls to warn_types_mismatch, most are immediately after
a warn_odr guarded by if (warn && warned) or if (warned), and if
warn_odr writes back true to *warned, it has definitely applied the
cache.

However, the 7-argument overload of odr_types_equivalent_p can
also call warn_types_mismatch, passing in the location_t values it
received.

The problem occurs with this call stack, when:

  add_type_duplicate, passes DECL_SOURCE_LOCATIONs to:
odr_types_equivalent_p (7-argument overload) - which calls:
  warn_types_mismatch, which calls:
warning_at with the given location_t

where the DECL_SOURCE_LOCATION might not have been fixed up yet.
My patch adds the fixup there.
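
Roughly, the shape of the fix (a sketch assuming the existing
lto_location_cache interface; the exact call site is as in the patch):

  /* Resolve streamed-in locations before any linemap lookup.  */
  lto_location_cache::current_cache->apply_location_cache ();
  warn_types_mismatch (t1, t2, DECL_SOURCE_LOCATION (decl1),
                       DECL_SOURCE_LOCATION (decl2));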

This behavior seems to have been introduced by r224248

Re: Tighten LRA cycling check

2018-01-05 Thread Jeff Law
On 01/04/2018 02:28 PM, Richard Sandiford wrote:
> LRA has code to try to prevent cycling, by avoiding reloads that
> look too similar to the instruction being reloaded.  E.g. if we
> have a R<-C move for some constant C, reloading the source with
> another R<-C move is unlikely to be a good idea.
> 
> However, this safeguard unnecessarily triggered in tests like
> the one in the patch.  We started with instructions like:
> 
> (insn 12 9 13 5 (set (reg:DI 0 x0)
> (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64}
>  (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0]   0x7f3c03c1f510 x00>)
> (nil)))
> 
> where r459 didn't get allocated a register and is equivalent to
> constant x00.  LRA would then handle it like this:
> 
> Changing pseudo 459 in operand 1 of insn 12 on equiv `x00'
> 1 Non-pseudo reload: reject+=2
> 1 Non input pseudo reload: reject++
>   alt=0,overall=609,losers=1,rld_nregs=1
> [...]
>   alt=13,overall=9,losers=1,rld_nregs=1
> [...]
>  Choosing alt 13 in insn 12:  (0) r  (1) w {*movdi_aarch64}
> 
> In other words, to avoid loading the constant x00 into another GPR,
> LRA decided instead to move it into a floating-point register,
> then move that floating-point register into x0:
> 
>   Creating newreg=630, assigning class FP_REGS to r630
>   Set class ALL_REGS for r631
>12: x0:DI=r630:DI
>   REG_EQUAL `x00'
> Inserting insn reload before:
>   815: r631:DI=high(`x00')
>   816: r630:DI=r631:DI+low(`x00')
>   REG_EQUAL `x00'
> 
> That's inefficient and doesn't really help to resolve a cycling
> problem, since the r630 destination of 816 needs to be reloaded into
> a GPR anyway.
> 
> The cycling check already had an exception for source values that are
> the result of an elimination.  This patch extends it to include the
> result of equivalence substitution.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the before and after assembly output for at
> least one target per CPU directory.  The targets most affected were:
> 
>   aarch64_be-linux-gnu
>   aarch64-linux-gnu
>   powerpc-eabispe
>   riscv32-elf
>   riscv64-elf
>   sparc64-linux-gnu
>   sparc-linux-gnu
>   spu-elf
> 
> for which it improved code size overall.  There were minor register
> renaming differences on some other targets.  x86 and powerpc*-linux-gnu
> targets were unaffected.
> 
> OK to install?
> 
> Richard
> 
> 
> 2018-01-04  Richard Sandiford  
> 
> gcc/
>   * lra-constraints.c (process_alt_operands): Test for the equivalence
>   substitutions when detecting a possible reload cycle.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/reg-alloc-1.c: New test.
OK.  Presumably related to 83699?  Do you want to reference it in the
ChangeLog?

jeff


Re: [C++ PATCH] -Wclass-memaccess fixes and improvements (PR c++/81327)

2018-01-05 Thread Martin Sebor

On 01/05/2018 03:02 PM, Jakub Jelinek wrote:

Hi!

Apparently LLVM allows a warning similar to -Wclass-memaccess (is it just
similar or the same? If the latter, perhaps we should use the same option for
that) to be disabled not just by casting the destination pointer to
e.g. (char *) or some other pointer to non-class, but also to (void *),
which is something that Qt uses or wants to use.


Clang has an option/warning called -Wdynamic-class-memaccess.
It detects a limited subset of the problems -Wclass-memaccess
detects: AFAIK, just calling raw memory functions like memcpy
on objects of polymorphic types, which the standard says is
undefined.

I didn't know about the Clang option when I wrote the GCC code.
Had I found out about it sooner I would have probably considered
splitting up -Wclass-memaccess and providing the same option as
Clang for compatibility, and another for the rest of the checks.
But there might still be time to split it up before the release
if that's important.

Some Qt code apparently uses memcpy to copy polymorphic objects
but gets around the Clang warning by casting the pointers to
void*, which doesn't suppress the GCC warning.  I had considered
allowing the void* cast as a possible suppression mechanism but
ultimately decided against it.  IIRC, partly because it would
have complicated the implementation and partly because I wasn't
convinced that it was a good idea (and I also didn't know about
the Clang escape hatch).


The problem is that the warning is diagnosed too late and whether we had
explicit or implicit conversion to void * is lost at that point.

This patch moves the warning tiny bit earlier (from build_cxx_call to the
caller) where we still have information about the original parsed arguments
before conversions to the builtin argument types.


I'm still not convinced that letting programs get away with
byte-wise copying polymorphic objects is a good thing, but I
agree that supporting the void* cast suppression will be useful
for portability with Clang (presumably both C-style cast and
C++-style static_cast work).

If we want to make it a feature then we should document it in
the manual.  Otherwise, if it's supposed to be a back door for
poorly written code, then I think it would be helpful to mention
it in comments in the implementation and above the assertion in
the test. I don't like undocumented back doors so I'm leaning
toward documenting it in the manual.  I'm happy to add some
text to the manual if/when this is approved and decided.

Martin


In addition it fixes various small issues I ran into when reading the code,
e.g. STRIP_NOPS was changing the argument in the argument array passed to
it, or e.g. the if (!warned && !warnfmt) return; was redundant with the
following code doing just
  if (warnfmt)
something;
  if (warned)
somethingelse;
  return;
or not verifying the INTEGER_CSTs fit into uhwi before calling tree_to_uhwi
etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-05  Jakub Jelinek  

PR c++/81327
* call.c (maybe_warn_class_memaccess): Add forward declaration.
Change last argument from tree * to const vec *, adjust
args uses and check number of operands too.  Don't strip away any
nops.  Use maybe_constant_value when looking for INTEGER_CST args.
Deal with src argument not having pointer type.  Check
tree_fits_uhwi_p before calling tree_to_uhwi.  Remove useless
test.
(build_over_call): Call maybe_warn_class_memaccess here on the
original arguments.
(build_cxx_call): Rather than here on converted arguments.

* g++.dg/Wclass-memaccess-2.C: Don't expect a warning when explicitly
cast to void *.

--- gcc/cp/call.c.jj2018-01-04 00:43:16.109702768 +0100
+++ gcc/cp/call.c   2018-01-05 15:03:40.255312975 +0100
@@ -147,6 +147,8 @@ static int equal_functions (tree, tree);
 static int joust (struct z_candidate *, struct z_candidate *, bool,
  tsubst_flags_t);
 static int compare_ics (conversion *, conversion *);
+static void maybe_warn_class_memaccess (location_t, tree,
+   const vec *);
 static tree build_over_call (struct z_candidate *, int, tsubst_flags_t);
 #define convert_like(CONV, EXPR, COMPLAIN) \
   convert_like_real ((CONV), (EXPR), NULL_TREE, 0, \
@@ -8180,6 +8182,12 @@ build_over_call (struct z_candidate *can
   && !mark_used (fn, complain))
 return error_mark_node;

+  /* Warn if the built-in writes to an object of a non-trivial type.  */
+  if (warn_class_memaccess
+  && vec_safe_length (args) >= 2
+  && DECL_BUILT_IN_CLASS (fn) == BUILT_IN_NORMAL)
+maybe_warn_class_memaccess (input_location, fn, args);
+
   if (DECL_VINDEX (fn) && (flags & LOOKUP_NONVIRTUAL) == 0
   /* Don't mess with virtual lookup in instantiate_non_dependent_expr;
 virtual functions can't be constexpr.  */
@@ -8360,7 +8368,8

Re: [PATCH] PR libstdc++/83626 Don't throw for remove("") and remove_all("")

2018-01-05 Thread Jonathan Wakely

On 05/01/18 18:02 +, Jonathan Wakely wrote:

On 05/01/18 10:37 +, Jonathan Wakely wrote:

On 04/01/18 21:02 -0500, Tim Song wrote:

What if the file to be removed is externally removed between the
symlink_status and the ::remove call? This is probably QoI because of the
filesystem race, but it seems reasonable to double-check errno if
::remove fails and not fail if the failure is due to the file not
existing.


Yes, the race makes it undefined, but checking for ENOENT seems
reasonable. Thanks for the suggestion.


This makes remove and remove_all handle (some) filesystem races, by
ignoring ENOENT errors when removing entries or while iterating over
sub-directories.

It also avoids redundant stat() calls in remove_all, and fixes a bug
in the return value of the throwing version of remove_all.
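
The core of the race handling can be condensed to this sketch (illustrative
only; remove_sketch is a made-up name, not the committed code):

  #include <cerrno>
  #include <cstdio>
  #include <system_error>

  bool
  remove_sketch (const char *p, std::error_code &ec) noexcept
  {
    if (::remove (p) == 0)
      {
        ec.clear ();
        return true;      // we removed it
      }
    if (errno == ENOENT)
      {
        ec.clear ();
        return false;     // already gone (e.g. removed by another
                          // thread): nothing to do, but not an error
      }
    ec.assign (errno, std::generic_category ());
    return false;         // a real error
  }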

Tested powerpc64le-linux, committed to trunk.

I've attached a multithreaded test I used to test the handling of
filesystem races, but I'm not committing that test.


In Bugzilla the reporter pointed out that the call to symlink_status
in filesystem::remove is now redundant, because handling the ENOENT
case after calling ::remove gives the same result (and is less
vulnerable to TOCTTOU races).

This removes the symlink_status call, and then makes remove_all call
filesystem::remove again instead of ::remove, since doing so no longer
incurs an unnecessary symlink_status call.
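
From the user's side, the observable contract of the non-throwing overload
after these changes is roughly this (a usage sketch assuming the C++17
std::filesystem spelling; the Filesystem TS version behaves the same way):

  #include <cstdint>
  #include <filesystem>
  #include <system_error>
  namespace fs = std::filesystem;

  std::uintmax_t
  remove_tree (const fs::path &dir)
  {
    std::error_code ec;
    const auto n = fs::remove_all (dir, ec);
    if (ec)
      return 0;   // real error: n is uintmax_t(-1) and ec is set
    return n;     // entries removed; entries that vanished concurrently
                  // are simply not counted rather than reported as errors
  }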

Tested powerpc64le-linux, committed to trunk.


commit 89cfa63335230a0ce82152171cdbd2bb5be64440
Author: Jonathan Wakely 
Date:   Fri Jan 5 23:28:06 2018 +

PR libstdc++/83626 simplify filesystem::remove and filesystem::remove_all

PR libstdc++/83626
* src/filesystem/ops.cc (remove(const path&, error_code&)): Remove
unnecessary symlink_status call.
(remove_all(const path&, error_code&)): Use filesystem::remove.
* src/filesystem/std-ops.cc: Likewise.

diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index 3690fb94d63..899defea6d2 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -1017,14 +1017,6 @@ fs::remove(const path& p)
 bool
 fs::remove(const path& p, error_code& ec) noexcept
 {
-  const auto s = symlink_status(p, ec);
-  if (!status_known(s))
-return false;
-  if (s.type() == file_type::not_found)
-{
-  ec.clear();
-  return false; // Nothing to do, not an error.
-}
   if (::remove(p.c_str()) == 0)
 {
   ec.clear();
@@ -1070,14 +1062,9 @@ fs::remove_all(const path& p, error_code& ec) noexcept
 	return -1;
 }
 
-  if (::remove(p.c_str()) == 0)
+  if (fs::remove(p, ec))
 ++count;
-  else if (errno != ENOENT)
-{
-  ec.assign(errno, std::generic_category());
-  return -1;
-}
-  return count;
+  return ec ? -1 : count;
 }
 
 void
diff --git a/libstdc++-v3/src/filesystem/std-ops.cc b/libstdc++-v3/src/filesystem/std-ops.cc
index bed1ad1fe27..65b06f5b67b 100644
--- a/libstdc++-v3/src/filesystem/std-ops.cc
+++ b/libstdc++-v3/src/filesystem/std-ops.cc
@@ -1254,7 +1254,7 @@ bool
 fs::remove(const path& p)
 {
   error_code ec;
-  bool result = fs::remove(p, ec);
+  const bool result = fs::remove(p, ec);
   if (ec)
 _GLIBCXX_THROW_OR_ABORT(filesystem_error("cannot remove", p, ec));
   return result;
@@ -1263,14 +1263,6 @@ fs::remove(const path& p)
 bool
 fs::remove(const path& p, error_code& ec) noexcept
 {
-  const auto s = symlink_status(p, ec);
-  if (!status_known(s))
-return false;
-  if (s.type() == file_type::not_found)
-{
-  ec.clear();
-  return false; // Nothing to do, not an error.
-}
   if (::remove(p.c_str()) == 0)
 {
   ec.clear();
@@ -1316,14 +1308,9 @@ fs::remove_all(const path& p, error_code& ec)
 	return -1;
 }
 
-  if (::remove(p.c_str()) == 0)
+  if (fs::remove(p, ec))
 ++count;
-  else if (errno != ENOENT)
-{
-  ec.assign(errno, std::generic_category());
-  return -1;
-}
-  return count;
+  return ec ? -1 : count;
 }
 
 void


Go patch committed: Correct type of long double builtin math functions

2018-01-05 Thread Ian Lance Taylor
This patch to the GCC portion of the Go frontend corrects the type
used for long double builtin math functions with one argument.  This
lets gccgo inline calls to sqrt, cos, etc., on 32-bit x86.  The math
package in libgo uses this feature.  Previously it was generating
function calls rather than using the direct instruction (fsqrt, etc.).
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.
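
Expressed as C prototypes (fn1/fn2 are illustrative names):

  long double fn1 (long double);               /* correct: sqrtl, cosl, ... */
  long double fn2 (long double, long double);  /* what the old code built   */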

Ian

2018-01-05  Ian Lance Taylor  

* go-gcc.cc (Gcc_backend::Gcc_backend): Correct
math_function_type_long to take one argument.
Index: go-gcc.cc
===================================================================
--- go-gcc.cc   (revision 256262)
+++ go-gcc.cc   (working copy)
@@ -633,7 +633,7 @@ Gcc_backend::Gcc_backend()
 NULL_TREE);
   tree math_function_type_long =
 build_function_type_list(long_double_type_node, long_double_type_node,
-long_double_type_node, NULL_TREE);
+NULL_TREE);
   tree math_function_type_two = build_function_type_list(double_type_node,
 double_type_node,
 double_type_node,


libgo patch committed: Fix GOARCH_CACHELINESIZE on ia64

2018-01-05 Thread Ian Lance Taylor
This libgo patch by James Clarke fixes GOARCH_CACHELINESIZE on ia64;
Itanium cache lines are 128 bytes, not 16384.
Bootstrapped on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===================================================================
--- gcc/go/gofrontend/MERGE (revision 256262)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-9b9bece388d1bacdc9d1d0024e722ffe449d221d
+1319f36ccc65cf802b8e17ddd3d2da3ca6d82f4c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/configure.ac
===================================================================
--- libgo/configure.ac  (revision 256262)
+++ libgo/configure.ac  (working copy)
@@ -264,7 +264,7 @@ GOARCH_HUGEPAGESIZE="1 << 21"
   ia64-*-*)
 GOARCH=ia64
 GOARCH_FAMILY=IA64
-GOARCH_CACHELINESIZE=16384
+GOARCH_CACHELINESIZE=128
 GOARCH_PHYSPAGESIZE=65536
 ;;
   m68k*-*-*)


Re: [PATCH] -gstatement-frontiers vs. -fselective-scheduling* (PR debug/83480)

2018-01-05 Thread Richard Biener
On January 5, 2018 11:10:26 PM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>As the testcase shows, the sel-sched.c framework can't deal with debugging
>insns without causing -fcompare-debug failures, and I presume also with
>debug markers, with which it causes even ICEs right now.
>
>The following patch just follows what we do for -fvar-tracking-assignments:
>disable it by default if -fselective-scheduling*.  It isn't a complete fix;
>the testcase will still ICE with explicit -fselective-scheduling{,2}
>-gstatement-frontiers, but at least it won't ICE that often, and even if the
>ICE is fixed, it is quite unlikely that debug marker handling would
>work with -fcompare-debug when other debug insns don't work with that.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard. 


>2018-01-05  Jakub Jelinek  
>
>   PR debug/83480
>   * toplev.c (process_options): Don't enable debug_nonbind_markers_p
>   by default if flag_selective_scheduling{,2}.  Formatting fixes.
>
>   * gcc.dg/pr83480.c: New test.
>
>--- gcc/toplev.c.jj  2018-01-03 10:19:54.453533845 +0100
>+++ gcc/toplev.c   2018-01-05 16:20:56.130195110 +0100
>@@ -1535,8 +1535,9 @@ process_options (void)
> flag_var_tracking_uninit = flag_var_tracking;
> 
>   if (flag_var_tracking_assignments == AUTODETECT_VALUE)
>-flag_var_tracking_assignments = flag_var_tracking
>-  && !(flag_selective_scheduling || flag_selective_scheduling2);
>+flag_var_tracking_assignments
>+  = (flag_var_tracking
>+   && !(flag_selective_scheduling || flag_selective_scheduling2));
> 
>   if (flag_var_tracking_assignments_toggle)
> flag_var_tracking_assignments = !flag_var_tracking_assignments;
>@@ -1550,8 +1551,12 @@ process_options (void)
>   "var-tracking-assignments changes selective scheduling");
> 
>   if (debug_nonbind_markers_p == AUTODETECT_VALUE)
>-    debug_nonbind_markers_p = optimize && debug_info_level >= DINFO_LEVEL_NORMAL
>-      && (write_symbols == DWARF2_DEBUG || write_symbols == VMS_AND_DWARF2_DEBUG);
>+debug_nonbind_markers_p
>+  = (optimize
>+   && debug_info_level >= DINFO_LEVEL_NORMAL
>+   && (write_symbols == DWARF2_DEBUG
>+   || write_symbols == VMS_AND_DWARF2_DEBUG)
>+   && !(flag_selective_scheduling || flag_selective_scheduling2));
> 
>   if (flag_tree_cselim == AUTODETECT_VALUE)
> {
>--- gcc/testsuite/gcc.dg/pr83480.c.jj  2018-01-05 16:24:14.444290629 +0100
>+++ gcc/testsuite/gcc.dg/pr83480.c 2018-01-05 16:23:40.631274338 +0100
>@@ -0,0 +1,32 @@
>+/* PR debug/83480 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -g -fselective-scheduling2 -ftree-vectorize
>-freorder-blocks-algorithm=simple -fnon-call-exceptions
>-fno-guess-branch-probability -fno-peephole2 -fno-tree-sink
>-fno-tree-scev-cprop" } */
>+
>+signed char a, b;
>+
>+void
>+foo (int x, int y)
>+{
>+  for (a = 1; a != 0; ++a)
>+;
>+
>+  for (;;)
>+{
>+  int c;
>+
>+  b %= (y != 0 && a != 0) + 1;
>+  if (a != 0)
>+  y = b;
>+
>+  for (c = 0; c < 50; ++c)
>+  ++x;
>+
>+  if (a < 1)
>+  {
>+while (x != 0)
>+  ;
>+
>+a /= 0;   /* { dg-warning "division by zero" } */
>+  }
>+}
>+}
>
>   Jakub
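
In practical terms (a sketch of the resulting defaults, assuming DWARF
debug info; not output from the patch):

  gcc -O2 -g foo.c                           # statement frontiers on
  gcc -O2 -g -fselective-scheduling2 foo.c   # now off by default
  gcc -O2 -g -gstatement-frontiers -fselective-scheduling2 foo.c
                                             # explicit request; may still ICE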



Re: [PATCH] Fix *vec_duplicate_p (PR rtl-optimization/83682)

2018-01-05 Thread Richard Biener
On January 5, 2018 11:05:59 PM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>All the vec_duplicate_p uses in simplify-rtx.c assume that it returns the
>vector element if it returns true, and while that is the case for CONST_VECTOR
>which can't contain anything but scalars, VEC_DUPLICATE is documented:
>"This operation converts a scalar into a vector or a small vector into a
>larger one by duplicating the input values.  The output vector mode must have
>the same submodes as the input vector mode or the scalar modes, and the
>number of output parts must be an integer multiple of the number of input
>parts." and e.g. on x86 that is heavily used in the backend.
>So, simplify-rtx.c checks something and expects to be given a scalar
>element, but it can return something different with a different (vector)
>mode instead.
>
>Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>trunk?

OK. 

Richard. 

>2018-01-05  Jakub Jelinek  
>
>   PR rtl-optimization/83682
>   * rtl.h (const_vec_duplicate_p): Only return true for VEC_DUPLICATE
>   if it has non-VECTOR_MODE element mode.
>   (vec_duplicate_p): Likewise.
>
>   * gcc.target/i386/pr83682.c: New test.
>
>--- gcc/rtl.h.jj   2018-01-04 00:43:14.704702340 +0100
>+++ gcc/rtl.h  2018-01-05 16:03:52.420823342 +0100
>@@ -2969,7 +2969,9 @@ const_vec_duplicate_p (T x, T *elt)
>   *elt = CONST_VECTOR_ENCODED_ELT (x, 0);
>   return true;
> }
>-  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
>+  if (GET_CODE (x) == CONST
>+      && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE
>+      && !VECTOR_MODE_P (GET_MODE (XEXP (XEXP (x, 0), 0))))
> {
>   *elt = XEXP (XEXP (x, 0), 0);
>   return true;
>@@ -2984,7 +2986,8 @@ template <typename T>
> inline bool
> vec_duplicate_p (T x, T *elt)
> {
>-  if (GET_CODE (x) == VEC_DUPLICATE)
>+  if (GET_CODE (x) == VEC_DUPLICATE
>+      && !VECTOR_MODE_P (GET_MODE (XEXP (x, 0))))
> {
>   *elt = XEXP (x, 0);
>   return true;
>--- gcc/testsuite/gcc.target/i386/pr83682.c.jj  2018-01-05 16:06:57.299885887 +0100
>+++ gcc/testsuite/gcc.target/i386/pr83682.c  2018-01-05 16:06:26.630875486 +0100
>@@ -0,0 +1,17 @@
>+/* PR rtl-optimization/83682 */
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -msse2" } */
>+
>+typedef float V __attribute__((__vector_size__(16)));
>+typedef double W __attribute__((__vector_size__(16)));
>+V b;
>+W c;
>+
>+void
>+foo (void *p)
>+{
>+  V e = __builtin_ia32_cvtsd2ss (b, c);
>+  V g = e;
>+  float f = g[0];
>+  __builtin_memcpy (p, &f, sizeof (f));
>+}
>
>   Jakub
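
For context, the two VEC_DUPLICATE shapes the predicates must now
distinguish, per the documentation quoted above (a sketch):

  /* (vec_duplicate:V4SF (reg:SF a))    scalar -> vector: *elt is a scalar,
                                        which is what simplify-rtx.c expects.
     (vec_duplicate:V4SF (reg:V2SF b))  small vector -> larger vector: *elt
                                        would have a vector mode, so the
                                        predicates now return false.  */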



Re: [PATCH] Fix yet another expand_debug_expr ICE (PR middle-end/83694)

2018-01-05 Thread Richard Biener
On January 5, 2018 10:55:00 PM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>The recently added testcase for PR83666 ICEs on powerpc*/sparc*
>and perhaps other targets, where get_inner_reference gives us
>VOIDmode for a 512-bit field and smallest_int_mode_for_size
>ICEs on it because there is no such integer mode.
>
>Fixed by giving up above MAX_BITSIZE_MODE_ANY_INT; that would have been
>BLKmode and not really useful for debug info generation anyway.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 

>2018-01-05  Jakub Jelinek  
>
>   PR middle-end/83694
>   * cfgexpand.c (expand_debug_expr): Punt if mode1 is VOIDmode
>   and bitsize might be greater than MAX_BITSIZE_MODE_ANY_INT.
>
>--- gcc/cfgexpand.c.jj 2018-01-04 12:43:38.199537090 +0100
>+++ gcc/cfgexpand.c  2018-01-05 12:10:46.960037601 +0100
>@@ -4534,8 +4534,12 @@ expand_debug_expr (tree exp)
>   if (MEM_P (op0))
> {
>   if (mode1 == VOIDmode)
>-/* Bitfield.  */
>-mode1 = smallest_int_mode_for_size (bitsize);
>+{
>+  if (maybe_gt (bitsize, MAX_BITSIZE_MODE_ANY_INT))
>+return NULL;
>+  /* Bitfield.  */
>+  mode1 = smallest_int_mode_for_size (bitsize);
>+}
>   poly_int64 bytepos = bits_to_bytes_round_down (bitpos);
>   if (maybe_ne (bytepos, 0))
> {
>
>   Jakub
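
A hypothetical shape that can reach this path (illustrative only, not the
actual PR83666 testcase):

  /* A 512-bit member: get_inner_reference can present an access to it as
     a VOIDmode "bitfield" of 512 bits, and on targets without a 512-bit
     integer mode smallest_int_mode_for_size has nothing to return.  */
  typedef int V __attribute__ ((vector_size (64)));  /* 64 bytes = 512 bits */
  struct S { V v; };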



Re: [PATCH] lto, testsuite: Fix ICE in -Wodr (PR lto/83121)

2018-01-05 Thread Richard Biener
On January 5, 2018 11:55:11 PM GMT+01:00, David Malcolm wrote:
>On Fri, 2018-01-05 at 10:36 +0100, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 10:52 PM, David Malcolm 
>> wrote:
>> > PR lto/83121 reports an ICE deep inside the linemap code when -Wodr
>> > reports on a type mismatch.
>> > 
>> > The root cause is that the warning can access the
>> > DECL_SOURCE_LOCATION
>> > of a streamed-in decl before the lto_location_cache has been
>> > applied.
>> > 
>> > lto_location_cache::input_location stores RESERVED_LOCATION_COUNT
>> > (==2)
>> > as a poison value until the cache is applied:
>> > 250   /* Keep value RESERVED_LOCATION_COUNT in *loc as linemap
>> > lookups will
>> > 251  ICE on it.  */
>> > 
>> > The fix is relatively simple: apply the cache before reading the
>> > DECL_SOURCE_LOCATION.
>> > 
>> > (I wonder if we should instead have an INVALID_LOCATION value to handle
>> > this case more explicitly?  e.g. 0x?  or reserve 2 in libcpp for
>> > that purpose, and have the non-reserved locations start at 3?  Either
>> > would be more invasive, though)
>> > 
>> > Triggering the ICE was fiddly: it seems to be affected by many things,
>> > including the order of files, and (I think) by filenames.  My theory is
>> > that it's affected by the ordering of the tree nodes in the LTO stream:
>> > for the ICE to occur, the types in question need to be compared before
>> > some other operation flushes the lto_location_cache.  This ordering
>> > is affected by the hash-based ordering in DFS in lto-streamer-out.c,
>> > which might explain why r255066 seemed to trigger the bug; the only
>> > relevant change to LTO there seemed to be:
>> >   * lto-streamer-out.c (hash_tree): Hash TYPE_EMPTY_P and DECL_PADDING_P.
>> > If so, then the bug was presumably already present, but hidden.
>> > 
>> > The patch also adds regression test coverage for the ICE, which is more
>> > involved - as far as I can tell, we don't have an existing way to verify
>> > diagnostics emitted during link-time optimization.
>> > 
>> > Hence the patch adds some machinery to lib/lto.exp to support two new
>> > directives: dg-lto-warning and dg-lto-message, corresponding to
>> > dg-warning and dg-message respectively, where the diagnostics are
>> > expected to be emitted at link-time.
>> > 
>> > The test case includes examples of LTO warnings and notes in both the
>> > primary and secondary source files.
>> > 
>> > Doing so required reusing the logic from DejaGnu for handling
>> > diagnostics.
>> > Unfortunately the pertinent code is a 50 line loop within a ~200 line Tcl
>> > function in dg.exp (dg-test), so I had to copy it from DejaGnu, making
>> > various changes as necessary (see lto_handle_diagnostics_for_file in the
>> > patch; for example the LTO version supports multiple source files,
>> > identifying which source file emitted a diagnostic).
>> > 
>> > For non-LTO diagnostics we currently ignore surplus "note" diagnostics.
>> > This patch updates lto_prune_warns to follow this behavior (since
>> > otherwise we'd need numerous dg-lto-message directives for the motivating
>> > test case).
>> > 
>> > The patch adds these PASS results to g++.sum:
>> > 
>> > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o assemble, -O0 -flto
>> > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_1.o assemble, -O0 -flto
>> > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_0.C line 6)
>> > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_0.C line 8)
>> > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_1.C line 2)
>> > PASS: g++.dg/lto/pr83121  (test for LTO warnings, pr83121_1.C line 3)
>> > PASS: g++.dg/lto/pr83121 cp_lto_pr83121_0.o-cp_lto_pr83121_1.o link, -O0 -flto
>> > 
>> > The output for dg-lto-message above refers to "warnings", rather than
>> > "messages", but that's the same as for the non-LTO case, where
>> > dg-message also refers to "warnings".
>> > 
>> > Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
>> > 
>> > OK for trunk?
>> 
>> Hmm, but we do this in warn_odr already?  How's that not enough?
>> 
>> At least it seems the place you add this isn't ideal (not at the
>> "root cause").
>> 
>> Richard.
>
>[CCing Honza]
>
>Yes, warn_odr does apply the cache, but this warning is coming from
>warn_types_mismatch.
>
>Of the various calls to warn_types_mismatch, most are immediately after
>a warn_odr guarded by if (warn && warned) or if (warned), and if
>warn_odr writes back true to *warned, it has definitely applied the
>cache.
>
>However, the 7-argument overload of odr_types_equivalent_p can
>also call warn_types_mismatch, passing in the location_t values it
>received.
>
>The problem occurs with this call stack, when:
>
>  add_type_duplicate, passes DECL_SOURCE_LOCATIONs to:
>    odr_types_equivalent_p (7-argument overload) - which calls:
>      warn_types_mismatch, which calls:
>
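
For readers unfamiliar with the new directives, a hypothetical illustration
(not the actual pr83121 test) of how they are written in the primary file
of a two-file LTO test:

  struct S { int i; };  // { dg-lto-warning "One Definition Rule" }

  // The secondary source file defines S with a different layout and
  // carries the corresponding { dg-lto-message } notes; both diagnostics
  // are expected at link time rather than compile time.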

Re: Fix folding of Inf/NaN comparisons for -ftrapping-math (PR tree-optimization/64811)

2018-01-05 Thread Richard Biener
On January 5, 2018 10:13:34 PM GMT+01:00, Joseph Myers wrote:
>The folding of comparisons against Inf (to constants or comparisons
>with the maximum finite value) has various cases where it introduces
>or loses "invalid" exceptions for comparisons with NaNs.
>
>Folding x > +Inf to 0 should not be about HONOR_SNANS - ordered
>comparisons of both quiet and signaling NaNs should raise invalid.
>
>x <= +Inf is not the same as x == x, because again that loses an
>exception (equality comparisons don't raise exceptions except for
>signaling NaNs).
>
>x == +Inf is not the same as x > DBL_MAX, and a similar issue applies
>with the x != +Inf case - that transformation causes a spurious
>exception.
>
>This patch fixes the conditionals on the folding to avoid such
>introducing or losing exceptions.
>
>Bootstrapped with no regressions on x86_64-pc-linux-gnu (where the
>cases involving spurious exceptions wouldn't have failed anyway before
>GCC 8, because the back end formerly, and wrongly, always used unordered
>comparisons).  Also tested for powerpc-linux-gnu
>soft-float that this fixes many glibc math/ test failures that arose
>in that configuration because this folding affected the IBM long
>double support in libgcc (no such failures appeared for hard-float
>because of the bug of powerpc hard-float always using unordered
>comparisons) - some failures remain, but I believe them to be
>unrelated.  OK to commit?

OK. 

Richard. 
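
A worked example of the exception semantics at issue (a sketch assuming
IEEE 754 behaviour and a quiet NaN operand):

  #include <math.h>

  void
  demo (double x)  /* suppose x is a quiet NaN */
  {
    (void) (x > INFINITY);  /* ordered compare: raises "invalid"       */
    (void) (x == x);        /* quiet equality: no exception for a qNaN */
    /* So folding x <= +Inf to x == x loses the "invalid" raise, and
       folding x == +Inf to x > DBL_MAX would introduce one.  */
  }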

>gcc:
>2018-01-05  Joseph Myers  
>
>   PR tree-optimization/64811
>   * match.pd: When optimizing comparisons with Inf, avoid
>   introducing or losing exceptions from comparisons with NaN.
>
>gcc/testsuite:
>2018-01-05  Joseph Myers  
>
>   PR tree-optimization/64811
>   * gcc.dg/torture/inf-compare-1.c, gcc.dg/torture/inf-compare-2.c,
>   gcc.dg/torture/inf-compare-3.c, gcc.dg/torture/inf-compare-4.c,
>   gcc.dg/torture/inf-compare-5.c, gcc.dg/torture/inf-compare-6.c,
>   gcc.dg/torture/inf-compare-7.c, gcc.dg/torture/inf-compare-8.c:
>   New tests.
>   * gcc.c-torture/execute/ieee/fp-cmp-7.x: New file.
>
>Index: gcc/match.pd
>===================================================================
>--- gcc/match.pd   (revision 256279)
>+++ gcc/match.pd   (working copy)
>@@ -3050,18 +3050,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  code = swap_tree_comparison (code);
>  }
>  (switch
>-  /* x > +Inf is always false, if with ignore sNANs.  */
>+  /* x > +Inf is always false, if we ignore NaNs or exceptions.  */
>   (if (code == GT_EXPR
>- && ! HONOR_SNANS (@0))
>+ && !(HONOR_NANS (@0) && flag_trapping_math))
>{ constant_boolean_node (false, type); })
>   (if (code == LE_EXPR)
>-   /* x <= +Inf is always true, if we don't case about NaNs.  */
>+   /* x <= +Inf is always true, if we don't care about NaNs.  */
>(if (! HONOR_NANS (@0))
>   { constant_boolean_node (true, type); }
>-  /* x <= +Inf is the same as x == x, i.e. !isnan(x).  */
>-  (eq @0 @0)))
>-  /* x == +Inf and x >= +Inf are always equal to x > DBL_MAX.  */
>-  (if (code == EQ_EXPR || code == GE_EXPR)
>+  /* x <= +Inf is the same as x == x, i.e. !isnan(x), but this loses
>+ an "invalid" exception.  */
>+  (if (!flag_trapping_math)
>+   (eq @0 @0
>+  /* x == +Inf and x >= +Inf are always equal to x > DBL_MAX, but
>+   for == this introduces an exception for x a NaN.  */
>+  (if ((code == EQ_EXPR && !(HONOR_NANS (@0) && flag_trapping_math))
>+ || code == GE_EXPR)
>(with { real_maxval (&max, neg, TYPE_MODE (TREE_TYPE (@0))); }
>   (if (neg)
>(lt @0 { build_real (TREE_TYPE (@0), max); })
>@@ -3072,7 +3076,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (neg)
>(ge @0 { build_real (TREE_TYPE (@0), max); })
>(le @0 { build_real (TREE_TYPE (@0), max); }
>-  /* x != +Inf is always equal to !(x > DBL_MAX).  */
>+  /* x != +Inf is always equal to !(x > DBL_MAX), but this
>introduces
>+   an exception for x a NaN so use an unordered comparison.  */
>   (if (code == NE_EXPR)
>(with { real_maxval (&max, neg, TYPE_MODE (TREE_TYPE (@0))); }
>   (if (! HONOR_NANS (@0))
>@@ -3080,10 +3085,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (ge @0 { build_real (TREE_TYPE (@0), max); })
> (le @0 { build_real (TREE_TYPE (@0), max); }))
>(if (neg)
>-(bit_xor (lt @0 { build_real (TREE_TYPE (@0), max); })
>- { build_one_cst (type); })
>-(bit_xor (gt @0 { build_real (TREE_TYPE (@0), max); })
>- { build_one_cst (type); }))
>+(unge @0 { build_real (TREE_TYPE (@0), max); })
>+(unle @0 { build_real (TREE_TYPE (@0), max); }))
> 
>  /* If this is a comparison of a real constant with a PLUS_EXPR
> or a MINUS_EXPR of a real constant, we can convert it into a
>Index: gcc/testsuite/gcc.c-torture/execute/ieee/fp-cmp-7.x