[x86] GFNI enabling[1/4]

2017-10-13 Thread Koval, Julia
Hi,

gcc/
* gcc/common/config/i386/i386-common.c (OPTION_MASK_ISA_GFNI_SET,
(OPTION_MASK_ISA_GFNI_UNSET): New.
(ix86_handle_option): Handle OPT_mgfni.
* gcc/config/i386/cpuid.h (bit_GFNI): New.
* gcc/config/i386/driver-i386.c (host_detect_local_cpu): Detect gfni.
* gcc/config/i386/i386-c.c (ix86_target_macros_internal): Define 
__GFNI__.
* gcc/config/i386/i386.c (ix86_target_string): Add -mgfni.
(ix86_valid_target_attribute_inner_p): Add OPT_mgfni.
* gcc/config/i386/i386.h (TARGET_GFNI, TARGET_GFNI_P): New.
* gcc/config/i386/i386.opt: Add mgfni.

Here is the first patch of GFNI isaset enabling. It adds new option -mgfni for 
GFNI isaset and cpuid bit.
Docs for new instructions and isasets are here:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
Ok for trunk? 

Thanks,
Julia


0001-gfni-option.patch
Description: 0001-gfni-option.patch


Re: [x86] GFNI enabling[1/4]

2017-10-13 Thread Jakub Jelinek
On Fri, Oct 13, 2017 at 07:03:14AM +, Koval, Julia wrote:

--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -753,6 +753,10 @@ mrdpid
 Target Report Mask(ISA_RDPID) Var(ix86_isa_flags2) Save
 Support RDPID built-in functions and code generation.
 
+mgfni
+Target Report Mask(ISA_GFNI) Var(ix86_isa_flags2) Save
+Support RDPID built-in functions and code generation.
+

Pasto?  It would surprise me if the description was meant to be
exactly the same as -mrdpid.

Jakub


0001-gfni-option.patch
Description: 0001-gfni-option.patch


Re: [PATCH] Improve rotate fold-const pattern matching (PR target/82498)

2017-10-13 Thread Richard Biener
On Thu, 12 Oct 2017, Jakub Jelinek wrote:

> Hi!
> 
> Marc in the PR mentioned that it is not really good that the recommended
> rotate pattern is recognized only during forwprop1 and later, which is after
> einline and that inlining or early opts could have changed stuff too much so
> that we wouldn't recogize it anymore.

Hmm, but the only thing functions see is inlining early optimized bodies
into them and then constant propagation performed, so I don't see how
we could confuse the pattern in a way to be indetectable.  Also
early inlining is performed on early optimized bodies so cost metrics
see rotates, not the unrecognized form.

> The following patch handles that pattern in fold_binary_loc too, and while
> I've touched it, it cleans a lot of ugliness/duplication in that code.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Still looks like an improvement, thus ok.

Thanks,
Richard.

> 2017-10-12  Jakub Jelinek  
> 
>   PR target/82498
>   * fold-const.c (fold_binary_loc) : Code cleanups,
>   instead of handling MINUS_EXPR twice (once for each argument),
>   canonicalize operand order and handle just once, use rtype where
>   possible.  Handle (A << B) | (A >> (-B & (Z - 1))).
> 
>   * gcc.dg/tree-ssa/pr82498.c: New test.
> 
> --- gcc/fold-const.c.jj   2017-10-11 22:37:51.0 +0200
> +++ gcc/fold-const.c  2017-10-12 13:17:45.360589554 +0200
> @@ -9429,7 +9429,10 @@ fold_binary_loc (location_t loc,
>/* (A << C1) + (A >> C2) if A is unsigned and C1+C2 is the size of A
>is a rotate of A by C1 bits.  */
>/* (A << B) + (A >> (Z - B)) if A is unsigned and Z is the size of A
> -  is a rotate of A by B bits.  */
> +  is a rotate of A by B bits.
> +  Similarly for (A << B) | (A >> (-B & C3)) where C3 is Z-1,
> +  though in this case CODE must be | and not + or ^, otherwise
> +  it doesn't return A when B is 0.  */
>{
>   enum tree_code code0, code1;
>   tree rtype;
> @@ -9447,25 +9450,32 @@ fold_binary_loc (location_t loc,
>   == GET_MODE_UNIT_PRECISION (TYPE_MODE (rtype
> {
>   tree tree01, tree11;
> + tree orig_tree01, orig_tree11;
>   enum tree_code code01, code11;
>  
> - tree01 = TREE_OPERAND (arg0, 1);
> - tree11 = TREE_OPERAND (arg1, 1);
> + tree01 = orig_tree01 = TREE_OPERAND (arg0, 1);
> + tree11 = orig_tree11 = TREE_OPERAND (arg1, 1);
>   STRIP_NOPS (tree01);
>   STRIP_NOPS (tree11);
>   code01 = TREE_CODE (tree01);
>   code11 = TREE_CODE (tree11);
> + if (code11 != MINUS_EXPR
> + && (code01 == MINUS_EXPR || code01 == BIT_AND_EXPR))
> +   {
> + std::swap (code0, code1);
> + std::swap (code01, code11);
> + std::swap (tree01, tree11);
> + std::swap (orig_tree01, orig_tree11);
> +   }
>   if (code01 == INTEGER_CST
>   && code11 == INTEGER_CST
>   && (wi::to_widest (tree01) + wi::to_widest (tree11)
> - == element_precision (TREE_TYPE (TREE_OPERAND (arg0, 0)
> + == element_precision (rtype)))
> {
>   tem = build2_loc (loc, LROTATE_EXPR,
> -   TREE_TYPE (TREE_OPERAND (arg0, 0)),
> -   TREE_OPERAND (arg0, 0),
> +   rtype, TREE_OPERAND (arg0, 0),
> code0 == LSHIFT_EXPR
> -   ? TREE_OPERAND (arg0, 1)
> -   : TREE_OPERAND (arg1, 1));
> +   ? orig_tree01 : orig_tree11);
>   return fold_convert_loc (loc, type, tem);
> }
>   else if (code11 == MINUS_EXPR)
> @@ -9477,39 +9487,37 @@ fold_binary_loc (location_t loc,
>   STRIP_NOPS (tree111);
>   if (TREE_CODE (tree110) == INTEGER_CST
>   && 0 == compare_tree_int (tree110,
> -   element_precision
> -   (TREE_TYPE (TREE_OPERAND
> -   (arg0, 0
> +   element_precision (rtype))
>   && operand_equal_p (tree01, tree111, 0))
> -   return
> - fold_convert_loc (loc, type,
> -   build2 ((code0 == LSHIFT_EXPR
> -? LROTATE_EXPR
> -: RROTATE_EXPR),
> -   TREE_TYPE (TREE_OPERAND (arg0, 
> 0)),
> -   TREE_OPERAND (arg0, 0),
> -   TREE_OPERAND (arg0, 1)));
> +   {
> + tem = build2_loc (loc, (code0 == LSHIFT_EXPR
> +

RE: [x86] GFNI enabling[1/4]

2017-10-13 Thread Koval, Julia
Sorry, fixed that.

Thanks,
Julia

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Friday, October 13, 2017 9:08 AM
> To: Koval, Julia 
> Cc: GCC Patches ; Uros Bizjak
> ; Kirill Yukhin 
> Subject: Re: [x86] GFNI enabling[1/4]
> 
> On Fri, Oct 13, 2017 at 07:03:14AM +, Koval, Julia wrote:
> 
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -753,6 +753,10 @@ mrdpid
>  Target Report Mask(ISA_RDPID) Var(ix86_isa_flags2) Save
>  Support RDPID built-in functions and code generation.
> 
> +mgfni
> +Target Report Mask(ISA_GFNI) Var(ix86_isa_flags2) Save
> +Support RDPID built-in functions and code generation.
> +
> 
> Pasto?  It would surprise me if the description was meant to be
> exactly the same as -mrdpid.
> 
>   Jakub


0001-gfni-option.patch
Description: 0001-gfni-option.patch


Re: [PATCH 4/9] [SFN] introduce statement frontier notes, still disabled

2017-10-13 Thread Alexandre Oliva
On Oct  9, 2017, Richard Biener  wrote:

> The middle-end changes are ok.   The C FE change looks reasonable,
> I'd appreciate a 2nd look at the C++ FE changes by a maintainer.

https://gcc.gnu.org/ml/gcc-patches/2017-09/msg02026.html


Thanks a lot for the reviews!

I've held off installing the approved patches and parts of this patch
for the time being, awaiting the second look you requested (copying the
C++ maintainers to get their attention), and also reviews for the
subsequent patches (2 LVU and 2 IEPM).  Please let me know if you'd
prefer me to go ahead and install the approved bits.

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [PATCH] i386: Check red zone size in push peephole2

2017-10-13 Thread Uros Bizjak
On Wed, Oct 11, 2017 at 3:16 AM, H.J. Lu  wrote:
> Check red zone size, instead of if red zone is available, in push
> peephole2s.
>
> Tested on i686 and x86-64.  OK for master?

Please rename variable to ix86_red_zone_size, as Jakub suggested in the PR.

OK for mainline with this change.

Thanks,
Uros.

>
> H.J.
> ---
> gcc/
>
> PR target/82499
> * config/i386/i386.h (x86_red_zone_size): New.
> * config/i386/i386.md (push peephole2s): Replace
> "!ix86_using_red_zone ()" with "ix86_red_zone_size == 0".
>
> gcc/testsuite/
>
> PR target/82499
> * gcc.target/i386/pr82499-1.c: New file.
> * gcc.target/i386/pr82499-2.c: Likewise.
> * gcc.target/i386/pr82499-3.c: Likewise.
> ---
>  gcc/config/i386/i386.h|  1 +
>  gcc/config/i386/i386.md   |  8 
>  gcc/testsuite/gcc.target/i386/pr82499-1.c | 21 +
>  gcc/testsuite/gcc.target/i386/pr82499-2.c | 21 +
>  gcc/testsuite/gcc.target/i386/pr82499-3.c | 21 +
>  5 files changed, 68 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82499-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82499-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr82499-3.c
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef88d89cae2..197f4e22d91 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2603,6 +2603,7 @@ struct GTY(()) machine_function {
>  #define ix86_current_function_calls_tls_descriptor \
>(ix86_tls_descriptor_calls_expanded_in_cfun && df_regs_ever_live_p 
> (SP_REG))
>  #define ix86_static_chain_on_stack (cfun->machine->static_chain_on_stack)
> +#define x86_red_zone_size (cfun->machine->frame.red_zone_size)
>
>  /* Control behavior of x86_file_start.  */
>  #define X86_FILE_START_VERSION_DIRECTIVE false
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 1a794c0ae06..57d446857af 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -18827,7 +18827,7 @@
>   (clobber (mem:BLK (scratch)))])]
>"(TARGET_SINGLE_PUSH || optimize_insn_for_size_p ())
> && INTVAL (operands[0]) == -GET_MODE_SIZE (word_mode)
> -   && !ix86_using_red_zone ()"
> +   && x86_red_zone_size == 0"
>[(clobber (match_dup 1))
> (parallel [(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))
>   (clobber (mem:BLK (scratch)))])])
> @@ -18841,7 +18841,7 @@
>   (clobber (mem:BLK (scratch)))])]
>"(TARGET_DOUBLE_PUSH || optimize_insn_for_size_p ())
> && INTVAL (operands[0]) == -2*GET_MODE_SIZE (word_mode)
> -   && !ix86_using_red_zone ()"
> +   && x86_red_zone_size == 0"
>[(clobber (match_dup 1))
> (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))
> (parallel [(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))
> @@ -18856,7 +18856,7 @@
>   (clobber (reg:CC FLAGS_REG))])]
>"(TARGET_SINGLE_PUSH || optimize_insn_for_size_p ())
> && INTVAL (operands[0]) == -GET_MODE_SIZE (word_mode)
> -   && !ix86_using_red_zone ()"
> +   && x86_red_zone_size == 0"
>[(clobber (match_dup 1))
> (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))])
>
> @@ -18868,7 +18868,7 @@
>   (clobber (reg:CC FLAGS_REG))])]
>"(TARGET_DOUBLE_PUSH || optimize_insn_for_size_p ())
> && INTVAL (operands[0]) == -2*GET_MODE_SIZE (word_mode)
> -   && !ix86_using_red_zone ()"
> +   && x86_red_zone_size == 0"
>[(clobber (match_dup 1))
> (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))
> (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))])
> diff --git a/gcc/testsuite/gcc.target/i386/pr82499-1.c 
> b/gcc/testsuite/gcc.target/i386/pr82499-1.c
> new file mode 100644
> index 000..3aba62a466f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr82499-1.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* The pic register save adds unavoidable stack pointer references.  */
> +/* { dg-skip-if "" { ia32 && { ! nonpic } } } */
> +/* These options are selected to ensure 1 word needs to be allocated
> +   on the stack to maintain alignment for the call.  This should be
> +   transformed to push+pop.  We also want to force unwind info updates.  */
> +/* { dg-options "-Os -fomit-frame-pointer -fasynchronous-unwind-tables" } */
> +/* { dg-additional-options "-mpreferred-stack-boundary=3" { target ia32 } } 
> */
> +/* { dg-additional-options "-mpreferred-stack-boundary=4" { target { ! ia32 
> } } } */
> +/* ms_abi has reserved stack-region.  */
> +/* { dg-skip-if "" { x86_64-*-mingw* } } */
> +
> +extern void g (void);
> +int
> +f (void)
> +{
> +  g ();
> +  return 42;
> +}
> +
> +/* { dg-final { scan-assembler-not "(sub|add)(l|q)\[\\t \]*\\$\[0-9\]*,\[\\t 
> \]*%\[re\]?sp" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr82499-2.c 
> b/gcc/testsuite/gcc.target/i386/pr82499-2.c
> new file mode 100644
> index 00

[ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated part 2

2017-10-13 Thread Christophe Lyon
Hi,

The attached small patch solves PR 67591 and removes occurrences of
"IT blocks containing 32-bit Thumb instructions are deprecated in
ARMv8". It is similar to the patch I committed recently and updates
the 3 remaining patterns that can generate such instructions. I
checked gcc.log, g++.log, libstdc++.log and gfortran.log and found no
occurrence of the warning with this patch applied.

Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
and --with-cpu=cortex-a57 --with-mode=thumb, and also bootstrapped
successfully on armv8 HW in thumb mode.

Benchmarking shows no noticeable difference.

OK for trunk?

Thanks,

Christophe
2017-10-13  Christophe Lyon  

PR target/67591
* config/arm/arm.md (*sub_shiftsi): Add predicable_short_it
attribute.
(*cmp_ite0): Add enabled_for_depr_it attribute.
(*cmp_ite1): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f241f9d..093db74 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8960,6 +8960,7 @@
   "TARGET_32BIT"
   "sub%?\\t%0, %1, %3%S2"
   [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
(set_attr "shift" "3")
(set_attr "arch" "32,a")
(set_attr "type" "alus_shift_imm,alus_shift_reg")])
@@ -9398,6 +9399,7 @@
   }"
   [(set_attr "conds" "set")
(set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
(set_attr "type" "multiple")
(set_attr_alternative "length"
   [(const_int 6)
@@ -9481,6 +9483,7 @@
   }"
   [(set_attr "conds" "set")
(set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
(set_attr_alternative "length"
   [(const_int 6)
(const_int 8)


Patch ping^3

2017-10-13 Thread Jakub Jelinek
Hi!

I'd like to ping the following wrong-code patch:

http://gcc.gnu.org/ml/gcc-patches/2017-09/msg01540.html 
   
  libgcc __mulvDI3 fix - missed detection of overflow for   
   
  0x * 0x   
   
  __builtin_mul_overflow{,_p} fix for the same  
   

   
Thanks  
   

Jakub


Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-13 Thread Martin Liška
On 10/12/2017 11:54 PM, Jeff Law wrote:
> On 10/11/2017 12:13 AM, Martin Liška wrote:
>> 2017-10-10  Martin Liska  
>>
>>  PR tree-optimization/82493
>>  * sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
>>  (test_range_functions): New function.
>>  (sbitmap_c_tests): Likewise.
>>  * selftest-run-tests.c (selftest::run_tests): Run new tests.
>>  * selftest.h (sbitmap_c_tests): New function.
> I went ahead and committed this along with a patch to fix the off-by-one
> error in live_bytes_read.  Bootstrapped and regression tested on x86.
> 
> Actual patch attached for archival purposes.
> 
> Jeff
> 

Thanks.

I'll carry on and write another set of unit tests.

Martin


Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation

2017-10-13 Thread Uros Bizjak
On Thu, Oct 12, 2017 at 8:54 PM, Tsimbalist, Igor V
 wrote:
> Attached is an updated patch according to your comments. New tests are
> added to test ICF optimization in presence of nocf_check attribute.
--- a/gcc/testsuite/c-c++-common/fcf-protection-2.c
+++ b/gcc/testsuite/c-c++-common/fcf-protection-2.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
 /* { dg-options "-fcf-protection=branch" } */
-/* { dg-error "'-fcf-protection=branch' is not supported for this
target" "" { target { "i?86-*-* x86_64-*-*" } } 0 } */
+/* { dg-error "'-fcf-protection=branch' requires CET support on this
target. Use -mcet or one of -mibt, -mshstk options to enable CET" "" {
target { "i?86-*-* x86_64-*-*" } } 0 } */

Checking for "-fcf-protection=branch' requires CET support on this
target" should be enough. No need to check the whole message here and
in other tests.

 /* { dg-error "'-fcf-protection=branch' is not supported for this
target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
diff --git a/gcc/testsuite/c-c++-common/fcf-protection-3.c
b/gcc/testsuite/c-c++-common/fcf-protection-3.c


--- a/gcc/testsuite/c-c++-common/fcf-protection-4.c
+++ b/gcc/testsuite/c-c++-common/fcf-protection-4.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
 /* { dg-options "-fcf-protection=none" } */
-/* { dg-bogus "'-fcf-protection=none' is not supported for this
target" "" { target { "i?86-*-* x86_64-*-*" } } 0 } */
+/* { dg-bogus "'-fcf-protection=none' res CET support on this target.
Use -mcet or one of -mibt, -mshstk options to enable CET" "" { target
{ "i?86-*-* x86_64-*-*" } } 0 } */
 /* { dg-bogus "'-fcf-protection=none' is not supported for this
target" "" { target { ! "i?86-*-* x86_64-*-*" } } 0 } */
diff --git a/gcc/testsuite/c-c++-common/fcf-protection-5.c
b/gcc/testsuite/c-c++-common/fcf-protection-5.c

The above test checks for bogus messages? -fcf-protection=none option
should not generate any messages. So, the test should check that
-fcf-protection=none doesn't generate any error. (And, there is a typo
in the message, /s/res/requires.)

Uros.

> Igor
>
>
>> -Original Message-
>> From: Tsimbalist, Igor V
>> Sent: Tuesday, September 19, 2017 11:30 PM
>> To: Uros Bizjak 
>> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
>> 
>> Subject: RE: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
>>
>> > -Original Message-
>> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> > ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
>> > Sent: Tuesday, September 19, 2017 6:13 PM
>> > To: Tsimbalist, Igor V 
>> > Cc: gcc-patches@gcc.gnu.org
>> > Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
>> >
>> > On Tue, Sep 19, 2017 at 5:18 PM, Tsimbalist, Igor V
>> >  wrote:
>> > >> -Original Message-
>> > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> > >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
>> > >> Sent: Monday, September 18, 2017 12:17 PM
>> > >> To: gcc-patches@gcc.gnu.org
>> > >> Cc: Tsimbalist, Igor V ; Tsimbalist,
>> > >> Igor V 
>> > >> Subject: Re:
>> > >> 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
>> > >>
>> > >> Hello!
>> > >>
>> > >> > gcc/testsuite/
>> > >> >
>> > >> > * g++.dg/cet-notrack-1.C: New test.
>> > >> > * gcc.target/i386/cet-intrin-1.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-10.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-2.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-3.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-4.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-5.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-6.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-7.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-8.c: Likewise.
>> > >> > * gcc.target/i386/cet-intrin-9.c: Likewise.
>> > >> > * gcc.target/i386/cet-label.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-1a.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-1b.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-2a.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-2b.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-3.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-4a.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-4b.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-5a.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-5b.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-6a.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-6b.c: Likewise.
>> > >> > * gcc.target/i386/cet-notrack-7.c: Likewise.
>> > >> > * gcc.target/i386/cet-property-1.c: Likewise.
>> > >> > * gcc.target/i386/cet-property-2.c: Likewise.
>> > >> > * gcc.target/i386/cet-rdssp-1.c: Likewise.
>> > >> > * gcc.target/i386/cet-sjlj-1.c: Likewise.
>> > >> > * gcc.target/i386/cet-sjlj-2.c: Likewise.
>> > >> > * gcc.target/i386/cet-sjlj-3.c: Likewise.
>> > >> > * gcc.target/i386/cet-switch-1.c: Likewise.
>> > >> > * gcc.target/i386/cet-switch-2.c: Likewise.
>> > >> > * lib/target-supports.exp (check_effective_target_cet): New proc.
>> > >>
>> > >> A 

Re: [PATCH] Improve rotate fold-const pattern matching (PR target/82498)

2017-10-13 Thread Marc Glisse

On Fri, 13 Oct 2017, Richard Biener wrote:


On Thu, 12 Oct 2017, Jakub Jelinek wrote:


Hi!

Marc in the PR mentioned that it is not really good that the recommended
rotate pattern is recognized only during forwprop1 and later, which is after
einline and that inlining or early opts could have changed stuff too much so
that we wouldn't recogize it anymore.


Hmm, but the only thing functions see is inlining early optimized bodies
into them and then constant propagation performed, so I don't see how
we could confuse the pattern in a way to be indetectable.  Also


For instance, in

unsigned f(unsigned x, int y) { return (x << y) | (x >> (-y & 31)); }
unsigned g(unsigned x) { return f(x << 2, 3); }

if we inline f into g before optimizing f, we might combine (x << 2) << 3 
into x << 5, which would make it harder to recognize the rotation. For 
some reason that's not happening, but similar scenarios seemed possible to 
me. But if the inliner calls early optimizations (including forwprop) on 
the callee before inlining it, then my comment is indeed nonsense.



early inlining is performed on early optimized bodies so cost metrics
see rotates, not the unrecognized form.


Ah, so there's a hidden forwprop pass in einline?

unsigned f(unsigned x, int y) {
  unsigned z = x << y;
  return z | (x >> (-y & 31));
}
unsigned g(unsigned x, int y) { return f(x, y); }

After einline, g has rotate but not f, that's rather confusing.

--
Marc Glisse


Re: [PATCH] (gimple) Allow integer return type from vector compares

2017-10-13 Thread Richard Biener
On Thu, Oct 12, 2017 at 10:02 PM, Will Schmidt
 wrote:
> Hi,
>
> Update the logic in verify_gimple_comparision to allow a vector integer result
> from a vector comparison, where it previously was limited to only allowing
> compares with boolean results.  This allows powerpc intrinsics such as this
> one to build (after gimple folding):
>vector bool int vec_cmpeq (vector signed int, vector signed int);
>
> This has been tested in conjunction with the "rs6000 GIMPLE folding for vector
> compares" patch (posted separately) on p6 and newer.
>
> OK for trunk?

Well.  It was this way before AVX512 introduced those vector boolean
types.  There
we decided to have comparisons always return a boolean vector.  Previously
there was an additional restriction in that the vector elements of the
boolean result
had to be the same size as the elements of the comparison operands.

You are now supposed to use the "canonical"

 _1 = _2 != _3 ? { your true value vector } : { your false value vector};

with a VEC_COND_EXPR.

Richard.

> Thanks,
> -Will
>
> [gcc]
>
> 2017-10-12  Will Schmidt  
>
> * gcc/tree-cfg.c: (@ verify_gimple_comparison ): allow boolean result
> from vector compares.
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index b5e0460..adf3607 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -3624,14 +3624,16 @@ verify_gimple_comparison (tree type, tree op0, tree 
> op1, enum tree_code code)
>   debug_generic_expr (op0_type);
>   debug_generic_expr (op1_type);
>   return true;
>  }
>  }
> -  /* Or a boolean vector type with the same element count
> - as the comparison operand types.  */
> +  /* Or a vector type with the same element count
> + as the comparison operand types.  The vector type may
> + be boolean or integer.  */
>else if (TREE_CODE (type) == VECTOR_TYPE
> -  && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
> +  && (( TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
> +  || ( TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)))
>  {
>if (TREE_CODE (op0_type) != VECTOR_TYPE
>   || TREE_CODE (op1_type) != VECTOR_TYPE)
>  {
>error ("non-vector operands in vector comparison");
>
>


Re: [PATCH] Add gnu::unique_ptr

2017-10-13 Thread Richard Biener
On Fri, Oct 13, 2017 at 2:40 AM, David Malcolm  wrote:
> From: Trevor Saunders 
>
> I had a go at updating Trevor's unique_ptr patch from July
> ( https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02084.html )
>
> One of the sticking points was what to call the namespace; there was
> wariness about using "gtl" as the name.
>
> Richi proposed (https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00155.html):
>> If it should be short use g::.  We can also use gnu:: I guess and I
>> agree gnutools:: is a little long (similar to libiberty::).  Maybe
>> gt:: as a short-hand for gnutools.
>
> Pedro noted (https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00157.html):
>> Exactly 3 letters has the nice property of making s/gtl::foo/std::foo/
>> super trivial down the road; you don't have to care about reindenting
>> stuff
>
> Hence this version of the patch uses "gnu::" - 3 letters, one of the
> ones Richi proposed, and *not* a match for ".tl" (e.g. "gtl");
> (FWIW personally "gnu::" is my favorite, followed by "gcc::").
>
> The include/unique-ptr.h in this patch is identical to that posted
> by Trevor in July, with the following changes (by me):
> - renaming of "gtl" to "gnu"
> - renaming of DEFINE_GDB_UNIQUE_PTR to DEFINE_GNU_UNIQUE_PTR
> - renaming of xfree_deleter to xmalloc_deleter, and making it
>   use "free" rather than "xfree" (which doesn't exist)
>
> I also went and added a gcc/unique-ptr-tests.cc file containing
> selftests (my thinking here is that although std::unique_ptr ought
> to already be well-tested, we need to ensure that the fallback
> implementation is sane when building with C++ prior to C++11).
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu,
> using gcc 4.8 for the initial bootstrap (hence testing both gnu+03
> and then gnu++14 in the selftests, for stage 1 and stages 2 and 3
> respectively).
>
> I also manually tested selftests with both gcc 4.8 and trunk on
> the same hardware (again, to exercise both the with/without C++11
> behavior).
>
> Tested with "make selftest-valgrind" (no new issues).
>
> OK for trunk?

Ok if the gdb folks approve (and will change to this version).

Thanks,
Richard.

> Motivation/use-cases:
> (a) I have an updated version of the
> name_hint/deferred_diagnostic patch kit, which uses this (rather
> than the refcounting used by
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00439.html ); and
> (b) having this available ought to allow various other cleanups
> e.g. the example ones identified by Trevor here:
>   https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02085.html
>
> Thanks
> Dave
>
>
> Blurb from Trevor:
>
> For most of the history of this see
>   https://sourceware.org/ml/gdb-patches/2016-10/msg00223.html
> The changes are mostly s/gdb/gtl/g
>
> include/ChangeLog:
>
> 2017-07-29  Trevor Saunders  
>
> * unique-ptr.h: New file.
>
> Combined ChangeLog follows:
>
> gcc/ChangeLog:
>
> David Malcolm 
>
> * Makefile.in (OBJS): Add unique-ptr-tests.o.
> * selftest-run-tests.c (selftest::run_tests): Call
> selftest::unique_ptr_tests_cc_tests.
> * selftest.h (selftest::unique_ptr_tests_cc_tests): New decl.
> * unique-ptr-tests.cc: New file.
>
> include/ChangeLog:
>
> Trevor Saunders  
> David Malcolm 
>
> * unique-ptr.h: New file.
> ---
>  gcc/Makefile.in  |   1 +
>  gcc/selftest-run-tests.c |   1 +
>  gcc/selftest.h   |   1 +
>  gcc/unique-ptr-tests.cc  | 177 ++
>  include/unique-ptr.h | 386 
> +++
>  5 files changed, 566 insertions(+)
>  create mode 100644 gcc/unique-ptr-tests.cc
>  create mode 100644 include/unique-ptr.h
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 878ce7b..2809619 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1568,6 +1568,7 @@ OBJS = \
> tree-vrp.o \
> tree.o \
> typed-splay-tree.o \
> +   unique-ptr-tests.o \
> valtrack.o \
> value-prof.o \
> var-tracking.o \
> diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
> index 30e476d..b05e0fc 100644
> --- a/gcc/selftest-run-tests.c
> +++ b/gcc/selftest-run-tests.c
> @@ -66,6 +66,7 @@ selftest::run_tests ()
>sreal_c_tests ();
>fibonacci_heap_c_tests ();
>typed_splay_tree_c_tests ();
> +  unique_ptr_tests_cc_tests ();
>
>/* Mid-level data structures.  */
>input_c_tests ();
> diff --git a/gcc/selftest.h b/gcc/selftest.h
> index 96eccac..adc0b68 100644
> --- a/gcc/selftest.h
> +++ b/gcc/selftest.h
> @@ -194,6 +194,7 @@ extern void store_merging_c_tests ();
>  extern void typed_splay_tree_c_tests ();
>  extern void tree_c_tests ();
>  extern void tree_cfg_c_tests ();
> +extern void unique_ptr_tests_cc_tests ();
>  extern void vec_c_tests ();
>  extern void wide_int_cc_tests ();
>  extern void predict_c_tests ();
> diff --git a/gcc/unique-ptr-tests.cc b/gcc/unique-ptr-tests.cc
> new file mode 100644
> index 000..df18467
> --- /

Re: [PATCH, rs6000] GIMPLE folding for vector compares

2017-10-13 Thread Richard Biener
On Thu, Oct 12, 2017 at 10:03 PM, Will Schmidt
 wrote:
> Hi,
>
> Add support for gimple folding of vec_cmp_{eq,ge,gt,le,ne} for
> the integer data types.
>
> This adds a handful of entries to the switch statement in 
> builtin_function_type
> for those builtins having unsigned arguments.
>
> Three entries are added to vsx.md to enable vcmpne[bhw] instruction, where we
> would otherwise generate a vcmpeq + vnor.
>
> This patch requires the previously posted "allow integer return type from 
> vector compares" patch.
>
> A handful of existing tests required updates to their specified optimization
> levels to continue to generate the desired code.  builtins-3-p9.c in 
> particular
> has been updated to reflect improved code gen with the higher specified
> optimization level.   Testcase coverage is otherwise handled by the 
> already-in-tree
> gcc.target/powerpc/fold-vec-cmp-*.c tests.
>
> Tested OK on P6 and newer. OK for trunk?
>
> Thanks,
> -Will
>
> [gcc]
>
> 2017-10-12  Will Schmidt  
>
> * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support for
> folding of vector compares.  (builtin_function_type) Add compare
> builtins to the list of functions having unsigned arguments.
> * config/rs6000/vsx.md:  Add vcmpne{b,h,w} instructions.
>
> [testsuite]
>
> 2017-10-12  Will Schmidt  
>
> * gcc.target/powerpc/builtins-3-p9.c: Add -O1, update
> expected codegen checks.
> * gcc.target/powerpc/vec-cmp-sel.c: Mark vars as volatile.
> * gcc.target/powerpc/vsu/vec-cmpne-0.c: Add -O1.
> * gcc.target/powerpc/vsu/vec-cmpne-1.c: Add -O1.
> * gcc.target/powerpc/vsu/vec-cmpne-2.c: Add -O1.
> * gcc.target/powerpc/vsu/vec-cmpne-3.c: Add -O1.
> * gcc.target/powerpc/vsu/vec-cmpne-4.c: Add -O1.
> * gcc.target/powerpc/vsu/vec-cmpne-5.c: Add -O1.
> * gcc.target/powerpc/vsu/vec-cmpne-6.c: Add -O1.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 12ddd97..7e73239 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -16605,17 +16605,93 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> *gsi)
>build_int_cst (arg2_type, 0)), 
> arg0);
>  gimple_set_location (g, loc);
>  gsi_replace (gsi, g, true);
>  return true;
>}
> +/* Vector compares (integer); EQ, NE, GE, GT, LE.  */
> +case ALTIVEC_BUILTIN_VCMPEQUB:
> +case ALTIVEC_BUILTIN_VCMPEQUH:
> +case ALTIVEC_BUILTIN_VCMPEQUW:
> +case P8V_BUILTIN_VCMPEQUD:
> +  {
> +   arg0 = gimple_call_arg (stmt, 0);
> +   arg1 = gimple_call_arg (stmt, 1);
> +   lhs = gimple_call_lhs (stmt);
> +   gimple *g = gimple_build_assign (lhs, EQ_EXPR, arg0, arg1);

As said elsewhere this needs to become either

  tree ctype = build_same_sized_truth_vector_type (TREE_TYPE (lhs));
  gimple_build_assign (make_ssa_name (ctype), EQ_EXPR, arg0, arg1)
  gimple_build_assign (lhs, VIEW_CONVERT_EXPR, lhs above);

(eventually the VCE can be elided - try) or

  gimple_build_assign (lhs, VEC_COND_EXPR,
 fold_build2 (EQ_EXPR, ctype, arg0, arg1),
 vector-with-trues, vector-with-falses);

depending on what your target can expand.


> +   gimple_set_location (g, gimple_location (stmt));
> +   gsi_replace (gsi, g, true);
> +   return true;
> +  }
> +case P9V_BUILTIN_CMPNEB:
> +case P9V_BUILTIN_CMPNEH:
> +case P9V_BUILTIN_CMPNEW:
> +  {
> +   arg0 = gimple_call_arg (stmt, 0);
> +   arg1 = gimple_call_arg (stmt, 1);
> +   lhs = gimple_call_lhs (stmt);
> +   gimple *g = gimple_build_assign (lhs, NE_EXPR, arg0, arg1);
> +   gimple_set_location (g, gimple_location (stmt));
> +   gsi_replace (gsi, g, true);
> +   return true;
> +  }
> +case VSX_BUILTIN_CMPGE_16QI:
> +case VSX_BUILTIN_CMPGE_U16QI:
> +case VSX_BUILTIN_CMPGE_8HI:
> +case VSX_BUILTIN_CMPGE_U8HI:
> +case VSX_BUILTIN_CMPGE_4SI:
> +case VSX_BUILTIN_CMPGE_U4SI:
> +case VSX_BUILTIN_CMPGE_2DI:
> +case VSX_BUILTIN_CMPGE_U2DI:
> +  {
> +   arg0 = gimple_call_arg (stmt, 0);
> +   arg1 = gimple_call_arg (stmt, 1);
> +   lhs = gimple_call_lhs (stmt);
> +   gimple *g = gimple_build_assign (lhs, GE_EXPR, arg0, arg1);
> +   gimple_set_location (g, gimple_location (stmt));
> +   gsi_replace (gsi, g, true);
> +   return true;
> +  }
> +case ALTIVEC_BUILTIN_VCMPGTSB:
> +case ALTIVEC_BUILTIN_VCMPGTUH:
> +case ALTIVEC_BUILTIN_VCMPGTSH:
> +case ALTIVEC_BUILTIN_VCMPGTUW:
> +case ALTIVEC_BUILTIN_VCMPGTSW:
> +case ALTIVEC_BUILTIN_VCMPGTUB:
> +case P8V_BUILTIN_VCMPGTUD:
> +case P8V_BUILTIN_VCMPGTSD:
> +  {
> +   arg0 = gimple_call_arg (stmt, 0);
> +   arg1 = gimple_call_arg (stmt, 1);
> +   lhs = gimple_call_lhs (stmt);
> +   gimple *g = gimple_build_assign (lhs, GT_EXPR, arg0, arg1);
> +   gimple_s

Re: [PATCH 4/9] [SFN] introduce statement frontier notes, still disabled

2017-10-13 Thread Richard Biener
On Fri, Oct 13, 2017 at 9:21 AM, Alexandre Oliva  wrote:
> On Oct  9, 2017, Richard Biener  wrote:
>
>> The middle-end changes are ok.   The C FE change looks reasonable,
>> I'd appreciate a 2nd look at the C++ FE changes by a maintainer.
>
> https://gcc.gnu.org/ml/gcc-patches/2017-09/msg02026.html
>
>
> Thanks a lot for the reviews!
>
> I've held off installing the approved patches and parts of this patch
> for the time being, awaiting the second look you requested (copying the
> C++ maintainers to get their attention), and also reviews for the
> subsequent patches (2 LVU and 2 IEPM).  Please let me know if you'd
> prefer me to go ahead and install the approved bits.

If the [SFN] is self-contained you can install that part once the approval
for the FE parts is in.  You can of course wait a bit for more reviews
(stopped short on LVU because of that all-targets touching patch ... ;))

Richard.

> --
> Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
> You must be the change you wish to see in the world. -- Gandhi
> Be Free! -- http://FSFLA.org/   FSF Latin America board member
> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[PATCH C++] Fix PR82357 - bogus "cannot bind bitfield" error

2017-10-13 Thread Markus Trippelsdorf
r253266 introduced a bogus "cannot bind bitfield" error that breaks
building Chromium and Node.js.
Fix by removing the ugly goto.

Tested on ppc64le.
Ok for trunk?
Thanks.

PR c++/82357
* typeck.c (build_static_cast): Handle processing_template_decl
without using goto.

diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 08b2ae555e63..00688a21421c 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -7059,9 +7059,8 @@ build_static_cast (tree type, tree oexpr, tsubst_flags_t 
complain)
 
   bool dependent = (dependent_type_p (type)
|| type_dependent_expression_p (expr));
-  if (dependent)
+  if (dependent || processing_template_decl)
 {
-tmpl:
   expr = build_min (STATIC_CAST_EXPR, type, oexpr);
   /* We don't know if it will or will not have side effects.  */
   TREE_SIDE_EFFECTS (expr) = 1;
@@ -7084,8 +7083,6 @@ build_static_cast (tree type, tree oexpr, tsubst_flags_t 
complain)
  maybe_warn_about_useless_cast (type, expr, complain);
  maybe_warn_about_cast_ignoring_quals (type, complain);
}
-  if (processing_template_decl)
-   goto tmpl;
   return result;
 }
 
diff --git a/gcc/testsuite/g++.dg/template/bitfield4.C 
b/gcc/testsuite/g++.dg/template/bitfield4.C
new file mode 100644
index ..d53b0d406275
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/bitfield4.C
@@ -0,0 +1,6 @@
+// PR c++/82357
+
+template  struct A {
+  A() { x |= 0; } // { dg-bogus "cannot bind bitfield" }
+  int x : 8;
+};
-- 
Markus


Re: [PATCH] Improve rotate fold-const pattern matching (PR target/82498)

2017-10-13 Thread Richard Biener
On Fri, 13 Oct 2017, Marc Glisse wrote:

> On Fri, 13 Oct 2017, Richard Biener wrote:
> 
> > On Thu, 12 Oct 2017, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > Marc in the PR mentioned that it is not really good that the recommended
> > > rotate pattern is recognized only during forwprop1 and later, which is
> > > after
> > > einline and that inlining or early opts could have changed stuff too much
> > > so
> > > that we wouldn't recogize it anymore.
> > 
> > Hmm, but the only thing functions see is inlining early optimized bodies
> > into them and then constant propagation performed, so I don't see how
> > we could confuse the pattern in a way to be indetectable.  Also
> 
> For instance, in
> 
> unsigned f(unsigned x, int y) { return (x << y) | (x >> (-y & 31)); }
> unsigned g(unsigned x) { return f(x << 2, 3); }
> 
> if we inline f into g before optimizing f, we might combine (x << 2) << 3 into
> x << 5, which would make it harder to recognize the rotation. For some reason
> that's not happening, but similar scenarios seemed possible to me. But if the
> inliner calls early optimizations (including forwprop) on the callee before
> inlining it, then my comment is indeed nonsense.

We are always inlining early optimized bodies (unless it is the recursion
edge in a cyclic callgraph).  So we're effectively re-optimizing any
code we inline as well.

> > early inlining is performed on early optimized bodies so cost metrics
> > see rotates, not the unrecognized form.
> 
> Ah, so there's a hidden forwprop pass in einline?

No.  The early optimization pipeline runs on each function separately
and the processing is done in an order that optimizes callees before
callers.  The first step is to perform inlining into the current
function, then we optimize the result.

> unsigned f(unsigned x, int y) {
>   unsigned z = x << y;
>   return z | (x >> (-y & 31));
> }
> unsigned g(unsigned x, int y) { return f(x, y); }
> 
> After einline, g has rotate but not f, that's rather confusing.

It indeed is ;)  Non-IPA pass dumps do not show function bodies
at a "single point in time".

Richard.


[PATCH][GRAPHITE] Some TLC

2017-10-13 Thread Richard Biener

Removing a global constructor, a return value that isn't checked
and adjusting testcases that spew -Waggressive-loop-optimization
warnings when built with different options.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-10-13  Richard Biener  

* graphite-isl-ast-to-gimple.c (max_mode_int_precision,
graphite_expression_type_precision): Avoid global constructor
by moving ...
(translate_isl_ast_to_gimple::translate_isl_ast_to_gimple): Here.
(translate_isl_ast_to_gimple::graphite_expr_type): Add type
member.
(translate_isl_ast_to_gimple::translate_isl_ast_node_for): Use it.
(translate_isl_ast_to_gimple::build_iv_mapping): Likewise.
(translate_isl_ast_to_gimple::graphite_create_new_guard): Likewise.
* graphite-sese-to-poly.c (build_original_schedule): Return nothing.

* gcc.dg/graphite/scop-10.c: Enlarge array to avoid undefined
behavior.
* gcc.dg/graphite/scop-7.c: Likewise.
* gcc.dg/graphite/scop-8.c: Likewise.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253707)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -58,15 +58,6 @@ along with GCC; see the file COPYING3.
 #include "tree-ssa.h"
 #include "graphite.h"
 
-/* We always try to use signed 128 bit types, but fall back to smaller types
-   in case a platform does not provide types of these sizes. In the future we
-   should use isl to derive the optimal type for each subexpression.  */
-
-static int max_mode_int_precision =
-  GET_MODE_PRECISION (int_mode_for_size (MAX_FIXED_MODE_SIZE, 0).require ());
-static int graphite_expression_type_precision = 128 <= max_mode_int_precision ?
-   128 : max_mode_int_precision;
-
 struct ast_build_info
 {
   ast_build_info()
@@ -143,8 +134,7 @@ enum phi_node_kind
 class translate_isl_ast_to_gimple
 {
  public:
-  translate_isl_ast_to_gimple (sese_info_p r)
-: region (r), codegen_error (false) { }
+  translate_isl_ast_to_gimple (sese_info_p r);
   edge translate_isl_ast (loop_p context_loop, __isl_keep isl_ast_node *node,
  edge next_e, ivs_params &ip);
   edge translate_isl_ast_node_for (loop_p context_loop,
@@ -235,8 +225,24 @@ private:
 
   /* A vector of all the edges at if_condition merge points.  */
   auto_vec merge_points;
+
+  tree graphite_expr_type;
 };
 
+translate_isl_ast_to_gimple::translate_isl_ast_to_gimple (sese_info_p r)
+  : region (r), codegen_error (false)
+{
+  /* We always try to use signed 128 bit types, but fall back to smaller types
+ in case a platform does not provide types of these sizes. In the future we
+ should use isl to derive the optimal type for each subexpression.  */
+  int max_mode_int_precision
+= GET_MODE_PRECISION (int_mode_for_size (MAX_FIXED_MODE_SIZE, 0).require 
());
+  int graphite_expr_type_precision
+= 128 <= max_mode_int_precision ?  128 : max_mode_int_precision;
+  graphite_expr_type
+= build_nonstandard_integer_type (graphite_expr_type_precision, 0);
+}
+
 /* Return the tree variable that corresponds to the given isl ast identifier
expression (an isl_ast_expr of type isl_ast_expr_id).
 
@@ -702,8 +708,7 @@ translate_isl_ast_node_for (loop_p conte
edge next_e, ivs_params &ip)
 {
   gcc_assert (isl_ast_node_get_type (node) == isl_ast_node_for);
-  tree type
-= build_nonstandard_integer_type (graphite_expression_type_precision, 0);
+  tree type = graphite_expr_type;
 
   isl_ast_expr *for_init = isl_ast_node_for_get_init (node);
   tree lb = gcc_expression_from_isl_expression (type, for_init, ip);
@@ -742,8 +747,7 @@ build_iv_mapping (vec iv_map, gimp
   for (i = 1; i < isl_ast_expr_get_op_n_arg (user_expr); i++)
 {
   arg_expr = isl_ast_expr_get_op_arg (user_expr, i);
-  tree type =
-   build_nonstandard_integer_type (graphite_expression_type_precision, 0);
+  tree type = graphite_expr_type;
   tree t = gcc_expression_from_isl_expression (type, arg_expr, ip);
 
   /* To fail code generation, we generate wrong code until we discard it.  
*/
@@ -841,8 +845,7 @@ edge translate_isl_ast_to_gimple::
 graphite_create_new_guard (edge entry_edge, __isl_take isl_ast_expr *if_cond,
   ivs_params &ip)
 {
-  tree type =
-build_nonstandard_integer_type (graphite_expression_type_precision, 0);
+  tree type = graphite_expr_type;
   tree cond_expr = gcc_expression_from_isl_expression (type, if_cond, ip);
 
   /* To fail code generation, we generate wrong code until we discard it.  */
Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 253707)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -1194,7 +1194,7 @@ build_schedule_loop_nest (scop_p scop, i
 

RE: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation

2017-10-13 Thread Tsimbalist, Igor V
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Friday, October 13, 2017 10:02 AM
> To: Tsimbalist, Igor V 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> 
> On Thu, Oct 12, 2017 at 8:54 PM, Tsimbalist, Igor V
>  wrote:
> > Attached is an updated patch according to your comments. New tests are
> > added to test ICF optimization in presence of nocf_check attribute.
> --- a/gcc/testsuite/c-c++-common/fcf-protection-2.c
> +++ b/gcc/testsuite/c-c++-common/fcf-protection-2.c
> @@ -1,4 +1,4 @@
>  /* { dg-do compile } */
>  /* { dg-options "-fcf-protection=branch" } */
> -/* { dg-error "'-fcf-protection=branch' is not supported for this target" "" 
> {
> target { "i?86-*-* x86_64-*-*" } } 0 } */
> +/* { dg-error "'-fcf-protection=branch' requires CET support on this
> target. Use -mcet or one of -mibt, -mshstk options to enable CET" "" { target 
> {
> "i?86-*-* x86_64-*-*" } } 0 } */
> 
> Checking for "-fcf-protection=branch' requires CET support on this target"
> should be enough. No need to check the whole message here and in other
> tests.

Fixed as you suggested. Also shortened the checking string for ignoring the
attribute in attr-nocf-check-1.c and attr-nocf-check-3.c.

>  /* { dg-error "'-fcf-protection=branch' is not supported for this target" "" 
> {
> target { ! "i?86-*-* x86_64-*-*" } } 0 } */ diff --git a/gcc/testsuite/c-c++-
> common/fcf-protection-3.c
> b/gcc/testsuite/c-c++-common/fcf-protection-3.c
> 
> 
> --- a/gcc/testsuite/c-c++-common/fcf-protection-4.c
> +++ b/gcc/testsuite/c-c++-common/fcf-protection-4.c
> @@ -1,4 +1,4 @@
>  /* { dg-do compile } */
>  /* { dg-options "-fcf-protection=none" } */
> -/* { dg-bogus "'-fcf-protection=none' is not supported for this target" "" {
> target { "i?86-*-* x86_64-*-*" } } 0 } */
> +/* { dg-bogus "'-fcf-protection=none' res CET support on this target.
> Use -mcet or one of -mibt, -mshstk options to enable CET" "" { target { "i?86-
> *-* x86_64-*-*" } } 0 } */
>  /* { dg-bogus "'-fcf-protection=none' is not supported for this target" "" {
> target { ! "i?86-*-* x86_64-*-*" } } 0 } */ diff --git a/gcc/testsuite/c-c++-
> common/fcf-protection-5.c
> b/gcc/testsuite/c-c++-common/fcf-protection-5.c
> 
> The above test checks for bogus messages? -fcf-protection=none option
> should not generate any messages. So, the test should check that -fcf-
> protection=none doesn't generate any error. (And, there is a typo in the
> message, /s/res/requires.)

The gcc documentation says about dg-bogus

This DejaGnu directive appears on a source line that should not get a message
matching regexp...

I decided to use dg-bogus to check the absence of the error. Now I removed both
lines as any additional messages should be caught as an extra messages. Actually
I will update the fcf-protection-4.c test in the generic patch.

Updated patch is attached. 

Igor

> Uros.
> 
> > Igor
> >
> >
> >> -Original Message-
> >> From: Tsimbalist, Igor V
> >> Sent: Tuesday, September 19, 2017 11:30 PM
> >> To: Uros Bizjak 
> >> Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
> >> 
> >> Subject: RE: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> >>
> >> > -Original Message-
> >> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> > ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> >> > Sent: Tuesday, September 19, 2017 6:13 PM
> >> > To: Tsimbalist, Igor V 
> >> > Cc: gcc-patches@gcc.gnu.org
> >> > Subject: Re:
> >> > 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> >> >
> >> > On Tue, Sep 19, 2017 at 5:18 PM, Tsimbalist, Igor V
> >> >  wrote:
> >> > >> -Original Message-
> >> > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> > >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> >> > >> Sent: Monday, September 18, 2017 12:17 PM
> >> > >> To: gcc-patches@gcc.gnu.org
> >> > >> Cc: Tsimbalist, Igor V ;
> >> > >> Tsimbalist, Igor V 
> >> > >> Subject: Re:
> >> > >> 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> >> > >>
> >> > >> Hello!
> >> > >>
> >> > >> > gcc/testsuite/
> >> > >> >
> >> > >> > * g++.dg/cet-notrack-1.C: New test.
> >> > >> > * gcc.target/i386/cet-intrin-1.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-10.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-2.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-3.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-4.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-5.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-6.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-7.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-8.c: Likewise.
> >> > >> > * gcc.target/i386/cet-intrin-9.c: Likewise.
> >> > >> > * gcc.target/i386/cet-label.c: Likewise.
> >> > >> > * gcc.target/i386/cet-notrack-1a.c: Likewise.
> >> > >> > * gcc.target/i386/cet-notrack-1b.c: Likewise.
> >> > >> > * gcc.target/i386/cet-notrack-2a.c: Likewise.
> >> > >> > * gcc.target/i386/

[PATCH][GRAPHITE] Fix SSA update

2017-10-13 Thread Richard Biener

This is something I wanted to do later just as compile-time optimization
but it turns out it is necessary for correctness if we want to keep
the current order of creating SCOPs and analyzing data references and
parameters and only after that code-generating SCOPs that were optimized.

This is because what SSA names are the parameters really depens on the
IL and liveout PHIs for transformed SESE regions changes the situation
enough that use "stale" information.

Of course the issue isn't one if we do the transform in "one step"
because update-SSA can just cope with that.

In the process I simplified the main graphite function to inline
the initialize/finalize stuff, removing a weird parameter that
ended up PASSing gcc.dg/graphite/pr35356-3.c (with just one loop)
so I XFAILed that again.

I've added gcc.dg/graphite/pr81373-2.c which ICEs before this patch.

Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC is happy,
applied.

Richard.

2017-10-13  Richard Biener  

* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::get_rename_from_scev): Remove unused
parameters and dominance check.
(translate_isl_ast_to_gimple::graphite_copy_stmts_from_block): Adjust.
(translate_isl_ast_to_gimple::copy_bb_and_scalar_dependences): Likewise.
(translate_isl_ast_to_gimple::graphite_regenerate_ast_isl):
Do not update SSA form here or do intermediate IL verification.
* graphite.c: Include tree-ssa.h and tree-into-ssa.h.
(graphite_initialize): Remove check on the number of loops in
the function and inline into graphite_transform_loops.
(graphite_finalize): Inline into graphite_transform_loops.
(graphite_transform_loops): Perform SSA update and IL verification
here.
* params.def (PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION): Remove.

* gcc.dg/graphite/pr35356-3.c: XFAIL again.
* gcc.dg/graphite/pr81373-2.c: Copy from gcc.dg/graphite/pr81373.c
with alternate flags.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253719)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -189,7 +189,6 @@ class translate_isl_ast_to_gimple
   __isl_give isl_ast_node * scop_to_isl_ast (scop_p scop);
 
   tree get_rename_from_scev (tree old_name, gimple_seq *stmts, loop_p loop,
-basic_block new_bb, basic_block old_bb,
 vec iv_map);
   bool graphite_copy_stmts_from_block (basic_block bb, basic_block new_bb,
   vec iv_map);
@@ -1084,7 +1083,6 @@ gsi_insert_earliest (gimple_seq seq)
 
 tree translate_isl_ast_to_gimple::
 get_rename_from_scev (tree old_name, gimple_seq *stmts, loop_p loop,
- basic_block new_bb, basic_block,
  vec iv_map)
 {
   tree scev = scalar_evolution_in_region (region->region, loop, old_name);
@@ -1113,16 +,6 @@ get_rename_from_scev (tree old_name, gim
   return build_zero_cst (TREE_TYPE (old_name));
 }
 
-  if (TREE_CODE (new_expr) == SSA_NAME)
-{
-  basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (new_expr));
-  if (bb && !dominated_by_p (CDI_DOMINATORS, new_bb, bb))
-   {
- set_codegen_error ();
- return build_zero_cst (TREE_TYPE (old_name));
-   }
-}
-
   /* Replace the old_name with the new_expr.  */
   return force_gimple_operand (unshare_expr (new_expr), stmts,
   true, NULL_TREE);
@@ -1245,8 +1233,7 @@ graphite_copy_stmts_from_block (basic_bl
  {
gimple_seq stmts = NULL;
new_name = get_rename_from_scev (old_name, &stmts,
-bb->loop_father,
-new_bb, bb, iv_map);
+bb->loop_father, iv_map);
if (! codegen_error_p ())
  gsi_insert_earliest (stmts);
new_expr = &new_name;
@@ -1361,7 +1348,7 @@ copy_bb_and_scalar_dependences (basic_bl
  gimple_seq stmts = NULL;
  tree new_name = get_rename_from_scev (arg, &stmts,
bb->loop_father,
-   new_bb, bb, iv_map);
+   iv_map);
  if (! codegen_error_p ())
gsi_insert_earliest (stmts);
  arg = new_name;
@@ -1567,17 +1554,6 @@ graphite_regenerate_ast_isl (scop_p scop
 if_region->true_region->region.exit);
   if (dump_file)
fprintf (dump_file, "[codegen] isl AST to Gimple succeeded.\n");
-
-  mark_virtual_operands_for_renaming (cfun);
-  update_ssa (TODO_update_ssa);
-  checking_verify_ssa (true, true);
-

Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-13 Thread Martin Liška
On 10/12/2017 01:34 PM, Jakub Jelinek wrote:
> On Thu, Oct 12, 2017 at 01:13:56PM +0200, Martin Liška wrote:
>> +  if (a1 == a2)
>> +return;
>> +
>> +  uptr shadow_offset1, shadow_offset2;
>> +  bool valid1, valid2;
>> +  {
>> +ThreadRegistryLock l(&asanThreadRegistry());
>> +
>> +valid1 = GetStackVariableBeginning(a1, &shadow_offset1);
>> +valid2 = GetStackVariableBeginning(a2, &shadow_offset2);
>> +  }
>> +
>> +  if (valid1 && valid2) {
>> +if (shadow_offset1 == shadow_offset2)
>> +  return;
>>}
>> +  else if (!valid1 && !valid2) {
>> +AsanChunkView chunk1 = FindHeapChunkByAddress(a1);
>> +AsanChunkView chunk2 = FindHeapChunkByAddress(a2);
>> +valid1 = chunk1.IsAllocated();
>> +valid2 = chunk2.IsAllocated();
>> +
>> +if (valid1 && valid2) {
>> +  if (chunk1.Eq(chunk2))
>> +return;
>> +}
>> +else if (!valid1 && !valid2) {
>> +  uptr offset = a1 < a2 ? a2 - a1 : a1 - a2;
>> +  uptr left = a1 < a2 ? a1 : a2;
>> +  if (offset <= 2048) {
>> +if (__asan_region_is_poisoned (left, offset) == 0)
>> +  return;
>> +else {
>> +  GET_CALLER_PC_BP_SP;
>> +  ReportInvalidPointerPair(pc, bp, sp, a1, a2);
>> +  return;
>> +}
> 
> My assumption was that this offset/left stuff would be done
> right after the if (a1 == a2) above, i.e. after the most common
> case - equality comparison, handle the case of pointers close to each other
> without any expensive locking and data structure lookup (also, you only need
> to compute left if offset is <= 2048).  Only for larger distances, you'd
> go through the other cases.  I.e. see if one or both pointers point
> into a stack variable, or heap, or global registered variable.
> 
> For those 3 cases, I wonder if pairs of valid1 = ...; valid2 = ...;
> is what is most efficient.  What we really want is to error out if
> one pointer is valid in any of the categories, but the other is
> not.  So, isn't the best general algorithm something like:
>   if (a1 == a2)
> return;
>   if (distance <= 2048)
> {
>   if (not poisoned area)
>   return;
> }
>   else if (heap (a1))
> {
>   if (heap (a2) && same_heap_object)
>   return;
> }
>   else if (stackvar (a1))
> {
>   if (stackvar (a2) && same_stackvar)
>   return;
> }
>   else if (globalvar (a1))
> {
>   if (globalvar (a2) && same_globalvar)
> return;
> }
>   else
> return; /* ??? We don't know what the pointers point to, punt.
>Or perhaps try the not poisoned area check even for
>slightly larger distances (like 16K) and only punt
>for larger?  */
>   error;
> 
> ?  What order of the a1 tests should be done depends on how expensive
> those tests are and perhaps some data gathering on real-world apps
> on what pointers in the comparisons/subtracts are most common.
> 
>   Jakub
> 

Thanks Jakub for valuable feedback. I'm sending new version where I implemented 
what you pointed.
I also moved builtin created to FE and fixed quite some bugs I seen on postgres 
database.

I guess I should slowly start with review process of libsanitizer changes. They 
are quite some.

Martin
>From 28401b05e8da21e57c8c0388fcc71fcbf34e0002 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Oct 2017 12:14:25 +0200
Subject: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

gcc/ChangeLog:

2017-10-13  Martin Liska  

	* doc/invoke.texi: Document the options.
	* flag-types.h (enum sanitize_code): Add
	SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
	* ipa-inline.c (sanitize_attrs_match_for_inline_p): Add handling
	of SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
	* opts.c: Define new sanitizer options.
	* sanitizer.def (BUILT_IN_ASAN_POINTER_COMPARE): Likewise.
	(BUILT_IN_ASAN_POINTER_SUBTRACT): Likewise.

gcc/c/ChangeLog:

2017-10-13  Martin Liska  

	* c-typeck.c (pointer_diff): Add new argument and instrument
	pointer subtraction.
	(build_binary_op): Similar for pointer comparison.

gcc/testsuite/ChangeLog:

2017-10-13  Martin Liska  

	* gcc.dg/asan/pointer-compare-1.c: New test.
	* gcc.dg/asan/pointer-compare-2.c: New test.
	* gcc.dg/asan/pointer-subtract-1.c: New test.
	* gcc.dg/asan/pointer-subtract-2.c: New test.
---
 gcc/c/c-typeck.c   | 31 +--
 gcc/doc/invoke.texi| 22 
 gcc/flag-types.h   |  2 +
 gcc/ipa-inline.c   |  8 ++-
 gcc/opts.c | 15 +
 gcc/sanitizer.def  |  4 ++
 gcc/testsuite/gcc.dg/asan/pointer-compare-1.c  | 77 ++
 gcc/testsuite/gcc.dg/asan/pointer-compare-2.c  | 72 
 gcc/testsuite/gcc.dg/asan/pointer-subtract-1.c | 41 ++
 gcc/testsuite/gcc.dg/asan/pointer-subtract-2.c | 32 +++
 libsanitizer/asan/asan_descriptions.cc | 29 ++
 lib

Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-13 Thread Jakub Jelinek
On Fri, Oct 13, 2017 at 01:01:37PM +0200, Martin Liška wrote:
> @@ -3826,6 +3827,18 @@ pointer_diff (location_t loc, tree op0, tree op1)
>  pedwarn (loc, OPT_Wpointer_arith,
>"pointer to a function used in subtraction");
>  
> +  if (sanitize_flags_p (SANITIZE_POINTER_SUBTRACT))
> +{
> +  gcc_assert (current_function_decl != NULL_TREE);
> +
> +  op0 = c_fully_fold (op0, false, NULL);
> +  op1 = c_fully_fold (op1, false, NULL);
> +
> +  tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_SUBTRACT);
> +  *instrument_expr
> + = build_call_expr_loc (loc, tt, 2, op0, op1);
> +}

If op0 or op1 have some embedded side-effects, won't you evaluate them
multiple times?  I'd expect you need to c_save_expr them and make sure the
actual pointer difference expression uses the result of the save expr too.

> --- a/libsanitizer/asan/asan_report.cc
> +++ b/libsanitizer/asan/asan_report.cc
> @@ -344,14 +344,68 @@ static INLINE void CheckForInvalidPointerPair(void *p1, 
> void *p2) {
>if (!flags()->detect_invalid_pointer_pairs) return;
>uptr a1 = reinterpret_cast(p1);
>uptr a2 = reinterpret_cast(p2);
> -  AsanChunkView chunk1 = FindHeapChunkByAddress(a1);
> -  AsanChunkView chunk2 = FindHeapChunkByAddress(a2);
> -  bool valid1 = chunk1.IsAllocated();
> -  bool valid2 = chunk2.IsAllocated();
> -  if (!valid1 || !valid2 || !chunk1.Eq(chunk2)) {
> -GET_CALLER_PC_BP_SP;
> -return ReportInvalidPointerPair(pc, bp, sp, a1, a2);
> +
> +  if (a1 == a2)
> +return;
> +
> +  uptr offset = a1 < a2 ? a2 - a1 : a1 - a2;
> +  uptr left = a1 < a2 ? a1 : a2;
> +  uptr right = a1 < a2 ? a2 : a1;
> +  if (offset <= 2048) {
> +if (__asan_region_is_poisoned (left, offset) == 0)
> +  return;
> +  }

If offset <= 2048 and something is poisoned in between, then it is
clearly a failure, so you should either goto the error or duplicate
the two lines inside of the outer if above.

> +
> +  uptr shadow_offset1, shadow_offset2;
> +  bool valid1, valid2;
> +  {
> +ThreadRegistryLock l(&asanThreadRegistry());
> +valid1 = GetStackVariableBeginning(left, &shadow_offset1);
>}
> +
> +  // check whether a1 is a stack memory pointer
> +  if (valid1) {
> +{
> +  ThreadRegistryLock l(&asanThreadRegistry());
> +  valid2 = GetStackVariableBeginning(right - 1, &shadow_offset2);

Why do you take the registery lock twice?  Just do:
  {
ThreadRegistryLock l(&asanThreadRegistry());
if (GetStackVariableBeginning(left, &shadow_offset1))
  {
if (GetStackVariableBeginning(right - 1, &shadow_offset2) &&
shadow_offset1 == shadow_offset2)
  return;
// goto do_error; or:
GET_CALLER_PC_BP_SP;
ReportInvalidPointerPair(pc, bp, sp, a1, a2);
return;
  }
  }

> +}
> +
> +if (valid2 && shadow_offset1 == shadow_offset2)
> +  return;
> +  }
> +  // check whether a1 is a heap memory address
> +  else {
> +HeapAddressDescription hdesc1;
> +valid1 = GetHeapAddressInformation(a1, 0, &hdesc1);
> +
> +if (valid1 && hdesc1.chunk_access.access_type == kAccessTypeInside) {
> +  HeapAddressDescription hdesc2;
> +  valid2 = GetHeapAddressInformation(a2, 0, &hdesc2);

Again, no need for valid1/valid2.  Why do you use above left and right - 1
and here a1 and a2?  Shouldn't it be always left and right - 1?

Jakub


[PATCH] Tree structure marking

2017-10-13 Thread Nathan Sidwell

In figuring out a problem with CODE_CONTAINS_STRUCT I noticed that:

1) the tree_contains_struct array is unsigned char.  bool seems a better 
choice now we're in C++-land.


2) the MARK_TS_FOO macros used the 'do ... while (0)' idiom.  But 
there's no need for such verbosity.  These are 'ary[index] = true' 
setters -- other setters don't use  do while:

#define SET_DECL_ASSEMBLER_NAME(NODE, NAME) \
  (DECL_ASSEMBLER_NAME_RAW (NODE) = (NAME))

We can combine them with the comma operator, not make separate statements.

Fixed thusly, ok?

nathan
--
Nathan Sidwell
2017-10-13  Nathan Sidwell  

	* tree-core.h (tree_contains_struct): Make bool.
	* tree.c (tree_contains_struct): Likewise.
	* tree.h (MARK_TS_BASE): Remove do ... while (0) idiom.
	(MARK_TS_TYPED, MARK_TS_COMMON, MARK_TS_TYPE_COMMON,
	MARK_TS_TYPE_WITH_LANG_SPECIFIC, MARK_TS_DECL_MINIMAL,
	MARK_TS_DECL_COMMON, MARK_TS_DECL_WRTL, MARK_TS_DECL_WITH_VIS,
	MARK_TS_DECL_NON_COMMON): Likewise, use comma operator.

Index: tree-core.h
===
--- tree-core.h	(revision 253695)
+++ tree-core.h	(working copy)
@@ -2058,7 +2058,7 @@ struct floatn_type_info {
 Global variables
 ---*/
 /* Matrix describing the structures contained in a given tree code.  */
-extern unsigned char tree_contains_struct[MAX_TREE_CODES][64];
+extern bool tree_contains_struct[MAX_TREE_CODES][64];
 
 /* Class of tree given its code.  */
 extern const enum tree_code_class tree_code_type[];
Index: tree.c
===
--- tree.c	(revision 253695)
+++ tree.c	(working copy)
@@ -259,7 +259,7 @@ tree integer_types[itk_none];
 bool int_n_enabled_p[NUM_INT_N_ENTS];
 struct int_n_trees_t int_n_trees [NUM_INT_N_ENTS];
 
-unsigned char tree_contains_struct[MAX_TREE_CODES][64];
+bool tree_contains_struct[MAX_TREE_CODES][64];
 
 /* Number of operands for each OpenMP clause.  */
 unsigned const char omp_clause_num_ops[] =
Index: tree.h
===
--- tree.h	(revision 253695)
+++ tree.h	(working copy)
@@ -76,64 +76,43 @@ as_internal_fn (combined_fn code)
 
 /* Macros for initializing `tree_contains_struct'.  */
 #define MARK_TS_BASE(C)	\
-  do {			\
-tree_contains_struct[C][TS_BASE] = 1;		\
-  } while (0)
+  (tree_contains_struct[C][TS_BASE] = true)
 
 #define MARK_TS_TYPED(C)\
-  do {			\
-MARK_TS_BASE (C);	\
-tree_contains_struct[C][TS_TYPED] = 1;		\
-  } while (0)
+  (MARK_TS_BASE (C),	\
+   tree_contains_struct[C][TS_TYPED] = true)
 
 #define MARK_TS_COMMON(C)\
-  do {			\
-MARK_TS_TYPED (C);	\
-tree_contains_struct[C][TS_COMMON] = 1;		\
-  } while (0)
+  (MARK_TS_TYPED (C),	\
+   tree_contains_struct[C][TS_COMMON] = true)
 
 #define MARK_TS_TYPE_COMMON(C)\
-  do {			\
-MARK_TS_COMMON (C);	\
-tree_contains_struct[C][TS_TYPE_COMMON] = 1;	\
-  } while (0)
+  (MARK_TS_COMMON (C),	\
+   tree_contains_struct[C][TS_TYPE_COMMON] = true)
 
 #define MARK_TS_TYPE_WITH_LANG_SPECIFIC(C)		\
-  do {			\
-MARK_TS_TYPE_COMMON (C);\
-tree_contains_struct[C][TS_TYPE_WITH_LANG_SPECIFIC] = 1;	\
-  } while (0)
+  (MARK_TS_TYPE_COMMON (C),\
+   tree_contains_struct[C][TS_TYPE_WITH_LANG_SPECIFIC] = true)
 
 #define MARK_TS_DECL_MINIMAL(C)\
-  do {			\
-MARK_TS_COMMON (C);	\
-tree_contains_struct[C][TS_DECL_MINIMAL] = 1;	\
-  } while (0)
+  (MARK_TS_COMMON (C),	\
+   tree_contains_struct[C][TS_DECL_MINIMAL] = true)
 
 #define MARK_TS_DECL_COMMON(C)\
-  do {			\
-MARK_TS_DECL_MINIMAL (C);\
-tree_contains_struct[C][TS_DECL_COMMON] = 1;	\
-  } while (0)
+  (MARK_TS_DECL_MINIMAL (C),\
+   tree_contains_struct[C][TS_DECL_COMMON] = true)
 
 #define MARK_TS_DECL_WRTL(C)\
-  do {			\
-MARK_TS_DECL_COMMON (C);\
-tree_contains_struct[C][TS_DECL_WRTL] = 1;		\
-  } while (0)
+  (MARK_TS_DECL_COMMON (C),\
+   tree_contains_struct[C][TS_DECL_WRTL] = true)
 
 #define MARK_TS_DECL_WITH_VIS(C)			\
-  do {			\
-MARK_TS_DECL_WRTL (C);\
-tree_contains_struct[C][TS_DECL_WITH_VIS] = 1;	\
-  } while (0)
+  (MARK_TS_DECL_WRTL (C),\
+   tree_contains_struct[C][TS_DECL_WITH_VIS] = true)
 
 #define MARK_TS_DECL_NON_COMMON(C)			\
-  do {			\
-MARK_TS_DECL_WITH_VIS (C);\
-tree_contains_struct[C][TS_DECL_NON_COMMON] = 1;	\
-  } while (0)
-
+  (MARK_TS_DECL_WITH_VIS (C),\
+   tree_contains_struct[C][TS_DECL_NON_COMMON] = true)
 
 /* Returns the string representing CLASS.  */
 


Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation

2017-10-13 Thread Uros Bizjak
On Fri, Oct 13, 2017 at 12:56 PM, Tsimbalist, Igor V
 wrote:
>> -Original Message-
>> From: Uros Bizjak [mailto:ubiz...@gmail.com]
>> Sent: Friday, October 13, 2017 10:02 AM
>> To: Tsimbalist, Igor V 
>> Cc: gcc-patches@gcc.gnu.org
>> Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
>>
>> On Thu, Oct 12, 2017 at 8:54 PM, Tsimbalist, Igor V
>>  wrote:
>> > Attached is an updated patch according to your comments. New tests are
>> > added to test ICF optimization in presence of nocf_check attribute.
>> --- a/gcc/testsuite/c-c++-common/fcf-protection-2.c
>> +++ b/gcc/testsuite/c-c++-common/fcf-protection-2.c
>> @@ -1,4 +1,4 @@
>>  /* { dg-do compile } */
>>  /* { dg-options "-fcf-protection=branch" } */
>> -/* { dg-error "'-fcf-protection=branch' is not supported for this target" 
>> "" {
>> target { "i?86-*-* x86_64-*-*" } } 0 } */
>> +/* { dg-error "'-fcf-protection=branch' requires CET support on this
>> target. Use -mcet or one of -mibt, -mshstk options to enable CET" "" { 
>> target {
>> "i?86-*-* x86_64-*-*" } } 0 } */
>>
>> Checking for "-fcf-protection=branch' requires CET support on this target"
>> should be enough. No need to check the whole message here and in other
>> tests.
>
> Fixed as you suggested. Also shortened the checking string for ignoring the
> attribute in attr-nocf-check-1.c and attr-nocf-check-3.c.
>
>>  /* { dg-error "'-fcf-protection=branch' is not supported for this target" 
>> "" {
>> target { ! "i?86-*-* x86_64-*-*" } } 0 } */ diff --git a/gcc/testsuite/c-c++-
>> common/fcf-protection-3.c
>> b/gcc/testsuite/c-c++-common/fcf-protection-3.c
>>
>>
>> --- a/gcc/testsuite/c-c++-common/fcf-protection-4.c
>> +++ b/gcc/testsuite/c-c++-common/fcf-protection-4.c
>> @@ -1,4 +1,4 @@
>>  /* { dg-do compile } */
>>  /* { dg-options "-fcf-protection=none" } */
>> -/* { dg-bogus "'-fcf-protection=none' is not supported for this target" "" {
>> target { "i?86-*-* x86_64-*-*" } } 0 } */
>> +/* { dg-bogus "'-fcf-protection=none' res CET support on this target.
>> Use -mcet or one of -mibt, -mshstk options to enable CET" "" { target { 
>> "i?86-
>> *-* x86_64-*-*" } } 0 } */
>>  /* { dg-bogus "'-fcf-protection=none' is not supported for this target" "" {
>> target { ! "i?86-*-* x86_64-*-*" } } 0 } */ diff --git a/gcc/testsuite/c-c++-
>> common/fcf-protection-5.c
>> b/gcc/testsuite/c-c++-common/fcf-protection-5.c
>>
>> The above test checks for bogus messages? -fcf-protection=none option
>> should not generate any messages. So, the test should check that -fcf-
>> protection=none doesn't generate any error. (And, there is a typo in the
>> message, /s/res/requires.)
>
> The gcc documentation says about dg-bogus
>
> This DejaGnu directive appears on a source line that should not get a message
> matching regexp...
>
> I decided to use dg-bogus to check the absence of the error. Now I removed 
> both
> lines as any additional messages should be caught as an extra messages. 
> Actually
> I will update the fcf-protection-4.c test in the generic patch.
>
> Updated patch is attached.

OK.

Thanks,
Uros.


Re: [PATCH] Tree structure marking

2017-10-13 Thread Richard Biener
On Fri, 13 Oct 2017, Nathan Sidwell wrote:

> In figuring out a problem with CODE_CONTAINS_STRUCT I noticed that:
> 
> 1) the tree_contains_struct array is unsigned char.  bool seems a better
> choice now we're in C++-land.
> 
> 2) the MARK_TS_FOO macros used the 'do ... while (0)' idiom.  But there's no
> need for such verbosity.  These are 'ary[index] = true' setters -- other
> setters don't use  do while:
> #define SET_DECL_ASSEMBLER_NAME(NODE, NAME) \
>   (DECL_ASSEMBLER_NAME_RAW (NODE) = (NAME))
> 
> We can combine them with the comma operator, not make separate statements.
> 
> Fixed thusly, ok?

OK.

Richard.

> nathan
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Add gnu::unique_ptr

2017-10-13 Thread Pedro Alves
On 10/13/2017 10:26 AM, Richard Biener wrote:
> On Fri, Oct 13, 2017 at 2:40 AM, David Malcolm  wrote:
>> From: Trevor Saunders 
>>
>> I had a go at updating Trevor's unique_ptr patch from July
>> ( https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02084.html )
>>
>> One of the sticking points was what to call the namespace; there was
>> wariness about using "gtl" as the name.
>>
>> Richi proposed (https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00155.html):
>>> If it should be short use g::.  We can also use gnu:: I guess and I
>>> agree gnutools:: is a little long (similar to libiberty::).  Maybe
>>> gt:: as a short-hand for gnutools.
>>
>> Pedro noted (https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00157.html):
>>> Exactly 3 letters has the nice property of making s/gtl::foo/std::foo/
>>> super trivial down the road; you don't have to care about reindenting
>>> stuff
>>
>> Hence this version of the patch uses "gnu::" - 3 letters, one of the
>> ones Richi proposed, and *not* a match for ".tl" (e.g. "gtl");
>> (FWIW personally "gnu::" is my favorite, followed by "gcc::").
>>
>> The include/unique-ptr.h in this patch is identical to that posted
>> by Trevor in July, with the following changes (by me):
>> - renaming of "gtl" to "gnu"
>> - renaming of DEFINE_GDB_UNIQUE_PTR to DEFINE_GNU_UNIQUE_PTR
>> - renaming of xfree_deleter to xmalloc_deleter, and making it
>>   use "free" rather than "xfree" (which doesn't exist)
>>
>> I also went and added a gcc/unique-ptr-tests.cc file containing
>> selftests (my thinking here is that although std::unique_ptr ought
>> to already be well-tested, we need to ensure that the fallback
>> implementation is sane when building with C++ prior to C++11).
>>
>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu,
>> using gcc 4.8 for the initial bootstrap (hence testing both gnu+03
>> and then gnu++14 in the selftests, for stage 1 and stages 2 and 3
>> respectively).
>>
>> I also manually tested selftests with both gcc 4.8 and trunk on
>> the same hardware (again, to exercise both the with/without C++11
>> behavior).
>>
>> Tested with "make selftest-valgrind" (no new issues).
>>
>> OK for trunk?
> 
> Ok if the gdb folks approve 

The new tests looks good to me.  [ The class too, obviously. :-) ]

(and will change to this version).

GDB used this emulation/shim in master for a period, but it
no longer does, because GDB switched to requiring
C++11 (gcc >= 4.8) instead meanwhile...  I.e., GDB uses
standard std::unique_ptr nowadays.

GDB still uses gdb::unique_xmalloc_ptr though.  Might make
sense to switch to using gnu::unique_xmalloc_ptr instead.

GDB does have an xfree function that we call instead
of free throughout (that's where the reference David fixed
comes from).  We can most probably just switch to free too nowadays.

Also, gdb nowadays also has a gdb::unique_xmalloc_ptr specialization
that this patch is missing (because the specialization was added
after switching to C++11):

  [pushed] Allow gdb::unique_xmalloc_ptr
  https://sourceware.org/ml/gdb-patches/2017-08/msg00212.html

So we'd need that before switching.  I don't recall off hand whether
it's easy to make that work with the C++03 version.

Thanks,
Pedro Alves



RE: [PATCH 13/22] Enable building libstdc++-v3 with Intel CET

2017-10-13 Thread Tsimbalist, Igor V
Added libstd...@gcc.gnu.org


> -Original Message-
> From: Tsimbalist, Igor V
> Sent: Thursday, October 12, 2017 10:24 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Jeff Law ; jwak...@redhat.com; Tsimbalist, Igor V
> 
> Subject: [PATCH 13/22] Enable building libstdc++-v3 with Intel CET
> 
> Enable building libstdc++v3 with CET options.
> 
> libstdc++-v3/
>   * acinclude.m4: Add cet.m4.
>   * configure.ac: Set CET_FLAGS. Update EXTRA_CFLAGS.
>   * libsupc++/Makefile.am: Add EXTRA_CFLAGS.
>   * Makefile.in: Regenerate.
>   * configure: Likewise.
>   * doc/Makefile.in: Likewise.
>   * include/Makefile.in: Likewise.
>   * libsupc++/Makefile.in: Likewise.
>   * po/Makefile.in: Likewise.
>   * python/Makefile.in: Likewise.
>   * src/Makefile.in: Likewise.
>   * src/c++11/Makefile.in: Likewise.
>   * src/c++98/Makefile.in: Likewise.
>   * src/filesystem/Makefile.in: Likewise.
>   * testsuite/Makefile.in: Likewise.



[PATCH] Document -fdump-tree-vect option

2017-10-13 Thread Jonathan Wakely

This adds an item to the list of options for the -fdump-tree option,
because we show an example using 'vect' but don't document it.

OK for trunk?

Is there any logic to the order of those options? Would it makes sense
to order them alphabetically?

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 649e2e8a303..491e8f0c694 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-10-13  Jonathan Wakely  
+
+	* doc/invoke.texi (-fdump-tree): Document 'vect' option.
+
 2017-10-13  Richard Biener  
 
 	* graphite-isl-ast-to-gimple.c
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4e7dfb33c31..0ff05445143 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13323,7 +13323,7 @@ instead of the auto named dump files.  If the @samp{-@var{options}}
 form is used, @var{options} is a list of @samp{-} separated options
 which control the details of the dump.  Not all options are applicable
 to all dumps; those that are not meaningful are ignored.  The
-following options are available
+following options are available:
 
 @table @samp
 @item address
@@ -13386,6 +13386,8 @@ passes).
 @item note
 Enable other detailed optimization information (only available in
 certain passes).
+@item vect
+Enable showing vectorization information (only available in certain passes).
 @item =@var{filename}
 Instead of an auto named dump file, output into the given file
 name. The file names @file{stdout} and @file{stderr} are treated


[PATCH] PR libstdc++/82481 Suppress clang-tidy warnings

2017-10-13 Thread Jonathan Wakely

Clang-tidy correctly notices that after exiting std:call_once a couple
of globals are left pointing to local variables that go out of scope.
This isn't a problem, because the globals are only ever used by
std::call_once and it will re-init them before the next use. Zero the
variables out just for clang-tidy, to prevent the warnings.

PR libstdc++/82481
* include/std/mutex (call_once): Suppress clang-tidy warnings about
dangling references.

Tested powerpc64le-linux, committed to trunk.


commit 018813c8994b7dceab1b7d999e9c09654a22ef50
Author: Jonathan Wakely 
Date:   Fri Oct 13 12:56:07 2017 +0100

PR libstdc++/82481 Suppress clang-tidy warnings

PR libstdc++/82481
* include/std/mutex (call_once): Suppress clang-tidy warnings about
dangling references.

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 8c692a88ffd..50420ee22d4 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -688,6 +688,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __set_once_functor_lock_ptr(0);
 #endif
 
+#ifdef __clang_analyzer__
+  // PR libstdc++/82481
+  __once_callable = nullptr;
+  __once_call = nullptr;
+#endif
+
   if (__e)
__throw_system_error(__e);
 }


[PATCH] PR libstdc++/82522 overload map insert functions for rvalues (LWG 2354)

2017-10-13 Thread Jonathan Wakely

DR 2354 adds these overloads to avoid unnecessary copies when doing
map.insert({key, val})

PR libstdc++/82522
* doc/xml/manual/intro.xml: Document LWG 2354 changes.
* include/bits/stl_map.h (map::insert(value_type&&))
(map::insert(const_iterator, value_type&&)): Add overload for rvalues.
* include/bits/stl_multimap.h (multimap::insert(value_type&&))
(multimap::insert(const_iterator, value_type&&)): Likewise.
* include/bits/unordered_map.h (unordered_map::insert(value_type&&))
(unordered_map::insert(const_iterator, value_type&&))
(unordered_multimap::insert(value_type&&))
(unordered_multimap::insert(const_iterator, value_type&&)): Likewise.
* testsuite/23_containers/map/modifiers/insert/dr2354.cc: New test.
* testsuite/23_containers/multimap/modifiers/insert/dr2354.cc: New
test.
* testsuite/23_containers/unordered_map/insert/dr2354.cc: New test.
* testsuite/23_containers/unordered_multimap/insert/dr2354.cc: New
test.

Tested powerpc64le-linux, committed to trunk.


commit 3dfce049e44572f79409b35d56f8a3367b3ab1cd
Author: Jonathan Wakely 
Date:   Fri Oct 13 13:27:43 2017 +0100

PR libstdc++/82522 overload map insert functions for rvalues (LWG 2354)

PR libstdc++/82522
* doc/xml/manual/intro.xml: Document LWG 2354 changes.
* include/bits/stl_map.h (map::insert(value_type&&))
(map::insert(const_iterator, value_type&&)): Add overload for 
rvalues.
* include/bits/stl_multimap.h (multimap::insert(value_type&&))
(multimap::insert(const_iterator, value_type&&)): Likewise.
* include/bits/unordered_map.h (unordered_map::insert(value_type&&))
(unordered_map::insert(const_iterator, value_type&&))
(unordered_multimap::insert(value_type&&))
(unordered_multimap::insert(const_iterator, value_type&&)): 
Likewise.
* testsuite/23_containers/map/modifiers/insert/dr2354.cc: New test.
* testsuite/23_containers/multimap/modifiers/insert/dr2354.cc: New
test.
* testsuite/23_containers/unordered_map/insert/dr2354.cc: New test.
* testsuite/23_containers/unordered_multimap/insert/dr2354.cc: New
test.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml 
b/libstdc++-v3/doc/xml/manual/intro.xml
index 782817e0698..3b243e57c8b 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -988,6 +988,12 @@ requirements of the license of GCC.
 Add deleted constructors.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#2354">2332:
+   Unnecessary copying when inserting into maps with braced-init 
syntax
+
+Add overloads of insert taking 
value_type&& rvalues.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="&DR;#2399">2399:
shared_ptr's constructor from 
unique_ptr should be constrained
 
diff --git a/libstdc++-v3/include/bits/stl_map.h 
b/libstdc++-v3/include/bits/stl_map.h
index 0e8a98a96c1..bad6020ef47 100644
--- a/libstdc++-v3/include/bits/stl_map.h
+++ b/libstdc++-v3/include/bits/stl_map.h
@@ -778,7 +778,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   /**
*  @brief Attempts to insert a std::pair into the %map.
-
*  @param __x Pair to be inserted (see std::make_pair for easy
*creation of pairs).
*
@@ -791,12 +790,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  first element (the key) is not already present in the %map.
*
*  Insertion requires logarithmic time.
+   *  @{
*/
   std::pair
   insert(const value_type& __x)
   { return _M_t._M_insert_unique(__x); }
 
 #if __cplusplus >= 201103L
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2354. Unnecessary copying when inserting into maps with braced-init
+  std::pair
+  insert(value_type&& __x)
+  { return _M_t._M_insert_unique(std::move(__x)); }
+
   template::value>::type>
@@ -804,6 +810,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
insert(_Pair&& __x)
{ return _M_t._M_insert_unique(std::forward<_Pair>(__x)); }
 #endif
+  // @}
 
 #if __cplusplus >= 201103L
   /**
@@ -840,6 +847,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  for more on @a hinting.
*
*  Insertion requires logarithmic time (if the hint is not taken).
+   *  @{
*/
   iterator
 #if __cplusplus >= 201103L
@@ -850,6 +858,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   { return _M_t._M_insert_unique_(__position, __x); }
 
 #if __cplusplus >= 201103L
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 2354. Unnecessary copying when inserting into maps with braced-init
+  iterator
+  insert(const_iterator __position, value_type&& __x)
+  { return _M_t._M_insert_unique_(__position, std::move(__x)); }
+
   template::value>::type>
@@ -858,6 +872,7 @@ _GLIBCXX_

[committed] C++: show location of unclosed extern "C" specifications (v3)

2017-10-13 Thread David Malcolm
On Thu, 2017-10-12 at 17:07 -0400, Jason Merrill wrote:
> On Thu, Oct 12, 2017 at 2:45 PM, David Malcolm 
> wrote:
> > - put the note on the string-literal, rather than the extern:
> > note: 'extern "C"' linkage started here
> >  extern "C" {
> > ^~~
> 
> Maybe a range spanning both tokens?
> 
> OK with or without that change.
> 
> Jason

Good idea; I've implemented it.

Committed to trunk as r253726 (after usual testing).

For reference, here's what I committed:

gcc/cp/ChangeLog:
* cp-tree.h (maybe_show_extern_c_location): New decl.
* decl.c (grokfndecl): When complaining about literal operators
with C linkage, issue a note giving the location of the
extern "C".
* parser.c (cp_parser_new): Initialize new field
"innermost_linkage_specification_location".
(cp_parser_linkage_specification): Store the location
of the linkage specification within the cp_parser.
(cp_parser_explicit_specialization): When complaining about
template specializations with C linkage, issue a note giving the
location of the extern "C".
(cp_parser_explicit_template_declaration): Likewise for templates.
(maybe_show_extern_c_location): New function.
* parser.h (struct cp_parser): New field
"innermost_linkage_specification_location".

gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/udlit-extern-c.C: New test case.
* g++.dg/diagnostic/unclosed-extern-c.C: Add example of a template
erroneously covered by an unclosed extern "C".
* g++.dg/template/extern-c.C: New test case.
---
 gcc/cp/cp-tree.h   |  1 +
 gcc/cp/decl.c  |  1 +
 gcc/cp/parser.c| 39 -
 gcc/cp/parser.h|  4 ++
 gcc/testsuite/g++.dg/cpp0x/udlit-extern-c.C|  7 +++
 .../g++.dg/diagnostic/unclosed-extern-c.C  | 11 +++-
 gcc/testsuite/g++.dg/template/extern-c.C   | 66 ++
 7 files changed, 127 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extern-c.C
 create mode 100644 gcc/testsuite/g++.dg/template/extern-c.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index dc98dd8..b74b6d9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6356,6 +6356,7 @@ extern bool parsing_nsdmi (void);
 extern bool parsing_default_capturing_generic_lambda_in_template (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
 extern location_t defarg_location (tree);
+extern void maybe_show_extern_c_location (void);
 
 /* in pt.c */
 extern bool check_template_shadow  (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index f251a90..a3cc80c 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -8729,6 +8729,7 @@ grokfndecl (tree ctype,
   if (DECL_LANGUAGE (decl) == lang_c)
{
  error ("literal operator with C linkage");
+ maybe_show_extern_c_location ();
  return NULL_TREE;
}
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 810e2b7..2337be5 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -3937,6 +3937,9 @@ cp_parser_new (void)
   /* Allow constrained-type-specifiers. */
   parser->prevent_constrained_type_specifiers = 0;
 
+  /* We haven't yet seen an 'extern "C"'.  */
+  parser->innermost_linkage_specification_location = UNKNOWN_LOCATION;
+
   return parser;
 }
 
@@ -13848,9 +13851,11 @@ cp_parser_linkage_specification (cp_parser* parser)
   tree linkage;
 
   /* Look for the `extern' keyword.  */
-  cp_parser_require_keyword (parser, RID_EXTERN, RT_EXTERN);
+  cp_token *extern_token
+= cp_parser_require_keyword (parser, RID_EXTERN, RT_EXTERN);
 
   /* Look for the string-literal.  */
+  cp_token *string_token = cp_lexer_peek_token (parser->lexer);
   linkage = cp_parser_string_literal (parser, false, false);
 
   /* Transform the literal into an identifier.  If the literal is a
@@ -13869,6 +13874,20 @@ cp_parser_linkage_specification (cp_parser* parser)
   /* We're now using the new linkage.  */
   push_lang_context (linkage);
 
+  /* Preserve the location of the the innermost linkage specification,
+ tracking the locations of nested specifications via a local.  */
+  location_t saved_location
+= parser->innermost_linkage_specification_location;
+  /* Construct a location ranging from the start of the "extern" to
+ the end of the string-literal, with the caret at the start, e.g.:
+   extern "C" {
+   ^~
+  */
+  parser->innermost_linkage_specification_location
+= make_location (extern_token->location,
+extern_token->location,
+get_finish (string_token->location));
+
   /* If the next token is a `{', then we're using the first
  production.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
@@ -13899,6 +13918,9 @@ cp_parser_linkage_specification (

Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-13 Thread Martin Liška
On 10/13/2017 01:17 PM, Jakub Jelinek wrote:
> On Fri, Oct 13, 2017 at 01:01:37PM +0200, Martin Liška wrote:
>> @@ -3826,6 +3827,18 @@ pointer_diff (location_t loc, tree op0, tree op1)
>>  pedwarn (loc, OPT_Wpointer_arith,
>>   "pointer to a function used in subtraction");
>>  
>> +  if (sanitize_flags_p (SANITIZE_POINTER_SUBTRACT))
>> +{
>> +  gcc_assert (current_function_decl != NULL_TREE);
>> +
>> +  op0 = c_fully_fold (op0, false, NULL);
>> +  op1 = c_fully_fold (op1, false, NULL);
>> +
>> +  tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_SUBTRACT);
>> +  *instrument_expr
>> += build_call_expr_loc (loc, tt, 2, op0, op1);
>> +}
> 
> If op0 or op1 have some embedded side-effects, won't you evaluate them
> multiple times?  I'd expect you need to c_save_expr them and make sure the
> actual pointer difference expression uses the result of the save expr too.

Good point, fixed.

> 
>> --- a/libsanitizer/asan/asan_report.cc
>> +++ b/libsanitizer/asan/asan_report.cc
>> @@ -344,14 +344,68 @@ static INLINE void CheckForInvalidPointerPair(void 
>> *p1, void *p2) {
>>if (!flags()->detect_invalid_pointer_pairs) return;
>>uptr a1 = reinterpret_cast(p1);
>>uptr a2 = reinterpret_cast(p2);
>> -  AsanChunkView chunk1 = FindHeapChunkByAddress(a1);
>> -  AsanChunkView chunk2 = FindHeapChunkByAddress(a2);
>> -  bool valid1 = chunk1.IsAllocated();
>> -  bool valid2 = chunk2.IsAllocated();
>> -  if (!valid1 || !valid2 || !chunk1.Eq(chunk2)) {
>> -GET_CALLER_PC_BP_SP;
>> -return ReportInvalidPointerPair(pc, bp, sp, a1, a2);
>> +
>> +  if (a1 == a2)
>> +return;
>> +
>> +  uptr offset = a1 < a2 ? a2 - a1 : a1 - a2;
>> +  uptr left = a1 < a2 ? a1 : a2;
>> +  uptr right = a1 < a2 ? a2 : a1;
>> +  if (offset <= 2048) {
>> +if (__asan_region_is_poisoned (left, offset) == 0)
>> +  return;
>> +  }
> 
> If offset <= 2048 and something is poisoned in between, then it is
> clearly a failure, so you should either goto the error or duplicate
> the two lines inside of the outer if above.

Done that.

> 
>> +
>> +  uptr shadow_offset1, shadow_offset2;
>> +  bool valid1, valid2;
>> +  {
>> +ThreadRegistryLock l(&asanThreadRegistry());
>> +valid1 = GetStackVariableBeginning(left, &shadow_offset1);
>>}
>> +
>> +  // check whether a1 is a stack memory pointer
>> +  if (valid1) {
>> +{
>> +  ThreadRegistryLock l(&asanThreadRegistry());
>> +  valid2 = GetStackVariableBeginning(right - 1, &shadow_offset2);
> 
> Why do you take the registery lock twice?  Just do:

Yep, once is OK. However, we need to have the lock in a different scope than:
GET_CALLER_PC_BP_SP;
ReportInvalidPointerPair(pc, bp, sp, a1, a2);

Otherwise we'll reach a deadlock.

>   {
> ThreadRegistryLock l(&asanThreadRegistry());
> if (GetStackVariableBeginning(left, &shadow_offset1))
>   {
> if (GetStackVariableBeginning(right - 1, &shadow_offset2) &&
>   shadow_offset1 == shadow_offset2)
> return;
> // goto do_error; or:
>   GET_CALLER_PC_BP_SP;
>   ReportInvalidPointerPair(pc, bp, sp, a1, a2);
>   return;
>   }
>   }
> 
>> +}
>> +
>> +if (valid2 && shadow_offset1 == shadow_offset2)
>> +  return;
>> +  }
>> +  // check whether a1 is a heap memory address
>> +  else {
>> +HeapAddressDescription hdesc1;
>> +valid1 = GetHeapAddressInformation(a1, 0, &hdesc1);
>> +
>> +if (valid1 && hdesc1.chunk_access.access_type == kAccessTypeInside) {
>> +  HeapAddressDescription hdesc2;
>> +  valid2 = GetHeapAddressInformation(a2, 0, &hdesc2);
> 
> Again, no need for valid1/valid2.  Why do you use above left and right - 1
> and here a1 and a2?  Shouldn't it be always left and right - 1?

valid{1,2} removed. Now using a{1,2} just for error report purpose.

Martin

> 
>   Jakub
> 

>From 68710d309cedf737ef333e82f33a8f2c9fd43c25 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Oct 2017 12:14:25 +0200
Subject: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

gcc/ChangeLog:

2017-10-13  Martin Liska  

	* doc/invoke.texi: Document the options.
	* flag-types.h (enum sanitize_code): Add
	SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
	* ipa-inline.c (sanitize_attrs_match_for_inline_p): Add handling
	of SANITIZE_POINTER_COMPARE and SANITIZE_POINTER_SUBTRACT.
	* opts.c: Define new sanitizer options.
	* sanitizer.def (BUILT_IN_ASAN_POINTER_COMPARE): Likewise.
	(BUILT_IN_ASAN_POINTER_SUBTRACT): Likewise.

gcc/c/ChangeLog:

2017-10-13  Martin Liska  

	* c-typeck.c (pointer_diff): Add new argument and instrument
	pointer subtraction.
	(build_binary_op): Similar for pointer comparison.

gcc/testsuite/ChangeLog:

2017-10-13  Martin Liska  

	* gcc.dg/asan/pointer-compare-1.c: New test.
	* gcc.dg/asan/pointer-compare-2.c: New test.
	* gcc.dg/asan/pointer-subtract-1.c: New test.
	* gcc.dg/asan/pointer-subtract-2.c: New test.
---
 gcc/c/c-typeck.c   | 34 ++--
 gcc/doc/invok

Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-13 Thread Martin Liška
On 10/12/2017 11:54 PM, Jeff Law wrote:
> On 10/11/2017 12:13 AM, Martin Liška wrote:
>> 2017-10-10  Martin Liska  
>>
>>  PR tree-optimization/82493
>>  * sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
>>  (test_range_functions): New function.
>>  (sbitmap_c_tests): Likewise.
>>  * selftest-run-tests.c (selftest::run_tests): Run new tests.
>>  * selftest.h (sbitmap_c_tests): New function.
> I went ahead and committed this along with a patch to fix the off-by-one
> error in live_bytes_read.  Bootstrapped and regression tested on x86.
> 
> Actual patch attached for archival purposes.
> 
> Jeff
> 

Hello.

I wrote a patch that adds various gcc_checking_asserts and I hit following:

./xgcc -B. 
/home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90 -c -O2
during GIMPLE pass: dse
/home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90:7:0:

  program testat
 
internal compiler error: in bitmap_check_index, at sbitmap.h:105
0x1c014c1 bitmap_check_index
../../gcc/sbitmap.h:105
0x1c01fa7 bitmap_bit_in_range_p(simple_bitmap_def const*, unsigned int, 
unsigned int)
../../gcc/sbitmap.c:335
0x1179002 live_bytes_read
../../gcc/tree-ssa-dse.c:497
0x117935a dse_classify_store
../../gcc/tree-ssa-dse.c:595
0x1179947 dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)
../../gcc/tree-ssa-dse.c:786
0x1179b6e dse_dom_walker::before_dom_children(basic_block_def*)
../../gcc/tree-ssa-dse.c:853
0x1a6f659 dom_walker::walk(basic_block_def*)
../../gcc/domwalk.c:308
0x1179cb9 execute
../../gcc/tree-ssa-dse.c:907

Where we call:
Breakpoint 1, bitmap_bit_in_range_p (bmap=0x29d6cd0, start=0, end=515) at 
../../gcc/sbitmap.c:335
335   bitmap_check_index (bmap, end);
(gdb) p *bmap
$1 = {n_bits = 256, size = 4, elms = {255}}

Is it a valid call or should caller check indices?

Martin
>From ba3d597be70b8329abafe92da868ab5250610840 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 13 Oct 2017 13:39:08 +0200
Subject: [PATCH 2/2] Add gcc_checking_assert for sbitmap.c.

---
 gcc/sbitmap.c | 39 +++
 gcc/sbitmap.h | 25 +
 2 files changed, 64 insertions(+)

diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
index fdcf5393e53..df933f6516c 100644
--- a/gcc/sbitmap.c
+++ b/gcc/sbitmap.c
@@ -180,6 +180,8 @@ sbitmap_vector_alloc (unsigned int n_vecs, unsigned int n_elms)
 void
 bitmap_copy (sbitmap dst, const_sbitmap src)
 {
+  gcc_checking_assert (src->size <= dst->size);
+
   memcpy (dst->elms, src->elms, sizeof (SBITMAP_ELT_TYPE) * dst->size);
 }
 
@@ -187,6 +189,8 @@ bitmap_copy (sbitmap dst, const_sbitmap src)
 int
 bitmap_equal_p (const_sbitmap a, const_sbitmap b)
 {
+  bitmap_check_sizes (a, b);
+
   return !memcmp (a->elms, b->elms, sizeof (SBITMAP_ELT_TYPE) * a->size);
 }
 
@@ -211,6 +215,8 @@ bitmap_clear_range (sbitmap bmap, unsigned int start, unsigned int count)
   if (count == 0)
 return;
 
+  bitmap_check_index (bmap, start + count - 1);
+
   unsigned int start_word = start / SBITMAP_ELT_BITS;
   unsigned int start_bitno = start % SBITMAP_ELT_BITS;
 
@@ -267,6 +273,8 @@ bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
   if (count == 0)
 return;
 
+  bitmap_check_index (bmap, start + count - 1);
+
   unsigned int start_word = start / SBITMAP_ELT_BITS;
   unsigned int start_bitno = start % SBITMAP_ELT_BITS;
 
@@ -324,6 +332,8 @@ bool
 bitmap_bit_in_range_p (const_sbitmap bmap, unsigned int start, unsigned int end)
 {
   gcc_checking_assert (start <= end);
+  bitmap_check_index (bmap, end);
+
   unsigned int start_word = start / SBITMAP_ELT_BITS;
   unsigned int start_bitno = start % SBITMAP_ELT_BITS;
 
@@ -467,6 +477,9 @@ bitmap_vector_ones (sbitmap *bmap, unsigned int n_vecs)
 bool
 bitmap_ior_and_compl (sbitmap dst, const_sbitmap a, const_sbitmap b, const_sbitmap c)
 {
+  bitmap_check_sizes (a, b);
+  bitmap_check_sizes (b, c);
+
   unsigned int i, n = dst->size;
   sbitmap_ptr dstp = dst->elms;
   const_sbitmap_ptr ap = a->elms;
@@ -489,6 +502,8 @@ bitmap_ior_and_compl (sbitmap dst, const_sbitmap a, const_sbitmap b, const_sbitm
 void
 bitmap_not (sbitmap dst, const_sbitmap src)
 {
+  bitmap_check_sizes (src, dst);
+
   unsigned int i, n = dst->size;
   sbitmap_ptr dstp = dst->elms;
   const_sbitmap_ptr srcp = src->elms;
@@ -510,6 +525,9 @@ bitmap_not (sbitmap dst, const_sbitmap src)
 void
 bitmap_and_compl (sbitmap dst, const_sbitmap a, const_sbitmap b)
 {
+  bitmap_check_sizes (a, b);
+  bitmap_check_sizes (b, dst);
+
   unsigned int i, dst_size = dst->size;
   unsigned int min_size = dst->size;
   sbitmap_ptr dstp = dst->elms;
@@ -537,6 +555,8 @@ bitmap_and_compl (sbitmap dst, const_sbitmap a, const_sbitmap b)
 bool
 bitmap_intersect_p (const_sbitmap a, const_sbitmap b)
 {
+  bitmap_check_sizes (a, b);
+
   const_sbitmap_ptr ap = a->elms;
   const_sbitmap_ptr bp = b->elms;
   unsigned int i, n;
@@ -555,6 +575

[PATCH][GRAPHITE] Consistently use region analysis

2017-10-13 Thread Richard Biener

Now that SCEV instantiation handles regions properly (see hunk below
for a minor fix) we can use it consistently from GRAPHITE and thus
simplify scalar_evolution_in_region greatly.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

A lot of the parameter renaming stuff looks dead but a more "complete"
patch causes some more SPEC miscompares and also a bootstrap issue
(warning only, but an uninitialized use of 'int tem = 0;' ...).

This is probably all latent issues coming up more easily now.

Note that formerly we'd support invariant "parameters" defined in
the region by copying those out but now SCEV instantiation should
lead chrec_dont_know for stuff we cannot gobble up (anythin not
affine).  This probably only worked for the outermost scop in the
region and it means we need some other way to handle those.  The
original issue is probably that "parameters" cannot occur in
dependences and thus an array index cannot "depend" on the computation
of a parameter (and array indexes coming from "data" cannot be handled
anyway?).  We don't seem to have any functional testcase for those
parameters that are not parameters.

Richard.

2017-10-13  Richard Biener  

* graphite-scop-detection.c
(scop_detection::stmt_has_simple_data_refs_p): Always use
the full nest as region.
(try_generate_gimple_bb): Likewise.
(build_scops): First split edges, then compute RPO order.
* sese.c (scalar_evolution_in_region): Simplify now that
SCEV can handle instantiation in regions.
* tree-scalar-evolution.c (instantiate_scev_name): Also instantiate
in the non-loop part of a function if requested.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 253721)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -1005,15 +1005,10 @@ scop_detection::graphite_can_represent_e
 bool
 scop_detection::stmt_has_simple_data_refs_p (sese_l scop, gimple *stmt)
 {
-  edge nest;
+  edge nest = scop.entry;;
   loop_p loop = loop_containing_stmt (stmt);
   if (!loop_in_sese_p (loop, scop))
-{
-  nest = scop.entry;
-  loop = NULL;
-}
-  else
-nest = loop_preheader_edge (outermost_loop_in_sese (scop, gimple_bb 
(stmt)));
+loop = NULL;
 
   auto_vec drs;
   if (! graphite_find_data_references_in_stmt (nest, loop, stmt, &drs))
@@ -1381,15 +1350,10 @@ try_generate_gimple_bb (scop_p scop, bas
   vec reads = vNULL;
 
   sese_l region = scop->scop_info->region;
-  edge nest;
+  edge nest = region.entry;
   loop_p loop = bb->loop_father;
   if (!loop_in_sese_p (loop, region))
-{
-  nest = region.entry;
-  loop = NULL;
-}
-  else
-nest = loop_preheader_edge (outermost_loop_in_sese (region, bb));
+loop = NULL;
 
   for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
gsi_next (&gsi))
@@ -1696,6 +1660,13 @@ build_scops (vec *scops)
   /* Now create scops from the lightweight SESEs.  */
   vec scops_l = sb.get_scops ();
 
+  /* For our out-of-SSA we need a block on s->entry, similar to how
+ we include the LCSSA block in the region.  */
+  int i;
+  sese_l *s;
+  FOR_EACH_VEC_ELT (scops_l, i, s)
+s->entry = single_pred_edge (split_edge (s->entry));
+
   /* Domwalk needs a bb to RPO mapping.  Compute it once here.  */
   int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
   int postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
@@ -1704,14 +1675,8 @@ build_scops (vec *scops)
 bb_to_rpo[postorder[i]] = i;
   free (postorder);
 
-  int i;
-  sese_l *s;
   FOR_EACH_VEC_ELT (scops_l, i, s)
 {
-  /* For our out-of-SSA we need a block on s->entry, similar to how
- we include the LCSSA block in the region.  */
-  s->entry = single_pred_edge (split_edge (s->entry));
-
   scop_p scop = new_scop (s->entry, s->exit);
 
   /* Record all basic blocks and their conditions in REGION.  */
Index: gcc/sese.c
===
--- gcc/sese.c  (revision 253721)
+++ gcc/sese.c  (working copy)
@@ -459,41 +447,16 @@ scev_analyzable_p (tree def, sese_l ®
 tree
 scalar_evolution_in_region (const sese_l ®ion, loop_p loop, tree t)
 {
-  gimple *def;
-  struct loop *def_loop;
-
   /* SCOP parameters.  */
   if (TREE_CODE (t) == SSA_NAME
   && !defined_in_sese_p (t, region))
 return t;
 
-  if (TREE_CODE (t) != SSA_NAME
-  || loop_in_sese_p (loop, region))
-/* FIXME: we would need instantiate SCEV to work on a region, and be more
-   flexible wrt. memory loads that may be invariant in the region.  */
-return instantiate_scev (region.entry, loop,
-analyze_scalar_evolution (loop, t));
-
-  def = SSA_NAME_DEF_STMT (t);
-  def_loop = loop_containing_stmt (def);
-
-  if (loop_in_sese_p (def_loop, region))
-{
-  t = analyze_scalar_evolution (def_loop, t);
-  def

Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-13 Thread Martin Liška
On 10/13/2017 10:01 AM, Martin Liška wrote:
> On 10/12/2017 11:54 PM, Jeff Law wrote:
>> On 10/11/2017 12:13 AM, Martin Liška wrote:
>>> 2017-10-10  Martin Liska  
>>>
>>> PR tree-optimization/82493
>>> * sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
>>> (test_range_functions): New function.
>>> (sbitmap_c_tests): Likewise.
>>> * selftest-run-tests.c (selftest::run_tests): Run new tests.
>>> * selftest.h (sbitmap_c_tests): New function.
>> I went ahead and committed this along with a patch to fix the off-by-one
>> error in live_bytes_read.  Bootstrapped and regression tested on x86.
>>
>> Actual patch attached for archival purposes.
>>
>> Jeff
>>
> 
> Thanks.
> 
> I'll carry on and write another set of unit tests.
> 
> Martin
> 

Sending patch candidate, may I install it after it finishes regression tests?

Martin
>From 6114004455ffc534cdb895eb74b2018cea4b704a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 13 Oct 2017 13:18:47 +0200
Subject: [PATCH 1/2] Add selftests for bitmap_set_range.

gcc/ChangeLog:

2017-10-13  Martin Liska  

	* sbitmap.c (bitmap_bit_in_range_p_checking): New function.
	(test_set_range): Likewise.
	(test_range_functions): Rename to ...
	(test_bit_in_range): ... this.
	(sbitmap_c_tests): Add new test.
---
 gcc/sbitmap.c | 60 ---
 1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
index baef4d05f0d..fdcf5393e53 100644
--- a/gcc/sbitmap.c
+++ b/gcc/sbitmap.c
@@ -823,11 +823,64 @@ namespace selftest {
 
 /* Selftests for sbitmaps.  */
 
+/* Checking function that uses both bitmap_bit_in_range_p and
+   loop of bitmap_bit_p and verifies consistent results.  */
 
-/* Verify range functions for sbitmap.  */
+static bool
+bitmap_bit_in_range_p_checking (sbitmap s, unsigned int start,
+unsigned end)
+{
+  bool r1 = bitmap_bit_in_range_p (s, start, end);
+  bool r2 = false;
+
+  for (unsigned int i = start; i <= end; i++)
+if (bitmap_bit_p (s, i))
+  {
+	r2 = true;
+	break;
+  }
+
+  ASSERT_EQ (r1, r2);
+  return r1;
+}
+
+/* Verify bitmap_set_range functions for sbitmap.  */
+
+static void
+test_set_range ()
+{
+  sbitmap s = sbitmap_alloc (16);
+  bitmap_clear (s);
+
+  bitmap_set_range (s, 0, 1);
+  ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 0, 0));
+  ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 1, 15));
+  bitmap_set_range (s, 15, 1);
+  ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 1, 14));
+  ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 15, 15));
+
+  s = sbitmap_alloc (1024);
+  bitmap_clear (s);
+  bitmap_set_range (s, 512, 1);
+  ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 0, 511));
+  ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 513, 1023));
+  ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512, 512));
+  ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 508, 512));
+  ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 508, 513));
+  ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 508, 511));
+
+  bitmap_clear (s);
+  bitmap_set_range (s, 512, 64);
+  ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 0, 511));
+  ASSERT_FALSE (bitmap_bit_in_range_p_checking (s, 512 + 64, 1023));
+  ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512, 512));
+  ASSERT_TRUE (bitmap_bit_in_range_p_checking (s, 512 + 63, 512 + 63));
+}
+
+/* Verify bitmap_bit_in_range_p functions for sbitmap.  */
 
 static void
-test_range_functions ()
+test_bit_in_range ()
 {
   sbitmap s = sbitmap_alloc (1024);
   bitmap_clear (s);
@@ -900,7 +953,8 @@ test_range_functions ()
 void
 sbitmap_c_tests ()
 {
-  test_range_functions ();
+  test_set_range ();
+  test_bit_in_range ();
 }
 
 } // namespace selftest
-- 
2.14.2



Re: [PATCH] Add -fsanitize=pointer-{compare,subtract}.

2017-10-13 Thread Jakub Jelinek
On Fri, Oct 13, 2017 at 02:53:50PM +0200, Martin Liška wrote:
> @@ -3826,6 +3827,19 @@ pointer_diff (location_t loc, tree op0, tree op1)
>  pedwarn (loc, OPT_Wpointer_arith,
>"pointer to a function used in subtraction");
>  
> +  if (sanitize_flags_p (SANITIZE_POINTER_SUBTRACT))
> +{
> +  gcc_assert (current_function_decl != NULL_TREE);
> +
> +  op0 = save_expr (op0);
> +  op1 = save_expr (op1);
> +
> +  tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_SUBTRACT);
> +  *instrument_expr
> + = build_call_expr_loc (loc, tt, 2, c_fully_fold (op0, false, NULL),
> +c_fully_fold (op1, false, NULL));
> +}

Why the c_fully_fold?  Can't that be deferred until it actually is
folded all together later?

> +   ret = pointer_diff (location, op0, op1, &instrument_expr);
> goto return_build_binary_op;
>   }
>/* Handle pointer minus int.  Just like pointer plus int.  */
> @@ -11707,6 +11721,18 @@ build_binary_op (location_t location, enum tree_code 
> code,
> pedwarn (location, 0,
>  "comparison of distinct pointer types lacks a cast");
>   }
> +
> +   if (sanitize_flags_p (SANITIZE_POINTER_COMPARE))
> + {
> +   gcc_assert (current_function_decl != NULL_TREE);
> +
> +   op0 = save_expr (op0);
> +   op1 = save_expr (op1);
> +
> +   tree tt = builtin_decl_explicit (BUILT_IN_ASAN_POINTER_COMPARE);
> +   instrument_expr
> + = build_call_expr_loc (location, tt, 2, op0, op1);
> + }

Is this the right spot for this?  I mean then you don't handle
ptr >= 0 or ptr > 0 and similar or ptr >= 0x12345678.
I know we have warnings or pedwarns for those, still I think it would be
better to handle the above e.g. before
  if ((TREE_CODE (TREE_TYPE (orig_op0)) == BOOLEAN_TYPE
   || truth_value_p (TREE_CODE (orig_op0)))
  ^ (TREE_CODE (TREE_TYPE (orig_op1)) == BOOLEAN_TYPE
 || truth_value_p (TREE_CODE (orig_op1
maybe_warn_bool_compare (location, code, orig_op0, orig_op1);
by testing if ((code0 == POINTER_TYPE || code1 == POINTER_TYPE)
   && sanitize_flags_p (SANITIZE_POINTER_COMPARE))

What about the C++ FE?  Or is pointer comparison well defined there?
What about pointer subtraction? My memory is fuzzy.

> +// { dg-options "-fsanitize=pointer-compare -O0" }

Please use -fsanitize=address,pointer-compare
etc. in the testcases, so that it is an example to users who don't know
we have implicit -fsanitize=address for these tests.
> +  if (offset <= 2048) {
> +if (__asan_region_is_poisoned (left, offset) == 0)

I think the LLVM coding conventions don't want a space before ( above.

> +// check whether left is a stack memory pointer
> +if (GetStackVariableBeginning(left, &shadow_offset1)) {
> +  if (GetStackVariableBeginning(right - 1, &shadow_offset2)
> +   && shadow_offset1 == shadow_offset2)

Nor && at the beginning of the line (they want it at the end of previous
one).

> + return;
> +  else
> + goto do_error;
> +}

If you have goto do_error; for all cases, then you don't need to indent
further stuff into else ... further and further.

> +// check whether left is a heap memory address
> +else {
> +  HeapAddressDescription hdesc1, hdesc2;
> +  if (GetHeapAddressInformation(left, 0, &hdesc1)
> +   && hdesc1.chunk_access.access_type == kAccessTypeInside) {
> + if (GetHeapAddressInformation(right, 0, &hdesc2)
> + && hdesc2.chunk_access.access_type == kAccessTypeInside
> + && (hdesc1.chunk_access.chunk_begin
> +   == hdesc2.chunk_access.chunk_begin))
> +   return;

So, here one is a heap object and the other is not.  That should be an
do_error, right?

> +  } else {
> + // check whether left is an address of a global variable
> + GlobalAddressDescription gdesc1, gdesc2;
> + if (GetGlobalAddressInformation(left, 0, &gdesc1)) {
> +   if (GetGlobalAddressInformation(right - 1, 0, &gdesc2)
> +   && gdesc1.PointsInsideASameVariable (gdesc2))
> + return;
> + } else {
> +   // TODO

Either goto do_error; here too, or do the if (offset <= 16384) case here.
Guess upstream wouldn't like it with a TODO spot.

Jakub


Re: tree-complex.c: fix some_nonzerop test over reals (and a bug fix)

2017-10-13 Thread Laurent Thevenoux
Hi Richard,

I will investigate it further, but thanks again for your review!

Laurent

- Mail original -
> De: "Richard Biener" 
> À: "Laurent Thevenoux" 
> Cc: "GCC Patches" 
> Envoyé: Lundi 9 Octobre 2017 15:57:38
> Objet: Re: tree-complex.c: fix some_nonzerop test over reals (and a bug fix)
> 
> On Mon, Oct 9, 2017 at 3:30 PM, Laurent Thevenoux
>  wrote:
> >
> >
> > - Mail original -
> >> De: "Richard Biener" 
> >> À: "Laurent Thevenoux" 
> >> Cc: "GCC Patches" 
> >> Envoyé: Lundi 9 Octobre 2017 14:04:49
> >> Objet: Re: tree-complex.c: fix some_nonzerop test over reals (and a bug
> >> fix)
> >>
> >> On Fri, Oct 6, 2017 at 8:58 PM, Laurent Thevenoux
> >>  wrote:
> >> > Hi Richard, thanks for your quick reply.
> >> >
> >> > - Mail original -
> >> >> De: "Richard Biener" 
> >> >> À: "Laurent Thevenoux" 
> >> >> Cc: "GCC Patches" 
> >> >> Envoyé: Vendredi 6 Octobre 2017 13:42:57
> >> >> Objet: Re: tree-complex.c: fix some_nonzerop test over reals (and a bug
> >> >> fix)
> >> >>
> >> >> On Thu, Oct 5, 2017 at 4:41 PM, Laurent Thevenoux
> >> >>  wrote:
> >> >> > Hello,
> >> >> >
> >> >> > This patch improves the some_nonzerop(tree t) function from
> >> >> > tree-complex.c
> >> >> > file (the function is only used there).
> >> >> >
> >> >> > This function returns true if a tree as a parameter is not the
> >> >> > constant
> >> >> > 0
> >> >> > (or +0.0 only for reals when !flag_signed_zeros ). The former result
> >> >> > is
> >> >> > then used to determine if some simplifications are possible for
> >> >> > complex
> >> >> > expansions (addition, multiplication, and division).
> >> >> >
> >> >> > Unfortunately, if the tree is a real constant, the function always
> >> >> > return
> >> >> > true, even for +0.0 because of the explicit test on flag_signed_zeros
> >> >> > (so
> >> >> > if your system enables signed zeros you cannot benefit from those
> >> >> > simplifications). To avoid this behavior and allow complex expansion
> >> >> > simplifications, I propose the following patch, which test for the
> >> >> > sign
> >> >> > of
> >> >> > the real constant 0.0 instead of checking the flag.
> >> >>
> >> >> But
> >> >>
> >> >> +  if (TREE_CODE (t) == REAL_CST)
> >> >> +{
> >> >> +  if (real_identical (&TREE_REAL_CST (t), &dconst0))
> >> >> +   zerop = !real_isneg (&TREE_REAL_CST (t));
> >> >> +}
> >> >>
> >> >> shouldn't you do
> >> >>
> >> >>zerop = !flag_signed_zeros || !real_isneg (&TREE_REAL_CST (t));
> >> >>
> >> >> ?
> >> >
> >> > I’m not so sure. If I understand your proposition (tables below gives
> >> > values of zerop following the values of t and flag_signed_zeros):
> >> >
> >> >|   zerop
> >> >  t | !flag_signed_zeros is false | !flag_signed_zeros is true
> >> > -
> >> > +n | true*   | true*
> >> > -n | false   | true*
> >> >  0 | true| true
> >> > -0 | false   | unreachable
> >> >
> >> > But here, results with a * don't return the good value. The existing
> >> > code
> >> > is also wrong, it always returns false if flag_signed_zeros is true,
> >> > else
> >> > the code is correct:
> >> >
> >> >|   zerop
> >> >  t | !flag_signed_zeros is false | !flag_signed_zeros is true
> >> > -
> >> > +n | false   | false
> >> > -n | false   | false
> >> >  0 | true| false*
> >> > -0 | false   | unreachable
> >> >
> >> > With the code I propose, we obtain the right results:
> >> >
> >> >  t | zerop
> >> > --
> >> > +n | false
> >> > -n | false
> >> >  0 | true
> >> > -0 | false
> >> >
> >> > Do I really miss something (I'm sorry if I'm wrong)?
> >> >
> >> >>
> >> >> Also doesn't this miscalculate A - B for {ar, 0.} - {br, 0.} given the
> >> >> simplification simply
> >> >> returns bi?
> >> >
> >> > Here we are in case PAIR (ONLY_REAL, ONLY_REAL), which gives ri = 0.
> >> > (ai)
> >> > even with signed zeros. So everything is OK.
> >> >
> >> >>
> >> >> So I'm not convinced this handling is correct.
> >> >>
> >> >> > This first fix reveals a bug (thanks to
> >> >> > c-c++-common/torture/complex-sign-add.c ) in the simplification
> >> >> > section
> >> >> > of
> >> >> > expand_complex_addition (also fixed in the patch).
> >> >>
> >> >> Your patch looks backward and the existing code looks ok.
> >> >>
> >> >> @@ -937,7 +940,7 @@ expand_complex_addition (gimple_stmt_iterator
> >> >> *gsi, tree inner_type,
> >> >>
> >> >>  case PAIR (VARYING, ONLY_REAL):
> >> >>rr = gimplify_build2 (gsi, code, inner_type, ar, br);
> >> >> -  ri = ai;
> >> >> +  ri = bi;
> >> >>break;
> >> >
> >> > With the existing code we don’t perform the simplification step at all
> >> > since it always returns (VARYING, VARYING) when fl

Re: [PATCH, rs6000] GIMPLE folding for vector compares

2017-10-13 Thread Will Schmidt
On Thu, 2017-10-12 at 22:05 -0500, Segher Boessenkool wrote:
> > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> 
> > +;; Vector Compare Not Equal Byte (specified/not+eq:)
> > +(define_insn "vcmpneb_spec"
> > +  [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
> > +  (not:V16QI
> > +(eq:V16QI (match_operand:V16QI 1 "altivec_register_operand"
> "v")
> > +  (match_operand:V16QI 2 "altivec_register_operand"
> "v"]
> > +  "TARGET_P9_VECTOR"
> > +  "vcmpneb %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")]
> > +)
> 
> +  [(set_attr "type" "vecsimple")])
> 
> What does "_spec" mean?  That it is not an unspec?  :-)

Yes, exactly. :-)

> If a name is not (expected to be) used directly, it should start with
> *.

Ok.

> Do we still need the unspec version? 

Not sure..   I'll play with this some more. 



Re: [PATCH, rs6000] GIMPLE folding for vector compares

2017-10-13 Thread Will Schmidt
On Fri, 2017-10-13 at 11:36 +0200, Richard Biener wrote:
> On Thu, Oct 12, 2017 at 10:03 PM, Will Schmidt
>  wrote:
> > Hi,
> >
> > Add support for gimple folding of vec_cmp_{eq,ge,gt,le,ne} for
> > the integer data types.
> >
> > This adds a handful of entries to the switch statement in 
> > builtin_function_type
> > for those builtins having unsigned arguments.
> >
> > Three entries are added to vsx.md to enable vcmpne[bhw] instruction, where 
> > we
> > would otherwise generate a vcmpeq + vnor.
> >
> > This patch requires the previously posted "allow integer return type from 
> > vector compares" patch.
> >
> > A handful of existing tests required updates to their specified optimization
> > levels to continue to generate the desired code.  builtins-3-p9.c in 
> > particular
> > has been updated to reflect improved code gen with the higher specified
> > optimization level.   Testcase coverage is otherwise handled by the 
> > already-in-tree
> > gcc.target/powerpc/fold-vec-cmp-*.c tests.
> >
> > Tested OK on P6 and newer. OK for trunk?
> >
> > Thanks,
> > -Will
> >
> > [gcc]
> >
> > 2017-10-12  Will Schmidt  
> >
> > * config/rs6000/rs6000.c: (rs6000_gimple_fold_builtin) Add support 
> > for
> > folding of vector compares.  (builtin_function_type) Add compare
> > builtins to the list of functions having unsigned arguments.
> > * config/rs6000/vsx.md:  Add vcmpne{b,h,w} instructions.
> >
> > [testsuite]
> >
> > 2017-10-12  Will Schmidt  
> >
> > * gcc.target/powerpc/builtins-3-p9.c: Add -O1, update
> > expected codegen checks.
> > * gcc.target/powerpc/vec-cmp-sel.c: Mark vars as volatile.
> > * gcc.target/powerpc/vsu/vec-cmpne-0.c: Add -O1.
> > * gcc.target/powerpc/vsu/vec-cmpne-1.c: Add -O1.
> > * gcc.target/powerpc/vsu/vec-cmpne-2.c: Add -O1.
> > * gcc.target/powerpc/vsu/vec-cmpne-3.c: Add -O1.
> > * gcc.target/powerpc/vsu/vec-cmpne-4.c: Add -O1.
> > * gcc.target/powerpc/vsu/vec-cmpne-5.c: Add -O1.
> > * gcc.target/powerpc/vsu/vec-cmpne-6.c: Add -O1.
> >
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 12ddd97..7e73239 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -16605,17 +16605,93 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> > *gsi)
> >build_int_cst (arg2_type, 0)), 
> > arg0);
> >  gimple_set_location (g, loc);
> >  gsi_replace (gsi, g, true);
> >  return true;
> >}
> > +/* Vector compares (integer); EQ, NE, GE, GT, LE.  */
> > +case ALTIVEC_BUILTIN_VCMPEQUB:
> > +case ALTIVEC_BUILTIN_VCMPEQUH:
> > +case ALTIVEC_BUILTIN_VCMPEQUW:
> > +case P8V_BUILTIN_VCMPEQUD:
> > +  {
> > +   arg0 = gimple_call_arg (stmt, 0);
> > +   arg1 = gimple_call_arg (stmt, 1);
> > +   lhs = gimple_call_lhs (stmt);
> > +   gimple *g = gimple_build_assign (lhs, EQ_EXPR, arg0, arg1);
> 
> As said elsewhere this needs to become either
> 
>   tree ctype = build_same_sized_truth_vector_type (TREE_TYPE (lhs));
>   gimple_build_assign (make_ssa_name (ctype), EQ_EXPR, arg0, arg1)
>   gimple_build_assign (lhs, VIEW_CONVERT_EXPR, lhs above);
> 
> (eventually the VCE can be elided - try) or
> 
>   gimple_build_assign (lhs, VEC_COND_EXPR,
>  fold_build2 (EQ_EXPR, ctype, arg0, arg1),
>  vector-with-trues, vector-with-falses);
> 
> depending on what your target can expand.


Alright, i'll work with this some more and see what I come up with.
Thanks for the review and feedback.  :-)

-Will





Check that there are no missing probabilities

2017-10-13 Thread Jan Hubicka
Hi,
this patch enables check that no edge probabilities are missing. 

Honza

* cfghooks.c (verify_flow_info): Check that edge probabilities are
set.

Index: cfghooks.c
===
--- cfghooks.c  (revision 253694)
+++ cfghooks.c  (working copy)
@@ -160,6 +161,13 @@ verify_flow_info (void)
 e->src->index, e->dest->index);
  err = 1;
}
+ if (profile_status_for_fn (cfun) >= PROFILE_GUESSED
+ && !e->probability.initialized_p ())
+   {
+ error ("Uninitialized probability of edge %i->%i", e->src->index,
+e->dest->index);
+ err = 1;
+   }
  if (!e->probability.verify ())
{
  error ("verify_flow_info: Wrong probability of edge %i->%i",


Rename inchash::hash::add_wide_int

2017-10-13 Thread Richard Sandiford
The name inchash::add_wide_int is a bit misleading, since it sounds
like it's hashing a wide_int.  This patch renames it to add_hwi instead.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2017-10-13  Richard Sandiford  

gcc/
* inchash.h (inchash::hash::add_wide_int): Rename to...
(inchash::hash::add_hwi): ...this.
* ipa-devirt.c (hash_odr_vtable): Update accordingly.
(polymorphic_call_target_hasher::hash): Likewise.
* ipa-icf.c (sem_function::get_hash, sem_function::init): Likewise.
(sem_item::add_expr, sem_item::add_type, sem_variable::get_hash)
(sem_item_optimizer::update_hash_by_addr_refs): Likewise.
* lto-streamer-out.c (hash_tree): Likewise.
* optc-save-gen.awk: Likewise.
* tree.c (add_expr): Likewise.

Index: gcc/inchash.h
===
--- gcc/inchash.h   2017-02-23 19:54:20.0 +
+++ gcc/inchash.h   2017-10-13 14:59:35.120146199 +0100
@@ -58,7 +58,7 @@ hashval_t iterative_hash_hashval_t (hash
   }
 
   /* Add HOST_WIDE_INT value V.  */
-  void add_wide_int (HOST_WIDE_INT v)
+  void add_hwi (HOST_WIDE_INT v)
   {
 val = iterative_hash_host_wide_int (v, val);
   }
Index: gcc/ipa-devirt.c
===
--- gcc/ipa-devirt.c2017-08-10 14:36:08.043471772 +0100
+++ gcc/ipa-devirt.c2017-10-13 14:59:35.121062203 +0100
@@ -373,7 +373,7 @@ hash_odr_vtable (const_tree t)
   v = TREE_OPERAND (TREE_OPERAND (v, 0), 0);
 }
 
-  hstate.add_wide_int (IDENTIFIER_HASH_VALUE (DECL_ASSEMBLER_NAME (v)));
+  hstate.add_hwi (IDENTIFIER_HASH_VALUE (DECL_ASSEMBLER_NAME (v)));
   return hstate.end ();
 }
 
@@ -2625,14 +2625,14 @@ polymorphic_call_target_hasher::hash (co
 {
   inchash::hash hstate (odr_query->otr_token);
 
-  hstate.add_wide_int (odr_query->type->id);
+  hstate.add_hwi (odr_query->type->id);
   hstate.merge_hash (TYPE_UID (odr_query->context.outer_type));
-  hstate.add_wide_int (odr_query->context.offset);
+  hstate.add_hwi (odr_query->context.offset);
 
   if (odr_query->context.speculative_outer_type)
 {
   hstate.merge_hash (TYPE_UID (odr_query->context.speculative_outer_type));
-  hstate.add_wide_int (odr_query->context.speculative_offset);
+  hstate.add_hwi (odr_query->context.speculative_offset);
 }
   hstate.add_flag (odr_query->speculative);
   hstate.add_flag (odr_query->context.maybe_in_construction);
Index: gcc/ipa-icf.c
===
--- gcc/ipa-icf.c   2017-09-25 13:33:39.989814299 +0100
+++ gcc/ipa-icf.c   2017-10-13 14:59:35.121062203 +0100
@@ -286,11 +286,11 @@ sem_function::get_hash (void)
 
   /* Add common features of declaration itself.  */
   if (DECL_FUNCTION_SPECIFIC_TARGET (decl))
-hstate.add_wide_int
+hstate.add_hwi
 (cl_target_option_hash
   (TREE_TARGET_OPTION (DECL_FUNCTION_SPECIFIC_TARGET (decl;
   if (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl))
-   hstate.add_wide_int
+   hstate.add_hwi
 (cl_optimization_hash
   (TREE_OPTIMIZATION (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl;
   hstate.add_flag (DECL_CXX_CONSTRUCTOR_P (decl));
@@ -1437,8 +1437,8 @@ sem_function::init (void)
 {
   cfg_checksum = 0;
   inchash::hash hstate;
-  hstate.add_wide_int (cnode->thunk.fixed_offset);
-  hstate.add_wide_int (cnode->thunk.virtual_value);
+  hstate.add_hwi (cnode->thunk.fixed_offset);
+  hstate.add_hwi (cnode->thunk.virtual_value);
   hstate.add_flag (cnode->thunk.this_adjusting);
   hstate.add_flag (cnode->thunk.virtual_offset_p);
   hstate.add_flag (cnode->thunk.add_pointer_bounds_args);
@@ -1485,7 +1485,7 @@ sem_item::add_expr (const_tree exp, inch
unsigned HOST_WIDE_INT idx;
tree value;
 
-   hstate.add_wide_int (int_size_in_bytes (TREE_TYPE (exp)));
+   hstate.add_hwi (int_size_in_bytes (TREE_TYPE (exp)));
 
FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
  if (value)
@@ -1500,7 +1500,7 @@ sem_item::add_expr (const_tree exp, inch
 case VAR_DECL:
 case CONST_DECL:
 case PARM_DECL:
-  hstate.add_wide_int (int_size_in_bytes (TREE_TYPE (exp)));
+  hstate.add_hwi (int_size_in_bytes (TREE_TYPE (exp)));
   break;
 case MEM_REF:
 case POINTER_PLUS_EXPR:
@@ -1518,7 +1518,7 @@ sem_item::add_expr (const_tree exp, inch
   }
   break;
 CASE_CONVERT:
-  hstate.add_wide_int (int_size_in_bytes (TREE_TYPE (exp)));
+  hstate.add_hwi (int_size_in_bytes (TREE_TYPE (exp)));
   return add_expr (TREE_OPERAND (exp, 0), hstate);
 default:
   break;
@@ -1589,11 +1589,11 @@ sem_item::add_type (const_tree type, inc
 
  hstate2.add_int (nf);
  hash = hstate2.end ();
- hstate.add_wide_int (hash);
+ 

Add wide_int version of inchash::hash::add_wide_int

2017-10-13 Thread Richard Sandiford
This patch adds an inchash hasher for wide_int-based types.
It means that hash_tree no longer hashes TREE_INT_CST_EXT_NUNITS,
but that was redundant with hashing the type.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install?

Richard


2017-10-13  Richard Sandiford  

gcc/
* inchash.h (inchash::hash::add_wide_int): New function.
* lto-streamer-out.c (hash_tree): Use it.

Index: gcc/inchash.h
===
--- gcc/inchash.h   2017-10-13 14:59:35.120146199 +0100
+++ gcc/inchash.h   2017-10-13 15:00:00.743308010 +0100
@@ -63,6 +63,15 @@ hashval_t iterative_hash_hashval_t (hash
 val = iterative_hash_host_wide_int (v, val);
   }
 
+  /* Add wide_int-based value V.  */
+  template
+  void add_wide_int (const generic_wide_int &x)
+  {
+add_int (x.get_len ());
+for (unsigned i = 0; i < x.get_len (); i++)
+  add_hwi (x.elt (i));
+  }
+
   /* Hash in pointer PTR.  */
   void add_ptr (const void *ptr)
   {
Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c  2017-10-13 14:59:35.121978207 +0100
+++ gcc/lto-streamer-out.c  2017-10-13 15:00:00.743308010 +0100
@@ -1028,13 +1028,7 @@ #define visit(SIBLING) \
   hstate.commit_flag ();
 
   if (CODE_CONTAINS_STRUCT (code, TS_INT_CST))
-{
-  int i;
-  hstate.add_hwi (TREE_INT_CST_NUNITS (t));
-  hstate.add_hwi (TREE_INT_CST_EXT_NUNITS (t));
-  for (i = 0; i < TREE_INT_CST_NUNITS (t); i++)
-   hstate.add_hwi (TREE_INT_CST_ELT (t, i));
-}
+hstate.add_wide_int (wi::to_widest (t));
 
   if (CODE_CONTAINS_STRUCT (code, TS_REAL_CST))
 {


Add an alternative vector loop iv mechanism

2017-10-13 Thread Richard Sandiford
Normally we adjust the vector loop so that it iterates:

   (original number of scalar iterations - number of peels) / VF

times, enforcing this using an IV that starts at zero and increments
by one each iteration.  However, dividing by VF would be expensive
for variable VF, so this patch adds an alternative in which the IV
increments by VF each iteration instead.  We then need to take care
to handle possible overflow in the IV.

The new mechanism isn't used yet; a later patch replaces the
"if (1)" with a check for variable VF.  If the patch is OK, I'll
hold off applying it until the follow-on is ready to go in.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install when the time comes?

Richard


2017-10-13  Richard Sandiford  

gcc/
* tree-vect-loop-manip.c: Include gimple-fold.h.
(slpeel_make_loop_iterate_ntimes): Add step, final_iv and
niters_maybe_zero parameters.  Handle other cases besides a step of 1.
(vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
Add a path that uses a step of VF instead of 1, but disable it
for now.
(vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
and niters_no_overflow parameters.  Update calls to
slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
Create a new SSA name if the latter choses to use a ste other
than zero, and return it via niters_vector_mult_vf_var.
* tree-vect-loop.c (vect_transform_loop): Update calls to
vect_do_peeling, vect_gen_vector_loop_niters and
slpeel_make_loop_iterate_ntimes.
* tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling)
(vect_gen_vector_loop_niters): Update declarations after above changes.

Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.144777367 +0100
+++ gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.296014347 +0100
@@ -41,6 +41,7 @@ Software Foundation; either version 3, o
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
 #include "tree-ssa-loop-ivopts.h"
+#include "gimple-fold.h"
 
 /*
   Simple Loop Peeling Utilities
@@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
gimple_bb (update_phi));
 }
 
-/* Make the LOOP iterate NITERS times. This is done by adding a new IV
-   that starts at zero, increases by one and its limit is NITERS.
+/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
+   where NITERS is known to be outside the range [1, STEP - 1].
+   This is equivalent to making the loop execute NITERS / STEP
+   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
+   where M is the precision of NITERS.
+
+   NITERS_MAYBE_ZERO is true if NITERS can be zero, false it is known
+   to be >= STEP.  In the latter case N is always NITERS / STEP.
+
+   If FINAL_IV is nonnull, it is an SSA name that should be set to
+   N * STEP on exit from the loop.
 
Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
 
 void
-slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
+slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
+tree final_iv, bool niters_maybe_zero)
 {
   tree indx_before_incr, indx_after_incr;
   gcond *cond_stmt;
   gcond *orig_cond;
+  edge pe = loop_preheader_edge (loop);
   edge exit_edge = single_exit (loop);
   gimple_stmt_iterator loop_cond_gsi;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  tree init = build_int_cst (TREE_TYPE (niters), 0);
-  tree step = build_int_cst (TREE_TYPE (niters), 1);
   source_location loop_loc;
   enum tree_code code;
+  tree niters_type = TREE_TYPE (niters);
 
   orig_cond = get_loop_exit_condition (loop);
   gcc_assert (orig_cond);
   loop_cond_gsi = gsi_for_stmt (orig_cond);
 
+  tree init, limit;
+  if (!niters_maybe_zero && integer_onep (step))
+{
+  /* In this case we can use a simple 0-based IV:
+
+A:
+  x = 0;
+  do
+{
+  ...
+  x += 1;
+}
+  while (x < NITERS);  */
+  code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
+  init = build_zero_cst (niters_type);
+  limit = niters;
+}
+  else
+{
+  /* The following works for all values of NITERS except 0:
+
+B:
+  x = 0;
+  do
+{
+  ...
+  x += STEP;
+}
+  while (x <= NITERS - STEP);
+
+so that the loop continues to iterate if x + STEP - 1 < NITERS
+but stops if x + STEP - 1 >= NITERS.
+
+However, if NITERS is zero, x never hits a value above NITERS - STEP
+before wrapping around.  There are two obvious ways of dealing with
+this:
+
+

Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-13 Thread Jeff Law
On 10/13/2017 07:03 AM, Martin Liška wrote:
> On 10/13/2017 10:01 AM, Martin Liška wrote:
>> On 10/12/2017 11:54 PM, Jeff Law wrote:
>>> On 10/11/2017 12:13 AM, Martin Liška wrote:
 2017-10-10  Martin Liska  

PR tree-optimization/82493
* sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
(test_range_functions): New function.
(sbitmap_c_tests): Likewise.
* selftest-run-tests.c (selftest::run_tests): Run new tests.
* selftest.h (sbitmap_c_tests): New function.
>>> I went ahead and committed this along with a patch to fix the off-by-one
>>> error in live_bytes_read.  Bootstrapped and regression tested on x86.
>>>
>>> Actual patch attached for archival purposes.
>>>
>>> Jeff
>>>
>>
>> Thanks.
>>
>> I'll carry on and write another set of unit tests.
>>
>> Martin
>>
> 
> Sending patch candidate, may I install it after it finishes regression tests?
Yes.  Please.
Jeff




Re: Add wide_int version of inchash::hash::add_wide_int

2017-10-13 Thread Jeff Law
On 10/13/2017 08:05 AM, Richard Sandiford wrote:
> This patch adds an inchash hasher for wide_int-based types.
> It means that hash_tree no longer hashes TREE_INT_CST_EXT_NUNITS,
> but that was redundant with hashing the type.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-10-13  Richard Sandiford  
> 
> gcc/
>   * inchash.h (inchash::hash::add_wide_int): New function.
>   * lto-streamer-out.c (hash_tree): Use it.
OK.
jeff


Re: [PATCH 09/22] Enable building libbacktrace with Intel CET

2017-10-13 Thread Jeff Law
On 10/12/2017 05:58 PM, Ian Lance Taylor wrote:
> "Tsimbalist, Igor V"  writes:
> 
>> Enable building libbacktrace with CET options.
>>
>> libbacktrace/
>>  * configure.ac: Add CET_FLAGS to EXTRA_FLAGS.
>>  * aclocal.m4: Regenerate.
>>  * Makefile.in: Likewise.
>>  * configure: Likewise.
> 
>> +if test x$enable_cet = xyes; then
>> +  CET_FLAGS="-fcf-protection -mcet -include cet.h"
>> +fi
> 
> Is this really right?  Why the -include option?  CET protection sounds
> like it should work for any language, but -include is C-specific and
> doesn't seem to have anything to do with code generation.
Agreed.  And more generally it seems that we want to be able to just use
the -f/-m options to enable/disable CET checking.  If we're having to
include additional files to get CET support, then we need to go back and
rethink things somewhere.

Jeff


Re: [PATCH] Fix bitmap_bit_in_range_p (PR tree-optimization/82493).

2017-10-13 Thread Jeff Law
On 10/13/2017 07:02 AM, Martin Liška wrote:
> On 10/12/2017 11:54 PM, Jeff Law wrote:
>> On 10/11/2017 12:13 AM, Martin Liška wrote:
>>> 2017-10-10  Martin Liska  
>>>
>>> PR tree-optimization/82493
>>> * sbitmap.c (bitmap_bit_in_range_p): Fix the implementation.
>>> (test_range_functions): New function.
>>> (sbitmap_c_tests): Likewise.
>>> * selftest-run-tests.c (selftest::run_tests): Run new tests.
>>> * selftest.h (sbitmap_c_tests): New function.
>> I went ahead and committed this along with a patch to fix the off-by-one
>> error in live_bytes_read.  Bootstrapped and regression tested on x86.
>>
>> Actual patch attached for archival purposes.
>>
>> Jeff
>>
> 
> Hello.
> 
> I wrote a patch that adds various gcc_checking_asserts and I hit following:
> 
> ./xgcc -B. 
> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90 -c 
> -O2
> during GIMPLE pass: dse
> /home/marxin/Programming/gcc/gcc/testsuite/gfortran.dg/char_result_12.f90:7:0:
> 
>   program testat
>  
> internal compiler error: in bitmap_check_index, at sbitmap.h:105
> 0x1c014c1 bitmap_check_index
>   ../../gcc/sbitmap.h:105
> 0x1c01fa7 bitmap_bit_in_range_p(simple_bitmap_def const*, unsigned int, 
> unsigned int)
>   ../../gcc/sbitmap.c:335
> 0x1179002 live_bytes_read
>   ../../gcc/tree-ssa-dse.c:497
> 0x117935a dse_classify_store
>   ../../gcc/tree-ssa-dse.c:595
> 0x1179947 dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)
>   ../../gcc/tree-ssa-dse.c:786
> 0x1179b6e dse_dom_walker::before_dom_children(basic_block_def*)
>   ../../gcc/tree-ssa-dse.c:853
> 0x1a6f659 dom_walker::walk(basic_block_def*)
>   ../../gcc/domwalk.c:308
> 0x1179cb9 execute
>   ../../gcc/tree-ssa-dse.c:907
> 
> Where we call:
> Breakpoint 1, bitmap_bit_in_range_p (bmap=0x29d6cd0, start=0, end=515) at 
> ../../gcc/sbitmap.c:335
> 335 bitmap_check_index (bmap, end);
> (gdb) p *bmap
> $1 = {n_bits = 256, size = 4, elms = {255}}
> 
> Is it a valid call or should caller check indices?
It doesn't look valid to me.  I'll dig into it.

In general the sbitmap interface requires callers to DTRT -- failure can
easily lead to an out of bounds read or write.  It's one of the things I
really dislike about the sbitmap implementation.

So it's safe to assume that I'm fully supportive of adding more testing
to catch this kind thing.

Jeff


[PATCH GCC]Try harder to find base object by expanding base address

2017-10-13 Thread Bin Cheng
Hi,
I ran into this when investigating PR82369 which we failed to find base object.
This simple patch tries harder to find base object by expanding base address
in alloc_iv.  In general, we don't want to do aggressive expansion, but this
case is fine because finding base object means reduction happened during the
expansion.  And it's good to have base object for address type iv_uses.
Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin
2017-10-12  Bin Cheng  

* tree-scalar-evolution.c (alloc_iv): New parameter controlling
base expansion for finding base object.
(find_interesting_uses_address): Adjust call to alloc_iv.

gcc/testsuite
2017-10-12  Bin Cheng  

* gcc.dg/tree-ssa/ivopt_6.c: New test.diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c
new file mode 100644
index 000..de94b88
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+typedef unsigned long int uintptr_t;
+typedef long unsigned int size_t;
+typedef long int ptrdiff_t;
+
+void foo (unsigned char *restrict dst, unsigned char *restrict src, size_t 
bytes)
+{
+  uintptr_t end_dst = (uintptr_t) (dst + bytes);
+  uintptr_t srcu = (uintptr_t) src, dstu = (uintptr_t) dst;
+  ptrdiff_t src_dst_offset = srcu - 2 * dstu;
+
+  do {
+ unsigned char v0 = *(unsigned char *) (dstu * 2 + src_dst_offset);
+ unsigned char v1 = *(unsigned char *) ((dstu * 2 + src_dst_offset) + 1);
+ unsigned char res = v1 + v0;
+
+ *((unsigned char*) dstu) = res;
+ dstu += 16;
+  } while (dstu < end_dst);
+}
+/* { dg-final { scan-tree-dump-times "Type:\tADDRESS" 3 "ivopts" } } */
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index bbea619..4ccdf32 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -1160,11 +1160,12 @@ contain_complex_addr_expr (tree expr)
 }
 
 /* Allocates an induction variable with given initial value BASE and step STEP
-   for loop LOOP.  NO_OVERFLOW implies the iv doesn't overflow.  */
+   for loop LOOP.  NO_OVERFLOW implies the iv doesn't overflow.  If EXPAND_P
+   is true, this function expands base address to find base object.  */
 
 static struct iv *
 alloc_iv (struct ivopts_data *data, tree base, tree step,
- bool no_overflow = false)
+ bool no_overflow = false, bool expand_p = false)
 {
   tree expr = base;
   struct iv *iv = (struct iv*) obstack_alloc (&data->iv_obstack,
@@ -1185,8 +1186,22 @@ alloc_iv (struct ivopts_data *data, tree base, tree step,
   base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));
 }
 
+  tree base_object = determine_base_object (base);
+  /* Try harder to find base object by expanding base.  */
+  if (expand_p && base_object == NULL_TREE)
+{
+  aff_tree comb;
+  expr = unshare_expr (base);
+  tree_to_aff_combination_expand (base, TREE_TYPE (base), &comb,
+ &data->name_expansion_cache);
+  base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));
+  base_object = determine_base_object (base);
+  /* Fall back to unexpanded base if no base object is found.  */
+  if (!base_object)
+   base = expr;
+}
   iv->base = base;
-  iv->base_object = determine_base_object (base);
+  iv->base_object = base_object;
   iv->step = step;
   iv->biv_p = false;
   iv->nonlin_use = NULL;
@@ -2365,7 +2380,7 @@ find_interesting_uses_address (struct ivopts_data *data, 
gimple *stmt,
}
 }
 
-  civ = alloc_iv (data, base, step);
+  civ = alloc_iv (data, base, step, false, true);
   /* Fail if base object of this memory reference is unknown.  */
   if (civ->base_object == NULL_TREE)
 goto fail;


RE: [PATCH 09/22] Enable building libbacktrace with Intel CET

2017-10-13 Thread Tsimbalist, Igor V
> -Original Message-
> From: Ian Lance Taylor [mailto:i...@airs.com]
> Sent: Friday, October 13, 2017 1:59 AM
> To: Tsimbalist, Igor V 
> Cc: gcc-patches@gcc.gnu.org; Jeff Law 
> Subject: Re: [PATCH 09/22] Enable building libbacktrace with Intel CET
> 
> "Tsimbalist, Igor V"  writes:
> 
> > Enable building libbacktrace with CET options.
> >
> > libbacktrace/
> > * configure.ac: Add CET_FLAGS to EXTRA_FLAGS.
> > * aclocal.m4: Regenerate.
> > * Makefile.in: Likewise.
> > * configure: Likewise.
> 
> > +if test x$enable_cet = xyes; then
> > +  CET_FLAGS="-fcf-protection -mcet -include cet.h"
> > +fi
> 
> Is this really right?  Why the -include option?  CET protection sounds like it
> should work for any language, but -include is C-specific and doesn't seem to
> have anything to do with code generation.

This file is included to simplify building a library that might have assembler 
files.
This is an auxiliary file to automate creation of a special section in an 
output object
file. Without it every assembler file has to be modified by hand to include a 
special
section. This "-include cet.h " option is specified at a high level to not 
bother if a
library has or does not have assembler files. The option either has no effect if
all source files are C/C++ or used only for assembler file processing. The file 
itself
has an assembler code. The same code is generated by the compiler for each
input C/C++/etc. files.

In real life a user who is going to write an assemble code and have it CET 
compatible
has to add a special section to mark the object file as CET compatible.

Igor

> Of course, for libbacktrace, that is a generated file.  The patch to
> libbacktrace/configure.ac is fine if the general approach is approved.
> 
> Ian


Re: Rename inchash::hash::add_wide_int

2017-10-13 Thread Jeff Law
On 10/13/2017 08:04 AM, Richard Sandiford wrote:
> The name inchash::add_wide_int is a bit misleading, since it sounds
> like it's hashing a wide_int.  This patch renames it to add_hwi instead.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-10-13  Richard Sandiford  
> 
> gcc/
>   * inchash.h (inchash::hash::add_wide_int): Rename to...
>   (inchash::hash::add_hwi): ...this.
>   * ipa-devirt.c (hash_odr_vtable): Update accordingly.
>   (polymorphic_call_target_hasher::hash): Likewise.
>   * ipa-icf.c (sem_function::get_hash, sem_function::init): Likewise.
>   (sem_item::add_expr, sem_item::add_type, sem_variable::get_hash)
>   (sem_item_optimizer::update_hash_by_addr_refs): Likewise.
>   * lto-streamer-out.c (hash_tree): Likewise.
>   * optc-save-gen.awk: Likewise.
>   * tree.c (add_expr): Likewise.
OK.
jeff


Re: [PING from 2013][PATCH] fixincludes: handle symlinks with multiple slashes

2017-10-13 Thread Jeff Law
On 08/17/2017 02:31 PM, Sergei Trofimovich wrote:
> Looks like the following patch falled through the cracks
> https://gcc.gnu.org/ml/gcc-patches/2012-12/msg01397.html
> https://bugs.gentoo.org/show_bug.cgi?id=434180#c16
Yea.  Looks like it did fall through the cracks.

I've updated Mike's email address and installed the patch.

Thanks,
jeff



signature.asc
Description: OpenPGP digital signature


Re: Make more use of df_read_modify_subreg_p

2017-10-13 Thread Jeff Law
On 08/24/2017 12:25 PM, Richard Sandiford wrote:
> Segher Boessenkool  writes:
>> On Wed, Aug 23, 2017 at 11:49:03AM +0100, Richard Sandiford wrote:
>>> This patch uses df_read_modify_subreg_p to check whether writing
>>> to a subreg would preserve some of the existing contents.
>>
>> combine does not keep the DF info up-to-date -- but that is no
>> problem here, df_read_modify_subreg_p uses no DF info at all.  Maybe
>> it should not have "df_" in the name?
> 
> Yeah, I guess that's a bit confusing.  I've just posted a patch
> to rename it.
> 
> Here's a version of the patch that applies on top of that one.
> Tested as before.  OK to install?
> 
> Thanks,
> Richard
> 
> 
> 2017-08-24  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * caller-save.c (mark_referenced_regs):  Use read_modify_subreg_p.
>   * combine.c (find_single_use_1): Likewise.
>   (expand_field_assignment): Likewise.
>   (move_deaths): Likewise.
>   * lra-constraints.c (simplify_operand_subreg): Likewise.
>   (curr_insn_transform): Likewise.
>   * lra.c (collect_non_operand_hard_regs): Likewise.
>   (add_regs_to_insn_regno_info): Likewise.
>   * rtlanal.c (reg_referenced_p): Likewise.
>   (covers_regno_no_parallel_p): Likewise.
> 


> Index: gcc/combine.c
> ===
> --- gcc/combine.c 2017-08-24 19:22:26.163269637 +0100
> +++ gcc/combine.c 2017-08-24 19:22:45.218100970 +0100
> @@ -579,10 +579,7 @@ find_single_use_1 (rtx dest, rtx *loc)
> && !REG_P (SET_DEST (x))
> && ! (GET_CODE (SET_DEST (x)) == SUBREG
>   && REG_P (SUBREG_REG (SET_DEST (x)))
> - && (((GET_MODE_SIZE (GET_MODE (SUBREG_REG (SET_DEST (x
> -   + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
> - == ((GET_MODE_SIZE (GET_MODE (SET_DEST (x)))
> -  + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD
> + && !read_modify_subreg_p (SET_DEST (x
>   break;
Is this correct for a paradoxical subreg?  ISTM the original code was
checking for a subreg that just changes the mode, but not the size
(subreg:SI (reg:SF)) or (subreg:DF (reg:DI)) kinds of things.  It would
reject a paradoxical AFAICT.

As written now I think the condition would be true for a paradoxical.

Similarly for the other two instances in combine.c and the changes in
rtlanal.c.

In some of those cases you might be able to argue that it's the right
way to handle a paradoxical.  I haven't thought a whole lot about that
angle, but mention it as a possible way your change might still be correct.


Jeff


Re: Add wider_subreg_mode helper functions

2017-10-13 Thread Jeff Law
On 08/28/2017 02:26 AM, Richard Sandiford wrote:
> This patch adds helper functions that say which of the two modes
> involved in a subreg is the larger, preferring the outer mode in
> the event of a tie.  It also converts IRA and reload to track modes
> instead of byte sizes, since this is slightly more convenient when
> variable-sized modes are added later.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu, and by checking
> there was no change in the testsuite assembly output for at least
> one target per CPU.  OK to install?
> 
> Richard
> 
> 
> 2017-08-28  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * rtl.h (wider_subreg_mode): New function.
>   * ira.h (ira_sort_regnos_for_alter_reg): Take a machine_mode *
>   rather than an unsigned int *.
>   * ira-color.c (regno_max_ref_width): Replace with...
>   (regno_max_ref_mode): ...this new variable.
>   (coalesced_pseudo_reg_slot_compare): Update accordingly.
>   Use wider_subreg_mode.
>   (ira_sort_regnos_for_alter_reg): Likewise.  Take a machine_mode *
>   rather than an unsigned int *.
>   * lra-constraints.c (uses_hard_regs_p): Use wider_subreg_mode.
>   (process_alt_operands): Likewise.
>   (invariant_p): Likewise.
>   * lra-spills.c (assign_mem_slot): Likewise.
>   (add_pseudo_to_slot): Likewise.
>   * lra.c (collect_non_operand_hard_regs): Likewise.
>   (add_regs_to_insn_regno_info): Likewise.
>   * reload1.c (regno_max_ref_width): Replace with...
>   (regno_max_ref_mode): ...this new variable.
>   (reload): Update accordingly.  Update call to
>   ira_sort_regnos_for_alter_reg.
>   (alter_reg): Update to use regno_max_ref_mode.  Call wider_subreg_mode.
>   (init_eliminable_invariants): Update to use regno_max_ref_mode.
>   (scan_paradoxical_subregs): Likewise.
OK.
jeff


Re: [PATCH C++] Fix PR82357 - bogus "cannot bind bitfield" error

2017-10-13 Thread Jason Merrill
On Fri, Oct 13, 2017 at 5:40 AM, Markus Trippelsdorf
 wrote:
> r253266 introduced a bogus "cannot bind bitfield" error that breaks
> building Chromium and Node.js.
> Fix by removing the ugly goto.
>
> Tested on ppc64le.
> Ok for trunk?

No, this just undoes my change, so we go back to not doing type
checking for non-dependent static casts.  Better I think to avoid the
call to build_static_cast in the first place, by teaching
stabilize_reference that it can return a NON_DEPENDENT_EXPR unchanged.
How does this (untested) patch work for you?
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index e21ff6a1572..366f46f1506 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -333,6 +333,10 @@ cp_stabilize_reference (tree ref)
 {
   switch (TREE_CODE (ref))
 {
+case NON_DEPENDENT_EXPR:
+  /* We aren't actually evaluating this.  */
+  return ref;
+
 /* We need to treat specially anything stabilize_reference doesn't
handle specifically.  */
 case VAR_DECL:


Re: [PATCH, Makefile] improve default cpu selection for e500v2

2017-10-13 Thread Jeff Law
On 08/31/2017 12:54 PM, Olivier Hainque wrote:
> Hello,
> 
> gcc can be configured with an "e500v2" cpu target
> name, conveying SPE with double precision floats.
> 
> config.gcc already has a provision for a good default
> cpu selection for SPE with double precision floats
> when the latter is explicitly requested with an explicit
> --enable command line option. This is:
> 
>   # If there is no $with_cpu option, try to infer one from ${target}.
>   # This block sets nothing except for with_cpu.
>   ...
>   case ${target} in
>   ...
> powerpc*-*-*spe*)
>   if test x$enable_e500_double = xyes; then
>  with_cpu=8548
>   else
>  with_cpu=8540
>   fi   
>   ;;
> 
> The attached patch is a proposal to refine this to select 8548 also
> when we were configured for an e500v2 target cpu (canonicalized to
> match powerpc-*spe), regardless of enable_e500_double.
> 
> We have been using something like this in production for a few
> years in-house, lately with gcc-6 based toolchains for VxWorks or
> bareboard configurations.
> 
> I also checked that
> - e500v2-wrs-vxworks builds work with gcc-7, that the default cpu is
>   indeed properly set to 8548, and that the built toolchains pass Ada
>   ACATS tests for a couple of runtime library variants (kernel and rtp).
> 
> - a mainline build for powerpc-eabispe without --enable-e500-double
>   defaults to 8540
> 
> - a mainline build for powerpc-eabispe with --enable-e500-double
>   defaults to 8548
> 
> - a mainline build for e500v2-wrs-vxworks without --enable-e500-double
>   defaults to 8548
> 
> OK to commit ?
> 
> Thanks in advance
> 
> 2017-08-31  Olivier Hainque  
> 
> * gcc/config.gcc (powerpc*-*-*spe*): Pick 8548 as the
>   default with_cpu for an e500v2 target cpu name, in addition
>   to --enable-e500-double.
GIven this hits the powerpcspe port, I'd like Andrew Jenner to chime in
as the powerpcspe maintainer.  I've added him on CC.

jeff


Re: Make more use of df_read_modify_subreg_p

2017-10-13 Thread Richard Sandiford
Jeff Law  writes:
> On 08/24/2017 12:25 PM, Richard Sandiford wrote:
>> Segher Boessenkool  writes:
>>> On Wed, Aug 23, 2017 at 11:49:03AM +0100, Richard Sandiford wrote:
 This patch uses df_read_modify_subreg_p to check whether writing
 to a subreg would preserve some of the existing contents.
>>>
>>> combine does not keep the DF info up-to-date -- but that is no
>>> problem here, df_read_modify_subreg_p uses no DF info at all.  Maybe
>>> it should not have "df_" in the name?
>> 
>> Yeah, I guess that's a bit confusing.  I've just posted a patch
>> to rename it.
>> 
>> Here's a version of the patch that applies on top of that one.
>> Tested as before.  OK to install?
>> 
>> Thanks,
>> Richard
>> 
>> 
>> 2017-08-24  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>> 
>> gcc/
>>  * caller-save.c (mark_referenced_regs):  Use read_modify_subreg_p.
>>  * combine.c (find_single_use_1): Likewise.
>>  (expand_field_assignment): Likewise.
>>  (move_deaths): Likewise.
>>  * lra-constraints.c (simplify_operand_subreg): Likewise.
>>  (curr_insn_transform): Likewise.
>>  * lra.c (collect_non_operand_hard_regs): Likewise.
>>  (add_regs_to_insn_regno_info): Likewise.
>>  * rtlanal.c (reg_referenced_p): Likewise.
>>  (covers_regno_no_parallel_p): Likewise.
>> 
>
>
>> Index: gcc/combine.c
>> ===
>> --- gcc/combine.c2017-08-24 19:22:26.163269637 +0100
>> +++ gcc/combine.c2017-08-24 19:22:45.218100970 +0100
>> @@ -579,10 +579,7 @@ find_single_use_1 (rtx dest, rtx *loc)
>>&& !REG_P (SET_DEST (x))
>>&& ! (GET_CODE (SET_DEST (x)) == SUBREG
>>  && REG_P (SUBREG_REG (SET_DEST (x)))
>> -&& (((GET_MODE_SIZE (GET_MODE (SUBREG_REG (SET_DEST (x
>> -  + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
>> -== ((GET_MODE_SIZE (GET_MODE (SET_DEST (x)))
>> - + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD
>> +&& !read_modify_subreg_p (SET_DEST (x
>>  break;
> Is this correct for a paradoxical subreg?  ISTM the original code was
> checking for a subreg that just changes the mode, but not the size
> (subreg:SI (reg:SF)) or (subreg:DF (reg:DI)) kinds of things.  It would
> reject a paradoxical AFAICT.
>
> As written now I think the condition would be true for a paradoxical.
>
> Similarly for the other two instances in combine.c and the changes in
> rtlanal.c.
>
> In some of those cases you might be able to argue that it's the right
> way to handle a paradoxical.  I haven't thought a whole lot about that
> angle, but mention it as a possible way your change might still be correct.

Yeah, I agree this'll change the handling of paradoxical subregs that
occupy more words than the SUBREG_REG, but I think the new version is
correct.  The comment says:

  /* If the destination is anything other than CC0, PC, a REG or a SUBREG
 of a REG that occupies all of the REG, the insn uses DEST if
 it is mentioned in the destination or the source.  Otherwise, we
 need just check the source.  */

and a paradoxical subreg does occupy all of the SUBREG_REG.

The code is trying to work out whether the instruction "reads" the
destination if you view partial stores as a read of the old value
followed by a write of a partially-updated value, whereas writing to a
paradoxical subreg preserves none of the original value.  And that's
also the semantics that the current code uses for "normal" word-sized
paradoxical subregs.

Thanks,
Richard


Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

2017-10-13 Thread Wilco Dijkstra
Hi,

Sorry for the delay - I finally had a chance to look at this again. 
I'll start with alloca:

@@ -15245,6 +15455,28 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
 }
 }
 
+/* It has been decided that to allow up to 1kb of outgoing argument
+   space to be allocated w/o probing.  If more than 1kb of outgoing
+   argment space is allocated, then it must be probed and the last
+   probe must occur no more than 1kbyte away from the end of the
+   allocated space.
+
+   This implies that the residual part of an alloca allocation may
+   need probing in cases where the generic code might not otherwise
+   think a probe is needed.
+
+   This target hook returns TRUE when allocating RESIDUAL bytes of
+   alloca space requires an additional probe, otherwise FALSE is
+   returned.  */
+
+static bool
+aarch64_stack_clash_protection_final_dynamic_probe (rtx residual)
+{
+  return (residual == CONST0_RTX (Pmode)
+ || GET_CODE (residual) != CONST_INT
+ || INTVAL (residual) >= 1024);
+}
+

The const0 check is wrong - for alloca(0) we do not want to insert a probe!

I don't get how this works - probes are done at strange offsets, there is always
a final probe (even if residual is < 1024), and the alignment info seems to be 
lost
somehow, so generated code end up quite bad:

void *p;
void small_alloca (int x)
{
  if (x > 100)
   p = __builtin_alloca (4096);
}

    sub sp, sp, #4096
    str xzr, [sp, 4088]
    mov x0, sp
    str xzr, [x0], 15  *** +15 and extra probe
    and x0, x0, -16    *** already aligned

The same unnecessary probe happens for a variable alloca after the loop:

void alloca (int x)
{
  if (x > 100)
   p = __builtin_alloca (x);
}

    add x0, x0, 15
    and x0, x0, -16
    and x1, x0, -4096
    sub x1, sp, x1
.L22:
    mov x2, sp
    cmp x2, x1
    beq .L23
    sub sp, sp, #4096
    str xzr, [sp, 4088]
    b   .L22
.L23:
    and x0, x0, 4095
    sub x1, x0, #8
    sub sp, sp, x0
    str xzr, [sp, x1]
    str xzr, [sp, -8]
    mov x1, sp


That does lots of unnecessary operations, including always an extra probe 
(which isn't needed since the final adjustment is always less than the max
probe distance). I think it should be:

    add x0, x0, 15
    subs x1, x0, 4096
    bhi .L22
.L23:
    and x0, x0, 4080
    sub sp, sp, x0
    str xzr, [sp, probe_offset - 8]
    mov x1, sp


.L22: (loop marked as unlikely so out of the way)
    subs x1, x1, 4096
    sub sp, sp, #4096
    str xzr, [sp, probe_offset]
    bhi .L22
    b .L23

On AArch64 probe_offset would be min (1024, outgoing_args_size). On other
targets it could be 0 or (probe distance - 8) depending on the probe design. 
This
means you always do exactly the number of probes that are needed and avoid
the corner case of alloca (0). Typically only extra executed instructions are a
compare+non-taken branch plus a store. Codesize overhead is reduced by a
third (from 12 to 8 instructions).

Wilco

[RFC, PR 80689] Copy small aggregates element-wise

2017-10-13 Thread Martin Jambor
Hi,

I'd like to request comments to the patch below which aims to fix PR
80689, which is an instance of a store-to-load forwarding stall on
x86_64 CPUs in the Image Magick benchmark, which is responsible for a
slow down of up to 9% compared to gcc 6, depending on options and HW
used.  (Actually, I have just seen 24% in one specific combination but
for various reasons can no longer verify it today.)

The revision causing the regression is 237074, which increased the
size of the mode for copying aggregates "by pieces" to 128 bits,
incurring big stalls when the values being copied are also still being
stored in a smaller data type or if the copied values are loaded in a
smaller types shortly afterwards.  Such situations happen in Image
Magick even across calls, which means that any non-IPA flow-sensitive
approach would not detect them.  Therefore, the patch simply changes
the way we copy small BLKmode data that are simple combinations of
records and arrays (meaning no unions, bit-fields but also character
arrays are disallowed) and simply copies them one field and/or element
at a time.

"Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
(the structure in the benchmark has 32 bytes) but is subject to change
after more benchmarking and is actually zero - meaning element copying
never happens - on other architectures.  I believe that any
architecture with a store buffer can benefit but it's probably better
to leave it to their maintainers to find a different default value.  I
am not sure this is how such HW-dependant decisions should be done and
is the primary reason, why I am sending this RFC first.

I have decided to implement this change at the expansion level because
at that point the type information is still readily available and at
the same time we can also handle various implicit copies, for example
those passing a parameter.  I found I could re-use some bits and
pieces of tree-SRA and so I did, creating tree-sra.h header file in
the process.

I am fully aware that in the final patch the new parameter, or indeed
any new parameters, need to be documented.  I have skipped that
intentionally now and will write the documentation if feedback here is
generally good.

I have bootstrapped and tested this patch on x86_64-linux, with
different values of the parameter and only found problems with
unreasonably high values leading to OOM.  I have done the same with a
previous version of the patch which was equivalent to the limit being
64 bytes on aarch64-linux, ppc64le-linux and ia64-linux and only ran
into failures of tests which assumed that structure padding was copied
in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff but also
for example gcc.dg/vmx/varargs-4.c).

The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
8% at -O2 and -Ofast compilation levels respectively on one particular
new AMD CPU and by 6% and 3% on one particular old Intel machine.

Thanks in advance for any comments,

Martin


2017-10-12  Martin Jambor  

PR target/80689
* tree-sra.h: New file.
* ipa-prop.h: Moved declaration of build_ref_for_offset to
tree-sra.h.
* expr.c: Include params.h and tree-sra.h.
(emit_move_elementwise): New function.
(store_expr_with_bounds): Optionally use it.
* ipa-cp.c: Include tree-sra.h.
* params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
* config/i386/i386.c (ix86_option_override_internal): Set
PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
* tree-sra.c: Include tree-sra.h.
(scalarizable_type_p): Renamed to
simple_mix_of_records_and_arrays_p, made public, renamed the
second parameter to allow_char_arrays.
(extract_min_max_idx_from_array): New function.
(completely_scalarize): Moved bits of the function to
extract_min_max_idx_from_array.

testsuite/
* gcc.target/i386/pr80689-1.c: New test.
---
 gcc/config/i386/i386.c|   4 ++
 gcc/expr.c| 103 --
 gcc/ipa-cp.c  |   1 +
 gcc/ipa-prop.h|   4 --
 gcc/params.def|   6 ++
 gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++
 gcc/tree-sra.c|  86 +++--
 gcc/tree-sra.h|  33 ++
 8 files changed, 233 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
 create mode 100644 gcc/tree-sra.h

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1ee8351c21f..87f602e7ead 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
 ix86_tune_cost->l2_cache_size,
 opts->x_param_values,
 opts_set->x_param_values);
+  maybe_se

Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands

2017-10-13 Thread Jeff Law
On 09/06/2017 03:26 AM, Jakub Jelinek wrote:
> On Tue, Sep 05, 2017 at 11:21:48PM -0600, Jeff Law wrote:
>> --- a/gcc/tree-ssa-reassoc.c
>> +++ b/gcc/tree-ssa-reassoc.c
>> @@ -5763,14 +5763,15 @@ reassociate_bb (basic_block bb)
>>   "Width = %d was chosen for reassociation\n", 
>> width);
>>  
>>  
>> -  /* For binary bit operations, if the last operand in
>> - OPS is a constant, move it to the front.  This
>> - helps ensure that we generate (X & Y) & C rather
>> - than (X & C) & Y.  The former will often match
>> - a canonical bit test when we get to RTL.  */
>> -  if ((rhs_code == BIT_AND_EXPR
>> -   || rhs_code == BIT_IOR_EXPR
>> -   || rhs_code == BIT_XOR_EXPR)
>> +  /* For binary bit operations, if there are at least 3
>> + operands and the last last operand in OPS is a constant,
>> + move it to the front.  This helps ensure that we generate
>> + (X & Y) & C rather than (X & C) & Y.  The former will
>> + often match a canonical bit test when we get to RTL.  */
>> +  if (ops.length () != 2
> 
> So wouldn't it be clearer to write ops.length () > 2 ?
> if (ops.length () == 0)
> else if (ops.length () == 1)
> come earlier, so it is the same thing, but might help the reader.
Agreed.  It's a trivial change, but I'll spin it with some other testing
that's in progress.


> 
>> +  && (rhs_code == BIT_AND_EXPR
>> +  || rhs_code == BIT_IOR_EXPR
>> +  || rhs_code == BIT_XOR_EXPR)
>>&& TREE_CODE (ops.last ()->op) == INTEGER_CST)
>>  std::swap (*ops[0], *ops[ops_num - 1]);
> 
> Don't you then want to put the constant as second operand rather than first,
> i.e. swap with *ops[1]?
I don't think it matters in practice. I recall being concerned that we
could end up with non-canonical gimple, but we don't AFAICT.

> And doesn't swap_ops_for_binary_stmt undo it again?
No.  It doesn't.  The rank checks get in the way IIRC.  FWIW that's
where I'd originally put the transformation, but Richi wanted it in the
caller.

Jeff


Re: [PATCH, Makefile] improve default cpu selection for e500v2

2017-10-13 Thread Olivier Hainque

> On Oct 13, 2017, at 18:07 , Jeff Law  wrote:
> 
>>* gcc/config.gcc (powerpc*-*-*spe*): Pick 8548 as the
>>  default with_cpu for an e500v2 target cpu name, in addition
>>  to --enable-e500-double.
> GIven this hits the powerpcspe port, I'd like Andrew Jenner to chime in
> as the powerpcspe maintainer.

Sure.

>  I've added him on CC.

Great, thanks Jeff!



Re: [PATCH 09/22] Enable building libbacktrace with Intel CET

2017-10-13 Thread Ian Lance Taylor
"Tsimbalist, Igor V"  writes:

>
> This file is included to simplify building a library that might have
> assembler files.
> This is an auxiliary file to automate creation of a special section in
> an output object
> file. Without it every assembler file has to be modified by hand to
> include a special
> section. This "-include cet.h " option is specified at a high level to
> not bother if a
> library has or does not have assembler files. The option either has no effect 
> if
> all source files are C/C++ or used only for assembler file
> processing. The file itself
> has an assembler code. The same code is generated by the compiler for each
> input C/C++/etc. files.
>
> In real life a user who is going to write an assemble code and have it
> CET compatible
> has to add a special section to mark the object file as CET compatible.

I guess I don't understand how you can assume that general assembly code
is CET compatible.  And if you know it is CET compatible then adding the
section seems simple enough; people already do it routinely for
.note.GNU-stack.

In any case a -include file such as you describe does not belong in a
general FLAGS variable, it belongs in CPPFLAGS or, ideally, ASPPFLAGS if
there were such a Make variable.

Ian


Re: [PATCH] Fix __mulv[dt]i3 and expand_mul_overflow (PR target/82274)

2017-10-13 Thread Ian Lance Taylor via gcc-patches
On Fri, Sep 22, 2017 at 6:03 AM, Jakub Jelinek  wrote:
>
> 2017-09-22  Jakub Jelinek  
>
> PR target/82274
> * internal-fn.c (expand_mul_overflow): If both operands have
> the same highpart of -1 or 0 and the topmost bit of lowpart
> is different, overflow is if res <= 0 rather than res < 0.
>
> * libgcc2.c (__mulvDI3): If both operands have
> the same highpart of -1 and the topmost bit of lowpart is 0,
> multiplication overflows even if both lowparts are 0.
>
> * gcc.dg/pr82274-1.c: New test.
> * gcc.dg/pr82274-2.c: New test.

This is OK.

Ian


Re: Check that there are no missing probabilities

2017-10-13 Thread Jakub Jelinek
On Fri, Oct 13, 2017 at 03:38:33PM +0200, Jan Hubicka wrote:
> Hi,
> this patch enables check that no edge probabilities are missing. 
> 
> Honza
> 
>   * cfghooks.c (verify_flow_info): Check that edge probabilities are
>   set.

This broke bootstrap on x86_64-linux with Ada
(--enable-checking=yes,rtl,extra).

>From what I can see, decompose_multiword_subregs has:
1619  /* Split the block after insn.  There will be a 
fallthru
1620 edge, which is OK so we keep it.  We have to 
create the
1621 exception edges ourselves.  */
1622  fallthru = split_block (bb, insn);
1623  rtl_make_eh_edge (NULL, bb, BB_END (bb));
1624  bb = fallthru->dest;
1625  insn = BB_HEAD (bb);
and rtl_make_eh_edge calls
161   make_label_edge (edge_cache, src, label,
162EDGE_ABNORMAL | EDGE_EH
163| (CALL_P (insn) ? EDGE_ABNORMAL_CALL : 0));

No idea what should initialize the probabilities.  Do probabilities make
any sense at all for EH edges (or abnormal edges)?

> 
> Index: cfghooks.c
> ===
> --- cfghooks.c(revision 253694)
> +++ cfghooks.c(working copy)
> @@ -160,6 +161,13 @@ verify_flow_info (void)
>e->src->index, e->dest->index);
> err = 1;
>   }
> +   if (profile_status_for_fn (cfun) >= PROFILE_GUESSED
> +   && !e->probability.initialized_p ())
> + {
> +   error ("Uninitialized probability of edge %i->%i", e->src->index,
> +  e->dest->index);
> +   err = 1;
> + }
> if (!e->probability.verify ())
>   {
> error ("verify_flow_info: Wrong probability of edge %i->%i",

Jakub


Re: [PATCH, alpha] Move linux-specific specfile definitions to linux.h

2017-10-13 Thread Jeff Law
On 09/03/2017 09:47 AM, Maya Rashish wrote:
> Hi, in my first attempt to fix a build issue I found that the order of
> tm files matters, would prefer to move linux-looking parts of elf.h to
> linux.h.
> 
> other targets that include alpha/elf.h besides linux:
> openbsd:  provides their own STARTFILE_SPEC and ENDFILE_SPEC in later file:
> https://github.com/openbsd/ports/blob/master/lang/gcc/6/patches/patch-gcc_config_alpha_openbsd_h
> freebsd:  dropped alpha in freebsd 7.0 (2008)
> 
> Built trunk on netbsd/alpha (until I out of spaced with a working stage3
> after two days :-)) with some extra modifications.
So we can't depend on patches that OpenBSD applies.  What's important is
what is in the official GCC sources.

I'd like to see some discussion about what these macros should look like
for the *bsd ports.  Merely removing them from elf.h without providing
something for the *bsd ports seems wrong to me.


Jeff


Make istreambuf_iterator::_M_sbuf immutable and add debug checks

2017-10-13 Thread François Dumont

Hi

     Here is the last patch I will propose for istreambuf_iterator. 
This is mostly to remove the mutable keyword on _M_sbuf.


     To do so I had to reset _M_sbuf in valid places that is to say 
constructors and increment operators. Despite that we might still have 
eof iterators with _M_sbuf not null when you have for instance several 
iterator instance but only increment one. It seems fine to me because 
even in this case iterator will still be considered as eof and using 
several istreambuf_iterator to go through a given streambuf is not usual.


     As _M_sbuf is immutable I have been able to restore the simple 
call to _M_at_eof() in the increment operators debug check.


Ok to commit after successful tests ?

François



diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h
index 081afe5..ef22aa9 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -94,8 +94,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // the "end of stream" iterator value.
   // NB: This implementation assumes the "end of stream" value
   // is EOF, or -1.
-  mutable streambuf_type*	_M_sbuf;
-  int_type			_M_c;
+  streambuf_type*	_M_sbuf;
+  int_type		_M_c;
 
 public:
   ///  Construct end of input stream iterator.
@@ -110,11 +110,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   ///  Construct start of input stream iterator.
   istreambuf_iterator(istream_type& __s) _GLIBCXX_USE_NOEXCEPT
-  : _M_sbuf(__s.rdbuf()), _M_c(traits_type::eof()) { }
+  : _M_sbuf(__s.rdbuf()), _M_c(traits_type::eof())
+  { _M_init(); }
 
   ///  Construct start of streambuf iterator.
   istreambuf_iterator(streambuf_type* __s) _GLIBCXX_USE_NOEXCEPT
-  : _M_sbuf(__s), _M_c(traits_type::eof()) { }
+  : _M_sbuf(__s), _M_c(traits_type::eof())
+  { _M_init(); }
 
   ///  Return the current character pointed to by iterator.  This returns
   ///  streambuf.sgetc().  It cannot be assigned.  NB: The result of
@@ -138,13 +140,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   istreambuf_iterator&
   operator++()
   {
-	__glibcxx_requires_cond(_M_sbuf &&
-(!_S_is_eof(_M_c) || !_S_is_eof(_M_sbuf->sgetc())),
+	__glibcxx_requires_cond(!_M_at_eof(),
 _M_message(__gnu_debug::__msg_inc_istreambuf)
 ._M_iterator(*this));
 
 	_M_sbuf->sbumpc();
 	_M_c = traits_type::eof();
+
+	if (_S_is_eof(_M_sbuf->sgetc()))
+	  _M_sbuf = 0;
+
 	return *this;
   }
 
@@ -152,14 +157,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   istreambuf_iterator
   operator++(int)
   {
-	__glibcxx_requires_cond(_M_sbuf &&
-(!_S_is_eof(_M_c) || !_S_is_eof(_M_sbuf->sgetc())),
+	__glibcxx_requires_cond(!_M_at_eof(),
 _M_message(__gnu_debug::__msg_inc_istreambuf)
 ._M_iterator(*this));
 
 	istreambuf_iterator __old = *this;
 	__old._M_c = _M_sbuf->sbumpc();
 	_M_c = traits_type::eof();
+
+	if (_S_is_eof(_M_sbuf->sgetc()))
+	  _M_sbuf = __old._M_sbuf = 0;
+
 	return __old;
   }
 
@@ -172,12 +180,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return _M_at_eof() == __b._M_at_eof(); }
 
 private:
+  void
+  _M_init()
+  {
+	if (_M_sbuf && _S_is_eof(_M_sbuf->sgetc()))
+	  _M_sbuf = 0;
+  }
+
   int_type
   _M_get() const
   {
 	int_type __ret = _M_c;
-	if (_M_sbuf && _S_is_eof(__ret) && _S_is_eof(__ret = _M_sbuf->sgetc()))
-	  _M_sbuf = 0;
+	if (_M_sbuf && __builtin_expect(_S_is_eof(__ret), true))
+	  __ret = _M_sbuf->sgetc();
+
 	return __ret;
   }
 
@@ -391,10 +407,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		__c = __sb->snextc();
 	}
 
-	  __first._M_c = __eof;
+	  if (!traits_type::eq_int_type(__c, __eof))
+	{
+	  __first._M_c = __eof;
+	  return __first;
+	}
 	}
 
-  return __first;
+  return __last;
 }
 
 // @} group iterators
diff --git a/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/debug/1_neg.cc b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/debug/1_neg.cc
new file mode 100644
index 000..241fc58
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/istreambuf_iterator/debug/1_neg.cc
@@ -0,0 +1,35 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run

Re: [PATCH C++] Fix PR82357 - bogus "cannot bind bitfield" error

2017-10-13 Thread Markus Trippelsdorf
On 2017.10.13 at 12:02 -0400, Jason Merrill wrote:
> On Fri, Oct 13, 2017 at 5:40 AM, Markus Trippelsdorf
>  wrote:
> > r253266 introduced a bogus "cannot bind bitfield" error that breaks
> > building Chromium and Node.js.
> > Fix by removing the ugly goto.
> >
> > Tested on ppc64le.
> > Ok for trunk?
> 
> No, this just undoes my change, so we go back to not doing type
> checking for non-dependent static casts.  Better I think to avoid the
> call to build_static_cast in the first place, by teaching
> stabilize_reference that it can return a NON_DEPENDENT_EXPR unchanged.
> How does this (untested) patch work for you?

It works fine. Thanks.
It would be good to have a testcase that checks the type checking for
non-dependent static casts.
I'll let you handle the current issue.

-- 
Markus


Re: [PATCH 2/2] PR libgcc/59714 complex division is surprising on aarch64

2017-10-13 Thread vladimir . mezentsev
On 10/12/2017 03:40 AM, Richard Earnshaw wrote:
> On 12/10/17 06:21, vladimir.mezent...@oracle.com wrote:
>> From: Vladimir Mezentsev 
>>
>> FMA (floating-point multiply-add) instructions are supported on aarch64.
>> These instructions can produce different result if two operations executed 
>> separately.
>> -ffp-contract=off doesn't allow the FMA instructions.
>>
>> Tested on aarch64-linux-gnu.
>> No regression. Two failed tests now passed.
>>
>> ChangeLog:
>> 2017-10-11  Vladimir Mezentsev  
>>
>> PR libgcc/59714
>> * libgcc/config/aarch64/t-aarch64 (HOST_LIBGCC2_CFLAGS): Add 
>> -ffp-contract=off
>> ---
>>  libgcc/config/aarch64/t-aarch64 | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/libgcc/config/aarch64/t-aarch64 
>> b/libgcc/config/aarch64/t-aarch64
>> index 3af933c..e33bef0 100644
>> --- a/libgcc/config/aarch64/t-aarch64
>> +++ b/libgcc/config/aarch64/t-aarch64
>> @@ -18,4 +18,5 @@
>>  # along with GCC; see the file COPYING3.  If not see
>>  # .
>>  
>> +HOST_LIBGCC2_CFLAGS += -ffp-contract=off
>>  LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c
>>
> Why would we want to do this on AArch64 only?  If it's right for us,
> then surely it would be right for everyone and the option should be
> applied at the top level.

  It is a machine dependent option.
We don't need this option, for example, on sparc or intel machines.

-Vladimir
>
> Hint: I'm not convinced on the evidence here that it is right across the
> whole of libgcc.  Before moving forward on this particular PR I think we
> need to understand what behaviour we do want out of the compiler.
>
> R.



Re: [PATCH v2, middle-end]: Introduce memory_blockage named insn pattern

2017-10-13 Thread Jeff Law
On 09/05/2017 07:50 AM, Uros Bizjak wrote:
> Revised patch, incorporates fixes from Alexander's review comments.
> 
> I removed some implementation details from Alexander's description of
> memory_blockage named pattern.
> 
> 
> 2017-09-05  Uros Bizjak  
> 
> * target-insns.def: Add memory_blockage.
> * optabs.c (expand_memory_blockage): New function.
> (expand_asm_memory_barrier): Rename ...
> (expand_asm_memory_blockage): ... to this.
> (expand_mem_thread_fence): Call expand_memory_blockage
> instead of expand_asm_memory_barrier.
> (expand_mem_singnal_fence): Ditto.
> (expand_atomic_load): Ditto.
> (expand_atomic_store): Ditto.
> * doc/md.texi (Standard Pattern Names For Generation):
> Document memory_blockage instruction pattern.
> 
> Bootstrapped and regression tested together with a followup x86 patch
> on x86_64-linux-gnu {,-m32}.
> 
> OK for mainline?
SO I don't see anything technically wrong here.  But what I don't
understand is the value in providing another way to do the same thing.

I see a patch that introduces calls to gen_memory_blockage in the x86
backend.  But what I don't see is why those couldn't just be
expand_asm_memory_barrier?

Were you planning a x86 specific expander?  If so what advantages does
it have over the asm-style variant?

I feel like I'm missing something here.

jeff


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-10-13 Thread Jeff Law
On 09/06/2017 09:56 AM, Michael Collison wrote:
> Patch updated with all relevant comments and suggestions.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf, and 
> aarch64-none-linux-gnu and x86_64.
> 
> Ok for trunk?
> 
> 2017-08-05  Kyrylo Tkachov  
>   Michael Collison 
> 
>   * compare-elim.c: Include emit-rtl.h.
>   (can_merge_compare_into_arith): New function.
>   (try_validate_parallel): Likewise.
>   (try_merge_compare): Likewise.
>   (try_eliminate_compare): Call the above when no previous clobber
>   is available.
>   (execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
>   dataflow problems.
> 
> 2017-08-05  Kyrylo Tkachov  
>   Michael Collison 
>   
>   * gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.
Sorry for the delay.

This looks good.  OK for the trunk.

jeff


[PATCH,RFC] collect2 LTO for AIX

2017-10-13 Thread David Edelsohn
The attached patch is an incremental step toward GCC LTO on AIX.  The
recent Libiberty Simple Object improvements for XCOFF provide more
capabilities for operations on XCOFF object files, which are a
prerequisite for GCC LTO functionality.

This patch adds the basic LTO scanning pass to the COFF support in
collect2.  I don't believe that this change should affect other COFF
targets adversely (do they even use collect2?), but I wanted to give
people an opportunity to comment.

(Yes, the formatting is messy, but all of collect2.c is messy.)

Bootstrapped on powerpc-ibm-aix7.2.0.0

Thanks, David

* collect2.c (add_lto_object): Compile for OBJECT_COFF.
(scan_prog_file): Don't skip PASS_LTOINFO. Scan for LTO objects.
Index: collect2.c
===
--- collect2.c  (revision 253736)
+++ collect2.c  (working copy)
@@ -614,7 +614,7 @@ static const char *const target_machine = TARGET_M
 
Return 0 if not found, otherwise return its name, allocated with malloc.  */
 
-#ifdef OBJECT_FORMAT_NONE
+#if defined (OBJECT_FORMAT_NONE) || defined (OBJECT_FORMAT_COFF)
 
 /* Add an entry for the object file NAME to object file list LIST.
New entries are added at the end of the list. The original pointer
@@ -634,7 +634,7 @@ add_lto_object (struct lto_object_list *list, cons
 
   list->last = n;
 }
-#endif /* OBJECT_FORMAT_NONE */
+#endif
 
 
 /* Perform a link-time recompilation and relink if any of the object
@@ -2750,8 +2750,10 @@ scan_prog_file (const char *prog_name, scanpass wh
   LDFILE *ldptr = NULL;
   int sym_index, sym_count;
   int is_shared = 0;
+  int found_lto = 0;
 
-  if (which_pass != PASS_FIRST && which_pass != PASS_OBJ)
+  if (which_pass != PASS_FIRST && which_pass != PASS_OBJ
+  && which_pass != PASS_LTOINFO)
 return;
 
 #ifdef COLLECT_EXPORT_LIST
@@ -2764,6 +2766,7 @@ scan_prog_file (const char *prog_name, scanpass wh
  eliminate scan_libraries() function.  */
   do
 {
+  found_lto = 0;
 #endif
   /* Some platforms (e.g. OSF4) declare ldopen as taking a
 non-const char * filename parameter, even though it will not
@@ -2806,6 +2809,19 @@ scan_prog_file (const char *prog_name, scanpass wh
++name;
 #endif
 
+  if (which_pass == PASS_LTOINFO)
+{
+ if (found_lto)
+   continue;
+ if (strncmp (name, "__gnu_lto_v1", 12) == 0)
+   {
+ add_lto_object (

[Patch, fortran] PR81048 - [6/7/8 Regression] incorrect derived type initialization

2017-10-13 Thread Paul Richard Thomas
Dear All,

This patch undoes a side effect of r225447 that had the effect of
eliminating the default intialization of derived type array results.

The patch corrects the offending changes to the condition in resolve_symbol.

Bootstraps and regtests of FC23/x86_64 - OK for trunk, 7- and 6-branches?

Cheers

Paul

2017-10-13  Paul Thomas  

PR fortran/81048
* resolve.c (resolve_symbol): Ensure that derived type array
results get default initialization.

2017-10-13  Paul Thomas  

PR fortran/81048
* gfortran.dg/derived_init_4.f90 : New test.
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 253525)
--- gcc/fortran/resolve.c   (working copy)
*** resolve_symbol (gfc_symbol *sym)
*** 14967,14973 
  
if ((!a->save && !a->dummy && !a->pointer
   && !a->in_common && !a->use_assoc
!  && !a->result && !a->function)
  || (a->dummy && a->intent == INTENT_OUT && !a->pointer))
apply_default_init (sym);
else if (a->function && sym->result && a->access != ACCESS_PRIVATE
--- 14967,14978 
  
if ((!a->save && !a->dummy && !a->pointer
   && !a->in_common && !a->use_assoc
!  && a->referenced
!  && !((a->function || a->result)
!   && (!a->dimension
!   || sym->ts.u.derived->attr.alloc_comp
!   || sym->ts.u.derived->attr.pointer_comp))
!  && !(a->function && sym != sym->result))
  || (a->dummy && a->intent == INTENT_OUT && !a->pointer))
apply_default_init (sym);
else if (a->function && sym->result && a->access != ACCESS_PRIVATE
Index: gcc/testsuite/gfortran.dg/derived_init_4.f90
===
*** gcc/testsuite/gfortran.dg/derived_init_4.f90(nonexistent)
--- gcc/testsuite/gfortran.dg/derived_init_4.f90(working copy)
***
*** 0 
--- 1,59 
+ ! { dg-do run }
+ !
+ ! Test the fix for PR81048, where in the second call to 'g2' the
+ ! default initialization was "forgotten". 'g1', 'g1a' and 'g3' check
+ ! that this does not occur for scalars and explicit results.
+ !
+ ! Contributed by David Smith  
+ !
+ program test
+type f
+integer :: f = -1
+end type
+type(f) :: a, b(3)
+type(f), allocatable :: ans
+b = g2(a)
+b = g2(a)
+ans = g1(a)
+if (ans%f .ne. -1) call abort
+ans = g1(a)
+if (ans%f .ne. -1) call abort
+ans = g1a(a)
+if (ans%f .ne. -1) call abort
+ans = g1a(a)
+if (ans%f .ne. -1) call abort
+b = g3(a)
+b = g3(a)
+ contains
+function g3(a) result(res)
+   type(f) :: a, res(3)
+   do j = 1, 3
+  if (res(j)%f == -1) then
+  res(j)%f = a%f - 1
+  else
+  call abort
+  endif
+   enddo
+end function g3
+ 
+function g2(a)
+   type(f) :: a, g2(3)
+   do j = 1, 3
+  if (g2(j)%f == -1) then
+  g2(j)%f = a%f - 1
+  else
+  call abort
+  endif
+   enddo
+end function g2
+ 
+function g1(a)
+  type(f) :: g1, a
+  if (g1%f .ne. -1 ) call abort
+end function
+ 
+function g1a(a) result(res)
+  type(f) :: res, a
+  if (res%f .ne. -1 ) call abort
+end function
+ end program test


Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-10-13 Thread Jeff Law
On 09/16/2017 03:39 PM, Bernhard Reutner-Fischer wrote:
> On 15 September 2017 18:50:26 CEST, Jeff Law  wrote:
>> On 09/13/2017 03:20 PM, Wilco Dijkstra wrote:
>>> Jeff Law wrote:
 On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
> On 08/30/2017 01:46 PM, Richard Biener wrote:
>>>
>>rdivtmp = 1 / (y*C);
>>tem = x *rdivtmp;
>>tem2= z * rdivtmp;
>>
>> instead of
>>
>>rdivtmp = 1/y;
>>tem = x * 1/C * rdivtmp;
>>tem2 = z * 1/C * rdivtmp;
>
> Ideally we would be able to CSE that into
>
> rdivtmp = 1/y * 1/C;
> tem = x * rdivtmp;
> tem2 = z * rdivtmp;
 So why is your sequence significantly better than Richi's desired
 seqeuence?  They both seem to need 3 mults and a division (which in
>> both
 cases might be a reciprocal estimation).In Richi's sequence we
>> have
 to mult and feed the result as an operand into the reciprocal insn. 
>> In
 yours we feed the result of the reciprocal into the multiply.
>>>
>>> Basically this stuff happens a lot in real code, which is exactly why
>> I proposed it.
>>> I even provided counts of how many divisions each transformation
>> avoids:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026. 
>> I don't doubt that it happens in real code.  There's a reason why MIPS
>> IV added recip and rsqrt 20 years ago.  Our ability to exploit them has
>> always been fairly limited though.
>>
>> What I'm trying to avoid is a transformation where the two forms are
>> roughly equal in terms of what they expose vs what they hide.
>>
>> If pulling the 1/c out is consistently better, then that's obviously
>> good.  If it's essentially a toss-up because of the other interactions
>> with CSE, then we need to think about it more deeply.
>>
>> I _suspect_ that pulling the 1/c out is generally better, but something
>> more concrete than my intuition would be helpful.
>>
>>
>>
>>
>>>
>>> Note this transformation is a canonicalization - if you can't merge a
>>> constant somehow, moving it out to the LHS can expose more
>>> opportunities, like in (C1 * x) / (C2 * y) -> (C1 * x * C2) / y ->
>> (C3 * x) / y.
>>> Same for negation as it behaves like * -1.
>> FWIW, I generally agree.
>>
>>>
>>> The key here is that it is at least an order of magnitude worse if
>> you have
>>> to execute an extra division than if you have an extra multiply.
>> No doubt.  I'll trade a division for a multiply any day.  Similarly 1/C
>> is just a constant, so I consider it essentially free.
>>
>>
>>>
 ISTM in the end if  y*C or 1/(y*C) is a CSE, then Richi's sequence
>> wins.
  Similarly if 1/y is a CSE, then yours wins.  Is there some reason
>> to
 believe that one is a more likely CSE than the other?
>>>
>>> The idea is that 1/y is much more likely a CSE than 1/(y*C).
>> Do we have anything other than intuition to back that up?
>>>
>>> We could make the pattern only fire in single use cases and see
>> whether
>>> that makes a difference. It would be easy to test old vs new vs
>> single-use
>>> new and count how many divisions we end up with. We could add 1/ (y *
>> C)
>>> to the reciprocal phase if it is unacceptable as a canonicalization,
>> but then
>>> you won't be able to optimize (C1 * x * C2) / y.
>> We could.  I think the question would then become does restricting to
>> the single-use case kill too many opportunities.
>>
>> Sigh.  I think the 4 of us could go round and round on this forever in
>> the pursuit of the perfect code.  Though in reality I'm happy with a
>> clear improvement.
> 
> 
> Btw reminds me a little bit of 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417
> which IIRC never was applied. Maybe someone could talk Denys (a colleague of 
> yours nowadays, Jeff) into submitting a real patch? ;)
Denys has been a Red Hatter for a long time (approaching 10 years now I
think).  He's not in the compiler group, but is in the same organization
as the compiler group.

Tege hasn't worked on GCC regularly in years and Denys hasn't ever
really engaged the GCC community all that well.  I wouldn't expect 28417
to move forward without something other than Tege and Denys pushing on it.


jeff


Re: Ping for some "make more use of ..." patches

2017-10-13 Thread Jeff Law
On 09/18/2017 10:29 AM, Richard Sandiford wrote:
> Ping for some minor cleanups that help with the move to variable-length
> modes.  Segher has approved the combine.c parts (thanks).
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01339.html
> Make more use of HWI_COMPUTABLE_MODE_P
OK.

> 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01452.html
> Make more use of read_modify_subreg_p
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01343.html
> Make more use of subreg_lowpart_offset
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01344.html
> Make more use of subreg_size_lowpart_offset
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01345.html
> Make more use of byte_lowpart_offset
I think I've covered the rest over the last couple days.

jeff


Re: [PATCH v2, middle-end]: Introduce memory_blockage named insn pattern

2017-10-13 Thread Uros Bizjak
On Fri, Oct 13, 2017 at 7:30 PM, Jeff Law  wrote:
> On 09/05/2017 07:50 AM, Uros Bizjak wrote:
>> Revised patch, incorporates fixes from Alexander's review comments.
>>
>> I removed some implementation details from Alexander's description of
>> memory_blockage named pattern.
>>
>>
>> 2017-09-05  Uros Bizjak  
>>
>> * target-insns.def: Add memory_blockage.
>> * optabs.c (expand_memory_blockage): New function.
>> (expand_asm_memory_barrier): Rename ...
>> (expand_asm_memory_blockage): ... to this.
>> (expand_mem_thread_fence): Call expand_memory_blockage
>> instead of expand_asm_memory_barrier.
>> (expand_mem_singnal_fence): Ditto.
>> (expand_atomic_load): Ditto.
>> (expand_atomic_store): Ditto.
>> * doc/md.texi (Standard Pattern Names For Generation):
>> Document memory_blockage instruction pattern.
>>
>> Bootstrapped and regression tested together with a followup x86 patch
>> on x86_64-linux-gnu {,-m32}.
>>
>> OK for mainline?
> SO I don't see anything technically wrong here.  But what I don't
> understand is the value in providing another way to do the same thing.
>
> I see a patch that introduces calls to gen_memory_blockage in the x86
> backend.  But what I don't see is why those couldn't just be
> expand_asm_memory_barrier?
>
> Were you planning a x86 specific expander?  If so what advantages does
> it have over the asm-style variant?
>
> I feel like I'm missing something here.

Yes: Alexander's patch generates asm-style blockage instruction in the
generic expansion part. If we want to include this instruction in the
peephole2 sequence, we need a different (simple and deterministic)
instruction with known (simple) RTL pattern. Proposed middle end patch
allows us to emit target-dependant UNSPEC pattern instead of a generic
asm-style pattern for the blockage instruction. This way, we can
include target-dependent UNSPEC in peephole2 sequence, as shown in a
follow-up target patch.

Uros.


Re: [Patch, fortran] PR81048 - [6/7/8 Regression] incorrect derived type initialization

2017-10-13 Thread Steve Kargl
On Fri, Oct 13, 2017 at 07:06:57PM +0100, Paul Richard Thomas wrote:
> 
> This patch undoes a side effect of r225447 that had the effect of
> eliminating the default intialization of derived type array results.
> 
> The patch corrects the offending changes to the condition in resolve_symbol.
> 
> Bootstraps and regtests of FC23/x86_64 - OK for trunk, 7- and 6-branches?
> 
> + end program test

Yes.

--  
Steve


[PATCH PR/82546] tree node size

2017-10-13 Thread Nathan Sidwell
[Although I filed this as a middle-end bug, it's really a core infra 
bug, not sure who the best reviewer is]


In working on tree streaming in the modules branch, I discovered poor 
tree node size and hierarchy bits.


Here's a fix for the first part of that.  tree.c (tree_code_size) 
returns sizeof (tree_type_non_common) for any tcc_type node. That's 
wasteful, given we have tree_type_common-> 
tree_type_with_lang_specific-> tree_type_non_common available as 
choices.  It's also obscuring defects in (at least) the c++ FE where we 
use tree_type_non_common fields, but claim the node doesn't contain that 
structure.   Ew.


This patch makes tree_code_size ask the lang hook for the size of 
non-core type nodes.  It also fixes the c++ and objc FEs to return a 
size for the nodes it cares about.


I don't (yet) know whether all the core types are tree_type_non_common, 
nor whether the FE's can return smaller sizes than the do with this 
patch.  But at least the control flow is now correct.  during developing 
this patch I added an assert that the lang hook was returning a size at 
least as big as tree_type_non_common, so they couldn't be more broken 
than before the patch.


I intend to continue cleaning this up of course.  It's not clear to me 
whether we should cache these node sizes in an array, and the way it 
goes about checking nodes with nested switches is understandable, but 
possible not the fastest solution. However let's at least get the sizing 
right first.


ok?

nathan
--
Nathan Sidwell
2017-10-13  Nathan Sidwell  

	PR middle-end/82546
	gcc/
	* tree.c (tree_code_size): Reformat.  Punt to lang hook for unknown
	TYPE nodes.
	gcc/cp/
	* cp-objcp-common.c (cp_tree_size): Reformat.  Adjust returns size
	of TYPE nodes.
	gcc/objc/
	* objc-act.c (objc_common_tree_size): Return size of TYPE nodes.

Index: cp/cp-objcp-common.c
===
--- cp/cp-objcp-common.c	(revision 253733)
+++ cp/cp-objcp-common.c	(working copy)
@@ -61,43 +61,34 @@ cxx_warn_unused_global_decl (const_tree
 size_t
 cp_tree_size (enum tree_code code)
 {
+  gcc_checking_assert (code >= NUM_TREE_CODES);
   switch (code)
 {
-case PTRMEM_CST:		return sizeof (struct ptrmem_cst);
-case BASELINK:		return sizeof (struct tree_baselink);
+case PTRMEM_CST:		return sizeof (ptrmem_cst);
+case BASELINK:		return sizeof (tree_baselink);
 case TEMPLATE_PARM_INDEX:	return sizeof (template_parm_index);
-case DEFAULT_ARG:		return sizeof (struct tree_default_arg);
-case DEFERRED_NOEXCEPT:	return sizeof (struct tree_deferred_noexcept);
-case OVERLOAD:		return sizeof (struct tree_overload);
-case STATIC_ASSERT: return sizeof (struct tree_static_assert);
+case DEFAULT_ARG:		return sizeof (tree_default_arg);
+case DEFERRED_NOEXCEPT:	return sizeof (tree_deferred_noexcept);
+case OVERLOAD:		return sizeof (tree_overload);
+case STATIC_ASSERT: return sizeof (tree_static_assert);
 case TYPE_ARGUMENT_PACK:
-case TYPE_PACK_EXPANSION:
-  return sizeof (struct tree_common);
-
+case TYPE_PACK_EXPANSION:	return sizeof (tree_type_non_common);
 case NONTYPE_ARGUMENT_PACK:
-case EXPR_PACK_EXPANSION:
-  return sizeof (struct tree_exp);
-
-case ARGUMENT_PACK_SELECT:
-  return sizeof (struct tree_argument_pack_select);
-
-case TRAIT_EXPR:
-  return sizeof (struct tree_trait_expr);
-
-case LAMBDA_EXPR:   return sizeof (struct tree_lambda_expr);
-
-case TEMPLATE_INFO: return sizeof (struct tree_template_info);
-
-case CONSTRAINT_INFO:   return sizeof (struct tree_constraint_info);
-
-case USERDEF_LITERAL:	return sizeof (struct tree_userdef_literal);
-
-case TEMPLATE_DECL:		return sizeof (struct tree_template_decl);
-
+case EXPR_PACK_EXPANSION:	return sizeof (tree_exp);
+case ARGUMENT_PACK_SELECT:	return sizeof (tree_argument_pack_select);
+case TRAIT_EXPR:		return sizeof (tree_trait_expr);
+case LAMBDA_EXPR:   return sizeof (tree_lambda_expr);
+case TEMPLATE_INFO: return sizeof (tree_template_info);
+case CONSTRAINT_INFO:   return sizeof (tree_constraint_info);
+case USERDEF_LITERAL:	return sizeof (tree_userdef_literal);
+case TEMPLATE_DECL:		return sizeof (tree_template_decl);
 default:
-  if (TREE_CODE_CLASS (code) == tcc_declaration)
-	return sizeof (struct tree_decl_non_common);
-  gcc_unreachable ();
+  switch (TREE_CODE_CLASS (code))
+	{
+	case tcc_declaration:	return sizeof (tree_decl_non_common);
+	case tcc_type:		return sizeof (tree_type_non_common);
+	default: gcc_unreachable ();
+	}
 }
   /* NOTREACHED */
 }
Index: objc/objc-act.c
===
--- objc/objc-act.c	(revision 253733)
+++ objc/objc-act.c	(working copy)
@@ -10118,11 +10118,14 @@ objc_common_tree_size (enum tree_code co
 case CLASS_METHOD_DECL:
 case INSTANCE_METH

Re: [PATCH v2, middle-end]: Introduce memory_blockage named insn pattern

2017-10-13 Thread Jeff Law
On 10/13/2017 12:27 PM, Uros Bizjak wrote:
> On Fri, Oct 13, 2017 at 7:30 PM, Jeff Law  wrote:
>> On 09/05/2017 07:50 AM, Uros Bizjak wrote:
>>> Revised patch, incorporates fixes from Alexander's review comments.
>>>
>>> I removed some implementation details from Alexander's description of
>>> memory_blockage named pattern.
>>>
>>>
>>> 2017-09-05  Uros Bizjak  
>>>
>>> * target-insns.def: Add memory_blockage.
>>> * optabs.c (expand_memory_blockage): New function.
>>> (expand_asm_memory_barrier): Rename ...
>>> (expand_asm_memory_blockage): ... to this.
>>> (expand_mem_thread_fence): Call expand_memory_blockage
>>> instead of expand_asm_memory_barrier.
>>> (expand_mem_singnal_fence): Ditto.
>>> (expand_atomic_load): Ditto.
>>> (expand_atomic_store): Ditto.
>>> * doc/md.texi (Standard Pattern Names For Generation):
>>> Document memory_blockage instruction pattern.
>>>
>>> Bootstrapped and regression tested together with a followup x86 patch
>>> on x86_64-linux-gnu {,-m32}.
>>>
>>> OK for mainline?
>> SO I don't see anything technically wrong here.  But what I don't
>> understand is the value in providing another way to do the same thing.
>>
>> I see a patch that introduces calls to gen_memory_blockage in the x86
>> backend.  But what I don't see is why those couldn't just be
>> expand_asm_memory_barrier?
>>
>> Were you planning a x86 specific expander?  If so what advantages does
>> it have over the asm-style variant?
>>
>> I feel like I'm missing something here.
> 
> Yes: Alexander's patch generates asm-style blockage instruction in the
> generic expansion part. If we want to include this instruction in the
> peephole2 sequence, we need a different (simple and deterministic)
> instruction with known (simple) RTL pattern. Proposed middle end patch
> allows us to emit target-dependant UNSPEC pattern instead of a generic
> asm-style pattern for the blockage instruction. This way, we can
> include target-dependent UNSPEC in peephole2 sequence, as shown in a
> follow-up target patch.
Ah, you want to use it as a sequence within a peep2 match.  That makes
sense now.

OK for the trunk.

jeff


Re: patch to fix PR82353

2017-10-13 Thread Jeff Law
On 10/12/2017 10:49 AM, Jakub Jelinek wrote:
> Hi!
> 
> On Wed, Oct 11, 2017 at 06:41:05PM -0400, Vladimir Makarov wrote:
>>> Tested on x86_64-linux -m32/-m64, and verified with cc1plus before your
>>> change, ok for trunk?
> 
> BTW, I think it is quite fragile to scan for the reload messages, so I've
> cooked up a runtime test that fails before your patch and succeeds with your
> patch.  Tested on x86_64-linux with -m32/-m64 (both with your patch reverted
> and without), ok for trunk?
> 
> 2017-10-12  Jakub Jelinek  
> 
>   PR sanitizer/82353
>   * g++.dg/ubsan/pr82353-2.C: New test.
>   * g++.dg/ubsan/pr82353-2-aux.cc: New file.
>   * g++.dg/ubsan/pr82353-2.h: New file.
OK.
Jeff


Re: Base subreg rules on REGMODE_NATURAL_SIZE rather than UNITS_PER_WORD

2017-10-13 Thread Jeff Law
On 09/18/2017 05:26 AM, Richard Sandiford wrote:
> Originally subregs operated at the word level and subreg offsets
> were measured in words.  The offset units were later changed from
> words to bytes (SUBREG_WORD became SUBREG_BYTE), but the fundamental
> assumption that subregs should operate at the word level remained.
> Whether (subreg:M1 (reg:M2 R2) N) is well-formed depended on the
> way that M1 and M2 partitioned into words and whether the subword
> part of N represented a lowpart.  However, some questions depended
> instead on the macro REGMODE_NATURAL_SIZE, which was introduced
> as part of the patch that moved from SUBREG_WORD to SUBREG_BYTE.
> It is used to decide whether setting (subreg:M1 (reg:M2 R2) N)
> clobbers all of R2 or just part of it (df_read_modify_subreg).
> 
> Using words doesn't really make sense for modern vector
> architectures.  Vector registers are usually bigger than
> a word and:
> 
> (a) setting the scalar lowpart of them usually clobbers the
> rest of the register (contrary to the subreg rules,
> where only the containing words should be clobbered).
> 
> (b) high words of vector registers are often not independently
> addressable, even though that's what the subreg rules expect.
> 
> This patch therefore uses REGMODE_NATURAL_SIZE instead of
> UNITS_PER_WORD to determine the size of the independently
> addressable blocks in an inner register.
> 
> This is needed for SVE because the number of words in a vector
> mode isn't known at compile time, so isn't a sensible basis
> for calculating the number of registers.
> 
> The only existing port to define REGMODE_NATURAL_SIZE is
> 64-bit SPARC, where FP registers are 32 bits.  (This is the
> opposite of the use case for SVE, since the natural division
> is smaller than a word.)  I compiled the testsuite before and
> after the patch for sparc64-linux-gnu and the only test whose
> assembly changed was g++.dg/debug/pr65678.C, where the order
> of two independent stores was reversed and where a different
> register was picked for one pseudo.  The new code was
> otherwise equivalent to the old code.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory, with only the SPARC differences just mentioned.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-18  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * doc/rtl.texi: Rewrite the subreg rules so that they partition
>   the inner register into REGMODE_NATURAL_SIZE bytes rather than
>   UNITS_PER_WORD bytes.
>   * emit-rtl.c (validate_subreg): Divide subregs into blocks
>   based on REGMODE_NATURAL_SIZE of the inner mode.
>   (gen_lowpart_common): Split the SCALAR_FLOAT_MODE_P and
>   !SCALAR_FLOAT_MODE_P cases.  Use REGMODE_NATURAL_SIZE for the latter.
>   * expr.c (store_constructor): Use REGMODE_NATURAL_SIZE to test
>   whether something is likely to occupy more than one register.
OK.  Though I must admit, I expected a larger change after reading the
intro.

jeff


Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-10-13 Thread Wilco Dijkstra
Jeff Law wrote:

>> Btw reminds me a little bit of  
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417

> I wouldn't expect 28417
> to move forward without something other than Tege and Denys pushing on it.

Hmm that doesn't look optimal. You can typically do a multiply with the magic
constant arranged in such a way that on 32-bit targets you don't need a shift 
at all.

There isn't much to the proof either, code like this is always self-proving: it 
tries
several possible magic constants until a solution is found that has no error 
for the
max input. Given there are usually several valid solutions, you can then select 
the
best shift/magic value combination.

Wilco

Re: [Patch, fortran] PR81048 - [6/7/8 Regression] incorrect derived type initialization

2017-10-13 Thread Paul Richard Thomas
Good you caught the missing LF. Thanks!

Paul

On 13 October 2017 at 19:29, Steve Kargl
 wrote:
> On Fri, Oct 13, 2017 at 07:06:57PM +0100, Paul Richard Thomas wrote:
>>
>> This patch undoes a side effect of r225447 that had the effect of
>> eliminating the default intialization of derived type array results.
>>
>> The patch corrects the offending changes to the condition in resolve_symbol.
>>
>> Bootstraps and regtests of FC23/x86_64 - OK for trunk, 7- and 6-branches?
>>
>> + end program test
>
> Yes.
>
> --
> Steve



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: SUBREG_PROMOTED_VAR_P handling in expand_direct_optab_fn

2017-10-13 Thread Jeff Law
On 09/20/2017 06:21 AM, Richard Sandiford wrote:
> This is needed by the later SVE LAST reductions, where an 8-bit
> or 16-bit result is zero- rather than sign-extended to 32 bits.
> I think it could occur in other situations too.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linus-gnu.
> OK to install?
> 
> Richard
> 
> 
> 2017-09-20  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * internal-fn.c (expand_direct_optab_fn): Don't assign directly
>   to a SUBREG_PROMOTED_VAR.
OK.
jeff


Re: [PATCH C++] Fix PR82357 - bogus "cannot bind bitfield" error

2017-10-13 Thread Jason Merrill
On Fri, Oct 13, 2017 at 1:21 PM, Markus Trippelsdorf
 wrote:
> On 2017.10.13 at 12:02 -0400, Jason Merrill wrote:
>> On Fri, Oct 13, 2017 at 5:40 AM, Markus Trippelsdorf
>>  wrote:
>> > r253266 introduced a bogus "cannot bind bitfield" error that breaks
>> > building Chromium and Node.js.
>> > Fix by removing the ugly goto.
>> >
>> > Tested on ppc64le.
>> > Ok for trunk?
>>
>> No, this just undoes my change, so we go back to not doing type
>> checking for non-dependent static casts.  Better I think to avoid the
>> call to build_static_cast in the first place, by teaching
>> stabilize_reference that it can return a NON_DEPENDENT_EXPR unchanged.
>> How does this (untested) patch work for you?
>
> It works fine. Thanks.
> It would be good to have a testcase that checks the type checking for
> non-dependent static casts.
> I'll let you handle the current issue.

Done.
commit d570081ae5d9afd421acccb10eef82baab2b35da
Author: Jason Merrill 
Date:   Fri Oct 13 12:31:21 2017 -0400

PR c++/82357 - bit-field in template

* tree.c (cp_stabilize_reference): Just return a NON_DEPENDENT_EXPR.

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index e21ff6a1572..366f46f1506 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -333,6 +333,10 @@ cp_stabilize_reference (tree ref)
 {
   switch (TREE_CODE (ref))
 {
+case NON_DEPENDENT_EXPR:
+  /* We aren't actually evaluating this.  */
+  return ref;
+
 /* We need to treat specially anything stabilize_reference doesn't
handle specifically.  */
 case VAR_DECL:
diff --git a/gcc/testsuite/g++.dg/template/bitfield4.C 
b/gcc/testsuite/g++.dg/template/bitfield4.C
new file mode 100644
index 000..4927b7ab144
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/bitfield4.C
@@ -0,0 +1,6 @@
+// PR c++/82357
+
+template  struct A {
+  A() { x |= 0; }
+  int x : 8;
+};
diff --git a/gcc/testsuite/g++.dg/template/cast4.C 
b/gcc/testsuite/g++.dg/template/cast4.C
new file mode 100644
index 000..2f46c7189eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/cast4.C
@@ -0,0 +1,4 @@
+template  void f()
+{
+  static_cast(42);   // { dg-error "static_cast" }
+}


Re: Check that there are no missing probabilities

2017-10-13 Thread David Edelsohn
This patch also caused a huge number of testsuite failures on PowerPC,
although it didn't break bootstrap.

Thanks, David


Re: Check that there are no missing probabilities

2017-10-13 Thread Jan Hubicka
> On Fri, Oct 13, 2017 at 03:38:33PM +0200, Jan Hubicka wrote:
> > Hi,
> > this patch enables check that no edge probabilities are missing. 
> > 
> > Honza
> > 
> > * cfghooks.c (verify_flow_info): Check that edge probabilities are
> > set.
> 
> This broke bootstrap on x86_64-linux with Ada
> (--enable-checking=yes,rtl,extra).
> 
> >From what I can see, decompose_multiword_subregs has:
> 1619/* Split the block after insn.  There will be a 
> fallthru
> 1620   edge, which is OK so we keep it.  We have to 
> create the
> 1621   exception edges ourselves.  */
> 1622fallthru = split_block (bb, insn);
> 1623rtl_make_eh_edge (NULL, bb, BB_END (bb));
> 1624bb = fallthru->dest;
> 1625insn = BB_HEAD (bb);
> and rtl_make_eh_edge calls
> 161 make_label_edge (edge_cache, src, label,
> 162  EDGE_ABNORMAL | EDGE_EH
> 163  | (CALL_P (insn) ? EDGE_ABNORMAL_CALL : 0));
> 
> No idea what should initialize the probabilities.  Do probabilities make
> any sense at all for EH edges (or abnormal edges)?

For EH we should set it to profile_probability::zero () because we know it is 
unlikely
path.   I will take a look.

Honza
> 
> > 
> > Index: cfghooks.c
> > ===
> > --- cfghooks.c  (revision 253694)
> > +++ cfghooks.c  (working copy)
> > @@ -160,6 +161,13 @@ verify_flow_info (void)
> >  e->src->index, e->dest->index);
> >   err = 1;
> > }
> > + if (profile_status_for_fn (cfun) >= PROFILE_GUESSED
> > + && !e->probability.initialized_p ())
> > +   {
> > + error ("Uninitialized probability of edge %i->%i", e->src->index,
> > +e->dest->index);
> > + err = 1;
> > +   }
> >   if (!e->probability.verify ())
> > {
> >   error ("verify_flow_info: Wrong probability of edge %i->%i",
> 
>   Jakub


[committed][PATCH] Trivial cleanup in tree-ssa-reassoc.c

2017-10-13 Thread Jeff Law

Jakub spotted this wart in my last change.We have already tested
that we do not have 0 or 1 ops.  So != 2 is more clearly written as > 2.

Bootstrapped and regression tested on x86_64.  Committed to the trunk.

Jeff
commit 04acc76e9d46299f5251bf9f495d1b7688d7907f
Author: law 
Date:   Fri Oct 13 19:12:05 2017 +

* tree-ssa-reassoc.c (reassociate_bb): Clarify code slighly.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253740 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b593718fa6f..56383ef72f0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-10-13  Jeff Law  
+
+   * tree-ssa-reassoc.c (reassociate_bb): Clarify code slighly.
+
 2017-10-13  Jakub Jelinek  
 
PR target/82274
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index cc57ae320a3..e0e64e16eba 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5910,7 +5910,7 @@ reassociate_bb (basic_block bb)
 move it to the front.  This helps ensure that we generate
 (X & Y) & C rather than (X & C) & Y.  The former will
 often match a canonical bit test when we get to RTL.  */
- if (ops.length () != 2
+ if (ops.length () > 2
  && (rhs_code == BIT_AND_EXPR
  || rhs_code == BIT_IOR_EXPR
  || rhs_code == BIT_XOR_EXPR)


Re: Check that there are no missing probabilities

2017-10-13 Thread Jakub Jelinek
On Fri, Oct 13, 2017 at 09:06:55PM +0200, Jan Hubicka wrote:
> For EH we should set it to profile_probability::zero () because we know it is 
> unlikely
> path.   I will take a look.

With the

--- gcc/cfghooks.c.jj   2017-10-13 18:27:12.0 +0200
+++ gcc/cfghooks.c  2017-10-13 19:15:11.444650533 +0200
@@ -162,6 +162,7 @@ verify_flow_info (void)
  err = 1;
}
  if (profile_status_for_fn (cfun) >= PROFILE_GUESSED
+ && (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_FAKE)) == 0
  && !e->probability.initialized_p ())
{
  error ("Uninitialized probability of edge %i->%i", e->src->index,

hack x86_64-linux and i686-linux bootstrapped fine, but I see still many
graphite related regressions:

/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
Uninitialized probability of edge 41->17
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
Uninitialized probability of edge 44->41
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
Uninitialized probability of edge 36->21
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
Uninitialized probability of edge 29->36
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
Uninitialized probability of edge 32->29
during GIMPLE pass: graphite
dump file: id-16.c.150t.graphite
/home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: internal 
compiler error: verify_flow_info failed
0xafac1a verify_flow_info()
../../gcc/cfghooks.c:268
0xf2a624 checking_verify_flow_info
../../gcc/cfghooks.h:198
0xf2a624 cleanup_tree_cfg_noloop
../../gcc/tree-cfgcleanup.c:901
0xf2a624 cleanup_tree_cfg()
../../gcc/tree-cfgcleanup.c:952
0x162df85 graphite_transform_loops()
../../gcc/graphite.c:422
0x162f0c0 graphite_transforms
../../gcc/graphite.c:447
0x162f0c0 execute
../../gcc/graphite.c:524

So probably graphite needs to be tweaked for this too.

Jakub


[C/C++ PATCH] Handle rotates like shifts

2017-10-13 Thread Jakub Jelinek
Hi!

I've noticed that for {L,R}ROTATE_EXPR created during GENERIC folding
we end up with e.g. long int etc. second arguments, while for shifts
we truncate those to unsigned int.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-10-13  Jakub Jelinek  

* c-gimplify.c (c_gimplify_expr): Handle [LR]ROTATE_EXPR like
[LR]SHIFT_EXPR.

--- gcc/c-family/c-gimplify.c.jj2017-05-22 10:49:37.0 +0200
+++ gcc/c-family/c-gimplify.c   2017-10-13 10:20:55.844841677 +0200
@@ -229,6 +229,8 @@ c_gimplify_expr (tree *expr_p, gimple_se
 {
 case LSHIFT_EXPR:
 case RSHIFT_EXPR:
+case LROTATE_EXPR:
+case RROTATE_EXPR:
   {
/* We used to convert the right operand of a shift-expression
   to an integer_type_node in the FEs.  But it is unnecessary

Jakub


Re: Check that there are no missing probabilities

2017-10-13 Thread Jan Hubicka
> On Fri, Oct 13, 2017 at 09:06:55PM +0200, Jan Hubicka wrote:
> > For EH we should set it to profile_probability::zero () because we know it 
> > is unlikely
> > path.   I will take a look.
> 
> With the
> 
> --- gcc/cfghooks.c.jj 2017-10-13 18:27:12.0 +0200
> +++ gcc/cfghooks.c2017-10-13 19:15:11.444650533 +0200
> @@ -162,6 +162,7 @@ verify_flow_info (void)
> err = 1;
>   }
> if (profile_status_for_fn (cfun) >= PROFILE_GUESSED
> +   && (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_FAKE)) == 0
> && !e->probability.initialized_p ())
>   {
> error ("Uninitialized probability of edge %i->%i", e->src->index,

We should set probability of those edges to profile_probability::zero so we do 
not
need to special case them at all places we check for profile.  I fixed many 
occurences
of bugs here but I see there are more.
I will disable the check for now and take a look incrementally next week.

Honza
> 
> hack x86_64-linux and i686-linux bootstrapped fine, but I see still many
> graphite related regressions:
> 
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 41->17
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 44->41
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 36->21
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 29->36
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: error: 
> Uninitialized probability of edge 32->29
> during GIMPLE pass: graphite
> dump file: id-16.c.150t.graphite
> /home/jakub/src/gcc/gcc/testsuite/gcc.dg/graphite/id-16.c:15:1: internal 
> compiler error: verify_flow_info failed
> 0xafac1a verify_flow_info()
> ../../gcc/cfghooks.c:268
> 0xf2a624 checking_verify_flow_info
> ../../gcc/cfghooks.h:198
> 0xf2a624 cleanup_tree_cfg_noloop
> ../../gcc/tree-cfgcleanup.c:901
> 0xf2a624 cleanup_tree_cfg()
> ../../gcc/tree-cfgcleanup.c:952
> 0x162df85 graphite_transform_loops()
> ../../gcc/graphite.c:422
> 0x162f0c0 graphite_transforms
> ../../gcc/graphite.c:447
> 0x162f0c0 execute
> ../../gcc/graphite.c:524
> 
> So probably graphite needs to be tweaked for this too.
> 
>   Jakub


[PATCH] Improve simplify_rotate (PR middle-end/62263, PR middle-end/82498)

2017-10-13 Thread Jakub Jelinek
Hi!

The forwprop rotate pattern recognizer is able to detect various
patterns, but for the case where we want to support all rotate
counts without UB, it requires
Y &= B - 1;
R = (X << Y) | (X >> ((-Y) & (B - 1)));
where
R = (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)));
is another reasonable way to write the same thing (don't mask it
for the case of negation twice).

The following patch teaches forwprop to handle that.

Also, we weren't recognizing rotates of constant X by variable
shift count Y.

And finally, I've noticed there is a missing check that B is a power of
two, which matters for the & (B - 1) style patterns - if it is
not a pow2p, then it isn't doing what we expect it to be.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-13  Jakub Jelinek  

PR middle-end/62263
PR middle-end/82498
* tree-ssa-forwprop.c (simplify_rotate): Allow def_arg1[N]
to be any operand_equal_p operands.  For & (B - 1) require
B to be power of 2.  Recognize
(X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1))) and similar patterns.

* c-c++-common/rotate-5.c (f2): New function.  Move old
function to ...
(f4): ... this.  Use 127 instead of 128.
(f3, f5, f6): New functions.
(main): Test all f[1-6] functions, with both 0 and 1 as
second arguments.
* c-c++-common/rotate-6.c: New test.
* c-c++-common/rotate-6a.c: New test.
* c-c++-common/rotate-7.c: New test.
* c-c++-common/rotate-7a.c: New test.
* c-c++-common/rotate-8.c: New test.

--- gcc/tree-ssa-forwprop.c.jj  2017-09-14 22:15:12.0 +0200
+++ gcc/tree-ssa-forwprop.c 2017-10-13 14:35:31.763191645 +0200
@@ -1491,9 +1491,14 @@ defcodefor_name (tree name, enum tree_co
applied, otherwise return false.
 
We are looking for X with unsigned type T with bitsize B, OP being
-   +, | or ^, some type T2 wider than T and
+   +, | or ^, some type T2 wider than T.  For:
(X << CNT1) OP (X >> CNT2)  iff CNT1 + CNT2 == B
((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2))iff CNT1 + CNT2 == B
+
+   transform these into:
+   X r<< CNT1
+
+   Or for:
(X << Y) OP (X >> (B - Y))
(X << (int) Y) OP (X >> (int) (B - Y))
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
@@ -1503,12 +1508,23 @@ defcodefor_name (tree name, enum tree_co
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1
 
-   and transform these into:
-   X r<< CNT1
+   transform these into:
X r<< Y
 
+   Or for:
+   (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
+   (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
+   ((T) ((T2) X << (Y & (B - 1 | ((T) ((T2) X >> ((-Y) & (B - 1
+   ((T) ((T2) X << (int) (Y & (B - 1 \
+ | ((T) ((T2) X >> (int) ((-Y) & (B - 1
+
+   transform these into:
+   X r<< (Y & (B - 1))
+
Note, in the patterns with T2 type, the type of OP operands
-   might be even a signed type, but should have precision B.  */
+   might be even a signed type, but should have precision B.
+   Expressions with & (B - 1) should be recognized only if B is
+   a power of 2.  */
 
 static bool
 simplify_rotate (gimple_stmt_iterator *gsi)
@@ -1578,7 +1594,9 @@ simplify_rotate (gimple_stmt_iterator *g
def_arg1[i] = tem;
   }
   /* Both shifts have to use the same first operand.  */
-  if (TREE_CODE (def_arg1[0]) != SSA_NAME || def_arg1[0] != def_arg1[1])
+  if (!operand_equal_for_phi_arg_p (def_arg1[0], def_arg1[1])
+  || !types_compatible_p (TREE_TYPE (def_arg1[0]),
+ TREE_TYPE (def_arg1[1])))
 return false;
   if (!TYPE_UNSIGNED (TREE_TYPE (def_arg1[0])))
 return false;
@@ -1649,8 +1667,10 @@ simplify_rotate (gimple_stmt_iterator *g
/* The above sequence isn't safe for Y being 0,
   because then one of the shifts triggers undefined behavior.
   This alternative is safe even for rotation count of 0.
-  One shift count is Y and the other (-Y) & (B - 1).  */
+  One shift count is Y and the other (-Y) & (B - 1).
+  Or one shift count is Y & (B - 1) and the other (-Y) & (B - 1).  */
else if (cdef_code[i] == BIT_AND_EXPR
+&& pow2p_hwi (TYPE_PRECISION (rtype))
 && tree_fits_shwi_p (cdef_arg2[i])
 && tree_to_shwi (cdef_arg2[i])
== TYPE_PRECISION (rtype) - 1
@@ -1675,17 +1695,50 @@ simplify_rotate (gimple_stmt_iterator *g
rotcnt = tem;
break;
  }
-   defcodefor_name (tem, &code, &tem, NULL);
+   tree tem2;
+   defcodefor_name (tem, &code, &tem2, NULL);
if (CONVERT_EXPR_CODE_P (code)
-   && INTEGRAL_TYPE_P (TREE_TYPE (tem))
-   && TYPE_PRECISION (TREE_TYPE (tem))
+   && INTEGRAL

[PATCH] Slightly improve phiopt value_replacement optimization (PR middle-end/62263, PR middle-end/82498)

2017-10-13 Thread Jakub Jelinek
Hi!

First of all, there was a typo, we are optimizing
(x != 0) ? x + y : y
into x + y rather than y.

And, as the comment mentions, the condition that there is just a single
stmt is too restrictive and in the various patterns posted in the various
rotate related PRs there are cases where in addition to the last
assign stmt there are some preparation statements (sometimes conversion,
sometimes bit masking with constant, sometimes both).
I think it is undesirable to turn the conditional code into always executed
one if it is too large, so this patch allows just very few very cheap
preparation statements which feed each other and it is easily possible to
propagate the cond_rhs constant through those statements to compute what
the result from those would be (and make sure no UB).

With this patch on top of the previous one, on rotate-8.c testcase
on x86_64-linux and i686-linux we get the most efficient code in all cases
- just a rol/ror instruction with perhaps some moves into registers needed
to perform that, but without any conditionals, masking etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-10-13  Jakub Jelinek  

PR middle-end/62263
PR middle-end/82498
* tree-ssa-phiopt.c (value_replacement): Comment fix.  Handle
up to 2 preparation statements for ASSIGN in MIDDLE_BB.

* c-c++-common/rotate-8.c: Expect no PHIs in optimized dump.

--- gcc/tree-ssa-phiopt.c.jj2017-10-10 22:04:08.0 +0200
+++ gcc/tree-ssa-phiopt.c   2017-10-13 17:55:01.578450763 +0200
@@ -995,11 +995,13 @@ value_replacement (basic_block cond_bb,
 
 }
 
-  /* Now optimize (x != 0) ? x + y : y to just y.
- The following condition is too restrictive, there can easily be another
- stmt in middle_bb, for instance a CONVERT_EXPR for the second argument.  
*/
-  gimple *assign = last_and_only_stmt (middle_bb);
-  if (!assign || gimple_code (assign) != GIMPLE_ASSIGN
+  /* Now optimize (x != 0) ? x + y : y to just x + y.  */
+  gsi = gsi_last_nondebug_bb (middle_bb);
+  if (gsi_end_p (gsi))
+return 0;
+
+  gimple *assign = gsi_stmt (gsi);
+  if (!is_gimple_assign (assign)
   || gimple_assign_rhs_class (assign) != GIMPLE_BINARY_RHS
   || (!INTEGRAL_TYPE_P (TREE_TYPE (arg0))
  && !POINTER_TYPE_P (TREE_TYPE (arg0
@@ -1009,6 +1011,71 @@ value_replacement (basic_block cond_bb,
   if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
 return 0;
 
+  /* Allow up to 2 cheap preparation statements that prepare argument
+ for assign, e.g.:
+  if (y_4 != 0)
+   goto ;
+  else
+   goto ;
+ :
+  _1 = (int) y_4;
+  iftmp.0_6 = x_5(D) r<< _1;
+ :
+  # iftmp.0_2 = PHI 
+ or:
+  if (y_3(D) == 0)
+   goto ;
+  else
+   goto ;
+ :
+  y_4 = y_3(D) & 31;
+  _1 = (int) y_4;
+  _6 = x_5(D) r<< _1;
+ :
+  # _2 = PHI   */
+  gimple *prep_stmt[2] = { NULL, NULL };
+  int prep_cnt;
+  for (prep_cnt = 0; ; prep_cnt++)
+{
+  gsi_prev_nondebug (&gsi);
+  if (gsi_end_p (gsi))
+   break;
+
+  gimple *g = gsi_stmt (gsi);
+  if (gimple_code (g) == GIMPLE_LABEL)
+   break;
+
+  if (prep_cnt == 2 || !is_gimple_assign (g))
+   return 0;
+
+  tree lhs = gimple_assign_lhs (g);
+  tree rhs1 = gimple_assign_rhs1 (g);
+  use_operand_p use_p;
+  gimple *use_stmt;
+  if (TREE_CODE (lhs) != SSA_NAME
+ || TREE_CODE (rhs1) != SSA_NAME
+ || !INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+ || !INTEGRAL_TYPE_P (TREE_TYPE (rhs1))
+ || !single_imm_use (lhs, &use_p, &use_stmt)
+ || use_stmt != (prep_cnt ? prep_stmt[prep_cnt - 1] : assign))
+   return 0;
+  switch (gimple_assign_rhs_code (g))
+   {
+   CASE_CONVERT:
+ break;
+   case PLUS_EXPR:
+   case BIT_AND_EXPR:
+   case BIT_IOR_EXPR:
+   case BIT_XOR_EXPR:
+ if (TREE_CODE (gimple_assign_rhs2 (g)) != INTEGER_CST)
+   return 0;
+ break;
+   default:
+ return 0;
+   }
+  prep_stmt[prep_cnt] = g;
+}
+
   /* Only transform if it removes the condition.  */
   if (!single_non_singleton_phi_for_edges (phi_nodes (gimple_bb (phi)), e0, 
e1))
 return 0;
@@ -1019,7 +1086,7 @@ value_replacement (basic_block cond_bb,
   && profile_status_for_fn (cfun) != PROFILE_ABSENT
   && EDGE_PRED (middle_bb, 0)->probability < profile_probability::even ()
   /* If assign is cheap, there is no point avoiding it.  */
-  && estimate_num_insns (assign, &eni_time_weights)
+  && estimate_num_insns (bb_seq (middle_bb), &eni_time_weights)
 >= 3 * estimate_num_insns (cond, &eni_time_weights))
 return 0;
 
@@ -1030,6 +1097,32 @@ value_replacement (basic_block cond_bb,
   tree cond_lhs = gimple_cond_lhs (cond);
   tree cond_rhs = gimple_cond_rhs (cond);
 
+  /* Propagate the cond_rhs constant through preparation stmts,
+ make sure UB isn't invoked while doing that.  */
+  

Re: [C/C++ PATCH] Handle rotates like shifts

2017-10-13 Thread Jeff Law
On 10/13/2017 01:30 PM, Jakub Jelinek wrote:
> Hi!
> 
> I've noticed that for {L,R}ROTATE_EXPR created during GENERIC folding
> we end up with e.g. long int etc. second arguments, while for shifts
> we truncate those to unsigned int.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> 2017-10-13  Jakub Jelinek  
> 
>   * c-gimplify.c (c_gimplify_expr): Handle [LR]ROTATE_EXPR like
>   [LR]SHIFT_EXPR.
OK.
jeff


Re: [PATCH] Document -fdump-tree-vect option

2017-10-13 Thread Sandra Loosemore

On 10/13/2017 06:35 AM, Jonathan Wakely wrote:

This adds an item to the list of options for the -fdump-tree option,
because we show an example using 'vect' but don't document it.

OK for trunk?


No, I think this patch is addressing an imaginary bug.  I see no "vect" 
option listed in the dump_options array in dumpfile.c.


I think this is the example in question:

@smallexample
gcc -O2 -ftree-vectorize -fdump-tree-vect-blocks=foo.dump
 -fdump-tree-pre=/dev/stderr file.c
@end smallexample

outputs vectorizer dump into @file{foo.dump}, while the PRE dump is
output on to @file{stderr}. If two conflicting dump filenames are
given for the same pass, then the latter option overrides the earlier
one.

I think the "vect" in this example is part of the pass name to be 
dumped, not an option modifying the dump output.  E.g. this is explicit 
in an older version of the GCC manual


https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Debugging-Options.html#Debugging-Options

which includes the example AND documents "vect" as one of the tree dumps 
you can ask for.  The current version of this documentation doesn't 
explicitly list all the tree dumps any more, but GCC does still accept 
-fdump-tree-vect.



Is there any logic to the order of those options? Would it makes sense
to order them alphabetically?


My advice would be to move the =filename stuff out of the table (it's 
not really an option like the others) and refactor it into a paragraph 
before the option discussion, and in the table list "all" and "optall" 
first and then alphabetize the remaining entries (and make sure 
everything in the dump_options array is documented).


-Sandra



Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

2017-10-13 Thread Wilco Dijkstra
Hi,

To continue the review of the AArch64 frame code I tried a few examples
to figure out what it does now. For initial_adjust <= 63*1024 and final_adjust <
1024 there are no probes inserted as expected, ie. the vast majority of
functions are unaffected. So that works perfectly.

For larger frames the first oddity is that there are now 2 separate params
controlling how probes are generated:

stack-clash-protection-guard-size (default 12, but set to 16 on AArch64)
stack-clash-protection-probe-interval (default 12)

I don't see how this makes sense. These values are closely related, so if
one is different from the other, probing becomes ineffective/incorrect. 
For example we generate code that trivially bypasses the guard despite
all the probing:

--param=stack-clash-protection-probe-interval=13
--param=stack-clash-protection-guard-size=12

So if there is a good reason to continue with 2 separate values, we must
force probe interval <= guard size!

Also on AArch64 --param=stack-clash-protection-probe-interval=16 causes
crashes due to the offsets used in the probes - we don't need large offsets
as we want to probe close to the bottom of the stack.

Functions with a large stack emit like alloca a lot of code, here I used
--param=stack-clash-protection-probe-interval=15:

int f1(int x)
{
  char arr[128*1024];
  return arr[x];
}

f1:
mov x16, 64512
sub sp, sp, x16
.cfi_def_cfa_offset 64512
mov x16, -32768
add sp, sp, x16
.cfi_def_cfa_offset -1024
str xzr, [sp, 32760]
add sp, sp, x16
.cfi_def_cfa_offset -66560
str xzr, [sp, 32760]
sub sp, sp, #1024
.cfi_def_cfa_offset -65536
str xzr, [sp, 1016]
ldrbw0, [sp, w0, sxtw]
.cfi_def_cfa_offset 131072
add sp, sp, 131072
.cfi_def_cfa_offset 0
ret

Note the cfa offsets are wrong.

There is an odd mix of a big initial adjustment, then some probes+adjustments 
and
then a final adjustment and probe for the remainder. I can't see the point of 
having
both an initial and remainder adjustment. I would expect this:

sub sp, sp, 65536
str xzr, [sp, 1024]
sub sp, sp, 65536
str xzr, [sp, 1024]
ldrbw0, [sp, w0, sxtw]
add sp, sp, 131072
ret


int f2(int x)
{
  char arr[128*1024];
  return arr[x];
}

f2:
mov x16, 64512
sub sp, sp, x16
mov x16, -65536
movkx16, 0xfffd, lsl 16
add x16, sp, x16
.LPSRL0:
sub sp, sp, 4096
str xzr, [sp, 4088]
cmp sp, x16
b.ne.LPSRL0
sub sp, sp, #1024
str xzr, [sp, 1016]
ldrbw0, [sp, w0, sxtw]
add sp, sp, 262144
ret

The cfa entries are OK for this case. There is a mix of positive/negative 
offsets which
makes things confusing. Again there are 3 kinds of adjustments when for this 
size we
only need the loop.

Reusing the existing gen_probe_stack_range code appears a bad idea since
it ignores the probe interval and just defaults to 4KB. I don't see why it 
should be
any more complex than this:

sub x16, sp, 262144  // only need temporary if > 1MB
.LPSRL0:
sub sp, sp, 65536
str xzr, [sp, 1024]
cmp sp, x16
b.ne.LPSRL0
ldrbw0, [sp, w0, sxtw]
add sp, sp, 262144
ret

Probe insertion if final adjustment >= 1024 also generates a lot of redundant
code - although this is more a theoretical issue given this is so rare.

Wilco


Zen tuning part 8: Fix rtx costs.

2017-10-13 Thread Jan Hubicka
Hi,
this patch fixes costs of basic operations for Zen. It also models SSE more
carefully.  ix86_rtx_costs is still based on x87 costs of operations which is
not very realistic today when x87 and SSE costs are often quite different.

The latencies in this patch are based on Agner Fog's values and I hope they
match reality for all supported CPUs.  Costs of vector operations are still
off (and as is the vectorizer costmodel) I will look into that incrementally.

Bootstrapped/regtested x86_64-linux, will commit it tomorrow after periodic
testers pick today changes.

Bootstrapped/regtested x86_64-linux.
Honza

* i386.c (ix86_rtx_costs): Make difference between x87 and SSE
operations.
* i386.h (struct processor_costs): Add addss, mulss, mulsd, divss,
divsd, sqrtss and sqrtsd
* x86-tune-costs.h: Add new entries to all costs.
(znver1_cost): Fix to match real instruction latencies.
Index: i386.c
===
--- i386.c  (revision 253694)
+++ i386.c  (working copy)
@@ -38812,6 +38812,9 @@ ix86_rtx_costs (rtx x, machine_mode mode
   enum rtx_code outer_code = (enum rtx_code) outer_code_i;
   const struct processor_costs *cost = speed ? ix86_cost : &ix86_size_cost;
   int src_cost;
+  machine_mode inner_mode = mode;
+  if (VECTOR_MODE_P (mode))
+inner_mode = GET_MODE_INNER (mode);
 
   switch (code)
 {
@@ -39012,7 +39015,7 @@ ix86_rtx_costs (rtx x, machine_mode mode
 
 /* ??? SSE scalar/vector cost should be used here.  */
 /* ??? Bald assumption that fma has the same cost as fmul.  */
-*total = cost->fmul;
+*total = mode == SFmode ? cost->mulss : cost->mulsd;
*total += rtx_cost (XEXP (x, 1), mode, FMA, 1, speed);
 
 /* Negate in op0 or op2 is free: FMS, FNMA, FNMS.  */
@@ -39031,8 +39034,7 @@ ix86_rtx_costs (rtx x, machine_mode mode
 case MULT:
   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
{
- /* ??? SSE scalar cost should be used here.  */
- *total = cost->fmul;
+ *total = inner_mode == DFmode ? cost->mulsd : cost->mulss;
  return false;
}
   else if (X87_FLOAT_MODE_P (mode))
@@ -39043,7 +39045,7 @@ ix86_rtx_costs (rtx x, machine_mode mode
   else if (FLOAT_MODE_P (mode))
{
  /* ??? SSE vector cost should be used here.  */
- *total = cost->fmul;
+ *total = inner_mode == DFmode ? cost->mulsd : cost->mulss;
  return false;
}
   else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
@@ -39071,7 +39073,7 @@ ix86_rtx_costs (rtx x, machine_mode mode
  else if (mode == V4SImode && !(TARGET_SSE4_1 || TARGET_AVX))
*total = cost->fmul * 2 + cost->fabs * 5;
  else
-   *total = cost->fmul;
+   *total = inner_mode == DFmode ? cost->mulsd : cost->mulss;
  return false;
}
   else
@@ -39125,13 +39127,12 @@ ix86_rtx_costs (rtx x, machine_mode mode
 case MOD:
 case UMOD:
   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
-   /* ??? SSE cost should be used here.  */
-   *total = cost->fdiv;
+   *total = inner_mode == DFmode ? cost->divsd : cost->divss;
   else if (X87_FLOAT_MODE_P (mode))
*total = cost->fdiv;
   else if (FLOAT_MODE_P (mode))
/* ??? SSE vector cost should be used here.  */
-   *total = cost->fdiv;
+   *total = inner_mode == DFmode ? cost->divsd : cost->divss;
   else
*total = cost->divide[MODE_INDEX (mode)];
   return false;
@@ -39210,8 +39211,7 @@ ix86_rtx_costs (rtx x, machine_mode mode
 
   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
{
- /* ??? SSE cost should be used here.  */
- *total = cost->fadd;
+ *total = cost->addss;
  return false;
}
   else if (X87_FLOAT_MODE_P (mode))
@@ -39221,8 +39221,8 @@ ix86_rtx_costs (rtx x, machine_mode mode
}
   else if (FLOAT_MODE_P (mode))
{
- /* ??? SSE vector cost should be used here.  */
- *total = cost->fadd;
+ /* We should account if registers are split.  */
+ *total = cost->addss;
  return false;
}
   /* FALLTHRU */
@@ -39317,13 +39317,12 @@ ix86_rtx_costs (rtx x, machine_mode mode
 
 case SQRT:
   if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)
-   /* ??? SSE cost should be used here.  */
-   *total = cost->fsqrt;
+   *total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
   else if (X87_FLOAT_MODE_P (mode))
*total = cost->fsqrt;
   else if (FLOAT_MODE_P (mode))
/* ??? SSE vector cost should be used here.  */
-   *total = cost->fsqrt;
+   *total = mode == SFmode ? cost->sqrtss : cost->sqrtsd;
   return false;
 
 case UNSPEC:
Index: i386.h
===
--- i386.h  (revision 253694)
+++ i386.h  (working copy)
@@

Re: Check that there are no missing probabilities

2017-10-13 Thread Andrew Pinski
On Fri, Oct 13, 2017 at 6:38 AM, Jan Hubicka  wrote:
> Hi,
> this patch enables check that no edge probabilities are missing.

This caused a bootstrap failure on aarch64-linux-gnu with go enabled.
But I see you have disabled the code for now.

Just for reference the failure:
../../../gcc/libgo/go/unicode/letter.go
../../../gcc/libgo/go/unicode/tables.go -o unicode.o >/dev/null 2>&1
../../../gcc/libgo/go/runtime/panic.go: In function ‘runtime.gopanic’:
../../../gcc/libgo/go/runtime/panic.go:408:1: error: Uninitialized
probability of edge 103->128
 func gopanic(e interface{}) {
 ^
during RTL pass: subreg1
../../../gcc/libgo/go/runtime/panic.go:408:1: internal compiler error:
verify_flow_info failed
0x71f3b7 verify_flow_info()
../../gcc/gcc/cfghooks.c:267
0xa9402b execute_function_todo
../../gcc/gcc/passes.c:2006
0xa94de3 execute_todo
../../gcc/gcc/passes.c:2048
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

Thanks,
Andrew


>
> Honza
>
> * cfghooks.c (verify_flow_info): Check that edge probabilities are
> set.
>
> Index: cfghooks.c
> ===
> --- cfghooks.c  (revision 253694)
> +++ cfghooks.c  (working copy)
> @@ -160,6 +161,13 @@ verify_flow_info (void)
>  e->src->index, e->dest->index);
>   err = 1;
> }
> + if (profile_status_for_fn (cfun) >= PROFILE_GUESSED
> + && !e->probability.initialized_p ())
> +   {
> + error ("Uninitialized probability of edge %i->%i", 
> e->src->index,
> +e->dest->index);
> + err = 1;
> +   }
>   if (!e->probability.verify ())
> {
>   error ("verify_flow_info: Wrong probability of edge %i->%i",


Re: Make more use of df_read_modify_subreg_p

2017-10-13 Thread Segher Boessenkool
On Fri, Oct 13, 2017 at 05:08:51PM +0100, Richard Sandiford wrote:
> Yeah, I agree this'll change the handling of paradoxical subregs that
> occupy more words than the SUBREG_REG, but I think the new version is
> correct.  The comment says:
> 
>   /* If the destination is anything other than CC0, PC, a REG or a SUBREG
>of a REG that occupies all of the REG, the insn uses DEST if
>it is mentioned in the destination or the source.  Otherwise, we
>need just check the source.  */
> 
> and a paradoxical subreg does occupy all of the SUBREG_REG.

I don't think a paradoxical subreg is allowed to refer to more than one
word at all?  At least that is how I read the documentation.  For example
the very first line of the subreg documentation:

@code{subreg} expressions are used to refer to a register in a machine
mode other than its natural one, or to refer to one register of
a multi-part @code{reg} that actually refers to several registers.


Segher


Re: [PATCH 2/2] PR libgcc/59714 complex division is surprising on aarch64

2017-10-13 Thread Richard Earnshaw
On 13/10/17 18:28, vladimir.mezent...@oracle.com wrote:
> On 10/12/2017 03:40 AM, Richard Earnshaw wrote:
>> On 12/10/17 06:21, vladimir.mezent...@oracle.com wrote:
>>> From: Vladimir Mezentsev 
>>>
>>> FMA (floating-point multiply-add) instructions are supported on aarch64.
>>> These instructions can produce different result if two operations executed 
>>> separately.
>>> -ffp-contract=off doesn't allow the FMA instructions.
>>>
>>> Tested on aarch64-linux-gnu.
>>> No regression. Two failed tests now passed.
>>>
>>> ChangeLog:
>>> 2017-10-11  Vladimir Mezentsev  
>>>
>>> PR libgcc/59714
>>> * libgcc/config/aarch64/t-aarch64 (HOST_LIBGCC2_CFLAGS): Add 
>>> -ffp-contract=off
>>> ---
>>>  libgcc/config/aarch64/t-aarch64 | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/libgcc/config/aarch64/t-aarch64 
>>> b/libgcc/config/aarch64/t-aarch64
>>> index 3af933c..e33bef0 100644
>>> --- a/libgcc/config/aarch64/t-aarch64
>>> +++ b/libgcc/config/aarch64/t-aarch64
>>> @@ -18,4 +18,5 @@
>>>  # along with GCC; see the file COPYING3.  If not see
>>>  # .
>>>  
>>> +HOST_LIBGCC2_CFLAGS += -ffp-contract=off
>>>  LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c
>>>
>> Why would we want to do this on AArch64 only?  If it's right for us,
>> then surely it would be right for everyone and the option should be
>> applied at the top level.
> 
>   It is a machine dependent option.
> We don't need this option, for example, on sparc or intel machines.
> 

No, it's a target-independent option (it's in the -f... option space).
It might be a no-op on some machines, but either it should be on
globally, or it should be off globally.  If it's a no-op then it won't
matter whether it's on or off.

R.

> -Vladimir
>>
>> Hint: I'm not convinced on the evidence here that it is right across the
>> whole of libgcc.  Before moving forward on this particular PR I think we
>> need to understand what behaviour we do want out of the compiler.
>>
>> R.
> 



Re: [PATCH 2/2] PR libgcc/59714 complex division is surprising on aarch64

2017-10-13 Thread Ramana Radhakrishnan
On Fri, Oct 13, 2017 at 10:25 PM, Richard Earnshaw
 wrote:
> On 13/10/17 18:28, vladimir.mezent...@oracle.com wrote:
>> On 10/12/2017 03:40 AM, Richard Earnshaw wrote:
>>> On 12/10/17 06:21, vladimir.mezent...@oracle.com wrote:
 From: Vladimir Mezentsev 

 FMA (floating-point multiply-add) instructions are supported on aarch64.
 These instructions can produce different result if two operations executed 
 separately.
 -ffp-contract=off doesn't allow the FMA instructions.

 Tested on aarch64-linux-gnu.
 No regression. Two failed tests now passed.

 ChangeLog:
 2017-10-11  Vladimir Mezentsev  

 PR libgcc/59714
 * libgcc/config/aarch64/t-aarch64 (HOST_LIBGCC2_CFLAGS): Add 
 -ffp-contract=off
 ---
  libgcc/config/aarch64/t-aarch64 | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/libgcc/config/aarch64/t-aarch64 
 b/libgcc/config/aarch64/t-aarch64
 index 3af933c..e33bef0 100644
 --- a/libgcc/config/aarch64/t-aarch64
 +++ b/libgcc/config/aarch64/t-aarch64
 @@ -18,4 +18,5 @@
  # along with GCC; see the file COPYING3.  If not see
  # .

 +HOST_LIBGCC2_CFLAGS += -ffp-contract=off
  LIB2ADD += $(srcdir)/config/aarch64/sync-cache.c

>>> Why would we want to do this on AArch64 only?  If it's right for us,
>>> then surely it would be right for everyone and the option should be
>>> applied at the top level.
>>
>>   It is a machine dependent option.
>> We don't need this option, for example, on sparc or intel machines.
>>
>
> No, it's a target-independent option (it's in the -f... option space).
> It might be a no-op on some machines, but either it should be on
> globally, or it should be off globally.  If it's a no-op then it won't
> matter whether it's on or off.

Indeed and we do have this on AArch32 with vfpv4 so that's 2
architectures already affected. It maybe that the default builds on
x86 don't tickle fma but we may see the same behavior elsewhere
depending on (fp) architecture defaults.

I haven't tried this testcase yet on AArch32 but I suspect we're
likely to hit the same issue there too ...

regards
Ramana
>
> R.
>
>> -Vladimir
>>>
>>> Hint: I'm not convinced on the evidence here that it is right across the
>>> whole of libgcc.  Before moving forward on this particular PR I think we
>>> need to understand what behaviour we do want out of the compiler.
>>>
>>> R.
>>
>


Re: [PATCH] Document -fdump-tree-vect option

2017-10-13 Thread Jonathan Wakely

On 13/10/17 14:06 -0600, Sandra Loosemore wrote:

On 10/13/2017 06:35 AM, Jonathan Wakely wrote:

This adds an item to the list of options for the -fdump-tree option,
because we show an example using 'vect' but don't document it.

OK for trunk?


No, I think this patch is addressing an imaginary bug.  I see no 
"vect" option listed in the dump_options array in dumpfile.c.


I think this is the example in question:

@smallexample
gcc -O2 -ftree-vectorize -fdump-tree-vect-blocks=foo.dump
-fdump-tree-pre=/dev/stderr file.c
@end smallexample

outputs vectorizer dump into @file{foo.dump}, while the PRE dump is
output on to @file{stderr}. If two conflicting dump filenames are
given for the same pass, then the latter option overrides the earlier
one.

I think the "vect" in this example is part of the pass name to be 
dumped, not an option modifying the dump output.  E.g. this is 
explicit in an older version of the GCC manual


https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Debugging-Options.html#Debugging-Options

which includes the example AND documents "vect" as one of the tree 
dumps you can ask for.  The current version of this documentation 
doesn't explicitly list all the tree dumps any more, but GCC does 
still accept -fdump-tree-vect.


Yes, I think I just misunderstood how this option works.


Is there any logic to the order of those options? Would it makes sense
to order them alphabetically?


My advice would be to move the =filename stuff out of the table (it's 
not really an option like the others) and refactor it into a paragraph 
before the option discussion, and in the table list "all" and "optall" 
first and then alphabetize the remaining entries (and make sure 
everything in the dump_options array is documented).


I'll leave that to somebody who understands it.




  1   2   >