date:20180720

[PATCH] SCCVN Datastructure TLC, part 2

2018-07-20 Thread Richard Biener



This removes the two-level hashtables we keep for the SCCVN iteration
in favor of removing elements inserted during optimistic iteration.
To be able to track those hashtable elements are now linked in a
list.  While it would be nice to elide the hashtable lookup to
find the slot for a specific element the removal of the 2nd lookup
should compensate this.  In theory we could add another hook to
the hash-traits that gets called whenever an entry gets a new
slot assigned (thus at hashtable expansion time).  We could then
add the slot as another member.  Not sure if it is worth the
trouble though since the compare function can be short-cutted
and with not much collisions the lookup shouldn't be too expensive.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

I'll leave the reference op vec<> to embedded trailing array re-org
for some later time since I fear it will require some extra churn
given consumers like to work with a vector of ops that can grow.

I'll probably followup with a patch merging the various globals
of the VN table state into a singleton struct.

Richard.

2018-07-20  Richard Biener  

* tree-ssa-sccvn.h (struct vn_nary_op_s): Add next member.
(struct vn_phi_s): Likewise.
(struct vn_reference_s): Likewise.
* tree-ssa-sccvn.c (vn_nary_op_hasher::equal): Add shortcut
for searching the slot of an entry known to be in the hash itself.
(vn_phi_hasher::equal): Likewise.
(vn_reference_hasher::equal): Likewise.
(last_inserted_ref, last_inserted_phi, last_inserted_nary): New
globals.
(optimistic_info, current_info): Remove, keeping only valid_info.
(vn_reference_lookup_1): Remove fallback lookup.
(vn_reference_lookup_2): Likewise.
(vn_nary_op_lookup_1): Likewise.
(vn_phi_lookup): Likewise.
(vn_nary_build_or_lookup_1): Make sure to not chain the built
hash element.
(vn_reference_insert): Adjust, chain the inserted hash element
at last_inserted_ref.
(vn_reference_insert_pieces): Likewise.
(visit_reference_op_call): Likewise.
(vn_nary_op_insert_into): Chain the inserted hash element at
last_inserted_nary.
(vn_nary_op_insert_pieces): Adjust.
(vn_nary_op_insert): Likewise.
(vn_nary_op_insert_stmt): Likewise.
(vn_phi_insert): Adjust, chain the inserted hash element at
last_inserted_phi.
(process_scc): Remove clearing and copying the optimistic
table.  Instead remove elements inserted during an optimistic
iteration from the single table we maintain.
(init_scc_vn): Adjust.
(free_scc_vn): Likewise.
(sccvn_dom_walker::record_cond): Likewise.
(sccvn_dom_walker::after_dom_children): Likewise.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 262879)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -156,7 +156,7 @@ vn_nary_op_hasher::hash (const vn_nary_o
 inline bool
 vn_nary_op_hasher::equal (const vn_nary_op_s *vno1, const vn_nary_op_s *vno2)
 {
-  return vn_nary_op_eq (vno1, vno2);
+  return vno1 == vno2 || vn_nary_op_eq (vno1, vno2);
 }
 
 typedef hash_table vn_nary_op_table_type;
@@ -187,7 +187,7 @@ vn_phi_hasher::hash (const vn_phi_s *vp1
 inline bool
 vn_phi_hasher::equal (const vn_phi_s *vp1, const vn_phi_s *vp2)
 {
-  return vn_phi_eq (vp1, vp2);
+  return vp1 == vp2 || vn_phi_eq (vp1, vp2);
 }
 
 typedef hash_table vn_phi_table_type;
@@ -242,7 +242,7 @@ vn_reference_hasher::hash (const vn_refe
 inline bool
 vn_reference_hasher::equal (const vn_reference_s *v, const vn_reference_s *c)
 {
-  return vn_reference_eq (v, c);
+  return v == c || vn_reference_eq (v, c);
 }
 
 typedef hash_table vn_reference_table_type;
@@ -295,19 +295,14 @@ static obstack vn_tables_obstack;
 /* Special obstack we never unwind.  */
 static obstack vn_tables_insert_obstack;
 
+static vn_reference_t last_inserted_ref;
+static vn_phi_t last_inserted_phi;
+static vn_nary_op_t last_inserted_nary;
+
 /* Valid hashtables storing information we have proven to be
correct.  */
 static vn_tables_t valid_info;
 
-/* Optimistic hashtables storing information we are making assumptions about
-   during iterations.  */
-static vn_tables_t optimistic_info;
-
-/* Pointer to the set of hashtables that is currently being used.
-   Should always point to either the optimistic_info, or the
-   valid_info.  */
-static vn_tables_t current_info;
-
 
 /* Reverse post order index for each basic block.  */
 static int *rpo_numbers;
@@ -1548,9 +1543,7 @@ vn_reference_lookup_1 (vn_reference_t vr
   hashval_t hash;
 
   hash = vr->hashcode;
-  slot = current_info->references->find_slot_with_hash (vr, hash, NO_INSERT);
-  if (!slot && current_info == optimistic_info)
-slot = valid_info->references->find_slot_with_hash (vr, hash, NO_INSERT);
+  slot = valid_info->reference

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-20 Thread Bernd Edlinger

>   if (TREE_CODE (arg) == ADDR_EXPR)
> {
>+  tree argtype = TREE_TYPE (arg);
>+  chartype = argtype;

This assignment should be unnecessary here.  Right?

>+
>   arg = TREE_OPERAND (arg, 0);
>   tree ref = arg;
>   if (TREE_CODE (arg) == ARRAY_REF)
>   {
> tree idx = TREE_OPERAND (arg, 1);
>-if (TREE_CODE (idx) != INTEGER_CST)
>+if (TREE_CODE (idx) != INTEGER_CST
>+&& TREE_CODE (argtype) == POINTER_TYPE)

What else could the type of an ADDR_EXPR be?
argtype is TREE_TYPE(arg).
Do you have a test case where TREE_CODE(argtype) is not POINTER_TYPE?


Bernd.

[PATCH] Prototype of hook for possible list of option values.

2018-07-20 Thread Martin Liška

Hi.

I'm sending patch candidate with suggested target common hook. It allows a 
target
to list all possible values for an option. Using the API, I implemented -march 
and
-mtune option listing on i386.

Richard you asked about the values. Yes, target should list all possible values,
mainly because --help=target output needs all of these.

Thoughts?
Martin

>From b2b40f7ca1f801a318aec661d0128a5adde7cb68 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 20 Jul 2018 09:58:16 +0200
Subject: [PATCH] Prototype of hook for possible list of option values.

---
 gcc/common/common-target.def |   7 +
 gcc/common/common-targhooks.c|   7 +
 gcc/common/common-targhooks.h|   1 +
 gcc/common/config/i386/i386-common.c | 264 +
 gcc/config/i386/i386.c   | 413 +++
 gcc/config/i386/i386.h   | 144 ++
 gcc/doc/tm.texi  |   4 +
 gcc/doc/tm.texi.in   |   2 +
 gcc/opt-suggestions.c|  21 +-
 gcc/opts.c   |  33 +++
 10 files changed, 524 insertions(+), 372 deletions(-)

diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
index e0afbc6af29..8c98598b015 100644
--- a/gcc/common/common-target.def
+++ b/gcc/common/common-target.def
@@ -80,6 +80,13 @@ DEFHOOK
  bool, (bool report, struct gcc_options *opts),
  hook_bool_bool_gcc_optionsp_false)
 
+DEFHOOK
+(get_valid_option_values,
+"The hook is used for options that have a non-trivial list of\
+ possible option values.",
+ vec, (int),
+ default_get_valid_option_values)
+
 /* Leave the boolean fields at the end.  */
 
 /* True if unwinding tables should be generated by default.  */
diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
index b1090190664..3662180f2e0 100644
--- a/gcc/common/common-targhooks.c
+++ b/gcc/common/common-targhooks.c
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "common/common-target.h"
 #include "common/common-targhooks.h"
+#include "opts.h"
 
 /* Determine the exception handling mechanism for the target.  */
 
@@ -77,6 +78,12 @@ default_target_handle_option (struct gcc_options *opts ATTRIBUTE_UNUSED,
   return true;
 }
 
+vec
+default_get_valid_option_values (int option)
+{
+  return vec ();
+}
+
 const struct default_options empty_optimization_table[] =
   {
 { OPT_LEVELS_NONE, 0, NULL, 0 }
diff --git a/gcc/common/common-targhooks.h b/gcc/common/common-targhooks.h
index d290d7f3e21..f8a7436d9dd 100644
--- a/gcc/common/common-targhooks.h
+++ b/gcc/common/common-targhooks.h
@@ -28,6 +28,7 @@ extern bool default_target_handle_option (struct gcc_options *,
 	  struct gcc_options *,
 	  const struct cl_decoded_option *,
 	  location_t);
+extern vec default_get_valid_option_values (int);
 
 extern const struct default_options empty_optimization_table[];
 
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index 70b3c3f2fc3..e7253b60c36 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1459,4 +1459,268 @@ i386_except_unwind_info (struct gcc_options *opts)
 #undef TARGET_SUPPORTS_SPLIT_STACK
 #define TARGET_SUPPORTS_SPLIT_STACK ix86_supports_split_stack
 
+/* This table must be in sync with enum processor_type in i386.h.  */
+const struct ptt processor_target_table[PROCESSOR_max] =
+{
+  /* The "0:0:8" label alignment specified for some processors generates
+ secondary 8-byte alignment only for those label/jump/loop targets
+ which have primary alignment.  */
+
+  {"generic","16:11:8", "16:11:8", "0:0:8", "16"},
+  {"i386",   "4",   "4",   NULL,"4" },
+  {"i486",   "16",  "16",  "0:0:8", "16"},
+  {"pentium","16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"lakemont",   "16:8:8",  "16:8:8",  "0:0:8", "16"},
+  {"pentiumpro", "16",  "16:11:8", "0:0:8", "16"},
+  {"pentium4",   NULL,  NULL,  NULL,NULL},
+  {"nocona", NULL,  NULL,  NULL,NULL},
+  {"core2",  "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"nehalem","16:11:8", "16:11:8", "0:0:8", "16"},
+  {"sandybridge","16:11:8", "16:11:8", "0:0:8", "16"},
+  {"haswell","16:11:8", "16:11:8", "0:0:8", "16"},
+  {"bonnell","16",  "16:8:8",  "0:0:8", "16"},
+  {"silvermont", "16",  "16:8:8",  "0:0:8", "16"},
+  {"goldmont",   "16",  "16:8:8",  "0:0:8", "16"},
+  {"goldmont-plus",  "16",  "16:8:8",  "0:0:8", "16"},
+  {"tremont","16",  "16:8:8",  "0:0:8", "16"},
+  {"knl","16",  "16:8:8",  "0:0:8", "16"},
+  {"knm","16",  "16:8:8",  "0:0:8", "16"},
+  {"skylake","16:11:8", "16:11:8", "0:0:8", "16"},
+  {"skylake-avx512", "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"cannonlake", "16:11:8", "16:11:8", "0:0:8", "16"},
+  {"icelake-client", "16:11:8", "16:11:8", "0:0

Re: [PATCH] When using -fprofile-generate=/some/path mangle absolute path of file (PR lto/85759).

2018-07-20 Thread Martin Liška

On 07/20/2018 06:02 AM, Bin.Cheng wrote:
> On Fri, Jun 29, 2018 at 9:54 PM, Martin Liška  wrote:
>> On 06/22/2018 10:35 PM, Jeff Law wrote:
>>> On 05/16/2018 05:53 AM, Martin Liška wrote:
 On 12/21/2017 10:13 AM, Martin Liška wrote:
> On 12/20/2017 06:45 PM, Jakub Jelinek wrote:
>> Another thing is that the "/" in there is wrong, so
>>   const char dir_separator_str[] = { DIR_SEPARATOR, '\0' };
>>   char *b = concat (profile_data_prefix, dir_separator_str, pwd, NULL);
>> needs to be used instead.
> This looks much nicer, I forgot about DIR_SEPARATOR.
>
>> Does profile_data_prefix have any dir separators stripped from the end?
> That's easy to achieve..
>
>> Is pwd guaranteed to be relative in this case?
> .. however this is absolute path, which would be problematic on a DOC 
> based FS.
> Maybe we should do the same path mangling as we do for purpose of gcov:
>
> https://github.com/gcc-mirror/gcc/blob/master/gcc/gcov.c#L2424
 Hi.

 I decided to implement that. Which means for:

 $ gcc -fprofile-generate=/tmp/myfolder empty.c -O2 && ./a.out

 we get following file:
 /tmp/myfolder/#home#marxin#Programming#testcases#tmp#empty.gcda

 That guarantees we have a unique file path. As seen in the PR it
 can produce a funny ICE.

 I've been testing the patch.
 Ready after it finishes tests?

 Martin

> What do you think about it?
> Regarding the string manipulation: I'm not an expert, but work with 
> string in C
> is for me always a pain :)
>
> Martin
>

 0001-When-using-fprofile-generate-some-path-mangle-absolu.patch


 From 386a4561a4d1501e8959871791289e95f6a89af5 Mon Sep 17 00:00:00 2001
 From: marxin 
 Date: Wed, 16 Aug 2017 10:22:57 +0200
 Subject: [PATCH] When using -fprofile-generate=/some/path mangle absolute 
 path
  of file (PR lto/85759).

 gcc/ChangeLog:

 2018-05-16  Martin Liska  

  PR lto/85759
  * coverage.c (coverage_init): Mangle full path name.
  * doc/invoke.texi: Document the change.
  * gcov-io.c (mangle_path): New.
  * gcov-io.h (mangle_path): Likewise.
  * gcov.c (mangle_name): Use mangle_path for path mangling.
>>> ISTM you can self-approve this now if you want it to move forward :-)
>>>
>>> jeff
>>>
>>
>> Sure, let me install the patch then.
> Hi,
> I am a bit confused after path mangling change.
> Now with below command line:

Hi.

> $ ./gcc -O2 -fprofile-use=./ sort.c -o sort.c

Does not make sense, it's default. When using argument
to the option, an absolute path is preferred to be used.

> or
> $ ./gcc -O2 -fprofile-use=./sort.gcda sort.c -o sort.c

Should be always a folder, not a path to a file.

> 
> The da_file_name and the final name used in gcov_open is as:
> 
> $ p name
> $11 = 0x2e63050
> ./#home#chengbin.cb#work#gcc-patches#trunk-orig#target.build#bin#sort.gcda
> or
> p da_file_name
> $1 = 0x2e63050 
> "sort.gcda/#home#chengbin.cb#work#gcc-patches#trunk-orig#target.build#bin#sort.gcda"
> 
> 
> These are not valid paths?  Or how should I modify the command line options?

Yes, please fix options.

Martin

> 
> Thanks,
> bin
>>
>> Martin

[PATCH] Remove unused code.

2018-07-20 Thread Martin Liška

Hi.

Following was introduced in r230331 and as David confirmed
it was installed accidentally. It has never been used.

Ready for trunk?
Martin

gcc/ChangeLog:

2018-07-20  Martin Liska  

* tree.h (DECL_LOCATION_RANGE): Remove unused macro.
(get_decl_source_range): Remove unused function.
---
 gcc/tree.h | 10 --
 1 file changed, 10 deletions(-)


diff --git a/gcc/tree.h b/gcc/tree.h
index 79b675025d9..8f20ff78369 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2311,9 +2311,6 @@ extern machine_mode vector_type_mode (const_tree);
 #define DECL_IS_BUILTIN(DECL) \
   (LOCATION_LOCUS (DECL_SOURCE_LOCATION (DECL)) <= BUILTINS_LOCATION)
 
-#define DECL_LOCATION_RANGE(NODE) \
-  (get_decl_source_range (DECL_MINIMAL_CHECK (NODE)))
-
 /*  For FIELD_DECLs, this is the RECORD_TYPE, UNION_TYPE, or
 QUAL_UNION_TYPE node that the field is a member of.  For VAR_DECL,
 PARM_DECL, FUNCTION_DECL, LABEL_DECL, RESULT_DECL, and CONST_DECL
@@ -5805,13 +5802,6 @@ set_source_range (tree expr, location_t start, location_t finish);
 extern location_t
 set_source_range (tree expr, source_range src_range);
 
-static inline source_range
-get_decl_source_range (tree decl)
-{
-  location_t loc = DECL_SOURCE_LOCATION (decl);
-  return get_range_from_loc (line_table, loc);
-}
-
 /* Return true if it makes sense to promote/demote from_type to to_type. */
 inline bool
 desired_pro_or_demotion_p (const_tree to_type, const_tree from_type)

Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-07-20 Thread Sam Tebbs


Hi all,

Here is an updated patch that does the following:

* Adds a new constraint in config/aarch64/constraints.md to check for a 
constant integer that is left consecutive. This addresses Richard 
Henderson's suggestion about combining the aarch64_is_left_consecutive 
call and the const_int match in the pattern.


* Merges the two patterns defined into one.

* Changes the pattern's type attribute to bfm.

* Improved the comment above the aarch64_is_left_consecutive implementation.

* Makes the pattern use the GPI iterator to accept smaller integer sizes 
(an extra test is added to check for this).


* Improves the tests in combine_bfxil.c to ensure they aren't optimised 
away and that they check for the pattern's correctness.


Below is a new changelog and the example given before.

Is this OK for trunk?

This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when or-ing the result of two bitwise and operations with non-overlapping
bitmasks (e.g. (a & 0x) | (b & 0x)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
  return (a & 0xll) | (b & 0xll);
}

void read(unsigned long long a, unsigned long long b, unsigned long long 
*c) {

  *c = combine(a, b);
}

When compiled with -O2, read would result in:

read:
  and   x5, x1, #0x
  and   x4, x0, #0x
  orr   x4, x4, x5
  str   x4, [x2]
  ret

But with this patch results in:

read:
  mov    x4, x0
  bfxil    x4, x1, 0, 32
  str    x4, [x2]
  ret



Bootstrapped and regtested on aarch64-none-linux-gnu and 
aarch64-none-elf with no regressions.



gcc/
2018-07-11  Sam Tebbs  

    PR target/85628
    * config/aarch64/aarch64.md (*aarch64_bfxil):
    Define.
    * config/aarch64/constraints.md (Ulc): Define
    * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
    Define.
    * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New 
function.


gcc/testsuite
2018-07-11  Sam Tebbs  

    PR target/85628
    * gcc.target/aarch64/combine_bfxil.c: New file.
    * gcc.target/aarch64/combine_bfxil_2.c: New file.


On 07/19/2018 02:02 PM, Sam Tebbs wrote:

Hi Richard,

Thanks for the feedback. I find that using "is_left_consecutive" is 
more descriptive than checking for it being a power of 2 - 1, since it 
describes the requirement (having consecutive ones from the MSB) more 
explicitly. I would be happy to change it though if that is the 
consensus.


I have addressed your point about just returning the string instead of 
using output_asm_insn and have changed it locally. I'll send an 
updated patch soon.



On 07/17/2018 02:33 AM, Richard Henderson wrote:

On 07/16/2018 10:10 AM, Sam Tebbs wrote:

+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode (unsigned 
regno, unsigned,

  return SImode;
  }
  +/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's bits are 
consecutive

+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+

...

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+    (match_operand 3 "const_int_operand"))
+    (and:DI (match_operand:DI 2 "register_operand" "0")
+    (match_operand 4 "const_int_operand"]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[4]))"
Better is to use a define_predicate to merge both that second test 
and the

const_int_operand.

(I'm not sure about the "left_consecutive" language either.
Isn't it more descriptive to say that op3 is a power of 2 minus 1?)

(define_predicate "pow2m1_operand"
   (and (match_code "const_int")
    (match_test "exact_pow2 (INTVAL(op) + 1) > 0")))

and use

   (match_operand:DI 3 "pow2m1_operand")

and then just the

   INTVAL (operands[3]) == ~INTVAL (operands[4])

test.

Also, don't omit the modes for the constants.
Also, there's no reason this applies only to DI mode;
use the GPI iterator and % in the output template.


+    HOST_WIDE_INT op3 = INTVAL (operands[3]);
+    operands[3] = GEN_INT (ceil_log2 (op3));
+    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
+    return "";

You can just return the string that you passed to output_asm_insn.


+  }
+  [(set_attr "type" "bfx")]

The other aliases of the BFM insn use type "bfm";
"bfx" appears to be aliases of UBFM and SBFM.
Not that it appears to matter to the scheduling
descriptions, but it is inconsistent.


r~

[PATCH][AAarch64][v2] Add support for TARGET_COMPUTE_FRAME_LAYOUT

2018-07-20 Thread Vlad Lazar


On 09/07/18 12:58, Vlad Lazar wrote:

Hi all,

The patch adds support for the TARGET_COMPUTE_FRAME_LAYOUT hook on AArch64
and removes unneeded frame layout recalculation.

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
regressions.

Thanks,
Vlad

gcc/
2018-07-02  Vlad Lazar  

* config/aarch64/aarch64.h (TARGET_COMPUTE_FRAME_LAYOUT): Define.
* config/aarch64/aarch64.c (aarch64_expand_prologue, 
aarch64_expand_epilogue,
aarch64_initial_elimination_offset): Remove aarch64_layout_frame calls.



Hi,

This is an updated version of the patch which removes some extra layout 
recalculation.
Please disregard the first version.

The patch adds support for the TARGET_COMPUTE_FRAME_LAYOUT hook on AArch64
and removes unneeded frame layout recalculation.

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
regressions.

Thanks,
Vlad

gcc/
2018-07-20  Vlad Lazar  

* config/aarch64/aarch64.h (TARGET_COMPUTE_FRAME_LAYOUT): Define.
* config/aarch64/aarch64.c (aarch64_expand_prologue): Remove 
aarch64_layout_frame call.
(aarch64_expand_epilogue): Likewise.
(aarch64_initial_elimination_offset): Likewise. 
(aarch64_get_separate_components): Likewise.
(aarch64_use_return_insn_p): Likewise.
(aarch64_layout_frame): Remove unneeded check.

---

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index
976f9afae54c1c98c22a4d5002d8d94e33b3190a..02aaa333fb57eff918049681173403f004a8a8e3
100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -478,6 +478,9 @@ extern unsigned aarch64_architecture_version;
 #undef DONT_USE_BUILTIN_SETJMP
 #define DONT_USE_BUILTIN_SETJMP 1

+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT aarch64_layout_frame
+
 /* Register in which the structure value is to be returned.  */
 #define AARCH64_STRUCT_VALUE_REGNUM R8_REGNUM

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index
b88e7cac27ab76e01b9769563ec9077d2a81bd7b..6a52eecf94011f5a7ee787c1295ca24732af2ff4
100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4021,9 +4021,6 @@ aarch64_layout_frame (void)
   HOST_WIDE_INT offset = 0;
   int regno, last_fp_reg = INVALID_REGNUM;

-  if (reload_completed && cfun->machine->frame.laid_out)
-return;
-
   cfun->machine->frame.emit_frame_chain = aarch64_needs_frame_chain ();

 #define SLOT_NOT_REQUIRED (-2)
@@ -4567,8 +4564,6 @@ offset_12bit_unsigned_scaled_p (machine_mode mode,
poly_int64 offset)
 static sbitmap
 aarch64_get_separate_components (void)
 {
-  aarch64_layout_frame ();
-
   sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
   bitmap_clear (components);

@@ -4592,7 +4587,7 @@ aarch64_get_separate_components (void)

   unsigned reg1 = cfun->machine->frame.wb_candidate1;
   unsigned reg2 = cfun->machine->frame.wb_candidate2;
-  /* If aarch64_layout_frame has chosen registers to store/restore with
+  /* If registers have been chosen to be stored/restored with
  writeback don't interfere with them to avoid having to output
explicit
  stack adjustment instructions.  */
   if (reg2 != INVALID_REGNUM)
@@ -4850,8 +4845,6 @@ aarch64_add_cfa_expression (rtx_insn *insn,
unsigned int reg,
 void
 aarch64_expand_prologue (void)
 {
-  aarch64_layout_frame ();
-
   poly_int64 frame_size = cfun->machine->frame.frame_size;
   poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
   HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
@@ -4964,8 +4957,6 @@ aarch64_use_return_insn_p (void)
   if (crtl->profile)
 return false;

-  aarch64_layout_frame ();
-
   return known_eq (cfun->machine->frame.frame_size, 0);
 }

@@ -4977,8 +4968,6 @@ aarch64_use_return_insn_p (void)
 void
 aarch64_expand_epilogue (bool for_sibcall)
 {
-  aarch64_layout_frame ();
-
   poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
   HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
   poly_int64 final_adjust = cfun->machine->frame.final_adjust;
@@ -7525,8 +7514,6 @@ aarch64_can_eliminate (const int from
ATTRIBUTE_UNUSED, const int to)
 poly_int64
 aarch64_initial_elimination_offset (unsigned from, unsigned to)
 {
-  aarch64_layout_frame ();
-
   if (to == HARD_FRAME_POINTER_REGNUM)
 {
   if (from == ARG_POINTER_REGNUM)

Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-07-20 Thread Sam Tebbs


Please disregard the original patch and see this updated version.


On 07/20/2018 10:31 AM, Sam Tebbs wrote:

Hi all,

Here is an updated patch that does the following:

* Adds a new constraint in config/aarch64/constraints.md to check for 
a constant integer that is left consecutive. This addresses Richard 
Henderson's suggestion about combining the aarch64_is_left_consecutive 
call and the const_int match in the pattern.


* Merges the two patterns defined into one.

* Changes the pattern's type attribute to bfm.

* Improved the comment above the aarch64_is_left_consecutive 
implementation.


* Makes the pattern use the GPI iterator to accept smaller integer 
sizes (an extra test is added to check for this).


* Improves the tests in combine_bfxil.c to ensure they aren't 
optimised away and that they check for the pattern's correctness.


Below is a new changelog and the example given before.

Is this OK for trunk?

This patch adds an optimisation that exploits the AArch64 BFXIL 
instruction

when or-ing the result of two bitwise and operations with non-overlapping
bitmasks (e.g. (a & 0x) | (b & 0x)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
  return (a & 0xll) | (b & 0xll);
}

void read(unsigned long long a, unsigned long long b, unsigned long 
long *c) {

  *c = combine(a, b);
}

When compiled with -O2, read would result in:

read:
  and   x5, x1, #0x
  and   x4, x0, #0x
  orr   x4, x4, x5
  str   x4, [x2]
  ret

But with this patch results in:

read:
  mov    x4, x0
  bfxil    x4, x1, 0, 32
  str    x4, [x2]
  ret



Bootstrapped and regtested on aarch64-none-linux-gnu and 
aarch64-none-elf with no regressions.



gcc/
2018-07-11  Sam Tebbs  

    PR target/85628
    * config/aarch64/aarch64.md (*aarch64_bfxil):
    Define.
    * config/aarch64/constraints.md (Ulc): Define
    * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
    Define.
    * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New 
function.


gcc/testsuite
2018-07-11  Sam Tebbs  

    PR target/85628
    * gcc.target/aarch64/combine_bfxil.c: New file.
    * gcc.target/aarch64/combine_bfxil_2.c: New file.


On 07/19/2018 02:02 PM, Sam Tebbs wrote:

Hi Richard,

Thanks for the feedback. I find that using "is_left_consecutive" is 
more descriptive than checking for it being a power of 2 - 1, since 
it describes the requirement (having consecutive ones from the MSB) 
more explicitly. I would be happy to change it though if that is the 
consensus.


I have addressed your point about just returning the string instead 
of using output_asm_insn and have changed it locally. I'll send an 
updated patch soon.



On 07/17/2018 02:33 AM, Richard Henderson wrote:

On 07/16/2018 10:10 AM, Sam Tebbs wrote:

+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode 
(unsigned regno, unsigned,

  return SImode;
  }
  +/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's bits 
are consecutive

+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+

...

+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+    (match_operand 3 "const_int_operand"))
+    (and:DI (match_operand:DI 2 "register_operand" "0")
+    (match_operand 4 "const_int_operand"]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[4]))"
Better is to use a define_predicate to merge both that second test 
and the

const_int_operand.

(I'm not sure about the "left_consecutive" language either.
Isn't it more descriptive to say that op3 is a power of 2 minus 1?)

(define_predicate "pow2m1_operand"
   (and (match_code "const_int")
    (match_test "exact_pow2 (INTVAL(op) + 1) > 0")))

and use

   (match_operand:DI 3 "pow2m1_operand")

and then just the

   INTVAL (operands[3]) == ~INTVAL (operands[4])

test.

Also, don't omit the modes for the constants.
Also, there's no reason this applies only to DI mode;
use the GPI iterator and % in the output template.


+    HOST_WIDE_INT op3 = INTVAL (operands[3]);
+    operands[3] = GEN_INT (ceil_log2 (op3));
+    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
+    return "";

You can just return the string that you passed to output_asm_insn.


+  }
+  [(set_attr "type" "bfx")]

The other aliases of the BFM insn use type "bfm";
"bfx" appears to be aliases of UBFM and SBFM.
Not that it appears to matter to the scheduling
descriptions, but it is inconsistent.


r~






diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 514ddc4..b025cd6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -558,4 +558

[PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64

2018-07-20 Thread Vlad Lazar


Hi,

The patch adds implementations for the NEON intrinsics vabsd_s64 and vnegd_s64.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification)

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
regressions.

OK for trunk?

Thanks,
Vlad

gcc/
2018-07-02  Vlad Lazar  

* config/aarch64/arm_neon.h (vabsd_s64, vnegd_s64): New.

gcc/testsuite/
2018-07-02  Vlad Lazar  

* gcc.target/aarch64/scalar_intrinsics.c (test_vabsd_s64, 
test_vabsd_s64): New.

---

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
2d18400040f031dfcdaf60269ad484647804e1be..19e22431a85bcd09d0ea759b42b0a52420b6c43c
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11822,6 +11822,13 @@ vabsq_s64 (int64x2_t __a)
   return __builtin_aarch64_absv2di (__a);
 }
 
+__extension__ extern __inline int64_t

+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vabsd_s64 (int64_t __a)
+{
+  return __builtin_aarch64_absdi (__a);
+}
+
 /* vadd */
 
 __extension__ extern __inline int64_t

@@ -22907,6 +22914,12 @@ vneg_s64 (int64x1_t __a)
   return -__a;
 }
 
+__extension__ extern __inline int64_t

+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vnegd_s64 (int64_t __a)
+{
+  return -__a;
+}
 __extension__ extern __inline float32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vnegq_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c 
b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
index 
ea29066e369b967d0781d31c8a5208bda9e4f685..45afeec373971838e0cd107038b4aa51a2d4998f
 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
@@ -603,6 +603,14 @@ test_vsqaddd_u64 (uint64_t a, int64_t b)
   return vsqaddd_u64 (a, b);
 }
 
+/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+" 1 } }  */

+
+int64_t
+test_vabsd_s64 (int64_t a)
+{
+  return vabsd_s64 (a);
+}
+
 /* { dg-final { scan-assembler-times "\\tsqabs\\tb\[0-9\]+" 1 } } */
 
 int8_t

@@ -627,6 +635,14 @@ test_vqabss_s32 (int32_t a)
   return vqabss_s32 (a);
 }
 
+/* { dg-final { scan-assembler-times "\\tneg\\tx\[0-9\]+" 1 } } */

+
+int64_t
+test_vnegd_s64 (int64_t a)
+{
+  return vnegd_s64 (a);
+}
+
 /* { dg-final { scan-assembler-times "\\tsqneg\\tb\[0-9\]+" 1 } } */
 
 int8_t

Optimization for std::to_string()

2018-07-20 Thread niXman



Hi,



Patch in attachments.

Tested on x86_64-linux-gnu.


--
Regards, niXman
___
C++ for Bitcoins: github.com/niXman
Index: libstdc++-v3/include/ext/string_conversions.h
===
--- libstdc++-v3/include/ext/string_conversions.h	(revision 262879)
+++ libstdc++-v3/include/ext/string_conversions.h	(working copy)
@@ -100,19 +100,18 @@
  __builtin_va_list), std::size_t __n,
 		 const _CharT* __fmt, ...)
 {
-  // XXX Eventually the result should be constructed in-place in
-  // the __cxx11 string, likely with the help of internal hooks.
-  _CharT* __s = static_cast<_CharT*>(__builtin_alloca(sizeof(_CharT)
-			  * __n));
+  _String __str(__n, 0);
 
   __builtin_va_list __args;
   __builtin_va_start(__args, __fmt);
 
-  const int __len = __convf(__s, __n, __fmt, __args);
+  const std::size_t __len = __convf(&__str[0], __n, __fmt, __args);
 
   __builtin_va_end(__args);
 
-  return _String(__s, __s + __len);
+  __str.resize(__len);
+
+  return __str;
 }
 
 _GLIBCXX_END_NAMESPACE_VERSION

Re: [PATCH] Prototype of hook for possible list of option values.

2018-07-20 Thread Richard Earnshaw (lists)

On 20/07/18 09:04, Martin Liška wrote:
> Hi.
> 
> I'm sending patch candidate with suggested target common hook. It allows a 
> target
> to list all possible values for an option. Using the API, I implemented 
> -march and
> -mtune option listing on i386.
> 
> Richard you asked about the values. Yes, target should list all possible 
> values,
> mainly because --help=target output needs all of these.
> 
> Thoughts?
> Martin
> 

I don't think anyone can reasonably write an implementation of this hook
based on this specification:

+@deftypefn {Common Target Hook} {vec}
TARGET_GET_VALID_OPTION_VALUES (int)
+The hook is used for options that have a non-trivial list of possible
option values.
+@end deftypefn
+

What's the int parameter for?  What's the lifetime of the result (who
cleans it up)?  If I need to allocation memory strings in the vector,
where do I do that?  Can I assume GC memory in the driver, for example?

Frankly though, I don't really want to enumerate every possible
permutation of the options for the architecture like this, though.  It's
just too brute force and the answer is likely to be hundreds (haven't
sat down to count it).  What's more, the extensions might have meaning
in the order in which they appear.  So, for example,

-march=armv8-a+crypto+nosimd

would be very different from

-march=armv8-a+nosimd+crypto

since the extensions are applied from left to right (the first collapses
to armv8-a+nosimd, the latter to armv8-a+crypto, but there are more
complex cases as well which I don't want to dig into here).

It would be a practical impossibility to list all of these.

R.

> 
> 0001-Prototype-of-hook-for-possible-list-of-option-values.patch
> 
> 
> From b2b40f7ca1f801a318aec661d0128a5adde7cb68 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 20 Jul 2018 09:58:16 +0200
> Subject: [PATCH] Prototype of hook for possible list of option values.
> 
> ---
>  gcc/common/common-target.def |   7 +
>  gcc/common/common-targhooks.c|   7 +
>  gcc/common/common-targhooks.h|   1 +
>  gcc/common/config/i386/i386-common.c | 264 +
>  gcc/config/i386/i386.c   | 413 +++
>  gcc/config/i386/i386.h   | 144 ++
>  gcc/doc/tm.texi  |   4 +
>  gcc/doc/tm.texi.in   |   2 +
>  gcc/opt-suggestions.c|  21 +-
>  gcc/opts.c   |  33 +++
>  10 files changed, 524 insertions(+), 372 deletions(-)
> 
> diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
> index e0afbc6af29..8c98598b015 100644
> --- a/gcc/common/common-target.def
> +++ b/gcc/common/common-target.def
> @@ -80,6 +80,13 @@ DEFHOOK
>   bool, (bool report, struct gcc_options *opts),
>   hook_bool_bool_gcc_optionsp_false)
>  
> +DEFHOOK
> +(get_valid_option_values,
> +"The hook is used for options that have a non-trivial list of\
> + possible option values.",
> + vec, (int),
> + default_get_valid_option_values)
> +
>  /* Leave the boolean fields at the end.  */
>  
>  /* True if unwinding tables should be generated by default.  */
> diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
> index b1090190664..3662180f2e0 100644
> --- a/gcc/common/common-targhooks.c
> +++ b/gcc/common/common-targhooks.c
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm.h"
>  #include "common/common-target.h"
>  #include "common/common-targhooks.h"
> +#include "opts.h"
>  
>  /* Determine the exception handling mechanism for the target.  */
>  
> @@ -77,6 +78,12 @@ default_target_handle_option (struct gcc_options *opts 
> ATTRIBUTE_UNUSED,
>return true;
>  }
>  
> +vec
> +default_get_valid_option_values (int option)
> +{
> +  return vec ();
> +}
> +
>  const struct default_options empty_optimization_table[] =
>{
>  { OPT_LEVELS_NONE, 0, NULL, 0 }
> diff --git a/gcc/common/common-targhooks.h b/gcc/common/common-targhooks.h
> index d290d7f3e21..f8a7436d9dd 100644
> --- a/gcc/common/common-targhooks.h
> +++ b/gcc/common/common-targhooks.h
> @@ -28,6 +28,7 @@ extern bool default_target_handle_option (struct 
> gcc_options *,
> struct gcc_options *,
> const struct cl_decoded_option *,
> location_t);
> +extern vec default_get_valid_option_values (int);
>  
>  extern const struct default_options empty_optimization_table[];
>  
> diff --git a/gcc/common/config/i386/i386-common.c 
> b/gcc/common/config/i386/i386-common.c
> index 70b3c3f2fc3..e7253b60c36 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -1459,4 +1459,268 @@ i386_except_unwind_info (struct gcc_options *opts)
>  #undef TARGET_SUPPORTS_SPLIT_STACK
>  #define TARGET_SUPPORTS_SPLIT_STACK ix86_supports_split_stack
>  
> +/* This table must be in sync with enum processor_type in i386.h.  */
>

Re: [PATCH] Make function clone name numbering independent.

2018-07-20 Thread Richard Biener

On Fri, Jul 20, 2018 at 4:48 AM Michael Ploujnikov
 wrote:
>
> On 2018-07-17 04:25 PM, Michael Ploujnikov wrote:
> > On 2018-07-17 06:02 AM, Richard Biener wrote:
> >> On Tue, Jul 17, 2018 at 8:10 AM Bernhard Reutner-Fischer
> >>  wrote:
> >>>
> >>> On 16 July 2018 21:38:36 CEST, Michael Ploujnikov 
> >>>  wrote:
>  Hi,
> 
> >>>
>  +clone_fn_ids = hash_map::create_ggc
>  (1000);
> >>>
> >>> Isn't 1000 a bit excessive? What about 64 or thereabouts?
> >>
> >> I'm not sure we should throw memory at this "problem" in this way.
> >> What specific issue
> >> does this address?
> >
> > This goes along with the general theme of preventing changes to one 
> > function affecting codegen of others. What I'm seeing in this case is when 
> > a function bar is modified codegen decides to create more clones of it (eg: 
> > during the constprop pass). These extra clones cause the global counter to 
> > increment so the clones of the unchanged function foo are named differently 
> > only because of a source change to bar. I was hoping that the testcase 
> > would be a good illustration, but perhaps not; is there anything else I can 
> > do to make this clearer?
> >
> >
> >>
> >> Iff then I belive forgoing the automatic counter addidion is the way
> >> to go and hand
> >> control of that to the callers (for example the caller in
> >> lto/lto-partition.c doesn't
> >> even seem to need that counter.
> >
> > How can you tell that privatize_symbol_name_1 doesn't need the counter? I'm 
> > assuming it has a good reason to call clone_function_name_1 rather than 
> > appending ".lto_priv" itself.
> >
> >> You also assume the string you key on persists - luckily the
> >> lto-partition.c caller
> >> leaks it but this makes your approach quite fragile in my eye (put the 
> >> logic
> >> into clone_function_name instead, where you at least know you are dealing
> >> with a string from an indentifier which are never collected).
> >>
> >> Richard.
> >>
> >
> > Is this what you had in mind?:
> >
> > diff --git gcc/cgraphclones.c gcc/cgraphclones.c
> > index 6e84a31..f000420 100644
> > --- gcc/cgraphclones.c
> > +++ gcc/cgraphclones.c
> > @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count 
> > prof_count,
> >return new_node;
> >  }
> >
> > -static GTY(()) unsigned int clone_fn_id_num;
> > +static GTY(()) hash_map *clone_fn_ids;
> >
> >  /* Return a new assembler name for a clone with SUFFIX of a decl named
> > NAME.  */
> > @@ -521,14 +521,13 @@ tree
> >  clone_function_name_1 (const char *name, const char *suffix)
> >  {
> >size_t len = strlen (name);
> > -  char *tmp_name, *prefix;
> > +  char *prefix;
> >
> >prefix = XALLOCAVEC (char, len + strlen (suffix) + 2);
> >memcpy (prefix, name, len);
> >strcpy (prefix + len + 1, suffix);
> >prefix[len] = symbol_table::symbol_suffix_separator ();
> > -  ASM_FORMAT_PRIVATE_NAME (tmp_name, prefix, clone_fn_id_num++);
> > -  return get_identifier (tmp_name);
> > +  return get_identifier (prefix);
> >  }
> >
> >  /* Return a new assembler name for a clone of DECL with SUFFIX.  */
> > @@ -537,7 +536,17 @@ tree
> >  clone_function_name (tree decl, const char *suffix)
> >  {
> >tree name = DECL_ASSEMBLER_NAME (decl);
> > -  return clone_function_name_1 (IDENTIFIER_POINTER (name), suffix);
> > +  char *decl_name = IDENTIFIER_POINTER (name);
> > +  char *numbered_name;
> > +  unsigned int *suffix_counter;
> > +  if (!clone_fn_ids) {
> > +/* Initialize the per-function counter hash table if this is the first 
> > call */
> > +clone_fn_ids = hash_map::create_ggc (64);
> > +  }
> > +  suffix_counter = &clone_fn_ids->get_or_insert(name);
> > +  ASM_FORMAT_PRIVATE_NAME (numbered_name, decl_name, *suffix_counter);
> > +  *suffix_counter = *suffix_counter + 1;
> > +  return clone_function_name_1 (numbered_name, suffix);
> >  }
> >
> > - Michael
> >
> >
> >
>
> Ping, and below is the updated version of the full patch with changelogs:
>
>
> gcc:
> 2018-07-16  Michael Ploujnikov  
>
>Make function clone name numbering independent.
>* cgraphclones.c: Replace clone_fn_id_num with clone_fn_ids.
>(clone_function_name_1): Move suffixing to clone_function_name
>and change it to use per-function clone_fn_ids.
>
> testsuite:
> 2018-07-16  Michael Ploujnikov  
>
> Clone id counters should be completely independent from one another.
> * gcc/testsuite/gcc.dg/independent-cloneids-1.c: New test.
>
> ---
>  gcc/cgraphclones.c| 19 ++
>  gcc/testsuite/gcc.dg/independent-cloneids-1.c | 38 
> +++
>  2 files changed, 52 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/independent-cloneids-1.c
>
> diff --git gcc/cgraphclones.c gcc/cgraphclones.c
> index 6e84a31..e1a77a2 100644
> --- gcc/cgraphclones.c
> +++ gcc/cgraphclones.c
> @@ -512,7 +512,7 @@ cgraph_node::create_clone (tree new_decl, profile_count 
> prof_count,

Re: Optimization for std::to_string()

2018-07-20 Thread Jonathan Wakely


On 20/07/18 12:44 +0300, niXman wrote:


Hi,



Patch in attachments.


Thanks. How did you verify it's an optimization?

The to_string functions always pass at least __n=16 to __to_xstring,
which is larger than the SSO buffer in std::__cxx11::string, and so
forces a memory allocation.

The current code uses alloca and if the result fits in the SSO buffer
there is no allocation. e.g. to_string(99) doesn't allocate from
the heap at all for the new std::__cxx11::string.

For the old COW string this doesn't make much difference, because we
always allocate. Your patch will mean we return a string with excess
capacity, rather than one of exactly capacity()==length(). That isn't
a problem.



Tested on x86_64-linux-gnu.


--
Regards, niXman
___
C++ for Bitcoins: github.com/niXman



Index: libstdc++-v3/include/ext/string_conversions.h
===
--- libstdc++-v3/include/ext/string_conversions.h   (revision 262879)
+++ libstdc++-v3/include/ext/string_conversions.h   (working copy)
@@ -100,19 +100,18 @@
 __builtin_va_list), std::size_t __n,
 const _CharT* __fmt, ...)
{
-  // XXX Eventually the result should be constructed in-place in
-  // the __cxx11 string, likely with the help of internal hooks.
-  _CharT* __s = static_cast<_CharT*>(__builtin_alloca(sizeof(_CharT)
- * __n));
+  _String __str(__n, 0);

  __builtin_va_list __args;
  __builtin_va_start(__args, __fmt);

-  const int __len = __convf(__s, __n, __fmt, __args);
+  const std::size_t __len = __convf(&__str[0], __n, __fmt, __args);

  __builtin_va_end(__args);

-  return _String(__s, __s + __len);
+  __str.resize(__len);
+
+  return __str;
}

_GLIBCXX_END_NAMESPACE_VERSION

Re: [PATCH] Prototype of hook for possible list of option values.

2018-07-20 Thread Martin Liška

On 07/20/2018 11:48 AM, Richard Earnshaw (lists) wrote:
> On 20/07/18 09:04, Martin Liška wrote:
>> Hi.
>>
>> I'm sending patch candidate with suggested target common hook. It allows a 
>> target
>> to list all possible values for an option. Using the API, I implemented 
>> -march and
>> -mtune option listing on i386.
>>
>> Richard you asked about the values. Yes, target should list all possible 
>> values,
>> mainly because --help=target output needs all of these.
>>
>> Thoughts?
>> Martin
>>
> 
> I don't think anyone can reasonably write an implementation of this hook
> based on this specification:
> 
> +@deftypefn {Common Target Hook} {vec}
> TARGET_GET_VALID_OPTION_VALUES (int)
> +The hook is used for options that have a non-trivial list of possible
> option values.
> +@end deftypefn
> +
> 
> What's the int parameter for?  What's the lifetime of the result (who
> cleans it up)?  If I need to allocation memory strings in the vector,
> where do I do that?  Can I assume GC memory in the driver, for example?

Sure, can be improved, please take it as prototype.

> 
> Frankly though, I don't really want to enumerate every possible
> permutation of the options for the architecture like this, though.  It's
> just too brute force and the answer is likely to be hundreds (haven't

That's why I recommended you to list in --help=target content of
arm_arch enum. You replied that it's not complete list of all possible values.
Note that we are talking about content of --help option, it's not a 
documentation,
it should just help users. Similar to bash completion, it should not be 100% 
perfect.

> sat down to count it).  What's more, the extensions might have meaning
> in the order in which they appear.  So, for example,
> 
>   -march=armv8-a+crypto+nosimd
> 
> would be very different from
> 
>   -march=armv8-a+nosimd+crypto
> 
> since the extensions are applied from left to right (the first collapses
> to armv8-a+nosimd, the latter to armv8-a+crypto, but there are more
> complex cases as well which I don't want to dig into here).
> 
> It would be a practical impossibility to list all of these.

Yes, that's why I recommended to list only base march values. Modifiers can be 
mentioned
aside if desired.

Martin

> 
> R.
> 
>>
>> 0001-Prototype-of-hook-for-possible-list-of-option-values.patch
>>
>>
>> From b2b40f7ca1f801a318aec661d0128a5adde7cb68 Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Fri, 20 Jul 2018 09:58:16 +0200
>> Subject: [PATCH] Prototype of hook for possible list of option values.
>>
>> ---
>>  gcc/common/common-target.def |   7 +
>>  gcc/common/common-targhooks.c|   7 +
>>  gcc/common/common-targhooks.h|   1 +
>>  gcc/common/config/i386/i386-common.c | 264 +
>>  gcc/config/i386/i386.c   | 413 +++
>>  gcc/config/i386/i386.h   | 144 ++
>>  gcc/doc/tm.texi  |   4 +
>>  gcc/doc/tm.texi.in   |   2 +
>>  gcc/opt-suggestions.c|  21 +-
>>  gcc/opts.c   |  33 +++
>>  10 files changed, 524 insertions(+), 372 deletions(-)
>>
>> diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
>> index e0afbc6af29..8c98598b015 100644
>> --- a/gcc/common/common-target.def
>> +++ b/gcc/common/common-target.def
>> @@ -80,6 +80,13 @@ DEFHOOK
>>   bool, (bool report, struct gcc_options *opts),
>>   hook_bool_bool_gcc_optionsp_false)
>>  
>> +DEFHOOK
>> +(get_valid_option_values,
>> +"The hook is used for options that have a non-trivial list of\
>> + possible option values.",
>> + vec, (int),
>> + default_get_valid_option_values)
>> +
>>  /* Leave the boolean fields at the end.  */
>>  
>>  /* True if unwinding tables should be generated by default.  */
>> diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
>> index b1090190664..3662180f2e0 100644
>> --- a/gcc/common/common-targhooks.c
>> +++ b/gcc/common/common-targhooks.c
>> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tm.h"
>>  #include "common/common-target.h"
>>  #include "common/common-targhooks.h"
>> +#include "opts.h"
>>  
>>  /* Determine the exception handling mechanism for the target.  */
>>  
>> @@ -77,6 +78,12 @@ default_target_handle_option (struct gcc_options *opts 
>> ATTRIBUTE_UNUSED,
>>return true;
>>  }
>>  
>> +vec
>> +default_get_valid_option_values (int option)
>> +{
>> +  return vec ();
>> +}
>> +
>>  const struct default_options empty_optimization_table[] =
>>{
>>  { OPT_LEVELS_NONE, 0, NULL, 0 }
>> diff --git a/gcc/common/common-targhooks.h b/gcc/common/common-targhooks.h
>> index d290d7f3e21..f8a7436d9dd 100644
>> --- a/gcc/common/common-targhooks.h
>> +++ b/gcc/common/common-targhooks.h
>> @@ -28,6 +28,7 @@ extern bool default_target_handle_option (struct 
>> gcc_options *,
>>struct gcc_options *,
>>const st

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-20 Thread Richard Biener

On Thu, 19 Jul 2018, Martin Sebor wrote:

> Here's one more update with tweaks addressing a couple more
> of Bernd's comments:
> 
> 1) correct the use of TREE_STRING_LENGTH() where a number of
> array elements is expected and not bytes
> 2) set CHARTYPE as soon as it's first determined rather than
> trying to extract it again later

Please look at Bernds followup comments.  One additional note:

I see you are ultimatively using CHARTYPE to get at the size
of the access.  That is wrong.

   if (TREE_CODE (arg) == ADDR_EXPR)
 {
+  tree argtype = TREE_TYPE (arg);
+  chartype = argtype;
+
   arg = TREE_OPERAND (arg, 0);
   tree ref = arg;
   if (TREE_CODE (arg) == ARRAY_REF)
{

so the "access" is of size array_ref_element_size (arg) here.  You
may not simply use TYPE_SIZE_UNIT of sth.

That is, lookign at current trunk,

  if (TREE_CODE (arg) == ADDR_EXPR)
{
  arg = TREE_OPERAND (arg, 0);
  tree ref = arg;
  if (TREE_CODE (arg) == ARRAY_REF)
{
  tree idx = TREE_OPERAND (arg, 1);
  if (TREE_CODE (idx) != INTEGER_CST)
{
  /* Extract the variable index to prevent
 get_addr_base_and_unit_offset() from failing due to
 it.  Use it later to compute the non-constant offset
 into the string and return it to the caller.  */
  varidx = idx;
  ref = TREE_OPERAND (arg, 0);

you should scale the index here by array_ref_element_size (arg).
Or simply rewrite this to instead of using get_addr_base_and_unit_offset,
use get_inner_reference which does all that magic for you.

That is, you shouldn't need chartype.

Richard.


> On 07/19/2018 01:49 PM, Martin Sebor wrote:
> > On 07/19/2018 01:17 AM, Richard Biener wrote:
> > > On Wed, 18 Jul 2018, Martin Sebor wrote:
> > > 
> > > > > > > +  while (TREE_CODE (chartype) != INTEGER_TYPE)
> > > > > > > +chartype = TREE_TYPE (chartype);
> > > > > > This is a bit concerning.  First under what conditions is chartype
> > > > > > not
> > > > > > going to be an INTEGER_TYPE?  And under what conditions will
> > > > > > extracting
> > > > > > its type ultimately lead to something that is an INTEGER_TYPE?
> > > > > 
> > > > > chartype is usually (maybe even always) pointer type here:
> > > > > 
> > > > >   const char a[] = "123";
> > > > >   extern int i;
> > > > >   n = strlen (&a[i]);
> > > > 
> > > > But your hunch was correct that the loop isn't safe because
> > > > the element type need not be an integer (I didn't know/forgot
> > > > that the function is called for non-strings too).  The loop
> > > > should be replaced by:
> > > > 
> > > >   while (TREE_CODE (chartype) == ARRAY_TYPE
> > > >  || TREE_CODE (chartype) == POINTER_TYPE)
> > > > chartype = TREE_TYPE (chartype);
> > > 
> > > As this function may be called "late" you need to cope with
> > > the middle-end ignoring type changes and thus happily
> > > passing int *** directly rather than (char *) of that.
> > > 
> > > Also doesn't the above yield int for int *[]?
> > 
> > I don't think it ever gets this far for either a pointer to
> > an array of int, or for an array of pointers to int.  So for
> > something like the following the function fails earlier:
> > 
> >   const int* const a[2] = { ... };
> >   const char* (const *p)[2] = &a;
> > 
> >   int f (void)
> >   {
> > return __builtin_memcmp (*p, "12345678", 8);
> >   }
> > 
> > (Assuming this is what you were asking about.)
> > 
> > > I guess you really want
> > > 
> > >if (POINTER_TYPE_P (chartype))
> > >  chartype = TREE_TYPE (chartype);
> > >while (TREE_CODE (chartype) == ARRAY_TYPE)
> > >  chartype = TREE_TYPE (chartype);
> > > 
> > > ?
> > 
> > That seems to work too.  Attached is an update with this tweak.
> > The update also addresses some of Bernd's comments: it removes
> > the pointless second test in:
> > 
> > if (TREE_CODE (type) == ARRAY_TYPE
> > && TREE_CODE (type) != INTEGER_TYPE)
> > 
> > the unused assignment to chartype in:
> > 
> >else if (DECL_P (arg))
> >  {
> >array = arg;
> >chartype = TREE_TYPE (arg);
> >  }
> > 
> > and calls string_constant() instead of strnlen() to compute
> > the length of a generic string.
> > 
> > Other improvements  are possible in this area but they are
> > orthogonal to the bug I'm trying to fix so I'll post separate
> > patches for some of those.
> > 
> > Martin
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Handle SLP of call pattern statements

2018-07-20 Thread Richard Sandiford

We couldn't vectorise:

  for (int j = 0; j < n; ++j)
{
  for (int i = 0; i < 16; ++i)
a[i] = (b[i] + c[i]) >> 1;
  a += step;
  b += step;
  c += step;
}

at -O3 because cunrolli unrolled the inner loop and SLP couldn't handle
AVG_FLOOR patterns (see also PR86504).  The problem was some overly
strict checking of pattern statements compared to normal statements
in vect_get_and_check_slp_defs:

  switch (gimple_code (def_stmt))
{
case GIMPLE_PHI:
case GIMPLE_ASSIGN:
  break;

default:
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "unsupported defining stmt:\n");
  return -1;
}

The easy fix would have been to add GIMPLE_CALL to the switch,
but I don't think the switch is doing anything useful.  We only create
pattern statements that the rest of the vectoriser can handle, and code
in this function and elsewhere check whether SLP is possible.

I'm also not sure why:

  if (!first && !oprnd_info->first_pattern
  /* Allow different pattern state for the defs of the
 first stmt in reduction chains.  */
  && (oprnd_info->first_dt != vect_reduction_def

is necessary.  All that should matter is that the statements in the
node are "similar enough".  It turned out to be quite hard to find a
convincing example that used a mixture of pattern and non-pattern
statements, so bb-slp-pow-1.c is the best I could come up with.
But it does show that the combination of "xi * xi" statements and
"pow (xj, 2) -> xj * xj" patterns are handled correctly.

The patch therefore just removes the whole if block.

The loop also needed commutative swapping to be extended to at least
AVG_FLOOR.

This gives +3.9% on 525.x264_r at -O3.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-07-20  Richard Sandiford  

gcc/
* internal-fn.h (first_commutative_argument): Declare.
* internal-fn.c (first_commutative_argument): New function.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Remove extra
restrictions for pattern statements.  Use first_commutative_argument
to look for commutative operands in calls to internal functions.

gcc/testsuite/
* gcc.dg/vect/bb-slp-over-widen-1.c: Expect AVG_FLOOR to be used
on vect_avg_qi targets.
* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
* gcc.dg/vect/bb-slp-pow-1.c: New test.
* gcc.dg/vect/vect-avg-15.c: Likewise.

Index: gcc/internal-fn.h
===
--- gcc/internal-fn.h   2018-07-13 10:11:14.009847140 +0100
+++ gcc/internal-fn.h   2018-07-20 11:18:58.167047743 +0100
@@ -201,6 +201,8 @@ direct_internal_fn_supported_p (internal
 opt_type);
 }
 
+extern int first_commutative_argument (internal_fn);
+
 extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
Index: gcc/internal-fn.c
===
--- gcc/internal-fn.c   2018-07-13 10:11:14.009847140 +0100
+++ gcc/internal-fn.c   2018-07-20 11:18:58.163047778 +0100
@@ -3183,6 +3183,42 @@ direct_internal_fn_supported_p (internal
   return direct_internal_fn_supported_p (fn, tree_pair (type, type), opt_type);
 }
 
+/* If FN is commutative in two consecutive arguments, return the
+   index of the first, otherwise return -1.  */
+
+int
+first_commutative_argument (internal_fn fn)
+{
+  switch (fn)
+{
+case IFN_FMA:
+case IFN_FMS:
+case IFN_FNMA:
+case IFN_FNMS:
+case IFN_AVG_FLOOR:
+case IFN_AVG_CEIL:
+case IFN_FMIN:
+case IFN_FMAX:
+  return 0;
+
+case IFN_COND_ADD:
+case IFN_COND_MUL:
+case IFN_COND_MIN:
+case IFN_COND_MAX:
+case IFN_COND_AND:
+case IFN_COND_IOR:
+case IFN_COND_XOR:
+case IFN_COND_FMA:
+case IFN_COND_FMS:
+case IFN_COND_FNMA:
+case IFN_COND_FNMS:
+  return 1;
+
+default:
+  return -1;
+}
+}
+
 /* Return true if IFN_SET_EDOM is supported.  */
 
 bool
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2018-07-13 10:11:15.113837768 +0100
+++ gcc/tree-vect-slp.c 2018-07-20 11:18:58.167047743 +0100
@@ -299,15 +299,20 @@ vect_get_and_check_slp_defs (vec_info *v
   bool pattern = false;
   slp_oprnd_info oprnd_info;
   int first_op_idx = 1;
-  bool commutative = false;
+  unsigned int commutative_op = -1U;
   bool first_op_cond = false;
   bool first = stmt_num == 0;
   bool second = stmt_num == 1;
 
-  if (is_gimple_call (stmt))
+  if (gcall *call = dyn_cast  (stmt))
 {
-  number_of_oprnds = gimple_call_num_args (stmt);
+  number_of_oprnds = gimple_call_num_args (call)

Re: [PATCH] Remove unused code.

2018-07-20 Thread Richard Biener

On Fri, Jul 20, 2018 at 10:52 AM Martin Liška  wrote:
>
> Hi.
>
> Following was introduced in r230331 and as David confirmed
> it was installed accidentally. It has never been used.
>
> Ready for trunk?
OK

> Martin
>
> gcc/ChangeLog:
>
> 2018-07-20  Martin Liska  
>
> * tree.h (DECL_LOCATION_RANGE): Remove unused macro.
> (get_decl_source_range): Remove unused function.
> ---
>  gcc/tree.h | 10 --
>  1 file changed, 10 deletions(-)
>
>

Re: [PATCH] Prototype of hook for possible list of option values.

2018-07-20 Thread Richard Earnshaw (lists)

On 20/07/18 11:14, Martin Liška wrote:
> On 07/20/2018 11:48 AM, Richard Earnshaw (lists) wrote:
>> On 20/07/18 09:04, Martin Liška wrote:
>>> Hi.
>>>
>>> I'm sending patch candidate with suggested target common hook. It allows a 
>>> target
>>> to list all possible values for an option. Using the API, I implemented 
>>> -march and
>>> -mtune option listing on i386.
>>>
>>> Richard you asked about the values. Yes, target should list all possible 
>>> values,
>>> mainly because --help=target output needs all of these.
>>>
>>> Thoughts?
>>> Martin
>>>
>>
>> I don't think anyone can reasonably write an implementation of this hook
>> based on this specification:
>>
>> +@deftypefn {Common Target Hook} {vec}
>> TARGET_GET_VALID_OPTION_VALUES (int)
>> +The hook is used for options that have a non-trivial list of possible
>> option values.
>> +@end deftypefn
>> +
>>
>> What's the int parameter for?  What's the lifetime of the result (who
>> cleans it up)?  If I need to allocation memory strings in the vector,
>> where do I do that?  Can I assume GC memory in the driver, for example?
> 
> Sure, can be improved, please take it as prototype.
> 
>>
>> Frankly though, I don't really want to enumerate every possible
>> permutation of the options for the architecture like this, though.  It's
>> just too brute force and the answer is likely to be hundreds (haven't
> 
> That's why I recommended you to list in --help=target content of
> arm_arch enum. You replied that it's not complete list of all possible values.
> Note that we are talking about content of --help option, it's not a 
> documentation,
> it should just help users. Similar to bash completion, it should not be 100% 
> perfect.
> 
>> sat down to count it).  What's more, the extensions might have meaning
>> in the order in which they appear.  So, for example,
>>
>>  -march=armv8-a+crypto+nosimd
>>
>> would be very different from
>>
>>  -march=armv8-a+nosimd+crypto
>>
>> since the extensions are applied from left to right (the first collapses
>> to armv8-a+nosimd, the latter to armv8-a+crypto, but there are more
>> complex cases as well which I don't want to dig into here).
>>
>> It would be a practical impossibility to list all of these.
> 
> Yes, that's why I recommended to list only base march values. Modifiers can 
> be mentioned
> aside if desired.

So it might be feasible to print something like:

  arch1[+ext1|+ext2]*
  arch2[+ext1|+ext3|...]*

etc.  That at least would be a concise summary of the options.  Whether
or not automated tools could handle that is another matter.

R.

> 
> Martin
> 
>>
>> R.
>>
>>>
>>> 0001-Prototype-of-hook-for-possible-list-of-option-values.patch
>>>
>>>
>>> From b2b40f7ca1f801a318aec661d0128a5adde7cb68 Mon Sep 17 00:00:00 2001
>>> From: marxin 
>>> Date: Fri, 20 Jul 2018 09:58:16 +0200
>>> Subject: [PATCH] Prototype of hook for possible list of option values.
>>>
>>> ---
>>>  gcc/common/common-target.def |   7 +
>>>  gcc/common/common-targhooks.c|   7 +
>>>  gcc/common/common-targhooks.h|   1 +
>>>  gcc/common/config/i386/i386-common.c | 264 +
>>>  gcc/config/i386/i386.c   | 413 +++
>>>  gcc/config/i386/i386.h   | 144 ++
>>>  gcc/doc/tm.texi  |   4 +
>>>  gcc/doc/tm.texi.in   |   2 +
>>>  gcc/opt-suggestions.c|  21 +-
>>>  gcc/opts.c   |  33 +++
>>>  10 files changed, 524 insertions(+), 372 deletions(-)
>>>
>>> diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
>>> index e0afbc6af29..8c98598b015 100644
>>> --- a/gcc/common/common-target.def
>>> +++ b/gcc/common/common-target.def
>>> @@ -80,6 +80,13 @@ DEFHOOK
>>>   bool, (bool report, struct gcc_options *opts),
>>>   hook_bool_bool_gcc_optionsp_false)
>>>  
>>> +DEFHOOK
>>> +(get_valid_option_values,
>>> +"The hook is used for options that have a non-trivial list of\
>>> + possible option values.",
>>> + vec, (int),
>>> + default_get_valid_option_values)
>>> +
>>>  /* Leave the boolean fields at the end.  */
>>>  
>>>  /* True if unwinding tables should be generated by default.  */
>>> diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
>>> index b1090190664..3662180f2e0 100644
>>> --- a/gcc/common/common-targhooks.c
>>> +++ b/gcc/common/common-targhooks.c
>>> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tm.h"
>>>  #include "common/common-target.h"
>>>  #include "common/common-targhooks.h"
>>> +#include "opts.h"
>>>  
>>>  /* Determine the exception handling mechanism for the target.  */
>>>  
>>> @@ -77,6 +78,12 @@ default_target_handle_option (struct gcc_options *opts 
>>> ATTRIBUTE_UNUSED,
>>>return true;
>>>  }
>>>  
>>> +vec
>>> +default_get_valid_option_values (int option)
>>> +{
>>> +  return vec ();
>>> +}
>>> +
>>>  const struct default_options empty_optimization_table[] =
>>>{
>>>  { OPT_LEVELS_N

Re: Handle SLP of call pattern statements

2018-07-20 Thread Richard Biener

On Fri, Jul 20, 2018 at 12:22 PM Richard Sandiford
 wrote:
>
> We couldn't vectorise:
>
>   for (int j = 0; j < n; ++j)
> {
>   for (int i = 0; i < 16; ++i)
> a[i] = (b[i] + c[i]) >> 1;
>   a += step;
>   b += step;
>   c += step;
> }
>
> at -O3 because cunrolli unrolled the inner loop and SLP couldn't handle
> AVG_FLOOR patterns (see also PR86504).  The problem was some overly
> strict checking of pattern statements compared to normal statements
> in vect_get_and_check_slp_defs:
>
>   switch (gimple_code (def_stmt))
> {
> case GIMPLE_PHI:
> case GIMPLE_ASSIGN:
>   break;
>
> default:
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "unsupported defining stmt:\n");
>   return -1;
> }
>
> The easy fix would have been to add GIMPLE_CALL to the switch,
> but I don't think the switch is doing anything useful.  We only create
> pattern statements that the rest of the vectoriser can handle, and code
> in this function and elsewhere check whether SLP is possible.
>
> I'm also not sure why:
>
>   if (!first && !oprnd_info->first_pattern
>   /* Allow different pattern state for the defs of the
>  first stmt in reduction chains.  */
>   && (oprnd_info->first_dt != vect_reduction_def
>
> is necessary.  All that should matter is that the statements in the
> node are "similar enough".  It turned out to be quite hard to find a
> convincing example that used a mixture of pattern and non-pattern
> statements, so bb-slp-pow-1.c is the best I could come up with.
> But it does show that the combination of "xi * xi" statements and
> "pow (xj, 2) -> xj * xj" patterns are handled correctly.
>
> The patch therefore just removes the whole if block.
>
> The loop also needed commutative swapping to be extended to at least
> AVG_FLOOR.
>
> This gives +3.9% on 525.x264_r at -O3.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

Always nice to see seemingly dead code go.  OK if you can still
build SPEC with this change and pass a test run.

At least I _do_ seem to remember having "issues" in this area...

Thanks,
Richard.

> Richard
>
>
> 2018-07-20  Richard Sandiford  
>
> gcc/
> * internal-fn.h (first_commutative_argument): Declare.
> * internal-fn.c (first_commutative_argument): New function.
> * tree-vect-slp.c (vect_get_and_check_slp_defs): Remove extra
> restrictions for pattern statements.  Use first_commutative_argument
> to look for commutative operands in calls to internal functions.
>
> gcc/testsuite/
> * gcc.dg/vect/bb-slp-over-widen-1.c: Expect AVG_FLOOR to be used
> on vect_avg_qi targets.
> * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
> * gcc.dg/vect/bb-slp-pow-1.c: New test.
> * gcc.dg/vect/vect-avg-15.c: Likewise.
>
> Index: gcc/internal-fn.h
> ===
> --- gcc/internal-fn.h   2018-07-13 10:11:14.009847140 +0100
> +++ gcc/internal-fn.h   2018-07-20 11:18:58.167047743 +0100
> @@ -201,6 +201,8 @@ direct_internal_fn_supported_p (internal
>  opt_type);
>  }
>
> +extern int first_commutative_argument (internal_fn);
> +
>  extern bool set_edom_supported_p (void);
>
>  extern internal_fn get_conditional_internal_fn (tree_code);
> Index: gcc/internal-fn.c
> ===
> --- gcc/internal-fn.c   2018-07-13 10:11:14.009847140 +0100
> +++ gcc/internal-fn.c   2018-07-20 11:18:58.163047778 +0100
> @@ -3183,6 +3183,42 @@ direct_internal_fn_supported_p (internal
>return direct_internal_fn_supported_p (fn, tree_pair (type, type), 
> opt_type);
>  }
>
> +/* If FN is commutative in two consecutive arguments, return the
> +   index of the first, otherwise return -1.  */
> +
> +int
> +first_commutative_argument (internal_fn fn)
> +{
> +  switch (fn)
> +{
> +case IFN_FMA:
> +case IFN_FMS:
> +case IFN_FNMA:
> +case IFN_FNMS:
> +case IFN_AVG_FLOOR:
> +case IFN_AVG_CEIL:
> +case IFN_FMIN:
> +case IFN_FMAX:
> +  return 0;
> +
> +case IFN_COND_ADD:
> +case IFN_COND_MUL:
> +case IFN_COND_MIN:
> +case IFN_COND_MAX:
> +case IFN_COND_AND:
> +case IFN_COND_IOR:
> +case IFN_COND_XOR:
> +case IFN_COND_FMA:
> +case IFN_COND_FMS:
> +case IFN_COND_FNMA:
> +case IFN_COND_FNMS:
> +  return 1;
> +
> +default:
> +  return -1;
> +}
> +}
> +
>  /* Return true if IFN_SET_EDOM is supported.  */
>
>  bool
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2018-07-13 10:11:15.113837768 +0100
> +++ gcc/tree-vect-slp.c 2018-07-2

Fold pointer range checks with equal spans

2018-07-20 Thread Richard Sandiford

When checking whether vectorised accesses at A and B are independent,
the vectoriser falls back to tests of the form:

A + size <= B || B + size <= A

But in the common case that "size" is just the constant size of a vector
(or a small multiple), it would be more efficient to do:

   ((size_t) A - (size_t) B + size - 1) > (size - 1) * 2

This patch adds folds to do that.  E.g. before the patch, the alias
checks for:

  for (int j = 0; j < n; ++j)
{
  for (int i = 0; i < 16; ++i)
a[i] = (b[i] + c[i]) >> 1;
  a += step;
  b += step;
  c += step;
}

were:

add x7, x1, 15
add x5, x0, 15
cmp x0, x7
add x7, x2, 15
ccmpx1, x5, 2, ls
csetw8, hi
cmp x0, x7
ccmpx2, x5, 2, ls
csetw4, hi
tst w8, w4

while after the patch they're:

sub x7, x0, x1
sub x5, x0, x2
add x7, x7, 15
add x5, x5, 15
cmp x7, 30
ccmpx5, 30, 0, hi

The old scheme needs:

[A] one addition per vector pointer
[B] two comparisons and one IOR per range check

The new one needs:

[C] one subtraction, one addition and one comparison per range check

The range checks are then ANDed together, with the same number of
ANDs either way.

With conditional comparisons (such as on AArch64), we're able to remove
the IOR between comparisons in the old scheme, but then need an explicit
AND or branch when combining the range checks, as the example above shows.
With the new scheme we can instead use conditional comparisons for
the AND chain.

So even with conditional comparisons, the new scheme should in practice
be a win in almost all cases.  Without conditional comparisons, the new
scheme removes [A] and replaces [B] with an equivalent number of operations
[C], so should always give fewer operations overall.  Although each [C]
is linear, multiple [C]s are easily parallelisable.

A better implementation of the above would be:

add x5, x0, 15
sub x7, x5, x1
sub x5, x5, x2
cmp x7, 30
ccmpx5, 30, 0, hi

where we add 15 to "a" before the subtraction.  Unfortunately,
canonicalisation rules mean that even if we try to create it in
that form, it gets folded into the one above instead.

An alternative would be not to do this in match.pd and instead get
tree-data-ref.c to do it itself.  I started out that way but thought
the match.pd approach seemed cleaner.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-07-20  Richard Sandiford  

gcc/
* match.pd: Optimise pointer range checks.

gcc/testsuite/
* gcc.dg/pointer-range-check-1.c: New test.
* gcc.dg/pointer-range-check-2.c: Likewise.

Index: gcc/match.pd
===
--- gcc/match.pd2018-07-18 18:44:22.565914281 +0100
+++ gcc/match.pd2018-07-20 11:24:33.692045585 +0100
@@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(if (inverse_conditions_p (@0, @2)
 && element_precision (type) == element_precision (op_type))
 (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))
+
+/* For pointers @0 and @2 and unsigned constant offset @1, look for
+   expressions like:
+
+   A: (@0 + @1 < @2) | (@2 + @1 < @0)
+   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
+
+   If pointers are known not to wrap, B checks whether @1 bytes starting at
+   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
+   A is more efficiently tested as:
+
+   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
+
+   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
+   with @1 - 1.  */
+(for ior (truth_orif truth_or bit_ior)
+ (for cmp (le lt)
+  (simplify
+   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
+   (cmp (pointer_plus:s @2 @1) @0))
+   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
+/* Convert the B form to the A form.  */
+(with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); }
+ /* Always fails for negative values.  */
+ (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
+  /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
+tree_swap_operands_p pick a canonical order.  */
+  (with { tree ptr1 = @0, ptr2 = @2;
+ if (tree_swap_operands_p (ptr1, ptr2))
+   std::swap (ptr1, ptr2); }
+   (gt (plus (minus (convert:sizetype { ptr1; })
+   (convert:sizetype { ptr2; }))
+{ wide_int_to_tree (sizetype, off); })
+  { wide_int_to_tree (sizetype, off * 2); }
Index: gcc/testsuite/gcc.dg/pointer-range-check-1.c
===
--- /dev/null   2018-07-09 14:52:09.234750850 +0100
+++ gcc/testsuite/gcc.dg/pointer-range-check-1.c

Re: Optimization for std::to_string()

2018-07-20 Thread niXman


Jonathan Wakely 2018-07-20 13:05:

On 20/07/18 12:44 +0300, niXman wrote:



Thanks. How did you verify it's an optimization?

Optimization is that there is no superfluous copying into string.


The to_string functions always pass at least __n=16 to __to_xstring,
which is larger than the SSO buffer in std::__cxx11::string, and so
forces a memory allocation.

I didn't think about this...


--
Regards, niXman
___
C++ for Bitcoins: github.com/niXman

Re: [PATCH 1/2] condition_variable: Report early wakeup of wait_until as no_timeout

2018-07-20 Thread Jonathan Wakely


On 10/07/18 11:09 +0100, Mike Crowe wrote:

As currently implemented, condition_variable always ultimately waits
against std::chrono::system_clock. This clock can be changed in arbitrary
ways by the user which may result in us waking up too early or too late
when measured against the caller-supplied clock.

We can't (yet) do much about waking up too late[1], but
if we wake up too early we must return cv_status::no_timeout to indicate a
spurious wakeup rather than incorrectly returning cv_status::timeout.


The patch looks good, thanks.

Can we come up with a test for this, using a user-defined clock that
jumps back?



[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861
---
libstdc++-v3/ChangeLog  | 5 +
libstdc++-v3/include/std/condition_variable | 8 +++-
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index cceef0271ae..ea7875ace9f 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2018-07-09  Mike Crowe 
+   * include/std/condition_variable (wait_until): Only report timeout
+   if we really have timed out when measured against the
+   caller-supplied clock.
+
2018-07-06  François Dumont  

   * include/debug/functions.h (__gnu_debug::__check_string): Move...
diff --git a/libstdc++-v3/include/std/condition_variable 
b/libstdc++-v3/include/std/condition_variable
index 84863a162d6..a2d146a9b09 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -116,7 +116,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const auto __delta = __atime - __c_entry;
   const auto __s_atime = __s_entry + __delta;

-   return __wait_until_impl(__lock, __s_atime);
+   // We might get a timeout when measured against __clock_t but
+   // we need to check against the caller-supplied clock to tell
+   // whether we should return a timeout.
+   if (__wait_until_impl(__lock, __s_atime) == cv_status::timeout)
+ return _Clock::now() < __atime ? cv_status::no_timeout : 
cv_status::timeout;
+   else
+ return cv_status::no_timeout;
  }

template
--
2.11.0

BrightSign considers your privacy to be very important. The emails you send to 
us will be protected and secured. Furthermore, we will only use your email and 
contact information for the reasons you sent them to us and for tracking how 
effectively we respond to your requests.

Re: [PATCH] Prototype of hook for possible list of option values.

2018-07-20 Thread Martin Liška

On 07/20/2018 12:25 PM, Richard Earnshaw (lists) wrote:
> On 20/07/18 11:14, Martin Liška wrote:
>> On 07/20/2018 11:48 AM, Richard Earnshaw (lists) wrote:
>>> On 20/07/18 09:04, Martin Liška wrote:
 Hi.

 I'm sending patch candidate with suggested target common hook. It allows a 
 target
 to list all possible values for an option. Using the API, I implemented 
 -march and
 -mtune option listing on i386.

 Richard you asked about the values. Yes, target should list all possible 
 values,
 mainly because --help=target output needs all of these.

 Thoughts?
 Martin

>>>
>>> I don't think anyone can reasonably write an implementation of this hook
>>> based on this specification:
>>>
>>> +@deftypefn {Common Target Hook} {vec}
>>> TARGET_GET_VALID_OPTION_VALUES (int)
>>> +The hook is used for options that have a non-trivial list of possible
>>> option values.
>>> +@end deftypefn
>>> +
>>>
>>> What's the int parameter for?  What's the lifetime of the result (who
>>> cleans it up)?  If I need to allocation memory strings in the vector,
>>> where do I do that?  Can I assume GC memory in the driver, for example?
>>
>> Sure, can be improved, please take it as prototype.
>>
>>>
>>> Frankly though, I don't really want to enumerate every possible
>>> permutation of the options for the architecture like this, though.  It's
>>> just too brute force and the answer is likely to be hundreds (haven't
>>
>> That's why I recommended you to list in --help=target content of
>> arm_arch enum. You replied that it's not complete list of all possible 
>> values.
>> Note that we are talking about content of --help option, it's not a 
>> documentation,
>> it should just help users. Similar to bash completion, it should not be 100% 
>> perfect.
>>
>>> sat down to count it).  What's more, the extensions might have meaning
>>> in the order in which they appear.  So, for example,
>>>
>>> -march=armv8-a+crypto+nosimd
>>>
>>> would be very different from
>>>
>>> -march=armv8-a+nosimd+crypto
>>>
>>> since the extensions are applied from left to right (the first collapses
>>> to armv8-a+nosimd, the latter to armv8-a+crypto, but there are more
>>> complex cases as well which I don't want to dig into here).
>>>
>>> It would be a practical impossibility to list all of these.
>>
>> Yes, that's why I recommended to list only base march values. Modifiers can 
>> be mentioned
>> aside if desired.
> 
> So it might be feasible to print something like:
> 
>   arch1[+ext1|+ext2]*
>   arch2[+ext1|+ext3|...]*

This is not feasible for --completion= option (bash completion).

Just for sure, are we talking only about aarch64 feature modifiers. Or do you 
have any other
special suffixes used in -march, -mtune, -mcpu option values?

What about listing all possible modifiers after possible values for -march?

Martin


> 
> etc.  That at least would be a concise summary of the options.  Whether
> or not automated tools could handle that is another matter.



> 
> R.
> 
>>
>> Martin
>>
>>>
>>> R.
>>>

 0001-Prototype-of-hook-for-possible-list-of-option-values.patch


 From b2b40f7ca1f801a318aec661d0128a5adde7cb68 Mon Sep 17 00:00:00 2001
 From: marxin 
 Date: Fri, 20 Jul 2018 09:58:16 +0200
 Subject: [PATCH] Prototype of hook for possible list of option values.

 ---
  gcc/common/common-target.def |   7 +
  gcc/common/common-targhooks.c|   7 +
  gcc/common/common-targhooks.h|   1 +
  gcc/common/config/i386/i386-common.c | 264 +
  gcc/config/i386/i386.c   | 413 +++
  gcc/config/i386/i386.h   | 144 ++
  gcc/doc/tm.texi  |   4 +
  gcc/doc/tm.texi.in   |   2 +
  gcc/opt-suggestions.c|  21 +-
  gcc/opts.c   |  33 +++
  10 files changed, 524 insertions(+), 372 deletions(-)

 diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
 index e0afbc6af29..8c98598b015 100644
 --- a/gcc/common/common-target.def
 +++ b/gcc/common/common-target.def
 @@ -80,6 +80,13 @@ DEFHOOK
   bool, (bool report, struct gcc_options *opts),
   hook_bool_bool_gcc_optionsp_false)
  
 +DEFHOOK
 +(get_valid_option_values,
 +"The hook is used for options that have a non-trivial list of\
 + possible option values.",
 + vec, (int),
 + default_get_valid_option_values)
 +
  /* Leave the boolean fields at the end.  */
  
  /* True if unwinding tables should be generated by default.  */
 diff --git a/gcc/common/common-targhooks.c b/gcc/common/common-targhooks.c
 index b1090190664..3662180f2e0 100644
 --- a/gcc/common/common-targhooks.c
 +++ b/gcc/common/common-targhooks.c
 @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "tm.h"
  #includ

Re: [PATCH] Merge Ignore and Deprecated in .opt files.

2018-07-20 Thread Martin Liška

On 07/19/2018 04:21 PM, Jakub Jelinek wrote:
> On Thu, Jul 19, 2018 at 04:16:10PM +0200, Martin Liška wrote:
>> I must admit that was my intention :) In my eyes it makes it more consistent 
>> and
>> it gives consumers feedback about usage of an option that does nothing.
>> For x86_64 there's list of options that are Ignore and don't produce a 
>> warning:
>>
>> Wimport
>> Wunreachable-code
>> Wunsafe-loop-optimizations
> 
> I'm especially worried about the above ones, we don't emit any warnings if
> we do
> -Wno-foobarbazqux
> unless some other diagnostic is emitted.
> Do we warn if we do
> -Wno-unsafe-loop-optimizations
> ?  At least for -Wno-* Ignored options it would be nice to treat them 
> similarly
> to the non-existed -Wno-* options.

Agree with that. Fixed in attached version of the patch.
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Martin

> 
>   Jakub
> 

>From 2698b0af3fbd7c0f0372513dc00673932c48b6cb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 18 Jul 2018 13:40:24 +0200
Subject: [PATCH] Merge Ignore and Deprecated in .opt files.

gcc/ChangeLog:

2018-07-18  Martin Liska  

	* common.opt: Remove Warn, Init and Report for options with
Ignore flag. Warning is done automatically.  Options with Deprecated
now use Ignore flag.
	* config/i386/i386.opt: Use Ignore instead of Deprecated.
	* config/ia64/ia64.opt: Remove Warn for Ignore flags.
	* config/rs6000/rs6000.opt: Likewise.
	* cppbuiltin.c (define_builtin_macros_for_compilation_flags):
Remove usage of flag_check_pointer_bounds.
	* lto-wrapper.c (merge_and_complain): Do not handle
OPT_fcheck_pointer_bounds.
	(append_compiler_options): Likewise.
	* opt-functions.awk: Do not handle Deprecated.
	* optc-gen.awk: Check that Var, Report and Init are not
used for an option with Ignore flag.
	* opts-common.c (decode_cmdline_option): Do not report
CL_ERR_DEPRECATED.
	(read_cmdline_option): Remove warning for OPT_SPECIAL_ignore
options.
	* opts.h (struct cl_option): Remove cl_deprecated flag.
	(CL_ERR_DEPRECATED): Remove error enum value.
	* doc/options.texi: Remove entry for Deprecated and move
it to Ignore.

gcc/testsuite/ChangeLog:

2018-07-18  Martin Liska  

	* g++.dg/opt/mpx.C: Fix scanned pattern.
	* gcc.target/i386/mpx.c: Likewise.
	* g++.dg/warn/Wunreachable-code-1.C: Remove.
	* g++.dg/warn/Wunreachable-code-2.C: Likewise.
	* gcc.dg/torture/pr52969.c: Likewise.
	* g++.dg/warn/pr31246-2.C: Likewise.
	* g++.dg/warn/pr31246.C: Likewise.
	* gcc.dg/pr33092.c: Likewise.
	* g++.dg/opt/eh1.C: Remove a deprecated option.
	* g++.dg/template/inline1.C: Likewise.
	* g++.dg/tree-ssa/pr81408.C: Likewise.
	* gcc.dg/pr41837.c: Likewise.
	* gcc.dg/pr41841.c: Likewise.
	* gcc.dg/pr42250.c: Likewise.
	* gcc.dg/pr43084.c: Likewise.
	* gcc.dg/pr43317.c: Likewise.
	* gcc.dg/pr51879-18.c: Likewise.
	* gcc.dg/torture/pr36066.c: Likewise.
	* gcc.dg/tree-ssa/ifc-8.c: Likewise.
	* gcc.dg/tree-ssa/ifc-cd.c: Likewise.
	* gcc.dg/tree-ssa/pr19210-1.c: Likewise.
	* gcc.dg/tree-ssa/pr45122.c: Likewise.
	* gcc.target/i386/pr45352-2.c: Likewise.
	* gcc.target/i386/zee.c: Likewise.
	* gfortran.dg/auto_char_len_2.f90: Likewise.
	* gfortran.dg/auto_char_len_4.f90: Likewise.
	* gfortran.dg/c_ptr_tests_15.f90: Likewise.
	* gfortran.dg/char_array_structure_constructor.f90: Likewise.
	* gfortran.dg/gomp/pr47331.f90: Likewise.
	* gfortran.dg/pr40999.f: Likewise.
	* gfortran.dg/pr41011.f: Likewise.
	* gfortran.dg/pr42051.f03: Likewise.
	* gfortran.dg/pr46804.f90: Likewise.
	* gfortran.dg/pr83149_1.f90: Likewise.
	* gfortran.dg/pr83149_b.f90: Likewise.
	* gfortran.dg/whole_file_1.f90: Likewise.
	* gfortran.dg/whole_file_10.f90: Likewise.
	* gfortran.dg/whole_file_11.f90: Likewise.
	* gfortran.dg/whole_file_12.f90: Likewise.
	* gfortran.dg/whole_file_13.f90: Likewise.
	* gfortran.dg/whole_file_14.f90: Likewise.
	* gfortran.dg/whole_file_15.f90: Likewise.
	* gfortran.dg/whole_file_16.f90: Likewise.
	* gfortran.dg/whole_file_17.f90: Likewise.
	* gfortran.dg/whole_file_18.f90: Likewise.
	* gfortran.dg/whole_file_19.f90: Likewise.
	* gfortran.dg/whole_file_2.f90: Likewise.
	* gfortran.dg/whole_file_20.f03: Likewise.
	* gfortran.dg/whole_file_3.f90: Likewise.
	* gfortran.dg/whole_file_4.f90: Likewise.
	* gfortran.dg/whole_file_5.f90: Likewise.
	* gfortran.dg/whole_file_6.f90: Likewise.
	* gfortran.dg/whole_file_7.f90: Likewise.
	* gfortran.dg/whole_file_8.f90: Likewise.
	* gfortran.dg/whole_file_9.f90: Likewise.
	* gcc.dg/vect/vect.exp: Likewise.

gcc/c-family/ChangeLog:

2018-07-18  Martin Liska  

	* c.opt: Remove Warn, Init and Report for options with
Ignore flag. Warning is done automatically.  Options with Deprecated
now use Ignore flag.
---
 gcc/c-family/c.opt| 96 +--
 gcc/common.opt| 10 +-
 gcc/config/i386/i386.opt  |  2 +-
 gcc/config/ia64/ia64.opt  |  4 +-
 gcc

Make the vectoriser drop to strided accesses for stores with gaps

2018-07-20 Thread Richard Sandiford

We could vectorise:

 for (...)
   {
 a[0] = ...;
 a[1] = ...;
 a[2] = ...;
 a[3] = ...;
 a += stride;
   }

(including the case when stride == 8) but not:

 for (...)
   {
 a[0] = ...;
 a[1] = ...;
 a[2] = ...;
 a[3] = ...;
 a += 8;
   }

(where the stride is always 8).  The former was treated as a "grouped
and strided" store, while the latter was treated as grouped store with
gaps, which we don't support.

This patch makes us treat groups of stores with gaps at the end as
strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P
and all vector uses of DR_STEP to see whether there were any hard-baked
assumptions, but couldn't see any.  I wondered whether we should relax:

  /* We do not have to consider dependences between accesses that belong
 to the same group, unless the stride could be smaller than the
 group size.  */
  if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
  && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
  == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))
  && !STMT_VINFO_STRIDED_P (stmtinfo_a))
return false;

for cases in which the step is constant and the absolute step is known
to be greater than the group size, but data dependence analysis should
already return chrec_known for those cases.

The new test is a version of vect-avg-15.c with the variable step
replaced by a constant one.

A natural follow-on would be to do the same for groups with gaps in
the middle:

  /* Check that the distance between two accesses is equal to the type
 size. Otherwise, we have gaps.  */
  diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
  - TREE_INT_CST_LOW (prev_init)) / type_size;
  if (diff != 1)
{
  [...]
  if (DR_IS_WRITE (data_ref))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "interleaved store with gaps\n");
  return false;
}

But I think we should do that separately and see what the fallout
from this change is first.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-07-20  Richard Sandiford  

gcc/
* tree-vect-data-refs.c (vect_analyze_group_access_1): Convert
grouped stores with gaps to a strided group.

gcc/testsuite/
* gcc.dg/vect/vect-avg-16.c: New test.
* gcc.dg/vect/slp-37.c: Expect the loop to be vectorized.
* gcc.dg/vect/vect-strided-u8-i8-gap4.c,
* gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Likewise for
the second loop in main1.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2018-06-30 13:44:38.567611988 +0100
+++ gcc/tree-vect-data-refs.c   2018-07-20 11:55:22.570911497 +0100
@@ -2632,10 +2632,14 @@ vect_analyze_group_access_1 (struct data
   if (groupsize != count
  && !DR_IS_READ (dr))
 {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"interleaved store with gaps\n");
- return false;
+ groupsize = count;
+ STMT_VINFO_STRIDED_P (stmt_info) = true;
+ stmt_vec_info next_info = stmt_info;
+ while (DR_GROUP_NEXT_ELEMENT (next_info))
+   {
+ next_info = vinfo_for_stmt (DR_GROUP_NEXT_ELEMENT (next_info));
+ STMT_VINFO_STRIDED_P (next_info) = 1;
+   }
}
 
   /* If there is a gap after the last load in the group it is the
@@ -2651,6 +2655,8 @@ vect_analyze_group_access_1 (struct data
   "Detected interleaving ");
  if (DR_IS_READ (dr))
dump_printf (MSG_NOTE, "load ");
+ else if (STMT_VINFO_STRIDED_P (stmt_info))
+   dump_printf (MSG_NOTE, "strided store ");
  else
dump_printf (MSG_NOTE, "store ");
  dump_printf (MSG_NOTE, "of size %u starting with ",
Index: gcc/testsuite/gcc.dg/vect/vect-avg-16.c
===
--- /dev/null   2018-07-09 14:52:09.234750850 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-avg-16.c 2018-07-20 11:55:22.570911497 
+0100
@@ -0,0 +1,52 @@
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+#define N 80
+
+void __attribute__ ((noipa))
+f (signed char *restrict a, signed char *restrict b,
+   signed char *restrict c, int n)
+{
+  for (int j = 0; j < n; ++j)
+{
+  for (int i = 0; i < 16; ++i)
+   a[i] = (b[i] + c[i]) >> 1;
+  a += 20;
+  b += 20;
+  c += 20;
+}
+}
+
+#define BASE1 -126
+#define BASE2 -42
+
+signed char a[N], b[N], c[N];
+
+int
+main (void)
+{
+  check_vect

Re: [PATCH] Prototype of hook for possible list of option values.

2018-07-20 Thread Richard Earnshaw (lists)

On 20/07/18 11:54, Martin Liška wrote:
> On 07/20/2018 12:25 PM, Richard Earnshaw (lists) wrote:
>> On 20/07/18 11:14, Martin Liška wrote:
>>> On 07/20/2018 11:48 AM, Richard Earnshaw (lists) wrote:
 On 20/07/18 09:04, Martin Liška wrote:
> Hi.
>
> I'm sending patch candidate with suggested target common hook. It allows 
> a target
> to list all possible values for an option. Using the API, I implemented 
> -march and
> -mtune option listing on i386.
>
> Richard you asked about the values. Yes, target should list all possible 
> values,
> mainly because --help=target output needs all of these.
>
> Thoughts?
> Martin
>

 I don't think anyone can reasonably write an implementation of this hook
 based on this specification:

 +@deftypefn {Common Target Hook} {vec}
 TARGET_GET_VALID_OPTION_VALUES (int)
 +The hook is used for options that have a non-trivial list of possible
 option values.
 +@end deftypefn
 +

 What's the int parameter for?  What's the lifetime of the result (who
 cleans it up)?  If I need to allocation memory strings in the vector,
 where do I do that?  Can I assume GC memory in the driver, for example?
>>>
>>> Sure, can be improved, please take it as prototype.
>>>

 Frankly though, I don't really want to enumerate every possible
 permutation of the options for the architecture like this, though.  It's
 just too brute force and the answer is likely to be hundreds (haven't
>>>
>>> That's why I recommended you to list in --help=target content of
>>> arm_arch enum. You replied that it's not complete list of all possible 
>>> values.
>>> Note that we are talking about content of --help option, it's not a 
>>> documentation,
>>> it should just help users. Similar to bash completion, it should not be 
>>> 100% perfect.
>>>
 sat down to count it).  What's more, the extensions might have meaning
 in the order in which they appear.  So, for example,

-march=armv8-a+crypto+nosimd

 would be very different from

-march=armv8-a+nosimd+crypto

 since the extensions are applied from left to right (the first collapses
 to armv8-a+nosimd, the latter to armv8-a+crypto, but there are more
 complex cases as well which I don't want to dig into here).

 It would be a practical impossibility to list all of these.
>>>
>>> Yes, that's why I recommended to list only base march values. Modifiers can 
>>> be mentioned
>>> aside if desired.
>>
>> So it might be feasible to print something like:
>>
>>   arch1[+ext1|+ext2]*
>>   arch2[+ext1|+ext3|...]*
> 
> This is not feasible for --completion= option (bash completion).
> 
> Just for sure, are we talking only about aarch64 feature modifiers. Or do you 
> have any other
> special suffixes used in -march, -mtune, -mcpu option values?
> 
> What about listing all possible modifiers after possible values for -march?
> 

Modifiers are context dependent.  The architecture implies which
modifiers can be applied (and what they mean in detail, so, for example,
+fp means enable the default floating point variant for this
architecture).  Not all modifiers apply to all architectures - +fp is
not permitted on ARMv4t, for example.

R.

> Martin
> 
> 
>>
>> etc.  That at least would be a concise summary of the options.  Whether
>> or not automated tools could handle that is another matter.
> 
> 
> 
>>
>> R.
>>
>>>
>>> Martin
>>>

 R.

>
> 0001-Prototype-of-hook-for-possible-list-of-option-values.patch
>
>
> From b2b40f7ca1f801a318aec661d0128a5adde7cb68 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 20 Jul 2018 09:58:16 +0200
> Subject: [PATCH] Prototype of hook for possible list of option values.
>
> ---
>  gcc/common/common-target.def |   7 +
>  gcc/common/common-targhooks.c|   7 +
>  gcc/common/common-targhooks.h|   1 +
>  gcc/common/config/i386/i386-common.c | 264 +
>  gcc/config/i386/i386.c   | 413 +++
>  gcc/config/i386/i386.h   | 144 ++
>  gcc/doc/tm.texi  |   4 +
>  gcc/doc/tm.texi.in   |   2 +
>  gcc/opt-suggestions.c|  21 +-
>  gcc/opts.c   |  33 +++
>  10 files changed, 524 insertions(+), 372 deletions(-)
>
> diff --git a/gcc/common/common-target.def b/gcc/common/common-target.def
> index e0afbc6af29..8c98598b015 100644
> --- a/gcc/common/common-target.def
> +++ b/gcc/common/common-target.def
> @@ -80,6 +80,13 @@ DEFHOOK
>   bool, (bool report, struct gcc_options *opts),
>   hook_bool_bool_gcc_optionsp_false)
>  
> +DEFHOOK
> +(get_valid_option_values,
> +"The hook is used for options that have a non-trivial list of\
> + possible option values.",
> + vec,

RE: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-20 Thread Tamar Christina

Hi Jeff,

> -Original Message-
> From: Jeff Law 
> Sent: Thursday, July 19, 2018 18:32
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ;
> jos...@codesourcery.com; bonz...@gnu.org; d...@redhat.com;
> nero...@gcc.gnu.org; aol...@redhat.com; ralf.wildenh...@gmx.de
> Subject: Re: [PATCH][GCC][front-end][build-machinery][opt-framework]
> Allow setting of stack-clash via configure options. [Patch (4/6)]
> 
> On 07/19/2018 06:55 AM, Tamar Christina wrote:
> >>>
> >>> What's the purpose of including auto-host in params-list and
> >>> params-options?  It seems like you're putting a property of the
> >>> target (guard size) into the wrong place (auto-host.h).
> >>>
> >>
> >> The reason for this is because there's a test
> >> gcc.dg/params/blocksort-part.c that uses these params-options to
> >> generate test cases to perform parameter validation. However because
> >> now the params.def file can contain a CPP macro these would then fail.
> >>
> >> CPP is already called to create params-options and params-list so the
> >> easiest way to fix this test was just to include auto-host which
> >> would get it the values from configure.
> >>
> >> This test is probably not needed anymore after my second patch series
> >> as parameters are validated by the front-end now, so they can never
> >> go out of range.
> Right, but I don't immediately see a way to avoid the test.  ie, it just walks
> down everything in params.options and except for a couple exceptional
> values the test gets run.
> 
> I wonder if all this is an indication that having CPP constants in the options
> isn't going to work well as we're mixing the distinction between host/target.
> 
> 
> >>
> >>> It's also a bit unclear to me why this is necessary at all.  Are we
> >>> planning to support both the 4k and 64k guards?  My goal (once the
> >>> guard was configurable) was never for supporting multiple sizes on a
> >>> target but instead to allow experimentation to find the right default.
> >>>
> >
> > Having talked to people I believe we do need to support both 4k and 64k
> guards.
> > For the Linux/Glibc world it wouldn't matter much, either 4 or 64k would do,
> though Glibc has settled on 64k pages.
> >
> > However other systems like (open/free)BSD or musl based systems do not
> > want 64k pages but want 4k ones.  So we're ending up having to support
> both as a compromise.
> Understood.  Thanks for verifying.  I wonder if we could just bury this 
> entirely
> in the aarch64 config files and not expose the default into params.def?
> 

Burying it in config.gcc isn't ideal because if your C runtime is configurable 
(like uClibc) it means you
have to go and modify this file every time you change something. If the 
argument is 
against having these defines in the params and not the configure flag itself 
then I can just
have an aarch64 specific configure flag and just use the created define 
directly in the AArch64 back-end.

Would that be an acceptable solution?

Thanks,
Tamar

> jeff

Re: [PATCH] Prototype of hook for possible list of option values.

2018-07-20 Thread Martin Liška

On 07/20/2018 12:58 PM, Richard Earnshaw (lists) wrote:
> Modifiers are context dependent.  The architecture implies which
> modifiers can be applied (and what they mean in detail, so, for example,
> +fp means enable the default floating point variant for this
> architecture).  Not all modifiers apply to all architectures - +fp is
> not permitted on ARMv4t, for example.

I see, that said I would really add just the target hook suggested. It will
help people in bash completions and provide reasonable list in --help output.

More complex approaches are possible, but don't worth in my opinion.
If you agree, I can clean up the API and improve documentatio?

Martin

Re: Make the vectoriser drop to strided accesses for stores with gaps

2018-07-20 Thread Richard Biener

On Fri, Jul 20, 2018 at 12:57 PM Richard Sandiford
 wrote:
>
> We could vectorise:
>
>  for (...)
>{
>  a[0] = ...;
>  a[1] = ...;
>  a[2] = ...;
>  a[3] = ...;
>  a += stride;
>}
>
> (including the case when stride == 8) but not:
>
>  for (...)
>{
>  a[0] = ...;
>  a[1] = ...;
>  a[2] = ...;
>  a[3] = ...;
>  a += 8;
>}
>
> (where the stride is always 8).  The former was treated as a "grouped
> and strided" store, while the latter was treated as grouped store with
> gaps, which we don't support.
>
> This patch makes us treat groups of stores with gaps at the end as
> strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P
> and all vector uses of DR_STEP to see whether there were any hard-baked
> assumptions, but couldn't see any.  I wondered whether we should relax:
>
>   /* We do not have to consider dependences between accesses that belong
>  to the same group, unless the stride could be smaller than the
>  group size.  */
>   if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
>   && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
>   == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))
>   && !STMT_VINFO_STRIDED_P (stmtinfo_a))
> return false;
>
> for cases in which the step is constant and the absolute step is known
> to be greater than the group size, but data dependence analysis should
> already return chrec_known for those cases.
>
> The new test is a version of vect-avg-15.c with the variable step
> replaced by a constant one.
>
> A natural follow-on would be to do the same for groups with gaps in
> the middle:
>
>   /* Check that the distance between two accesses is equal to the type
>  size. Otherwise, we have gaps.  */
>   diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
>   - TREE_INT_CST_LOW (prev_init)) / type_size;
>   if (diff != 1)
> {
>   [...]
>   if (DR_IS_WRITE (data_ref))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "interleaved store with gaps\n");
>   return false;
> }
>
> But I think we should do that separately and see what the fallout
> from this change is first.

Agreed.

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

Do you need to set STMT_VINFO_STRIDED_P on all stmts?  I think
it is enough to set it on the first group element.

OK otherwise.
Thanks,
Richard.

> Richard
>
>
> 2018-07-20  Richard Sandiford  
>
> gcc/
> * tree-vect-data-refs.c (vect_analyze_group_access_1): Convert
> grouped stores with gaps to a strided group.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-avg-16.c: New test.
> * gcc.dg/vect/slp-37.c: Expect the loop to be vectorized.
> * gcc.dg/vect/vect-strided-u8-i8-gap4.c,
> * gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Likewise for
> the second loop in main1.
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2018-06-30 13:44:38.567611988 +0100
> +++ gcc/tree-vect-data-refs.c   2018-07-20 11:55:22.570911497 +0100
> @@ -2632,10 +2632,14 @@ vect_analyze_group_access_1 (struct data
>if (groupsize != count
>   && !DR_IS_READ (dr))
>  {
> - if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"interleaved store with gaps\n");
> - return false;
> + groupsize = count;
> + STMT_VINFO_STRIDED_P (stmt_info) = true;
> + stmt_vec_info next_info = stmt_info;
> + while (DR_GROUP_NEXT_ELEMENT (next_info))
> +   {
> + next_info = vinfo_for_stmt (DR_GROUP_NEXT_ELEMENT (next_info));
> + STMT_VINFO_STRIDED_P (next_info) = 1;
> +   }
> }
>
>/* If there is a gap after the last load in the group it is the
> @@ -2651,6 +2655,8 @@ vect_analyze_group_access_1 (struct data
>"Detected interleaving ");
>   if (DR_IS_READ (dr))
> dump_printf (MSG_NOTE, "load ");
> + else if (STMT_VINFO_STRIDED_P (stmt_info))
> +   dump_printf (MSG_NOTE, "strided store ");
>   else
> dump_printf (MSG_NOTE, "store ");
>   dump_printf (MSG_NOTE, "of size %u starting with ",
> Index: gcc/testsuite/gcc.dg/vect/vect-avg-16.c
> ===
> --- /dev/null   2018-07-09 14:52:09.234750850 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-avg-16.c 2018-07-20 11:55:22.570911497 
> +0100
> @@ -0,0 +1,52 @@
> +/* { dg-additional-options "-O3" } */
> +/* { dg-require-effective-target vect_int

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-20 Thread Wilco Dijkstra

Steve Ellcey wrote:

> Yes, I see where I missed this in aarch64_push_regs
> and aarch64_pop_regs.  I think that is why the second of
> Wilco's two examples (f2) is wrong.  I am unclear about
> exactly what is meant by writeback and why we have it and
> how that and callee_adjust are used.  Any chance someone
> could help me understand this part of the prologue/epilogue
> code better?  The comments in aarch64.c/aarch64.h aren't
> really helping me understand what the code is doing or
> why it is doing it.

Writeback is the same as a base update in a load or store. When
creating the frame there are 3 stack adjustments to be made:
creating stack for locals, pushing callee-saves and reserving space
for outgoing arguments. We merge these stack adjustments as much as
possible and use load/store with writeback for codesize and performance.
See the last part in layout_frame for the different cases.

In many common cases the frame is small and there are no outgoing
arguments, so we emit an STP with writeback to store the first 2 callee saves
and create th full frame in a single instruction. In this case callee_adjust 
will
be the frame size and initial_adjust will be zero.

push_regs and pop_regs need to be passed a mode since layout_frame
will use STP with writeback of floating point callee-saves if there are no 
integer
callee-saves. Note if there is only 1 or odd number of callee-save it may use
LDR/STR with writeback, so we need to support TFmode for these too.

Wilco

[PATCH] Fix PR86585

2018-07-20 Thread Richard Biener



I have committed the following patch to make debug refs prevail
in LTO tree merging.  This will fix mixing -g0 and -g units as
well as cases where we fail to emit early debug for some decls
in one unit but emit them for the copy in a second like for
the testcase.  So for the testcase a FE fix would be possible
as well to not lose debug info.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2018-07-20  Richard Biener  

PR debug/86585
* dwarf2out.c (dwarf2out_die_ref_for_decl): Test in_lto_p
to cover -flto-partition=none.

lto/
* lto.c (unify_scc): Before we throw away an SCC see if we
can amend prevailing single-entry SCC with debug refs.

* g++.dg/lto/pr86585_0.C: New testcase.
* g++.dg/lto/pr86585_1.C: Likewise.

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index bd45e0b0685..8377cbc5dd1 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -5851,8 +5851,7 @@ dwarf2out_die_ref_for_decl (tree decl, const char **sym,
 {
   dw_die_ref die;
 
-  if ((flag_wpa || flag_incremental_link == INCREMENTAL_LINK_LTO)
-  && !decl_die_table)
+  if (in_lto_p && !decl_die_table)
 return false;
 
   if (TREE_CODE (decl) == BLOCK)
@@ -5865,8 +5864,7 @@ dwarf2out_die_ref_for_decl (tree decl, const char **sym,
   /* During WPA stage and incremental linking we currently use DIEs
  to store the decl <-> label + offset map.  That's quite inefficient
  but it works for now.  */
-  if (flag_wpa
-  || flag_incremental_link == INCREMENTAL_LINK_LTO)
+  if (in_lto_p)
 {
   dw_die_ref ref = get_AT_ref (die, DW_AT_abstract_origin);
   if (!ref)
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index d1add15efeb..8db280ecefc 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -1638,6 +1638,21 @@ unify_scc (struct data_in *data_in, unsigned from,
 to the tree node mapping computed by compare_tree_sccs.  */
  if (len == 1)
{
+ /* If we got a debug reference queued, see if the prevailing
+tree has a debug reference and if not, register the one
+for the tree we are about to throw away.  */
+ if (dref_queue.length () == 1)
+   {
+ dref_entry e = dref_queue.pop ();
+ gcc_assert (e.decl
+ == streamer_tree_cache_get_tree (cache, from));
+ const char *sym;
+ unsigned HOST_WIDE_INT off;
+ if (!debug_hooks->die_ref_for_decl (pscc->entries[0], &sym,
+ &off))
+   debug_hooks->register_external_die (pscc->entries[0],
+   e.sym, e.off);
+   }
  lto_maybe_register_decl (data_in, pscc->entries[0], from);
  streamer_tree_cache_replace_tree (cache, pscc->entries[0], from);
}
@@ -1669,7 +1684,9 @@ unify_scc (struct data_in *data_in, unsigned from,
  free_node (scc->entries[i]);
}
 
- /* Drop DIE references.  */
+ /* Drop DIE references.
+???  Do as in the size-one SCC case which involves sorting
+the queue.  */
  dref_queue.truncate (0);
 
  break;
diff --git a/gcc/testsuite/g++.dg/lto/pr86585_0.C 
b/gcc/testsuite/g++.dg/lto/pr86585_0.C
new file mode 100644
index 000..2c3ae9da414
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr86585_0.C
@@ -0,0 +1,18 @@
+// { dg-lto-do link }
+// { dg-require-effective-target fpic }
+// { dg-require-effective-target shared }
+// { dg-lto-options { { -flto -g -nostdlib -shared -fPIC } } }
+namespace Inkscape {
+class a;
+}
+class b {
+Inkscape::a *c;
+virtual void d();
+};
+class e {
+b f;
+};
+class g : e {
+void h();
+};
+void g::h() {}
diff --git a/gcc/testsuite/g++.dg/lto/pr86585_1.C 
b/gcc/testsuite/g++.dg/lto/pr86585_1.C
new file mode 100644
index 000..ebcbe126768
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr86585_1.C
@@ -0,0 +1,24 @@
+struct a {
+struct b {
+   b();
+} c;
+};
+class d {
+a e;
+};
+namespace aa {
+class h {};
+} // namespace aa
+class k {
+typedef aa::h f;
+f g;
+};
+namespace Inkscape {
+class l {
+   k i;
+class : d {
+   } j;
+   l();
+};
+l::l() {}
+} // namespace Inkscape

Re: Make the vectoriser drop to strided accesses for stores with gaps

2018-07-20 Thread Richard Sandiford

Richard Biener  writes:
> On Fri, Jul 20, 2018 at 12:57 PM Richard Sandiford
>  wrote:
>>
>> We could vectorise:
>>
>>  for (...)
>>{
>>  a[0] = ...;
>>  a[1] = ...;
>>  a[2] = ...;
>>  a[3] = ...;
>>  a += stride;
>>}
>>
>> (including the case when stride == 8) but not:
>>
>>  for (...)
>>{
>>  a[0] = ...;
>>  a[1] = ...;
>>  a[2] = ...;
>>  a[3] = ...;
>>  a += 8;
>>}
>>
>> (where the stride is always 8).  The former was treated as a "grouped
>> and strided" store, while the latter was treated as grouped store with
>> gaps, which we don't support.
>>
>> This patch makes us treat groups of stores with gaps at the end as
>> strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P
>> and all vector uses of DR_STEP to see whether there were any hard-baked
>> assumptions, but couldn't see any.  I wondered whether we should relax:
>>
>>   /* We do not have to consider dependences between accesses that belong
>>  to the same group, unless the stride could be smaller than the
>>  group size.  */
>>   if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
>>   && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
>>   == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))
>>   && !STMT_VINFO_STRIDED_P (stmtinfo_a))
>> return false;
>>
>> for cases in which the step is constant and the absolute step is known
>> to be greater than the group size, but data dependence analysis should
>> already return chrec_known for those cases.
>>
>> The new test is a version of vect-avg-15.c with the variable step
>> replaced by a constant one.
>>
>> A natural follow-on would be to do the same for groups with gaps in
>> the middle:
>>
>>   /* Check that the distance between two accesses is equal to the 
>> type
>>  size. Otherwise, we have gaps.  */
>>   diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
>>   - TREE_INT_CST_LOW (prev_init)) / type_size;
>>   if (diff != 1)
>> {
>>   [...]
>>   if (DR_IS_WRITE (data_ref))
>> {
>>   if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>  "interleaved store with gaps\n");
>>   return false;
>> }
>>
>> But I think we should do that separately and see what the fallout
>> from this change is first.
>
> Agreed.
>
>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
>> and x86_64-linux-gnu.  OK to install?
>
> Do you need to set STMT_VINFO_STRIDED_P on all stmts?  I think
> it is enough to set it on the first group element.

get_load_store_type tests STMT_VINFO_STRIDED_P on the stmt it's
given rather than the first element, but I guess that's my bug...

I'm hoping this weekend to finally finish off the series to get
rid of vinfo_for_stmt, which will make it slightly easier to do this.

Thanks,
Richard

> OK otherwise.
> Thanks,
> Richard.
>
>> Richard
>>
>>
>> 2018-07-20  Richard Sandiford  
>>
>> gcc/
>> * tree-vect-data-refs.c (vect_analyze_group_access_1): Convert
>> grouped stores with gaps to a strided group.
>>
>> gcc/testsuite/
>> * gcc.dg/vect/vect-avg-16.c: New test.
>> * gcc.dg/vect/slp-37.c: Expect the loop to be vectorized.
>> * gcc.dg/vect/vect-strided-u8-i8-gap4.c,
>> * gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Likewise for
>> the second loop in main1.
>>
>> Index: gcc/tree-vect-data-refs.c
>> ===
>> --- gcc/tree-vect-data-refs.c   2018-06-30 13:44:38.567611988 +0100
>> +++ gcc/tree-vect-data-refs.c   2018-07-20 11:55:22.570911497 +0100
>> @@ -2632,10 +2632,14 @@ vect_analyze_group_access_1 (struct data
>>if (groupsize != count
>>   && !DR_IS_READ (dr))
>>  {
>> - if (dump_enabled_p ())
>> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> -"interleaved store with gaps\n");
>> - return false;
>> + groupsize = count;
>> + STMT_VINFO_STRIDED_P (stmt_info) = true;
>> + stmt_vec_info next_info = stmt_info;
>> + while (DR_GROUP_NEXT_ELEMENT (next_info))
>> +   {
>> + next_info = vinfo_for_stmt (DR_GROUP_NEXT_ELEMENT (next_info));
>> + STMT_VINFO_STRIDED_P (next_info) = 1;
>> +   }
>> }
>>
>>/* If there is a gap after the last load in the group it is the
>> @@ -2651,6 +2655,8 @@ vect_analyze_group_access_1 (struct data
>>"Detected interleaving ");
>>   if (DR_IS_READ (dr))
>> dump_printf (MSG_NOTE, "load ");
>> + else if (STMT_VINFO_STRIDED_P (stmt_info))
>> +   dump_printf (MSG_NOTE, "strided store ");
>>   else
>> dump_printf (MSG_

Re: [PATCH] Explicitly mark _S_ti() as default visibility to work around clang -fvisibility-inlines-hidden bug

2018-07-20 Thread Jonathan Wakely


On 19/07/18 16:58 -0700, Fāng-ruì Sòng via libstdc++ wrote:

clang (including trunk and many older versions) incorrectly marks static local 
variables (__tag) hidden when -fvisibility-inlines-hidden is used.

% cat b.cc
#include 
std::shared_ptr foo(int x) {
return std::make_shared(x);
}
% g++-8 -fvisibility-inlines-hidden -fno-rtti -c b.cc
% readelf -s b.o | grep _S_ti
 163:  1 OBJECT  UNIQUE DEFAULT   67 
_ZZNSt19_Sp_make_shared_tag5_S_tiEvE5__tag
 164:  8 FUNCWEAK   HIDDEN68 
_ZNSt19_Sp_make_shared_tag5_S_tiEv
% ~/Dev/llvm/static-release/bin/clang++ -fvisibility-inlines-hidden -fno-rtti 
-c b.cc
% readelf -s b.o | grep _S_ti
 129: 16 FUNCWEAK   HIDDEN34 
_ZNSt19_Sp_make_shared_tag5_S_tiEv
 155:  1 OBJECT  WEAK   HIDDEN   202 
_ZZNSt19_Sp_make_shared_tag5_S_tiEvE5__tag

This can lead to multiple instances of __tag when shares objects are used.
The function
virtual void* std::_Sp_counted_ptr_inplace::_M_get_deleter(const 
std::type_info& __ti) noexcept
may return nullptr and causes std::make_shared() to return nullptr 
(-fvisibility-inlines-hidden -fno-rtti).

After applying this patch (tagging _S_ti() with default visibility to override 
-fvisibility-inlines-hidden)

% readelf -s b.o | grep _S_ti
 129: 16 FUNCWEAK   DEFAULT   34 
_ZNSt19_Sp_make_shared_tag5_S_tiEv
 155:  1 OBJECT  WEAK   DEFAULT  202 
_ZZNSt19_Sp_make_shared_tag5_S_tiEvE5__tag


This issue caused 10+ check-all tests of a -DUSE_SHARED_LLVM=On build of llvm 
(compiled with clang trunk) to SIGSEGV (because std::make_shared returned 
nullptr) and this patch fixes it.


  * include/bits/shared_ptr_base.h (_S_ti): Use
  _GLIBCXX_VISIBILITY(default)



Thanks, I've tested this and committed it to trunk.

I think we want this on gcc-8-branch too, but it's too late for the
8.2 release now.



--
宋方睿



From 6da8cec298766ce043d9c6dcda7b87142228dafb Mon Sep 17 00:00:00 2001
From: Fangrui Song 
Date: Thu, 19 Jul 2018 16:40:26 -0700
Subject: [PATCH 1/1] Explicitly mark _S_ti() as default visibility to work
around clang -fvisibility-inlines-hidden bug

clang (including trunk and many older versions) incorrectly marks static
local variables (__tag) hidden when -fvisibility-inlines-hidden is used.
This can lead to multiple instances of __tag when shares objects are used.

   * include/bits/shared_ptr_base.h (_S_ti): Use
   _GLIBCXX_VISIBILITY(default)
---
libstdc++-v3/include/bits/shared_ptr_base.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h 
b/libstdc++-v3/include/bits/shared_ptr_base.h
index f3994da158f..870aeb9bfda 100644
--- a/libstdc++-v3/include/bits/shared_ptr_base.h
+++ b/libstdc++-v3/include/bits/shared_ptr_base.h
@@ -508,7 +508,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  friend class _Sp_counted_ptr_inplace;

static const type_info&
-_S_ti() noexcept
+_S_ti() noexcept _GLIBCXX_VISIBILITY(default)
{
  alignas(type_info) static constexpr _Sp_make_shared_tag __tag;
  return reinterpret_cast(__tag);
--
2.18.0

[PATCH] PR libstdc++/86603 Move __cpp_lib_list_remove_return_type macro

2018-07-20 Thread Jonathan Wakely


This should only be defined for C++2a not C++17.

PR libstdc++/86603
* include/std/version: Move __cpp_lib_list_remove_return_type macro.

Tested powerpc64le-linux, committed to trunk.


commit c457458395791eff165618b7bd04f6a71d99188d
Author: Jonathan Wakely 
Date:   Fri Jul 20 11:15:50 2018 +0100

PR libstdc++/86603 Move __cpp_lib_list_remove_return_type macro

This should only be defined for C++2a not C++17.

PR libstdc++/86603
* include/std/version: Move __cpp_lib_list_remove_return_type macro.

diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index a70c73fd12b..0c26d9b6a7b 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -105,7 +105,6 @@
 #define __cpp_lib_is_swappable 201603
 #define __cpp_lib_launder 201606
 #define __cpp_lib_lcm 201606
-#define __cpp_lib_list_remove_return_type 201806L
 #define __cpp_lib_logical_traits 201510
 #define __cpp_lib_make_from_tuple 201606
 #define __cpp_lib_map_insertion 201411
@@ -127,6 +126,7 @@
 
 #if __cplusplus > 201703L
 // c++2a
+#define __cpp_lib_list_remove_return_type 201806L
 #endif // C++2a
 #endif // C++17
 #endif // C++14

Re: [PATCH 1/2] condition_variable: Report early wakeup of wait_until as no_timeout

2018-07-20 Thread Mike Crowe

On Friday 20 July 2018 at 11:53:38 +0100, Jonathan Wakely wrote:
> On 10/07/18 11:09 +0100, Mike Crowe wrote:
> > As currently implemented, condition_variable always ultimately waits
> > against std::chrono::system_clock. This clock can be changed in arbitrary
> > ways by the user which may result in us waking up too early or too late
> > when measured against the caller-supplied clock.
> > 
> > We can't (yet) do much about waking up too late[1], but
> > if we wake up too early we must return cv_status::no_timeout to indicate a
> > spurious wakeup rather than incorrectly returning cv_status::timeout.
> 
> The patch looks good, thanks.
> 
> Can we come up with a test for this, using a user-defined clock that
> jumps back?

That's a good idea. I'll do that.

Thanks.

Mike.

Re: [PATCH] restore -Warray-bounds for string literals (PR 83776)

2018-07-20 Thread Rainer Orth

Hi Martin,

> On 07/19/2018 03:51 PM, Jeff Law wrote:
>> On 07/13/2018 05:45 PM, Martin Sebor wrote:

 +  offset_int ofr[] = {
 +   wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)),
 +   wi::to_offset (fold_convert (ptrdiff_type_node, vr->max))
 +  };

 huh.  Do you maybe want to use widest_int for ofr[]?  What's
 wrong with wi::to_offset (vr->min)?  Building another intermediate
 tree node here looks just bogus.
>>>
>>> I need to convert size_type indices to signed to keep their sign
>>> if it's negative and include it in warnings.  I've moved the code
>>> into a conditional where it's used to minimize the cost but other
>>> than that I don't know how else to convert it.
>>>

 What are you trying to do in this loop anyways?
>>>
>>> The loop It determines the range of the final index/offset for
>>> a series of POINTER_PLUS_EXPRs.  It handles cases like this:
>>>
>>>   int g (int i, int j, int k)
>>>   {
>>> if (i < 1) i = 1;
>>> if (j < 1) j = 1;
>>> if (k < 1) k = 1;
>>>
>>> const char *p0 = "123";
>>> const char *p1 = p0 + i;
>>> const char *p2 = p1 + j;
>>> const char *p3 = p2 + k;
>>>
>>> // p2[3] and p3[1] are out of bounds
>>> return p0[4] + p1[3] + p2[2] + p3[1];
>>>   }
>>>
 I suppose
 look at

   p_1 = &obj[i_2];  // already bounds-checked, but with ignore_off_by_one
   ... = MEM[p_1 + CST];

 ?  But then

 +  if (TREE_CODE (varoff) != SSA_NAME)
 +   break;

 you should at least handle INTEGER_CSTs here?
>>>
>>> It's already handled (and set in CSTOFF).  There should be no
>>> more constant offsets after that (I haven't come across any.)
>>>

 +  if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max)
 +   break;

 please use positive tests here, VR_RANGE || VR_ANTI_RANGE.  As usual
 the anti-range handling looks odd.  Please add comments so we can follow
 what you were thinking when writing range merging code.  Even better if you

 can stick to use existing code rather than always re-inventing the wheel...

>>>
>>> The anti-range handling is to conservatively add
>>> [-MAXOBJSIZE -1, MAXOBJSIZE] to OFFRANGE.  I've added comments
>>> to make it clear.  I'd be more than happy to reuse existing
>>> code if I knew where to find it (if it exists).  It sure would
>>> save me lots of work to have a library of utility functions to
>>> call instead of rolling my own code each time.
>> Finding stuff is never easy :(  GCC's just gotten so big it's virtually
>> impossible for anyone to know all the APIs.
>>
>> The suggestion I'd have would be to (when possible) factor this stuff
>> into routines you can reuse.  We (as a whole) have this tendency to
>> open-code all kinds of things rather than factoring the code into
>> reusable routines.
>>
>> In addition to increasing the probability that you'll be able to reuse
>> code later, just reading a new function's header tends to make me (as a
>> reviewer) internally ask if there's already a routine we should be using
>> instead.  When it's open-coded it's significantly harder to spot those
>> cases (at least for me).
>>
>>
>>>

 I think I commented on earlier variants but this doesn't seem to resemble
 any of them.
>>>
>>> I've reworked the patch (sorry) to also handle arrays.  For GCC
>>> 9 it seems I might as well do both in one go.
>>>
>>> Attached is an updated patch with these changes.
>>>
>>> Martin
>>>
>>> gcc-83776.diff
>>>
>>>
>>> PR tree-optimization/84047 - missing -Warray-bounds on an out-of-bounds
>>> index into an array
>>> PR tree-optimization/83776 - missing -Warray-bounds indexing past the
>>> end of a string literal
>>>
>>> gcc/ChangeLog:
>>>
>>> PR tree-optimization/84047
>>> PR tree-optimization/83776
>>> * tree-vrp.c (vrp_prop::check_mem_ref): New function.
>>> (check_array_bounds): Call it.
>>> * /gcc/tree-sra.c (get_access_for_expr): Fail for out-of-bounds
>>> array offsets.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR tree-optimization/83776
>>> PR tree-optimization/84047
>>> * gcc.dg/Warray-bounds-29.c: New test.
>>> * gcc.dg/Warray-bounds-30.c: New test.
>>> * gcc.dg/Warray-bounds-31.c: New test.
>>> * gcc.dg/Warray-bounds-32.c: New test.
>>>
>>
>>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>>> index 3e30f6b..8221a06 100644
>>> --- a/gcc/tree-sra.c
>>> +++ b/gcc/tree-sra.c
>>> @@ -3110,6 +3110,19 @@ get_access_for_expr (tree expr)
>>>|| !DECL_P (base))
>>>  return NULL;
>>>
>>> +  /* Fail for out-of-bounds array offsets.  */
>>> +  tree basetype = TREE_TYPE (base);
>>> +  if (TREE_CODE (basetype) == ARRAY_TYPE)
>>> +{
>>> +  if (offset < 0)
>>> +   return NULL;
>>> +
>>> +  if (tree size = DECL_SIZE (base))
>>> +   if (tree_fits_uhwi_p (size)
>>> +   && tree_to_uhwi (size) < (unsigned HOST_WIDE_INT) offset)
>>> + return NULL;
>>> +}

Re: [PATCH 1/2] condition_variable: Report early wakeup of wait_until as no_timeout

2018-07-20 Thread Jonathan Wakely

On 20/07/18 12:30 +0100, Mike Crowe wrote:

On Friday 20 July 2018 at 11:53:38 +0100, Jonathan Wakely wrote:

On 10/07/18 11:09 +0100, Mike Crowe wrote:
> As currently implemented, condition_variable always ultimately waits
> against std::chrono::system_clock. This clock can be changed in arbitrary
> ways by the user which may result in us waking up too early or too late
> when measured against the caller-supplied clock.
>
> We can't (yet) do much about waking up too late[1], but
> if we wake up too early we must return cv_status::no_timeout to indicate a
> spurious wakeup rather than incorrectly returning cv_status::timeout.

The patch looks good, thanks.

Can we come up with a test for this, using a user-defined clock that
jumps back?

That's a good idea. I'll do that.

Thanks.

There are "broken" clocks in testsuite/30_threads/this_thread/60421.cc
and testsuite/30_threads/timed_mutex/try_lock_until/57641.cc but I'm
not sure if either of them is reusable. A custom one for this test
should be reasonably easy.

[PATCH] PR libstdc++/86595 add missing noexcept

2018-07-20 Thread Jonathan Wakely


PR libstdc++/86595
* include/bits/fs_dir.h (directory_entry::refresh(error_code&)): Add
noexcept.

Tested powerpc64le-linux, committed to trunk.


commit c857adf35f9849ddda1148c65813aba86acb5c6a
Author: Jonathan Wakely 
Date:   Fri Jul 20 12:35:05 2018 +0100

PR libstdc++/86595 add missing noexcept

PR libstdc++/86595
* include/bits/fs_dir.h (directory_entry::refresh(error_code&)): Add
noexcept.

diff --git a/libstdc++-v3/include/bits/fs_dir.h 
b/libstdc++-v3/include/bits/fs_dir.h
index 6b332e171cf..cf7a3c29376 100644
--- a/libstdc++-v3/include/bits/fs_dir.h
+++ b/libstdc++-v3/include/bits/fs_dir.h
@@ -138,8 +138,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   refresh(__ec);
 }
 
-void refresh() { _M_type = symlink_status().type(); }
-void refresh(error_code& __ec) { _M_type = symlink_status(__ec).type(); }
+void
+refresh()
+{ _M_type = symlink_status().type(); }
+
+void
+refresh(error_code& __ec) noexcept
+{ _M_type = symlink_status(__ec).type(); }
 
 // observers
 const filesystem::path& path() const noexcept { return _M_path; }

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Umesh Kalappa

Reminder !!!

~Umesh


On Wed, Jul 18, 2018 at 6:01 PM, Umesh Kalappa  wrote:
> Hi Nagy/Ramana,
>
> Please help us to review the attached patch and do let me know your comments .
>
> No regress in the  gcc.target  suite for arm target.
>
> Thank you
> ~Umesh
>
> On Tue, Jul 17, 2018 at 4:01 PM, Umesh Kalappa  
> wrote:
>> Will do, thanks.
>> Thanks
>>
>> On Tue, Jul 17, 2018, 3:24 PM Ramana Radhakrishnan
>>  wrote:
>>>
>>> On Tue, Jul 17, 2018 at 10:41 AM, Umesh Kalappa
>>>  wrote:
>>> > Hi Nagy,
>>> >
>>> > Please  help us with your comments on the attached patch for the issue
>>> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86512)
>>> >
>>> > Thank you and waiting for your inputs on the same.
>>>
>>>
>>> Patches should be sent to gcc-patches@gcc.gnu.org with a clear
>>> description of what the patch hopes to
>>> achieve and why this is correct, how was it tested and if a regression
>>> test needs to be added - add one please.
>>> Please read https://gcc.gnu.org/contribute.html before sending a patch.
>>>
>>> This is the wrong list to send patches to.
>>>
>>> regards
>>> Ramana
>>> > ~Umesh
>>> >
>>> > On Fri, Jul 13, 2018 at 1:22 PM, Umesh Kalappa
>>> >  wrote:
>>> >> Thank you and issue  raised at
>>> >> gcc-patches@gcc.gnu.org
>>> >>
>>> >> ~Umesh
>>> >>
>>> >> On Thu, Jul 12, 2018 at 9:33 PM, Szabolcs Nagy 
>>> >> wrote:
>>> >>> On 12/07/18 16:20, Umesh Kalappa wrote:
>>> 
>>>  Hi everyone,
>>> 
>>>  we have our source base ,that was compiled for armv7 on gcc8.1 with
>>>  soft-float and for following input
>>> 
>>>  a=0x0010
>>>  b=0x0001
>>> 
>>>    result = a - b ;
>>> 
>>>  we are getting the result as "0x000e" and with
>>>  -mhard-float (disabled the flush to zero mode ) we are getting the
>>>  result as ""0x000f" as expected.
>>> 
>>> >>>
>>> >>> please submit it as a bug report to bugzilla
>>> >>>
>>> >>>
>>>  while debugging the soft-float code,we see that ,the compiler calls
>>>  the intrinsic "__aeabi_dsub" with arm calling conventions i.e passing
>>>  "a" in r0 and r1 registers and respectively for "b".
>>> 
>>>  we are investigating the routine "__aeabi_dsub" that comes from
>>>  libgcc
>>>  for incorrect result  and meanwhile we would like to know that
>>> 
>>>  a)do libgcc routines/intrinsic for float operations support or
>>>  consider the subnormal values ? ,if so how we can enable the same.
>>> 
>>>  Thank you
>>>  ~Umesh
>>> 
>>> >>>

Re: Fold pointer range checks with equal spans

2018-07-20 Thread Marc Glisse


On Fri, 20 Jul 2018, Richard Sandiford wrote:


--- gcc/match.pd2018-07-18 18:44:22.565914281 +0100
+++ gcc/match.pd2018-07-20 11:24:33.692045585 +0100
@@ -4924,3 +4924,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (inverse_conditions_p (@0, @2)
&& element_precision (type) == element_precision (op_type))
(view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))
+
+/* For pointers @0 and @2 and unsigned constant offset @1, look for
+   expressions like:
+
+   A: (@0 + @1 < @2) | (@2 + @1 < @0)
+   B: (@0 + @1 <= @2) | (@2 + @1 <= @0)
+
+   If pointers are known not to wrap, B checks whether @1 bytes starting at
+   @0 and @2 do not overlap, while A tests the same thing for @1 + 1 bytes.
+   A is more efficiently tested as:
+
+   ((sizetype) @0 - (sizetype) @2 + @1) > (@1 * 2)
+
+   as long as @1 * 2 doesn't overflow.  B is the same with @1 replaced
+   with @1 - 1.  */
+(for ior (truth_orif truth_or bit_ior)
+ (for cmp (le lt)
+  (simplify
+   (ior (cmp (pointer_plus:s @0 INTEGER_CST@1) @2)
+   (cmp (pointer_plus:s @2 @1) @0))


Do you want :c on cmp, in case it appears as @2 > @0 + @1 ? (may need some 
care with "cmp == LE_EXPR" below)

Do you want :s on cmp as well?


+   (if (!flag_trapv && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))


Don't you want TYPE_OVERFLOW_UNDEFINED?


+/* Convert the B form to the A form.  */
+(with { offset_int off = wi::to_offset (@1) - (cmp == LE_EXPR ? 1 : 0); }
+ /* Always fails for negative values.  */
+ (if (wi::min_precision (off, UNSIGNED) * 2 <= TYPE_PRECISION (sizetype))
+  /* It doesn't matter whether we use @2 - @0 or @0 - @2, so let
+tree_swap_operands_p pick a canonical order.  */
+  (with { tree ptr1 = @0, ptr2 = @2;
+ if (tree_swap_operands_p (ptr1, ptr2))
+   std::swap (ptr1, ptr2); }
+   (gt (plus (minus (convert:sizetype { ptr1; })
+   (convert:sizetype { ptr2; }))
+{ wide_int_to_tree (sizetype, off); })
+  { wide_int_to_tree (sizetype, off * 2); }


--
Marc Glisse

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Szabolcs Nagy


On 18/07/18 13:31, Umesh Kalappa wrote:

Hi Nagy/Ramana,

Please help us to review the attached patch and do let me know your comments .

No regress in the  gcc.target  suite for arm target.



have you submitted a copyright assignment form yet?
https://gcc.gnu.org/contribute.html


Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 262850)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2018-07-18  Umesh Kalappa
+
+   PR libgcc/86512
+   * gcc.target/arm/pr86512.c :New test.
+


normally changelog should be part of the mail message and
not included in the diff as it will conflict with other changes
and the whitespace formatting around : and before the email
address is wrong.


Index: gcc/testsuite/gcc.target/arm/pr86512.c
===
--- gcc/testsuite/gcc.target/arm/pr86512.c  (nonexistent)
+++ gcc/testsuite/gcc.target/arm/pr86512.c  (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -msoft-float" } */
+
+#include
+#include
+#define PRIx64"llx"


this macro is not needed.


+
+typedef union
+{
+double d;
+uint64_t i;
+} u;
+
+int main()
+{
+  u smallestPositiveNormal, smallesPositiveSubnormal, expectedResult, result;
+
+  smallesPositiveSubnormal.i = 1;
+
+  smallestPositiveNormal.i = 0x0010;
+  expectedResult.i = 0x000f;
+  result.d = smallestPositiveNormal.d - smallesPositiveSubnormal.d;
+
+
+  if (result.i != expectedResult.i)
+   abort();
+   
+  return 0;
+}
+
Index: libgcc/ChangeLog
===
--- libgcc/ChangeLog(revision 262850)
+++ libgcc/ChangeLog(working copy)
@@ -1,3 +1,9 @@
+2018-07-18  Umesh Kalappa
+   
+   PR libgcc/86512
+   * config/arm/ieee754-df.S :Don't normalise the denormal result.
+


same as above.


--- libgcc/config/arm/ieee754-df.S  (revision 262850)
+++ libgcc/config/arm/ieee754-df.S  (working copy)
@@ -203,8 +203,11 @@
  #endif
  
  	@ Determine how to normalize the result.

+   @ if result is denormal i.e (exp)=0,then don't normalise the result,
  LSYM(Lad_p):
cmp xh, #0x0010
+   blt LSYM(Lad_e)
+   cmp xh, #0x0010
bcc LSYM(Lad_a)


i think you don't need to retest the same thing,
blt won't clobber the condition codes.

as for the technical contents, sorry i have not
reviewed it in detail, the blt might not be the
right check since it's signed compare..
(but i'm not even sure why is there an assembly
implementation for this, generic c code should do
a good job.)


cmp xh, #0x0020
bcc LSYM(Lad_e)

[PATCH] Don't create non zero terminated string constant

2018-07-20 Thread Bernd Edlinger

Hi!

This fixes a not NUL terminated STRING_CST object.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
2018-07-20  Bernd Edlinger  

* gimple-fold.c (gimple_fold_builtin_printf): Don't create a not NUL
	terminated STRING_CST object.


--- gimple-fold.c.jj	2018-07-20 11:59:42.384727747 +0200
+++ gimple-fold.c	2018-07-20 12:19:45.195342409 +0200
@@ -3433,23 +3433,13 @@ gimple_fold_builtin_printf (gimple_stmt_
 	  && (int) len > 0)
 	{
 	  char *newstr;
-	  tree offset_node, string_cst;
 
 	  /* Create a NUL-terminated string that's one char shorter
 		 than the original, stripping off the trailing '\n'.  */
-	  newarg = build_string_literal (len, str);
-	  string_cst = string_constant (newarg, &offset_node);
-	  gcc_checking_assert (string_cst
-   && (TREE_STRING_LENGTH (string_cst)
-   == (int) len)
-   && integer_zerop (offset_node)
-   && (unsigned char)
-  TREE_STRING_POINTER (string_cst)[len - 1]
-  == target_newline);
-	  /* build_string_literal creates a new STRING_CST,
-		 modify it in place to avoid double copying.  */
-	  newstr = CONST_CAST (char *, TREE_STRING_POINTER (string_cst));
+	  newstr = xstrdup (str);
 	  newstr[len - 1] = '\0';
+	  newarg = build_string_literal (len, newstr);
+	  free (newstr);
 	  if (fn_puts)
 		{
 		  gcall *repl = gimple_build_call (fn_puts, 1, newarg);

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Wilco Dijkstra

Hi Umesh,

Looking at your patch, this would break all results which need to be normalized.


Index: libgcc/config/arm/ieee754-df.S
===
--- libgcc/config/arm/ieee754-df.S  (revision 262850)
+++ libgcc/config/arm/ieee754-df.S  (working copy)
@@ -203,8 +203,11 @@
 #endif
 
@ Determine how to normalize the result.
+   @ if result is denormal i.e (exp)=0,then don't normalise the result,
 LSYM(Lad_p):
cmp xh, #0x0010
+   blt LSYM(Lad_e)
+   cmp xh, #0x0010
bcc LSYM(Lad_a)
cmp xh, #0x0020
bcc LSYM(Lad_e)

It seems Lad_a doesn't correctly handle the case where the result is a 
denormal. For this case
the result is correct so nothing else needs to be done. This requires an 
explicit test that the
exponent is zero - other cases still need to be renormalized as usual. This 
code looks overly
complex so any change will require extensive testing of all the corner cases.

Wilco

Re: Optimization for std::to_string()

2018-07-20 Thread Jonathan Wakely


On 20/07/18 13:51 +0300, niXman wrote:

Jonathan Wakely 2018-07-20 13:05:

On 20/07/18 12:44 +0300, niXman wrote:



Thanks. How did you verify it's an optimization?

Optimization is that there is no superfluous copying into string.


The to_string functions always pass at least __n=16 to __to_xstring,
which is larger than the SSO buffer in std::__cxx11::string, and so
forces a memory allocation.

I didn't think about this...


Avoiding an unconditional dynamic allocation is the main advantage of
using alloca.

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-20 Thread Bernd Edlinger

Martin,

when I look at tree-ssa-forwprop.c:

   str1 = string_constant (src1, &off1);
   if (str1 == NULL_TREE)
 break;
   if (!tree_fits_uhwi_p (off1)
   || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
   || compare_tree_int (len1, TREE_STRING_LENGTH (str1)
  - tree_to_uhwi (off1)) > 0
   || TREE_CODE (TREE_TYPE (str1)) != ARRAY_TYPE
   || TYPE_MODE (TREE_TYPE (TREE_TYPE (str1)))
  != TYPE_MODE (char_type_node))
 break;

I don't think it is a good idea to let string_constant return
strings which have not necessarily valid content after the first
NUL byte.  It looks like the content has to be valid up to
TREE_STRING_LENGTH.

So may I suggest to do something like the following
(diff to your last patch):

--- expr.c.jj   2018-07-20 10:51:51.983605588 +0200
+++ expr.c  2018-07-20 15:07:29.769423479 +0200
@@ -11277,6 +11277,7 @@ tree
  string_constant (tree arg, tree *ptr_offset)
  {
tree array;
+  tree array_size;
STRIP_NOPS (arg);
  
/* Non-constant index into the character array in an ARRAY_REF
@@ -11295,9 +11296,11 @@ string_constant (tree arg, tree *ptr_off
  
arg = TREE_OPERAND (arg, 0);
tree ref = arg;
+  tree reftype = TREE_TYPE (ref);
if (TREE_CODE (arg) == ARRAY_REF)
{
  tree idx = TREE_OPERAND (arg, 1);
+ reftype = TREE_TYPE (TREE_OPERAND (arg, 0));
  if (TREE_CODE (idx) != INTEGER_CST
  && TREE_CODE (argtype) == POINTER_TYPE)
{
@@ -11313,6 +11316,11 @@ string_constant (tree arg, tree *ptr_off
return NULL_TREE;
}
}
+  if (TREE_CODE (reftype) != ARRAY_TYPE)
+   return NULL_TREE;
+  while (TREE_CODE (TREE_TYPE (reftype)) == ARRAY_TYPE)
+   reftype = TREE_TYPE (reftype);
+  array_size = TYPE_SIZE_UNIT (reftype);
array = get_addr_base_and_unit_offset (ref, &base_off);
if (!array
  || (TREE_CODE (array) != VAR_DECL
@@ -11345,7 +11353,10 @@ string_constant (tree arg, tree *ptr_off
return NULL_TREE;
  }
else if (DECL_P (arg))
-array = arg;
+{
+  array = arg;
+  array_size = DECL_SIZE_UNIT (array);
+}
else
  return NULL_TREE;
  
@@ -11410,24 +11421,18 @@ string_constant (tree arg, tree *ptr_off
if (!init || TREE_CODE (init) != STRING_CST)
  return NULL_TREE;
  
-  tree array_size = DECL_SIZE_UNIT (array);
if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
  return NULL_TREE;
  
/* Avoid returning a string that doesn't fit in the array
- it is stored in, like
+ it is stored in, like:
   const char a[4] = "abcde";
- but do handle those that fit even if they have excess
- initializers, such as in
- const char a[4] = "abc\000\000";
- The excess elements contribute to TREE_STRING_LENGTH()
- but not to strlen().  */
-  unsigned HOST_WIDE_INT charsize
-= tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (init;
+ ... or:
+ const char a[4] = "abc\000";
+ ... because some callers access the string up to the limit
+ imposed by TREE_STRING_LENGTH.  */
unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
-  length = string_length (TREE_STRING_POINTER (init), charsize,
- length / charsize);
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (compare_tree_int (array_size, length) < 0)
  return NULL_TREE;
  
*ptr_offset = offset;


Considering Richard's last comment, I don't know if TYPE_SIZE_UNIT
of the ARRAY_TYPE is the correct way to get the size of the
innermost arrayhere, but I think we will need it.



Bernd.

[PATCH][Middle-end][version 2]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-20 Thread Qing Zhao

Hi,

this is the 2nd version of the change, mainly addressed Jakub’s comments:

1. Give up the inlining expansion for strcmp/strncmp/memcmp on a target
where the type of the call has same or narrower presicion than unsigned
char.
2.  add conversions before expand_simple_binop to the two operands.

and
3. also updated comments of routine inline_string_cmp to reflect the conversions
in the expanded code.

have tested on X86 and aarch64. No regressions.

Okay for thunk?

Qing

gcc/ChangeLog:

+2018-07-20  Qing Zhao  
+
+   * builtins.c (expand_builtin_memcmp): Delete the last parameter for
+   call to inline_expand_builtin_string_cmp.
+   (expand_builtin_strcmp): Likewise.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): Delete the last parameter, change char_type_node
+   to unsigned_char_type_node for strcmp/strncmp, add conversions to the
+   two operands.
+   (inline_expand_builtin_string_cmp): Delete the last parameter, give up
+   the inlining expansion on target where the type of the call has same or 
+   narrower presicion than unsigned char.
+



78809C_uchar.patch
Description: Binary data


> On Jul 19, 2018, at 12:31 PM, Jakub Jelinek  wrote:
> 
> If you expand it as (int) ((unsigned char *)p)[n] - (int) ((unsigned char 
> *)q)[n]
> then aren't you relying on int type to have wider precision than unsigned char
> (or unit_mode being narrower than mode)?  I don't see anywhere where you'd
> give up on doing the inline expansion on targets where e.g. lowest
> addressable unit would be 16-bit and int would be 16-bit too.
> On targets where int is as wide as char, one would need to expand it instead
> as something like:
> if (((unsigned char *)p)[n] == ((unsigned char *)q)[n]) loop;
> ret = ((unsigned char *)p)[n] < ((unsigned char *)q)[n] ? -1 : 1;
> or similar or just use the library routine.
> 
> Also:
>  var_rtx
>= adjust_address (var_rtx_array, TYPE_MODE (unit_type_node), offset);
>  const_rtx = c_readstr (const_str + offset, unit_mode);
>  rtx op0 = (const_str_n == 1) ? const_rtx : var_rtx;
>  rtx op1 = (const_str_n == 1) ? var_rtx : const_rtx;
> 
>  result = expand_simple_binop (mode, MINUS, op0, op1,
>result, is_memcmp ? 1 : 0, OPTAB_WIDEN);
> doesn't look correct to me, var_rtx and const_rtx here are in unit_mode,
> you need to convert those to mode before you can use those in
> expand_simple_binop, using
>  op0 = convert_modes (mode, unit_mode, op0, 1);
>  op1 = convert_modes (mode, unit_mode, op1, 1);
> before the expand_simple_binop.
> While expand_simple_binop is called with an unsignedp argument, that is
> meant for the cases where the expansion needs to widen it further, not for
> calling expand_simple_binop with arguments with known incorrect mode;
> furthermore, one of them being CONST_INT which has VOIDmode.

Re: [PATCH][Middle-end][version 2]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-20 Thread Jakub Jelinek

On Fri, Jul 20, 2018 at 09:53:24AM -0500, Qing Zhao wrote:
> +2018-07-20  Qing Zhao  
> +
> +   * builtins.c (expand_builtin_memcmp): Delete the last parameter for
> +   call to inline_expand_builtin_string_cmp.
> +   (expand_builtin_strcmp): Likewise.
> +   (expand_builtin_strncmp): Likewise.
> +   (inline_string_cmp): Delete the last parameter, change char_type_node
> +   to unsigned_char_type_node for strcmp/strncmp, add conversions to the
> +   two operands.
> +   (inline_expand_builtin_string_cmp): Delete the last parameter, give up
> +   the inlining expansion on target where the type of the call has same 
> or 
> +   narrower presicion than unsigned char.

s/presicion/precision/

Also in the patch, where there is another typo, s/of/or/.

Ok for trunk with that fixed.

Jakub

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-20 Thread Steve Ellcey

On Fri, 2018-07-20 at 11:11 +, Wilco Dijkstra wrote:

> Steve Ellcey wrote:
> 
> > Yes, I see where I missed this in aarch64_push_regs
> > and aarch64_pop_regs.  I think that is why the second of
> > Wilco's two examples (f2) is wrong.  I am unclear about
> > exactly what is meant by writeback and why we have it and
> > how that and callee_adjust are used.  Any chance someone
> > could help me understand this part of the prologue/epilogue
> > code better?  The comments in aarch64.c/aarch64.h aren't
> > really helping me understand what the code is doing or
> > why it is doing it.

> Writeback is the same as a base update in a load or store. When
> creating the frame there are 3 stack adjustments to be made:
> creating stack for locals, pushing callee-saves and reserving space
> for outgoing arguments. We merge these stack adjustments as much as
> possible and use load/store with writeback for codesize and performance.
> See the last part in layout_frame for the different cases.

OK, I think I understand this a bit better now.  I think my main
problem is with the  term 'writeback' which I am not used to seeing.
But if I understand things correctly we are saving one or two registers
and (possibly) updating the stack pointer using auto-increment/auto-
decrement in one instruction and that the updating of SP is what you
mean by 'writeback'.  Correct?

Steve Ellcey
sell...@cavium.com

Re: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-20 Thread Jeff Law

On 07/20/2018 05:03 AM, Tamar Christina wrote:
>> Understood.  Thanks for verifying.  I wonder if we could just bury this 
>> entirely
>> in the aarch64 config files and not expose the default into params.def?
>>
> 
> Burying it in config.gcc isn't ideal because if your C runtime is 
> configurable (like uClibc) it means you
> have to go and modify this file every time you change something. If the 
> argument is 
> against having these defines in the params and not the configure flag itself 
> then I can just
> have an aarch64 specific configure flag and just use the created define 
> directly in the AArch64 back-end.
Not config.gcc, but in a .h/.c file for the target.

If we leave the generic option, but override the default in the target
files.  Would that work?

Jeff

Re: [PATCH][Middle-end][version 2]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-20 Thread Qing Zhao



> On Jul 20, 2018, at 9:59 AM, Jakub Jelinek  wrote:
> 
> On Fri, Jul 20, 2018 at 09:53:24AM -0500, Qing Zhao wrote:
>> +2018-07-20  Qing Zhao  
>> +
>> +   * builtins.c (expand_builtin_memcmp): Delete the last parameter for
>> +   call to inline_expand_builtin_string_cmp.
>> +   (expand_builtin_strcmp): Likewise.
>> +   (expand_builtin_strncmp): Likewise.
>> +   (inline_string_cmp): Delete the last parameter, change char_type_node
>> +   to unsigned_char_type_node for strcmp/strncmp, add conversions to the
>> +   two operands.
>> +   (inline_expand_builtin_string_cmp): Delete the last parameter, give 
>> up
>> +   the inlining expansion on target where the type of the call has same 
>> or 
>> +   narrower presicion than unsigned char.
> 
> s/presicion/precision/
> 
> Also in the patch, where there is another typo, s/of/or/.

Okay.
> 
> Ok for trunk with that fixed.

thanks a lot for the review.

Qing
> 
>   Jakub

RE: [PATCH][GCC][front-end][build-machinery][opt-framework] Allow setting of stack-clash via configure options. [Patch (4/6)]

2018-07-20 Thread Tamar Christina

> 
> On 07/20/2018 05:03 AM, Tamar Christina wrote:
> >> Understood.  Thanks for verifying.  I wonder if we could just bury
> >> this entirely in the aarch64 config files and not expose the default into
> params.def?
> >>
> >
> > Burying it in config.gcc isn't ideal because if your C runtime is
> > configurable (like uClibc) it means you have to go and modify this
> > file every time you change something. If the argument is against
> > having these defines in the params and not the configure flag itself then I
> can just have an aarch64 specific configure flag and just use the created
> define directly in the AArch64 back-end.
> Not config.gcc, but in a .h/.c file for the target.
> 
> If we leave the generic option, but override the default in the target files.
> Would that work?

So leaving the generic configure option? Yes that would work too. The only 
downside is
that if we have want to do any validation on the value at configure time it 
would need to
be manually kept in sync with those in params.def. Or we'd just have to not do 
any checking
at configure time.  This would mean you can get to the end of your build and 
only when you
try to use the compiler would it complain. 

Both aren't a real deal breaker to me.

Shall I then just leave the configure flag but remove the params plumbing?

Thanks,
Tamar

> 
> Jeff

Re: [PATCH 2/2] Add "-fsave-optimization-record"

2018-07-20 Thread David Malcolm

On Thu, 2018-07-19 at 14:39 +0200, Richard Biener wrote:
> On Wed, Jul 11, 2018 at 12:53 PM David Malcolm 
> wrote:
> > 
> > This patch implements a -fsave-optimization-record option, which
> > leads to a JSON file being written out, recording the dump_* calls
> > made (via the optinfo infrastructure in the previous patch).
> > 
> > The patch includes a minimal version of the JSON patch I posted
> > last
> > year, with just enough support needed for optimization records (I
> > removed all of the parser code, leaving just the code for building
> > in-memory JSON trees and writing them to a pretty_printer).
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> 
> +@item -fsave-optimization-record
> +@opindex fsave-optimization-record
> +Write a SRCFILE.opt-record.json file detailing what optimizations
> +were performed.
> +
> 
> I guess this should note that it is experimental and in no way
> complete.  Maybe list areas where reports will be generated,
> like vectorization?

Indeed. I also noticed that for some reason my patch had this option
between
  @item -frounding-math
and
  @item -fsignaling-nans
in the section on floating-point arithmetic options, so I've moved it
to the "GCC Developer Options" section for now, immediately after the
documentation of -fopt-info.

Here's what I've committed to invoke.texi (r262905):

BEGIN QUOTE:

@item -fsave-optimization-record
@opindex fsave-optimization-record
Write a SRCFILE.opt-record.json file detailing what optimizations
were performed, for those optimizations that support @option{-fopt-info}.

This option is experimental and the format of the data within the JSON
file is subject to change.

It is roughly equivalent to a machine-readable version of
@option{-fopt-info-all}, as a collection of messages with source file,
line number and column number, with the following additional data for
each message:

@itemize @bullet

@item
the execution count of the code being optimized, along with metadata about
whether this was from actual profile data, or just an estimate, allowing
consumers to prioritize messages by code hotness,

@item
the function name of the code being optimized, where applicable,

@item
the ``inlining chain'' for the code being optimized, so that when
a function is inlined into several different places (which might
themselves be inlined), the reader can distinguish between the copies,

@item
objects identifying those parts of the message that refer to expressions,
statements or symbol-table nodes, which of these categories they are, and,
when available, their source code location,

@item
the GCC pass that emitted the message, and

@item
the location in GCC's own code from which the message was emitted

@end itemize

Additionally, some messages are logically nested within other
messages, reflecting implementation details of the optimization
passes.

END QUOTE


> Did you check what happens with -flto -fsave-optimization-record?
> Will the compile-phase emit a json file for each source (expected,
> like with early inlining decisions)?  Will the WPA phase emit one
> (IPA decisions?) or will IPA decisions be recorded in the LTRANS
> one?  How will the LTRANS ones be named and where can they
> be found?  You don't need to solve all the issues with this patch
> but they should be eventually addressed somehow.

I'm looking at that now.

> I don't question the use or implementation of JSON, I'll just
> approve it.
> 
> The rest looks obvious enough, thus OK.

> 
> Some overall blurb in the documentation or changes.html
> on how to use this would be nice of course.

Thanks; will do.

Dave

> Thanks,
> Richard.
> 
> 
> > gcc/ChangeLog:
> > * Makefile.in (OBJS): Add json.o and optinfo-emit-json.o.
> > (CFLAGS-optinfo-emit-json.o): Define TARGET_NAME.
> > * common.opt (fsave-optimization-record): New option.
> > * coretypes.h (struct kv_pair): Move here from dumpfile.c.
> > * doc/invoke.texi (-fsave-optimization-record): New option.
> > * dumpfile.c: Include "optinfo-emit-json.h".
> > (struct kv_pair): Move to coretypes.h.
> > (optgroup_options): Make non-static.
> > (dump_context::end_scope): Call
> > optimization_records_maybe_pop_dump_scope.
> > * dumpfile.h (optgroup_options): New decl.
> > * json.cc: New file.
> > * json.h: New file.
> > * optinfo-emit-json.cc: New file.
> > * optinfo-emit-json.h: New file.
> > * optinfo.cc: Include "optinfo-emit-json.h".
> > (optinfo::emit): Call
> > optimization_records_maybe_record_optinfo.
> > (optinfo_enabled_p): Check optimization_records_enabled_p.
> > (optinfo_wants_inlining_info_p): Likewise.
> > * optinfo.h: Update comment.
> > * profile-count.c (profile_quality_as_string): New
> > function.
> > * profile-count.h (profile_quality_as_string): New decl.
> > (profile_count::quality): New accessor

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Umesh Kalappa

Thank you all for your comments .

Wilco,
We tried some of the normalisation numbers and the fix works and please
could you help us with the input ,where  if you see that fix breaks down.

Thank you again
~Umesh
On Fri, Jul 20, 2018, 7:07 PM Wilco Dijkstra  wrote:

> Hi Umesh,
>
> Looking at your patch, this would break all results which need to be
> normalized.
>
>
> Index: libgcc/config/arm/ieee754-df.S
> ===
> --- libgcc/config/arm/ieee754-df.S  (revision 262850)
> +++ libgcc/config/arm/ieee754-df.S  (working copy)
> @@ -203,8 +203,11 @@
>  #endif
>
> @ Determine how to normalize the result.
> +   @ if result is denormal i.e (exp)=0,then don't normalise the
> result,
>  LSYM(Lad_p):
> cmp xh, #0x0010
> +   blt LSYM(Lad_e)
> +   cmp xh, #0x0010
> bcc LSYM(Lad_a)
> cmp xh, #0x0020
> bcc LSYM(Lad_e)
>
> It seems Lad_a doesn't correctly handle the case where the result is a
> denormal. For this case
> the result is correct so nothing else needs to be done. This requires an
> explicit test that the
> exponent is zero - other cases still need to be renormalized as usual.
> This code looks overly
> complex so any change will require extensive testing of all the corner
> cases.
>
> Wilco
>

[PATCH] -fsave-optimization-record: add contrib/optrecord.py

2018-07-20 Thread David Malcolm

This patch adds a Python 3 module to "contrib" for reading the output of
-fsave-optimization-record.

It can be imported from other Python code, or run standalone as a script,
in which case it prints the saved messages in a form resembling GCC
diagnostics.

OK for trunk?

contrib/ChangeLog:
* optrecord.py: New file.
---
 contrib/optrecord.py | 295 +++
 1 file changed, 295 insertions(+)
 create mode 100755 contrib/optrecord.py

diff --git a/contrib/optrecord.py b/contrib/optrecord.py
new file mode 100755
index 000..b07488e
--- /dev/null
+++ b/contrib/optrecord.py
@@ -0,0 +1,295 @@
+#!/usr/bin/env python3
+#
+# Python module for working with the result of -fsave-optimization-record
+# Contributed by David Malcolm .
+#
+# Copyright (C) 2018 Free Software Foundation, Inc.
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .  */
+
+import argparse
+import json
+import os
+import sys
+
+class TranslationUnit:
+"""Top-level class for containing optimization records"""
+@staticmethod
+def from_filename(filename):
+with open(filename) as f:
+root_obj = json.load(f)
+return TranslationUnit(filename, root_obj)
+
+def __init__(self, filename, json_obj):
+self.filename = filename
+self.pass_by_id = {}
+
+# Expect a 3-tuple
+metadata, passes, records = json_obj
+
+self.format = metadata['format']
+self.generator = metadata['generator']
+self.passes = [Pass(obj, self) for obj in passes]
+self.records = [Record(obj, self) for obj in records]
+
+def __repr__(self):
+return ('TranslationUnit(%r, %r, %r, %r)'
+% (self.filename, self.generator, self.passes, self.records))
+
+class Pass:
+"""An optimization pass"""
+def __init__(self, json_obj, tu):
+self.id_ = json_obj['id']
+self.name = json_obj['name']
+self.num = json_obj['num']
+self.optgroups = set(json_obj['optgroups']) # list of strings
+self.type = json_obj['type']
+tu.pass_by_id[self.id_] = self
+self.children = [Pass(child, tu)
+ for child in json_obj.get('children', [])]
+
+def __repr__(self):
+return ('Pass(%r, %r, %r, %r)'
+% (self.name, self.num, self.optgroups, self.type))
+
+def from_optional_json_field(cls, jsonobj, field):
+if field not in jsonobj:
+return None
+return cls(jsonobj[field])
+
+class ImplLocation:
+"""An implementation location (within the compiler itself)"""
+def __init__(self, json_obj):
+self.file = json_obj['file']
+self.line = json_obj['line']
+self.function = json_obj['function']
+
+def __repr__(self):
+return ('ImplLocation(%r, %r, %r)'
+% (self.file, self.line, self.function))
+
+class Location:
+"""A source location"""
+def __init__(self, json_obj):
+self.file = json_obj['file']
+self.line = json_obj['line']
+self.column = json_obj['column']
+
+def __str__(self):
+return '%s:%i:%i' % (self.file, self.line, self.column)
+
+def __repr__(self):
+return ('Location(%r, %r, %r)'
+% (self.file, self.line, self.column))
+
+class Count:
+"""An execution count"""
+def __init__(self, json_obj):
+self.quality = json_obj['quality']
+self.value = json_obj['value']
+
+def __repr__(self):
+return ('Count(%r, %r)'
+% (self.quality, self.value))
+
+def is_precise(self):
+return self.quality in ('precise', 'adjusted')
+
+class Record:
+"""A optimization record: success/failure/note"""
+def __init__(self, json_obj, tu):
+self.kind = json_obj['kind']
+if 'pass' in json_obj:
+self.pass_ = tu.pass_by_id[json_obj['pass']]
+else:
+self.pass_ = None
+self.function = json_obj.get('function', None)
+self.impl_location = from_optional_json_field(ImplLocation, json_obj,
+  'impl_location')
+self.message = [Item.from_json(obj) for obj in json_obj['message']]
+self.count = from_optional_json_field(Count, json_obj, 'count')
+self.location = from_optional_json_field(Location, json_obj, 
'location')
+if

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Wilco Dijkstra

Umesh Kalappa wrote:

> We tried some of the normalisation numbers and the fix works and please
> could you help us with the input ,where  if you see that fix breaks down.

Well try any set of inputs which require normalisation. You'll find these no
longer get normalised and so will get incorrect results. Try basic cases like
1.0 - 0.75 which I think will return 0.625...

A basic test would be to run old vs new on a large set of inputs to verify
there aren't any obvious differences.

Wilco

[PATCH 1/2] condition_variable: Report early wakeup of wait_until as no_timeout

2018-07-20 Thread Mike Crowe

As currently implemented, condition_variable always ultimately waits
against std::chrono::system_clock. This clock can be changed in arbitrary
ways by the user which may result in us waking up too early or too late
when measured against the caller-supplied clock.

We can't (yet) do much about waking up too late[1], but
if we wake up too early we must return cv_status::no_timeout to indicate a
spurious wakeup rather than incorrectly returning cv_status::timeout.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861
---
 libstdc++-v3/ChangeLog|  7 +++
 libstdc++-v3/include/std/condition_variable   |  8 +++-
 libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc | 52 

 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index c9cd62a..4657af7 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,10 @@
+2018-07-20  Mike Crowe 
+   * include/std/condition_variable (wait_until): Only report timeout
+   if we really have timed out when measured against the
+   caller-supplied clock.
+   * testsuite/30_threads/condition_variable/members/2.cc: Add test
+   case to confirm above behaviour.
+
 2018-07-20  Jonathan Wakely  

PR libstdc++/86595
diff --git a/libstdc++-v3/include/std/condition_variable 
b/libstdc++-v3/include/std/condition_variable
index 84863a1..a2d146a 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -116,7 +116,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
const auto __delta = __atime - __c_entry;
const auto __s_atime = __s_entry + __delta;

-   return __wait_until_impl(__lock, __s_atime);
+   // We might get a timeout when measured against __clock_t but
+   // we need to check against the caller-supplied clock to tell
+   // whether we should return a timeout.
+   if (__wait_until_impl(__lock, __s_atime) == cv_status::timeout)
+ return _Clock::now() < __atime ? cv_status::no_timeout : 
cv_status::timeout;
+   else
+ return cv_status::no_timeout;
   }

 template
diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc 
b/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
index 6f9724b..16850a4 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
@@ -52,8 +52,60 @@ void test01()
 }
 }

+struct slow_clock
+{
+  using rep = std::chrono::system_clock::rep;
+  using period = std::chrono::system_clock::period;
+  using duration = std::chrono::system_clock::duration;
+  using time_point = std::chrono::time_point;
+  static constexpr bool is_steady = false;
+
+  static time_point now()
+  {
+auto real = std::chrono::system_clock::now();
+return time_point{real.time_since_epoch() / 3};
+  }
+};
+
+
+void test01_alternate_clock()
+{
+  try
+{
+  std::condition_variable c1;
+  std::mutex m;
+  std::unique_lock l(m);
+  auto const expire = slow_clock::now() + std::chrono::seconds(1);
+
+  while (slow_clock::now() < expire)
+   {
+ auto const result = c1.wait_until(l, expire);
+
+ // If wait_until returns before the timeout has expired when
+ // measured against the supplied clock, then wait_until must
+ // return no_timeout.
+ if (slow_clock::now() < expire)
+   VERIFY(result == std::cv_status::no_timeout);
+
+ // If wait_until returns timeout then the timeout must have
+ // expired.
+ if (result == std::cv_status::timeout)
+   VERIFY(slow_clock::now() >= expire);
+   }
+}
+  catch (const std::system_error& e)
+{
+  VERIFY( false );
+}
+  catch (...)
+{
+  VERIFY( false );
+}
+}
+
 int main()
 {
   test01();
+  test01_alternate_clock();
   return 0;
 }

base-commit: 3052e4ec519e4f5456ab63f4954ae098524316ce
--
git-series 0.9.1
BrightSign considers your privacy to be very important. The emails you send to 
us will be protected and secured. Furthermore, we will only use your email and 
contact information for the reasons you sent them to us and for tracking how 
effectively we respond to your requests.

[PATCH 2/2] condition_variable: Use steady_clock to implement wait_for

2018-07-20 Thread Mike Crowe

I believe[1][2] that the C++ standard says that
std::condition_variable::wait_for should be implemented to be equivalent
to:

 return wait_until(lock, chrono::steady_clock::now() + rel_time);

But the existing implementation uses chrono::system_clock. Now that
wait_until has potentially-different behaviour for chrono::steady_clock,
let's at least try to wait using the correct clock.

[1] https://en.cppreference.com/w/cpp/thread/condition_variable/wait_for
[2] https://github.com/cplusplus/draft/blob/master/source/threads.tex
---
 libstdc++-v3/ChangeLog  | 3 +++
 libstdc++-v3/include/std/condition_variable | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index 4657af7..432cb84 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,4 +1,7 @@
 2018-07-20  Mike Crowe 
+   * include/std/condition_variable (wait_for): Use steady_clock.
+
+2018-07-20  Mike Crowe 
* include/std/condition_variable (wait_until): Only report timeout
if we really have timed out when measured against the
caller-supplied clock.
diff --git a/libstdc++-v3/include/std/condition_variable 
b/libstdc++-v3/include/std/condition_variable
index a2d146a..ce58399 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -65,6 +65,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class condition_variable
   {
 typedef chrono::system_clock   __clock_t;
+typedef chrono::steady_clock   __steady_clock_t;
 typedef __gthread_cond_t   __native_type;

 #ifdef __GTHREAD_COND_INIT
@@ -142,11 +143,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   wait_for(unique_lock& __lock,
   const chrono::duration<_Rep, _Period>& __rtime)
   {
-   using __dur = typename __clock_t::duration;
+   using __dur = typename __steady_clock_t::duration;
auto __reltime = chrono::duration_cast<__dur>(__rtime);
if (__reltime < __rtime)
  ++__reltime;
-   return wait_until(__lock, __clock_t::now() + __reltime);
+   return wait_until(__lock, __steady_clock_t::now() + __reltime);
   }

 template
--
git-series 0.9.1
BrightSign considers your privacy to be very important. The emails you send to 
us will be protected and secured. Furthermore, we will only use your email and 
contact information for the reasons you sent them to us and for tracking how 
effectively we respond to your requests.

Re: [PATCH] restore -Warray-bounds for string literals (PR 83776)

2018-07-20 Thread Martin Sebor


On 07/20/2018 05:34 AM, Rainer Orth wrote:

Hi Martin,


On 07/19/2018 03:51 PM, Jeff Law wrote:

On 07/13/2018 05:45 PM, Martin Sebor wrote:


+  offset_int ofr[] = {
+   wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)),
+   wi::to_offset (fold_convert (ptrdiff_type_node, vr->max))
+  };

huh.  Do you maybe want to use widest_int for ofr[]?  What's
wrong with wi::to_offset (vr->min)?  Building another intermediate
tree node here looks just bogus.


I need to convert size_type indices to signed to keep their sign
if it's negative and include it in warnings.  I've moved the code
into a conditional where it's used to minimize the cost but other
than that I don't know how else to convert it.



What are you trying to do in this loop anyways?


The loop It determines the range of the final index/offset for
a series of POINTER_PLUS_EXPRs.  It handles cases like this:

  int g (int i, int j, int k)
  {
if (i < 1) i = 1;
if (j < 1) j = 1;
if (k < 1) k = 1;

const char *p0 = "123";
const char *p1 = p0 + i;
const char *p2 = p1 + j;
const char *p3 = p2 + k;

// p2[3] and p3[1] are out of bounds
return p0[4] + p1[3] + p2[2] + p3[1];
  }


I suppose
look at

  p_1 = &obj[i_2];  // already bounds-checked, but with ignore_off_by_one
  ... = MEM[p_1 + CST];

?  But then

+  if (TREE_CODE (varoff) != SSA_NAME)
+   break;

you should at least handle INTEGER_CSTs here?


It's already handled (and set in CSTOFF).  There should be no
more constant offsets after that (I haven't come across any.)



+  if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max)
+   break;

please use positive tests here, VR_RANGE || VR_ANTI_RANGE.  As usual
the anti-range handling looks odd.  Please add comments so we can follow
what you were thinking when writing range merging code.  Even better if you

can stick to use existing code rather than always re-inventing the wheel...



The anti-range handling is to conservatively add
[-MAXOBJSIZE -1, MAXOBJSIZE] to OFFRANGE.  I've added comments
to make it clear.  I'd be more than happy to reuse existing
code if I knew where to find it (if it exists).  It sure would
save me lots of work to have a library of utility functions to
call instead of rolling my own code each time.

Finding stuff is never easy :(  GCC's just gotten so big it's virtually
impossible for anyone to know all the APIs.

The suggestion I'd have would be to (when possible) factor this stuff
into routines you can reuse.  We (as a whole) have this tendency to
open-code all kinds of things rather than factoring the code into
reusable routines.

In addition to increasing the probability that you'll be able to reuse
code later, just reading a new function's header tends to make me (as a
reviewer) internally ask if there's already a routine we should be using
instead.  When it's open-coded it's significantly harder to spot those
cases (at least for me).






I think I commented on earlier variants but this doesn't seem to resemble
any of them.


I've reworked the patch (sorry) to also handle arrays.  For GCC
9 it seems I might as well do both in one go.

Attached is an updated patch with these changes.

Martin

gcc-83776.diff


PR tree-optimization/84047 - missing -Warray-bounds on an out-of-bounds
index into an array
PR tree-optimization/83776 - missing -Warray-bounds indexing past the
end of a string literal

gcc/ChangeLog:

PR tree-optimization/84047
PR tree-optimization/83776
* tree-vrp.c (vrp_prop::check_mem_ref): New function.
(check_array_bounds): Call it.
* /gcc/tree-sra.c (get_access_for_expr): Fail for out-of-bounds
array offsets.

gcc/testsuite/ChangeLog:

PR tree-optimization/83776
PR tree-optimization/84047
* gcc.dg/Warray-bounds-29.c: New test.
* gcc.dg/Warray-bounds-30.c: New test.
* gcc.dg/Warray-bounds-31.c: New test.
* gcc.dg/Warray-bounds-32.c: New test.




diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 3e30f6b..8221a06 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3110,6 +3110,19 @@ get_access_for_expr (tree expr)
   || !DECL_P (base))
 return NULL;

+  /* Fail for out-of-bounds array offsets.  */
+  tree basetype = TREE_TYPE (base);
+  if (TREE_CODE (basetype) == ARRAY_TYPE)
+{
+  if (offset < 0)
+   return NULL;
+
+  if (tree size = DECL_SIZE (base))
+   if (tree_fits_uhwi_p (size)
+   && tree_to_uhwi (size) < (unsigned HOST_WIDE_INT) offset)
+ return NULL;
+}
+
   if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
 return NULL;

So I'm a bit curious about this hunk.  Did we end up creating an access
structure that walked off the end of the array?  Presumably  you
suppressing SRA at this point so that you can see the array access later
and give a suitable warning.  Right?


Yes, but I didn't make a note of the test case that triggered
it and I'm not able to trigger the code p

[PATCH] i386: Remove _Unwind_Frames_Increment

2018-07-20 Thread H.J. Lu

Tested on CET SDV using the CET kernel on cet branch at:

https://github.com/yyu168/linux_cet/tree/cet

OK for trunk and GCC 8 branch?

Thanks.


H.J.
---
The CET kernel has been changed to place a restore token on shadow stack
for signal handler to enhance security.  It is usually transparent to user
programs since kernel will pop the restore token when signal handler
returns.  But when an exception is thrown from a signal handler, now
we need to remove _Unwind_Frames_Increment to pop the the restore token
from shadow stack.  Otherwise, we get

FAIL: g++.dg/torture/pr85334.C   -O0  execution test
FAIL: g++.dg/torture/pr85334.C   -O1  execution test
FAIL: g++.dg/torture/pr85334.C   -O2  execution test
FAIL: g++.dg/torture/pr85334.C   -O3 -g  execution test
FAIL: g++.dg/torture/pr85334.C   -Os  execution test
FAIL: g++.dg/torture/pr85334.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test

PR libgcc/85334
* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
Removed.
---
 libgcc/config/i386/shadow-stack-unwind.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/libgcc/config/i386/shadow-stack-unwind.h 
b/libgcc/config/i386/shadow-stack-unwind.h
index a32f3e74b52..40f48df2aec 100644
--- a/libgcc/config/i386/shadow-stack-unwind.h
+++ b/libgcc/config/i386/shadow-stack-unwind.h
@@ -49,8 +49,3 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
}   \
 }  \
 while (0)
-
-/* Increment frame count.  Skip signal frames.  */
-#undef _Unwind_Frames_Increment
-#define _Unwind_Frames_Increment(context, frames) \
-  if (!_Unwind_IsSignalFrame (context)) frames++
-- 
2.17.1

Re: [PATCH][Middle-end][version 2]change char type to unsigned char type when expanding strcmp/strncmp

2018-07-20 Thread Qing Zhao

the patch was committed as:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=262907 


thanks.

Qing
> On Jul 20, 2018, at 9:59 AM, Jakub Jelinek  wrote:
> 
> On Fri, Jul 20, 2018 at 09:53:24AM -0500, Qing Zhao wrote:
>> +2018-07-20  Qing Zhao  
>> +
>> +   * builtins.c (expand_builtin_memcmp): Delete the last parameter for
>> +   call to inline_expand_builtin_string_cmp.
>> +   (expand_builtin_strcmp): Likewise.
>> +   (expand_builtin_strncmp): Likewise.
>> +   (inline_string_cmp): Delete the last parameter, change char_type_node
>> +   to unsigned_char_type_node for strcmp/strncmp, add conversions to the
>> +   two operands.
>> +   (inline_expand_builtin_string_cmp): Delete the last parameter, give 
>> up
>> +   the inlining expansion on target where the type of the call has same 
>> or 
>> +   narrower presicion than unsigned char.
> 
> s/presicion/precision/
> 
> Also in the patch, where there is another typo, s/of/or/.
> 
> Ok for trunk with that fixed.
> 
>   Jakub

[PATCH,rs6000] AIX test fixes 2

2018-07-20 Thread Carl Love

GCC maintainers:

The following patch fixes errors on AIX for the "vector double" tests
in altivec-1-runnable.c file.  The type "vector double" requires the
use of the GCC command line option -mvsx. The vector double tests
in altivec-1-runnable.c should be in altivec-2-runnable.c.  It looks
like my Linux testing of the original patch worked because I configured
GCC by default with -mcpu=power8.  AIX is not using that as the default
processor thus causing the compile of altivec-1-runnable.c to fail.

The vec_or tests in builtins-1.c were moved to another file by a
previous patch.  The vec_or test generated the xxlor instruction.  The
count of the xxlor instruction varies depending on the target as it is
used as a move instruction.  No other tests generate the xxlor
instruction. Hence, the count check was removed.

The patch has been tested on 

powerpc64le-unknown-linux-gnu (Power 8 LE) 
powerpc64-unknown-linux-gnu (Power 8 BE)
AIX (Power 8)

With no regressions.

Please let me know if the patch looks OK for trunk.

 Carl Love



gcc/testsuite/ChangeLog:

2018-07-20  Carl Love  

* gcc.target/powerpc/altivec-1-runnable.c: Move vector double tests to
file altivec-2-runnable.c.
* gcc.target/powerpc/altivec-2-runnable.c: Add vector double tests.
* gcc.target/powerpc/buitlins-1.c: Remove check for xxlor.  Add linux 
and AIX
targets for divdi3 and udivdi3 instructions.
---
 .../gcc.target/powerpc/altivec-1-runnable.c| 50 --
 .../gcc.target/powerpc/altivec-2-runnable.c| 49 -
 gcc/testsuite/gcc.target/powerpc/builtins-1.c  |  9 ++--
 3 files changed, 52 insertions(+), 56 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index bb913d2..da8ebbc 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -31,16 +31,9 @@ int main ()
   vector signed int vec_si_result, vec_si_expected;
   vector signed char vec_sc_arg;
   vector signed char vec_sc_result, vec_sc_expected;
-  vector float vec_float_arg;
-  vector double vec_double_result, vec_double_expected;
   vector pixel vec_pixel_arg;
   vector unsigned int vec_ui_result, vec_ui_expected;
 
-  union conv {
-     double d;
-     unsigned long long l;
-  } conv_exp, conv_val;
-
   vec_bs_arg = (vector bool short){ 0, 101, 202, 303,
    404, 505, 606, 707 };
   vec_bi_expected = (vector bool int){ 0, 101, 202, 303 };
@@ -209,49 +202,6 @@ int main ()
abort();
 #endif
   }
-  
-
-  vec_float_arg = (vector float){ 0.0, 1.5, 2.5, 3.5 };
-
-  vec_double_expected = (vector double){ 0.0, 1.5 };
-
-  vec_double_result = vec_unpackh (vec_float_arg);
-
-  for (i = 0; i < 2; i++) {
-if (vec_double_expected[i] != vec_double_result[i])
-  {
-#if DEBUG
-    printf("ERROR: vec_unpackh(), vec_double_expected[%d] = %f does not 
match vec_double_result[%d] = %f\n",
-   i, vec_double_expected[i], i, vec_double_result[i]);
-    conv_val.d = vec_double_result[i];
-    conv_exp.d = vec_double_expected[i];
-    printf(" vec_unpackh(), vec_double_expected[%d] = 0x%llx does not 
match vec_double_result[%d] = 0x%llx\n",
-   i, conv_exp.l, i,conv_val.l);
-#else
-    abort();
-#endif
-}
-  }
-
-  vec_double_expected = (vector double){ 2.5, 3.5 };
-
-  vec_double_result = vec_unpackl (vec_float_arg);
-
-  for (i = 0; i < 2; i++) {
-if (vec_double_expected[i] != vec_double_result[i])
-  {
-#if DEBUG
- printf("ERROR: vec_unpackl() vec_double_expected[%d] = %f does not 
match vec_double_result[%d] = %f\n",
-   i, vec_double_expected[i], i, vec_double_result[i]);
-    conv_val.d = vec_double_result[i];
-    conv_exp.d = vec_double_expected[i];
-    printf(" vec_unpackh(), vec_double_expected[%d] = 0x%llx does not 
match vec_double_result[%d] = 0x%llx\n",
-   i, conv_exp.l, i,conv_val.l);
-#else
- abort();
-#endif
-  }
-  }
 
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 9d8aad4..041edcb 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -23,8 +23,15 @@ int main ()
 
   vector signed int vec_si_arg;
   vector signed long long int vec_slli_result, vec_slli_expected;
+  vector float vec_float_arg;
+  vector double vec_double_result, vec_double_expected;
 
-  /*  use of ‘long long’ in AltiVec types requires -mvsx */
+  union conv {
+     double d;
+     unsigned long long l;
+  } conv_exp, conv_val;
+
+  /* Use of 'double' and ‘long long’ in AltiVec types requires -mvsx */
   /* __builtin_alti

Go patch committed: Fix order of evaluation of boolean expressions

2018-07-20 Thread Ian Lance Taylor

This patch by Cherry Zhang changes the Go frontend to run the
order_evaluations before the remove_shortcuts pass.

In remove_shortcuts, the shortcut expressions (&&, ||) are rewritten
to if statements, which are lifted out before the statement containing
the shortcut expression.  If the containing statement has other
(sub)expressions that should be evaluated before the shortcut
expression, which has not been lifted out, this will result in
incorrect evaluation order.

For example, F() + G(A() && B()), the evaluation order per the
language spec is F, A, B (if A returns true), G. If we lift A() and
B() out first, they will be called before F, which is wrong.

To fix this, this patch splits order_evaluations to two phases.  The
first phase, which runs before remove_shortcuts, skips shortcut
expressions' components.  So it won't lift out subexpressions that are
evaluated conditionally.  The shortcut expression itself is ordered,
since it may have side effects.  Then we run remove_shortcuts.  At
this point the subexpressions that should be evaluated before the
shortcut expression are already lifted out.  The remove_shortcuts pass
also runs the second phase of order_evaluations to order the
components of shortcut expressions, which were skipped during the
first phase.

This reorders the code blocks of remove_shortcuts and
order_evaluations, since remove_shortcuts now calls Order_eval.

This fixes https://golang.org/issue/26495.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 262833)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-38850073f25f9de4f3daa33d799def3a33c942ab
+39d4d755db7d71b5e770ca435a8b1d1f08f53185
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/go.cc
===
--- gcc/go/gofrontend/go.cc (revision 262658)
+++ gcc/go/gofrontend/go.cc (working copy)
@@ -143,12 +143,12 @@ go_parse_input_files(const char** filena
   // Export global identifiers as appropriate.
   ::gogo->do_exports();
 
-  // Turn short-cut operators (&&, ||) into explicit if statements.
-  ::gogo->remove_shortcuts();
-
   // Use temporary variables to force order of evaluation.
   ::gogo->order_evaluations();
 
+  // Turn short-cut operators (&&, ||) into explicit if statements.
+  ::gogo->remove_shortcuts();
+
   // Convert named types to backend representation.
   ::gogo->convert_named_types();
 
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 262658)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -3432,210 +3432,6 @@ Gogo::check_types_in_block(Block* block)
   block->traverse(&traverse);
 }
 
-// A traversal class used to find a single shortcut operator within an
-// expression.
-
-class Find_shortcut : public Traverse
-{
- public:
-  Find_shortcut()
-: Traverse(traverse_blocks
-  | traverse_statements
-  | traverse_expressions),
-  found_(NULL)
-  { }
-
-  // A pointer to the expression which was found, or NULL if none was
-  // found.
-  Expression**
-  found() const
-  { return this->found_; }
-
- protected:
-  int
-  block(Block*)
-  { return TRAVERSE_SKIP_COMPONENTS; }
-
-  int
-  statement(Block*, size_t*, Statement*)
-  { return TRAVERSE_SKIP_COMPONENTS; }
-
-  int
-  expression(Expression**);
-
- private:
-  Expression** found_;
-};
-
-// Find a shortcut expression.
-
-int
-Find_shortcut::expression(Expression** pexpr)
-{
-  Expression* expr = *pexpr;
-  Binary_expression* be = expr->binary_expression();
-  if (be == NULL)
-return TRAVERSE_CONTINUE;
-  Operator op = be->op();
-  if (op != OPERATOR_OROR && op != OPERATOR_ANDAND)
-return TRAVERSE_CONTINUE;
-  go_assert(this->found_ == NULL);
-  this->found_ = pexpr;
-  return TRAVERSE_EXIT;
-}
-
-// A traversal class used to turn shortcut operators into explicit if
-// statements.
-
-class Shortcuts : public Traverse
-{
- public:
-  Shortcuts(Gogo* gogo)
-: Traverse(traverse_variables
-  | traverse_statements),
-  gogo_(gogo)
-  { }
-
- protected:
-  int
-  variable(Named_object*);
-
-  int
-  statement(Block*, size_t*, Statement*);
-
- private:
-  // Convert a shortcut operator.
-  Statement*
-  convert_shortcut(Block* enclosing, Expression** pshortcut);
-
-  // The IR.
-  Gogo* gogo_;
-};
-
-// Remove shortcut operators in a single statement.
-
-int
-Shortcuts::statement(Block* block, size_t* pindex, Statement* s)
-{
-  // FIXME: This approach doesn't work for switch statements, because
-  // we add the new statements before the whole switch when we need to
-  // instead add them just before the switch expression.  The right
-  // fix is probably to lower switch statem

[PATCH] libsanitizer: Mark REAL(swapcontext) with indirect_return attribute on x86

2018-07-20 Thread H.J. Lu

Cherry-pick compiler-rt revision 337603:

When shadow stack from Intel CET is enabled, the first instruction of all
indirect branch targets must be a special instruction, ENDBR.

lib/asan/asan_interceptors.cc has

...
  int res = REAL(swapcontext)(oucp, ucp);
...

REAL(swapcontext) is a function pointer to swapcontext in libc.  Since
swapcontext may return via indirect branch on x86 when shadow stack is
enabled, as in this case,

int res = REAL(swapcontext)(oucp, ucp);
    This function may be
returned via an indirect branch.

Here compiler must insert ENDBR after call, like

call *bar(%rip)
endbr64

I opened an LLVM bug:

https://bugs.llvm.org/show_bug.cgi?id=38207

to add the indirect_return attribute so that it can be used to inform
compiler to insert ENDBR after REAL(swapcontext) call.  We mark
REAL(swapcontext) with the indirect_return attribute if it is available.

This fixed:

https://bugs.llvm.org/show_bug.cgi?id=38249

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D49608

OK for trunk?

H.J.
---
PR target/86560
* asan/asan_interceptors.cc (swapcontext): Call REAL(swapcontext)
with indirect_return attribute on x86 if indirect_return attribute
is available.
* sanitizer_common/sanitizer_internal_defs.h (__has_attribute):
New.
---
 libsanitizer/asan/asan_interceptors.cc  | 8 
 libsanitizer/sanitizer_common/sanitizer_internal_defs.h | 5 +
 2 files changed, 13 insertions(+)

diff --git a/libsanitizer/asan/asan_interceptors.cc 
b/libsanitizer/asan/asan_interceptors.cc
index a8f4b72723f..552cf9347af 100644
--- a/libsanitizer/asan/asan_interceptors.cc
+++ b/libsanitizer/asan/asan_interceptors.cc
@@ -267,7 +267,15 @@ INTERCEPTOR(int, swapcontext, struct ucontext_t *oucp,
   uptr stack, ssize;
   ReadContextStack(ucp, &stack, &ssize);
   ClearShadowMemoryForContextStack(stack, ssize);
+#if __has_attribute(__indirect_return__) && \
+(defined(__x86_64__) || defined(__i386__))
+  int (*real_swapcontext)(struct ucontext_t *, struct ucontext_t *)
+__attribute__((__indirect_return__))
+= REAL(swapcontext);
+  int res = real_swapcontext(oucp, ucp);
+#else
   int res = REAL(swapcontext)(oucp, ucp);
+#endif
   // swapcontext technically does not return, but program may swap context to
   // "oucp" later, that would look as if swapcontext() returned 0.
   // We need to clear shadow for ucp once again, as it may be in arbitrary
diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h 
b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
index edd6a21c122..4413a88bea0 100644
--- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
+++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
@@ -104,6 +104,11 @@
 # define __has_feature(x) 0
 #endif
 
+// Older GCCs do not understand __has_attribute.
+#if !defined(__has_attribute)
+# define __has_attribute(x) 0
+#endif
+
 // For portability reasons we do not include stddef.h, stdint.h or any other
 // system header, but we do need some basic types that are not defined
 // in a portable way by the language itself.
-- 
2.17.1

Re: [PATCH] specify large command line option arguments (PR 82063)

2018-07-20 Thread Martin Sebor


On 07/19/2018 04:31 PM, Jeff Law wrote:

On 06/24/2018 03:05 PM, Martin Sebor wrote:

Storing integer command line option arguments in type int
limits options such as -Wlarger-than= or -Walloca-larger-than
to at most INT_MAX (see bug 71905).  Larger values wrap around
zero.  The value zero is considered to disable the option,
making it impossible to specify a zero limit.

To get around these limitations, the -Walloc-size-larger-than=
option accepts a string argument that it then parses itself
and interprets as HOST_WIDE_INT.  The option also accepts byte
size suffixes like KB, MB, GiB, etc. to make it convenient to
specify very large limits.

The int limitation is obviously less than ideal in a 64-bit
world.  The treatment of zero as a toggle is just a minor wart.
The special treatment to make it work for just a single option
makes option handling inconsistent.  It should be possible for
any option that takes an integer argument to use the same logic.

The attached patch enhances GCC option processing to do that.
It changes the storage type of option arguments from int to
HOST_WIDE_INT and extends the existing (although undocumented)
option property Host_Wide_Int to specify wide option arguments.
It also introduces the ByteSize property for options for which
specifying the byte-size suffix makes sense.

To make it possible to consider zero as a meaningful argument
value rather than a flag indicating that an option is disabled
the patch also adds a CLVC_SIZE enumerator to the cl_var_type
enumeration, and modifies how options of the kind are handled.

Warning options that take large byte-size arguments can be
disabled by specifying a value equal to or greater than
HOST_WIDE_INT_M1U.  For convenience, aliases in the form of
-Wno-xxx-larger-than have been provided for all the affected
options.

In the patch all the existing -larger-than options are set
to PTRDIFF_MAX.  This makes them effectively enabled, but
because the setting is exceedingly permissive, and because
some of the existing warnings are already set to the same
value and some other checks detect and reject such exceedingly
large values with errors, this change shouldn't noticeably
affect what constructs are diagnosed.

Although all the options are set to PTRDIFF_MAX, I think it
would make sense to consider setting some of them lower, say
to PTRDIFF_MAX / 2.  I'd like to propose that in a followup
patch.

To minimize observable changes the -Walloca-larger-than and
-Wvla-larger-than warnings required more extensive work to
make of the new mechanism because of the "unbounded" argument
handling (the warnings trigger for arguments that are not
visibly constrained), and because of the zero handling
(the warnings also trigger


Martin


gcc-82063.diff


PR middle-end/82063 - issues with arguments enabled by -Wall

gcc/ada/ChangeLog:

PR middle-end/82063
* gcc-interface/misc.c (gnat_handle_option): Change function argument
to HOST_WIDE_INT.

gcc/brig/ChangeLog:
* brig/brig-lang.c (brig_langhook_handle_option): Change function
argument to HOST_WIDE_INT.

gcc/c-family/ChangeLog:

PR middle-end/82063
* c-common.h (c_common_handle_option): Change function argument
to HOST_WIDE_INT.
* c-opts.c (c_common_init_options): Same.
(c_common_handle_option): Same.  Remove special handling of
OPT_Walloca_larger_than_ and OPT_Wvla_larger_than_.
* c.opt (-Walloc-size-larger-than, -Walloca-larger-than): Change
options to take a HOST_WIDE_INT argument and accept a byte-size
suffix.  Initialize.
(-Wvla-larger-than): Same.
(-Wno-alloc-size-larger-than, -Wno-alloca-larger-than): New.
(-Wno-vla-larger-than): Same.


gcc/fortran/ChangeLog:

PR middle-end/82063
* gfortran.h (gfc_handle_option): Change function argument
to HOST_WIDE_INT.
* options.c (gfc_handle_option): Same.

gcc/go/ChangeLog:

PR middle-end/82063
* go-lang.c (go_langhook_handle_option): Change function argument
to HOST_WIDE_INT.

gcc/lto/ChangeLog:

PR middle-end/82063
* lto-lang.c (lto_handle_option): Change function argument
to HOST_WIDE_INT.

gcc/testsuite/ChangeLog:

PR middle-end/82063
* gcc.dg/Walloc-size-larger-than-16.c: Adjust.
* gcc.dg/Walloca-larger-than.c: New test.
* gcc.dg/Wframe-larger-than-2.c: New test.
* gcc.dg/Wlarger-than3.c: New test.
* gcc.dg/Wvla-larger-than-3.c: New test.

gcc/ChangeLog:

PR middle-end/82063
* builtins.c (expand_builtin_alloca): Adjust.
* calls.c (alloc_max_size): Simplify.
* cgraphunit.c (cgraph_node::expand): Adjust.
* common.opt (larger_than_size, warn_frame_larger_than): Remove
variables.
(frame_larger_than_size): Same.
(-Wframe-larger-than, -Wlarger-than, -Wstack-usage): Change options
to take a HOST_WIDE_INT argument and accept

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-20 Thread Bernd Edlinger

I think I have found a valid test case where the latest patch
does generate invalid code:

$ cat part.c
static const char a[3][8] = { "1234", "12345", "123456" };

int main ()
{
   volatile int i = 1;
   int n = __builtin_strlen (*(&a[1]+i));

   if (n != 6)
 __builtin_abort ();
}
$ gcc part.c -fdump-tree-original
$ ./a.out
Aborted (core dumped)
$ cat part.c.004t.original

;; Function main (null)
;; enabled by -tree-original


{
   volatile int i = 1;
   int n = (ssizetype) (SAVE_EXPR <(long unsigned int) i * 8>) <= 5 ? (int) (5 
- (unsigned int) (SAVE_EXPR <(long unsigned int) i * 8>)) : 0;

 volatile int i = 1;
 int n = (ssizetype) (SAVE_EXPR <(long unsigned int) i * 8>) <= 5 ? (int) 
(5 - (unsigned int) (SAVE_EXPR <(long unsigned int) i * 8>)) : 0;
   if (n != 6)
 {
   __builtin_abort ();
 }
}
return 0;


string_constant is called with arg = (const char *) (&a[1] + (sizetype) ((long 
unsigned int) i * 8))



Bernd.

committed: remove redundant -Wall from -Warray-bounds (PR 82063)

2018-07-20 Thread Martin Sebor


As the last observation in PR 82063 Jim points out that

  Both -Warray-bounds and -Warray-bounds= are listed in the c.opt
  file as being enabled by -Wall, but they are the same option,
  and it causes this one option to be processed twice in the
  C_handle_option_auto function in the generated options.c file.
  It gets set to the same value twice, so it does work as intended,
  but this is wasteful.

I have removed the redundant -Wall from the first option and
committed the change as obvious in r262912.

Martin

[Patch, Fortran] PR 57160: short-circuit IF only with -ffrontend-optimize

2018-07-20 Thread Janus Weil

Hi all,

here is a follow-up patch to my recent commit for PR 85599, also
dealing with the short-circuiting of logical operators. In the course
of the extensive discussion of that PR it became clear that the
Fortran standard allows the short-circuiting of .AND. and .OR.
operators, but does not mandate it.

gfortran currently does short-circuiting, and after my patch for PR
85599 warns about cases where this might remove an impure function
call (which potentially can change results).

Now, this PR (57160) is about code which relies on the
short-circuiting behavior. Since short-circuiting is not guaranteed by
the standard, such code is invalid. Generating a warning or an error
at compile-time is a bit harder here, though, since there are multiple
variations of such a situation, e.g.:
* ASSOCIATED(p) .AND. p%T
* ALLOCATED(a) .AND. a%T
* i

PR fortran/57160
* trans-expr.c (gfc_conv_expr_op): Use short-circuiting operators only
with -ffrontend-optimize. Without that flag, make sure that both
operands are evaluated.


2018-07-20  Janus Weil  

PR fortran/57160
* gfortran.dg/actual_pointer_function_1.f90: Fix invalid test case.
Index: gcc/fortran/trans-expr.c
===
--- gcc/fortran/trans-expr.c	(revision 262908)
+++ gcc/fortran/trans-expr.c	(working copy)
@@ -3348,12 +3348,12 @@ gfc_conv_expr_op (gfc_se * se, gfc_expr * expr)
   return;
 
 case INTRINSIC_AND:
-  code = TRUTH_ANDIF_EXPR;
+  code = flag_frontend_optimize ? TRUTH_ANDIF_EXPR : TRUTH_AND_EXPR;
   lop = 1;
   break;
 
 case INTRINSIC_OR:
-  code = TRUTH_ORIF_EXPR;
+  code = flag_frontend_optimize ? TRUTH_ORIF_EXPR : TRUTH_OR_EXPR;
   lop = 1;
   break;
 
Index: gcc/testsuite/gfortran.dg/actual_pointer_function_1.f90
===
--- gcc/testsuite/gfortran.dg/actual_pointer_function_1.f90	(revision 262908)
+++ gcc/testsuite/gfortran.dg/actual_pointer_function_1.f90	(working copy)
@@ -17,7 +17,11 @@ CONTAINS
 
   logical function cp_logger_log(logger)
 TYPE(cp_logger_type), POINTER ::logger
-cp_logger_log = associated (logger) .and. (logger%a .eq. 42)
+if (associated (logger)) then
+  cp_logger_log = (logger%a .eq. 42)
+else
+  cp_logger_log = .false.
+end if
   END function
 
   FUNCTION cp_get_default_logger(v) RESULT(res)

[committed] libcpp: remove redundant parameter from rich_location::set_range

2018-07-20 Thread David Malcolm

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Committed to trunk as r262913.

gcc/c-family/ChangeLog:
* c-common.c (c_cpp_error): Remove redundant "line_table"
parameter from call to rich_location::set_range.
(maybe_suggest_missing_token_insertion): Likewise.

gcc/ChangeLog:
* pretty-print.c (text_info::set_location): Remove redundant
"line_table" parameter from call to rich_location::set_range.

libcpp/ChangeLog:
* include/line-map.h (rich_location::set_range): Remove redundant
line_maps * parameter.
* line-map.c (rich_location::set_range): Likewise.
---
 gcc/c-family/c-common.c   | 4 ++--
 gcc/pretty-print.c| 2 +-
 libcpp/include/line-map.h | 3 +--
 libcpp/line-map.c | 4 ++--
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index f5e..422d668 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6133,7 +6133,7 @@ c_cpp_error (cpp_reader *pfile ATTRIBUTE_UNUSED, int 
level, int reason,
   gcc_unreachable ();
 }
   if (done_lexing)
-richloc->set_range (line_table, 0, input_location, true);
+richloc->set_range (0, input_location, true);
   diagnostic_set_info_translated (&diagnostic, msg, ap,
  richloc, dlevel);
   diagnostic_override_option_index (&diagnostic,
@@ -8174,7 +8174,7 @@ maybe_suggest_missing_token_insertion (rich_location 
*richloc,
   location_t hint_loc = hint->get_start_loc ();
   location_t old_loc = richloc->get_loc ();
 
-  richloc->set_range (line_table, 0, hint_loc, true);
+  richloc->set_range (0, hint_loc, true);
   richloc->add_range (old_loc, false);
 }
 }
diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index df3ee81..dc7791a 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -699,7 +699,7 @@ void
 text_info::set_location (unsigned int idx, location_t loc, bool show_caret_p)
 {
   gcc_checking_assert (m_richloc);
-  m_richloc->set_range (line_table, idx, loc, show_caret_p);
+  m_richloc->set_range (idx, loc, show_caret_p);
 }
 
 location_t
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index ba1750d..a4baa49 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -1639,8 +1639,7 @@ class rich_location
   add_range (source_location loc,  bool show_caret_p);
 
   void
-  set_range (line_maps *set, unsigned int idx, source_location loc,
-bool show_caret_p);
+  set_range (unsigned int idx, source_location loc, bool show_caret_p);
 
   unsigned int get_num_locations () const { return m_ranges.count (); }
 
diff --git a/libcpp/line-map.c b/libcpp/line-map.c
index 1051022..a1a765f 100644
--- a/libcpp/line-map.c
+++ b/libcpp/line-map.c
@@ -2104,8 +2104,8 @@ rich_location::add_range (source_location loc, bool 
show_caret_p)
- the "%C" and "%L" format codes in the Fortran frontend.  */
 
 void
-rich_location::set_range (line_maps * /*set*/, unsigned int idx,
- source_location loc, bool show_caret_p)
+rich_location::set_range (unsigned int idx, source_location loc,
+ bool show_caret_p)
 {
   /* We can either overwrite an existing range, or add one exactly
  on the end of the array.  */
-- 
1.8.5.3

[PATCH] Fix PR70828 - broken array-type subarrays inside acc data, in OpenACC

2018-07-20 Thread Cesar Philippidis

Attached is an old gomp-4_0-branch that fixes PR70828. Besides for
fixing the PR, it also introduces some changes which will enable the
forthcoming nvptx vector length enhancements. More details on the patch
can be found here 

I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?

Thanks,
Cesar
>From 3a58144cfaca8f6e3a889346e736e68a9ed17e6a Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Thu, 18 Aug 2016 01:12:15 +
Subject: [PATCH 1/5] Fix PR70828s "broken array-type subarrays inside acc data
 in openacc"

2018-XX-YY  Cesar Philippidis  

	gcc/
	* gimplify.c (struct gimplify_omp_ctx): Add tree clauses member.
	(new_omp_context): Initialize clauses to NULL_TREE.
	(gimplify_scan_omp_clauses): Set clauses in the gimplify_omp_ctx.
	(omp_clause_matching_array_ref): New function.
	(gomp_needs_data_present): New function.
	(gimplify_adjust_omp_clauses_1): Use preset or pointer omp clause map
	kinds when creating implicit data clauses for OpenACC offloaded
	variables defined used an acc data region as necessary.  Link ACC
	new clauses with the old ones.

	gcc/testsuite/
	* c-c++-common/goacc/acc-data-chain.c: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
	* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
	* testsuite/libgomp.oacc-fortran/lib-13.f90: Remove XFAIL.
---
 gcc/gimplify.c| 101 +-
 .../c-c++-common/goacc/acc-data-chain.c   |  24 +
 .../libgomp.oacc-c-c++-common/pr70828.c   |  25 +
 .../testsuite/libgomp.oacc-fortran/lib-13.f90 |   1 -
 .../libgomp.oacc-fortran/pr70828.f90  |  24 +
 5 files changed, 173 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/acc-data-chain.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/pr70828.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828.f90

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 4a109aee27a..cf8977c8508 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -191,6 +191,7 @@ struct gimplify_omp_ctx
   bool target_map_scalars_firstprivate;
   bool target_map_pointers_as_0len_arrays;
   bool target_firstprivatize_array_bases;
+  tree clauses;
 };
 
 static struct gimplify_ctx *gimplify_ctxp;
@@ -409,6 +410,7 @@ new_omp_context (enum omp_region_type region_type)
   c->privatized_types = new hash_set;
   c->location = input_location;
   c->region_type = region_type;
+  c->clauses = NULL_TREE;
   if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
@@ -7501,6 +7503,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
   tree *prev_list_p = NULL;
 
   ctx = new_omp_context (region_type);
+  ctx->clauses = *list_p;
   outer_ctx = ctx->outer_context;
   if (code == OMP_TARGET)
 {
@@ -8696,6 +8699,58 @@ struct gimplify_adjust_omp_clauses_data
   gimple_seq *pre_p;
 };
 
+/* Return true if clause contains an array_ref of DECL.  */
+
+static bool
+omp_clause_matching_array_ref (tree clause, tree decl)
+{
+  tree cdecl = OMP_CLAUSE_DECL (clause);
+
+  if (TREE_CODE (cdecl) != ARRAY_REF)
+return false;
+
+  return TREE_OPERAND (cdecl, 0) == decl;
+}
+
+/* Inside OpenACC parallel and kernels regions, the implicit data
+   clauses for arrays must respect the explicit data clauses set by a
+   containing acc data region.  Specifically, care must be taken
+   pointers or if an subarray of a local array is specified in an acc
+   data region, so that the referenced array inside the offloaded
+   region has a present data clasue for that array with an
+   approporiate subarray argument.  This function returns the tree
+   node of the acc data clause that utilizes DECL as an argument.  */
+
+static tree
+gomp_needs_data_present (tree decl)
+{
+  gimplify_omp_ctx *ctx = NULL;
+  bool found_match = false;
+  tree c = NULL_TREE;
+
+  if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE)
+return NULL_TREE;
+
+  if (gimplify_omp_ctxp->region_type != ORT_ACC_PARALLEL
+  && gimplify_omp_ctxp->region_type != ORT_ACC_KERNELS)
+return NULL_TREE;
+
+  for (ctx = gimplify_omp_ctxp->outer_context; !found_match && ctx;
+   ctx = ctx->outer_context)
+{
+  if (ctx->region_type != ORT_ACC_DATA)
+	break;
+
+  for (c = ctx->clauses; c; c = OMP_CLAUSE_CHAIN (c))
+	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+	&& (omp_clause_matching_array_ref (c, decl)
+		|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER))
+	  return c;
+}
+
+  return NULL_TREE;
+}
+
 /* For all variables that were not actually used within the context,
remove PRIVATE, SHARED, and FIRSTPRIVATE clauses.  */
 
@@ -8849,7 +8904,51 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 	  gcc_unreachable ();
 	}
   OMP_CLAUSE_SET_MAP_KIND (clause, kind);
-  if (DECL_SIZE (decl)
+  tree c2 = gomp_needs_data_present (decl);
+  /* Handle OpenACC pointers that were

[PATCH] Add support for making maps 'private' inside OpenACC offloaded regions

2018-07-20 Thread Cesar Philippidis

Due to the different levels of parallelism available in OpenACC, it is
useful to mark certain variables as GOMP_MAP_PRIVATE so that they can be
used in reductions. This patch was introduced in openacc-gcc-7-branch
here .


I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?

Thanks,
Cesar

>From b0e7fb09bf3a3f853e77c2712b6f85ad21472e72 Mon Sep 17 00:00:00 2001
From: Chung-Lin Tang 
Date: Tue, 5 Sep 2017 22:09:34 +0800
Subject: [PATCH 2/5] [OpenACC] Add support for making maps 'private' inside
 offloaded regions

2018-XX-YY Chung-Lin Tang  
	   Cesar Philippidis  

	gcc/
	* tree.h (OMP_CLAUSE_MAP_PRIVATE): Define macro.
	* gimplify.c (enum gimplify_omp_var_data): Add GOVD_MAP_PRIVATE enum value.
	(omp_add_variable): Add GOVD_MAP_PRIVATE to reduction clause flags if
	not a gang-partitioned loop directive.
	(gimplify_adjust_omp_clauses_1): Set OMP_CLAUSE_MAP_PRIVATE of new map
	clause to 1 if GOVD_MAP_PRIVATE flag is present.
	* omp-low.c (lower_oacc_reductions): Handle map clauses with
	OMP_CLAUSE_MAP_PRIVATE set in same matter as firstprivate/private.
	(lower_omp_target): Likewise. Add copy back code for map clauses with
	OMP_CLAUSE_MAP_PRIVATE set.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/reduction-9.c: New test.

(cherry picked from openacc-gcc-7-branch commit
2dc21f336368889c1ebf031801a7613f65899ef1, e17bb2068f9)
---
 gcc/gimplify.c| 34 ++-
 gcc/omp-low.c | 28 +++--
 gcc/tree.h|  3 ++
 .../libgomp.oacc-c-c++-common/reduction-9.c   | 41 +++
 4 files changed, 101 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-9.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index cf8977c8508..7dadf69b758 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -105,6 +105,9 @@ enum gimplify_omp_var_data
   /* Flag for GOVD_MAP: must be present already.  */
   GOVD_MAP_FORCE_PRESENT = 524288,
 
+  /* Flag for GOVD_MAP, copy to/from private storage inside offloaded region.  */
+  GOVD_MAP_PRIVATE = 1048576,
+
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
 			   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
 			   | GOVD_LOCAL)
@@ -6835,6 +6838,21 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
 {
   struct gimplify_omp_ctx *outer_ctx = ctx->outer_context;
+
+  bool gang = false, worker = false, vector = false;
+  for (tree c = ctx->clauses; c; c = OMP_CLAUSE_CHAIN (c))
+	{
+	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_GANG)
+	gang = true;
+	  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_WORKER)
+	worker = true;
+	  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_VECTOR)
+	vector = true;
+	}
+
+  /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
+  bool map_private = !gang && (worker || vector);
+
   while (outer_ctx)
 	{
 	  n = splay_tree_lookup (outer_ctx->variables, (splay_tree_key)decl);
@@ -6856,12 +6874,21 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 		  /* Remove firstprivate and make it a copy map.  */
 		  n->value &= ~GOVD_FIRSTPRIVATE;
 		  n->value |= GOVD_MAP;
+
+		  /* If not gang-partitioned, add MAP_PRIVATE on the map
+		 clause.  */
+		  if (map_private)
+		n->value |= GOVD_MAP_PRIVATE;
 		}
 	}
 	  else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
 	{
-	  splay_tree_insert (outer_ctx->variables, (splay_tree_key)decl,
- GOVD_MAP | GOVD_SEEN);
+	  unsigned f = GOVD_MAP | GOVD_SEEN;
+
+	  /* If not gang-partitioned, add MAP_PRIVATE on the map clause.  */
+	  if (map_private)
+		f |= GOVD_MAP_PRIVATE;
+	  splay_tree_insert (outer_ctx->variables, (splay_tree_key)decl, f);
 	  break;
 	}
 	  outer_ctx = outer_ctx->outer_context;
@@ -8904,6 +8931,9 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
 	  gcc_unreachable ();
 	}
   OMP_CLAUSE_SET_MAP_KIND (clause, kind);
+  if ((flags & GOVD_MAP_PRIVATE)
+	  && TREE_CODE (OMP_CLAUSE_DECL (clause)) == VAR_DECL)
+	OMP_CLAUSE_MAP_PRIVATE (clause) = 1;
   tree c2 = gomp_needs_data_present (decl);
   /* Handle OpenACC pointers that were declared inside acc data
 	 regions.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 714490d6921..ef3c7651c74 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4907,7 +4907,9 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		  goto has_outer_reduction;
 		}
 		  else if ((OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_FIRSTPRIVATE
-			|| OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE)
+			|| OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_PRIVATE
+			|| (OMP_CLAUSE_CODE (cls) == OMP_CLAUSE_MAP
+&& OMP_CLAUSE_MAP_PRIVATE (cls)))
 			   && ori

[PATCH] Privatize independent OpenACC reductions

2018-07-20 Thread Cesar Philippidis

This is another OpenACC reduction patch to privatize reduction variables
used inside inner acc loops. For some reason, I can't find the original
email announcement on the gcc-patches mailing list. But according to the
ChangeLog, I committed that change to og7 back on Jan 26, 2018.

I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?

Thanks,
Cesar
>From a4753e2b40cf3d707aabd7c9d5bad7d8f9be8b6f Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Fri, 26 Jan 2018 08:30:13 -0800
Subject: [PATCH 3/5] Privatize independent OpenACC reductions

2018-XX-YY  Cesar Philippidis  

	gcc/
	* gimplify.c (oacc_privatize_reduction): New function.
	(omp_add_variable): Use it to determine if a reduction variable
	needs to be privatized.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/inner-reduction.c: New test.

(cherry picked from openacc-gcc-7-branch commit
330ba2316fabd0e5525c99fdacedb0bfae270244, 133f3a8fb5c)
---
 gcc/gimplify.c| 35 ++-
 .../inner-reduction.c | 23 
 2 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 7dadf69b758..737a280cfe9 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6722,6 +6722,32 @@ omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *ctx, tree type)
   lang_hooks.types.omp_firstprivatize_type_sizes (ctx, type);
 }
 
+/* Determine if CTX might contain any gang partitioned loops.  During
+   oacc_dev_low, independent loops are assign gangs at the outermost
+   level, and vectors in the innermost.  */
+
+static bool
+oacc_privatize_reduction (struct gimplify_omp_ctx *ctx)
+{
+  if (ctx == NULL)
+return false;
+
+  if (ctx->region_type != ORT_ACC)
+return false;
+
+  for (tree c = ctx->clauses; c; c = OMP_CLAUSE_CHAIN (c))
+switch (OMP_CLAUSE_CODE (c))
+  {
+  case OMP_CLAUSE_SEQ:
+	return oacc_privatize_reduction (ctx->outer_context);
+  case OMP_CLAUSE_GANG:
+	return true;
+  default:;
+  }
+
+  return true;
+}
+
 /* Add an entry for DECL in the OMP context CTX with FLAGS.  */
 
 static void
@@ -6851,7 +6877,14 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 	}
 
   /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
-  bool map_private = !gang && (worker || vector);
+  bool map_private;
+
+  if (gang)
+	map_private = false;
+  else if (worker || vector)
+	map_private = true;
+  else
+	map_private = oacc_privatize_reduction (ctx->outer_context);
 
   while (outer_ctx)
 	{
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c
new file mode 100644
index 000..0c317dcf8a6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c
@@ -0,0 +1,23 @@
+#include 
+
+int
+main ()
+{
+  const int n = 1000;
+  int i, j, temp, a[n];
+
+#pragma acc parallel loop
+  for (i = 0; i < n; i++)
+{
+  temp = i;
+#pragma acc loop reduction (+:temp)
+  for (j = 0; j < n; j++)
+	temp ++;
+  a[i] = temp;
+}
+
+  for (i = 0; i < n; i++)
+assert (a[i] == i+n);
+
+  return 0;
+}
-- 
2.17.1

[PATCH] Enable firstprivate OpenACC reductions

2018-07-20 Thread Cesar Philippidis

At present, all reduction variables are transferred via an implicit
'copy' clause. As shown the the recent patches I've been posting, that
causes a lot of problems when the reduction variables are used by
multiple workers or vectors. This patch teaches the gimplifier to
transfer reduction variable as firstprivate in OpenACC parallel regions,
if the are in an inner loop. This matches the behavior of reductions in
OpenACC 2.6.

Is this patch OK for trunk? I bootstrapped and regtested on x86_64/nvptx.

Thanks,
Cesar
>From 035be51a795ad8bed5342ba181220bf3102bcd6d Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Wed, 31 Jan 2018 07:21:53 -0800
Subject: [PATCH 4/5] Enable firstprivate OpenACC reductions

2018-XX-YY  Cesar Philippidis  

	gcc/
	* gimplify.c (omp_add_variable): Allow certain OpenACC reduction
	variables to remain firstprivate.

	gcc/testsuite/
	* c-c++-common/goacc/reduction-8.c: New test.

(cherry picked from openacc-gcc-7-branch commit
441621739e2a067c97409f8b0e3e30362a7905be, cec00212ad8)
---
 gcc/gimplify.c| 30 --
 .../c-c++-common/goacc/reduction-8.c  | 94 +++
 2 files changed, 117 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/reduction-8.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 737a280cfe9..bcfb029275c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6858,9 +6858,16 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   else
 splay_tree_insert (ctx->variables, (splay_tree_key)decl, flags);
 
-  /* For reductions clauses in OpenACC loop directives, by default create a
- copy clause on the enclosing parallel construct for carrying back the
- results.  */
+  /* For OpenACC loop directives, when a reduction is immediately
+ enclosed within an acc parallel or kernels construct, it must
+ have an implied copy data mapping. E.g.
+
+   #pragma acc parallel
+	 {
+	   #pragma acc loop reduction (+:sum)
+
+ a copy clause for sum should be added on the enclosing parallel
+ construct for carrying back the results.  */
   if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
 {
   struct gimplify_omp_ctx *outer_ctx = ctx->outer_context;
@@ -6876,8 +6883,11 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 	vector = true;
 	}
 
-  /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
-  bool map_private;
+  /* Reduction data maps need to be marked as private for worker
+	 and vector loops, in order to ensure that value of the
+	 reduction carried back to the host.  Set new copy map as
+	 'private' if sure we're not gang-partitioning.  */
+  bool map_private, update_data_map = false;
 
   if (gang)
 	map_private = false;
@@ -6886,6 +6896,10 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   else
 	map_private = oacc_privatize_reduction (ctx->outer_context);
 
+  if (ctx->outer_context
+	  && ctx->outer_context->region_type == ORT_ACC_PARALLEL)
+	update_data_map = true;
+
   while (outer_ctx)
 	{
 	  n = splay_tree_lookup (outer_ctx->variables, (splay_tree_key)decl);
@@ -6902,7 +6916,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 		  gcc_assert (!(n->value & GOVD_FIRSTPRIVATE)
 			  && (n->value & GOVD_MAP));
 		}
-	  else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+	  else if (update_data_map
+		   && outer_ctx->region_type == ORT_ACC_PARALLEL)
 		{
 		  /* Remove firstprivate and make it a copy map.  */
 		  n->value &= ~GOVD_FIRSTPRIVATE;
@@ -6914,7 +6929,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 		n->value |= GOVD_MAP_PRIVATE;
 		}
 	}
-	  else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+	  else if (update_data_map
+		   && outer_ctx->region_type == ORT_ACC_PARALLEL)
 	{
 	  unsigned f = GOVD_MAP | GOVD_SEEN;
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-8.c b/gcc/testsuite/c-c++-common/goacc/reduction-8.c
new file mode 100644
index 000..8a0283f4ac3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-8.c
@@ -0,0 +1,94 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  int result, array[n];
+
+#pragma acc parallel loop reduction (+:result)
+  for (i = 0; i < n; i++)
+result ++;
+
+#pragma acc parallel
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+result ++;
+
+#pragma acc parallel
+#pragma acc loop
+  for (i = 0; i < n; i++)
+{
+  result = i;
+
+#pragma acc loop reduction(+:result)
+  for (j = 0; j < n; j++)
+	result ++;
+
+  array[i] = result;
+}
+
+#pragma acc parallel
+#pragma acc loop
+  for (i = 0; i < n; i++)
+{
+  result = i;
+
+#pragma acc loop worker vector reduction(+:result)
+  for (j = 0; j < n; j++)
+	resu

[PATCH] Adjust offsets for present data clauses

2018-07-20 Thread Cesar Philippidis

This is another old gomp4 patch that corrects a bug where the runtime
was passing the wrong offset for subarray data to the accelerator. The
original description of this patch can be found here


I bootstrapped and regtested on x86_64/nvptx. Is it OK for trunk?

Thanks,
Cesar
>From fb743d8a45193c177cb0082400d140949e8c1e6d Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Wed, 24 Aug 2016 00:02:50 +
Subject: [PATCH 5/5] [libgomp, OpenACC] Adjust offsets for present data
 clauses

2018-XX-YY  Cesar Philippidis  

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Add offset to devaddrs.
	* testsuite/libgomp.oacc-c-c++-common/data_offset.c: New test.
	* testsuite/libgomp.oacc-fortran/data_offset.f90: New test.

(cherry picked from gomp-4_0-branch r239723, 00c2585)
---
 libgomp/oacc-parallel.c   | 10 -
 .../libgomp.oacc-c-c++-common/data_offset.c   | 41 ++
 .../libgomp.oacc-fortran/data_offset.f90  | 43 +++
 3 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/data_offset.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/data_offset.f90

diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index b80ace58590..20e9ab2e251 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -231,8 +231,14 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
 
   devaddrs = gomp_alloca (sizeof (void *) * mapnum);
   for (i = 0; i < mapnum; i++)
-devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
-			+ tgt->list[i].key->tgt_offset);
+{
+  if (tgt->list[i].key != NULL)
+	devaddrs[i] = (void *) (tgt->list[i].key->tgt->tgt_start
++ tgt->list[i].key->tgt_offset
++ tgt->list[i].offset);
+  else
+	devaddrs[i] = NULL;
+}
 
   acc_dev->openacc.exec_func (tgt_fn, mapnum, hostaddrs, devaddrs,
 			  async, dims, tgt);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data_offset.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/data_offset.c
new file mode 100644
index 000..ccbbfcab87b
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data_offset.c
@@ -0,0 +1,41 @@
+/* Test present data clauses in acc offloaded regions when the
+   subarray inside the present clause does not have the same base
+   offset value as the subarray in the enclosing acc data or acc enter
+   data variable.  */
+
+#include 
+
+void
+offset (int *data, int n)
+{
+  int i;
+
+#pragma acc parallel loop present (data[0:n])
+  for (i = 0; i < n; i++)
+data[i] = n;
+}
+
+int
+main ()
+{
+  const int n = 30;
+  int data[n], i;
+
+  for (i = 0; i < n; i++)
+data[i] = -1;
+
+#pragma acc data copy(data[0:n])
+  {
+offset (data+10, 10);
+  }
+
+  for (i = 0; i < n; i++)
+{
+  if (i < 10 || i >= 20)
+	assert (data[i] == -1);
+  else
+	assert (data[i] == 10);
+}
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/data_offset.f90 b/libgomp/testsuite/libgomp.oacc-fortran/data_offset.f90
new file mode 100644
index 000..ff8ee39f964
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/data_offset.f90
@@ -0,0 +1,43 @@
+! Test present data clauses in acc offloaded regions when the subarray
+! inside the present clause does not have the same base offset value
+! as the subarray in the enclosing acc data or acc enter data variable.
+
+program test
+  implicit none
+
+  integer, parameter :: n = 30, m = 10
+  integer :: i
+  integer, allocatable :: data(:)
+  logical bounded
+
+  allocate (data(n))
+
+  data(:) = -1
+
+  !$acc data copy (data(5:20))
+  call test_data (data, n, m)
+  !$acc end data
+
+  do i = 1, n
+ bounded = i < m .or. i >= m+m
+ if (bounded .and. (data(i) /= -1)) then
+call abort
+ else if (.not. bounded .and. data(i) /= 10) then
+call abort
+ end if
+  end do
+
+  deallocate (data)
+end program test
+
+subroutine test_data (data, n, m)
+  implicit none
+
+  integer :: n, m, data(n), i
+
+  !$acc parallel loop present (data(m:m))
+  do i = m, m+m-1
+ data(i) = m
+  end do
+  !$acc end parallel loop
+end subroutine test_data
-- 
2.17.1

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-20 Thread Martin Sebor


On 07/20/2018 08:51 AM, Bernd Edlinger wrote:

Martin,

when I look at tree-ssa-forwprop.c:

   str1 = string_constant (src1, &off1);
   if (str1 == NULL_TREE)
 break;
   if (!tree_fits_uhwi_p (off1)
   || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
   || compare_tree_int (len1, TREE_STRING_LENGTH (str1)
  - tree_to_uhwi (off1)) > 0
   || TREE_CODE (TREE_TYPE (str1)) != ARRAY_TYPE
   || TYPE_MODE (TREE_TYPE (TREE_TYPE (str1)))
  != TYPE_MODE (char_type_node))
 break;

I don't think it is a good idea to let string_constant return
strings which have not necessarily valid content after the first
NUL byte.  It looks like the content has to be valid up to
TREE_STRING_LENGTH.


I'm not sure I understand when this could happen.  From the patch
below it sounds like you are concerned about cases like:

  const char a[4] = "abc\000\000";

where the STRING_CST is bigger than the array.  But the string
constant has valid content here up to TREE_STRING_LENGTH.  Am
I missing something?  (A test case would be helpful.)

In any case, unless the concern is specifically related to
this bug fix let's discuss it separately so I can fix this
bug.  I'm not opposed to making further changes here (in
fact, I have one in the queue and two others that I raised
in Bugzilla in response to our discussion so far), I just
want to avoid mission creep.

Martin



So may I suggest to do something like the following
(diff to your last patch):

--- expr.c.jj   2018-07-20 10:51:51.983605588 +0200
+++ expr.c  2018-07-20 15:07:29.769423479 +0200
@@ -11277,6 +11277,7 @@ tree
  string_constant (tree arg, tree *ptr_offset)
  {
tree array;
+  tree array_size;
STRIP_NOPS (arg);

/* Non-constant index into the character array in an ARRAY_REF
@@ -11295,9 +11296,11 @@ string_constant (tree arg, tree *ptr_off

arg = TREE_OPERAND (arg, 0);
tree ref = arg;
+  tree reftype = TREE_TYPE (ref);
if (TREE_CODE (arg) == ARRAY_REF)
{
  tree idx = TREE_OPERAND (arg, 1);
+ reftype = TREE_TYPE (TREE_OPERAND (arg, 0));
  if (TREE_CODE (idx) != INTEGER_CST
  && TREE_CODE (argtype) == POINTER_TYPE)
{
@@ -11313,6 +11316,11 @@ string_constant (tree arg, tree *ptr_off
return NULL_TREE;
}
}
+  if (TREE_CODE (reftype) != ARRAY_TYPE)
+   return NULL_TREE;
+  while (TREE_CODE (TREE_TYPE (reftype)) == ARRAY_TYPE)
+   reftype = TREE_TYPE (reftype);
+  array_size = TYPE_SIZE_UNIT (reftype);
array = get_addr_base_and_unit_offset (ref, &base_off);
if (!array
  || (TREE_CODE (array) != VAR_DECL
@@ -11345,7 +11353,10 @@ string_constant (tree arg, tree *ptr_off
return NULL_TREE;
  }
else if (DECL_P (arg))
-array = arg;
+{
+  array = arg;
+  array_size = DECL_SIZE_UNIT (array);
+}
else
  return NULL_TREE;

@@ -11410,24 +11421,18 @@ string_constant (tree arg, tree *ptr_off
if (!init || TREE_CODE (init) != STRING_CST)
  return NULL_TREE;

-  tree array_size = DECL_SIZE_UNIT (array);
if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
  return NULL_TREE;

/* Avoid returning a string that doesn't fit in the array
- it is stored in, like
+ it is stored in, like:
   const char a[4] = "abcde";
- but do handle those that fit even if they have excess
- initializers, such as in
- const char a[4] = "abc\000\000";
- The excess elements contribute to TREE_STRING_LENGTH()
- but not to strlen().  */
-  unsigned HOST_WIDE_INT charsize
-= tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (init;
+ ... or:
+ const char a[4] = "abc\000";
+ ... because some callers access the string up to the limit
+ imposed by TREE_STRING_LENGTH.  */
unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
-  length = string_length (TREE_STRING_POINTER (init), charsize,
- length / charsize);
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (compare_tree_int (array_size, length) < 0)
  return NULL_TREE;

*ptr_offset = offset;


Considering Richard's last comment, I don't know if TYPE_SIZE_UNIT
of the ARRAY_TYPE is the correct way to get the size of the
innermost arrayhere, but I think we will need it.



Bernd.

Re: [PATCH] fix a couple of bugs in const string folding (PR 86532)

2018-07-20 Thread Martin Sebor


On 07/20/2018 03:11 PM, Bernd Edlinger wrote:

I think I have found a valid test case where the latest patch
does generate invalid code:


This is due to another bug in string_constant() that's latent
on trunk but that's exposed by the patch.  Trunk only "works"
because of a bug/limitation in c_strlen() that the patch fixes
or removes.

The underlying root cause is the handling of POINTER_PLUS
expressions in string_constant().  The original code (before
the handling of aggregates was added) just dealt with string
constants.  The new code does much more but doesn't get it
quite right in these cases.

It shouldn't be too difficult to fix, but as valuable as
the testing you are doing is, I'd really prefer to focus
on one problem at a time and make incremental progress.
It will make it easier to track down and fix regressions
than lumping multiple fixes into a larger patch.

Martin



$ cat part.c
static const char a[3][8] = { "1234", "12345", "123456" };

int main ()
{
   volatile int i = 1;
   int n = __builtin_strlen (*(&a[1]+i));

   if (n != 6)
 __builtin_abort ();
}
$ gcc part.c -fdump-tree-original
$ ./a.out
Aborted (core dumped)
$ cat part.c.004t.original

;; Function main (null)
;; enabled by -tree-original


{
   volatile int i = 1;
   int n = (ssizetype) (SAVE_EXPR <(long unsigned int) i * 8>) <= 5 ? (int) (5 - 
(unsigned int) (SAVE_EXPR <(long unsigned int) i * 8>)) : 0;

 volatile int i = 1;
 int n = (ssizetype) (SAVE_EXPR <(long unsigned int) i * 8>) <= 5 ? (int) (5 - 
(unsigned int) (SAVE_EXPR <(long unsigned int) i * 8>)) : 0;
   if (n != 6)
 {
   __builtin_abort ();
 }
}
return 0;


string_constant is called with arg = (const char *) (&a[1] + (sizetype) ((long 
unsigned int) i * 8))

Re: [RFC] Induction variable candidates not sufficiently general

2018-07-20 Thread Bin.Cheng

On Tue, Jul 17, 2018 at 2:08 AM, Kelvin Nilsen  wrote:
> Thanks for looking at this for me.  In simplifying the test case for a bug 
> report, I've narrowed the "problem" to integer overflow considerations.  My 
> len variable is declared int, and the target has 64-bit pointers.  I'm 
> gathering that the "manual transformation" I quoted below is not considered 
> "equivalent" to the original source code due to different integer overflow 
> behaviors.  If I redeclare len to be unsigned long long, then I automatically 
> get the optimizations that I was originally expecting.
>
> I suppose this is really NOT a bug?
As your test case demonstrates, it is caused by wrapping unsigned int32.
>
> Is there a compiler optimization flag that allows the optimizer to ignore 
> array index integer overflow in considering legal optimizations?
I am not aware of one for unsigned integer, and I guess it won't be
introduced in the future either?

Thanks,
bin
>
>
>
> On 7/13/18 9:14 PM, Bin.Cheng wrote:
>> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen  
>> wrote:
>>> A somewhat old "issue report" pointed me to the code generated for a 4-fold 
>>> manually unrolled version of the following loop:
>>>
   while (++len != len_limit) /* this is loop */
   if (pb[len] != cur[len])
   break;
>>>
>>> As unrolled, the loop appears as:
>>>
 while (++len != len_limit) /* this is loop */ {
   if (pb[len] != cur[len])
 break;
   if (++len == len_limit)  /* unrolled 2nd iteration */
 break;
   if (pb[len] != cur[len])
 break;
   if (++len == len_limit)  /* unrolled 3rd iteration */
 break;
   if (pb[len] != cur[len])
 break;
   if (++len == len_limit)  /* unrolled 4th iteration */
 break;
   if (pb[len] != cur[len])
 break;
 }
>>>
>>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the 
>>> only induction variable candidates that are being considered are all forms 
>>> of the len variable.  We are not considering any induction variables to 
>>> represent the address expressions &pb[len] and &cur[len].
>>>
>>> I rewrote the source code for this loop to make the addressing expressions 
>>> more explicit, as in the following:
>>>
   cur++;
   while (++pb != last_pb) /* this is loop */ {
   if (*pb != *cur)
 break;
   ++cur;
   if (++pb == last_pb)  /* unrolled 2nd iteration */
 break;
   if (*pb != *cur)
 break;
   ++cur;
   if (++pb == last_pb)  /* unrolled 3rd iteration */
 break;
   if (*pb != *cur)
 break;
   ++cur;
   if (++pb == last_pb)  /* unrolled 4th iteration */
 break;
   if (*pb != *cur)
 break;
   ++cur;
   }
>>>
>>> Now, gcc does a better job of identifying the "address expression induction 
>>> variables".  This version of the loop runs about 10% faster than the 
>>> original on my target architecture.
>>>
>>> This would seem to be a textbook pattern for the induction variable 
>>> analysis.  Does anyone have any thoughts on the best way to add these 
>>> candidates to the set of induction variables that are considered by 
>>> tree-ssa-loop-ivopts.c?
>>>
>>> Thanks in advance for any suggestions.
>>>
>> Hi,
>> Could you please file a bug with your original slow test code
>> attached?  I tried to construct meaningful test case from your code
>> snippet but not successful.  There is difference in generated
>> assembly, but it's not that fundamental.  So a bug with preprocessed
>> test would be high appreciated.
>> I think there are two potential issues in cost computation for such
>> case: invariant expression and iv uses outside of loop handled as
>> inside uses.
>>
>> Thanks,
>> bin
>>
>>

76 matches

Mail list logo