[rs6000] Add documentation for __builtin_mtfsf

2019-07-21 Thread Paul Clarke


2019-07-21  Paul A. Clarke  

[gcc]

* doc/extend.texi: Add documentation for __builtin_mtfsf.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 273615)
+++ gcc/doc/extend.texi (working copy)
@@ -16848,6 +16848,7 @@ unsigned long __builtin_ppc_mftb ();
 double __builtin_unpack_ibm128 (__ibm128, int);
 __ibm128 __builtin_pack_ibm128 (double, double);
 double __builtin_mffs (void);
+double __builtin_mtfsf (const int, double);
 void __builtin_mtfsb0 (const int);
 void __builtin_mtfsb1 (const int);
 void __builtin_set_fpscr_rn (int);
@@ -16864,6 +16865,10 @@ return the value of the FPSCR register.  Note, ISA
 @code{__builtin_mffsl()} which permits software to read the control and
 non-sticky status bits in the FSPCR without the higher latency associated with
 accessing the sticky status bits.  The
+@code{__builtin_mtfsf} takes a constant 8-bit integer field mask and a
+representation of the new value of the FPSCR and generates the @code{mtfsf}
+instruction to copy the supplied value into the FPSCR, subject to the field
+mask, each bit of which represents a nibble of the FPSCR.  The
 @code{__builtin_mtfsb0} and @code{__builtin_mtfsb1} take the bit to change
 as an argument.  The valid bit range is between 0 and 31.  The builtins map to
 the @code{mtfsb0} and @code{mtfsb1} instructions which take the argument and

--
PC



Re: [PATCH] i386: Expand roundeven for SSE4.1+

2019-07-21 Thread Tejas Joshi
Hi.
Thanks for the heads up. I did a successful bootstrap build and
regression testing on x86_64-linux-gnu for the above patch and have
the test summary diff for the patched and unpatched versions, but I do not
know whether it has passed the regression test, hence I am attaching the
diff here.

Thanks,
Tejas

On Mon, 15 Jul 2019 at 01:48, Uros Bizjak  wrote:
>
> > This patch is for expanding roundeven inline for SSE4.1 and later.
> > Note that this patch is to be applied on top of
> > . The patch
> > is bootstrapped and regression tested on x86_64-linux-gnu.
>
> Actually, your patch at [1] is the way to go, but you need several
> other changes to get x87 mode switching in order. Please also note
> that there is no corresponding non-SSE4 ix86_expand_... function for
> roundeven, so non-SSE4 SSE FP-math 2 expander has
> to be disabled for ROUNDEVEN int iterator. Please see (otherwise
> untested) attached patch which fixes both issues.
>
> [1] https://gcc.gnu.org/ml/gcc/2019-06/msg00352.html
>
> Uros.
diff --git a/../patches/patched b/../patches/unpatched
index e69de29bb2d..7726f002aae 100644
--- a/../patches/patched
+++ b/../patches/unpatched
@@ -0,0 +1,251 @@
+cat <<'EOF' |
+Native configuration is x86_64-pc-linux-gnu
+
+   === gcc tests ===
+
+
+Running target unix
+XPASS: gcc.dg/guality/example.c   -O0  execution test
+XPASS: gcc.dg/guality/example.c   -O1  -DPREVENT_OPTIMIZATION  execution test
+XPASS: gcc.dg/guality/example.c   -O2  -DPREVENT_OPTIMIZATION  execution test
+XPASS: gcc.dg/guality/example.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test
+XPASS: gcc.dg/guality/example.c  -Og -DPREVENT_OPTIMIZATION  execution test
+XPASS: gcc.dg/guality/guality.c   -O0  execution test
+XPASS: gcc.dg/guality/guality.c   -O1  -DPREVENT_OPTIMIZATION  execution test
+XPASS: gcc.dg/guality/guality.c   -O2  -DPREVENT_OPTIMIZATION  execution test
+XPASS: gcc.dg/guality/guality.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test
+XPASS: gcc.dg/guality/guality.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION execution test
+XPASS: gcc.dg/guality/guality.c   -O3 -g  -DPREVENT_OPTIMIZATION  execution 
test
+XPASS: gcc.dg/guality/guality.c   -Os  -DPREVENT_OPTIMIZATION  execution test
+XPASS: gcc.dg/guality/guality.c  -Og -DPREVENT_OPTIMIZATION  execution test
+XPASS: gcc.dg/guality/inline-params.c   -O2  -DPREVENT_OPTIMIZATION  execution 
test
+XPASS: gcc.dg/guality/inline-params.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test
+XPASS: gcc.dg/guality/inline-params.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION execution test
+XPASS: gcc.dg/guality/inline-params.c   -O3 -g  -DPREVENT_OPTIMIZATION  
execution test
+XPASS: gcc.dg/guality/inline-params.c   -Os  -DPREVENT_OPTIMIZATION  execution 
test
+FAIL: gcc.dg/guality/loop-1.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  -DPREVENT_OPTIMIZATION  line 20 i == 1
+XPASS: gcc.dg/guality/pr41353-1.c   -O0  line 28 j == 28 + 37
+XPASS: gcc.dg/guality/pr41353-1.c  -Og -DPREVENT_OPTIMIZATION  line 28 j == 28 
+ 37
+FAIL: gcc.dg/guality/pr41616-1.c   -O2  -DPREVENT_OPTIMIZATION  execution test
+FAIL: gcc.dg/guality/pr41616-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION execution test
+FAIL: gcc.dg/guality/pr41616-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  execution 
test
+FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 20 y == 25
+FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 20 z == 6
+FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 23 y == 117
+FAIL: gcc.dg/guality/pr54519-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 23 z == 8
+FAIL: gcc.dg/guality/pr54519-2.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 17 y == 25
+FAIL: gcc.dg/guality/pr54519-2.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 17 y == 25
+FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 23 y == 117
+FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 23 z == 8
+FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 23 y == 117
+FAIL: gcc.dg/guality/pr54519-3.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 23 z == 8
+FAIL: gcc.dg/guality/pr54519-4.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT

[PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops

2019-07-21 Thread Fredrik Noring
Hi Paul, Matthew,

Paul -- as I'm preparing the R5900 kernel patches there was a USB DMA series
and breakage that needed attention. The fixes ending with ff2437befd8f ("usb:
host: Fix excessive alignment restriction for local memory allocations") are
now merged with Linus' kernel, and recommended for the R5900 series. The
initial R5900 patch submission is getting closer. :)

The present problem is related to GCC and the R5900 short loop bug[1]. It
turns out GCC emits assembly code like

loop:   addiu   $5,$5,1
        addiu   $4,$4,1
        lb      $2,-1($5)
        .set    noreorder
        .set    nomacro
        bne     $2,$3,loop
        sb      $2,-1($4)
        .set    macro
        .set    reorder

that is undefined for the R5900 (this short loop has five instructions),
for simple C code such as

        while ((*s++ = *p++) != '\n')
                ;

in the kernel. The noreorder directive prohibits GAS from corrections, and
GAS really ought to give an error for it, I think. In the meantime, I have a
tool that does machine code analysis of ELF objects to catch undefined R5900
short loops, including those made with assembler macros in the kernel.

[ In theory, GAS could actually insert NOPs prior to the noreorder directive,
to make the loop longer than six instructions, but GAS does not have that
kind of capability. Another option that GCC prevents is to place a NOP in
the delay slot. ]

A reasonable fix for GCC is perhaps to update gcc/config/mips/mips.md to not
make explicit use of the branch delay slot, as suggested by the patch below?
Then GCC will emit

loop:   addiu   $5,$5,1
        addiu   $4,$4,1
        lb      $2,-1($5)
        sb      $2,-1($4)
        bne     $2,$3,loop

that GAS will adjust in the ELF object to

   4:   24a50001    addiu   a1,a1,1
   8:   24840001    addiu   a0,a0,1
   c:   80a2        lb      v0,-1(a1)
  10:   a082        sb      v0,-1(a0)
  14:   1443fffb    bne     v0,v1,4
  18:               nop

where a NOP is placed in the delay slot to avoid the bug. For longer loops,
this relies on GAS making appropriate use of the delay slot. I'm not sure if
GAS is good at that, though?

Fredrik

References:

[1] 
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gas/config/tc-mips.c;h=b7b4b6989a12d3091a02de7155fbea3adbf1c9d7;hb=HEAD#l7133

On the R5900 short loops need to be fixed by inserting a NOP in the
branch delay slot.

The short loop bug under certain conditions causes loops to execute
only once or twice.  We must ensure that the assembler never
generates loops that satisfy all of the following conditions:

- a loop consists of less than or equal to six instructions
  (including the branch delay slot);
- a loop contains only one conditional branch instruction at the end
  of the loop;
- a loop does not contain any other branch or jump instructions;
- a branch delay slot of the loop is not NOP (EE 2.9 or later).

We need to do this because of a hardware bug in the R5900 chip.
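
The four conditions above can be sketched as a small predicate. This is only
an illustration in C++, not the GAS code and not the ELF analysis tool
mentioned earlier; the Kind classification and the function name are
hypothetical, and the loop is assumed to be given as the instructions from
the branch target up to and including the delay slot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction classification, for illustration only.
enum class Kind { Other, Nop, CondBranch, OtherBranchOrJump };

// `loop` holds the instructions from the branch target up to and including
// the branch delay slot, so the conditional branch is the second-to-last entry.
bool matches_short_loop_conditions(const std::vector<Kind> &loop)
{
  const std::size_t n = loop.size();
  if (n < 2 || n > 6)                     // six instructions or fewer
    return false;
  if (loop[n - 2] != Kind::CondBranch)    // conditional branch ends the loop
    return false;
  for (std::size_t i = 0; i < n; ++i) {   // no other branch or jump anywhere
    if (i == n - 2)
      continue;
    if (loop[i] == Kind::CondBranch || loop[i] == Kind::OtherBranchOrJump)
      return false;
  }
  return loop[n - 1] != Kind::Nop;        // delay slot is not a NOP
}
```

With this model, the five-instruction loop that GCC emits (store in the delay
slot) matches all conditions, while the same loop with a NOP in the delay slot
does not.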

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index e17b1d522f0..acd31a8960c 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -1091,6 +1091,7 @@
 
 (define_delay (and (eq_attr "type" "branch")
   (not (match_test "TARGET_MIPS16"))
+  (not (match_test "TARGET_FIX_R5900"))
   (eq_attr "branch_likely" "yes"))
   [(eq_attr "can_delay" "yes")
(nil)
@@ -1100,6 +1101,7 @@
 ;; not annul on false.
 (define_delay (and (eq_attr "type" "branch")
   (not (match_test "TARGET_MIPS16"))
+  (not (match_test "TARGET_FIX_R5900"))
   (ior (match_test "TARGET_CB_NEVER")
(and (eq_attr "compact_form" "maybe")
 (not (match_test "TARGET_CB_ALWAYS")))


Re: [PATCH] [rs6000] Add _mm_blend_epi16 and _mm_blendv_epi8

2019-07-21 Thread Segher Boessenkool
Hi Paul,

All looks fine, okay for trunk.  Thanks!

Just some possible improvements:

On Fri, Jul 19, 2019 at 10:18:47PM -0500, Paul Clarke wrote:
> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))

Maybe all these terribly long lines would be better if they used a
macro?  Something defined in xmmintrin.h I guess, and just for the
attribute part?

> +_mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
> +{
> +  __v8hu __bitmask = vec_splats ((unsigned short) __imm8);
> +  const __v8hu __shifty = { 0, 1, 2, 3, 4, 5, 6, 7 };
> +  __bitmask = vec_sr (__bitmask, __shifty);
> +  const __v8hu __ones = vec_splats ((unsigned short) 0x0001);
> +  __bitmask = vec_and (__bitmask, __ones);
> +  const __v8hu __zero = {0};
> +  __bitmask = vec_sub (__zero, __bitmask);
> +  return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __bitmask);
> +}

You can do a lot better than this, using vgbbd (that's vec_gb in
intrinsics).  It's probably nicest if you splat the __imm8 to all
bytes in a vector, then do the vgbbd, and then you can immediately
vec_sel with the result of that.
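
For reference, a portable scalar model of what the quoted _mm_blend_epi16 is
meant to compute may help when comparing alternative sequences.  This is a
sketch only, not the rs6000 implementation: it assumes bit i of the immediate
selects halfword i from the second operand, mirroring the shift, and, negate
steps in the quoted code.  The names V8 and blend_epi16_model are made up for
the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<std::uint16_t, 8>;

// Scalar model: element i comes from b if bit i of imm8 is set, else from a.
V8 blend_epi16_model(const V8 &a, const V8 &b, int imm8)
{
  V8 r{};
  for (int i = 0; i < 8; ++i) {
    // Same steps as the quoted code: shift right by i, keep the low bit,
    // then subtract from zero to get an all-zeros or all-ones halfword.
    const unsigned bit = (static_cast<unsigned>(imm8) >> i) & 1u;
    const std::uint16_t mask = static_cast<std::uint16_t>(0u - bit);
    r[i] = static_cast<std::uint16_t>((a[i] & ~mask) | (b[i] & mask));
  }
  return r;
}
```

Any faster vector sequence should agree with this model for all 256 values of
the immediate.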

> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
> +{
> +  const __v16qu __hibits = vec_splats ((unsigned char) 0x80);
> +  __v16qu __lmask = vec_and ((__v16qu) __mask, __hibits);
> +  const __v16qu __zero = {0};
> +  __lmask = (vector unsigned char) vec_cmpgt (__lmask, __zero);
> +  return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> +}

Can you do this with just a vsrab / vec_sra?  Splat imm 7 to a vec,
sra by that?
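
The idea behind the shift can be shown per byte in plain C++.  This is a
scalar sketch under the assumption that right-shifting a negative value is an
arithmetic shift (guaranteed from C++20, and what GCC does in practice); it is
not the vector code itself, and blendv_byte_model is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one byte lane of _mm_blendv_epi8: the top bit of the mask
// byte decides whether the result comes from b (set) or from a (clear).
std::uint8_t blendv_byte_model(std::uint8_t a, std::uint8_t b, std::uint8_t mask)
{
  // Arithmetic shift right by 7 replicates the sign bit: result is 0 or -1.
  const int spread = static_cast<std::int8_t>(mask) >> 7;
  const std::uint8_t select = static_cast<std::uint8_t>(spread); // 0x00 or 0xFF
  return static_cast<std::uint8_t>((a & ~select) | (b & select));
}
```

Only bit 7 of each mask byte matters, so 0x80 and 0xFF behave the same, as do
0x00 and 0x7F.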


Segher


Re: [PATCH], Patch #10, move PowerPC data structures & helper functions from rs6000.c to rs6000-internal.h

2019-07-21 Thread Segher Boessenkool
On Sat, Jul 20, 2019 at 12:13:08PM -0400, Michael Meissner wrote:
> I will be iterating on patch #9 and sending out a replacement shortly.
> 
> This is patch #10.  It moves the various data structures from rs6000.c to

I'll review this tomorrow, fwiw.

A general request: please don't send patches as replies to other emails.


Segher


Re: [rs6000] Add documentation for __builtin_mtfsf

2019-07-21 Thread Segher Boessenkool
Hi Paul,

On Sun, Jul 21, 2019 at 04:06:32AM -0500, Paul Clarke wrote:
> @@ -16864,6 +16865,10 @@ return the value of the FPSCR register.  Note, ISA
>  @code{__builtin_mffsl()} which permits software to read the control and
>  non-sticky status bits in the FSPCR without the higher latency associated 
> with
>  accessing the sticky status bits.  The
> +@code{__builtin_mtfsf} takes a constant 8-bit integer field mask and a
> +representation of the new value of the FPSCR and generates the @code{mtfsf}
> +instruction to copy the supplied value into the FPSCR, subject to the field
> +mask, each bit of which represents a nibble of the FPSCR.  The
>  @code{__builtin_mtfsb0} and @code{__builtin_mtfsb1} take the bit to change
>  as an argument.  The valid bit range is between 0 and 31.  The builtins map 
> to
>  the @code{mtfsb0} and @code{mtfsb1} instructions which take the argument and

"A representation of the new value"?  I guess you want to say that it
sits in an FPR?

Before we document __builtin_mtfsf, maybe we should make it work with
the W and/or L fields first, or at least, decide how we want that?


Segher


[libgomp, WIP, GSoC'19] Modification to a single queue, single execution path.

2019-07-21 Thread 김규래
Finished unifying the three queues to team->task_queue.
All the tests passed except some unsupported target tests. 
 
=== libgomp Summary ===
 
# of expected passes 6749
# of expected failures 4
# of unsupported tests 349
 
I also tried to make taskwait_end, taskgroup_end and maybe_wait_for_dependencies
share the same execution routines.
The current state of that is pretty rough.
I think there are too many mutex locks and unlocks.
I haven't tested the performance of this patch yet; I'll try that soon for the
sake of curiosity.
I'll try to reduce the locked regions and then split the queues into a
multiqueue.
 
2019-07-22  Khu-rai Kim  
 
* libgomp/libgomp.h: Removed task->children_queue,
taskgroup->children_queue, added children counter for taskgroup. 
Added a new task kind, GOMP_DONE to track the lifecycle of 
dangling parents.
* libgomp/task.c: Unified all queues into team->task_queue.
taskwait_end, taskgroup_end, maybe_wait_for_dependencies share
the same task execution routine. Parents that finished executing with
remaining children are kept until all their children are done
executing.
* libgomp/taskloop.c: Unified all queues into team->task_queue.
 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9f433160ab5..3a615f1d9af 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -405,7 +405,11 @@ enum gomp_task_kind
  but not yet completed.  Once that completes, they will be readded
  into the queues as GOMP_TASK_WAITING in order to perform the var
  unmapping.  */
-  GOMP_TASK_ASYNC_RUNNING
+  GOMP_TASK_ASYNC_RUNNING,
+
+  /* The task is left only for dependency tracking purpose
+ and is ready to be freed anytime. */
+  GOMP_DONE
 };
 
 struct gomp_task_depend_entry
@@ -447,7 +451,7 @@ struct gomp_task
   /* Parent of this task.  */
   struct gomp_task *parent;
   /* Children of this task.  */
-  struct priority_queue children_queue;
+  /* struct priority_queue children_queue; */
   /* Taskgroup this task belongs in.  */
   struct gomp_taskgroup *taskgroup;
   /* Tasks that depend on this task.  */
@@ -461,13 +465,16 @@ struct gomp_task
  into the various queues to be scheduled.  */
   size_t num_dependees;
 
+  /* Number of childrens created and queued from this task. */
+  size_t num_children;
+
   /* Priority of this task.  */
   int priority;
   /* The priority node for this task in each of the different queues.
  We put this here to avoid allocating space for each priority
  node.  Then we play offsetof() games to convert between pnode[]
  entries and the gomp_task in which they reside.  */
-  struct priority_node pnode[3];
+  struct priority_node pnode;
 
   struct gomp_task_icv icv;
   void (*fn) (void *);
@@ -491,7 +498,7 @@ struct gomp_taskgroup
 {
   struct gomp_taskgroup *prev;
   /* Queue of tasks that belong in this taskgroup.  */
-  struct priority_queue taskgroup_queue;
+  /* struct priority_queue taskgroup_queue; */
   uintptr_t *reductions;
   bool in_taskgroup_wait;
   bool cancelled;
@@ -1211,7 +1218,7 @@ extern int gomp_test_nest_lock_25 (omp_nest_lock_25_t *) 
__GOMP_NOTHROW;
 static inline size_t
 priority_queue_offset (enum priority_queue_type type)
 {
-  return offsetof (struct gomp_task, pnode[(int) type]);
+  return offsetof (struct gomp_task, pnode);
 }
 
 /* Return the task associated with a priority NODE of type TYPE.  */
diff --git a/libgomp/task.c b/libgomp/task.c
index 15177ac8824..df822526c3f 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -81,11 +81,12 @@ gomp_init_task (struct gomp_task *task, struct gomp_task 
*parent_task,
   task->final_task = false;
   task->copy_ctors_done = false;
   task->parent_depends_on = false;
-  priority_queue_init (&task->children_queue);
+  // priority_queue_init (&task->children_queue);
   task->taskgroup = NULL;
   task->dependers = NULL;
   task->depend_hash = NULL;
   task->depend_count = 0;
+  task->num_children = 0;
 }
 
 /* Clean up a task, after completing it.  */
@@ -100,61 +101,6 @@ gomp_end_task (void)
   thr->task = task->parent;
 }
 
-/* Clear the parent field of every task in LIST.  */
-
-static inline void
-gomp_clear_parent_in_list (struct priority_list *list)
-{
-  struct priority_node *p = list->tasks;
-  if (p)
-do
-  {
- priority_node_to_task (PQ_CHILDREN, p)->parent = NULL;
- p = p->next;
-  }
-while (p != list->tasks);
-}
-
-/* Splay tree version of gomp_clear_parent_in_list.
-
-   Clear the parent field of every task in NODE within SP, and free
-   the node when done.  */
-
-static void
-gomp_clear_parent_in_tree (prio_splay_tree sp, prio_splay_tree_node node)
-{
-  if (!node)
-return;
-  prio_splay_tree_node left = node->left, right = node->right;
-  gomp_clear_parent_in_list (&node->key.l);
-#if _LIBGOMP_CHECKING_
-  memset (node, 0xaf, sizeof (*node));
-#endif
-  /* No need to remove the node from the tree.  We're nuking
- everything, so just free the nodes and our caller can clear the
- entire splay tree.  */
-  free (node);
-  gomp_clear_parent_in_tr

Re: [rs6000] Add documentation for __builtin_mtfsf

2019-07-21 Thread Paul Clarke
On 7/21/19 1:13 PM, Segher Boessenkool wrote:
> On Sun, Jul 21, 2019 at 04:06:32AM -0500, Paul Clarke wrote:
>> +@code{__builtin_mtfsf} takes a constant 8-bit integer field mask and a
>> +representation of the new value of the FPSCR and generates the @code{mtfsf}
>> +instruction to copy the supplied value into the FPSCR, subject to the field
>> +mask, each bit of which represents a nibble of the FPSCR.  The

> "A representation of the new value"?  I guess you want to say that it
> sits in an FPR?

It's the 2nd parameter to the builtin, so a "double".  It may or may not be in 
an FPR, but the user of the builtin doesn't really know or care.  (It'll 
eventually be in an FPR, of course, but the user has it in a variable.)  It's a 
"representation" because it's not actually the new value, because it gets 
written under mask.
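
To make "written under mask" concrete, here is a simplified software model in
C++.  It is a sketch, not the builtin: it assumes a 32-bit FPSCR image, assumes
mask value 0x01 selects the least significant nibble and 0x80 the most
significant one, and ignores any bits the hardware computes itself rather than
copies.  The helper name apply_field_mask is made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Model of a masked FPSCR update: each set bit in the 8-bit field mask lets
// one 4-bit field of new_value replace the same field of old_fpscr.
std::uint32_t apply_field_mask(std::uint32_t old_fpscr, std::uint32_t new_value,
                               unsigned field_mask)
{
  std::uint32_t bit_mask = 0;
  for (unsigned field = 0; field < 8; ++field)
    if (field_mask & (1u << field))
      bit_mask |= 0xFu << (4 * field);   // expand one mask bit to a nibble
  return (old_fpscr & ~bit_mask) | (new_value & bit_mask);
}
```

A mask of 0xff replaces the whole 32-bit image, while a mask of 0x00 leaves it
unchanged, which is why the second argument is only a candidate value.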

> Before we document __builtin_mtfsf, maybe we should make it work with
> the W and/or L fields first, or at least, decide how we want that?

It's been available but undocumented for ages.  Do you want me to thus document 
how it is currently implemented by including something like "...generates the 
mtfsf (extended mnemonic) instruction ..." ?

If you think the basic mnemonic form needs to be supported, that's a whole 
other piece of work ;-)

PC


Re: [PATCH] Simplify LTO section format.

2019-07-21 Thread Jeff Law
On 7/17/19 4:32 AM, Martin Liška wrote:
> Hi.
> 
> The patch is about simplified LTO ELF section header where
> want to make public fields major_version, minor_version and
> slim_object. The rest is implementation defined by GCC.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-07-15  Martin Liska  
> 
>   * lto-section-in.c (lto_get_section_data):
>   Use new function get_compression.
>   * lto-streamer-out.c (produce_lto_section): Use
>   set_compression to encode compression algorithm.
>   * lto-streamer.h (struct lto_section): Do not
>   use bitfields in the format.
> ---
>  gcc/lto-section-in.c   |  3 ++-
>  gcc/lto-streamer-out.c |  3 ++-
>  gcc/lto-streamer.h | 19 ---
>  3 files changed, 20 insertions(+), 5 deletions(-)
> 
> 
OK
jeff


Re: [PATCH][MSP430] Fix unnecessary saving of all callee-saved regs in an interrupt function that calls another function

2019-07-21 Thread Jeff Law
On 7/18/19 6:33 AM, Jozef Lawrynowicz wrote:
> The attached patch fixes an issue for msp430 where the logic to decide which
> registers need to be saved in an interrupt function was unnecessarily
> choosing to save all callee-saved registers regardless of whether they were
> used or not. This came at a code size and performance penalty for the 430 ISA,
> and a performance penalty for the 430X ISA.
> 
> Interrupt functions require special conventions for saving registers which
> would normally be caller-saved. Since the interrupt happens without warning,
> registers that would normally have been preserved by the caller of a function
> cannot be preserved when an interrupt is triggered. This means interrupts must
> save and restore the used caller-saved registers, in addition to the used
> callee-saved registers that a regular function would save.
> 
> If an interrupt is not a leaf function, all caller-saved registers must be
> saved/restored in the prologue/epilogue of the interrupt function, since it
> is unknown which of these will be modified in later functions.
> 
> We can rely on the function called by an interrupt to save and restore
> callee-saved registers, so it is unnecessary to save all callee-saved regs
> in the ISR. This is what this patch changes.
> 
> Successfully regtested for msp430-elf on trunk for C/C++.
> 
> Ok for trunk?
> 
> Thanks,
> Jozef
> 
> 
> 0001-MSP430-Fix-unnecessary-saving-of-all-callee-saved-re.patch
> 
> From 1e151dac2be34ae50bea8b4b37bd2d78c5f7ddd6 Mon Sep 17 00:00:00 2001
> From: Jozef Lawrynowicz 
> Date: Thu, 18 Jul 2019 09:25:52 +0100
> Subject: [PATCH] MSP430: Fix unnecessary saving of all callee-saved regs in an
>  ISR which calls another function
> 
> gcc/ChangeLog:
> 
> 2019-07-18  Jozef Lawrynowicz  
> 
>   * config/msp430/msp430.c (msp430_preserve_reg_p): Don't save
>   callee-saved regs R4->R10 in an interrupt function that calls another
>   function.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-07-18  Jozef Lawrynowicz  
> 
>   * gcc.target/msp430/isr-push-pop-main.c: New test.
>   * gcc.target/msp430/isr-push-pop-isr-430.c: Likewise.
>   * gcc.target/msp430/isr-push-pop-isr-430x.c: Likewise.
>   * gcc.target/msp430/isr-push-pop-leaf-isr-430.c: Likewise.
>   * gcc.target/msp430/isr-push-pop-leaf-isr-430x.c: Likewise.
OK
jeff
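
The register-saving rule described in the quoted patch can be summarised as a
small decision function.  This is a sketch of the described policy only, not
the code in msp430_preserve_reg_p; the function name and parameters are
hypothetical.

```cpp
#include <cassert>

// Should an interrupt function save and restore a given register?
//  - any register the ISR itself uses must be saved;
//  - in a non-leaf ISR, every caller-saved register must also be saved,
//    because the called functions may clobber it;
//  - unused callee-saved registers need no saving, since any function the
//    ISR calls preserves them itself.
bool isr_must_save(bool reg_is_callee_saved, bool reg_used_in_isr, bool isr_is_leaf)
{
  if (reg_used_in_isr)
    return true;
  if (!isr_is_leaf && !reg_is_callee_saved)
    return true;
  return false;
}
```

The case the patch changes is the unused callee-saved register in a non-leaf
ISR, which this model reports as not needing a save.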



Re: [PATCH] Do not emit __gnu_lto_v1 symbol.

2019-07-21 Thread Jeff Law
On 7/15/19 7:30 AM, Martin Liška wrote:
> Hi.
> 
> The patch is about removal of the emission of __gnu_lto_v1.
> The symbol should not be needed any longer for GCC driver.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-07-15  Martin Liska  
> 
>   * config/avr/avr.c (avr_asm_output_aligned_decl_common): Update
>   comment.
>   * toplev.c (compile_file): Do not emit __gnu_lto_v1 symbol.
> 
> libgcc/ChangeLog:
> 
> 2019-07-15  Martin Liska  
> 
>   * config/pa/stublib.c: Remove stub symbol __gnu_lto_v1.
>   * config/pa/t-stublib: Likewise.
> 
> libiberty/ChangeLog:
> 
> 2019-07-15  Martin Liska  
> 
>   * simple-object-elf.c (simple_object_elf_copy_lto_debug_sections):
>   Do not search for gnu_lto_v1, but search for first '\0'.
OK.
jeff


Re: Add dg test for matching function bodies

2019-07-21 Thread Jeff Law
On 7/16/19 8:04 AM, Richard Sandiford wrote:
> There isn't a 1:1 mapping from SVE intrinsics to SVE instructions,
> but the intrinsics are still close enough to the instructions for
> there to be a specific preferred sequence (or sometimes choice of
> preferred sequences) for a given combination of operands.  Sometimes
> these sequences will be one instruction, sometimes they'll be several.
> 
> I therefore wanted a convenient way of matching the exact assembly
> implementation of a given function.  It's possible to do that using
> single scan-assembler lines, but:
> 
> (a) they become hard to read for multiline matches
> (b) the PASS/FAIL lines tend to be long
> (c) it's useful to have a single place that skips over uninteresting
> lines, such as entry block labels and .cfi_* directives, without
> being overly broad
> 
> This patch therefore adds a new check-function-bodies dg-final test
> that looks for specially-formatted comments.  As a demo, the patch
> converts the SVE vec_init tests to use the new harness instead of
> scan-assembler.
> 
> The regexps in parse_function_bodies are fairly general, but might
> still need to be extended in future for targets like Darwin or AIX.
> 
> Tested on aarch64-linux-gnu (and x86_64-linux-gnu, somewhat pointlessly
> given the contents of the patch).  OK to install?
> 
> Richard
> 
> 
> 2019-07-16  Richard Sandiford  
> 
> gcc/
>   * doc/sourcebuild.texi (check-function-bodies): Document.
> 
> gcc/testsuite/
>   * lib/scanasm.exp (parse_function_bodies, check_function_body)
>   (check-function-bodies): New procedures.
>   * gcc.target/aarch64/sve/init_1.c: Use check-function-bodies
>   instead of scan-assembler.
>   * gcc.target/aarch64/sve/init_2.c: Likewise.
>   * gcc.target/aarch64/sve/init_3.c: Likewise.
>   * gcc.target/aarch64/sve/init_4.c: Likewise.
>   * gcc.target/aarch64/sve/init_5.c: Likewise.
>   * gcc.target/aarch64/sve/init_6.c: Likewise.
>   * gcc.target/aarch64/sve/init_7.c: Likewise.
>   * gcc.target/aarch64/sve/init_8.c: Likewise.
>   * gcc.target/aarch64/sve/init_9.c: Likewise.
>   * gcc.target/aarch64/sve/init_10.c: Likewise.
>   * gcc.target/aarch64/sve/init_11.c: Likewise.
>   * gcc.target/aarch64/sve/init_12.c: Likewise.
OK
jeff


[PPC, committed] Fix bootstrap for non-SVR4 targets.

2019-07-21 Thread Iain Sandoe
The recent change to move code into the new rs6000-call.c file is missing a
default value for the TARGET_NO_PROTOTYPE value (which only affects targets
that don’t include svr4.h).  Fixed by moving the fallback setting from
rs6000.c (which has no uses now) to rs6000-call.c.

pre-approved on IRC by Segher,

tested on powerpc-ibm-aix7.2, powerpc-darwin9,
applied to mainline,
Iain

2019-07-21  Iain Sandoe  

* config/rs6000/rs6000.c (TARGET_NO_PROTOTYPE): Move from here...
* config/rs6000/rs6000-call.c: ... to here.

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index cefb737bae..2ef8c7f861 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -86,6 +86,10 @@
 # endif
 #endif
 
+#ifndef TARGET_NO_PROTOTYPE
+#define TARGET_NO_PROTOTYPE 0
+#endif
+
 struct builtin_description
 {
   const HOST_WIDE_INT mask;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index dbb6a0f007..edd8f2b4df 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -84,10 +84,6 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-#ifndef TARGET_NO_PROTOTYPE
-#define TARGET_NO_PROTOTYPE 0
-#endif
-
   /* Set -mabi=ieeelongdouble on some old targets.  In the future, power server
  systems will also set long double to be IEEE 128-bit.  AIX and Darwin
  explicitly redefine TARGET_IEEEQUAD and TARGET_IEEEQUAD_DEFAULT to 0, so



Re: [PATCH] Make a warning for -Werror=wrong-language (PR driver/91172).

2019-07-21 Thread Jeff Law
On 7/17/19 1:40 AM, Martin Liška wrote:
> On 7/16/19 6:40 PM, Martin Sebor wrote:
>> On 7/16/19 5:16 AM, Martin Liška wrote:
>>> Hi.
>>>
>>> I noticed in the PR that -Werror=argument argument is not verified
>>> that the option is supported by a language we compile for.
>>> That's changed in the patch. However, it's not ideal as I need to mark
>>> the -Werror as the problematic option and one can't print a proper
>>> list of valid languages for which the rejected option can be used.
>>>
>>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>>
>>> Ready to be installed?
>>> Thanks,
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-07-16  Martin Liska  
>>>
>>> PR driver/91172
>>> * opts-common.c (decode_cmdline_option): Decode
>>> argument of -Werror and check it for a wrong language.
>>> * opts-global.c (complain_wrong_lang): Remove such case.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-07-16  Martin Liska  
>>>
>>> PR driver/91172
>>> * gcc.dg/pr91172.c: New test.
>>> ---
>>>   gcc/opts-common.c  | 20 +++-
>>>   gcc/opts-global.c  |  6 +-
>>>   gcc/testsuite/gcc.dg/pr91172.c |  3 +++
>>>   3 files changed, 27 insertions(+), 2 deletions(-)
>>>   create mode 100644 gcc/testsuite/gcc.dg/pr91172.c
>> Nice catch!
> Yep, I came to the quite accidentally.
> 
>> @@ -745,6 +746,23 @@ decode_cmdline_option (const char **argv, unsigned int 
>> lang_mask,
>>    /* Check if this is a switch for a different front end.  */
>>    if (!option_ok_for_language (option, lang_mask))
>>  errors |= CL_ERR_WRONG_LANG;
>> +  else if (strcmp (option->opt_text, "-Werror=") == 0
>> +   && strchr (opt_value, ',') == NULL)
>> +    {
>> +  /* Verify that -Werror argument is a valid warning
>> + for a languages.  */
>>
>> Typo: "for a language" (singular).
> Fixed.
> 
>> +  else
>> +    /* Happens for -Werror=warning_name.  */
>> +    warning (0, "command-line error argument %qs is not valid for %s",
>> + text, bad_lang);
>>
>> It might be better phrased as something like
>>
>>   "%<-Werror=%> argument %qs is not valid for %s"
>>
>> The argument is not one of a "command-line error."  It's one
>> to the -Werror option (which can be specified in other places
>> besides the command line).
> I like language corrections from native speakers.
> 
> I'm sending updated version of the patch.
> Thanks,
> Martin
> 
>> Martin
> 
> 
> 0001-Make-a-warning-for-Werror-wrong-language-PR-driver-9.patch
> 
> From 03baf640c12ea6dfda2215ae07d288b292179217 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Tue, 16 Jul 2019 11:11:00 +0200
> Subject: [PATCH] Make a warning for -Werror=wrong-language (PR driver/91172).
> 
> gcc/ChangeLog:
> 
> 2019-07-16  Martin Liska  
> 
>   PR driver/91172
>   * opts-common.c (decode_cmdline_option): Decode
>   argument of -Werror and check it for a wrong language.
>   * opts-global.c (complain_wrong_lang): Remove such case.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-07-16  Martin Liska  
> 
>   PR driver/91172
>   * gcc.dg/pr91172.c: New test.
OK
jeff


Re: Ping: [PATCH] x86/AVX512: improve generated code for bit-wise negation of vectors of integers

2019-07-21 Thread Jeff Law
On 7/18/19 10:05 AM, Jan Beulich wrote:
 On 27.06.19 at 10:59,  wrote:
>> NOT on vectors of integers does not require loading a constant vector of
>> all ones into a register - VPTERNLOG can be used here (and could/should
>> be further used to carry out other binary and ternary logical operations
>> which don't have a special purpose instruction).
>>
>> gcc/
>> 2019-06-27  Jan Beulich  
>>
>>  * config/i386/sse.md (ternlogsuffix): New.
>>  (one_cmpl2): Don't force CONSTM1_RTX into a register when
>>  AVX512F is in use.
>>  (one_cmpl2): New.
I'll trust you got the constant right, I didn't work through that aspect
of how vpternlog works.
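
For anyone who does want to check such a constant, the immediate of VPTERNLOG
is an 8-entry truth table indexed by one bit from each of the three operands.
Below is a scalar C++ sketch of that lookup on 32-bit values; ternlog_model is
a made-up name, the operand order (destination bit as the most significant
index bit) follows the instruction's documented operation, and which immediate
the patch actually uses is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the ternary-logic lookup: for every bit position, the bits
// of a, b and c form a 3-bit index (a is the high bit) into imm8.
std::uint32_t ternlog_model(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                            std::uint8_t imm8)
{
  std::uint32_t r = 0;
  for (int bit = 0; bit < 32; ++bit) {
    const unsigned idx = (((a >> bit) & 1u) << 2)
                       | (((b >> bit) & 1u) << 1)
                       | ((c >> bit) & 1u);
    r |= static_cast<std::uint32_t>((imm8 >> idx) & 1u) << bit;
  }
  return r;
}
```

In this model 0x55 yields NOT of the third operand, 0x33 NOT of the second and
0x0f NOT of the first, so a bitwise negation needs no all-ones constant in a
register.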

OK for the trunk,
jeff


[wwwdocs] Update C++ DR list

2019-07-21 Thread Marek Polacek
A small update.

Applied to CVS.

Index: cxx-dr-status.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cxx-dr-status.html,v
retrieving revision 1.23
diff -u -r1.23 cxx-dr-status.html
--- cxx-dr-status.html  8 Jul 2019 18:51:50 -   1.23
+++ cxx-dr-status.html  21 Jul 2019 20:58:37 -
@@ -2970,7 +2970,7 @@
   http://wg21.link/cwg421";>421
   CD1
   Is rvalue.field an rvalue?
-  ?
+  Yes
   
 
 
@@ -4335,8 +4335,8 @@
   http://wg21.link/cwg616";>616
   CD3
   Definition of “indeterminate value”
-  ?
-  
+  9
+  https://gcc.gnu.org/PR67853";>PR67853
 
 
   http://wg21.link/cwg617";>617
@@ -8515,7 +8515,7 @@
   http://wg21.link/cwg1213";>1213
   CD3
   Array subscripting and xvalues
-  ?
+  9
   
 
 


C++ PATCH to add test for c++/67853

2019-07-21 Thread Marek Polacek
Tested on x86_64-linux, applying.

2019-07-21  Marek Polacek  

PR c++/67853
* g++.dg/cpp0x/decltype72.C: New test.

diff --git gcc/testsuite/g++.dg/cpp0x/decltype72.C 
gcc/testsuite/g++.dg/cpp0x/decltype72.C
new file mode 100644
index 000..071e0e76210
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/decltype72.C
@@ -0,0 +1,19 @@
+// PR c++/67853
+// { dg-do compile { target c++11 } }
+
+template
+struct is_same
+{
+  static const bool value = false;
+};
+
+template
+struct is_same
+{
+  static const bool value = true;
+};
+
+struct Member {};
+struct A { Member x; };
+A MakeA();
+static_assert(is_same::value, "");
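
The archived copy of the test above lost everything between angle brackets.  The following is a sketch of what such a test plausibly checks, not the committed file: the template parameter lists and the static_assert arguments are assumptions based on the PR subject (decltype of a parenthesized member access on a prvalue).

```cpp
// Assumed reconstruction: a hand-rolled is_same with two type parameters.
template<typename T, typename U>
struct is_same
{
  static const bool value = false;
};

template<typename T>
struct is_same<T, T>
{
  static const bool value = true;
};

struct Member {};
struct A { Member x; };
A MakeA();  // Only used in unevaluated operands, so no definition is needed.

// MakeA().x is an xvalue, so the parenthesized form is Member&&.
static_assert(is_same<decltype((MakeA().x)), Member&&>::value, "");
// Without the extra parentheses, decltype names the member's declared type.
static_assert(is_same<decltype(MakeA().x), Member>::value, "");
```

A compiler that still treats MakeA().x as a prvalue rejects the first static_assert, which appears to be the behavior the PR reports.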


Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops

2019-07-21 Thread Maciej W. Rozycki
Hi Fredrik,

> The present problem is related to GCC and the R5900 short loop bug[1]. It
> turns out GCC emits assembly code like
> 
> loop: addiu   $5,$5,1
>   addiu   $4,$4,1
>   lb  $2,-1($5)
>   .set    noreorder
>   .set    nomacro
>   bne $2,$3,loop
>   sb  $2,-1($4)
>   .set    macro
>   .set    reorder
> 
> that is undefined for the R5900 (this short loop has five instructions),
> for simple C code such as
> 
>   while ((*s++ = *p++) != '\n')
>   ;
> 
> in the kernel. The noreorder directive prohibits GAS from corrections, and
> GAS really ought to give an error for it, I think. In the meantime, I have a
> tool that does machine code analysis of ELF objects to catch undefined R5900
> short loops, including those made with assembler macros in the kernel.

 I think that should be a GAS warning really (similarly to macros that 
expand to multiple instructions in a delay slot) as people ought to be 
allowed to do what they wish, and then `-Werror' can be used for code 
quality enforcement (and possibly disabled on a case-by-case basis).

> [ In theory, GAS could actually insert NOPs prior to the noreorder directive,
> to make the loop longer than six instructions, but GAS does not have that
> kind of capability. Another option that GCC prevents is to place a NOP in
> the delay slot. ]

 Well, GAS does have that capability, although of course it is not enabled 
for `noreorder' code.  For generated code I think however that usually it 
will be cheaper performance-wise if a non-trivial delay-slot instruction 
is never produced in the affected cases (i.e. a dummy NOP is always used).

> A reasonable fix for GCC is perhaps to update gcc/config/mips/mips.md to not
> make explicit use of the branch delay slot, as suggested by the patch below?
> Then GCC will emit
> 
> loop: addiu   $5,$5,1
>   addiu   $4,$4,1
>   lb  $2,-1($5)
>   sb  $2,-1($4)
>   bne $2,$3,loop
> 
> that GAS will adjust in the ELF object to
> 
>    4: 24a50001    addiu   a1,a1,1
>    8: 24840001    addiu   a0,a0,1
>    c: 80a2        lb      v0,-1(a1)
>   10: a082        sb      v0,-1(a0)
>   14: 1443fffb    bne     v0,v1,4
>   18:             nop
> 
> where a NOP is placed in the delay slot to avoid the bug. For longer loops,
> this relies on GAS making appropriate use of the delay slot. I'm not sure if
> GAS is good at that, though?

 I'm sort-of surprised that GCC has produced `reorder' code here, making 
it rely on GAS for delay slot scheduling.  Have you used an unusual set of 
options that prevents GCC from making `noreorder' code, which as I recall 
is the usual default (not for the MIPS16 mode IIRC, as well as some 
obscure corner cases)?

> On the R5900 short loops need to be fixed by inserting a NOP in the
> branch delay slot.
> 
> The short loop bug under certain conditions causes loops to execute
> only once or twice.  We must ensure that the assembler never
> generates loops that satisfy all of the following conditions:
> 
> - a loop consists of less than or equal to six instructions
>   (including the branch delay slot);
> - a loop contains only one conditional branch instruction at the end
>   of the loop;
> - a loop does not contain any other branch or jump instructions;
> - a branch delay slot of the loop is not NOP (EE 2.9 or later).
> 
> We need to do this because of a hardware bug in the R5900 chip.
> 
> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> index e17b1d522f0..acd31a8960c 100644
> --- a/gcc/config/mips/mips.md
> +++ b/gcc/config/mips/mips.md
> @@ -1091,6 +1091,7 @@
>  
>  (define_delay (and (eq_attr "type" "branch")
>  (not (match_test "TARGET_MIPS16"))
> +(not (match_test "TARGET_FIX_R5900"))
>  (eq_attr "branch_likely" "yes"))
>[(eq_attr "can_delay" "yes")
> (nil)
> @@ -1100,6 +1101,7 @@
>  ;; not annul on false.
>  (define_delay (and (eq_attr "type" "branch")
>  (not (match_test "TARGET_MIPS16"))
> +(not (match_test "TARGET_FIX_R5900"))
>  (ior (match_test "TARGET_CB_NEVER")
>   (and (eq_attr "compact_form" "maybe")
>(not (match_test "TARGET_CB_ALWAYS")))
> 

 I think you need to modify the default `can_delay' attribute definition 
instead (in the same way).  An improved future version might determine the 
exact conditions as noted in your proposed commit description, however I'd 
suggest making this simple change first.
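
 The four conditions in the quoted commit description could be written as a single predicate.  This is a sketch only: the Insn enumeration and the function name are hypothetical, the loop is assumed to be given as the instructions from the branch target through the delay slot, and none of it is GCC or GAS code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, coarse instruction classification.
enum class Insn { Other, CondBranch, Jump, Nop };

// Returns true if the loop body (branch target through delay slot,
// inclusive) matches all four conditions listed for the R5900 bug.
bool is_r5900_short_loop(const std::vector<Insn>& loop)
{
  const std::size_t n = loop.size();
  // Six instructions or fewer, including the branch delay slot.
  if (n < 2 || n > 6)
    return false;
  // The single conditional branch sits at the end, before its delay slot.
  if (loop[n - 2] != Insn::CondBranch)
    return false;
  // No other branch or jump anywhere in the loop.
  for (std::size_t i = 0; i < n; ++i)
    {
      if (i == n - 2)
        continue;
      if (loop[i] == Insn::CondBranch || loop[i] == Insn::Jump)
        return false;
    }
  // The delay slot holds something other than a NOP.
  return loop[n - 1] != Insn::Nop;
}
```

 The five-instruction loop from the report (addiu, addiu, lb, bne, sb) satisfies the predicate, while the six-instruction variant with a NOP in the delay slot does not.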

 HTH,

  Maciej


[PATCH v2] [rs6000] Add _mm_blend_epi16 and _mm_blendv_epi8

2019-07-21 Thread Paul Clarke
Add compatibility implementations of _mm_blend_epi16 and _mm_blendv_epi8
intrinsics.

Respective test cases are copied almost verbatim (minor changes to
the dejagnu head lines) from i386.

2019-07-21  Paul A. Clarke  

[gcc]

* config/rs6000/smmintrin.h (_mm_blend_epi16): New.
(_mm_blendv_epi8): New.

[gcc/testsuite]

* gcc.target/powerpc/sse4_1-check.h: New.
* gcc.target/powerpc/sse4_1-pblendvb.c: New.
* gcc.target/powerpc/sse4_1-pblendw.c: New.
* gcc.target/powerpc/sse4_1-pblendw-2.c: New.

Tested on 64bit LE, 64bit and 32bit BE.

v2: algorithm improvements as suggested by Segher.  Note that _mm_blend_epi16,
which now uses vec_gb, also requires the use of vec_unpackh to handle the
16 bit elements.  It also requires a vec_reve on big endian, due to the endian
characteristics of vec_gb.  Both are still much shorter.  Thanks, Segher!
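
For readers less familiar with the x86 intrinsics, here is a portable scalar model of the semantics being emulated.  It is a sketch, not the smmintrin.h implementation; the names blend_epi16_ref and blendv_epi8_ref and the std::array representation are made up for illustration.  The byte-blend rule matches the check in the sse4_1-pblendvb.c test (mask byte & 0x80).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lanes16 = std::array<std::uint16_t, 8>;
using Bytes16 = std::array<std::uint8_t, 16>;

// Bit i of imm8 selects 16-bit lane i from b; otherwise lane i comes from a.
Lanes16 blend_epi16_ref(const Lanes16& a, const Lanes16& b, unsigned imm8)
{
  Lanes16 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = ((imm8 >> i) & 1u) ? b[i] : a[i];
  return r;
}

// The top bit of each mask byte selects the byte from b; otherwise from a.
Bytes16 blendv_epi8_ref(const Bytes16& a, const Bytes16& b,
                        const Bytes16& mask)
{
  Bytes16 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (mask[i] & 0x80u) ? b[i] : a[i];
  return r;
}
```

In the patch, the shift right algebraic by 7 turns each mask byte into all-zeros or all-ones depending on that top bit, so a plain vec_sel can do the per-byte selection.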

Index: gcc/config/rs6000/smmintrin.h
===
--- gcc/config/rs6000/smmintrin.h   (revision 273615)
+++ gcc/config/rs6000/smmintrin.h   (working copy)
@@ -66,4 +66,24 @@ _mm_extract_ps (__m128 __X, const int __N)
   return ((__v4si)__X)[__N & 3];
 }
 
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
+{
+  __v16qi __charmask = vec_splats ((signed char) __imm8);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
+{
+  const __v16qu __seven = vec_splats ((unsigned char) 0x07);
+  __v16qu __lmask = vec_sra ((__v16qu) __mask, __seven);
+  return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
+}
+
 #endif
Index: gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
===
--- gcc/testsuite/gcc.target/powerpc/sse4_1-check.h (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/sse4_1-check.h (working copy)
@@ -0,0 +1,27 @@
+#include 
+#include 
+
+#include "m128-check.h"
+
+//#define DEBUG 1
+
+#define TEST sse4_1_test
+
+static void sse4_1_test (void);
+
+static void
+__attribute__ ((noinline))
+do_test (void)
+{
+  sse4_1_test ();
+}
+
+int
+main ()
+{
+  do_test ();
+#ifdef DEBUG
+  printf ("PASSED\n");
+#endif
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/sse4_1-pblendvb.c
===
--- gcc/testsuite/gcc.target/powerpc/sse4_1-pblendvb.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/sse4_1-pblendvb.c  (working copy)
@@ -0,0 +1,71 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include 
+#include 
+
+#define NUM 20
+
+static void
+init_pblendvb (unsigned char *src1, unsigned char *src2,
+  unsigned char *mask)
+{
+  int i, sign = 1; 
+
+  for (i = 0; i < NUM * 16; i++)
+{
+  src1[i] = i* i * sign;
+  src2[i] = (i + 20) * sign;
+  mask[i] = (i % 3) + ((i * (14 + sign))
+  ^ (src1[i] | src2[i] | (i*3)));
+  sign = -sign;
+}
+}
+
+static int
+check_pblendvb (__m128i *dst, unsigned char *src1,
+   unsigned char *src2, unsigned char *mask)
+{
+  unsigned char tmp[16];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 16; j++)
+if (mask [j] & 0x80)
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  union
+{
+  __m128i x[NUM];
+  unsigned char c[NUM * 16];
+} dst, src1, src2, mask;
+  int i;
+
+  init_pblendvb (src1.c, src2.c, mask.c);
+
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blendv_epi8 (src1.x[i], src2.x[i], mask.x[i]);
+  if (check_pblendvb (&dst.x[i], &src1.c[i * 16], &src2.c[i * 16],
+ &mask.c[i * 16]))
+   abort ();
+}
+}
Index: gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw-2.c
===
--- gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw-2.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/sse4_1-pblendw-2.c (working copy)
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include "sse4_1-check.h"
+
+#include 
+#include 
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xfe
+
+static void
+init_pblendw (short *src1, short *

Re: cp: implementation of p1301 for C++

2019-07-21 Thread JeanHeyd Meneide
I think I managed to fix all of the issues. Do let me know if I missed
anything!
--
diff --git a/.gitignore b/.gitignore
index b53f60db792..8988746a314 100644
--- a/.gitignore
+++ b/.gitignore
@@ -55,3 +55,6 @@ REVISION
 /mpc*
 /gmp*
 /isl*
+
+# ignore some editor-specific files
+.vscode/*
\ No newline at end of file
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 711a31ea597..1c70f9d769f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -52,6 +52,12 @@
  * gcc/config/or1k/predicates.md (volatile_mem_operand): New.
  (reg_or_mem_operand): New.

+2019-07-22  ThePhD  
+
+ p1301
+ * escaped_string.h: New. Refactored out of tree.c to make more
+ broadly available (e.g. to parser.c, cvt.c).
+
 2019-07-21  Iain Sandoe  

  * config/rs6000/rs6000.c (TARGET_NO_PROTOTYPE): Move from here...
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index e6452542bcc..0c3bdbc2fd1 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2019-07-22  ThePhD  
+
+ p1301
+ * c-family/c-lex.c: increase [[nodiscard]] feature macro
+ value (final value pending post-Cologne mailing)
+
 2019-07-20  Jakub Jelinek  

  * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_LOOP.
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 851fd704e5d..f2c0b62c95b 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
   else if (is_attribute_p ("deprecated", attr_name))
  result = 201309;
   else if (is_attribute_p ("maybe_unused", attr_name)
-   || is_attribute_p ("nodiscard", attr_name)
|| is_attribute_p ("fallthrough", attr_name))
  result = 201603;
   else if (is_attribute_p ("no_unique_address", attr_name)
|| is_attribute_p ("likely", attr_name)
|| is_attribute_p ("unlikely", attr_name))
  result = 201803;
+  else if (is_attribute_p ("nodiscard", attr_name))
+ result = 201907; /* placeholder until C++20 Post-Cologne Working Draft. */
   if (result)
  attr_name = NULL_TREE;
 }
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index d645cdef147..b6aa7f543f6 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,14 @@
+2019-07-22  ThePhD  
+
+ p1301
+ * tree.c: Implement p1301 - nodiscard("should have a reason"))
+ Added C++2a nodiscard string message handling.
+ Increase nodiscard argument handling max_length from 0
+ to 1. (error C++2a gated)
+ * parser.c: add requirement that nodiscard only be seen
+ once in attribute-list (C++2a gated)
+ * cvt.c: add nodiscard message to output, if applicable
+
 2019-07-20  Jason Merrill  

  * cp-tree.h (ovl_iterator::using_p): A USING_DECL by itself was also
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 23d2aabc483..e473025bb66 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"

 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1029,26 +1030,42 @@ maybe_warn_nodiscard (tree expr, impl_conv_void
implicit)
   if (implicit != ICV_CAST && fn
   && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
 {
+  tree attr = DECL_ATTRIBUTES (fn);
+  escaped_string msg;
+  if (attr)
+ msg.escape (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));
+  bool has_msg = static_cast<bool>(msg);
+  const char* pre_msg = (has_msg ? ": %<" : "");
+  const char* raw_msg = (has_msg ? (const char*)msg : "");
+  const char* post_msg = (has_msg ? "%>" : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
-  "ignoring return value of %qD, "
-  "declared with attribute nodiscard", fn))
- inform (DECL_SOURCE_LOCATION (fn), "declared here");
+ "ignoring return value of %qD, "
+ "declared with attribute %qE%s%s%s", fn, attr,
pre_msg, raw_msg, post_msg))
+inform (DECL_SOURCE_LOCATION (fn), "declared here");
 }
   else if (implicit != ICV_CAST
&& lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
 {
+  tree attr = TYPE_ATTRIBUTES (rettype);
+  escaped_string msg;
+  if (attr)
+ msg.escape (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));
+  bool has_msg = static_cast<bool>(msg);
+  const char* pre_msg = (has_msg ? ": %<" : "");
+  const char* raw_msg = (has_msg ? (const char*)msg : "");
+  const char* post_msg = (has_msg ? "%>" : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
-  "ignoring returned value of type %qT, "
-  "declared with attribute nodiscard", rettype))
- {
-  if (fn)
-inform (DECL_SOURCE_LOCATION (fn),
-"in call to %qD, declared here", fn);
-  inform (DECL_SOURCE_LOCATION (TYPE_NAME (rettype)),
-  "%qT declared here", rettype);
- }
+  "ignoring returned valu

Re: cp: implementation of p1301 for C++

2019-07-21 Thread JeanHeyd Meneide
Oops. I learned that %< and %> do not get applied as part of the string
arguments, only the initial format string. So, updated patch:
-
diff --git a/.gitignore b/.gitignore
index b53f60db792..8988746a314 100644
--- a/.gitignore
+++ b/.gitignore
@@ -55,3 +55,6 @@ REVISION
 /mpc*
 /gmp*
 /isl*
+
+# ignore some editor-specific files
+.vscode/*
\ No newline at end of file
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 711a31ea597..1c70f9d769f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -52,6 +52,12 @@
  * gcc/config/or1k/predicates.md (volatile_mem_operand): New.
  (reg_or_mem_operand): New.

+2019-07-22  ThePhD  
+
+ p1301
+ * escaped_string.h: New. Refactored out of tree.c to make more
+ broadly available (e.g. to parser.c, cvt.c).
+
 2019-07-21  Iain Sandoe  

  * config/rs6000/rs6000.c (TARGET_NO_PROTOTYPE): Move from here...
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index e6452542bcc..0c3bdbc2fd1 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2019-07-22  ThePhD  
+
+ p1301
+ * c-family/c-lex.c: increase [[nodiscard]] feature macro
+ value (final value pending post-Cologne mailing)
+
 2019-07-20  Jakub Jelinek  

  * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_LOOP.
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 851fd704e5d..f2c0b62c95b 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
   else if (is_attribute_p ("deprecated", attr_name))
  result = 201309;
   else if (is_attribute_p ("maybe_unused", attr_name)
-   || is_attribute_p ("nodiscard", attr_name)
|| is_attribute_p ("fallthrough", attr_name))
  result = 201603;
   else if (is_attribute_p ("no_unique_address", attr_name)
|| is_attribute_p ("likely", attr_name)
|| is_attribute_p ("unlikely", attr_name))
  result = 201803;
+  else if (is_attribute_p ("nodiscard", attr_name))
+ result = 201907; /* placeholder until C++20 Post-Cologne Working Draft. */
   if (result)
  attr_name = NULL_TREE;
 }
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index d645cdef147..b6aa7f543f6 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,14 @@
+2019-07-22  ThePhD  
+
+ p1301
+ * tree.c: Implement p1301 - nodiscard("should have a reason"))
+ Added C++2a nodiscard string message handling.
+ Increase nodiscard argument handling max_length from 0
+ to 1. (error C++2a gated)
+ * parser.c: add requirement that nodiscard only be seen
+ once in attribute-list (C++2a gated)
+ * cvt.c: add nodiscard message to output, if applicable
+
 2019-07-20  Jason Merrill  

  * cp-tree.h (ovl_iterator::using_p): A USING_DECL by itself was also
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 23d2aabc483..d89b6bcf07a 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"

 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1029,26 +1030,40 @@ maybe_warn_nodiscard (tree expr, impl_conv_void
implicit)
   if (implicit != ICV_CAST && fn
   && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
 {
+  tree attr = DECL_ATTRIBUTES (fn);
+  escaped_string msg;
+  if (attr)
+ msg.escape (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));
+  bool has_msg = static_cast<bool>(msg);
+  const char* pre_msg = (has_msg ? ": " : "");
+  const char* raw_msg = (has_msg ? (const char*)msg : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
-  "ignoring return value of %qD, "
-  "declared with attribute nodiscard", fn))
- inform (DECL_SOURCE_LOCATION (fn), "declared here");
+ "ignoring return value of %qD, "
+ "declared with attribute %<nodiscard%>%s%s", fn,
pre_msg, raw_msg))
+inform (DECL_SOURCE_LOCATION (fn), "declared here");
 }
   else if (implicit != ICV_CAST
&& lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
 {
+  tree attr = TYPE_ATTRIBUTES (rettype);
+  escaped_string msg;
+  if (attr)
+ msg.escape (TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));
+  bool has_msg = static_cast<bool>(msg);
+  const char* pre_msg = (has_msg ? ": " : "");
+  const char* raw_msg = (has_msg ? (const char*)msg : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
-  "ignoring returned value of type %qT, "
-  "declared with attribute nodiscard", rettype))
- {
-  if (fn)
-inform (DECL_SOURCE_LOCATION (fn),
-"in call to %qD, declared here", fn);
-  inform (DECL_SOURCE_LOCATION (TYPE_NAME (rettype)),
-  "%qT declared here", rettype);
- }
+  "ignoring returned value of type %qT, "
+  "declared with attribute %<nodiscard%>%s%s",
retty

Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts

2019-07-21 Thread Kewen.Lin
Hi Bin,

on 2019/7/21 11:07 AM, Bin.Cheng wrote:
> On Wed, Jun 19, 2019 at 7:47 PM Kewen.Lin  wrote:
>>
>> Hi all,
>>
>> This is the following patch after 
>> https://gcc.gnu.org/ml/gcc-patches/2019-06/msg00910.html
>>
>> Main steps:
>>   1) Identify the doloop cmp type iv use and record its bind_cand (explain 
>> it later).
>>   2) Set zero cost for pairs between this use and any iv cand.
>>   3) IV cand set selecting algorithm runs as usual.
>>   4) Fix up the selected iv cand for doloop use if needed.
>>
>> It only focuses on targets like Power which have a specific count register.
>> The target hook have_count_reg_decr_p is proposed for it.
>>
>> Some notes:
>>
>> *) Why we need zero cost?  How about just decrease the cost for the pair
>>between doloop use and its original iv cand?  How about just decrease
>>the cost for the pair between doloop use and one selected iv cand?
>>
>>Since some target supports hardware count register for decrement and
>>branch, it doesn't need the general instruction sequence for decr, cmp and
>>branch in general registers.  The cost of moving count register to GPR
>>is generally high, so it's standalone and can't be shared with other iv
>>uses.  It means IVOPTs can take doloop use as invisible (zero cost).
>>
>>Let's take a look at PR80791 for example.
>>
>>                          original biv (cand 4)   use derived iv (cand 6)
>>  generic use:                      4                        0
>>  comp use (doloop use):            0                     infinite
>>
>> For iv cost, original biv has cost 4 while use derived iv has cost 5.
>> When IVOPTs considers doloop use, the optimal cost is 8 (original biv
>> iv cost 4 + use cost 4).  Unfortunately it's not actually optimal, since
>> later doloop transformation updates loop closing by count register,
>> original biv (and its update) won't be needed in loop closing any more.
>> The generic use become the only use for original biv.  That means, if we
>> know the doloop will perform later, we shouldn't consider the doloop use
>> when determining IV set.  If we don't consider it, the algorithm will
>> choose iv cand 6 with total cost 5 (iv cost 5 + use cost 0).
>>
>> From the above, we can see that to decrease the cost for the pair between
>> doloop use and original biv doesn't work.  Meanwhile it's hard to predict
>> one good iv cand in final optimal set here and pre-update the cost
>> between it and doloop use.  The analysis would be heavy and imperfect.
>>
>> *) Why we need bind_cand?
>>
>> As above, we assign zero cost for pairs between the doloop use and each
>> iv cand.  It's possible that the doloop use gets assigned an iv cand that
>> is invalid for the later rewrite.  Then we have to fix it up with the iv
>> cand originally used for it.  It doesn't matter whether this iv cand is
>> in the final iv cand set or not; even if it's not in the set, it will be
>> eliminated in the doloop transformation.
>>
>> By the way, I was thinking whether we can replace the hook
>> have_count_reg_decr_p with flag_branch_on_count_reg.  The description
>> of the "no-" option, "Disable the optimization pass that scans for
>> opportunities to use 'decrement and branch' instructions on a count
>> register instead of instruction sequences that decrement a register,
>> compare it against zero, and then branch based upon the result.",
>> implicitly says the target has count register support.  But I noticed
>> that the gate of doloop_optimize checks this flag and, from what I
>> gathered in the previous discussions, some targets which can perform
>> doloop_optimize don't have a specific count register, so it sounds like
>> we can't make use of the flag.  Is that correct?
>>
>> Bootstrapped and regression-tested on powerpcle.  One failure is exposed
>> by this patch; its root cause is a duplicate of PR62147.  The failing
>> case is gcc.target/powerpc/20050830-1.c.
>>
>> Is it OK for trunk?
> Sorry for the delay.
> 
> I am not much in favor of this approach.  When rewriting the pass
> last time, we tried to reuse as much code as possible between cost
> computation and iv_use rewriting.  We also followed the guideline that
> when a finite cost is computed for a cand/use pair, the use can be
> rewritten successfully using that cand.  However, the patch adjusts
> infinite cost to zero cost, so a selected cand may be unusable for
> rewriting the iv_use; this is a step backward IMHO.

Thanks a lot for your time and comments.

V2: https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00655.html

The previous version 2 (link above) taught the selection
algorithm to be aware of the group with bind_cand.  It didn't zero
the cost of the doloop IV use, but both approaches are equivalent to
ignoring this doloop IV use in selection.
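
The cost figures quoted above for PR80791 can be checked with a few lines of arithmetic.  This is a sketch under simplifying assumptions: it only compares the two single-candidate sets, and the struct, constants and function name are made up for illustration rather than taken from tree-ssa-loop-ivopts.c.

```cpp
#include <limits>

// Sentinel meaning "this candidate cannot express this use".
const long kInfinite = std::numeric_limits<long>::max();

struct CandCosts
{
  long iv_cost;      // Cost of keeping the candidate itself.
  long generic_use;  // Cost of the generic use with this candidate.
  long doloop_use;   // Cost of the doloop compare use with this candidate.
};

// Numbers from the PR80791 example.
const CandCosts kOriginalBiv = {4, 4, 0};          // cand 4
const CandCosts kDerivedIv = {5, 0, kInfinite};    // cand 6

// Total cost of choosing a single candidate, with or without the
// doloop use taking part in the selection.
long total_cost(const CandCosts& c, bool count_doloop_use)
{
  if (count_doloop_use && c.doloop_use == kInfinite)
    return kInfinite;
  return c.iv_cost + c.generic_use + (count_doloop_use ? c.doloop_use : 0);
}
```

With the doloop use counted, cand 4 costs 4 + 4 + 0 = 8 and cand 6 is infinite, so cand 4 wins.  With the doloop use ignored, cand 6 costs 5 + 0 = 5 and becomes the better choice.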

Then I was thinking as granted that it changed many places to take care 
of this bind_cand group, worsen the readability and seems 

Re: [PATCH], Patch #10, move PowerPC data structures & helper functions from rs6000.c to rs6000-internal.h

2019-07-21 Thread Segher Boessenkool
Hi!

On Sat, Jul 20, 2019 at 12:13:08PM -0400, Michael Meissner wrote:
> 2019-07-20  Michael Meissner  
> 
>   * config/rs6000/rs6000-internal.h (rs6000_hard_regno_mode_ok_p):
>   Move various declarations relating to addressing and register
>   allocation to rs6000-internal.h from rs6000.c so that in the
>   future we can move things out of rs6000.c.

Just say
  (rs6000_hard_regno_mode_ok_p): New declaration.
for the things that only had a definition before.

>   Make the static arrays global,

That's not this entry.  Say that in the entries where it applies.

>   and define them in rs6000.c.

Say that in the corresponding entry for rs6000.c .

>   (enum rs6000_reg_type): Likewise.

This one always was a declaration.

(... ten gazillion "Likewise." ...)
Most of those are *not* the same thing.  Don't say "likewise" if not
the same comment applies.

If it is hard to write a proper changelog, your patch series probably
could use some restructuring.  Or sometimes the changelog you need just
is more work than you would prefer.

You don't necessarily have to keep the same order in the changelog as
in the patch, if that helps.  But roughly the same order helps review,
so please consider that too ;-)

> +/* Simplfy register classes into simpler classifications.  We assume

(Typo, not new, but still a typo :-) )

> +/* Register classes we care about in secondary reload or go if legitimate
> +   address.  We only need to worry about GPR, FPR, and Altivec registers 
> here,
> +   along an ANY field that is the OR of the 3 register classes.  */

We haven't had GO_IF_LEGITIMATE_ADDRESS for ten years now, please don't
introduce new references to it ;-)

> +#define RELOAD_REG_VALID 0x0001  /* Mode valid in register..  */
> +#define RELOAD_REG_MULTIPLE  0x0002  /* Mode takes multiple registers.  */
> +#define RELOAD_REG_INDEXED   0x0004  /* Reg+reg addressing.  */
> +#define RELOAD_REG_OFFSET0x0008  /* Reg+offset addressing. */
> +#define RELOAD_REG_PRE_INCDEC0x0010  /* PRE_INC/PRE_DEC valid.  */
> +#define RELOAD_REG_PRE_MODIFY0x0020  /* PRE_MODIFY valid.  */
> +#define RELOAD_REG_AND_M16   0x0040  /* AND -16 addressing.  */
> +#define RELOAD_REG_QUAD_OFFSET   0x0080  /* quad offset is limited.  */

Why all the extra zeroes?  If you introduce some 0x100 later, just leave
the 0x80 as 0x80 please, that is much more readable.


It's hard to tell whether the problem is factored sanely, or if this
creates a big mountain of spaghetti instead.  Can you show how this is
used later?

Normally, you send a whole series, and then perhaps many of the first
are preparatory only, but a reviewer can see where things are headed,
and *then* simple refactorings like this can make sense.  The way this
patch looks now you are just making a lot of data global.


Segher


Re: [rs6000] Add documentation for __builtin_mtfsf

2019-07-21 Thread Segher Boessenkool
On Sun, Jul 21, 2019 at 02:50:41PM -0500, Paul Clarke wrote:
> On 7/21/19 1:13 PM, Segher Boessenkool wrote:
> > On Sun, Jul 21, 2019 at 04:06:32AM -0500, Paul Clarke wrote:
> >> +@code{__builtin_mtfsf} takes a constant 8-bit integer field mask and a
> >> +representation of the new value of the FPSCR and generates the 
> >> @code{mtfsf}
> >> +instruction to copy the supplied value into the FPSCR, subject to the 
> >> field
> >> +mask, each bit of which represents a nibble of the FPSCR.  The

("nybble" fwiw.  Well the spelling list wisely avoids this one, so maybe
so should we ;-) )

> > "A representation of the new value"?  I guess you want to say that it
> > sits in an FPR?
> 
> It's the 2nd parameter to the builtin, so a "double".

Yeah, it's an integer that sits in the low bits of a double.  It's
probably best to not even try to describe it, just refer to the machine
instruction?

> It may or may not be in an FPR, but the user of the builtin doesn't
> really know or care.  (It'll eventually be in an FPR, of course, but
> the user has it in a variable.)  It's a "representation" because it's
> not actually the new value, because it gets written under mask.

> > Before we document __builtin_mtfsf, maybe we should make it work with
> > the W and/or L fields first, or at least, decide how we want that?
> 
> It's been available but undocumented for ages.  Do you want me to thus
> document how it is currently implemented by including something like
> "...generates the mtfsf (extended mnemonic) instruction ..." ?

That's a good idea yes.  L=W=0.
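
To illustrate what "subject to the field mask" means, here is a simplified software model, not the builtin and not a substitute for the ISA text.  It assumes L=W=0, treats the low 32 bits of the FPSCR as a plain integer, assumes the least significant mask bit selects the least significant nibble (worth checking against the Power ISA field numbering), and ignores the special handling of summary bits such as FEX and VX.  The function names are hypothetical.

```cpp
#include <cstdint>

// Expand an 8-bit field mask into a 32-bit bit mask, one nibble per
// mask bit.
std::uint32_t expand_field_mask(std::uint8_t flm)
{
  std::uint32_t bits = 0;
  for (int field = 0; field < 8; ++field)
    if (flm & (1u << field))
      bits |= 0xFu << (4 * field);
  return bits;
}

// Replace only the selected nibbles of old_fpscr with those of new_bits.
std::uint32_t model_mtfsf(std::uint32_t old_fpscr, std::uint8_t flm,
                          std::uint32_t new_bits)
{
  const std::uint32_t m = expand_field_mask(flm);
  return (old_fpscr & ~m) | (new_bits & m);
}
```

Under these assumptions a mask of 0x01 touches only the lowest nibble, and 0xff replaces all 32 bits.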

> If you think the basic mnemonic form needs to be supported, that's a
> whole other piece of work ;-)

Sure :-)  My thinking was, if it wasn't documented and we want to
define new variants, we may have a bit more leeway.  But since this
is such an old builtin, that's not going to happen anyway.

So can you reword a little bit and resend please?  It really doesn't
need to say more than "this does what the mtfsf insn does", and saying
more is Hard :-)

Thanks,


Segher


Re: [PATCH v2] [rs6000] Add _mm_blend_epi16 and _mm_blendv_epi8

2019-07-21 Thread Segher Boessenkool
On Sun, Jul 21, 2019 at 05:22:19PM -0500, Paul Clarke wrote:
> Add compatibility implementations of _mm_blend_epi16 and _mm_blendv_epi8
> intrinsics.
> 
> Respective test cases are copied almost verbatim (minor changes to
> the dejagnu head lines) from i386.
> 
> 2019-07-21  Paul A. Clarke  
> 
> [gcc]
> 
>   * config/rs6000/smmintrin.h (_mm_blend_epi16): New.
>   (_mm_blendv_epi8): New.
> 
> [gcc/testsuite]
> 
>   * gcc.target/powerpc/sse4_1-check.h: New.
>   * gcc.target/powerpc/sse4_1-pblendvb.c: New.
>   * gcc.target/powerpc/sse4_1-pblendw.c: New.
>   * gcc.target/powerpc/sse4_1-pblendw-2.c: New.
> 
> Tested on 64bit LE, 64bit and 32bit BE.
> 
> v2: algorithm improvements as suggested by Segher.  Note that _mm_blend_epi16,
> which now uses vec_gb, also requires the use of vec_unpackh to handle the
> 16 bit elements.  It also requires a vec_reve on big endian, due to the endian
> characteristics of vec_gb.  Both are still much shorter.  Thanks, Segher!

Ah yes, I missed those "details".  Glad to hear it still helps.

Approved for trunk, please apply.  Thanks!

Do we need/want backports for this?


Segher


Re: cp: implementation of p1301 for C++

2019-07-21 Thread Segher Boessenkool
Hi JeanHeyd,

Just some patch-technical comments:

On Mon, Jul 22, 2019 at 01:53:23AM +0200, JeanHeyd Meneide wrote:
> diff --git a/.gitignore b/.gitignore
> index b53f60db792..8988746a314 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -55,3 +55,6 @@ REVISION
>  /mpc*
>  /gmp*
>  /isl*
> +
> +# ignore some editor-specific files
> +.vscode/*

The dotfiles are earlier in this file.

Do we want this here at all?  A user of this IDE should probably have
something like this in his global .gitignore, instead.

(Oh, and it should be a separate patch, anyway).

> \ No newline at end of file

That is wrong; please end all (text) files with a newline.

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog

Don't put changelogs in the diff; instead, put them *before* the patch.

> index 711a31ea597..1c70f9d769f 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -52,6 +52,12 @@
>   * gcc/config/or1k/predicates.md (volatile_mem_operand): New.
>   (reg_or_mem_operand): New.
> 
> +2019-07-22  ThePhD  
> +
> + p1301
> + * escaped_string.h: New. Refactored out of tree.c to make more
> + broadly available (e.g. to parser.c, cvt.c).
> +
>  2019-07-21  Iain Sandoe  
> 
>   * config/rs6000/rs6000.c (TARGET_NO_PROTOTYPE): Move from here...

No one else but you can ever apply that like this.  Also, you're adding
an entry to the middle of a changelog?

> diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
> index e6452542bcc..0c3bdbc2fd1 100644
> --- a/gcc/c-family/ChangeLog
> +++ b/gcc/c-family/ChangeLog
> @@ -1,3 +1,9 @@
> +2019-07-22  ThePhD  
> +
> + p1301
> + * c-family/c-lex.c: increase [[nodiscard]] feature macro
> + value (final value pending post-Cologne mailing)

Sentences end in a full stop.  Sentences start with a capital letter.
All lines in a changelog are indented with a tab (not with a space).

> -  "ignoring return value of %qD, "
> -  "declared with attribute nodiscard", fn))
> - inform (DECL_SOURCE_LOCATION (fn), "declared here");
> + "ignoring return value of %qD, "
> + "declared with attribute %%s%s", fn,
> pre_msg, raw_msg))
> +inform (DECL_SOURCE_LOCATION (fn), "declared here");

Your email client wraps lines.  This line is much too long, too.

Your email client ate those tabs as well, it seems?  Please fix that.


Segher


Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts

2019-07-21 Thread Segher Boessenkool
Hi!

(Maybe I am missing half of the discussion -- sorry if so).

I think we should have a new iv for just the doloop (which can have the
same starting value and step and type as another iv).

Has this been considered?


Segher
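
As a source-level illustration of the suggestion, here is a hand-written C
sketch of a loop before and after the iteration count gets an iv of its own.
It is not ivopts output and not taken from the patch; the function names are
hypothetical.

```c
#include <stddef.h>

/* Original form: a single iv, i, drives both the addressing and the
   exit test.  */
long
sum_shared_iv (const int *p, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += p[i];
  return s;
}

/* Hand-written equivalent with a dedicated count-down iv: count is used
   only by the exit test, which is the shape a decrement-and-branch
   instruction (bdnz on PowerPC) implements; i is used only for the
   addressing.  */
long
sum_dedicated_iv (const int *p, size_t n)
{
  long s = 0;
  size_t i = 0;
  for (size_t count = n; count != 0; count--)
    s += p[i++];
  return s;
}
```

Both functions compute the same sum; the second merely keeps the counter
separate, so the other uses in the loop are free to pick whichever iv suits
them best.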


Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts

2019-07-21 Thread Kewen.Lin
Hi Segher,

on 2019/7/22 2:26 PM, Segher Boessenkool wrote:
> Hi!
> 
> (Maybe I am missing half of the discussion -- sorry if so).
> 
> I think we should have a new iv for just the doloop (which can have the
> same starting value and step and type as another iv).
> 
> Has this been considered?
> 
> 

I don't have any patches to introduce it.  I guess you mean one pre-bind
candidate is dedicated to doloop use only?  Version 2 introduced pre-bind,
but I dropped it as it's invasive to the current selection algorithm.

The current implementation zeroes the cost of the doloop use for every
candidate and lets the selection algorithm pick whichever one it likes.  I
think that's fine, since doloop_optimize can transform anything into the
expected form, as long as it knows the iteration count.

Thanks,
Kewen