Re: [patch, fortran] Implement FINDLOC

2018-10-28 Thread Paul Richard Thomas
Hi Thomas,

The patch is ready to go. Please correct the following tiny nits:

s/Check that en expression/Check that an expression/

s/Set this if resolution has already happened and it could be
harmful/Set this if resolution has already happened. It could be
harmful/

An even tinier, probably ignorable one: Why did you break this line?
-/* MINLOC and MAXLOC get special treatment because their argument
-   might have to be reordered.  */

Many thanks for working on this.

Cheers

Paul


On Tue, 23 Oct 2018 at 22:03, Thomas Koenig  wrote:
>
> Am 23.10.18 um 18:16 schrieb Dominique d'Humières:
> >
>
> >> Anyway, the attached patch fixes this,
> >
> > It now gives the error
> >
> > 4 |integer, parameter :: I_FINDLOC_BACK(1) = findloc([1,1],1, &
> >|1
> > Error: transformational intrinsic 'findloc' at (1) is not permitted in an 
> > initialization expression
>
> That error message was misleading, the new one now has
>
> Error: Parameter 'x' at (1) has not been declared or is a variable,
> which does not reduce to a constant expression
>
> > The following test
> >
> > program logtest3
> > implicit none
> > ! !
> > ! *** Everything depends on this parameter ***!
> >
> > integer, parameter :: A1 = 2
> > logical :: L
> > L = transfer(A1,L)
> > call sub(L)
> > end program logtest3
> >
> > subroutine sub(x)
> > implicit none
> > logical x
> > integer a(1)
> > character(*), parameter :: strings(2) = ['.TRUE. ','.FALSE.']
> >
> > a = findloc([1,1],1,mask=[x,.TRUE.])
> > write(*,'(a)') 'Value by FINDLOC(MASK): '// &
> >trim(strings(a(1)))
> > a = findloc([1,1],1,back=x)
> > write(*,'(a)') 'Value by FINDLOC(BACK): '// &
> >trim(strings(3-a(1)))
> >
> > end subroutine sub
> >
> > does not link:
> >
> >  8 |L = transfer(A1,L)
> >|   1
> > Warning: Assigning value other than 0 or 1 to LOGICAL has undefined result 
> > at (1)
> > Undefined symbols for architecture x86_64:
> >"__gfortran_findloc0_i4", referenced from:
> >_sub_ in ccnoLKfH.o
> >"__gfortran_mfindloc0_i4", referenced from:
> >_sub_ in ccnoLKfH.o
> > ld: symbol(s) not found for architecture x86_64
> > collect2: error: ld returned 1 exit status
>
> Ah, I didn't include the newly generated files in the previous patch.
> Now included.
>
>
> > Finally the line before the end of findloc_6.f90 should be
> >
> >if (findloc(ch,"CC ",dim=1,mask=false) /= 0) stop 23
>
> Changed, also the whitespace fixes that Bernhard mentioned.
>
> So, I think this should be clear for trunk now.  I will supply
> the documentation later.
>
> Regards
>
> Thomas



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [Patch, fortran] PR40196 - [F03] [F08] Type parameter inquiry (str%len, a%kind) and Complex parts (z%re, z%im)

2018-10-28 Thread Bernhard Reutner-Fischer
On Sat, 27 Oct 2018 20:03:47 +0100
Paul Richard Thomas  wrote:

A few nits.

> + /* Pull an inquiry result out of an expression.  */
> + 
> + static bool
> + find_inquiry_ref (gfc_expr *p, gfc_expr **newp)
> + {
> +   gfc_ref *ref;
> +   gfc_ref *inquiry = NULL;
> +   gfc_expr *tmp;
> + 
> +   tmp = gfc_copy_expr (p);
> + 
> +   if (tmp->ref && tmp->ref->type == REF_INQUIRY)
> + {
> +   inquiry = tmp->ref;
> +   tmp->ref = NULL;
> + }
> +   else
> + {
> +   for (ref = tmp->ref; ref; ref = ref->next)
> + if (ref->next && ref->next->type == REF_INQUIRY)
> +   {
> + inquiry = ref->next;
> + ref->next = NULL;
> +   }
> + }
> + 
> +   if(!inquiry)

missing space before open parenthesis

> *** typedef struct gfc_ref
> *** 1960,1965 
> --- 1963,1970 
>   }
>   ss;
>   
> + inquiry_type i;

inq would be easier to understand and unambiguous imho.

> + /* Used by gfc_match_varspec() to match an inquiry reference.  */
> + 
> + static bool
> + is_inquiry_ref (const char *name, gfc_ref **ref)
> + {
> +   inquiry_type type;
> + 
> +   if (name == NULL)
> + return false;
> + 
> +   if (ref) *ref = NULL;
> + 
> +   switch (name[0])
> + {
> + case 'r':
> +   if (strcmp (name, "re") == 0)
> + type = INQUIRY_RE;
> +   else
> + return false;
> +   break;
> + 
> + case 'i':
> +   if (strcmp (name, "im") == 0)
> + type = INQUIRY_IM;
> +   else
> + return false;
> +   break;
> + 
> + case 'k':
> +   if (strcmp (name, "kind") == 0)
> + type = INQUIRY_KIND;
> +   else
> + return false;
> +   break;
> + 
> + case 'l':
> +   if (strcmp (name, "len") == 0)
> + type = INQUIRY_LEN;
> +   else
> + return false;
> +   break;
> + 
> + default:
> +   return false;
> + }

Is the switch really worth it? I'd have used a plain chain of strcmp,
fwiw.

> !   switch (tmp->u.i)
> ! {
> ! case INQUIRY_RE:
> ! case INQUIRY_IM:
> !   if (!gfc_notify_std (GFC_STD_F2008, "re or im
> part_refs at %C")) !  return MATCH_ERROR;

I guess RE and IM should be capitalised?

> *** gfc_variable_attr (gfc_expr *expr, gfc_t
> *** 2358,2363 
> --- 2521,2527 
> gfc_ref *ref;
> gfc_symbol *sym;
> gfc_component *comp;
> +   bool has_inquiry_part;
>   
> if (expr->expr_type != EXPR_VARIABLE && expr->expr_type !=
> EXPR_FUNCTION) gfc_internal_error ("gfc_variable_attr(): Expression
> isn't a variable"); *** gfc_variable_attr (gfc_expr
> *expr, gfc_t *** 2387,2392 
> --- 2551,2561 
> if (ts != NULL && expr->ts.type == BT_UNKNOWN)
>   *ts = sym->ts;
>   
> +   has_inquiry_part = false;
> +   for (ref = expr->ref; ref; ref = ref->next)
> + if (ref->type == REF_INQUIRY)
> +   has_inquiry_part = true;

you could break here

> + 
> for (ref = expr->ref; ref; ref = ref->next)
>   switch (ref->type)
> {

> Index: gcc/fortran/trans-expr.c
> ===
> *** gcc/fortran/trans-expr.c  (revision 265411)
> --- gcc/fortran/trans-expr.c  (working copy)
> *** conv_parent_component_references (gfc_se
> *** 2510,2515 
> --- 2510,2549 
> conv_parent_component_references (se, &parent);
>   }
>   
> + 
> + static void
> + conv_inquiry (gfc_se * se, gfc_ref * ref, gfc_expr *expr,
> gfc_typespec *ts)
> + {
> +   tree res = se->expr;
> + 
> +   switch (ref->u.i)
> + {
> + case INQUIRY_RE:
> +   res = fold_build1_loc (input_location, REALPART_EXPR,
> +  TREE_TYPE (TREE_TYPE (res)), res);
> +   break;
> + 
> + case INQUIRY_IM:
> +   res = fold_build1_loc (input_location, IMAGPART_EXPR,
> +  TREE_TYPE (TREE_TYPE (res)), res);
> +   break;
> + 
> + case INQUIRY_KIND:
> +   res = build_int_cst (gfc_typenode_for_spec (&expr->ts),
> +ts->kind);
> +   break;
> + 
> + case INQUIRY_LEN:
> +   res = fold_convert (gfc_typenode_for_spec (&expr->ts),
> +   se->string_length);
> +   break;
> + 
> + default:
> +   gcc_unreachable ();
> + }
> +   se->expr = res;

Don't you have to gfc_free_expr (se->expr) or gfc_replace_expr() ?

cheers,


Re: [Patch, fortran] PR40196 - [F03] [F08] Type parameter inquiry (str%len, a%kind) and Complex parts (z%re, z%im)

2018-10-28 Thread Paul Richard Thomas
Hi Bernhard,

Thanks for going through the patch:
snip
> missing space before open parenthesis

Corrected.
snip
> inq would be easier to understand and unambiguous imho.

Why? inquiry_type seems fine to me.

snip
> Is the switch really worth it? I'd have used a plain chain of strcmp,
> fwiw.

I have done it. However, I might revert in order to combine the switch
block where I set the typespec for the primary expression.

snip
> I guess RE and IM should be capitalised?

Done

> you could break here
>
> > +
> > for (ref = expr->ref; ref; ref = ref->next)
> >   switch (ref->type)
> > {

Done

>
> > Index: gcc/fortran/trans-expr.c
snip...

> Don't you have to gfc_free_expr (se->expr) or gfc_replace_expr() ?

No these are tree expressions not gfc_expr. No cleanup is needed.

I haven't added testcases for errors. Does anybody think that this is necessary?

Cheers

Paul


Re: [patch, fortran] Implement FINDLOC

2018-10-28 Thread Thomas Koenig

Hi Paul,


The patch is ready to go. Please correct the following tiny nits:


I have corrected those.


s/Check that en expression/Check that an expression/

s/Set this if resolution has already happened and it could be
harmful/Set this if resolution has already happened. It could be
harmful/



An even tinier, probably ignorable one: Why did you break this line?
-/* MINLOC and MAXLOC get special treatment because their argument
-   might have to be reordered.  */


I think I hit M-q in emacs at some stage - I have left it as it is.

Thanks for the review!

Committed as r265570.

Regards

Thomas


Re: [Patch, fortran] PR40196 - [F03] [F08] Type parameter inquiry (str%len, a%kind) and Complex parts (z%re, z%im)

2018-10-28 Thread Thomas Koenig

Hi Paul,



inq would be easier to understand and unambiguous imho.


Why? inquiry_type seems fine to me.


I think Bernhard means the name of the member, i.

I think it makes sense to leave as it is - gfc_ref is a
struct that occurs a lot in complicated expressions, and the other
members are one and two letters, too.


snip

Is the switch really worth it? I'd have used a plain chain of strcmp,
fwiw.


I have done it. However, I might revert in order to combine the switch
block where I set the typespec for the primary expression.


Whatever suits you best.


I haven't added testcases for errors. Does anybody think that this is necessary?


Might not be a bad idea to run through at least each new error message
again.

There is one illwfL test case which ICEs:

$ cat b.f90
program main
  character(len=:), allocatable :: a
  allocate(a,source="abc")
  a%len = 2
  print *,a
end
$ gfortran b.f90
gimplification failed:
(integer(kind=4)) .a 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 
canonical-type 0x7f138acd15e8 precision:32 min 0x7f138acbcd68 -2147483648> max 

pointer_to_this >

arg:0 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set -1 
canonical-type 0x7f138acd1738 precision:64 min 0x7f138acbcdf8 -9223372036854775808> max 9223372036854775807>

pointer_to_this >
used DI b.f90:1:0 size  
unit-size 
align:64 warn_if_not_align:0 context 0x7f138ae83200 MAIN__>

chain 
used unsigned DI b.f90:2:0 size 64> unit-size 
align:64 warn_if_not_align:0 context 0x7f138ae83200 MAIN__

b.f90:4:0:

4 |   a%len = 2
  |
internal compiler error: gimplification failed
0xb45602 gimplify_expr(tree_node**, gimple**, gimple**, bool 
(*)(tree_node*), int)

../../trunk/gcc/gimplify.c:12568

Regards

Thomas


[patch, doc, fortran] Document FINDLOC

2018-10-28 Thread Thomas König

Hi,

here is the promised documentation for FINDLOC.  Tested
with "make dvi" and "make pdf".

OK for trunk?

Regards

Thomas

2018-10-28  Thomas Koenig  

PR fortran/54613
* gfortran.texi (File format of unformatted sequential files):
Replace random comma with period.
* intrinsic.texi (Intrinsic Procedures): Add FINDLOC to menu.
(FINDLOC): Document.
(MAXLOC): Add refrence to FINDLOC.
(MINLOC): Likewise.
Index: gfortran.texi
===
--- gfortran.texi	(Revision 265569)
+++ gfortran.texi	(Arbeitskopie)
@@ -1479,7 +1479,7 @@ contains a negative number, then there is a preced
 
 In the most simple case, with only one subrecord per logical record,
 both record markers contain the number of bytes of user data in the
-record,
+record.
 
 The format for unformatted sequential data can be duplicated using
 unformatted stream, as shown in the example program for an unformatted
Index: intrinsic.texi
===
--- intrinsic.texi	(Revision 265569)
+++ intrinsic.texi	(Arbeitskopie)
@@ -148,6 +148,7 @@ Some basic guidelines for editing this document:
 * @code{FDATE}: FDATE, Subroutine (or function) to get the current time as a string
 * @code{FGET}:  FGET,  Read a single character in stream mode from stdin
 * @code{FGETC}: FGETC, Read a single character in stream mode
+* @code{FINDLOC}:   FINDLOC,   Search an array for a value
 * @code{FLOOR}: FLOOR, Integer floor function
 * @code{FLUSH}: FLUSH, Flush I/O unit(s)
 * @code{FNUM}:  FNUM,  File number function
@@ -6021,8 +6022,68 @@ END PROGRAM
 @ref{FGET}, @ref{FPUT}, @ref{FPUTC}
 @end table
 
+@node FINDLOC
+@section @code{FINDLOC} --- Search an array for a value
+@fnindex FINDLOC
+@cindex findloc
 
+@table @asis
+@item @emph{Description}:
+Determines the location of the element in the array with the value
+given in the @var{VALUE} argument, or, if the @var{DIM} argument is
+supplied, determines the locations of the maximum element along each
+row of the array in the @var{DIM} direction. If @var{MASK} is present,
+only the elements for which @var{MASK} is @code{.TRUE.} are
+considered.  If more than one element in the array has the value
+@var{VALUE}, the location returned is that of the first such element
+in array element order if the @var{BACK} is not present, or if it
+false; otherwise, the location returned is that of the first such
+element. If the array has zero size, or all of the elements of
+@var{MASK} are @code{.FALSE.}, then the result is an array of zeroes.
+Similarly, if @var{DIM} is supplied and all of the elements of
+@var{MASK} along a given row are zero, the result value for that row
+is zero.
 
+@item @emph{Standard}:
+Fortran 2008 and later.
+
+@item @emph{Class}:
+Transformational function
+
+@item @emph{Syntax}:
+@multitable @columnfractions .80
+@item @code{RESULT = FINDLOC(ARRAY, VALUE, DIM [, MASK] [,KIND] [,BACK])}
+@item @code{RESULT = FINDLOC(ARRAY, VALUE, [, MASK] [,KIND] [,BACK])}
+@end multitable
+
+@item @emph{Arguments}:
+@multitable @columnfractions .15 .70
+@item @var{ARRAY} @tab Shall be an array of intrinsic type.
+@item @var{VALUE} @tab A scalar of intrinsic type which is in type
+conformance with @var{ARRAY}.
+@item @var{DIM}   @tab (Optional) Shall be a scalar of type
+@code{INTEGER}, with a value between one and the rank of @var{ARRAY},
+inclusive.  It may not be an optional dummy argument.
+@item @var{KIND} @tab (Optional) An @code{INTEGER} initialization
+expression indicating the kind parameter of the result.
+@item @var{BACK} @tab (Optional) A scalar of type @code{LOGICAL}.
+@end multitable
+
+@item @emph{Return value}:
+If @var{DIM} is absent, the result is a rank-one array with a length
+equal to the rank of @var{ARRAY}.  If @var{DIM} is present, the result
+is an array with a rank one less than the rank of @var{ARRAY}, and a
+size corresponding to the size of @var{ARRAY} with the @var{DIM}
+dimension removed.  If @var{DIM} is present and @var{ARRAY} has a rank
+of one, the result is a scalar.   If the optional argument @var{KIND}
+is present, the result is an integer of kind @var{KIND}, otherwise it
+is of default kind.
+
+@item @emph{See also}:
+@ref{MAXLOC}, @ref{MINLOC}
+
+@end table
+
 @node FLOOR
 @section @code{FLOOR} --- Integer floor function
 @fnindex FLOOR
@@ -10039,7 +10100,7 @@ is present, the result is an integer of kind @var{
 is of default kind.
 
 @item @emph{See also}:
-@ref{MAX}, @ref{MAXVAL}
+@ref{FINDLOC}, @ref{MAX}, @ref{MAXVAL}
 
 @end table
 
@@ -10395,7 +10456,7 @@ is present, the result is an integer of kind @var{
 is of default kind.
 
 @item @emph{See also}:
-@ref{MIN}, @ref{MINVAL}
+@ref{FINDLOC}, @ref{MIN}, @ref{MINVAL}
 
 @end table
 


Re: [PATCH] asm inline

2018-10-28 Thread Eric Gallager
On 10/12/18, Segher Boessenkool  wrote:
> The Linux kernel people want a feature that makes GCC pretend some
> inline assembler code is tiny (while it would think it is huge), so
> that such code will be inlined essentially always instead of
> essentially never.
>
> This patch lets you say "asm inline" instead of just "asm", with the
> result that that inline assembler is always counted as minimum cost
> for inlining.  It implements this for C and C++.
>
> Tested on powerpc64-linux {-m32,-m64}.  Is this okay for trunk?
>
> (I'll wait a bit committing this until kernel testing has finished,
> either way).
>
>
> Segher
>
>
> 2018-10-10  Segher Boessenkool  
>
>   * doc/extend.texi (Using Assembly Language with C): Document asm inline.
>   (Size of an asm): Fix typo.  Document asm inline.
>   * gimple-pretty-print.c (dump_gimple_asm): Handle asm inline.
>   * gimple.h (enum gf_mask): Add GF_ASM_INLINE.
>   (gimple_asm_set_volatile): Fix typo.
>   * gimple_asm_inline_p: New.
>   * gimple_asm_set_inline: New.
>   * gimplify.c (gimplify_asm_expr): Propagate the asm inline flag from
>   tree to gimple.
>   * ipa-icf-gimple.c (func_checker::compare_gimple_asm): Compare the
>   gimple_asm_inline_p flag, too.
>   * tree-core.h (tree_base): Document that protected_flag is ASM_INLINE_P
>   in an ASM_EXPR.
>   * tree-inline.c (estimate_num_insns): If gimple_asm_inline_p return
>   a minimum size for an asm.
>   * tree.h (ASM_INLINE_P): New.
>
> c/
>   * c-parser.c (c_parser_asm_statement): Detect the inline keyword
>   after asm.  Pass a flag for it to build_asm_expr.
>   * c-tree.h (build_asm_expr): Update declaration.
>   * c-typeck.c (build_asm_stmt): Add is_inline parameter.  Use it to
>   set ASM_INLINE_P.
>
> cp/
>   * cp-tree.h (finish_asm_stmt): Update declaration.
>   * parser.c (cp_parser_asm_definition): Detect the inline keyword
>   after asm.  Pass a flag for it to finish_asm_stmt.
>   * pt.c (tsubst_expr): Pass the ASM_INLINE_P flag to finish_asm_stmt.
>   * semantics.c (finish_asm_stmt): Add inline_p parameter.  Use it to
>   set ASM_INLINE_P.
>
> gcc/testsuite/
>   * c-c++-common/torture/asm-inline.c: New testcase.
>
> ---
>  gcc/c/c-parser.c| 16 ++--
>  gcc/c/c-tree.h  |  3 +-
>  gcc/c/c-typeck.c|  3 +-
>  gcc/cp/cp-tree.h|  2 +-
>  gcc/cp/parser.c | 38 +-
>  gcc/cp/pt.c |  2 +-
>  gcc/cp/semantics.c  |  3 +-
>  gcc/doc/extend.texi | 10 -
>  gcc/gimple-pretty-print.c   |  2 +
>  gcc/gimple.h| 24 ++-
>  gcc/gimplify.c  |  1 +
>  gcc/ipa-icf-gimple.c|  3 ++
>  gcc/testsuite/c-c++-common/torture/asm-inline.c | 53
> +
>  gcc/tree-core.h |  3 ++
>  gcc/tree-inline.c   |  3 ++
>  gcc/tree.h  |  3 ++
>  16 files changed, 147 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/torture/asm-inline.c
>
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 1f173fc..2a8c2dd 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -6267,8 +6267,8 @@ c_parser_for_statement (c_parser *parser, bool ivdep,
> unsigned short unroll,
> allowed.
>
> asm-statement:
> - asm type-qualifier[opt] ( asm-argument ) ;
> - asm type-qualifier[opt] goto ( asm-goto-argument ) ;
> + asm type-qualifier[opt] inline[opt] ( asm-argument ) ;
> + asm type-qualifier[opt] inline[opt] goto ( asm-goto-argument ) ;
>
> asm-argument:
>   asm-string-literal

Since this is touching c_parser_asm_statement() it seems relevant to
bug 55681; could you check to see how it interacts with some of the
cases listed in that bug?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55681

> @@ -6287,7 +6287,7 @@ static tree
>  c_parser_asm_statement (c_parser *parser)
>  {
>tree quals, str, outputs, inputs, clobbers, labels, ret;
> -  bool simple, is_goto;
> +  bool simple, is_goto, is_inline;
>location_t asm_loc = c_parser_peek_token (parser)->location;
>int section, nsections;
>
> @@ -6311,6 +6311,13 @@ c_parser_asm_statement (c_parser *parser)
>else
>  quals = NULL_TREE;
>
> +  is_inline = false;
> +  if (c_parser_next_token_is_keyword (parser, RID_INLINE))
> +{
> +  c_parser_consume_token (parser);
> +  is_inline = true;
> +}
> +
>is_goto = false;
>if (c_parser_next_token_is_keyword (parser, RID_GOTO))
>  {
> @@ -6393,7 +6400,8 @@ c_parser_asm_statement (c_parser *parser)
>  c_parser_skip_to_end_of_block_or_statement (parser)

Re: Is the D frontend good to go? (was Re: [PATCH 02/14] Add D frontend (GDC) implementation.)

2018-10-28 Thread Iain Buclaw
On Fri, 26 Oct 2018 at 11:02, Richard Biener  wrote:
>
> On Thu, Oct 25, 2018 at 4:13 PM Iain Buclaw  wrote:
> >
> > On Thu, 25 Oct 2018 at 15:06, David Malcolm  wrote:
> > >
> > > On Tue, 2018-10-23 at 19:21 +0200, Iain Buclaw wrote:
> > > > On Tue, 23 Oct 2018 at 15:48, Richard Sandiford
> > > >  wrote:
> > > > >
> > > > > Iain Buclaw  writes:
> > > > > > I'm just going to post the diff since the original here, just to
> > > > > > show
> > > > > > what's been done since review comments.
> > > > > >
> > > > > > I think I've covered all that's been addressed, except for the
> > > > > > couple
> > > > > > of notes about the quadratic parts (though I think one of them is
> > > > > > actually O(N^2)).  I've raised bug reports on improving them
> > > > > > later.
> > > > > >
> > > > > > I've also rebased them against trunk, so there's a couple new
> > > > > > things
> > > > > > present that are just to support build.
> > > > >
> > > > > Thanks, this is OK when the frontend is accepted in principle
> > > > > (can't remember where things stand with that).
> > > > >
> > > >
> > > > As discussed, the front-end has already been approved by the SC.
> > > >
> > > > I'm not sure if there's anything else further required, or if any
> > > > final review needs to be done.
> > > >
> > > > Thanks.
> > >
> > > I'm wondering what the state of this is [1]
> > >
> > > Iain: are all of the patches individually approved, with the necessary
> > > issues fixed?
> > >
> >
> > I've posted diffs a few days back that cover all requested changes.
> >
> > > IIRC, the front-end as a whole was approved, pending approval of all of
> > > the individual patches (URL?).  If that's done, then presumably this is
> > > good to go in - unless there was still some license discussion pending?
> > >
> >
> > I have on tab responses from each patch, from what I see, they have
> > all been OK'd.
> >
> > 02: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01432.html
> > 03: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00734.html
> > 04: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00928.html
> > 05: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00592.html
> > 06: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00609.html
> > 07: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00955.html
> > 08: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01270.html
> > 09: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01264.html
> > 10: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01269.html
> > 12: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00735.html
> > 14: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00970.html
> >
> > 1, 11, and 13 are DMD, Druntime, and Phobos, which are mirrored from
> > upstream dlang repositories.  I spoke with Richard Stallman a couple
> > days after this years GNU Cauldron, and he said there is no problem
> > with regards to their license.
> >
> > > I take it that you've already got your contributor paperwork in place,
> > > right?  I see from your maintainers commit that you presumably have svn
> > > access.
> > >
> > > I'm not a global reviewer or steering committee member though; would be
> > > nice to get a "go for it" from one of those.  Richard is a global
> > > reviewer.
> > >
> >
> > Yes, I was going to wait a couple days to make sure that there's no
> > objection, before pressing on with it.
> >
> > Having a "go for it" from one of the reviewers would be nice though.
> >
> > > I'm not sure if it should be one big mega-commit, or split out the same
> > > way you split things out for review.
> > >
> >
> > I think splitting makes sense, though not necessarily in 14 pieces,
> > there are only a few distinct parts.
> >
> > - D language front-end.
> > - D standard and runtime libraries.
> > - D language testsuite
> > - D language support in GCC proper
> > - D language support in GCC targets
> > - Toplevel configure/makefile patches that add front-end and library
> > to the build
> >
> > The first three can be squashed into one commit, as it's only adding new 
> > files.
>
> If you make sure each individual commit still builds splitting is OK, though
> technically I see no need for splitting.
>
> Go for it!
>

Bootstrapped with --enable-languages=all both before and after, and
ran all front-end testsuites on x86_64-linux-gnu, and saw no new
regressions.

So I'm going for it.  Just rebased on trunk and running just the D
testsuite one last time for my own sanity and committing.

Thanks.
--
Iain


C++ PATCH to Implement P0846R0, ADL and function templates

2018-10-28 Thread Marek Polacek
This patch implements P0846R0: ADL and Function Templates that are not Visible

whereby a name for which a normal lookup produces either no result or finds one
or more functions and that is followed by a "<" would be treated as if a 
function
template name had been found and would cause ADL to be performed.  Thus e.g.

namespace N {
  struct S { };
  template void f(S);
}

void
bar (N::S s)
{
  f<3>(s);
}

will now compile; ADL will find N::f.  The gist of the approach I took is in
cp_parser_template_name and setting TEMPLATE_ID_TRY_ADL_P.

One of the side effects is that a function name followed by a < means that the 
< is
always taken as the delimiter of a template-argument-list, rendering e.g. this

  fun-ptr < a;

ill-formed.

There's something I'm not clear on; the proposal talks about an
unqualified-id followed by a <, which is also the case for

  a.foo < 1;

which is "postfix-expression . unqualified-id <", but treating "foo" as a
template name would break valid programs.  I don't think this proposal should
be in effect for class member accesses, so I've disabled it by using scoped_p
below.  See fn-template8.C for a complete testcase for this scenario.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-10-28  Marek Polacek  

Implement P0846R0, ADL and function templates.
* cp-tree.h (TEMPLATE_ID_TRY_ADL_P): New macro.
* decl.c (grokfndecl): Allow FUNCTION_DECL in assert.
* parser.c (cp_parser_postfix_expression): Also perform ADL for
template-id that is TEMPLATE_ID_TRY_ADL_P.
(cp_parser_template_id): Adjust a call to cp_parser_template_name.
Allow FUNCTION_DECL in assert.  Return error node if parsing the
template argument list didn't go well.  Set TEMPLATE_ID_TRY_ADL_P
on the template-id.
(cp_parser_template_name): Add a new parameter.  Consider a name to
refer to a template if it is an unqualified-id followed by <.
(cp_parser_constructor_declarator_p): Adjust a call to
cp_parser_template_name.

* g++.dg/addr_builtin-1.C: Adjust dg-error.
* g++.dg/cpp2a/fn-template1.C: New test.
* g++.dg/cpp2a/fn-template10.C: New test.
* g++.dg/cpp2a/fn-template11.C: New test.
* g++.dg/cpp2a/fn-template12.C: New test.
* g++.dg/cpp2a/fn-template13.C: New test.
* g++.dg/cpp2a/fn-template2.C: New test.
* g++.dg/cpp2a/fn-template3.C: New test.
* g++.dg/cpp2a/fn-template4.C: New test.
* g++.dg/cpp2a/fn-template5.C: New test.
* g++.dg/cpp2a/fn-template6.C: New test.
* g++.dg/cpp2a/fn-template7.C: New test.
* g++.dg/cpp2a/fn-template8.C: New test.
* g++.dg/cpp2a/fn-template9.C: New test.
* g++.dg/parse/fn-template1.C: New test.
* g++.dg/parse/fn-template2.C: New test.
* g++.dg/template/pr61745.C: Add target to dg-error.

diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index 26ded3a9214..aa7ddb0830c 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -409,6 +409,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   SWITCH_STMT_ALL_CASES_P (in SWITCH_STMT)
   REINTERPRET_CAST_P (in NOP_EXPR)
   ALIGNOF_EXPR_STD_P (in ALIGNOF_EXPR)
+  TEMPLATE_ID_TRY_ADL_P (in TEMPLATE_ID_EXPR)
1: IDENTIFIER_KIND_BIT_1 (in IDENTIFIER_NODE)
   TI_PENDING_TEMPLATE_FLAG.
   TEMPLATE_PARMS_FOR_INLINE.
@@ -4987,6 +4988,11 @@ more_aggr_init_expr_args_p (const 
aggr_init_expr_arg_iterator *iter)
 #define ALIGNOF_EXPR_STD_P(NODE) \
   TREE_LANG_FLAG_0 (ALIGNOF_EXPR_CHECK (NODE))
 
+/* True iff this template-id uses an unqualified-id that should
+   be considered to refer to a template-name, as per P0846R0.  */
+#define TEMPLATE_ID_TRY_ADL_P(NODE) \
+  (TREE_LANG_FLAG_0 (TEMPLATE_ID_EXPR_CHECK (NODE)))
+
 /* An enumeration of the kind of tags that C++ accepts.  */
 enum tag_types {
   none_type = 0, /* Not a tag type.  */
diff --git gcc/cp/decl.c gcc/cp/decl.c
index 5ebfaaf85e6..e63743f8a6f 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -8827,7 +8827,9 @@ grokfndecl (tree ctype,
 the information in the TEMPLATE_ID_EXPR.  */
  SET_DECL_IMPLICIT_INSTANTIATION (decl);
 
- gcc_assert (identifier_p (fns) || TREE_CODE (fns) == OVERLOAD);
+ gcc_assert (identifier_p (fns)
+ || TREE_CODE (fns) == OVERLOAD
+ || TREE_CODE (fns) == FUNCTION_DECL);
  DECL_TEMPLATE_INFO (decl) = build_template_info (fns, args);
 
  for (t = TYPE_ARG_TYPES (TREE_TYPE (decl)); t; t = TREE_CHAIN (t))
diff --git gcc/cp/parser.c gcc/cp/parser.c
index ebe326eb923..a99bde334fb 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -2323,7 +2323,7 @@ static tree cp_parser_type_parameter
 static tree cp_parser_template_id
   (cp_parser *, bool, bool, enum tag_types, bool);
 static tree cp_parser_template_name
-  (cp_parser *, bool, bool, bool, enum tag_types, boo

Re: [PATCH] asm inline

2018-10-28 Thread Segher Boessenkool
On Sun, Oct 28, 2018 at 01:32:09PM -0400, Eric Gallager wrote:
> Since this is touching c_parser_asm_statement() it seems relevant to
> bug 55681; could you check to see how it interacts with some of the
> cases listed in that bug?
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55681

I wasn't aware of this PR.  Thanks!


Segher


Re: Fix dg-prune-output regex for versioned namespace

2018-10-28 Thread François Dumont

On 10/25/18 1:16 PM, Jonathan Wakely wrote:

On 24/10/18 21:30 +0200, François Dumont wrote:
Some tests dg-prune-output regex need to be adapted to pass when 
versioned namespace is activated.


I preferred to add the version namespace in the regex rather than 
removing namespace qualification. Let me know if you would prefer the 
other approach.


    * testsuite/23_containers/deque/48101_neg.cc: Add optional version
    namespace in dg-prune-output regex.
    * testsuite/23_containers/vector/48101_neg.cc: Likewise.
    * testsuite/27_io/filesystem/path/io/dr2989.cc: Likewise.

Tested under Linux x86_64 with and without versioned namespace.

Ok to commit ?


We can just simplify them instead of making them more complicated.


Yes we can, I even proposed it.

So I took it for an ok and after checking your proposal with versioned 
namespace enabled and disabled I comitted it.


    * testsuite/23_containers/deque/48101_neg.cc: Remove dg-prune-output
    'std' from regex pattern for versioned namespace compatibility.
    * testsuite/23_containers/vector/48101_neg.cc: Likewise.
    * testsuite/27_io/filesystem/path/io/dr2989.cc: Likewise.

François

diff --git a/libstdc++-v3/testsuite/23_containers/deque/48101_neg.cc b/libstdc++-v3/testsuite/23_containers/deque/48101_neg.cc
index 1f9e3e3b932..5d59f593c41 100644
--- a/libstdc++-v3/testsuite/23_containers/deque/48101_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/deque/48101_neg.cc
@@ -26,5 +26,5 @@ test01()
 }
 
 // { dg-error "non-const, non-volatile value_type" "" { target *-*-* } 0 }
-// { dg-prune-output "std::allocator<.* has no member named " }
+// { dg-prune-output "::allocator<.* has no member named " }
 // { dg-prune-output "must have the same value_type as its allocator" }
diff --git a/libstdc++-v3/testsuite/23_containers/vector/48101_neg.cc b/libstdc++-v3/testsuite/23_containers/vector/48101_neg.cc
index 620170d0a15..89eb62617e5 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/48101_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/48101_neg.cc
@@ -26,5 +26,5 @@ test01()
 }
 
 // { dg-error "non-const, non-volatile value_type" "" { target *-*-* } 0 }
-// { dg-prune-output "std::allocator<.* has no member named " }
+// { dg-prune-output "::allocator<.* has no member named " }
 // { dg-prune-output "must have the same value_type as its allocator" }
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/io/dr2989.cc b/libstdc++-v3/testsuite/27_io/filesystem/path/io/dr2989.cc
index b9a1235e1fe..b1fb13a0dcf 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/io/dr2989.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/io/dr2989.cc
@@ -32,4 +32,4 @@ void foo(std::iostream& s) {
   s << p; // { dg-error "no match" }
   s >> p; // { dg-error "no match" }
 }
-// { dg-prune-output "no type .* std::enable_if" }
+// { dg-prune-output "no type .*enable_if" }


Re: [PATCH v3 3/3] or1k: gcc: initial support for openrisc

2018-10-28 Thread Stafford Horne
Hi Segher,

Thank you for the review and thank you for all the help up until now.

On Sat, Oct 27, 2018 at 09:57:30PM -0500, Segher Boessenkool wrote:
> Hi Stafford,
> 
> Some minor comments.  I didn't read the atomics much, but I did look at
> everything else, and it looks fine :-)
> 
> On Sat, Oct 27, 2018 at 01:37:02PM +0900, Stafford Horne wrote:
> > +case ${target} in
> > +or1k*-*-linux*)
> > +tm_file="${tm_file} gnu-user.h linux.h glibc-stdint.h"
> > +tm_file="${tm_file} or1k/linux.h"
> > +;;
> 
> Spaces instead of tabs.

OK, I will fix.

> > +  /* Number of bytes saved on the stack for outgoing/sub-fucntion args.  */
> 
> Typo ("function").

OK.

> > +  /* The sum of sizes: locals vars, called saved regs, stack pointer
> > +   * and an optional frame pointer.
> > +   * Used in expand_prologue () and expand_epilogue().  */
> 
> We don't use leading stars in comments (just spaces), normally.

OK.

> > +  /* Set address to volitile to ensure the store doesn't get optimized 
> > out.  */
> 
> "volatile"

OK.

> > +/* Helper for defining INITIAL_ELIMINATION_OFFSET.
> > +   We allow the following eliminiations:
> > + FP -> HARD_FP or SP
> > + AP -> HARD_FP or SP
> > +
> > +   HFP and AP are the same which is handled below.  */
> > +
> > +HOST_WIDE_INT
> > +or1k_initial_elimination_offset (int from, int to)
> 
> You could calculate this as  some_offset (from) - some_offset (to)  with
> some_offset a simple helper function.  That gives you all possible
> eliminations :-)
> 
> (Each offset is very cheap to compute in your case, so that's not a problem).

Right, Do you mean something like the following?  I think it would work, but I
am not sure it make the code easier to read.  Do you think there would be much
benefits supporting all possible eliminations?


/* Helper function for use with INITIAL_ELIMINATION_OFFSET.  */

static HOST_WIDE_INT
or1k_stack_pointer_offset (int from)
{
   HOST_WIDE_INT offset;

  /* Set OFFSET to the offset from the stack pointer.  */
  switch (from)
{
/* Incoming args are all the way up at the previous frame.  */
case HARD_FRAME_POINTER_REGNUM:
case ARG_POINTER_REGNUM:
  offset = cfun->machine->total_size;
  break;

/* Local args grow downward from the saved registers.  */
case FRAME_POINTER_REGNUM:
  offset = cfun->machine->args_size + cfun->machine->local_vars_size;
  break;

default:
  gcc_unreachable ();
}

  return offset;
}

/* Helper for defining INITIAL_ELIMINATION_OFFSET.
   We allow the following eliminiations:
 FP -> HARD_FP or SP
 AP -> HARD_FP or SP

   HARD_FP and AP are actually the same.  */

HOST_WIDE_INT
or1k_initial_elimination_offset (int from, int to)
{
  return or1k_stack_pointer_offset (from) - or1k_stack_pointer_offset (to);
}


> > +static rtx
> > +or1k_function_value (const_tree valtype,
> > +const_tree fn_decl_or_type ATTRIBUTE_UNUSED,
> > +bool outgoing ATTRIBUTE_UNUSED)
> 
> Since we use C++ now you can write this as
>bool /*outgoing*/)
> or such, for enhanced readability.

Sure, I will remove all ATTRIBUTE_UNUSED instances.

> > +   split.  Symbols are lagitimized using split relocations.  */
> 
> "legitimized"

OK.

> > +void
> > +or1k_expand_move (machine_mode mode, rtx op0, rtx op1)
> > +{
> > +  if (MEM_P (op0))
> > +{
> > +  if (!const0_operand(op1, mode))
> 
> Space before (.

OK. I found a few ore too, thanks.

> > +#undef TARGET_RTX_COSTS
> > +#define TARGET_RTX_COSTS or1k_rtx_costs
> 
> You may want TARGET_INSN_COST as well (it is easier to get (more) correct).

OK, I was not considering that for the first port.  Perhaps after getting this
in?  I think in general the OpenRISC insruction costs are fairly flat for the
ones are using.

> > +  if (hi != 0)
> > +   {
> > + rtx scratch2 = gen_rtx_REG (Pmode, RV_REGNUM);
> > + emit_move_insn (scratch2, GEN_INT (hi));
> > + emit_insn (gen_add2_insn (scratch, scratch2));
> > +}
> 
> Tab instead of spaces.

OK.

> > +  /* Generate a tail call to the target function.  */
> > +  if (! TREE_USED (function))
> 
> No space after !.

Ok.

> > +#define TARGET_RETURN_IN_MEMORYor1k_return_in_memory
> 
> > +#defineTARGET_PASS_BY_REFERENCE or1k_pass_by_reference
> 
> These two have a tab instead of a space (as the rest do).

OK, also some TARGET_* are aligned and some not.  Will fix.

> > +#define WCHAR_TYPE_SIZE32
> 
> And this.

OK.

> > +   This ABI has no adjacent call-saved register, which means that
> > +   DImode/DFmode pseudos cannot be call-saved and will always be
> > +   spilled across calls.  To solve this without changing the ABI,
> > +   remap the compiler internal register numbers to place the even
> > +   call-saved registers r16-r30 in 24-31, and the odd call-clobbered
> > +   registers r17-r31 in 16-23.  */
> 
> Ooh evilness :-)

Richard did this, I thought it was rather

Re: [PATCH v3 3/3] or1k: gcc: initial support for openrisc

2018-10-28 Thread Richard Henderson
On 10/28/18 2:57 AM, Segher Boessenkool wrote:
>> +(define_insn "xorsi3"
>> +  [(set (match_operand:SI 0 "register_operand" "=r,r")
>> +  (xor:SI
>> +   (match_operand:SI 1 "register_operand"   "%r,r")
>> +   (match_operand:SI 2 "reg_or_s16_operand" " r,I")))]
>> +  ""
>> +  "@
>> +  l.xor\t%0, %1, %2
>> +  l.xori\t%0, %1, %2")
> 
> Is this correct?  Should this be unsigned (u16 and K)?

No, l.xori is signed.


r~


Re: [PATCH v3 3/3] or1k: gcc: initial support for openrisc

2018-10-28 Thread Segher Boessenkool
Hi!

On Mon, Oct 29, 2018 at 06:47:23AM +0900, Stafford Horne wrote:
> On Sat, Oct 27, 2018 at 09:57:30PM -0500, Segher Boessenkool wrote:
> > > +/* Helper for defining INITIAL_ELIMINATION_OFFSET.
> > > +   We allow the following eliminiations:
> > > + FP -> HARD_FP or SP
> > > + AP -> HARD_FP or SP
> > > +
> > > +   HFP and AP are the same which is handled below.  */
> > > +
> > > +HOST_WIDE_INT
> > > +or1k_initial_elimination_offset (int from, int to)
> > 
> > You could calculate this as  some_offset (from) - some_offset (to)  with
> > some_offset a simple helper function.  That gives you all possible
> > eliminations :-)
> > 
> > (Each offset is very cheap to compute in your case, so that's not a 
> > problem).
> 
> Right, Do you mean something like the following?  I think it would work, but I
> am not sure it make the code easier to read.  Do you think there would be much
> benefits supporting all possible eliminations?

Yes, like that.  It also easily can handle the other combos (those with
STACK_POINTER), and it is easier if you have to switch FRAME_GROWS_DOWNWARD
("false" is better on some args, but "true" is required for ssp).

Your code is fine as-is of course.

> > > +#undef TARGET_RTX_COSTS
> > > +#define TARGET_RTX_COSTS or1k_rtx_costs
> > 
> > You may want TARGET_INSN_COST as well (it is easier to get (more) correct).
> 
> OK, I was not considering that for the first port.  Perhaps after getting this
> in?  I think in general the OpenRISC insruction costs are fairly flat for the
> ones are using.

Oh, this was just a suggestion for the future :-)

If you compile with -dp you will see the cost and length for every insn
annotated; are most/all correct?

> > > +   This ABI has no adjacent call-saved register, which means that
> > > +   DImode/DFmode pseudos cannot be call-saved and will always be
> > > +   spilled across calls.  To solve this without changing the ABI,
> > > +   remap the compiler internal register numbers to place the even
> > > +   call-saved registers r16-r30 in 24-31, and the odd call-clobbered
> > > +   registers r17-r31 in 16-23.  */
> > 
> > Ooh evilness :-)
> 
> Richard did this, I thought it was rather clever. :)

Yes!

> > > +#define FUNCTION_ARG_REGNO_P(r) (r >= 3 && r <= 8)
> > 
> > IN_RANGE ?
> 
> OK, I may change it, I think without the macro, its easy to understand that 
> its
> (inclusive).

Yeah, you'll have to remember that IN_RANGE always is inclusive too.  Maybe
if it were used more that woul become second nature to more people :-)


Segher


Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it

2018-10-28 Thread Kugan Vivekanandarajah
Hi Richard and Jeff,

Thanks for your comments.

On Fri, 26 Oct 2018 at 19:40, Richard Biener  wrote:
>
> On Fri, Oct 26, 2018 at 4:55 AM Jeff Law  wrote:
> >
> > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > Hi,
> > >
> > > PR87528 showed a case where libgcc generated popcount is causing
> > > regression for Skylake.
> > > We also have PR86677 where kernel build is failing because the kernel
> > > does not use the libgcc (when backend is not defining popcount
> > > pattern).  While I agree that the kernel should implement its own
> > > functionality when it is not using the libgcc, I am afraid that the
> > > implementation can have the same performance issues reported for
> > > Skylake in PR87528.
> > >
> > > Therefore, I would like to propose that we disable popcount detection
> > > when we don't have a pattern for that. The attached patch (based on
> > > previous discussions) does this.
> > >
> > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > regressions. We need to disable the popcount* testcases. I will have
> > > to define a effective_target_with_popcount in
> > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > Thanks,
> > > Kugan
> > >
> > >
> > > gcc/ChangeLog:
> > >
> > > 2018-10-25  Kugan Vivekanandarajah  
> > >
> > > * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN 
> > > POPCOUNT
> > > as expensive when backend does not define it.
> > >
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2018-10-25  Kugan Vivekanandarajah  
> > >
> > > * gcc.target/aarch64/popcount4.c: New test.
> > >
> > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > patch from you.
> >
> > Richi argued that it's the kernel team's responsibility to provide a
> > popcount since they don't link with libgcc.  And I'm generally in
> > agreement with that position, though it does tend to generate some
> > friction with the kernel developers.  We also run the real risk of GCC 9
> > not being able to build the kernel which, IMHO, would be a disaster from
> > a PR standpoint.
> >
> > I'd like to hear from others here.  I fully realize we're beyond the
> > realm of what is strictly technically correct here from a review standpoint.
>
> As said final value replacement to a library call is probably not wanted
> for optimization purpose, so adjusting expression_expensive_p is OK with
> me.  It might not fully solve the (non-)issue in case another optimization 
> pass
> chooses to materialize niter computation result.
>
> Few comments on the patch:
>
> +  tree fndecl = get_callee_fndecl (expr);
> +
> +  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> +   {
> + combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
>
>   combined_fn cfn = gimple_call_combined_fn (expr);
>   switch (cfn)
> {

Did you mean:
combined_fn cfn = get_call_combined_fn (expr);

> ...
>
> cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is 
> mostly
> offline but eventually he knows whether there is a better way to query
>
> +   CASE_CFN_POPCOUNT:
> + /* Check if opcode for popcount is available.  */
> + if (optab_handler (popcount_optab,
> +TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> (expr, 0
> + == CODE_FOR_nothing)
> +   return true;
>
> note that we currently generate builtin calls rather than IFN calls
> (when a direct
> optab is supported).
>
> Another comment on the patch is that you probably have to adjust existing
> popcount testcases to add architecture specific flags enabling suport for
> the instructions, otherwise you won't see loop replacement.
Indeed.
In lib/target-supports.exp, I will try to add support for
check_effective_target_popcount_long.
When I grep for the popcount pattern in md files, I see it is defined for:

tilegx
tilepro
alpha
aarch64  when TARGET_SIMD
ia64
rs6000
i386  when TARGET_POPCOUNT
popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
s390  when TARGET_Z916 && TARGET_64BIT
sparc when TARGET_POPC
arm when TARGET_NEON
mips when ISA_HAS_POP
spu
avr

I can check these targets with the condition.
Another possibility is to check with a sample code and see if we are
getting a libcall in the asm. Not sure if that is straightforward. Are
there any example for such.

We could also move these test to a primary target that is tested often
tested which also defines popcount pattern. I dont think these tests
change for targets and if we can test in one target that could be
enough,

I am happy to implement what is appropriate.

Thanks,
Kugan



>
> Also I think that the expression is only expensive (for final value
> replacement!)
> if you consider optimizing for speed.  When optimizing for size getting rid of
> the loop is probably beneificial unconditionally.  That would leave the
> possibility to switch s

Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR libstdc++/83938)

2018-10-28 Thread François Dumont

Hi

    Some feedback regarding this patch ?

Thanks,
François

On 8/21/18 10:34 PM, François Dumont wrote:
I missed a test that was failing because of this patch. So I revert a 
small part of it and here is the new proposal.


Tested under Linux x86_64, ok to commit ?

François


On 24/07/2018 12:22, François Dumont wrote:

Hi

    Any chance to review this patch ?

François


On 06/06/2018 18:39, François Dumont wrote:

Hi

    I review and rework this proposal. I noticed that the same idea 
to limit buffer size within inplace_merge also apply to stable_sort.


    I also change the decision when buffer is too small to consider 
the buffer size rather than going through successive cuts of the 
original ranges. This way we won't cut the range more than 
necessary. Note that if you prefer this second part could be 
committed seperatly.


    PR libstdc++/83938
    * include/bits/stl_algo.h:
    (__stable_partition_adaptive): When buffer to small cut range using
    buffer size.
    (__inplace_merge): Take temporary buffer length from smallest 
range.
    (__merge_adaptive): When buffer too small consider smallest 
range first

    and cut based on buffer size.
    (__stable_sort_adaptive): When buffer too small cut based on buffer
    size.
    (__stable_sort): Limit temporary buffer length.
    * include/bits/stl_tempbuf.h (get_temporary_buffer): Change 
function

    to reduce requested buffer length on allocation failure.

Tested under Linux x86_64.

Ok to commit ?

François


On 25/01/2018 23:37, chang jc wrote:

Hi:

1. The __len = (__len + 1) / 2; is as you suggested, need to modify as
__len = (__len == 1) ? 0 : ((__len + 1) / 2);

2. The coding gain has been shown  PR c++/83938; I re-post here




   21
   20
   19
   18
   17
   16


   0.471136
   0.625695
   0.767262
   0.907461
   1.04838
   1.19508


   0.340845
   0.48651
   0.639139
   0.770133
   0.898454
   1.04632

it means when Merge [0, 4325376, 16777216); A is a sorted integer with
4325376 & B with 12451840 elements, total with 16M integers

The proposed method has the speed up under given buffer size =, ex
2^16, 2^17, ... 2^21 in unit of sizeof(int), for example, 2^16 means
given sizof(int)*64K bytes.

3. As your suggestion, _TmpBuf __buf should be rewrite.

4. It represents a fact that the intuitive idea to split from larger
part is wrong.

For example, if you have an input sorted array A & B, A has 8 integers
& B has 24 integers. Given tmp buffer whose capacity = 4 integers.

Current it tries to split from B, right?

Then we have:

A1 | A2  B1 | B2

B1 & B2 has 12 integers each, right?

Current algorithm selects pivot as 13th integer from B, right?

If the corresponding upper bound of A is 6th integer.

Then it split in

A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12

After rotate, we have two arrays to merge

[A1 = 5 | B1 = 12]  & [A2 = 3 | B2 = 12]

Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge.

Sadly, [A1 = 5 | B1 = 12] CANNOT.

So we do rotate again, split & merge the two split arrays from [A1 = 5
| B1 = 12] again.


But wait, if we always split from the smaller one instead of larger 
one.


After rotate, it promises two split arrays both contain 
ceiling[small/2].


Since tmp buffer also allocate by starting from sizeof(small) &
recursively downgrade by ceiling[small/2^(# of fail allocate)].

It means the allocated tmp buffer promises to be sufficient at the
level of (# of fail allocate).

Instead, you can see if split from large at level (# of fail allocate)
several split array still CANNOT use  tmp buffer to do buffered merge.


As you know, buffered merge is far speed then (split, rotate, and
merge two sub-arrays) (PR c++/83938 gives the profiling figures),

the way should provide speedup.


Thanks.










On 24/01/2018 18:23, François Dumont wrote:

Hi


 It sounds like a very sensitive change to make but nothing 
worth figures.

Do you have any bench showing the importance of the gain ?

 At least the memory usage optimization is obvious.

On 19/01/2018 10:43, chang jc wrote:

Current std::inplace_merge() suffers from performance issue by 
inefficient


logic under limited memory,

It leads to performance downgrade.

Please help to review it.

Index: include/bits/stl_algo.h
===
--- include/bits/stl_algo.h    (revision 256871)
+++ include/bits/stl_algo.h    (working copy)
@@ -2437,7 +2437,7 @@
 _BidirectionalIterator __second_cut = __middle;
 _Distance __len11 = 0;
 _Distance __len22 = 0;
-  if (__len1 > __len2)
+  if (__len1 < __len2)
   {
 __len11 = __len1 / 2;
 std::advance(__first_cut, __len11);
@@ -2539,9 +2539,15 @@
 const _DistanceType __len1 = std::distance(__first, 
__middle);

 const _DistanceType __len2 = std::distance(__middle, __last);

+

 typedef _Temporary_buffer<_BidirectionalIterator, _ValueType>
_TmpBuf;

-  _TmpBuf __buf(__first, __last);
-
+