Re: Support for AVX512 ternary logic instruction

2019-01-21 Thread Richard Biener
On Mon, Jan 21, 2019 at 2:46 AM Andi Kleen  wrote:
>
> Wojciech Muła  writes:
> >
> > The main concern is if it's a proper approach? Seems that to match
> > other logic functions, like "a & b | c", a separate pattern is required.
> > Since an argument can be either negated or not, and we can use three
> > logic ops (or, and, xor) there would be 72 patterns. So maybe a new
> > optimization pass would be easier to create and maintain? (Just a silly
> > guess.)
>
> Yes that's not scalable.

You can use code iterators for the logic ops, so you only need to
explicitly write down the not variants.

> >
> > I'd be grateful for any comments and advice.
>
> Maybe you could write it in the simplifier pattern language
> and then generate a suitable builtin.

Using an UNSPEC and machine-reorg might also be an option...

> See https://gcc.gnu.org/onlinedocs/gccint/Match-and-Simplify.html
>
> However the problem is that this may affect other optimizations
> because it happens too early. e.g. the compiler would also need
> to learn to constant propagate the new builtin, and understand
> its side effects, which might affect a lot of places.
>
> So a custom compiler patch that runs late may be better.
> Or perhaps some extension of the simplifier that does it.
>
> I looked at this at some point for PCMP*STR* which are similarly
> powerful instructions that could potentially replace a lot of
> others.
>
> -Andi
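
For readers following the pattern-count discussion above: any bitwise function of three inputs is fully described by an 8-entry truth table, and that table is what a ternary logic instruction takes as its 8-bit immediate. The sketch below is plain C, not GCC code, and all names in it (logic3_fn, ternlog_imm8, ternlog_model, and_or, andn_xor) are hypothetical. It assumes the usual index convention where the three operand bits form the index (a << 2) | (b << 1) | c. It shows how a late pass or a generator could derive the immediate from an expression instead of matching each shape by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a boolean function of three inputs,
   applied bitwise to 8-bit masks.  */
typedef uint8_t (*logic3_fn) (uint8_t a, uint8_t b, uint8_t c);

/* Truth-table columns for the three operands.  Bit i of each constant
   is the value of that operand in row i = (a << 2) | (b << 1) | c.  */
enum { TT_A = 0xF0, TT_B = 0xCC, TT_C = 0xAA };

/* Evaluating F once on the three columns yields its 8-bit truth table.  */
static uint8_t
ternlog_imm8 (logic3_fn f)
{
  return f (TT_A, TT_B, TT_C);
}

/* Software model of a three-input table lookup: for every bit position,
   pick the bit of IMM8 selected by the three operand bits.  */
static uint32_t
ternlog_model (uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
  uint32_t r = 0;
  for (int bit = 0; bit < 32; bit++)
    {
      unsigned idx = (((a >> bit) & 1) << 2)
                     | (((b >> bit) & 1) << 1)
                     | ((c >> bit) & 1);
      assert (idx < 8);
      r |= (uint32_t) ((imm8 >> idx) & 1) << bit;
    }
  return r;
}

/* Two example expressions: "a & b | c" and one with a negated operand.  */
static uint8_t
and_or (uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) ((a & b) | c);
}

static uint8_t
andn_xor (uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) ((~a & b) ^ c);
}
```

With this model, ternlog_imm8 (and_or) evaluates to 0xEA, and feeding that immediate to ternlog_model gives the same result as computing (a & b) | c directly. Negated operands need no special handling, since the negation is simply folded into the table.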


SLP-based reduction vectorization

2019-01-21 Thread Anton Youdkevitch
Here is a prototype for doing vectorized reduction
using the SLP approach. I would appreciate feedback on whether
this is a feasible approach and whether the overall direction is
right.

The idea is to vectorize a reduction like this

S = A[0]+A[1]+...+A[N];

into

Sv = Av[0]+Av[1]+...+Av[N/VL];


So that, for instance, the following code:

typedef double T;
T sum;

void foo (T*  __restrict__ a)
{
sum = a[0]+ a[1] + a[2]+ a[3] + a[4]+ a[5] + a[6]+ a[7];
}


instead of:

foo:
.LFB23:
.cfi_startproc
movsd   (%rdi), %xmm0
movsd   16(%rdi), %xmm1
addsd   8(%rdi), %xmm0
addsd   24(%rdi), %xmm1
addsd   %xmm1, %xmm0
movsd   32(%rdi), %xmm1
addsd   40(%rdi), %xmm1
addsd   %xmm1, %xmm0
movsd   48(%rdi), %xmm1
addsd   56(%rdi), %xmm1
addsd   %xmm1, %xmm0
movsd   %xmm0, sum2(%rip)
ret
.cfi_endproc


is compiled into:

foo:
.LFB11:
.cfi_startproc
movupd  32(%rdi), %xmm0
movupd  48(%rdi), %xmm3
movupd  (%rdi), %xmm1
movupd  16(%rdi), %xmm2
addpd   %xmm3, %xmm0
addpd   %xmm2, %xmm1
addpd   %xmm1, %xmm0
haddpd  %xmm0, %xmm0
movlpd  %xmm0, sum(%rip)
ret
.cfi_endproc
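
To make the intended transformation concrete, here is a small scalar C model of the two evaluation orders. It is not part of the patch: the v2df struct and all function names are hypothetical stand-ins for a 128-bit register holding two doubles, and the model accumulates linearly rather than in the tree shape visible in the assembly above. It is only meant to show that the vector form adds lane-wise partial sums and finishes with one horizontal add.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-lane vector standing in for a 128-bit register
   holding two doubles.  */
typedef struct { double lane[2]; } v2df;

/* Models an unaligned packed load of two adjacent doubles.  */
static v2df
v2df_load (const double *p)
{
  v2df v = { { p[0], p[1] } };
  return v;
}

/* Models a packed, lane-wise add.  */
static v2df
v2df_add (v2df x, v2df y)
{
  v2df r = { { x.lane[0] + y.lane[0], x.lane[1] + y.lane[1] } };
  return r;
}

/* Models the final horizontal add of the two lanes.  */
static double
v2df_hadd (v2df x)
{
  return x.lane[0] + x.lane[1];
}

/* Reference: the reduction in source order.  */
static double
sum_scalar (const double *a, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Vector-shaped reduction: N must be a nonzero multiple of the
   vector length (2 here), i.e. a full reduction with no scalar tail.  */
static double
sum_slp_model (const double *a, size_t n)
{
  assert (n >= 2 && n % 2 == 0);
  v2df acc = v2df_load (a);
  for (size_t i = 2; i < n; i += 2)
    acc = v2df_add (acc, v2df_load (a + i));
  return v2df_hadd (acc);
}
```

Because the vector form adds the elements in a different order than the source expression, the two results can differ in rounding for general floating-point inputs. They agree exactly for inputs such as 1.0 through 8.0, where every partial sum is representable.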


As this is a very crude prototype there are some things
to consider.

1. As the current SLP framework assumes presence of
group stores I cannot use it directly, as reduction
does not require group stores (or even stores at all),
so, I'm partially using the existing functionality but
sometimes I have to create a stripped down version
of it for my own needs;

2. The current version considers only PLUS reduction
as it is encountered most often and therefore is the
most practical;

3. While normally SLP transformation should operate
inside single basic block this requirement greatly
restricts its practical application as in a code
complex enough there will be vectorizable subexpressions
defined in basic block(s) different from that where the
reduction result resides. However, for the sake of
simplicity only single uses in the same block are
considered now;

4. For the same sake the current version does not deal
with partial reductions which would require partial sum
merging and careful removal of the scalars that participate
in the vector part. The latter gets done automatically
by DCE in the case of full reduction vectorization;

5. There is no cost model yet for the reasons mentioned
in items 3 and 4.

Thanks in advance.

-- 
  Anton
From eb2644765d68ef1c629e584086355a8d66df7c73 Mon Sep 17 00:00:00 2001
From: Anton Youdkevitch 
Date: Fri, 9 Nov 2018 20:50:05 +0300
Subject: [PATCH] WIP BB-only SLP reduction

Very basic effort to implement SLP vectorization
for reductions.
---
 gcc/tree-vect-slp.c | 313 
 1 file changed, 313 insertions(+)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 0ab7bd8086c..14c7a7e8069 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2913,6 +2913,317 @@ vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
   return bb_vinfo;
 }
 
+#include "alias.h"
+#include "tree-into-ssa.h"
+
+static tree
+create_vector_load(stmt_vec_info info, gimple_stmt_iterator *gsi)
+{
+  tree dataref_ptr = NULL_TREE;
+  tree ref_type;
+  tree offset = NULL_TREE;
+  tree byte_offset = NULL_TREE;
+  gimple *ptr_incr = NULL;
+  tree dummy;
+  tree data_ref = NULL;
+
+  tree vectype = STMT_VINFO_VECTYPE (info); 
+  tree scalar_dest = gimple_assign_lhs (info->stmt);
+
+  data_reference *dr = STMT_VINFO_DATA_REF (info);
+  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (info);
+  ref_type = reference_alias_ptr_type (DR_REF (dr));
+  tree bump = size_zero_node; 
+  if (DR_BASE_ALIGNMENT (dr) < TYPE_ALIGN (vectype))
+{
+  vectype = build_aligned_type
+	(vectype, DR_BASE_ALIGNMENT (dr) * BITS_PER_UNIT); 
+}
+  /* manually set the misalignment as
+ it is uninitialized at this moment */
+  SET_DR_MISALIGNMENT (dr_info, 0);
+  DR_TARGET_ALIGNMENT (dr_info) = DR_BASE_ALIGNMENT (dr);
+
+  dataref_ptr = vect_create_data_ref_ptr (info, vectype, NULL,
+	  offset, &dummy, gsi, &ptr_incr,
+	  false, byte_offset, bump);
+  data_ref
+= fold_build2 (MEM_REF, vectype, dataref_ptr,
+		   build_int_cst (ref_type, 0));
+  tree vdst = vect_create_destination_var (scalar_dest, vectype);
+  tree new_name = make_ssa_name (vdst, NULL);
+  gassign* new_stmt = gimple_build_assign (new_name, data_ref);
+  vect_finish_stmt_generation (info, new_stmt, gsi);
+  return new_name;
+}
+
+
+/* Blatantly taken from tree-vect-data-refs.c */
+
+static int
+dr_group_sort_cmp (const void *dra_, const void *drb_)
+{
+  data_reference_p dra = *(data_reference_p *)const_cast<void *>(dra_);
+  data_reference_p drb = *(data_reference_p *)const_cast<void *>(drb_);
+  int cmp;
+
+  /* Stabilize sort.  */
+  if (dra == drb)
+return 0;
+
+  /* DRs in different loops never belong to the same group.  */
+  loop_p loopa =

Google Summer Of Code

2019-01-21 Thread Vikramsingh Kushwaha
Respected sir/madam
I, Vikramsingh Kushwaha, am currently a third-year B.Tech computer
engineering student at MIT Pune, India. I am very interested in contributing
to open source projects, but I am new to this, so I need some guidance.
I would also like to participate in Google Summer Of Code, and I would like
your organisation to be my mentor.
Kindly be my mentor; I am ready for any challenge, task or test
you want to set. I shall be sharing my GitHub and CodeChef profiles. I am
an average coder but a dedicated hard worker.
Kindly guide me and be my mentor for Google Summer Of Code.
Regards,
Vikramsingh Kushwaha
MIT Academy Of Engineering
Pune


Re: Google Summer Of Code

2019-01-21 Thread Vikramsingh Kushwaha
Github Profile: https://github.com/vikramsinghkushwaha
Codechef Profile: https://www.codechef.com/users/vskushwaha_15

On Mon, Jan 21, 2019 at 11:37 PM Vikramsingh Kushwaha <
vskushw...@mitaoe.ac.in> wrote:

> Respected sir/madam
> I, Vikramsingh Kushwaha, am currently a third-year B.Tech computer
> engineering student at MIT Pune, India. I am very interested in contributing
> to open source projects, but I am new to this, so I need some guidance.
> I would also like to participate in Google Summer Of Code, and I would like
> your organisation to be my mentor.
> Kindly, be my mentor, i am ready for any challenge or task, test whatever
> you want to take. I shall be sharing my github and codechef profile. I am
> an average coder but a dedicated hard worker.
> Kindly guide me and be my mentor for Google Summer Of Code.
> Regards,
> Vikramsingh Kushwaha
> MIT Academy Of Engineering
> Pune
>
>


Re: About GSOC.

2019-01-21 Thread Tejas Joshi
Hello.
I've been inactive for some time due to exams but I have been studying
the real.h and IEEE 754 floating point format as far as I could.

> floating-point built-in functions.  That means you should instead
> understand REAL_EXP and the significands of floating-point values, and

In GCC's representation of REAL values, or may I say floating-point numbers
(including decimal floating-point values), sizes are defined by
macros in real.h like

#define SIGNIFICAND_BITS (128 + HOST_BITS_PER_LONG)   (why
128 + HOST_BITS_PER_LONG? Even quad precision has only 128 bits in total.)
#define EXP_BITS (32 - 6)

These include EXP_BITS, which I believe is the number of exponent bits, and
the macro REAL_EXP, which yields the exponent value of a REAL r. REAL_EXP is
used in real.c together with operations like XOR and shifting
(multiplication by 2), though the operation is unclear to me. (Adding
comments to these would also be a helpful patch for me!)
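
As a possible reading of the XOR-and-shift pattern mentioned above: one common way to recover a signed value from an unsigned bitfield is to XOR with the sign-bit mask and then subtract that mask. The sketch below shows this generic technique for an assumed 26-bit field, mirroring the width of EXP_BITS (32 - 6). The names FIELD_BITS, store_field and sign_extend_field are hypothetical, and whether REAL_EXP is defined exactly this way should be checked against real.h.

```c
#include <assert.h>

/* Assumed field width, mirroring EXP_BITS (32 - 6).  */
enum { FIELD_BITS = 26 };

/* Store a signed value in an unsigned FIELD_BITS-wide field by
   keeping only its low FIELD_BITS bits (two's complement).  */
static unsigned int
store_field (int e)
{
  return (unsigned int) e & ((1u << FIELD_BITS) - 1);
}

/* Recover the signed value: flipping the sign bit with XOR and then
   subtracting its weight sign-extends the field without branching.  */
static int
sign_extend_field (unsigned int u)
{
  const unsigned int m = 1u << (FIELD_BITS - 1);
  assert (u < (1u << FIELD_BITS));
  return (int) (u ^ m) - (int) m;
}
```

For example, storing -5 yields the field value 0x3FFFFFB, and sign_extend_field maps it back to -5. Non-negative values below 2^25 pass through unchanged.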

> true that it doesn't have a comment specifying its semantics directly, but
> the /* ENUM_BITFIELD (real_value_class) */ should give a strong hint,
> along with the values that are stored in that field.  By looking at how

As far as struct real_value is concerned, I believe the values
associated with decimal, sign, etc. are used for handling switch
conditions in functions of real.c and then carrying out specific
functions like clear_significand_below.
The enumeration real_value_class determines the type of the number,
like NaN or normal, in the functions. However, the fields of
struct real_value are still pretty unclear to me with regard to the
number they represent. (Is my understanding right so far?)
Thank you.

Regards,
-Tejas
On Fri, 16 Nov 2018 at 22:20, Joseph Myers <jos...@codesourcery.com> wrote:
On Fri, 16 Nov 2018, Tejas Joshi wrote:

> About roundeven, there might be need to add case to
> expand_builtin_int_roundingfn similar to
> ceil, for expansion.
> But how is round() expanded since there's no
> entry for it in expand_builtin_int_roundingfn ?

Please see the comment above expand_builtin_int_roundingfn, and that above
expand_builtin_int_roundingfn_2, which handle different sets of functions.
Those functions are of no relevance to adding support for built-in
roundeven.  (They might be of relevance to support for built-in fromfp
functions, depending on whether the detailed semantics allows
casts-to-integer of calls to other functions to be converted into calls to
the fromfp functions.  But I don't think inventing __builtin_lroundeven
would be appropriate.)

> Also, is it right to have an added case for roundeven in convert.c
> along CASE_FLT_FN (BUILT_IN_ROUND)
> in convert_to_integer_1?

Not until doing things with fromfp functions.  There is no lroundeven (for
example) in TS 18661-1.

-- 
Joseph S. Myers
jos...@codesourcery.com



Re: About GSOC.

2019-01-21 Thread Joseph Myers
On Tue, 22 Jan 2019, Tejas Joshi wrote:

> the number like nan or normal in the functions. Though, attributes of
> struct real_value are pretty unclear to me regarding to the number it
> represents. (Am I right within this grasp?).

It may be helpful to run the compiler under a debugger to examine how 
particular real numbers are represented in real_value - that should help 
answer questions such as what endianness is used for the significand, or 
whether floating point values with a given exponent are in the range 
[2^EXP, 2^(EXP+1)) or [2^(EXP-1), 2^EXP), where conventions commonly 
differ.  (It's the unoptimized, stage1 cc1 that should be run under a 
debugger.  See  for more details.)
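
The two exponent conventions mentioned above can be illustrated with a small standalone C sketch. It is independent of GCC and does not say which convention real_value uses; the struct decomposed and the function decompose are hypothetical names. The same positive value is normalized so that its significand lies either in [1, 2) or in [0.5, 1), and the resulting exponents differ by one.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical result type: x == sig * 2^exp.  */
struct decomposed
{
  double sig;
  int exp;
};

/* Normalize positive, finite X so that the significand lies in
   [LO, 2 * LO).  LO == 1.0 puts values with exponent EXP in
   [2^EXP, 2^(EXP+1)); LO == 0.5 puts them in [2^(EXP-1), 2^EXP).  */
static struct decomposed
decompose (double x, double lo)
{
  assert (x > 0.0 && x <= DBL_MAX);
  assert (lo == 1.0 || lo == 0.5);
  struct decomposed d = { x, 0 };
  while (d.sig >= 2.0 * lo)
    {
      d.sig /= 2.0;
      d.exp++;
    }
  while (d.sig < lo)
    {
      d.sig *= 2.0;
      d.exp--;
    }
  return d;
}
```

For 8.0, decompose (8.0, 1.0) gives significand 1.0 with exponent 3, while decompose (8.0, 0.5) gives significand 0.5 with exponent 4. Inspecting a known constant such as 8.0 in the debugger and comparing the stored exponent against these two outcomes is one way to tell the conventions apart.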

And of course contribute comments in real.h once you've determined the 
answers - because there are such areas where conventions about 
representation of floating-point numbers commonly differ, it's 
particularly valuable to have such comments because even someone familiar 
with floating-point won't know which convention has been chosen by this 
code in GCC.

-- 
Joseph S. Myers
jos...@codesourcery.com