RE: unfused fma question

2015-02-23 Thread Steve Ellcey
On Sun, 2015-02-22 at 10:30 -0800, Matthew Fortune wrote:
> Steve Ellcey  writes:
> > Or one could change convert_mult_to_fma to add a check if fma is fused
> > vs. non-fused in addition to the check for the flag_fp_contract_mode
> > in order to decide whether to convert expressions into an fma and then
> > define fma instructions in the md file.
> 
> I was about to say that I see no reason to change how non-fused multiply
> adds work i.e. leave them to pattern matching but I think your point was
> that when both fused and non-fused patterns are available then what
> should we do.

No, I am thinking about the case where there are only non-fused multiply
add instructions available.  To make sure I am using the right
terminology, I am using a non-fused multiply-add to mean a single fma
instruction that does '(a + (b * c))' but which rounds the result of '(b
* c)' before adding it to 'a' so that there is no difference in the
results between using this instruction and using individual add and mult
instructions.  My understanding is that this is how the mips32r2 madd
instruction works.

In this case there seems to be two ways to have GCC generate the fma
instruction.  One is the current method using combine_instructions with
an instruction defined as:


(define_insn "*madd" (set (0) (plus (mult (1) (2
"madd.\t%0,%3,%1,%2"


The other way would be to extend the convert_mult_to_fma so that instead
of:

  if (FLOAT_TYPE_P (type)
  && flag_fp_contract_mode == FP_CONTRACT_OFF)
return false

it has something like:

  if (FLOAT_TYPE_P (type)
  && (flag_fp_contract_mode == FP_CONTRACT_OFF)
  && !targetm.fma_does_rounding))
return false

And then define an instruction like:

(define_insn "fma" (set (0) (fma (1) (2) (3"
madd.\t%0,%3,%1,%2"


The question I have is whether one or the other of these two approaches
would be better at creating fma instructions (vs leaving mult/add
combinations) or be might be preferable for some other reason.

Steve Ellcey
sell...@imgtec.com




Re: array bounds, sanitizer, safe programming, and cilk array notation

2015-02-23 Thread Joseph Myers
On Sat, 21 Feb 2015, Marek Polacek wrote:

> option that detects a particular UB.  Or say that a particular UB is a
> compile-time error (e.g. "declaring a function at block scope with an explicit
> storage-class specifier other than extern").

That one is already a hard error for cases such as static / register, a 
defined extension (nested functions) for auto.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: unfused fma question

2015-02-23 Thread Jeff Law

On 02/23/15 10:42, Steve Ellcey wrote:

No, I am thinking about the case where there are only non-fused multiply
add instructions available.  To make sure I am using the right
terminology, I am using a non-fused multiply-add to mean a single fma
instruction that does '(a + (b * c))' but which rounds the result of '(b
* c)' before adding it to 'a' so that there is no difference in the
results between using this instruction and using individual add and mult
instructions.  My understanding is that this is how the mips32r2 madd
instruction works.
Ahhh, nevermind, nothing I said was relevant then.  I misunderstood 
completely :-)




In this case there seems to be two ways to have GCC generate the fma
instruction.  One is the current method using combine_instructions with
an instruction defined as:


(define_insn "*madd" (set (0) (plus (mult (1) (2
 "madd.\t%0,%3,%1,%2"

>



The other way would be to extend the convert_mult_to_fma so that instead
of:

   if (FLOAT_TYPE_P (type)
   && flag_fp_contract_mode == FP_CONTRACT_OFF)
 return false

it has something like:

   if (FLOAT_TYPE_P (type)
   && (flag_fp_contract_mode == FP_CONTRACT_OFF)
   && !targetm.fma_does_rounding))
 return false

And then define an instruction like:

(define_insn "fma" (set (0) (fma (1) (2) (3"
 madd.\t%0,%3,%1,%2"


The question I have is whether one or the other of these two approaches
would be better at creating fma instructions (vs leaving mult/add
combinations) or be might be preferable for some other reason.
The combiner pattern is useful in cases where we can't see the FMA at 
gimple->rtl expansion time.  But there may be cases where exposing the 
FMA earlier is helpful as well.


So I think an argument could be easily made that we want to support both.

Jeff




Re: unfused fma question

2015-02-23 Thread Joseph Myers
On Fri, 20 Feb 2015, Steve Ellcey  wrote:

> Or one could change convert_mult_to_fma to add a check if fma is fused
> vs. non-fused in addition to the check for the flag_fp_contract_mode in
> order to decide whether to convert expressions into an fma and then
> define fma instructions in the md file.

It's a bad idea for the meaning of GIMPLE codes to depend on command-line 
options.  Remember that objects built with different options may be linked 
together with LTO.  In view of offloading, it's a bad idea for the meaning 
of GIMPLE codes to depend on the target either (though targets may still 
have different machine modes, built-in functions, etc.).

Thus FMA_EXPR should only mean a fused operation (and I think the same 
applies to fma RTL).  Whether there should be a non-fused MULT_ADD_EXPR I 
don't know.

(I wonder if convert_mult_to_fma is something that should move to 
match-and-simplify infrastructure.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: unfused fma question

2015-02-23 Thread Jeff Law

On 02/23/15 11:38, Joseph Myers wrote:


(I wonder if convert_mult_to_fma is something that should move to
match-and-simplify infrastructure.)

Yea, it probably should.

Jeff


Re: unfused fma question

2015-02-23 Thread Marc Glisse

On Mon, 23 Feb 2015, Jeff Law wrote:


On 02/23/15 11:38, Joseph Myers wrote:


(I wonder if convert_mult_to_fma is something that should move to
match-and-simplify infrastructure.)

Yea, it probably should.


Currently, it happens in a pass that is quite late. If it moves to 
match-and-simplify, I am afraid it might inhibit some other optimizations 
(we can turn plus+mult to fma but not the reverse), unless we use some way 
to inhibit some patterns until a certain pass (possibly a simple "if", if 
that's not too costly). Such "time-restricted" patterns might be useful 
for other purposes: don't introduce complicated vector/complex operations 
after the corresponding lowering passes, do narrowing until a certain 
point but then prefer fast integer sizes, etc (I haven't thought about 
those particular examples, they are only an illustration).


--
Marc Glisse


Re: target macro removal (fwd)

2015-02-23 Thread Joseph Myers
I sent this discussion of how one might go about large-scale target macro 
removal in response to an off-list enquiry last month, but it may be of 
more general interest.

-- 
Joseph S. Myers
jos...@codesourcery.com

-- Forwarded message --
Date: Tue, 20 Jan 2015 18:04:27 + (UTC)
From: Joseph Myers 
Subject: Re: target macro removal

Say you want to convert all (or nearly all) 680 (or thereabouts) target 
macros into hooks, and have several person-months to spend on this 
conversion.  (Much the same applies even if dealing with smaller subsets 
such as all target macros used in front ends.)  This won't get 
target-independence in code that no longer needs to include tm.h - in 
particular, option handling involves a global enumeration of all options 
and brings defines relating to one part of the compiler into other parts 
of the compiler (similarly, insn-* files would also need considering) - 
but it's a reasonable starting point and we can discuss further 
target-dependence removal after the target macro removal.

Although this would involve 680 conversion patches (except where it makes 
sense to convert a set of closely related target macros at once), it 
should not need to involve 680 manually-written patches.  Rather, if doing 
a large-scale target macro removal project I think a good starting point 
would be to write a set of robust Python scripts that (a) parse the 
structure of GCC source code at the preprocessor level (so understanding, 
for example, what macros are used directly in #if / #elif conditions; what 
are used indirectly through being in the expansion of another macro used 
in such a condition; and, for each macro definition, what #if conditions 
apply for that definition to be active), and (b) can carry out 
refactorings based on that understanding.  The results of such a 
refactoring may need manual editing where e.g. it's hard to get the 
scripts to get the formatting of new hooks completely right, or where the 
English wording of the documentation of the macro, converted to 
documentation for a hook, needs fine-tuning, but having refactoring 
scripts should save a lot of work with the actual conversions.

Now, such refactoring scripts do not need to handle the fully general case 
so that one script can handle converting all 680 macros.  It's quite 
reasonable to have a script that detects problems and gives up, and 
separate refactorings to make things ready for that script.  And some of 
the preparation patches might well be completely manual.

I listed the main problem cases in 
.  Code built 
for the target simply cannot use hooks, which are for the host; tests in 
#if, whether direct or indirect, again simply cannot use hooks.  The 
following approaches apply to fixing such cases to prepare for hook 
conversions; I think of these as high priorities because they open the way 
to automatic refactorings converting more hooks.

In all cases, for multi-target changes it's a good idea to test, as a 
minimum, building cc1 for all affected architectures (with 
--enable-werror=always building starting from a native compiler of the 
same version, if possible; see contrib/config-list.mk for a more thorough 
check, though I don't think the full set of targets there is needed for 
every patch), as well as a clean bootstrap and regression test for at 
least one configuration.


(a) Where a target macro is used in code built for the host and in code 
built for the target, predefine a macro with -fbuilding-libgcc and then 
make the code built for the target use that new predefine; in general I 
think such patches would be manually written rather than generated by 
refactoring scripts.  Note that:

(i) Sometimes an existing macro may suffice (e.g. LONG_LONG_TYPE_SIZE in 
target code could be changed to __SIZEOF_LONG_LONG__ * __CHAR_BIT__).

(ii) Sometimes a literal one-to-one conversion may not be cleanest (e.g. 
 lists 
GTHREAD_USE_WEAK SUPPORTS_WEAK TARGET_ATTRIBUTE_WEAK - do all really need 
separate defines?).  But if it's not clear how to do a cleaner conversion, 
it may be best to do a literal conversion and leave further cleanup to 
later.

(iii) Some source files are used both for the host and for the target.  
Consider, for example, libgcc/libgcov.h.  The code inside "#ifndef 
IN_GCOV_TOOL" is for the target, so it needs e.g. LONG_LONG_TYPE_SIZE 
converted as above (and BITS_PER_UNIT - whether BITS_PER_UNIT strictly 
counts as a target macro now is unclear, but it comes from host-side code 
so is best treated as one for target-side code).  But other code is for 
the host - or for both host and target and so needs handling accordingly.  
Much the same applies to various files under gcc/ada/ - they are built for 
both host and target.

(iv) After such a change there may still be other obstacles to converting 
to a hook (e.g. the macro may be used in #if on 

Re: Listing a maintainer for libcilkrts, and GCC's Cilk Plus implementation generally?

2015-02-23 Thread H.J. Lu
On Mon, Sep 29, 2014 at 4:00 AM, Jakub Jelinek  wrote:
> On Mon, Sep 29, 2014 at 12:56:06PM +0200, Thomas Schwinge wrote:
>> Hi!
>>
>> On Tue, 23 Sep 2014 11:02:30 +, "Zamyatin, Igor" 
>>  wrote:
>> > Jeff Law wrote:
>> > > The original plan was for Balaji to take on this role; however, his 
>> > > assignment
>> > > within Intel has changed and thus he's not going to have time to work on
>> > > Cilk+ anymore.
>> > >
>> > > Igor Zamyatin has been doing a fair amount of Cilk+ maintenance/bugfixing
>> > > and it might make sense for him to own it in the long term if he's 
>> > > interested.
>> >
>> > That's right.
>>
>> Thanks!
>>
>> > Can I add 2 records (cilk plus and libcilkrts) to Various Maintainers 
>> > section?
>>
>> I understand Jeff's email as a pre-approval of such a patch.
>
> I think only SC can appoint maintainers, and while Jeff is in the SC,
> my reading of that mail wasn't that it was the SC that has acked that, but
> rather a question if Igor is willing to take that role, which then would
> need to be acked by SC.
>

Where are we on this?  Do we have a maintainer for Cilk Plus
and its run-time library?


-- 
H.J.


questionable checks for flexible array members in c-ubsan.c and tree-vrp.c (was: Re: array bounds, sanitizer, safe programming, and cilk array notation)

2015-02-23 Thread Martin Uecker

Martin Uecker :
> Marek Polacek :
> 
> > > void foo(int (*x)[4])
> > > {
> > >   (*x)[4] = 5;// warning
> > > }
> >  
> > This is detected by -fsanitize=object-size, turned on by default in
> > -fsanitize=undefined.  Since it makes use of __builtin_object_size,
> > it is necessary to compile with optimizations turned on.
> 
> Not always.  -fsanitize=object-size  does not seem to detect the
> illegal out-of-bound access by itself, but instead uses some very 
> coarse notion of object size. For example, the following was not
> detected in my tests:

So I found some time to look at this. It seems this should be
detected by -fsanitize=bounds. But it isn't. The problem is in 
'gcc/c-family/c-ubsan.c' in 'ubsan_instrument_bounds'. The
code starting with

 /* Detect flexible array members and suchlike.  */
  tree base = get_base_address (array);
  if (base && (TREE_CODE (base) == INDIRECT_REF
   || TREE_CODE (base) == MEM_REF))
{

also somehow prevents accesses via pointers to arrays
to be instrumented. This is bad because it eliminates
a lot of useful checks.

The code looks very similar to the code in 'tree-vrp.c' 
where similar checks prevented corresponding compile-time 
warnings. There I recently added a condition 
(warn_array_bounds < 2) to get the warnings atleast
for -Warray-bounds=2.

But looking at both places some more, the code simply seems wrong 
to me. There should be no need at all in the backend to 
somehow detect the case of flexible array members. Not having 
the frontend set an expression for the bound should be enough.

So I removed both code paths completely, and - so far - couldn't
trigger false positives for Warray-bounds or -fsanitize-bounds.
I attached a file with some tests. lines marked with 'ubsan'
are currently detected while lines with 'ubsan!' are only detected
after removing the test for flexible array members.

Am I missing something?

Martin





/* { dg-do compile } */
/* { dg-options "-O3 -Warray-bounds=2" } */

extern void* malloc(unsigned long x);

int e[3];

struct f { int f[3]; };

extern void bar(int v[]);
extern void barn(int n, int v[]);

struct h {

int i;
int j[];
};

struct h0 {

int i;
int j[0];
};

struct h0b {

int i;
int j[0];
int k;
};

struct h1 {

int i;
int j[1];
};

struct h1b {

int i;
int j[1];
int k;
};

struct h3 {

int i;
int j[3];
};

struct h3b {

int i;
int j[3];
int k;
};

void foo(int (*a)[3], int n, int (*b)[n])
{
(*a)[3] = 1;/* { dg-warning "subscript is above array bound" } */   
/* ubsan! */
a[0][0] = 1;// ok   
a[1][0] = 1;// ok
a[1][3] = 1;/* { dg-warning "subscript is above array bound" } */   
/* ubsan! */

(*b)[n] = 1; // !   
/* ubsan! */

int c[3] = { 0 };
int d[n];

c[3] = 1;   /* { dg-warning "subscript is above array bound" } */   
/* ubsan */
d[n] = 1;   // !
/* ubsan */

e[3] = 1;   /* { dg-warning "subscript is above array bound" } */   
/* ubsan */

struct f f;
f.f[3] = 1; /* { dg-warning "subscript is above array bound" } */   
/* ubsan */

int m = 3;
int g[m];
g[3] = 1;   // !
/* ubsan */

struct h* h = malloc(sizeof(struct h) + 3 * sizeof(int));
struct h0* h0 = malloc(sizeof(struct h0) + 3 * sizeof(int));
struct h1* h1 = malloc(sizeof(struct h1) + 3 * sizeof(int));
struct h3* h3 = malloc(sizeof(struct h3));

h->j[3] = 1;// flexible array member
h0->j[3] = 1;   // zero-sized array extension
h1->j[3] = 1;   /* { dg-warning "subscript is above array bound" } */   
/* ubsan! */
h3->j[3] = 1;   /* { dg-warning "subscript is above array bound" } */   
/* ubsan! */


struct h0b* h0b = malloc(sizeof(struct h) + 3 * sizeof(int));
struct h1b* h1b = malloc(sizeof(struct h1b) + 3 * sizeof(int));
struct h3b* h3b = malloc(sizeof(struct h3b));
h0b->j[3] = 1;  // warning ?
h1b->j[3] = 1;; /* { dg-warning "subscript is above array bound" } */   
/* ubsan */
h3b->j[3] = 1;; /* { dg-warning "subscript is above array bound" } */   
/* ubsan */

struct hn {

int i;
int j[n];
} hn;

hn.j[3] = 1;// !/* ubsan */

struct hnb {

int i;
int j[n];
int k;
} hnb;

hnb.j[3] = 1;   // !/* ubsan */
}


int main()
{
int a[3];
foo(&a, 3, &a);
}