On Fri, Oct 30, 2015 at 11:21 AM, Kumar, Venkataramanan <venkataramanan.ku...@amd.com> wrote: > Hi Andrew, > >> -----Original Message----- >> From: Andrew Pinski [mailto:pins...@gmail.com] >> Sent: Friday, October 30, 2015 3:38 PM >> To: Kumar, Venkataramanan >> Cc: Richard Beiner (richard.guent...@gmail.com); gcc-patches@gcc.gnu.org >> Subject: Re: [RFC] [Patch] Relax tree-if-conv.c trap assumptions. >> >> On Fri, Oct 30, 2015 at 6:06 PM, Kumar, Venkataramanan >> <venkataramanan.ku...@amd.com> wrote: >> > Hi Richard, >> > >> > I am trying to "if covert the store" in the below test case and later >> > help it to get vectorized under -Ofast -ftree-loop-if-convert-stores >> > -fno-common >> > >> > #define LEN 4096 >> > __attribute__((aligned(32))) float array[LEN]; void test() { for (int i = >> > 0; i < >> LEN; i++) { >> > if (array[i] > (float)0.) >> > array[i] =3 ; >> > >> > } >> > } >> > >> > Currently in GCC 5.2 does not vectorize it. >> > https://goo.gl/9nS6Pd >> > >> > However ICC seems to vectorize it >> > https://goo.gl/y1yGHx >> > >> > As discussed with you earlier, to allow "if convert store" I am checking >> > the >> following: >> > >> > (1) We already read the reference "array[i]" unconditionally once . >> > (2) I am now checking if we are conditionally writing to memory which is >> defined as read and write and is bound to the definition we are seeing. >> >> >> I don't think this is thread safe .... >> > > I thought under -ftree-loop-if-convert-stores it is ok to do this > transformation.
Yes, that's what we have done in the past here. Note that I think we should remove the flag in favor of the --param allow-store-data-races and if-convert safe stores always (stores to thread-local memory). Esp. using masked stores should be always safe. > Regards, > Venkat. > >> Thanks, >> Andrew >> >> > >> > With this change, I get able to if convert and the vectorize the case also. >> > >> > /build/gcc/xgcc -B ./build/gcc/ ifconv.c -Ofast -fopt-info-vec -S >> > -ftree-loop-if-convert-stores -fno-common >> > ifconv.c:2:63: note: loop vectorized >> > >> > Patch >> > ------ >> > diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index >> > f201ab5..6475cc0 100644 >> > --- a/gcc/tree-if-conv.c >> > +++ b/gcc/tree-if-conv.c >> > @@ -727,6 +727,34 @@ write_memrefs_written_at_least_once (gimple >> *stmt, >> > return true; >> > } >> > >> > +static bool >> > +write_memrefs_safe_to_access_unconditionally (gimple *stmt, >> > + >> > +vec<data_reference_p> drs) { { to the next line The function has a bad name it should be write_memrefs_writable () >> > + int i; >> > + data_reference_p a; >> > + bool found = false; >> > + >> > + for (i = 0; drs.iterate (i, &a); i++) >> > + { >> > + if (DR_STMT (a) == stmt >> > + && DR_IS_WRITE (a) >> > + && (DR_WRITTEN_AT_LEAST_ONCE (a) == 0) >> > + && (DR_RW_UNCONDITIONALLY (a) == 1)) >> > + { >> > + tree base = get_base_address (DR_REF (a)); >> > + found = false; >> > + if (DECL_P (base) >> > + && decl_binds_to_current_def_p (base) >> > + && !TREE_READONLY (base)) >> > + { >> > + found = true; So if the vector ever would contain more than one write you'd return true if only one of them is not readonly. IMHO all the routines need refactoring to operate on single DRs which AFAIK is the only case if-conversion handles anyway (can't if-convert calls or aggregate assignments or asms). Ugh, it seems the whole thing is quadratic, doing linear walks to find the DRs for a stmt ... A simple Index: gcc/tree-if-conv.c =================================================================== --- gcc/tree-if-conv.c (revision 229572) +++ gcc/tree-if-conv.c (working copy) @@ -612,9 +612,10 @@ memrefs_read_or_written_unconditionally data_reference_p a, b; tree ca = bb_predicate (gimple_bb (stmt)); - for (i = 0; drs.iterate (i, &a); i++) - if (DR_STMT (a) == stmt) - { + for (i = gimple_uid (stmt) - 1; drs.iterate (i, &a); i++) + { + if (DR_STMT (a) != stmt) + break; bool found = false; int x = DR_RW_UNCONDITIONALLY (a); @@ -684,10 +685,13 @@ write_memrefs_written_at_least_once (gim data_reference_p a, b; tree ca = bb_predicate (gimple_bb (stmt)); - for (i = 0; drs.iterate (i, &a); i++) - if (DR_STMT (a) == stmt - && DR_IS_WRITE (a)) - { + for (i = gimple_uid (stmt) - 1; drs.iterate (i, &a); i++) + { + if (DR_STMT (a) != stmt) + break; + if (! DR_IS_WRITE (a)) + continue; + bool found = false; int x = DR_WRITTEN_AT_LEAST_ONCE (a); @@ -721,7 +725,7 @@ write_memrefs_written_at_least_once (gim DR_WRITTEN_AT_LEAST_ONCE (a) = 0; return false; } - } + } return true; } @@ -1291,6 +1297,7 @@ if_convertible_loop_p_1 (struct loop *lo case GIMPLE_CALL: case GIMPLE_DEBUG: case GIMPLE_COND: + gimple_set_uid (gsi_stmt (gsi), 0); break; default: return false; @@ -1304,6 +1311,8 @@ if_convertible_loop_p_1 (struct loop *lo dr->aux = XNEW (struct ifc_dr); DR_WRITTEN_AT_LEAST_ONCE (dr) = -1; DR_RW_UNCONDITIONALLY (dr) = -1; + if (gimple_uid (DR_STMT (dr)) == 0) + gimple_set_uid (DR_STMT (dr), i + 1); } predicate_bbs (loop); would improve that tremendously ... (well, given the array of DRs is sorted by stmt which is an implementation detail not documented). Computing all the DR flags in separate loops also doesn't make much sense to me and just complicates code. Richard. >> > + } >> > + } >> > + } >> > + return found; >> > +} >> > + >> > /* Return true when the memory references of STMT won't trap in the >> > if-converted code. There are two things that we have to check for: >> > >> > @@ -748,8 +776,20 @@ write_memrefs_written_at_least_once (gimple >> > *stmt, static bool ifcvt_memrefs_wont_trap (gimple *stmt, >> > vec<data_reference_p> refs) { >> > - return write_memrefs_written_at_least_once (stmt, refs) >> > - && memrefs_read_or_written_unconditionally (stmt, refs); >> > + bool memrefs_written_once, memrefs_read_written_unconditionally; >> > + bool memrefs_safe_to_access; >> > + >> > + memrefs_written_once >> > + = write_memrefs_written_at_least_once (stmt, refs); >> > + >> > + memrefs_read_written_unconditionally >> > + = memrefs_read_or_written_unconditionally (stmt, refs); >> > + >> > + memrefs_safe_to_access >> > + = write_memrefs_safe_to_access_unconditionally (stmt, >> > + refs); >> > + >> > + return ((memrefs_written_once || memrefs_safe_to_access) >> > + && memrefs_read_written_unconditionally); >> > } >> > >> > /* Wrapper around gimple_could_trap_p refined for the needs of the >> > >> > >> > do I need this function write_memrefs_written_at_least_once anymore? >> > Please suggest if there a better way to do this. >> > >> > Bootstrapped and regression tested on x86_64. >> > I am not adding change log and comments now, as I wanted to check >> approach first. >> > >> > Regards, >> > Venkat. >> > >> >