PING ^1: V3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-28 Thread H.J. Lu
>>> create > > > > >>> paths where you're going to emit the xor when it's not used. > > > > >>> > > > > >>> The whole point of the LCM algorithms is they are optimal in terms > > > > >>>

Re: V3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-22 Thread H.J. Lu
t;> expression evaluations. > > > >> We tried LCM and it didn't work well for this case. LCM places a > > > >> single > > > >> VXOR close to the location where it is needed, which can be inside a > > > >> loop. Th

Re: V3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-22 Thread Richard Biener
tions. > > >> We tried LCM and it didn't work well for this case. LCM places a single > > >> VXOR close to the location where it is needed, which can be inside a > > >> loop. There is nothing wrong with the LCM algorithms. But this doesn't > > >> so

Re: V3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-21 Thread H.J. Lu
> >> loop. There is nothing wrong with the LCM algorithms. But this doesn't > >> solve > >> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007 > >> > >> where VXOR is executed multiple times inside of a func

Re: V3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-21 Thread Jeff Law
ke loop that contains the whole function: >> >> bb = nearest_common_dominator_for_set (CDI_DOMINATORS, >> convert_bbs); >> while (bb->loop_father->latch >> != EXIT_BLOCK_PTR_FOR_FN (cfun)) >

PING^1: V3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-18 Thread H.J. Lu
On Mon, Jan 7, 2019 at 5:55 AM H.J. Lu wrote: > > On Sun, Dec 30, 2018 at 8:40 AM H.J. Lu wrote: > > > > On Wed, Nov 28, 2018 at 12:17 PM Jeff Law wrote: > > > > > > On 11/28/18 12:48 PM, H.J. Lu wrote: > > > > On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka wrote: > > > >> > > > >>> On 11/5/18 7:21

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-11 Thread Jeff Law
On 12/30/18 9:50 AM, H.J. Lu wrote: > On Wed, Nov 28, 2018 at 12:21 PM Jan Hubicka wrote: >> >>> On 11/28/18 12:48 PM, H.J. Lu wrote: On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka wrote: > >> On 11/5/18 7:21 AM, Jan Hubicka wrote: Did you mean "the nearest common domin

V3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2019-01-07 Thread H.J. Lu
://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007 > > where VXOR is executed multiple times inside of a function, instead of > just once. We are investigating to generate a single VXOR at entry of the > nearest dominator for basic blocks with SF/DF conversio

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-12-30 Thread H.J. Lu
On Wed, Nov 28, 2018 at 12:21 PM Jan Hubicka wrote: > > > On 11/28/18 12:48 PM, H.J. Lu wrote: > > > On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka wrote: > > >> > > >>> On 11/5/18 7:21 AM, Jan Hubicka wrote: > > > > > > Did you mean "the nearest common dominator"? > > > > If the ne

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-12-30 Thread H.J. Lu
On Wed, Nov 28, 2018 at 12:17 PM Jeff Law wrote: > > On 11/28/18 12:48 PM, H.J. Lu wrote: > > On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka wrote: > >> > >>> On 11/5/18 7:21 AM, Jan Hubicka wrote: > > > > Did you mean "the nearest common dominator"? > > If the nearest common domina

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-28 Thread Jan Hubicka
> On 11/28/18 12:48 PM, H.J. Lu wrote: > > On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka wrote: > >> > >>> On 11/5/18 7:21 AM, Jan Hubicka wrote: > > > > Did you mean "the nearest common dominator"? > > If the nearest common dominator appears in the loop while all uses are > ou

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-28 Thread Jeff Law
On 11/28/18 12:48 PM, H.J. Lu wrote: > On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka wrote: >> >>> On 11/5/18 7:21 AM, Jan Hubicka wrote: > > Did you mean "the nearest common dominator"? If the nearest common dominator appears in the loop while all uses are out of loops, this w

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-28 Thread H.J. Lu
On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka wrote: > > > On 11/5/18 7:21 AM, Jan Hubicka wrote: > > >> > > >> Did you mean "the nearest common dominator"? > > > > > > If the nearest common dominator appears in the loop while all uses are > > > out of loops, this will result in suboptimal xor placem

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-05 Thread Jan Hubicka
> On 11/5/18 7:21 AM, Jan Hubicka wrote: > >> > >> Did you mean "the nearest common dominator"? > > > > If the nearest common dominator appears in the loop while all uses are > > out of loops, this will result in suboptimal xor placement. > > In this case you want to split edges out of the loop. >

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-05 Thread Jeff Law
On 11/5/18 7:21 AM, Jan Hubicka wrote: >> >> Did you mean "the nearest common dominator"? > > If the nearest common dominator appears in the loop while all uses are > out of loops, this will result in suboptimal xor placement. > In this case you want to split edges out of the loop. > > In general

Re: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-05 Thread Jan Hubicka
on dominator for basic blocks with SF/DF > conversions. OK for trunk? > > Thanks. > > > -- > H.J. > From e2a437f48778ae9586f2038220840ecc41566f69 Mon Sep 17 00:00:00 2001 > From: "H.J. Lu" > Date: Wed, 15 Aug 2018 09:58:31 -0700 > Subject: [PATCH] i386:

PING: V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-11-04 Thread H.J. Lu
On Fri, Oct 19, 2018 at 1:44 AM H.J. Lu wrote: > > On 10/18/18, Jan Hubicka wrote: > >> we need to generate > >> > >> vxorp[ds] %xmmN, %xmmN, %xmmN > >> ... > >> vcvtss2sd f(%rip), %xmmN, %xmmX > >> ... > >> vcvtsi2ss i(%rip), %xmmN, %xmmY > >> > >> to a

V2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-10-19 Thread H.J. Lu
answer would be to look for the postdominance > frontier Did you mean "the nearest common dominator"? > of the set of all uses of the zero register? > Here is the updated patch to adds a pass to generate a single vxorps %xmmN, %xmmN, %xmmN at entry of the

Re: PING^1 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-10-18 Thread Jan Hubicka
> we need to generate > > vxorp[ds] %xmmN, %xmmN, %xmmN > ... > vcvtss2sd f(%rip), %xmmN, %xmmX > ... > vcvtsi2ss i(%rip), %xmmN, %xmmY > > to avoid partial XMM register stall. This patch adds a pass to generate > a single > > vxorps

PING^4 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-09-29 Thread H.J. Lu
On Tue, Sep 18, 2018 at 9:44 AM H.J. Lu wrote: > > On Tue, Sep 11, 2018 at 9:01 AM, H.J. Lu wrote: > > On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu wrote: > >> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu wrote: > >>> With -mavx, for > >>> > >>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i > >>> extern float f; >

PING^3 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-09-18 Thread H.J. Lu
On Tue, Sep 11, 2018 at 9:01 AM, H.J. Lu wrote: > On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu wrote: >> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu wrote: >>> With -mavx, for >>> >>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i >>> extern float f; >>> extern double d; >>> extern int i; >>> >>> void >>> foo (void)

PING^2 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-09-11 Thread H.J. Lu
On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu wrote: > On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu wrote: >> With -mavx, for >> >> [hjl@gnu-cfl-1 skx-2]$ cat foo.i >> extern float f; >> extern double d; >> extern int i; >> >> void >> foo (void) >> { >> d = f; >> f = i; >> } >> >> we need to generate

PING^1 [PATCH] i386: Add pass_remove_partial_avx_dependency

2018-09-04 Thread H.J. Lu
On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu wrote: > With -mavx, for > > [hjl@gnu-cfl-1 skx-2]$ cat foo.i > extern float f; > extern double d; > extern int i; > > void > foo (void) > { > d = f; > f = i; > } > > we need to generate > > vxorp[ds] %xmmN, %xmmN, %xmmN > ... >

[PATCH] i386: Add pass_remove_partial_avx_dependency

2018-08-28 Thread H.J. Lu
With -mavx, for [hjl@gnu-cfl-1 skx-2]$ cat foo.i extern float f; extern double d; extern int i; void foo (void) { d = f; f = i; } we need to generate vxorp[ds] %xmmN, %xmmN, %xmmN ... vcvtss2sd f(%rip), %xmmN, %xmmX ... vcvtsi2ss i(%
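
For reference, the foo.i example quoted throughout this thread can be laid out as a small standalone C sketch. The original declares f, d and i as extern; here they are defined locally so the file links on its own, and check_foo is a hypothetical helper added only for illustration. The comments on which instruction each assignment maps to follow the description in the original post and assume compilation with -mavx.

```c
#include <assert.h>

/* Globals from the foo.i example. The original post declares them
   extern; they are defined here (an assumption) so this file is
   self-contained. */
float f;
double d;
int i;

/* Both assignments are scalar conversions. According to the original
   post, with -mavx these become vcvtss2sd and vcvtsi2ss, each of which
   writes only part of its destination XMM register. */
void
foo (void)
{
  d = f;  /* float -> double (vcvtss2sd per the post) */
  f = i;  /* int -> float    (vcvtsi2ss per the post) */
}

/* Hypothetical helper, not part of the patch: load inputs, run foo,
   and check that the conversions produced the expected values. */
void
check_foo (float fin, int iin)
{
  f = fin;
  i = iin;
  foo ();
  assert (d == (double) fin);
  assert (f == (float) iin);
}
```

Compiling this with something like `gcc -O2 -mavx -S` and inspecting the assembly is one way to see whether a zeroing vxorps is emitted ahead of the conversions. The exact output depends on the GCC version and tuning options.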