Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Evgeny Stupachenko
While developing I've tried the following scheme:
First step is 3 shuffles (as initially):
A1 = (0 3 6) (1 4 7) (2 5)
A2 = (8 11 14) (9 12 15) (10 13)
A3 = (16 19 22) (17 20 23) (18 21)
R1 = blend [ blend [A1 A2], A3] = (0 3 6) (9 12 15) (18 21)
B2 = blend [A1, A2] = (0 3 6) (1 4 7) (10 13)
R
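A minimal scalar sketch of the first step quoted above, assuming 8-lane vectors holding 0..23. The helper names (shuffle8, blend8, first_step) and the lane mask {0,3,6,1,4,7,2,5} are illustrative assumptions inferred from the A1/A2/A3 listings; they are not taken from the patch. It only models R1 and B2, since the rest of the scheme is cut off in the archive.

```c
#include <assert.h>
#include <stddef.h>

enum { VLEN = 8 };

/* Hypothetical helper: single-source shuffle, out[i] = in[mask[i]]. */
static void shuffle8(const int in[VLEN], const int mask[VLEN], int out[VLEN])
{
    for (size_t i = 0; i < VLEN; i++) {
        assert(mask[i] >= 0 && mask[i] < VLEN);
        out[i] = in[mask[i]];
    }
}

/* Hypothetical helper: lanes [0, split) come from a, lanes [split, VLEN)
   come from b.  Lanes never move, which is what makes it a blend.  */
static void blend8(const int a[VLEN], const int b[VLEN], size_t split,
                   int out[VLEN])
{
    assert(split <= VLEN);
    for (size_t i = 0; i < VLEN; i++)
        out[i] = i < split ? a[i] : b[i];
}

/* Model of the quoted first step: three shuffles, then the blends that
   form R1 and B2.  */
static void first_step(int r1[VLEN], int b2[VLEN])
{
    /* Assumed mask: groups the lanes as (0 3 6) (1 4 7) (2 5).  */
    static const int mask[VLEN] = { 0, 3, 6, 1, 4, 7, 2, 5 };
    int v[3][VLEN], a[3][VLEN], t[VLEN];

    for (size_t k = 0; k < 3; k++) {
        for (size_t i = 0; i < VLEN; i++)
            v[k][i] = (int) (k * VLEN + i);
        shuffle8(v[k], mask, a[k]);       /* A1, A2, A3 */
    }

    blend8(a[0], a[1], 3, t);             /* lanes 0-2 from A1, rest from A2 */
    blend8(t, a[2], 6, r1);               /* lanes 6-7 from A3: R1 */
    blend8(a[0], a[1], 6, b2);            /* lanes 6-7 from A2: B2 */
}
```

Calling first_step fills r1 with 0 3 6 9 12 15 18 21 and b2 with 0 3 6 1 4 7 10 13, matching the R1 and B2 rows in the message.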

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Richard Henderson
On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote:
> +     1st vec:   0  1  2  3  4  5  6  7
> +     2nd vec:   8  9 10 11 12 13 14 15
> +     3rd vec:  16 17 18 19 20 21 22 23
> +
> +   The output sequence should be:
> +
> +     1st vec:  0 3 6 9 12 15 18 21
> +     2nd vec:  1 4 7 10 13 16 19 22
> +     3rd
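The input/output layout in the quoted comment is a stride-3 de-interleave. A scalar reference sketch, assuming a flat array of group * lanes elements, might look like the following; the function name deinterleave and its signature are hypothetical and are not part of the patch. The third output row is cut off in the archive, so the code simply applies the same index rule to it.

```c
#include <assert.h>
#include <stddef.h>

/* Reference de-interleave for a load group of `group` vectors with `lanes`
   elements each.  Element j of the concatenated input lands in output
   vector j % group, at lane j / group.  `in` and `out` must not overlap
   and must each hold group * lanes elements.  */
static void deinterleave(const int *in, size_t group, size_t lanes, int *out)
{
    assert(group > 0);
    for (size_t j = 0; j < group * lanes; j++)
        out[(j % group) * lanes + j / group] = in[j];
}
```

With in = 0..23, group = 3 and lanes = 8, out[0..7] holds 0 3 6 9 12 15 18 21 and out[8..15] holds 1 4 7 10 13 16 19 22, matching the first two output rows quoted above. A vectorized permutation sequence can be checked against such a reference lane by lane.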

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Uros Bizjak
On Tue, Jun 17, 2014 at 2:33 PM, Evgeny Stupachenko wrote:
> Are i386 changes ok?
> Patches with corresponding changes and new tests are attached.

Please remove all target selectors from dg-options and dg-final testcase directives, they are not needed inside gcc.dg/i386 directory. The patch is

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-17 Thread Evgeny Stupachenko
Are i386 changes ok? Patches with corresponding changes and new tests are attached.

Thanks,
Evgeny

On Thu, Jun 12, 2014 at 12:14 PM, Richard Biener wrote:
> On Thu, Jun 12, 2014 at 6:04 AM, Evgeny Stupachenko
> wrote:
>> Testing finished. No new regressions.
>> Is the following patch ok?
>
> +

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-12 Thread Richard Biener
On Thu, Jun 12, 2014 at 6:04 AM, Evgeny Stupachenko wrote:
> Testing finished. No new regressions.
> Is the following patch ok?

+ if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1 ||
+     !vect_shift_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain))

||s and &&s go t

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-11 Thread Evgeny Stupachenko
Testing finished. No new regressions.
Is the following patch ok?

2014-06-11  Evgeny Stupachenko

* config/i386/i386.c (ix86_reassociation_width): Add alternative for vector case.
* config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
* config/i386/x86-tune.

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-10 Thread Evgeny Stupachenko
ix86_reassociation_width checks INTEGRAL_MODE_P and FLOAT_MODE_P, which include vector modes. I'll try to separate this into scalar and vector parts, but that will require more testing (currently in progress). What about the rest of the patch?

Thanks,
Evgeny

On Thu, Jun 5, 2014 at 3:54 PM, Ramana Radh

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-05 Thread Ramana Radhakrishnan
On 06/05/14 12:43, Evgeny Stupachenko wrote:
> The new hook relates to vector instructions only. Vector instructions may execute sequentially in the pipeline while scalar instructions execute in parallel. For x86 architectures, TARGET_SCHED_REASSOC_WIDTH does not provide the required differentiation. General hooks could potentially be reu

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-05 Thread Evgeny Stupachenko
The new hook relates to vector instructions only. Vector instructions may execute sequentially in the pipeline while scalar instructions execute in parallel. For x86 architectures, TARGET_SCHED_REASSOC_WIDTH does not provide the required differentiation. General hooks could potentially be reused in other algorithms or by other architectures.

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-05 Thread Ramana Radhakrishnan
On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko wrote:
> Hi,
>
> The patch introduces alternative way of permutations for load groups
> of size 2 and 3 which should be faster on architectures with low
> parallelism.
> The patch gives 2 times gain on Silvermont to the test from PR52252
> (in ad

Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

2014-06-05 Thread Evgeny Stupachenko
make check passed: no new fails.

On Wed, May 28, 2014 at 5:09 PM, Evgeny Stupachenko wrote:
> Hi,
>
> The patch introduces alternative way of permutations for load groups
> of size 2 and 3 which should be faster on architectures with low
> parallelism.
> The patch gives 2 times gain on Silvermont