While developing I've tried the following scheme:
The first step is 3 shuffles (as in the initial scheme):
A1 = (0 3 6) (1 4 7) (2 5)
A2 = (8 11 14) (9 12 15) (10 13)
A3 = (16 19 22) (17 20 23) (18 21)
R1 = blend [blend [A1, A2], A3] = (0 3 6) (9 12 15) (18 21)
B2 = blend [A1, A2] = (0 3 6) (1 4 7) (10 13)
R
On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote:
> + 1st vec: 0 1 2 3 4 5 6 7
> + 2nd vec: 8 9 10 11 12 13 14 15
> + 3rd vec: 16 17 18 19 20 21 22 23
> +
> + The output sequence should be:
> +
> + 1st vec: 0 3 6 9 12 15 18 21
> + 2nd vec: 1 4 7 10 13 16 19 22
> + 3rd
On Tue, Jun 17, 2014 at 2:33 PM, Evgeny Stupachenko wrote:
> Are i386 changes ok?
> Patches with corresponding changes and new tests are attached.
Please remove all target selectors from dg-options and dg-final
testcase directives; they are not needed inside the gcc.dg/i386 directory.
The patch is
Are i386 changes ok?
Patches with corresponding changes and new tests are attached.
Thanks,
Evgeny
On Thu, Jun 12, 2014 at 12:14 PM, Richard Biener wrote:
> On Thu, Jun 12, 2014 at 6:04 AM, Evgeny Stupachenko wrote:
>> Testing finished. No new regressions.
>> Is the following patch ok?
>
> +
On Thu, Jun 12, 2014 at 6:04 AM, Evgeny Stupachenko wrote:
> Testing finished. No new regressions.
> Is the following patch ok?
+  if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1 ||
+      !vect_shift_permute_load_chain (dr_chain, size, stmt, gsi,
+                                      &result_chain))
||s and &&s go t
Testing finished. No new regressions.
Is the following patch ok?
2014-06-11 Evgeny Stupachenko
* config/i386/i386.c (ix86_reassociation_width): Add alternative for
vector case.
* config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
* config/i386/x86-tune.
ix86_reassociation_width checks INTEGRAL_MODE_P and FLOAT_MODE_P, which
include vector modes.
I'll try to separate this into scalar and vector parts, but that will
require more testing (it is being tested now).
What about the rest of the patch?
Thanks,
Evgeny
On Thu, Jun 5, 2014 at 3:54 PM, Ramana Radh
On 06/05/14 12:43, Evgeny Stupachenko wrote:
The new hook relates to vector instructions only. Vector instructions
may execute sequentially in the pipeline while scalar ones execute in
parallel. For x86 architectures, TARGET_SCHED_REASSOC_WIDTH does not
provide the required differentiation.
General hooks could be potentially reu
The new hook relates to vector instructions only. Vector instructions
may execute sequentially in the pipeline while scalar ones execute in
parallel. For x86 architectures, TARGET_SCHED_REASSOC_WIDTH does not
provide the required differentiation.
General hooks could potentially be reused in other algorithms or by
other architectures.
On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko wrote:
> Hi,
>
> The patch introduces an alternative way of permuting load groups
> of size 2 and 3, which should be faster on architectures with low
> parallelism.
> The patch gives a 2x gain on Silvermont for the test from PR52252
> (in ad
make check passed: no new fails.
On Wed, May 28, 2014 at 5:09 PM, Evgeny Stupachenko wrote:
> Hi,
>
> The patch introduces an alternative way of permuting load groups
> of size 2 and 3, which should be faster on architectures with low
> parallelism.
> The patch gives a 2x gain on Silvermont