Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.

Evgeny Stupachenko Tue, 10 Jun 2014 05:44:29 -0700

ix86_reassociation_width checks INTEGRAL_MODE_P and FLOAT_MODE_P, which
are also true for vector modes.
I'll try to separate it into scalar and vector parts, but that will
require more testing (it is being tested now).
What about the rest of the patch?


Thanks,
Evgeny

On Thu, Jun 5, 2014 at 3:54 PM, Ramana Radhakrishnan
<[email protected]> wrote:
> On 06/05/14 12:43, Evgeny Stupachenko wrote:
>>
>> The new hook relates to vector instructions only. Vector instructions
>> may execute sequentially in the pipeline while scalar ones execute in
>> parallel. For x86 architectures, TARGET_SCHED_REASSOC_WIDTH does not
>> provide the required differentiation.
>> A general hook could potentially be reused in other algorithms or by
>> other architectures.
>
>
> It already takes a "mode" argument. Couldn't you use a vector mode to work
> this out?
>
> If that is not enough, then please make the documentation of this hook
> more specific about where it is useful, so that people reading it can
> understand at a glance what purpose it serves.
>
>
> Ramana
>
>
>>
>> Thanks,
>> Evgeny
>>
>> On Thu, Jun 5, 2014 at 2:04 PM, Ramana Radhakrishnan
>> <[email protected]> wrote:
>>>
>>> On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko <[email protected]>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> The patch introduces an alternative way of permuting load groups
>>>> of size 2 and 3, which should be faster on architectures with low
>>>> parallelism.
>>>> The patch gives a 2x speedup on Silvermont for the test from PR52252
>>>> (in addition to the already committed 3x speedup).
>>>>
>>>> Patch passes bootstrap on x86. Make check is in progress.
>>>
>>>
>>> Why do we need a new hook? Can't you derive this information from
>>> TARGET_SCHED_REASSOC_WIDTH, which is equally badly named and is used
>>> in the reassociation logic, but serves a similar purpose?
>>>
>>> Also, the documentation of this hook is incomplete at best and wrong at
>>> worst, as the hook is not applied everywhere in the vectorizer but only
>>> in this special case of load/store permutation. Implying that it is
>>> useful everywhere in the vectorizer does not appear to be correct.
>>>
>>> regards
>>> Ramana
>>>
>>>
>>>
>>>
>>>>
>>>> ChangeLog:
>>>>
>>>> 2014-05-28  Evgeny Stupachenko  <[email protected]>
>>>>
>>>>          * config/i386/i386.c (ix86_have_vector_parallel_execution):
>>>> New.
>>>>          (TARGET_VECTORIZE_HAVE_VECTOR_PARALLEL_EXECUTION): New.
>>>>          * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New.
>>>>          * config/i386/x86-tune.def
>>>> (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New.
>>>>          * target.def (have_vector_parallel_execution): New.
>>>>          * doc/tm.texi.in (have_vector_parallel_execution): New.
>>>>          * doc/tm.texi: Regenerate.
>>>>          * targhooks.c (default_have_vector_parallel_execution): New.
>>>>          * tree-vect-data-refs.c (vect_shift_permute_load_chain): New.
>>>>          Introduce an alternative way of permuting load groups.
>>>>          (vect_transform_grouped_load): Try the alternative way of
>>>> permutation.
>>>>
>>>> Evgeny
>>
>>
>
