On Sun, Nov 17, 2013 at 04:42:18PM +0100, Richard Biener wrote:
> "Ondřej Bílka" wrote:
> >On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote:
> >> "Ondřej Bílka" wrote:
> >> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
> >>
> >> IIRC what can still be seen is st
"Ondřej Bílka" wrote:
>On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote:
>> "Ondřej Bílka" wrote:
>> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
>>
>> IIRC what can still be seen is store-buffer related slowdowns when
>you have a big unaligned store load in yo
On 11/16/2013 04:25 AM, Tim Prince wrote:
Many decisions on compiler defaults still are based on an unscientific
choice of benchmarks, with gcc evidently more responsive to input from
the community.
I'm also quite convinced that we are hampered by the fact that there is
no IPA on alignment in
On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote:
> "Ondřej Bílka" wrote:
> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
>
> IIRC what can still be seen is store-buffer related slowdowns when you have a
> big unaligned store load in your loop. Thus aligning st
"Ondřej Bílka" wrote:
>On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
>> Also keep in mind that usually costs go up significantly if
>> misalignment causes cache line splits (processor will fetch 2 lines).
>> There are non-linear costs of filling up the store queue in modern
>> o
On 11/15/2013 2:26 PM, Ondřej Bílka wrote:
On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
Also keep in mind that usually costs go up significantly if
misalignment causes cache line splits (processor will fetch 2 lines).
There are non-linear costs of filling up the store queue i
On Fri, Nov 15, 2013 at 11:26:06PM +0100, Ondřej Bílka wrote:
Minor correction: a mutt read replaced the set1.s file with one that I later
used for the avx2 variant. The correct file follows.
.file "set1.c"
.text
.p2align 4,,15
.globl set
.type set, @function
On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
> Also keep in mind that usually costs go up significantly if
> misalignment causes cache line splits (processor will fetch 2 lines).
> There are non-linear costs of filling up the store queue in modern
> out-of-order processors (x86)
sor is sufficient to guarantee to
> generate loop peeling.
>
> Bingfeng
>
>
> -----Original Message-----
> From: Xinliang David Li [mailto:davi...@google.com]
> Sent: 15 November 2013 17:30
> To: Bingfeng Mei
> Cc: Richard Biener; gcc@gcc.gnu.org
> Subject: Re: Vectori
guarantee to generate
loop peeling.
Bingfeng
-----Original Message-----
From: Xinliang David Li [mailto:davi...@google.com]
Sent: 15 November 2013 17:30
To: Bingfeng Mei
Cc: Richard Biener; gcc@gcc.gnu.org
Subject: Re: Vectorization: Loop peeling with misaligned support.
The right longer
d Biener [mailto:richard.guent...@gmail.com]
> Sent: 15 November 2013 14:02
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: Vectorization: Loop peeling with misaligned support.
>
> On Fri, Nov 15, 2013 at 2:16 PM, Bingfeng Mei wrote:
>> Hi,
>> In loop vectorizat
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org
> Subject: Re: Vectorization: Loop peeling with misaligned support.
>
> On Fri, Nov 15, 2013 at 2:16 PM, Bingfeng Mei wrote:
>> Hi,
>> In loop vectorization, I found that the vectorizer insists on loop peeling even though
>> our tar
: Vectorization: Loop peeling with misaligned support.
On Fri, Nov 15, 2013 at 2:16 PM, Bingfeng Mei wrote:
> Hi,
> In loop vectorization, I found that the vectorizer insists on loop peeling even though
> our target supports misaligned memory access. This results in much bigger
> code size for a very si
On Fri, Nov 15, 2013 at 2:16 PM, Bingfeng Mei wrote:
> Hi,
> In loop vectorization, I found that the vectorizer insists on loop peeling even though
> our target supports misaligned memory access. This results in much bigger
> code size for a very simple loop. I defined
> TARGET_VECTORIZE_SUPPORT_VECTOR_MI
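
For readers skimming the thread, the two quantities being argued about can be sketched in a few lines of C. This is only an illustration, not GCC's implementation: the helper names peel_count and splits_cache_line are hypothetical, and the 16-byte vector width and 64-byte cache line used in the usage note are assumptions rather than values taken from the messages above. The first helper gives the number of scalar prologue iterations that peeling would run before the first aligned vector access; the second checks whether a misaligned access would straddle two cache lines, the case Hendrik Greving describes as costly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GCC function): number of scalar iterations a
   peeled prologue would execute before addr reaches vec_bytes alignment.
   Assumes vec_bytes is a power of two, a multiple of elem_size, and that
   addr is already a multiple of elem_size. */
size_t peel_count(uintptr_t addr, size_t elem_size, size_t vec_bytes)
{
    assert(vec_bytes != 0 && (vec_bytes & (vec_bytes - 1)) == 0);
    assert(elem_size != 0 && vec_bytes % elem_size == 0);

    size_t misalign = (size_t)(addr & (vec_bytes - 1));
    return misalign == 0 ? 0 : (vec_bytes - misalign) / elem_size;
}

/* Hypothetical helper: returns 1 if an access of size bytes starting at
   addr touches two cache lines of line_bytes each, 0 otherwise.
   Assumes size > 0 and line_bytes > 0. */
int splits_cache_line(uintptr_t addr, size_t size, size_t line_bytes)
{
    return (addr / line_bytes) != ((addr + size - 1) / line_bytes);
}
```

With 4-byte elements and an assumed 16-byte vector, an array starting at 0x1004 needs three peeled iterations, while a 16-byte access at 0x1038 crosses an assumed 64-byte line boundary. Whether skipping the peel is a win depends on how often the second case occurs on the target in question.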