http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52908
--- Comment #9 from vekumar at gcc dot gnu.org 2012-06-18 15:10:51 UTC ---
Author: vekumar
Date: Mon Jun 18 15:10:45 2012
New Revision: 188736
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188736
Log:
Back port Fix PR 52908 - xop-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88494
--- Comment #8 from vekumar at gcc dot gnu.org ---
I tested mdbx before and after the revision Richard pointed out.
On my Ryzen box there is a ~4% regression.
Although "vblendvps" is a fast path instruction and can execute in pipe 0/1. I
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
As per GCC 8.1.0 Manual
---snip--
-mveclibabi=type
Specifies the ABI type to use for vectorizing intrinsics using an
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86144
--- Comment #3 from vekumar at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> Note a workaround would be to re-arrange the vectorizer calls to
> vectorizable_simd_clone_call and vectorizable_call. Can you check if
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719
--- Comment #9 from vekumar at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #8)
> CCing AMD too.
Sure, let me check if this tuning helps the AMD Zen architecture.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91719
--- Comment #10 from vekumar at gcc dot gnu.org ---
xchg is faster than mov+mfence on AMD Zen. We can add m_ZNVER1 | m_ZNVER2 to
the tuning.
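A minimal C11 sketch of the kind of store this tuning affects, assuming a sequentially consistent atomic store. On x86 such a store is typically emitted either as mov followed by mfence or as a single xchg, and the tuning decides which. The function name publish_flag is hypothetical and does not come from the PR.

```c
#include <stdatomic.h>

/* Hypothetical helper: a sequentially consistent store.
   On x86 the compiler may emit either mov + mfence or xchg here. */
void publish_flag(atomic_int *flag, int value)
{
    atomic_store_explicit(flag, value, memory_order_seq_cst);
}
```

Compiling this with -O2 -S under different -mtune values is one way to see which sequence is chosen.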
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455
--- Comment #2 from vekumar at gcc dot gnu.org ---
This tuning was intended to generate movups instead of movupd, as movups is 1
byte shorter than movupd. Maybe we should remove the xorps generation part.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68621
--- Comment #3 from vekumar at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> You can change the testcase to
>
> __attribute__((aligned (32))) float array[LEN] = {};
>
> which makes it not require -fno-commo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68621
--- Comment #4 from vekumar at gcc dot gnu.org ---
Even after initializing the array,
decl_binds_to_current_def_p (base_tree) returns false when I set -fpic.
---Snip---
(1)
bool
decl_binds_to_current_def_p (const_tree decl)
{
gcc_assert
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68621
--- Comment #5 from vekumar at gcc dot gnu.org ---
Setting the visibility to hidden helps.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
index 89a3410..7519a61 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68621
--- Comment #6 from vekumar at gcc dot gnu.org ---
Author: vekumar
Date: Wed Mar 2 06:14:43 2016
New Revision: 233888
URL: https://gcc.gnu.org/viewcvs?rev=233888&root=gcc&view=rev
Log:
Adjust test case in PR68621 to compile with -fpic.
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
The following test case fails to vectorize with gcc -Ofast.
(---snip---)
subroutine test (x,y,z)
integer x,y,z
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
flux_lam.f:68:0: note: dependence distance = 0.
flux_lam.f:68:0: note: dependence distance == 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70103
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||matz at suse dot de
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
Following the comments in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70103#c2
and discussion with Richard, filing this PR.
This is inspired by the loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70193
vekumar at gcc dot gnu.org changed:
What|Removed |Added
Severity|normal |enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58135
--- Comment #3 from vekumar at gcc dot gnu.org ---
Author: vekumar
Date: Mon May 23 09:48:54 2016
New Revision: 236582
URL: https://gcc.gnu.org/viewcvs?rev=236582&root=gcc&view=rev
Log:
Fix PR58135.
2016-05-23 Venkataraman
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58135
vekumar at gcc dot gnu.org changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71270
--- Comment #2 from vekumar at gcc dot gnu.org ---
Looked at x86_64 gimple code for intrinsic_pack_1.f90.
After the SLP split we now vectorize at the place where we pass constant
arguments via a parameter structure to the _gfortran_pack call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71270
--- Comment #3 from vekumar at gcc dot gnu.org ---
Built armeb-none-linux-gnueabihf --with-cpu=cortex-a9 --with-fpu=neon-fp16
--with-float=hard
And compared gimple output from intrinsic_pack_1.f90.151t.slp1 before and after
my patch.
The
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71270
--- Comment #5 from vekumar at gcc dot gnu.org ---
The expand dump after SLP split
---snip--
;; MEM[(logical(kind=1) *)&A.8] = { 1, 0, 1, 0 };
(insn 71 70 72 (set (reg:SI 308)
(const_int 16777472 [0x1000100])) intrinsic_pack_1.f9
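The constant in the dump can be checked by hand: a sketch assuming big-endian byte order, which fits the armeb target built in comment #3 but is not stated in this dump. Packing the four bytes { 1, 0, 1, 0 } with the first byte most significant gives 0x1000100, i.e. 16777472. The helper name pack_be32 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: combine four bytes into one 32-bit value,
   first byte most significant (big-endian order). */
uint32_t pack_be32(const uint8_t bytes[4])
{
    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```

Calling pack_be32 on the array { 1, 0, 1, 0 } yields the same value as the const_int in the insn.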
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946
vekumar at gcc dot gnu.org changed:
What|Removed |Added
Assignee|vekumar at gcc dot gnu.org |shiva0217 at gmail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64716
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65662
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65662
--- Comment #8 from vekumar at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #7)
> (In reply to vekumar from comment #6)
> > For 42 bit VA, I have to change the SANITIZER_MMAP_RANGE_SIZE to 1 <<42.
>
> Sure.
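For scale, a 42-bit virtual address space spans 4 TiB. A small sketch of the arithmetic, using a 64-bit type so the shift stays well defined; the helper name va_range_size is hypothetical and is not how the sanitizer sources define SANITIZER_MMAP_RANGE_SIZE.

```c
#include <stdint.h>

/* Hypothetical helper: size in bytes of an address range that is
   `bits` wide. The operand is widened to 64 bits before shifting. */
uint64_t va_range_size(unsigned bits)
{
    return (uint64_t)1 << bits;
}
```

With bits set to 42 this gives 4398046511104 bytes.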
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62077
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
After preventing conversion of shift to mults in combiner
https://gcc.gnu.org/viewcvs/gcc?view=revision
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049
--- Comment #1 from vekumar at gcc dot gnu.org ---
We need patterns based on shifts to match what the combiner generates.
Below patch fixes them.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1c2c5fb..c5a640d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049
--- Comment #4 from vekumar at gcc dot gnu.org ---
(In reply to ktkachov from comment #3)
> Venkat, are you planning to submit this patch to gcc-patches?
> Also, does this mean we can remove the patterns that do arith+shift using
> M
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049
--- Comment #6 from vekumar at gcc dot gnu.org ---
(In reply to Ramana Radhakrishnan from comment #5)
> (In reply to vekumar from comment #4)
> > (In reply to ktkachov from comment #3)
> > > Venkat, are you planning to submit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049
--- Comment #7 from vekumar at gcc dot gnu.org ---
(In reply to ktkachov from comment #3)
> Venkat, are you planning to submit this patch to gcc-patches?
> Also, does this mean we can remove the patterns that do arith+shift using
> M
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049
--- Comment #9 from vekumar at gcc dot gnu.org ---
Author: vekumar
Date: Tue May 26 15:32:02 2015
New Revision: 223703
URL: https://gcc.gnu.org/viewcvs?rev=223703&root=gcc&view=rev
Log:
2015-05-26 Venkataramanan Kumar
PR targ
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66049
vekumar at gcc dot gnu.org changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63949
vekumar at gcc dot gnu.org changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
gfortran -c -o module_cu_gd.fppized.o -I. -I./netcdf/include -march=bdver4
-Ofast -fno-second-underscore
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67717
--- Comment #2 from vekumar at gcc dot gnu.org ---
Yes, reproducible with today's trunk.
gcc version 6.0.0 20150925 (experimental) (GCC)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67717
--- Comment #3 from vekumar at gcc dot gnu.org ---
(In reply to vekumar from comment #2)
> Yes, reproducible with today's trunk.
> gcc version 6.0.0 20150925 (experimental) (GCC)
I meant the ICE still shows up on trunk.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66171
--- Comment #1 from vekumar at gcc dot gnu.org ---
Yes, canonical RTL is retained and is LSHIFT here.
We may need to adjust the machine descriptions to be based on shifts.
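As a reminder of why the two forms are interchangeable, multiplying by a constant power of two and shifting left compute the same value. A minimal sketch on unsigned operands (to avoid undefined behaviour on signed shifts); the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: scale by 8 written as a multiplication. */
uint64_t times8_mult(uint64_t x)
{
    return x * 8;
}

/* Hypothetical helper: the same scaling written as a left shift by 3. */
uint64_t times8_shift(uint64_t x)
{
    return x << 3;
}
```

A machine description pattern written against one form will not match RTL expressed in the other, even though the values agree.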
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63949
--- Comment #6 from vekumar at gcc dot gnu.org ---
In the function make_compound_operation, there is a check
/* See if we have operations between an ASHIFTRT and an ASHIFT.
If so, try to merge the shifts into a SIGN_EXTEND. We
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63949
--- Comment #7 from vekumar at gcc dot gnu.org ---
I ran the GCC tests against the patch and found one failure.
int
adds_shift_ext ( long long a, int b, int c)
{
long long d = (a + ((long long)b << 3));
if (d == 0)
return a + c;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63949
--- Comment #8 from vekumar at gcc dot gnu.org ---
This is the complete patch for the first approach that I took (comment 6). This
patch fixes the issues I faced while testing. But I have added extra patterns to
cater for the sign extended operands with left
: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
For the below test case a redundant sxth instruction gets generated.
int
adds_shift_ext ( long long a, short b, int c)
{
long long d = (a - ((long long)b << 3));
if (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63949
--- Comment #10 from vekumar at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #9)
> A MULT by a constant power of 2 is not canonical RTL (well, not what
> simplify_rtx would give you); combine shouldn't generate this
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63850
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63850
--- Comment #7 from vekumar at gcc dot gnu.org ---
(In reply to clyon from comment #6)
> Venkat,
> Can you submit your GCC patch, in an acceptable way? (no change to sanitizer
> libs code, and obviously do not activate tsan by default
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
For the below test case.
signed char a[100],b[100];
void absolute_s8 (void)
{
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946
--- Comment #1 from vekumar at gcc dot gnu.org ---
The test case is taken from gcc.target/aarch64/vect-abs-compile.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946
--- Comment #6 from vekumar at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #5)
> I think you should always use an unsigned type here so it will be defined in
> the IR. This is mentioned in bug 22199#c3 .
Andrew I mis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946
--- Comment #9 from vekumar at gcc dot gnu.org ---
This match.pd pattern vectorizes the PR but works only with -fwrapv.
(simplify
( convert (abs (convert@1 @0)))
( if (INTEGRAL_TYPE_P (type)
/* We check for type compatibility between @0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65287
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Reference: https://bugs.linaro.org/show_bug.cgi?id=863
Test case
int subsi_sxth (int a, short i)
{
/* { dg-final { scan-assembler "s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61440
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63949
--- Comment #4 from vekumar at gcc dot gnu.org ---
(In reply to Richard Earnshaw from comment #3)
> make_extraction is unable to generate bit-field extractions in more than one
> mode. This causes the extractions that it does generate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63949
--- Comment #5 from vekumar at gcc dot gnu.org ---
Richard, what is the function get_best_reg_extraction_insn supposed to do in
make_extraction?
Priority: P3
Component: bootstrap
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*
The build breaks while compiling a graphite-related file.
Occurred with trunk r231212.
(Snip)
../../gcc-fsf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68667
--- Comment #1 from vekumar at gcc dot gnu.org ---
(In reply to vekumar from comment #0)
> The build breaks while compiling a graphite-related file.
> Occurred with trunk r231212.
>
> (Snip)
> ../../gcc-fsf-trunk/gcc/graphite-isl-ast-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68417
--- Comment #4 from vekumar at gcc dot gnu.org ---
Older trunk showed (gcc version 6.0.0 20151202 (experimental) (GCC))
iftmp.1_19 = p1_36->y;
tree could trap...
Today's trunk (gcc version 6.0.0 20151209)
Applying if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68417
--- Comment #5 from vekumar at gcc dot gnu.org ---
Richard,
STMT: m1 = p1->x - m;
Since p1->x is a component ref, while hashing it we hash only the operand 0
part, i.e. TREE_OPERAND (ref, 0). This is unconditionally read and written.
STMT: p3->
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58135
vekumar at gcc dot gnu.org changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68621
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65951
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952
--- Comment #10 from vekumar at gcc dot gnu.org ---
With the patch I get
loop:
adrp x0, array
ldr q1, .LC0
ldr q2, .LC1
adrp x1, ptrs
add x1, x1, :lo12:ptrs
ldr x0, [x0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 65952, which changed state.
Bug 65952 Summary: [AArch64] Will not vectorize storing induction of pointer
addresses for LP64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65952
vekumar at gcc dot gnu.org changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54803
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54803
--- Comment #6 from vekumar at gcc dot gnu.org ---
(In reply to vekumar from comment #5)
> On bdver4, enabling -march=bdver4 and -mno-prefer-avx128 vectorizes
> using YMM.
> Otherwise it uses the vprotq instruction.
>
> .L13:
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67326
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
The below test case fails to vectorize.
gcc version 7.0.0 20160724 (experimental) (GCC)
gcc -Ofast -mavx -fvect-cost-model=unlimited slp.c -S -fdump-tree-slp-all
struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77270
vekumar at gcc dot gnu.org changed:
What|Removed |Added
CC||vekumar at gcc dot gnu.org
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vekumar at gcc dot gnu.org
Target Milestone: ---
For the test case in the given link
https://godbolt.org/z/MP88MaTva
LLVM is able to optimize the loop and computations