from:"liranuna at gmail dot com"

[Bug rtl-optimization/43147] New: SSE shuffle merge

2010-02-22 Thread liranuna at gmail dot com

I've noticed that GCC (my current version is 4.4.1) doesn't fully optimize SSE
shuffle merges, as seen in this example: 

#include <xmmintrin.h>

extern void printv(__m128 m);

int main()
{
    m = _mm_shuffle_ps(m, m, 0xC9); // Those two shuffles together swap pairs
    m = _mm_shuffle_ps(m, m, 0x2D); // And could be optimized to 0x4E
    printv(m);

    return 0;
}

This code generates the following assembly:

movaps  .LC1, %xmm1
shufps  $201, %xmm1, %xmm1
shufps  $45, %xmm1, %xmm1; <-- Both should merge to 78
movaps  %xmm1, %xmm0
movaps  %xmm1, -24(%ebp)

.LC0:
.long   1065353216 ; 1.0f
.long   1073741824 ; 2.0f
.long   1077936128 ; 3.0f
.long   1082130432 ; 4.0f
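
For reference, the merged immediate can be worked out mechanically. The following is only a sketch in plain integer arithmetic: the helper name compose_shufps_imm is made up for illustration, it is not part of GCC or of any intrinsics header, and it assumes the case shown above, where both operands of each shuffle are the same register.

```cpp
// Hypothetical helper (not a GCC or intrinsics API): returns the single
// immediate equivalent to applying `first` and then `second`, assuming each
// shuffle has the form _mm_shuffle_ps(m, m, imm) with both operands equal.
unsigned compose_shufps_imm(unsigned first, unsigned second)
{
    unsigned merged = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        // Lane `lane` of the final result is read from this lane of the
        // intermediate vector...
        unsigned mid = (second >> (2 * lane)) & 3u;
        // ...which was itself read from this lane of the original vector.
        unsigned src = (first >> (2 * mid)) & 3u;
        merged |= src << (2 * lane);
    }
    return merged;
}
```

With the two immediates from the example, compose_shufps_imm(0xC9, 0x2D) evaluates to 0x4E (78 decimal), which matches the value expected in the assembly comment.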

Would be nice to see it as an enhancement!


-- 
           Summary: SSE shuffle merge
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: liranuna at gmail dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

[Bug rtl-optimization/43147] SSE shuffle merge

2010-02-22 Thread liranuna at gmail dot com



--- Comment #1 from liranuna at gmail dot com  2010-02-23 01:37 ---
It appears I am missing a line in the code I posted:

#include <xmmintrin.h>

extern void printv(__m128 m);

int main()
{
    __m128 m = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
    m = _mm_shuffle_ps(m, m, 0xC9); // Those two shuffles together swap pairs
    m = _mm_shuffle_ps(m, m, 0x2D); // And could be optimized to 0x4E
    printv(m);

    return 0;
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147

[Bug target/43722] New: ICE when passing NEON registers using const refrences

2010-04-11 Thread liranuna at gmail dot com

Giving GCC 4.4.3 the following code with the arguments "-O1 -mcpu=cortex-a8
-mfpu=neon -mfloat-abi=softfp": 



#include <stdio.h>
#include <arm_neon.h>

void printv_f32(const float32x4_t &v)
{
    printf("%f\n", vgetq_lane_f32(v, 0));
}

int main()
{
    float32x4_t v = {0.0, 1.0f, 2.0f, 3.0f};

    printv_f32(v);

    return 0;
}



results in an ICE:

/home/liranuna/Projects/mathlib_md/source/main.cpp: In function 'int main()':
/home/liranuna/Projects/mathlib_md/source/main.cpp:21: error: insn does not
satisfy its constraints:
(insn 25 5 7 2 /home/liranuna/Projects/mathlib_md/source/main.cpp:11 (set
(mem/c/i:V4SF (pre_dec:SI (reg/f:SI 0 r0 [134])) [0 v+0 S16 A64])
(reg:V4SF 95 d16)) 710 {*neon_movv4sf} (expr_list:REG_INC (reg/f:SI 0
r0 [134])
(nil)))
/home/liranuna/Projects/mathlib_md/source/main.cpp:21: internal compiler error:
in reload_cse_simplify_operands, at postreload.c:396
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


-- 
           Summary: ICE when passing NEON registers using const refrences
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: liranuna at gmail dot com
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: arm-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43722

[Bug target/43722] ICE when passing NEON registers using const refrences

2010-04-11 Thread liranuna at gmail dot com



--- Comment #1 from liranuna at gmail dot com  2010-04-12 03:24 ---
I would like to add that changing

void printv_f32(const float32x4_t &v)

into:

void printv_f32(float32x4_t v)

makes the problem go away, but the generated code is suboptimal.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43722

[Bug target/43724] New: GCC produces suboptimal ARM NEON code for zero vector assignment

2010-04-11 Thread liranuna at gmail dot com

GCC generates suboptimal code for the vdupq_n_XXX family of intrinsics when
the argument is 0.

The code generated is:

mov r0, #0
vdup.32 q8, r0

Instead of the faster

veor.32 q8, q8, q8

Note that GCC does use xorps on x86[_64] for SSE when given
_mm_setzero_ps() or _mm_set1_ps(0).


-- 
           Summary: GCC produces suboptimal ARM NEON code for zero vector assignment
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: liranuna at gmail dot com
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: arm-linux-gnueabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43724

[Bug target/43722] ICE when passing NEON registers using const refrences

2010-04-13 Thread liranuna at gmail dot com



--- Comment #7 from liranuna at gmail dot com  2010-04-13 07:43 ---
Mikael's patch seems to do the trick, and it also produces very nice assembly.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43722

[Bug 45775] (c++) New: Private templated classes/structs inside a class.

2010-09-23 Thread liranuna at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45775

           Summary: Private templated classes/structs inside a class.
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: liran...@gmail.com


Created attachment 21874
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=21874
Failing code

With the attached source code, GCC 4.4.3 does not report an error for the
illegal access to A::B.

According to the C++ spec:

11.8 Nested classes [class.access.nest]

1 A nested class is a member and as such has the same access rights as any
other member. The members of an enclosing class have no special access to
members of a nested class; the usual access rules (Clause 11) shall be obeyed.

Note that the access to A::C is correctly rejected.
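
To illustrate what the quoted paragraph restricts, here is a small sketch that is separate from the attached test case. The names Outer, Hidden and read_hidden_value are made up for the illustration. Access control applies to the name Outer::Hidden, so code outside the class may not write that name, although it can still hold a value of that type through auto (this assumes a C++11 or later compiler):

```cpp
class Outer
{
private:
    // Private nested type: only members and friends of Outer may name it.
    struct Hidden
    {
        int value;
    };

public:
    // A public member may return the private type.
    Hidden make()
    {
        Hidden h = {7};
        return h;
    }
};

int read_hidden_value()
{
    Outer o;
    // Outer::Hidden h = o.make();  // ill-formed: the name Outer::Hidden is private here
    auto h = o.make();              // accepted: the private name is never written
    return h.value;
}
```

Writing the private name in a function parameter outside the class, as the test case does for A::B and A::C, is the situation the access check is expected to reject.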

-- 
Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

[Bug 45775] (c++) Private templated classes/structs inside a class.

2010-09-23 Thread liranuna at gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45775

--- Comment #1 from Liran Nuna  2010-09-24 06:50:29 UTC ---
Accidentally attached wrong source file:

#include <stdio.h>

class A
{
private:
    template <int T>
    struct B
    {

    };

    struct C
    {

    };

public:
    template <int T>
    B<T> getAb()
    {
        return B<T>();
    }

    C getAc()
    {
        return C();
    }
};

template <int T>
void print_private_template(const A::B<T> &ab)
{
    printf("%d\n", T);
}

void print_private_class(const A::C &ac)
{
    printf("something\n");
}

int main(int, char**)
{
    A a;

    print_private_template(a.getAb<42>());

    print_private_class(a.getAc());

    return 0;
}


