What's the status of autovectorization for MMX and 3DNow!?

2006-12-10 Thread Zuxy Meng
Hi,

I'm particularly interested in this patch 
(http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html); pretty nice for 
users of Pentium 3 and Athlon. Has it been or will it be integrated into 
mainline?

-- 
Zuxy 





Why use pentium4 for Pentium M?

2007-02-26 Thread Zuxy Meng
Hi,

Why does -march=native choose pentium4 instead of pentium-m for Pentium M 
processors (see config/i386/driver-i386.c)? Is this intentional or a typo?

-- 
Zuxy 





Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?

2007-02-28 Thread Zuxy Meng
I saw the patch in http://gcc.gnu.org/ml/gcc/2005-08/msg00281.html but is it 
available for other hosts, like mingw32?

-- 
Zuxy 





Re: Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?

2007-03-06 Thread Zuxy Meng
Hello,

"Zuxy Meng" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> I saw the patch in http://gcc.gnu.org/ml/gcc/2005-08/msg00281.html but is it
> available for other hosts, like mingw32?

I've uploaded a proposed patch for this bug 
(http://gcc.gnu.org/bugzilla/attachment.cgi?id=13151&action=diff). Putting 
crtfastmath.o in the endfile section of specs causes undefined reference to 
memset, so I moved it to the startfile section. Anyone to review this? Quite 
trivial anyway.

-- 
Zuxy 





Re: Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?

2007-03-11 Thread Zuxy Meng

Hi,

2007/3/11, Danny Smith <[EMAIL PROTECTED]>:

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Zuxy Meng
> Sent: Wednesday, 7 March 2007 12:36 a.m.
>
> I've uploaded a proposed patch for this bug
> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13151&action=diff).
> Putting crtfastmath.o in the endfile section of specs causes undefined
> reference to memset, so I moved it to the startfile section. Anyone to
> review this? Quite trivial anyway.

Hello Zuxy,

crtfastmath.o is an endfile for a reason.  _set_fast_math is run from the
.ctors section.  It needs to be run before other constructors that might
use fastmath, so it is made an end ctor (ctors are run from right to left).

The reference to memset does not occur on trunk with optimization.  It
can be avoided with 4.2.0 by adding -minline-all-stringops to the rule for
(T)crtfastmath.o.
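For readers unfamiliar with crtfastmath.o, here is a minimal sketch of the idea. It is not the GCC source. It assumes an x86 target with SSE and a compiler that supports GCC's constructor attribute; the names fast_math_mxcsr and sketch_set_fast_math are made up for illustration. The startup object sets the flush-to-zero (FTZ) bit, and where supported the denormals-are-zero (DAZ) bit, in the MXCSR register before user code runs. DAZ is not implemented on every SSE-capable CPU, so the sketch only computes that bit on request and leaves the capability probe out.

```c
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#endif

/* MXCSR control bits as documented in the x86 architecture manuals. */
#define MXCSR_DAZ (1u << 6)    /* denormal inputs are treated as zero   */
#define MXCSR_FTZ (1u << 15)   /* denormal results are flushed to zero  */

/* Pure helper (hypothetical name): the MXCSR value a fast-math startup
   routine would want. DAZ is added only if the caller says the CPU has it. */
uint32_t fast_math_mxcsr(uint32_t mxcsr, int cpu_has_daz)
{
    mxcsr |= MXCSR_FTZ;
    if (cpu_has_daz)
        mxcsr |= MXCSR_DAZ;
    return mxcsr;
}

/* Hypothetical stand-in for _set_fast_math. The constructor attribute asks
   the toolchain to call it before main(). DAZ is left off here because this
   sketch does not probe whether the CPU supports it. */
__attribute__((constructor))
static void sketch_set_fast_math(void)
{
#if defined(__SSE__)
    _mm_setcsr(fast_math_mxcsr(_mm_getcsr(), 0));
#endif
}
```

Where this object sits on the link line decides when the constructor runs relative to other constructors, which is the ordering concern raised above.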


Maybe such optimization isn't turned on for mingw. I updated the patch
to force this by using -minline-all-stringops.

http://gcc.gnu.org/bugzilla/attachment.cgi?id=13189
--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6


Re: MMX instruction and gcc

2007-03-15 Thread Zuxy Meng
"Andrew Pinski" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> On 15 Mar 2007 08:12:21 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
>> Galloth <[EMAIL PROTECTED]> writes:
>>
>> > I want to ask whether there is a possibility that gcc will automatically
>> > generate MMX instructions. I tried -mtune, -march and -mmmx, but gcc still
>> > generates code without MMX instructions. On the net I found only
>> > information about inline assembler functions or citations from the man page.
>> > I tried to compile a program for adding two arrays of 24 chars.
>>
>> This question is appropriate for the mailing list
>> [EMAIL PROTECTED], not for [EMAIL PROTECTED]  Please take any
>> follow-ups to gcc-help.  Thanks.
>
> Actually this one I think belongs still here because ...
>
>> In some cases gcc may be able to automatically generate MMX
>> instructions if you use the -ftree-vectorize option.
>
> This is not true right now.  GCC is not able to automatically use the
> MMX instructions at all, this is already filed as PR 22076 and others.
> There was a patch posted as an RFC to fix this but it caused
> regression on x86_64 and the person who submitted has not updated it
> yet for the mainline.

I remember Uros saying that the biggest problem is determining which 
mode (x87 or MMX) is active. MMX/3DNow! autovectorization isn't so useful 
when SSE/SSE2 is available, so maybe we can turn it off for x86_64?
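For reference, the kind of loop discussed in this thread (adding two arrays of 24 chars) can be sketched as follows. The function name add_bytes is made up for illustration. Whether a given GCC release vectorizes it, and with which instruction set, depends on the version and target options; compiling with something like `gcc -O2 -ftree-vectorize -msse2` and inspecting the assembly is one way to check.

```c
#include <stddef.h>

/* Element-wise sum of two byte arrays. The cast makes the modulo-256
   wraparound explicit, which matches what a packed byte add would do.
   restrict tells the compiler the arrays do not overlap, which removes
   one common obstacle to vectorization. */
void add_bytes(unsigned char *restrict dst,
               const unsigned char *restrict a,
               const unsigned char *restrict b,
               size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(a[i] + b[i]);
}
```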

-- 
Zuxy 





Re: x86 inc/dec on core2

2007-04-08 Thread Zuxy Meng
"Mike Stump" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> On Apr 8, 2007, at 2:37 AM, Uros Bizjak wrote:
>> My docs say that "INC/DEC does not change the carry flag".
>
> Personally, I'm having a hard time envisioning how the semantics of the
> instruction are relevant at all.  This is all about instruction tuning,
> so semantics cannot matter; otherwise, it would be wrong to make this a
> tune choice.

Intel's optimization reference manual says that:

3.5.1.1 Use of the INC and DEC Instructions
The INC and DEC instructions modify only a subset of the bits in the flag 
register. This creates a dependence on all previous writes of the flag 
register. This is especially problematic when these instructions are on the 
critical path because they are used to change an address for a load on which 
many other instructions depend.

Assembly/Compiler Coding Rule 32. (M impact, H generality)
INC and DEC instructions should be replaced with ADD or SUB instructions, 
because ADD and SUB overwrite all flags, whereas INC and DEC do not, 
therefore creating false dependencies on earlier instructions that set the 
flags.
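The partial flag update described in the rule above can be illustrated with a small software model. This is only a sketch: it tracks just the carry and zero flags, and the type and function names (flags_t, model_add1, model_inc) are made up for illustration. It says nothing about timing, only about which flags each instruction writes.

```c
#include <stdint.h>

/* Only the two flags needed for the illustration. */
typedef struct {
    int cf;   /* carry flag */
    int zf;   /* zero flag  */
} flags_t;

/* Models ADD r32, 1: both modeled flags are recomputed from the result,
   so nothing from the previous flag state survives. */
uint32_t model_add1(uint32_t x, flags_t *f)
{
    uint32_t r = x + 1u;
    f->cf = (r < x);     /* unsigned wraparound means a carry out */
    f->zf = (r == 0);
    return r;
}

/* Models INC r32: ZF is recomputed but CF keeps its old value, so the
   resulting flag state still depends on whoever wrote CF last. */
uint32_t model_inc(uint32_t x, flags_t *f)
{
    uint32_t r = x + 1u;
    f->zf = (r == 0);    /* CF deliberately left untouched */
    return r;
}
```

In the model, calling model_inc with cf already set leaves it set, which is the dependence on earlier flag writers that the rule refers to.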

>
>> But you have better resources than I, so if you think that C2D should be
>> left out of X86_TUNE_USE_INCDEC, then the patch is pre-approved for
>> mainline.
>
> I'm confused again, it isn't that it should be left out, it is that it
> should be included.  My patch adds inc/dec selection for C2D.  I'd also
> like it for generic on darwin, as that makes more sense for us.  How does
> the rest of the community feel about inc/dec selection for generic?

-- 
Zuxy





Re: What's the status of autovectorization for MMX and 3DNow!?

2007-04-24 Thread Zuxy Meng
"Uros Bizjak" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hello!
>
>> I'm particularly interested in this patch
>> (http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html); pretty nice for
>> users of Pentium 3 and Athlon. Has it been or will it be integrated into
>> mainline?
>
> This patch heavily depends on the functionality of the optimize mode
> switching pass. Unfortunately, there is currently no way to tell
> optimize_mode_switching() which modes are exclusive. Due to the way
> the emms switching patch was designed, it expects that only one of MMX
> or x87 mode can be active at a time, to properly switch between x87 and
> MMX registers.
>
> PR target/19161 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19161)
> comment #17 has an example of the control flow that can block both
> register sets at once. Otherwise, the patch works as expected.
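The reason a mode switch is needed at all is that the MMX registers share state with the x87 register stack, so code must execute EMMS after MMX work and before any x87 floating-point work. A hand-written sketch of that pattern with intrinsics is shown below. The function name add8_then_mean is made up for illustration, and the scalar fallback is there only so the example also builds where MMX is not enabled. An autovectorizer emitting MMX code would have to place the equivalent of _mm_empty() itself.

```c
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>   /* __m64, _mm_add_pi8, _mm_empty */
#endif

/* Adds eight bytes at once, then computes a floating-point mean of the
   sums. The floating-point part must not start while the FPU is still
   in MMX mode. */
double add8_then_mean(const unsigned char a[8],
                      const unsigned char b[8],
                      unsigned char out[8])
{
#if defined(__MMX__)
    __m64 va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = _mm_add_pi8(va, vb);       /* packed byte add with wraparound */
    memcpy(out, &vr, sizeof vr);
    _mm_empty();                    /* EMMS: hand the registers back to x87 */
#else
    for (int i = 0; i < 8; i++)     /* scalar fallback, same wraparound */
        out[i] = (unsigned char)(a[i] + b[i]);
#endif

    double sum = 0.0;
    for (int i = 0; i < 8; i++)
        sum += out[i];
    return sum / 8.0;
}
```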


Sorry to dig up this old thread, but will you continue to work on this once 
the problem you mentioned above gets resolved? And what about 3DNow! 
support? Just curious :-)
-- 
Zuxy 





Re: Bug detecting AMD K6-2+ with -march=native ?

2011-05-15 Thread Zuxy Meng
"Stroller" wrote in message 
news:0bead785-392b-49e3-a3d1-0f015b954...@stellar.eclipse.co.uk...




Hi there,

I have a Sun / Cobalt Qube 3 with an AMD K6-2+ CPU, and gcc seems to be 
misdetecting it as an Athlon when using -march=native


# gcc -v -Q --help=target -march=native 2>&1 | grep march
/usr/libexec/gcc/i586-pc-linux-gnu/4.5.2/cc1 -v 
help-dummy -D_FORTIFY_SOURCE=2 -march=athlon --param 
l1-cache-size=32 --param l1-cache-line-size=32 --param 
l2-cache-size=128 -mtune=athlon -dumpbase help-dummy -auxbase 
help-dummy -version -fhelp=target -o /tmp/ccQrke9q.s

 -march=   athlon
#

I've tried a couple of different versions of gcc - 4.4.4 and 4.5.2 - but 
they both show the same.

Someone on the linux-cobalt mailing list can also reproduce.

When I compile nano using `-march=i586`, `-march=k6-2` or `-march=k6-3` it 
works fine.
If I compile it with `-march=native` then nano crashes on startup saying 
"illegal instruction".


/proc/cpuinfo seems to show the CPU correctly:

# cat /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 5
model   : 13
model name  : AMD-K6(tm)-III Processor
stepping: 4
cpu MHz : 448.140
cache size  : 128 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr cx8 pge mmx syscall 3dnowext 
3dnow k6_mtrr

bogomips: 896.28
clflush size: 32
cache_alignment : 32
address sizes   : 32 bits physical, 32 bits virtual
power management: ts fid vid

#

AIUI the K6-2+ is architecturally the same as a K6-3, so I assume this is 
fine. The family & model above match what wikipedia says it should be for 
a K6-2+ (although they're also the same for a K6-3+).


It's a bug in GCC. Could you try this patch?
http://gcc.gnu.org/bugzilla/attachment.cgi?id=24233
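The report above shows -march=athlon being chosen for a family 5 CPU whose flags include 3dnowext. One way such a misdetection could arise, offered here as an assumption and not confirmed anywhere in this thread, is a classifier that treats the extended 3DNow! flag alone as meaning Athlon. The sketch below keys on the CPU family as well (Athlon is family 6, the K6 line is family 5). The struct, enum and function are hypothetical and are not taken from driver-i386.c or from the attached patch.

```c
/* Hypothetical inputs, as CPUID would report them. */
struct cpu_id {
    int family;
    int has_mmx;
    int has_3dnow;
    int has_3dnowext;   /* shown as "3dnowext" in /proc/cpuinfo */
};

/* Hypothetical result codes standing in for -march= names. */
enum march { MARCH_I586, MARCH_K6, MARCH_K6_3, MARCH_ATHLON };

/* Hypothetical classifier for AMD parts. Athlon is reported only when the
   family matches as well, so a K6-2+ with 3dnowext is not promoted.
   Simplification: every 3DNow!-capable K6 part maps to k6-3, so k6-2 and
   k6-3 are not told apart. */
enum march pick_amd_march(const struct cpu_id *c)
{
    if (c->has_3dnowext && c->family == 6)
        return MARCH_ATHLON;
    if (c->has_3dnow)
        return MARCH_K6_3;
    if (c->has_mmx)
        return MARCH_K6;
    return MARCH_I586;
}
```

With the values from the /proc/cpuinfo listing above (family 5, mmx, 3dnow, 3dnowext) this sketch returns MARCH_K6_3, which corresponds to one of the -march settings reported to work.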

--
Zuxy 





Re: -mfpmath=sse,387 is experimental ?

2009-03-16 Thread Zuxy Meng

Hi,

"Timothy Madden" wrote in message 
news:5078d8af0903120218i23b69a4bma28ad9b3f1bd4...@mail.gmail.com...

On Thu, Mar 12, 2009 at 1:15 AM, Jan Hubicka  wrote:

Timothy Madden wrote:
> Hello
>
> Is -mfpmath=both for i386 and x86-64 still experimental in gcc 4.3, as
> stated in the online manual page?

[...]


The fundamental problem here is that the backend lies to the compiler about
the fact that an FP operation cannot take one operand from SSE and the other
from x87.  This is something I want to look into once I have more time.  With
the new RA, perhaps we can drop all these fake constraints.


That would be great !
I am sure having twice the number of registers (sse+387) would make a
big difference.

Even if the SSE and FPU instruction sets cannot mix operands, using both
at the same time (each with its own registers) will be an improvement.

Until then I have a question: if I compile with -msse, would using
-mfpmath=387 keep floating-point operations from stealing SSE registers
that are already used by CPU operations? And would using -mfpmath=sse make
the FPU and CPU share the SSE registers and compete for them?

How would I know whether my AMD Sempron 2200+ has separate execution units
for SSE and FPU instructions, with independent registers?


Most CPUs use the same FP unit for both x87 and SIMD operations, so using 
both wouldn't give you double the performance. The only exceptions I know of 
are the K6-2 and K6-3, whose x87 and 3DNow! units are separate.


--
Zuxy 





Re: Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?

2007-05-20 Thread Zuxy Meng
Hi

"Zuxy Meng" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi,
>
> 2007/3/11, Danny Smith <[EMAIL PROTECTED]>:
>>
>> > -Original Message-
>> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
>> > Behalf Of Zuxy Meng
>> > Sent: Wednesday, 7 March 2007 12:36 a.m.
>> >
>> > I've uploaded a proposed patch for this bug
>> > (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13151&action=diff).
>> > Putting crtfastmath.o in the endfile section of specs causes undefined
>> > reference to memset, so I moved it to the startfile section. Anyone to
>> > review this? Quite trivial anyway.
>>
>> Hello Zuxy,
>>
>> crtfastmath.o is an endfile for a reason.  _set_fast_math is run from the
>> .ctors section.  It needs to be run before other constructors that might
>> use fastmath, so it is made an end ctor (ctors are run from right to left).
>>
>> The reference to memset does not occur on trunk with optimization.  It
>> can be avoided with 4.2.0 by adding -minline-all-stringops to the rule for
>> (T)crtfastmath.o.
>
> Maybe such optimization isn't turned on for mingw. I updated the patch
> to force this by using -minline-all-stringops.
>
> http://gcc.gnu.org/bugzilla/attachment.cgi?id=13189

Any developer to have a look at this?
-- 
Zuxy 





Re: Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?

2007-05-21 Thread Zuxy Meng

Hi,

2007/5/21, Uros Bizjak <[EMAIL PROTECTED]>:

Hello!

>> Maybe such optimization isn't turned on for mingw. I updated the patch
>> to force this by using -minline-all-stringops.
>>
>> http://gcc.gnu.org/bugzilla/attachment.cgi?id=13189
>
> Any developer to have a look at this?

Please post the patch to [EMAIL PROTECTED] Also add appropriate
ChangeLog and a reference to the PR. I can find the patch, but can't
find the PR from your message.


It's PR 29498. Thanks!


Patch contribution procedure is described in great detail at
http://gcc.gnu.org/contribute.html.


--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6