What's the status of autovectorization for MMX and 3DNow!?
Hi, I'm particularly interested in this patch (http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html); pretty nice for users of Pentium 3 and Athlon. Has it been or will it be integrated into mainline? -- Zuxy
Why use pentium4 for Pentium M?
Hi, why does -march=native choose pentium4 instead of pentium-m for Pentium M processors (see config/i386/driver-i386.c)? Is this intentional or a typo? -- Zuxy
Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?
I saw the patch in http://gcc.gnu.org/ml/gcc/2005-08/msg00281.html but is it available for other hosts, like mingw32? -- Zuxy
Re: Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?
Hello,

"Zuxy Meng" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> I saw the patch in http://gcc.gnu.org/ml/gcc/2005-08/msg00281.html but is it
> available for other hosts, like mingw32?

I've uploaded a proposed patch for this bug (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13151&action=diff). Putting crtfastmath.o in the endfile section of the specs causes an undefined reference to memset, so I moved it to the startfile section. Could anyone review this? It's quite trivial anyway.

--
Zuxy
Re: Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?
Hi,

2007/3/11, Danny Smith <[EMAIL PROTECTED]>:
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zuxy Meng
> > Sent: Wednesday, 7 March 2007 12:36 a.m.
> >
> > I've uploaded a proposed patch for this bug
> > (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13151&action=diff).
> > Putting crtfastmath.o in the endfile section of specs causes undefined
> > reference to memset, so I moved it to the startfile section. Anyone to
> > review this? Quite trivial anyway.
>
> Hello Zuxy,
>
> crtfastmath.o is an endfile for a reason. _set_fast_math is run from the
> .ctors section. It needs to be run before other constructors that might
> use fastmath, so is made an end ctor (ctors are run from right to left).
>
> The reference to memset does not occur on trunk with optimization. It
> can be avoided with 4.2.0 by adding -minline-all-stringops to rule for
> (T)crtfastmath.o

Maybe such optimization isn't turned on for mingw. I updated the patch to force this by using -minline-all-stringops.

http://gcc.gnu.org/bugzilla/attachment.cgi?id=13189

--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6
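For readers unfamiliar with what a startup object like crtfastmath.o does, here is a minimal sketch of the idea: a constructor that turns on the FTZ and DAZ bits in the SSE control register (MXCSR). This is not the GCC source. The names `mxcsr_enable_ftz_daz` and `set_fast_math_sketch` are made up for illustration, and the sketch assumes the CPU supports DAZ, which a real implementation would need to check before setting the bit.

```c
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr */
#endif

/* MXCSR bit positions as documented for SSE. */
#define MXCSR_DAZ 0x0040u  /* denormals-are-zero: treat denormal inputs as 0 */
#define MXCSR_FTZ 0x8000u  /* flush-to-zero: flush denormal results to 0 */

/* Pure helper (hypothetical name): the MXCSR value with FTZ and DAZ set. */
uint32_t mxcsr_enable_ftz_daz(uint32_t mxcsr)
{
    return mxcsr | MXCSR_FTZ | MXCSR_DAZ;
}

#if defined(__SSE__) && defined(__GNUC__)
/* Runs before main(), like a function placed in .ctors.
 * Assumption: DAZ is supported; this sketch does not probe for it. */
__attribute__((constructor))
static void set_fast_math_sketch(void)
{
    _mm_setcsr(mxcsr_enable_ftz_daz(_mm_getcsr()));
}
#endif
```

The ordering concern Danny raises is about when such a constructor runs relative to other constructors, which is decided by where the object sits on the link line rather than by anything in the code itself.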
Re: MMX instruction and gcc
"Andrew Pinski" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> On 15 Mar 2007 08:12:21 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> > Galloth <[EMAIL PROTECTED]> writes:
> >
> > > I want to ask whether there is a possibility that gcc will automatically
> > > generate MMX instructions? I tried -mtune, -march, -mmmx. But gcc still
> > > generates code without MMX instructions. On the net I found only
> > > information about inline assembler functions or citations from the man page.
> > > I tried to compile a program for adding two arrays of 24 chars.
> >
> > This question is appropriate for the mailing list
> > [EMAIL PROTECTED], not for [EMAIL PROTECTED] Please take any
> > follow-ups to gcc-help. Thanks.
>
> Actually this one I think belongs still here because ...
>
> > In some cases gcc may be able to automatically generate MMX
> > instructions if you use the -ftree-vectorize option.
>
> This is not true right now. GCC is not able to automatically use the
> MMX instructions at all, this is already filed as PR 22076 and others.
> There was a patch posted as an RFC to fix this but it caused
> regression on x86_64 and the person who submitted has not updated it
> yet for the mainline.

I remember Uros saying that the biggest problem is determining which mode (x87 or MMX) is active. MMX/3DNow! autovectorization isn't so useful when SSE/SSE2 is available, so maybe we can turn it off on x86_64?

--
Zuxy
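For context, the kind of loop the original poster describes (adding two arrays of 24 chars) might look like the sketch below. The function name `add_bytes` and the constant `N` are invented for illustration. Compiling it with something like `gcc -O2 -ftree-vectorize -msse2` is the usual way to ask for vectorization, but whether any vector instructions come out depends on the GCC version and target, and per Andrew's reply they would not be MMX instructions.

```c
#include <stddef.h>

#define N 24  /* array length taken from the original question */

/* Element-wise byte addition with wrap-around.
 * 'restrict' tells the compiler the arrays do not overlap, which
 * generally makes a loop like this easier to vectorize. */
void add_bytes(unsigned char *restrict dst,
               const unsigned char *restrict a,
               const unsigned char *restrict b)
{
    for (size_t i = 0; i < N; i++)
        dst[i] = (unsigned char)(a[i] + b[i]);
}
```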
Re: x86 inc/dec on core2
"Mike Stump" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> On Apr 8, 2007, at 2:37 AM, Uros Bizjak wrote:
> > My docs say that "INC/DEC does not change the carry flag".
>
> Personally, I'm having a hard time envisioning how the semantics of the
> instruction are relevant at all. This is all about instruction tuning,
> so, semantics cannot matter, otherwise, it would be wrong to make this a
> tune choice.

Intel's optimization reference manual says:

  3.5.1.1 Use of the INC and DEC Instructions

  The INC and DEC instructions modify only a subset of the bits in the flag register. This creates a dependence on all previous writes of the flag register. This is especially problematic when these instructions are on the critical path because they are used to change an address for a load on which many other instructions depend.

  Assembly/Compiler Coding Rule 32. (M impact, H generality) INC and DEC instructions should be replaced with ADD or SUB instructions, because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore creating false dependencies on earlier instructions that set the flags.

> > But you have better resources than I, so if you think that C2D should be
> > left out of X86_TUNE_USE_INCDEC, then the patch is pre-approved for
> > mainline.
>
> I'm confused again, it isn't that it should be left out, it is that it
> should be included. My patch adds inc/dec selection for C2D. I'd also
> like it for generic on darwin, as that makes more sense for us. How does
> the rest of the community feel about inc/dec selection for generic?

--
Zuxy
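The partial flag update described in the manual excerpt can be illustrated with a toy model. This is a sketch only: it tracks just two flags (CF and ZF) on 8-bit values, and the names `struct flags`, `model_add` and `model_inc` are invented here. It says nothing about actual pipeline timing. It only shows that the flag state after an INC still contains a value produced by an earlier instruction, which is where the dependency comes from.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy flag register: only carry and zero are modelled. */
struct flags {
    bool cf;
    bool zf;
};

/* ADD overwrites every modelled flag, so the new flag state
 * does not depend on the previous one. */
uint8_t model_add(uint8_t x, uint8_t y, struct flags *f)
{
    unsigned wide = (unsigned)x + (unsigned)y;
    f->cf = wide > 0xFFu;
    f->zf = (uint8_t)wide == 0;
    return (uint8_t)wide;
}

/* INC updates ZF but leaves CF alone, so the resulting flag state
 * still carries the CF written by some earlier instruction. */
uint8_t model_inc(uint8_t x, struct flags *f)
{
    uint8_t r = (uint8_t)(x + 1u);
    f->zf = r == 0;
    /* f->cf deliberately untouched */
    return r;
}
```

In this model, calling `model_inc(0xFF, &f)` wraps the value to 0 and sets ZF, yet CF keeps whatever it held before the call, whereas `model_add(0xFF, 1, &f)` would set CF itself.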
Re: What's the status of autovectorization for MMX and 3DNow!?
"Uros Bizjak" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hello!
>
> > I'm particularly interested in this patch
> > (http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html); pretty nice for
> > users of Pentium 3 and Athlon. Has it been or will it be integrated into
> > mainline?
>
> This patch heavily depends on the functionality of optimize mode
> switching pass. Unfortunately, there is currently no way to tell
> optimize_mode_switching() which modes are exclusive. Due to the way how
> the emms switching patch was designed, it expects that either MMX or X87
> mode can be active at once, to properly switch between x87 and MMX
> registers.
>
> PR target/19161 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19161)
> comment #17 has an example of the control flow that can block both
> register sets at once. Otherwise, the patch works as expected.

Sorry to dig this old thing up, but will you continue to work on this once the problem you mentioned above gets resolved? And what about 3DNow! support? Just curious :-)

--
Zuxy
Re: Bug detecting AMD K6-2+ with -march=native ?
"Stroller" wrote in message news:0bead785-392b-49e3-a3d1-0f015b954...@stellar.eclipse.co.uk...
> Hi there,
>
> I have a Sun / Cobalt Qube 3 with an AMD K6-2+ CPU, and gcc seems to be
> misdetecting it as an Athlon when using -march=native
>
> # gcc -v -Q --help=target -march=native 2>&1 | grep march
> /usr/libexec/gcc/i586-pc-linux-gnu/4.5.2/cc1 -v help-dummy -D_FORTIFY_SOURCE=2 -march=athlon --param l1-cache-size=32 --param l1-cache-line-size=32 --param l2-cache-size=128 -mtune=athlon -dumpbase help-dummy -auxbase help-dummy -version -fhelp=target -o /tmp/ccQrke9q.s
> -march= athlon
> #
>
> I've tried a couple of different versions of gcc - 4.4.4 and 4.5.2 - but
> they both show the same. Someone on the linux-cobalt mailing list can
> also reproduce.
>
> When I compile nano using `-march=i586`, `-march=k6-2` or `-march=k6-3`
> it works fine. If I compile it with `-march=native` then nano crashes on
> startup saying "illegal instruction".
>
> /proc/cpuinfo seems to show the CPU correctly:
>
> # cat /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 5
> model : 13
> model name : AMD-K6(tm)-III Processor
> stepping : 4
> cpu MHz : 448.140
> cache size : 128 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr cx8 pge mmx syscall 3dnowext 3dnow k6_mtrr
> bogomips : 896.28
> clflush size : 32
> cache_alignment : 32
> address sizes : 32 bits physical, 32 bits virtual
> power management: ts fid vid
> #
>
> AIUI the K6-2+ is architecturally the same as a K6-3, so I assume this is
> fine. The family & model above match what wikipedia says it should be
> for a K6-2+ (although they're also the same for a K6-3+).

It's a bug in GCC. Can you try the patch at http://gcc.gnu.org/bugzilla/attachment.cgi?id=24233?

--
Zuxy
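To make the misdetection concrete, here is a hedged sketch of how a detection routine could separate a K6-family part from an Athlon using the CPU family as well as the feature flags. It is not the code in GCC's driver and not the contents of the linked patch. The enum, the function `guess_amd_march` and its parameters are hypothetical. The underlying observation comes from the cpuinfo output in the report: this chip is family 5 yet advertises both 3dnow and 3dnowext, so a test based on feature flags alone cannot tell it apart from an Athlon.

```c
/* Hypothetical result type for the sketch. */
enum march_guess {
    MARCH_UNKNOWN,
    MARCH_K6,      /* -march=k6 */
    MARCH_K6_3,    /* -march=k6-3 */
    MARCH_ATHLON   /* -march=athlon */
};

/* Assumption: the caller has already established that the vendor is AMD
 * and has read the family and feature bits from CPUID. */
enum march_guess guess_amd_march(unsigned family, int has_3dnow, int has_3dnowext)
{
    /* Only family 6 parts are treated as Athlon in this sketch. */
    if (family == 6 && has_3dnowext)
        return MARCH_ATHLON;

    /* Family 5: the K6 line. 3DNow! separates the K6-2/K6-3 from the plain K6. */
    if (family == 5)
        return has_3dnow ? MARCH_K6_3 : MARCH_K6;

    return MARCH_UNKNOWN;  /* everything else is outside this sketch */
}

/* Map the guess to the -march= spelling. */
const char *march_name(enum march_guess g)
{
    switch (g) {
    case MARCH_K6:     return "k6";
    case MARCH_K6_3:   return "k6-3";
    case MARCH_ATHLON: return "athlon";
    default:           return "unknown";
    }
}
```

With the values from the report (family 5, both 3DNow! flags set), this sketch returns `MARCH_K6_3`, which corresponds to one of the `-march` settings the reporter says works.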
Re: -mfpmath=sse,387 is experimental ?
Hi,

"Timothy Madden" wrote in message news:5078d8af0903120218i23b69a4bma28ad9b3f1bd4...@mail.gmail.com...
> On Thu, Mar 12, 2009 at 1:15 AM, Jan Hubicka wrote:
> > Timothy Madden wrote:
> > > Hello
> > >
> > > Is -mfpmath=both for i386 and x86-64 still experimental in gcc 4.3, as
> > > stated in the online manual page ?
> > [...]
> > The fundamental problem here is that backend lies to compiler about the
> > fact that FP operation can not take one operand from SSE and other from
> > X87. This is something I want to look into once I have more time. With
> > new RA, perhaps we can drop all these fake constraints.
>
> That would be great ! I am sure having twice the number of registers
> (sse+387) would make a big difference. Even if SSE and FPU instruction
> sets can not mix operands, using both at the same time (each with its
> registers) will be an improvement.
>
> Until then I would have a question: if I compile with -msse then using
> -mfpmath=387 would help floating-point operations not steal SSE registers
> that are already used by CPU operations ? And using -mfpmath=sse would
> make FPU and CPU share the SSE registers and compete on them ?
>
> How would I know if my AMD Sempron 2200+ has separate execution units for
> SSE and FPU instructions, with independent registers ?

Most CPUs use the same FP unit for both x87 and SIMD operations, so using both wouldn't give you double the performance. The only exception I know of is the K6-2/3, whose x87 and 3DNow! units are separate.

--
Zuxy
Re: Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?
Hi,

"Zuxy Meng" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi,
>
> 2007/3/11, Danny Smith <[EMAIL PROTECTED]>:
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zuxy Meng
> > > Sent: Wednesday, 7 March 2007 12:36 a.m.
> > >
> > > I've uploaded a proposed patch for this bug
> > > (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13151&action=diff).
> > > Putting crtfastmath.o in the endfile section of specs causes undefined
> > > reference to memset, so I moved it to the startfile section. Anyone to
> > > review this? Quite trivial anyway.
> >
> > Hello Zuxy,
> >
> > crtfastmath.o is an endfile for a reason. _set_fast_math is run from the
> > .ctors section. It needs to be run before other constructors that might
> > use fastmath, so is made an end ctor (ctors are run from right to left).
> >
> > The reference to memset does not occur on trunk with optimization. It
> > can be avoided with 4.2.0 by adding -minline-all-stringops to rule for
> > (T)crtfastmath.o
>
> Maybe such optimization isn't turned on for mingw. I updated the patch
> to force this by using -minline-all-stringops.
>
> http://gcc.gnu.org/bugzilla/attachment.cgi?id=13189

Could any developer have a look at this?

--
Zuxy
Re: Is "FTZ/DAZ for SSE via fast math" available for x86 arch other than Linux?
Hi,

2007/5/21, Uros Bizjak <[EMAIL PROTECTED]>:
> Hello!
>
> > > Maybe such optimization isn't turned on for mingw. I updated the patch
> > > to force this by using -minline-all-stringops.
> > >
> > > http://gcc.gnu.org/bugzilla/attachment.cgi?id=13189
> >
> > Any developer to have a look at this?
>
> Please post the patch to [EMAIL PROTECTED] Also add appropriate
> ChangeLog and a reference to the PR. I can find the patch, but can't find
> the PR from your message.

It's PR 29498. Thanks!

> Patch contribution procedure is described in great detail at
> http://gcc.gnu.org/contribute.html.

--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6