Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
tbp wrote:
> On 3/13/06, Andrew Pinski <[EMAIL PROTECTED]> wrote:
>> Actually the best way of improving the inline heuristics is to get a
>> real testcase (and not some benchmark) where the inline heuristics is
>> messed up.
> Ah, you mean a brand new testcase because PR-21195 wasn't good enough?

Wait wait. PR/21195 is about inlining the SSE builtins. These are special because, for example, you probably would prefer GDB to not step into them, but just execute them. As Andrew said, it is only an implementation choice (subject to revision) that they are implemented as inline functions at all.

For example, if an older GCC had a similar bug with Altivec intrinsics, it would have shown up only in C++ (because Altivec intrinsics were never implemented as inlines in C) and would not show up anymore in GCC 4.1 except for a handful of intrinsics (because most Altivec intrinsics are not inlines at all anymore).

memset/memcpy is different from SSE builtins because the choice of whether to inline or not is target dependent, and because glibc also decides whether or not to provide its own inlining, depending on the GCC version you're using. So the best way to report the problem is to file a *preprocessed* testcase into Bugzilla (i.e. the output of "gcc -E testcase.c > testcase.i" or equivalently "gcc -save-temps testcase.c"), and to include the output of "gcc -v testcase.c -O2" in the bug report. Using preprocessed source code at least makes sure that the glibc choices are not influencing the comparison between 3.4.x and 4.0.x. This information is present in the "how to file a bug" chapter of the manual.

Your case seems to be different, because it involves inlining user routines. Again, you need to give us the preprocessed source code for us to look at your bug effectively.

Paolo
Re: [PATCH] Add new target-hook truncated_to_mode
>> bool
>> truncated_to_mode (enum machine_mode mode, rtx x)
>> {
>>   if (REG_P (x) && rtl_hooks.reg_truncated_to_mode (mode, x))
>>     return true;
>>
>>   gcc_assert (!TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (mode),
>>                                       GET_MODE_BITSIZE (GET_MODE (x))));
>>   return num_sign_bit_copies (x, GET_MODE (x)) >
>>     GET_MODE_BITSIZE (GET_MODE (x)) - GET_MODE_BITSIZE (mode);
>> }
>>
>> In the MIPS case, you would have n_s_b_c (x, GET_MODE (x)) > 64 - 32.
>
> This wouldn't work for DI->HI truncation for example. There too only
> the upper 33 bits have to match for the TRUNCATE to be unnecessary.
> See comment around truncsdi in mips.md.

If this is so, SImode should be passed to reg_truncated_to_mode as well, instead of HImode, shouldn't it? What about this logic:

  int n = num_sign_bit_copies (x, GET_MODE (x));
  int dest_bits;
  enum machine_mode next_mode = mode;
  do
    {
      mode = next_mode;
      dest_bits = GET_MODE_BITSIZE (mode);

      /* If it is a no-op to truncate to MODE from a wider mode (e.g. to
         HI from SI on MIPS), we can check a weaker condition.  */
      next_mode = GET_MODE_WIDER_MODE (mode);
    }
  while (next_mode != VOIDmode
         && TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (next_mode), dest_bits));

  return (REG_P (x) && rtl_hooks.reg_truncated_to_mode (mode, x))
         || n > GET_MODE_BITSIZE (GET_MODE (x)) - dest_bits;

On MIPS, we would not test HImode but SImode since TRULY_NOOP_TRUNCATION (32, 16) == true. To me, this is a clue that the TRULY_NOOP_TRUNCATION macro is insufficient and could be replaced by another one. For example (for MIPS -- SHmedia is the same with s/MIPS64/SHMEDIA/):

  /* Return the mode to which we should truncate an INMODE value before
     operating on it in OUTMODE.  For example, on MIPS we should truncate
     a 64-bit value to 32-bits when operating on it in SImode or a
     narrower mode.

     We return INMODE if no such truncation is necessary and we can just
     pretend that the value is already truncated.  */
  #define WIDEST_NECESSARY_TRUNCATION(outmode, inmode) \
    (TARGET_MIPS64 \
     && GET_MODE_BITSIZE (inmode) <= 32 \
     && GET_MODE_BITSIZE (outmode) > 32 ? SImode : inmode)

Since all uses of TRULY_NOOP_TRUNCATION (except one in convert.c which could be changed to use TYPE_MODE) are of the form TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (x), GET_MODE_BITSIZE (y)), you could change them to

  WIDEST_NECESSARY_TRUNCATION (x, y) != y

We could also take the opportunity to remove all the defines of TRULY_NOOP_TRUNCATION to 1, and put a default definition in defaults.h!

You can then proceed to implement truncated_to_mode as

  mode = WIDEST_NECESSARY_TRUNCATION (mode, GET_MODE (x));
  gcc_assert (mode != GET_MODE (x));
  return (REG_P (x) && rtl_hooks.reg_truncated_to_mode (mode, x))
         || num_sign_bit_copies (x, GET_MODE (x)) >
            GET_MODE_BITSIZE (GET_MODE (x)) - GET_MODE_BITSIZE (mode);

What do you think?

Paolo
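To make the arithmetic in the sign-bit-copies condition concrete, here is a small standalone C sketch. It is plain C, not GCC internals: it assumes a 64-bit register, and `sign_bit_copies64` and `truncation_is_redundant` are hypothetical names that only mimic what num_sign_bit_copies and the comparison above would compute for a DImode value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for num_sign_bit_copies (x, DImode): the number
   of leading bits of V that are equal to its sign bit, counting the
   sign bit itself.  The result is between 1 and 64.  */
int
sign_bit_copies64 (int64_t v)
{
  uint64_t u = (uint64_t) v;
  uint64_t sign = u >> 63;
  int n = 1;

  while (n < 64 && ((u >> (63 - n)) & 1) == sign)
    n++;
  return n;
}

/* Sketch of the test discussed above: a 64-bit value needs no explicit
   TRUNCATE to DEST_BITS if more than 64 - DEST_BITS of its upper bits
   are copies of the sign bit, i.e. it is already sign-extended from
   DEST_BITS.  */
int
truncation_is_redundant (int64_t v, int dest_bits)
{
  assert (dest_bits > 0 && dest_bits < 64);
  return sign_bit_copies64 (v) > 64 - dest_bits;
}
```

Under these assumptions 0x7fffffff passes the 64-to-32 test (33 matching upper bits), while 0x80000000 does not (only 32). The value 0x8000 fails the strict 64-to-16 test but passes the 64-to-32 one, which is the weaker condition the DI->HI remark asks for.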
bootstrap broken on trunk for combined source tree
bootstrap compiler gcc-4.1.0
binutils-2.16.1
build system: i686-pc-linux-gnu

Configuring stage 2 in ./libiberty
configure: creating cache ./config.cache
checking whether to enable maintainer-specific portions of Makefiles... no
checking for makeinfo... makeinfo --split-size=500 --split-size=500
checking for perl... perl
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
checking for i686-pc-linux-gnu-ar... ar
checking for i686-pc-linux-gnu-ranlib... ranlib
checking for i686-pc-linux-gnu-gcc... /SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.2/gcc-4.2/./prev-gcc/xgcc -B/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.2/gcc-4.2/./prev-gcc/ -B/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/install/i686-pc-linux-gnu/bin/
checking for suffix of object files... configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details.
gmake[2]: *** [configure-stage2-libiberty] Error 1
gmake[2]: Leaving directory `/disk1/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.2/gcc-4.2'
gmake[1]: *** [stage2-bubble] Error 2
gmake[1]: Leaving directory `/disk1/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.2/gcc-4.2'
gmake: *** [all] Error 2

config.log in libiberty contains:

configure:2272: /SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.2/gcc-4.2/./prev-gcc/xgcc -B/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.2/gcc-4.2/./prev-gcc/ -B/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/install/i686-pc-linux-gnu/bin/ -c -g -O2 conftest.c >&5
lt-as-new: error while loading shared libraries: libbfd-2.16.1.so: cannot open shared object file: No such file or directory

It looks like we have a wrong LD_LIBRARY_PATH setting. Any thoughts?

Rainer

-- 
Rainer Emrich
TECOSIM GmbH
Im Eichsfeld 3
65428 Rüsselsheim
Phone: +49(0)6142/8272 12
Mobile: +49(0)163/56 949 20
Fax.: +49(0)6142/8272 49
Web: www.tecosim.com
Re: bootstrap broken on trunk for combined source tree
> config.log in libiberty contains:
>
> configure:2272: /SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.2/gcc-4.2/./prev-gcc/xgcc -B/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.2/gcc-4.2/./prev-gcc/ -B/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/install/i686-pc-linux-gnu/bin/ -c -g -O2 conftest.c >&5
> lt-as-new: error while loading shared libraries: libbfd-2.16.1.so: cannot open shared object file: No such file or directory
>
> It looks like we have a wrong LD_LIBRARY_PATH setting.

It should work; I surely tested it before enabling toplevel bootstrap. The toplevel configure also has

HOST_LIB_PATH_bfd = \
  $$r/$(HOST_SUBDIR)/bfd/.:$$r/$(HOST_SUBDIR)/prev-bfd/.:

Could you try sticking an "echo $LD_LIBRARY_PATH" in the libiberty configure script?

Paolo
Re: [PATCH] Add new target-hook truncated_to_mode
Paolo Bonzini <[EMAIL PROTECTED]> writes:
>>> bool
>>> truncated_to_mode (enum machine_mode mode, rtx x)
>>> {
>>>   if (REG_P (x) && rtl_hooks.reg_truncated_to_mode (mode, x))
>>>     return true;
>>>
>>>   gcc_assert (!TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (mode),
>>>                                       GET_MODE_BITSIZE (GET_MODE (x))));
>>>   return num_sign_bit_copies (x, GET_MODE (x)) >
>>>     GET_MODE_BITSIZE (GET_MODE (x)) - GET_MODE_BITSIZE (mode);
>>> }
>>>
>>> In the MIPS case, you would have n_s_b_c (x, GET_MODE (x)) > 64 - 32.
>>
>> This wouldn't work for DI->HI truncation for example. There too only
>> the upper 33 bits have to match for the TRUNCATE to be unnecessary.
>> See comment around truncsdi in mips.md.
>
> If this is so, SImode should be passed to reg_truncated_to_mode as well,
> instead of HImode, shouldn't it? What about this logic:
>
>   int n = num_sign_bit_copies (x, GET_MODE (x));
>   int dest_bits;
>   enum machine_mode next_mode = mode;
>   do
>     {
>       mode = next_mode;
>       dest_bits = GET_MODE_BITSIZE (mode);
>
>       /* If it is a no-op to truncate to MODE from a wider mode (e.g. to
>          HI from SI on MIPS), we can check a weaker condition.  */
>       next_mode = GET_MODE_WIDER_MODE (mode);
>     }
>   while (next_mode != VOIDmode
>          && TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (next_mode), dest_bits));
>
>   return (REG_P (x) && rtl_hooks.reg_truncated_to_mode (mode, x))
>          || n > GET_MODE_BITSIZE (GET_MODE (x)) - dest_bits;

It looks like you're introducing a new assumption here: that we can ignore TRULY_NOOP_TRUNCATION (X, Y) if the upper X-Y bits are all filled with sign bits. I realise that's true for both SH and MIPS, but the current documentation of TRULY_NOOP_TRUNCATION doesn't guarantee it. For example, I could imagine some future port wanting to preserve zero extension instead of sign extension. That still fits TRULY_NOOP_TRUNCATION as currently defined, but the code above would then be wrong. And...

> On MIPS, we would not test HImode but SImode since TRULY_NOOP_TRUNCATION
> (32, 16) == true. To me, this is a clue that the TRULY_NOOP_TRUNCATION
> macro is insufficient and could be replaced by another one. For example
> (for MIPS -- SHmedia is the same with s/MIPS64/SHMEDIA/):
>
>   /* Return the mode to which we should truncate an INMODE value before
>      operating on it in OUTMODE.  For example, on MIPS we should truncate
>      a 64-bit value to 32-bits when operating on it in SImode or a
>      narrower mode.
>
>      We return INMODE if no such truncation is necessary and we can just
>      pretend that the value is already truncated.  */
>   #define WIDEST_NECESSARY_TRUNCATION(outmode, inmode) \
>     (TARGET_MIPS64 \
>      && GET_MODE_BITSIZE (inmode) <= 32 \
>      && GET_MODE_BITSIZE (outmode) > 32 ? SImode : inmode)
>
> Since all uses of TRULY_NOOP_TRUNCATION (except one in convert.c which
> could be changed to use TYPE_MODE) are of the form TRULY_NOOP_TRUNCATION
> (GET_MODE_BITSIZE (x), GET_MODE_BITSIZE (y)), you could change them to
>
>   WIDEST_NECESSARY_TRUNCATION (x, y) != y
>
> We could also take the occasion to remove all the defines of
> TRULY_NOOP_TRUNCATION to 1, and put a default definition in defaults.h!
>
> You can then proceed to implement truncated_to_mode as
>
>   mode = WIDEST_NECESSARY_TRUNCATION (mode, GET_MODE (x));
>   gcc_assert (mode != GET_MODE (x));
>   return (REG_P (x) && rtl_hooks.reg_truncated_to_mode (mode, x))
>          || num_sign_bit_copies (x, GET_MODE (x)) >
>             GET_MODE_BITSIZE (GET_MODE (x)) - GET_MODE_BITSIZE (mode);
>
> What do you think?

...I think the same applies to this macro too. That's one reason why I prefer the alternative hook that I described: it makes the sign extension explicit. The other reason is that it would allow the middle-end to remove redundant sign extensions.

(Note that WIDEST_NECESSARY_TRUNCATION(X, Y) == Z does _not_ imply that sign-extension of a Z-bit value to X bits comes for free. On MIPS, it isn't true for X==128, just X==64.)

Richard
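The concern about a port that keeps narrow values zero-extended rather than sign-extended can be illustrated with a standalone C sketch. Nothing here is GCC code: `is_sign_extended` and `is_zero_extended` are hypothetical helpers that model a 64-bit register and check the two possible invariants for a value that is supposed to be BITS wide.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if the 64-bit register value V is the sign extension of its
   low BITS bits, i.e. bits 63 .. BITS-1 are all equal.  This is the
   invariant a sign-bit-copies test checks.  */
int
is_sign_extended (uint64_t v, int bits)
{
  uint64_t upper;

  assert (bits > 0 && bits < 64);
  upper = v >> (bits - 1);
  return upper == 0 || upper == (UINT64_MAX >> (bits - 1));
}

/* Nonzero if V is the zero extension of its low BITS bits, i.e. bits
   63 .. BITS are all clear.  A hypothetical zero-extending port would
   need this invariant instead.  */
int
is_zero_extended (uint64_t v, int bits)
{
  assert (bits > 0 && bits < 64);
  return (v >> bits) == 0;
}
```

For example, 0xFFFFFFFF80000000 satisfies the sign-extension check for 32 bits but not the zero-extension one, and 0x0000000080000000 is the other way round, so a test written purely in terms of sign-bit copies would give the wrong answer on a zero-extending target.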
Re: bootstrap broken on trunk for combined source tree
Paolo Bonzini wrote:
> It should work; I surely tested it before enabling toplevel bootstrap.
> The toplevel configure also has
>
> HOST_LIB_PATH_bfd = \
>   $$r/$(HOST_SUBDIR)/bfd/.:$$r/$(HOST_SUBDIR)/prev-bfd/.:
>
> Could you try sticking an "echo $LD_LIBRARY_PATH" in the libiberty
> configure script?
>
> Paolo

You're right, the LD_LIBRARY_PATH includes ./bfd/. and ./prev-bfd/., but the shared library is in ./prev-bfd/.libs !!!

./prev-bfd/.libs/libbfd-2.16.1.so

Rainer
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
> Wait wait. PR/21195 is about inlining
> the SSE builtins.

No. PR/21195 was really about inline heuristic going ballistic. Those intrinsics are thin wrappers around builtins, and ultimately resolve to a couple of operations. Typical C++ (accessors/ctors) also presents lots of such small functions. And guess what, same cause same symptom.

There's no sensible metric by which code i've quoted in previous mail makes sense. Size? Nope. Execution time? Certainly not. Again whether or not SSE ops are involved was and is still irrelevant.

> Your case seems to be different, because it involves inlining user
> routines. Again, you need to give us the preprocessed source code for
> us to look at your bug effectively.

Thanks for the tip, but i'll pass. I've done my duty already. Months ago there were 2 options for fixing PR/21195:
a) Fix the inlining heuristic.
b) Kludge all intrinsics with always_inline.
I've tried to argue a bit but to no avail.

So, while you remain convinced everything's fine with the inliner, i'll keep tagging every function in my code with always_inline/noinline where performance matters.
Re: bootstrap broken on trunk for combined source tree
Paolo Bonzini wrote:
> It should work; I surely tested it before enabling toplevel bootstrap.
> The toplevel configure also has
>
> HOST_LIB_PATH_bfd = \
>   $$r/$(HOST_SUBDIR)/bfd/.:$$r/$(HOST_SUBDIR)/prev-bfd/.:
>
> Could you try sticking an "echo $LD_LIBRARY_PATH" in the libiberty
> configure script?
>
> Paolo

And the same is true for prev-opcodes:

prev-opcodes/.libs/libopcodes-2.16.1.so

Rainer
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote:
> On 3/13/06, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
> > Wait wait. PR/21195 is about inlining
> > the SSE builtins.
> No. PR/21195 was really about inline heuristic going ballistic.
> Those intrinsics are thin wrappers around builtins, and ultimately
> resolve to a couple of operations. Typical C++ (accessors/ctors) also
> presents lots of such small functions.
> And guess what, same cause same symptom.

Starting with gcc 4.1.0 we have inline heuristics in place that will _always_ inline such simple "wrappers". So, if this still happens, there is a bug in the heuristics and that should be reported. Before 4.1.0 the heuristics were bogus and wrappers were not inlined all the time.

So, can you verify you are happy with the heuristics in 4.1.0? (I am not talking about inlining of memcpy/memset, which is not really function inlining, but about the SSE/altivec inline function implementations.)

Richard.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> Starting with gcc 4.1.0 we have inline heuristics in place that will _always_
> inline such simple "wrappers". So, if this still happens, there is a bug in the
> heuristics and that should be reported. Before 4.1.0 the heuristics were bogus
> and wrappers were not inlined all the time.
> So, can you verify you are happy with the heuristics in 4.1.0

No i'm not, and i've used a pristine 4.1.0 in
http://gcc.gnu.org/ml/gcc/2006-03/msg00410.html

I haven't tried that particular testcase on 4.2.x, but some weeks ago i had to go through all my code again to put always_inline in some forgotten places because i was seeing even empty ctors not being inlined (to the effect of having a call to a ret). So in this regard, 4.1.0 & 4.2.x still exhibit that kind of behaviour.

It seems to trigger when some particular threshold is met, either for a function or unit; then nothing at all gets inlined but functions tagged with always_inline, and of course a major performance regression ensues.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote:
> On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> > Starting with gcc 4.1.0 we have inline heuristics in place that will _always_
> > inline such simple "wrappers". So, if this still happens, there is a bug in the
> > heuristics and that should be reported. Before 4.1.0 the heuristics were bogus
> > and wrappers were not inlined all the time.
> > So, can you verify you are happy with the heuristics in 4.1.0
> No i'm not, and i've used a pristine 4.1.0 in
> http://gcc.gnu.org/ml/gcc/2006-03/msg00410.html

For the testcase in this message, with the always_inline removed, I get all wrappers inlined into bloatit. Of course bloatit does not get inlined w/o always_inline - it's a huge function and not a simple wrapper. With always_inline on it, the wrappers are no longer inlined - this is a bug and should be reported. Of course from 4.1.0 on you can more easily stick an __attribute__((flatten)) on the function you want everything inlined into (finalblow) and get everything inlined into it.

Can you file a Bugzilla report for the bad interaction between always_inline and inlining of simple wrappers?

Thanks,
Richard.
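For readers who have not used the two attributes mentioned here, the following is a minimal sketch of how they are spelled in C. It is not the testcase from the linked message: `square`, `add` and `sum_of_squares` are made-up names, and whether a given call is actually inlined still depends on the GCC version and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* A trivial wrapper of the kind discussed in this thread.  always_inline
   asks GCC to inline it regardless of the inline heuristics.  */
static inline __attribute__ ((always_inline)) int
square (int x)
{
  return x * x;
}

/* A second small helper, left to the normal inline heuristics.  */
static inline int
add (int a, int b)
{
  return a + b;
}

/* flatten (GCC 4.1 and later) asks GCC to inline, where possible, every
   call made from within this function into it.  */
__attribute__ ((flatten)) int
sum_of_squares (const int *v, size_t n)
{
  int sum = 0;
  size_t i;

  assert (v != NULL || n == 0);
  for (i = 0; i < n; i++)
    sum = add (sum, square (v[i]));
  return sum;
}
```

Compiling with "gcc -O2 -S" and inspecting the assembly for calls to `square` or `add` is one way to see what the inliner actually did for a particular compiler version.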
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On 3/13/06, tbp <[EMAIL PROTECTED]> wrote:
> > On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> > > Starting with gcc 4.1.0 we have inline heuristics in place that will _always_
> > > inline such simple "wrappers". So, if this still happens, there is a bug in the
> > > heuristics and that should be reported. Before 4.1.0 the heuristics were bogus
> > > and wrappers were not inlined all the time.
> > > So, can you verify you are happy with the heuristics in 4.1.0
> > No i'm not, and i've used a pristine 4.1.0 in
> > http://gcc.gnu.org/ml/gcc/2006-03/msg00410.html
>
> For the testcase in this message, with the always_inline removed, I get
> all wrappers inlined into bloatit. Of course bloatit does not get inlined
> w/o always_inline - it's a huge function and not a simple wrapper. With
> always_inline on it, the wrappers are no longer inlined - this is a bug and
> should be reported. Of course from 4.1.0 on you can more easily stick an
> __attribute__((flatten)) on the function you want everything inlined into
> (finalblow) and get everything inlined into it.

I see the bug and will have a fix in a moment.

Richard.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> Of course from 4.1.0 on you can more easily stick an
> __attribute__((flatten)) on the function you want everything inlined into
> (finalblow) and get everything inlined into it.

But that's not really what i'm after: i expect trivial functions to get inlined no matter what at a given -Ox.

> With always_inline on it, the wrappers are no longer inlined - this is a bug and
> should be reported.
> Can you file a Bugzilla report for the bad interaction between always_inline
> and inlining of simple wrappers?

I will report it again then.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> I see the bug and will have a fix in a moment.

You made my day. Or you're about to. Unless you're lying and i'll have to curse you for 7 generations.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote:
> On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> > I see the bug and will have a fix in a moment.
> You made my day. Or you're about to. Unless you're lying and i'll have
> to curse you for 7 generations.

http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00739.html

;)
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00739.html

/me ventilates.
You're my hero.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote:
> On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> > http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00739.html
> /me ventilates.
> You're my hero.

A double+ hero on top of that.
http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00737.html
I think i've hit that one too; reported here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26650
Well, i can always dream.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote:
> On 3/13/06, tbp <[EMAIL PROTECTED]> wrote:
> > On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> > > http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00739.html
> > /me ventilates.
> > You're my hero.
> A double+ hero on top of that.
> http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00737.html
> I think i've hit that one too; reported here:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26650

I don't think this is related, and a quick check with the patch still shows unaligned moves to the stack.

Richard.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> I don't think this is related, and a quick check with the patch still shows
> unaligned moves to the stack.

Patience is a virtue i guess :)
Is there a good chance your inlining fix will hit mainline soon?
[M32C-ELF] : Improper follow-up of bss section
Hi,

I have downloaded latest GCC and Binutils sources from FSF for M32C port. Using these sources, I could successfully build the cross toolchain i.e. m32c-elf-*. I have observed the following behavior while building an application.

Case 1 - Initialized global variables are not present in the application (data section is empty).

If I specify the locations of data section and bss section in the linker script in the following manner and build the application,

.data 0x0400 :
{
  _data = .;
  *(.data)
  *(.data.*)
  _edata = .;
}
.bss :
{
  _bss = .;
  *(.bss)
  *(COMMON)
  _ebss = .;
  _end = .;
}

the bss section is located at the location 0x00 instead of 0x000400. The value of the variable "_bss" is 0x00. The value of the variables "_ebss" and "_end" is 0x00. This can be verified from the map file. In this case the location counter is not incremented properly. In case of H8 and SH tool chains, the bss section follows the data section correctly.

Case 2 - One initialized global variable is present in the application (E.g. int i = 1;).

If I build the application with the above mentioned linker script, the bss section is located at 0x000402. The value of the variable "_bss" is 0x000402. The value of the variables "_ebss" and "_end" is 0x000402. This can be verified from the map file. Thus, for the proper follow-up of the bss section (i.e. to increment the location counter correctly), the data section should not be empty. Is this behavior expected?

Case 3 - No initialized global variable is present in the application (data section is empty) but following linker script is used,

MEMORY
{
  ram (rw) : o = 0x400, l = 31k
  rom (rx) : o = 0x000E000, l = 256k
}
.data 0x0400 :
{
  _data = .;
  *(.data)
  *(.data.*)
  _edata = .;
} > ram
.bss :
{
  _bss = .;
  *(.bss)
  *(COMMON)
  _ebss = .;
  _end = .;
} > ram

In this case the bss section follows the data section correctly i.e. bss section is located at address 0x000400 and not 0x00 (as in case 1). In this case the location counter is incremented correctly.

The above behavior is observed for all m32c targets, i.e. r8c, m16c, m32c and m32cm. Is this behavior expected?

Linker script similar to the script specified in case 1 works properly with H8 and SH tool chains (modified according to their memory maps). Why does it not work with M32C tool chain? Do I need to use the "MEMORY" command in the linker script as in case 3?

Thanks in advance.

Regards,
Ina Pandit
KPIT Cummins InfoSystems Ltd.
Pune, India
Re: [M32C-ELF] : Improper follow-up of bss section
This is all binutils specific, nothing to do with gcc as such. Please re-post your queries to the binutils list. Andrew.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
Is there a bugzilla entry describing the bug Richard is fixing? If not, it'd be nice to have, if for no other reason than it would show up naturally when people look for bugs fixed in gcc-4.1.1.

I can create one, but it'd be better if someone actually involved in the action did.
- Dan

-- 
Wine for Windows ISVs: http://kegel.com/wine/isv
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Dan Kegel <[EMAIL PROTECTED]> wrote:
> Is there a bugzilla entry describing the bug Richard is fixing?
> If not, it'd be nice to have, if for no other reason than
> it would show up naturally when people look for bugs fixed in gcc-4.1.1.
>
> I can create one, but it'd be better if someone actually
> involved in the action did.

I can do it.

Richard.
Re: 100x performance regression between gcc 3.4.5 and gcc 4.X
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On 3/13/06, Dan Kegel <[EMAIL PROTECTED]> wrote:
> > Is there a bugzilla entry describing the bug Richard is fixing?
> > If not, it'd be nice to have, if for no other reason than
> > it would show up naturally when people look for bugs fixed in gcc-4.1.1.
> >
> > I can create one, but it'd be better if someone actually
> > involved in the action did.
>
> I can do it.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26667

Richard.
emit-rtl.c: 5048 assert
All-

I'm having problems with an assert on line 5048 of emit-rtl.c:

  gcc_assert (i < MAX_RECOG_OPERANDS);

The assert is in the copy_insn_1() function and fires when the number of copied scratch registers exceeds MAX_RECOG_OPERANDS. For my particular machine (IA-64) this number is 30. This happens when I make a call to duplicate_block() in my code.

I've just started the debugging process and was wondering if there is a simple way (besides recursing through the expression) to check the number of scratch registers used by an instruction?

Thanks,
Chad
Re: Problem with pex-win32.c
Here is a sample program which does the right thing (no spurious console windows, all output visible) when run either from a console or from a console-free environment, such as a Cygwin xterm. This is the code we'll be working into libiberty -- unless someone has a better solution!

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713

#include <windows.h>
#include <stdio.h>
#include <string.h>

int
main()
{
  HANDLE stdin_handle;
  HANDLE stdout_handle;
  HANDLE stderr_handle;
  DWORD dwCreationFlags;
  OSVERSIONINFO version_info;
  STARTUPINFO si;
  PROCESS_INFORMATION pi;

  /* Replace these with handles for pipes, etc.  */
  stdin_handle = GetStdHandle (STD_INPUT_HANDLE);
  stdout_handle = GetStdHandle (STD_OUTPUT_HANDLE);
  stderr_handle = GetStdHandle (STD_ERROR_HANDLE);

  version_info.dwOSVersionInfoSize = sizeof (version_info);
  GetVersionEx (&version_info);
  if (version_info.dwPlatformId == VER_PLATFORM_WIN32_WINDOWS)
    /* On Windows 95/98/ME the CREATE_NO_WINDOW flag is not
       supported, so we cannot avoid creating a console window.  */
    dwCreationFlags = 0;
  else
    {
      HANDLE conout_handle;

      /* Determine whether or not we have an associated console.  */
      conout_handle = CreateFile("CONOUT$",
                                 GENERIC_WRITE,
                                 FILE_SHARE_WRITE,
                                 /*lpSecurityAttributes=*/NULL,
                                 OPEN_EXISTING,
                                 FILE_ATTRIBUTE_NORMAL,
                                 /*hTemplateFile=*/NULL);
      if (conout_handle == INVALID_HANDLE_VALUE)
        /* There is no console associated with this process.  Since
           the child is a console process, the OS would normally
           create a new console Window for the child.  Since we'll be
           redirecting the child's standard streams, we do not need
           the console window.  */
        dwCreationFlags = CREATE_NO_WINDOW;
      else
        {
          /* There is a console associated with the process, so the OS
             will not create a new console.  And, if we use
             CREATE_NO_WINDOW in this situation, the child will have
             no associated console.  Therefore, if the child's
             standard streams are connected to the console, the output
             will be discarded.  */
          CloseHandle(conout_handle);
          dwCreationFlags = 0;
        }
    }

  /* Since the child will be a console process, it will, by default,
     connect standard input/output to its console.  However, we want
     the child to use the handles specifically designated above.  In
     addition, if there is no console (such as when we are running in
     a Cygwin X window), then we must redirect the child's
     input/output, as there is no console for the child to use.  */
  memset (&si, 0, sizeof (si));
  si.cb = sizeof (si);
  si.dwFlags = STARTF_USESTDHANDLES;
  si.hStdInput = stdin_handle;
  si.hStdOutput = stdout_handle;
  si.hStdError = stderr_handle;

  fprintf (stderr, "About to invoke child.\n");
  fflush (stderr);

  /* Start the child.  */
  CreateProcess ("child.exe",
                 "child.exe",
                 NULL,
                 NULL,
                 /*bInheritHandles=*/TRUE,
                 dwCreationFlags,
                 /*lpEnvironment=*/NULL,
                 /*lpCurrentDirectory=*/NULL,
                 &si,
                 &pi);
  WaitForSingleObject (pi.hProcess, INFINITE);
  CloseHandle (pi.hProcess);
  CloseHandle (pi.hThread);

  fprintf (stderr, "Child done.\n");
  fflush (stderr);
  return 0;
}
Re: GCC Port (gcc backend) for Microchip PICMicro microcontroller
On Mar 13, 2006, at 5:29 AM, Colm O' Flaherty wrote:
> I've been thinking a bit more about this (no code yet: I was busy
> trying to find and fix a bug in gpsim), and I'm still not sure what
> the optimal development mode is.. by this, I mean.. "what should the
> proposed PIC port of GCC produce"?

If 100% of the ports produce assembly files, then, you'll want to produce assembly files. 100% of the ports produce assembly.

> There are pros and cons to both approaches. Producing a hex file is
> (a lot?) more work, and would duplicate the work of gputils, but
> would leave gcc as a standalone tool, which I presume is desirable!

Nope.

> The issue here is that gcc would then become "bound" to gputils,

Not a problem, though, we'd prefer that you did up a binutils port as well. The reason is that those utilities have a certain feature set that other tools don't have, and that feature set is used and is useful to the compiler and users. Also, it is possible to do up a port first to gputils and then later to enhance it to target binutils, while retaining the ability to still target gputils, if people find that interesting.

> The real issue here, for me, is the level of duplication / overlap
> with the SDCC project.

Don't worry, they can come join us and stop duplicating our work after you get a port going.
Question about use of C++ copy constructor
When I compile the attached test case with mainline, I get this:

foo.cc: In function ‘void foo(const B&)’:
foo.cc:3: error: ‘B::B(const B&)’ is private
foo.cc:13: error: within this context

I don't understand why, as I don't see the copy constructor being used anywhere. It seems to me this code should create a temporary for the duration of the statement, and pass the temporary as a reference to b.fn. This code compiles with icc with no errors.

What is wrong with this code?

Thanks.

Ian

class B {
private:
  B(const B&);
  void operator=(const B&);

public:
  B();
  void fn(const B &other) const;
};

void foo (const B& b)
{
  b.fn(B());
}
Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X
On Mar 13, 2006, at 12:16 AM, Paolo Bonzini wrote:
> PR/21195 is about inlining the SSE builtins. These are special
> because, for example, you probably would prefer GDB to not step into
> them, but just execute them.

:-) We have an APPLE LOCAL patch to remove the debug information associated with them so that the debugger never steps `into' them. :-( attr (__nodebug)
Re: Problem with pex-win32.c
Mark Mitchell wrote at http://gcc.gnu.org/ml/gcc/2006-03/msg00441.html
> Here is a sample program which does the right thing (no spurious console
> windows, all output visible) when run either from a console or from a
> console-free environment, such as a Cygwin xterm. This is the code
> we'll be working into libiberty -- unless someone has a better solution!

In my experience, the following test is not necessary. Win9x just ignores the CREATE_NO_WINDOW flag, so setting it is a harmless no-op on these platforms.

> version_info.dwOSVersionInfoSize = sizeof (version_info);
> GetVersionEx (&version_info);
> if (version_info.dwPlatformId == VER_PLATFORM_WIN32_WINDOWS)
>   /* On Windows 95/98/ME the CREATE_NO_WINDOW flag is not
>      supported, so we cannot avoid creating a console window. */
>   dwCreationFlags = 0;

See also http://gcc.gnu.org/ml/java-patches/2003-q4/msg00260.html

Danny
Re: Problem with pex-win32.c
Danny Smith wrote:
> In my experience, the following test is not necessary. Win9x just ignores
> the CREATE_NO_WINDOW flag, so setting it is a harmless no-op on these
> platforms.

It's OK with me not to do it; I just didn't have those platforms to use for testing, and it seems more pedantically correct to check the version. But I'm sure not going to argue for keeping that block of code in there if that stands in the way of making progress!

> See also
>
> http://gcc.gnu.org/ml/java-patches/2003-q4/msg00260.html

Lovely, we're all reinventing the same wheels. All the more reason to get this into libiberty... :-)

Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
Re: Question about use of C++ copy constructor
Hi,

Didn't see a reply yet, so I'll chime in. The relevant text appears in gcc-3.4's release notes:

"When binding an rvalue of class type to a reference, the copy constructor of the class must be accessible."

PR 12226 seems to be the mother bug related to this (many dupes).

Fang

> foo.cc: In function ‘void foo(const B&)’:
> foo.cc:3: error: ‘B::B(const B&)’ is private
> foo.cc:13: error: within this context
>
> I don't understand why, as I don't see the copy constructor being used
> anywhere. It seems to me this code should create a temporary for the
> duration of the statement, and pass the temporary as a reference to
> b.fn. This code compiles with icc with no errors.
>
> What is wrong with this code?
>
> class B {
> private:
>   B(const B&);
>   void operator=(const B&);
>
> public:
>   B();
>   void fn(const B &other) const;
> };
>
> void foo (const B& b)
> {
>   b.fn(B());
> }
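To make the rule quoted above concrete, here is a minimal sketch (not taken from the thread) of one way to sidestep the diagnostic while keeping the copy constructor private: pass a named object instead of a temporary. A named object is an lvalue, so the rvalue-binding rule from the release notes does not come into play. The `calls` counter, the inline definitions of `B()` and `fn`, and the `run_once` helper are illustrative additions and are not part of the original test case.

```cpp
// Same shape as the class in the question: copying is disabled by
// declaring the copy operations private and never defining them.
class B {
private:
    B(const B&);
    void operator=(const B&);

public:
    B() {}
    // Counts calls so the effect is observable (hypothetical addition).
    void fn(const B& other) const { (void)other; ++calls; }
    static int calls;
};

int B::calls = 0;

void foo(const B& b)
{
    // b.fn(B());  // the call from the question: binds an rvalue of
    //             // class type to a reference, which is what the
    //             // release-note rule is about
    B tmp;         // named object: an lvalue, no copy constructor involved
    b.fn(tmp);
}

// Hypothetical helper: performs one call and reports the counter.
int run_once()
{
    B b;
    foo(b);
    return B::calls;
}
```

Calling `run_once()` once from a fresh program should leave the counter at 1. Whether the extra named object is acceptable depends on the code; it is only one possible workaround, assuming the temporary has no other purpose.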
Re: -fmudflap and -fmudflapth
"Rafael Espíndola" <[EMAIL PROTECTED]> writes: > Use `-fmudflapth' instead of `-fmudflap' to compile and to link if > your program is multi-threaded. [...but...] > gate_mudflap (void) { return flag_mudflap != 0 } Maybe something broke this, but -fmudflapth used to imply setting both flag_mudflap and flag_mudflap_threads. - FChE
Re: Question about use of C++ copy constructor
Also see CWG issue 391:

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#391

which will make our behavior non-conforming in C++0X.

-Howard

On Mar 13, 2006, at 4:02 PM, David Fang wrote:
> Hi,
>
> Didn't see a reply yet, so I'll chime in. The relevant text appears in
> gcc-3.4's release notes:
>
> "When binding an rvalue of class type to a reference, the copy
> constructor of the class must be accessible."
>
> PR 12226 seems to be the mother bug related to this (many dupes).
>
> Fang
>
>> foo.cc: In function ‘void foo(const B&)’:
>> foo.cc:3: error: ‘B::B(const B&)’ is private
>> foo.cc:13: error: within this context
>>
>> I don't understand why, as I don't see the copy constructor being used
>> anywhere. It seems to me this code should create a temporary for the
>> duration of the statement, and pass the temporary as a reference to
>> b.fn. This code compiles with icc with no errors.
>>
>> What is wrong with this code?
>>
>> class B {
>> private:
>>   B(const B&);
>>   void operator=(const B&);
>>
>> public:
>>   B();
>>   void fn(const B &other) const;
>> };
>>
>> void foo (const B& b)
>> {
>>   b.fn(B());
>> }
Re: gcc 4.1
The appropriate place for such stuff is gcc@gcc.gnu.org

On Monday, 13.03.06 at 17:19, Helge Hess wrote:
> Hi,
>
> new gcc release, new warnings ;-) Am I the only one who gets those:
>   DOMElement.m:283: warning: pointer type mismatch in conditional expression
> For stuff like:
>   objs[1] = _ns ? _ns : (id)null;
> or
>   return [pathes isNotNull] ? pathes : nil;
> Bug to be reported or a feature?
>
> With libFoundation I get similar things for constant NSString's, like:
>   myString ? myString : @""
> (apparently not so with gstep-base)
>
> Greets,
>   Helge
> --
> http://docs.opengroupware.org/Members/helge/
> OpenGroupware.org
Re: Ada subtypes and base types
On Mon, 2006-02-27 at 20:08 +0100, Waldek Hebisch wrote:
> What do you mean by "abuse"? TYPE_MAX_VALUE means maximal value
> allowed by given type.

As long as you're *absolutely* clear that a variable with a restricted range can never hold a value outside that restricted range in a conforming program, then I'll back off the "abuse" label and merely call it pointless :-)

The scheme you're using, "promoting" to a base type before all arithmetic, creates lots of useless type conversions and means that the optimizers can't utilize TYPE_MIN_VALUE/TYPE_MAX_VALUE anyway. I.e., you don't gain anything over keeping that knowledge in the front-end.

jeff
Re: scripting interface to GCC ?
> "Mike" == Mike Mattie <[EMAIL PROTECTED]> writes: Mike> Has anyone ever tried to build a scripting interface into the guts of Mike> GCC with something like SWIG ? I've heard of a couple efforts along these lines -- once with Scheme and once with Python. I don't know if either used SWIG. Neither one was submitted to GCC. Both were obscure enough that, even now, I can't really be sure they ever existed :-) Tom
Line insn notes in modulo-sched
Hi Ayal,

The SMS implementation in GCC, in modulo-sched.c, uses line notes to find insn locations, see find_line_note. Why are you using line notes instead of insn locators? Line notes are on the list of Things That Should Not Be, and insn locators replace them. Is there a reason for modulo-sched to rely on line notes, or is this just an oversight?

Gr.
Steven
Re: Question about use of C++ copy constructor
David Fang <[EMAIL PROTECTED]> writes:
> The relevant text appears in gcc-3.4's release notes:
> "When binding an rvalue of class type to a reference, the copy constructor
> of the class must be accessible."

Thanks. I see that I have managed to ask about a "frequently reported bug." Sorry about the noise.

Ian
Re: scripting interface to GCC ?
Tom Tromey wrote:
>> "Mike" == Mike Mattie <[EMAIL PROTECTED]> writes:
>
> Mike> Has anyone ever tried to build a scripting interface into the guts of
> Mike> GCC with something like SWIG ?
>
> I've heard of a couple efforts along these lines -- once with Scheme
> and once with Python. I don't know if either used SWIG. Neither one
> was submitted to GCC. Both were obscure enough that, even now, I
> can't really be sure they ever existed :-)

That info will definitely help refine my searches. I am experimenting with SWIG for other purposes, so something like this would make a nice personal experiment to see how well SWIG scales in terms of interface complexity.

> Tom

Mike Mattie
Re: Line insn notes in modulo-sched
> Hi Ayal,
>
> The SMS implementation in GCC, in modulo-sched.c, uses line notes
> to find insn locations, see find_line_note. Why are you using
> line notes instead of insn locators? Line notes are on the list
> of Things That Should Not Be, and insn locators replace them. Is
> there a reason for modulo-sched to rely on line notes, or is this
> just an oversight?

And in addition, the line notes should not exist anymore at modulo-sched time.

Honza

> Gr.
> Steven
Re: gcc 4.1
On Mar 13, 2006, at 2:05 PM, Lars Sonchocky-Helldorf wrote:
> The appropriate place for such stuff is gcc@gcc.gnu.org

No, not really. gcc-help is more appropriate.

>> Am I the only one who gets those:
>>   DOMElement.m:283: warning: pointer type mismatch in conditional expression

I doubt it.

>> For stuff like:
>>   objs[1] = _ns ? _ns : (id)null;
>> or
>>   return [pathes isNotNull] ? pathes : nil;

And here all information that I can use to answer the question has been stripped.
Re: Ada subtypes and base types
Jeffrey A Law wrote:
> On Mon, 2006-02-27 at 20:08 +0100, Waldek Hebisch wrote:
>
> > What do you mean by "abuse"? TYPE_MAX_VALUE means maximal value
> > allowed by given type.
> As long as you're *absolutely* clear that a variable with a
> restricted range can never hold a value outside that
> restricted range in a conforming program, then I'll back off
> the "abuse" label and merely call it pointless :-)
>
> The scheme you're using, "promoting" to a base type before all
> arithmetic, creates lots of useless type conversions and means
> that the optimizers can't utilize TYPE_MIN_VALUE/TYPE_MAX_VALUE
> anyway. I.e., you don't gain anything over keeping that knowledge
> in the front-end.

Pascal arithmetic is essentially untyped: operators take integer arguments and are supposed to give the mathematically correct result (provided all intermediate results are representable in machine arithmetic; overflow is treated as a user error). OTOH, for proper type checking the front end has to track the ranges associated with variables. So the "useless" type conversions are needed due to the Pascal standard and back end constraints.

I think that it is easy for the back end to make good use of TYPE_MIN_VALUE/TYPE_MAX_VALUE. Namely, consider the assignment

  x := y + z * w;

where variables y, z and w have values in the interval [0,7] and x has values in [0,1000]. Pascal converts the above to the following C-like code:

  int tmp = (int) y + (int) z * (int) w;
  x = (tmp < 0 || tmp > 1000) ? (Range_Check_Error (), 0) : tmp;

I expect VRP to deduce that tmp will have values in [0..56] and eliminate the range check. Also, it should be clear that in the assignment above the arithmetic can be done using any convenient precision.

In principle the Pascal front end could deduce more precise types (ranges), but that would create some extra type conversions and a lot of extra types. Moreover, I assume that VRP can do a better job at tracking ranges than the Pascal front end.

--
Waldek Hebisch
[EMAIL PROTECTED]
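The [0..56] bound mentioned above can be checked with a few lines of interval arithmetic. The following is only a toy sketch of the kind of reasoning a value range propagation pass performs; it is not GCC's VRP code, and the names `Interval`, `add`, `mul`, `contains` and `tmp_range` are hypothetical.

```cpp
#include <algorithm>

// Closed integer interval [lo, hi].
struct Interval {
    long lo;
    long hi;
};

// Sum of two intervals: the bounds add component-wise.
Interval add(Interval a, Interval b)
{
    Interval r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

// Product of two intervals: the extremes are among the four
// products of the end points.
Interval mul(Interval a, Interval b)
{
    long p[4] = { a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi };
    Interval r = { *std::min_element(p, p + 4), *std::max_element(p, p + 4) };
    return r;
}

// True when every value of inner also lies in outer.
bool contains(Interval outer, Interval inner)
{
    return outer.lo <= inner.lo && inner.hi <= outer.hi;
}

// Range of tmp = y + z * w with y, z, w in [0, 7].
Interval tmp_range()
{
    Interval y = { 0, 7 }, z = { 0, 7 }, w = { 0, 7 };
    return add(y, mul(z, w));
}
```

Here z * w lies in [0, 49], so tmp lies in [0, 7 + 49] = [0, 56]. Since `contains` reports that [0, 1000] covers that interval, the condition `tmp < 0 || tmp > 1000` can never be true, and the range check is dead code under these assumptions.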
gcc autovectorization question
Hi All,

I am trying to use the latest gcc autovectorization code to generate functionally correct SSE instructions, and I have the following questions:

Which is the latest stable gcc version with autovectorization (is it 4.1.0?), and where is the latest development code for it (in the SVN repository?)

Initially, I want to use code that will generate functional code for my application. After I familiarize myself with the existing code, I plan to contribute to this work.

thx
Tom Yeh
Re: scripting interface to GCC ?
On Mon, 2006-03-13 at 16:25 -0700, Tom Tromey wrote:
> > "Mike" == Mike Mattie <[EMAIL PROTECTED]> writes:
>
> Mike> Has anyone ever tried to build a scripting interface into the guts of
> Mike> GCC with something like SWIG ?
>
> I've heard of a couple efforts along these lines -- once with Scheme
> and once with Python. I don't know if either used SWIG. Neither one
> was submitted to GCC. Both were obscure enough that, even now, I
> can't really be sure they ever existed :-)

I will admit I SWIG'd gcc once and generated Python wrappers, but this was before tree-ssa, so it wasn't all that useful.

There is some appeal to being able to prototype passes, etc., in Python. It's a ton of work to get the bindings going, though. Maybe SWIG is much better than it was, in which case, cool!

> Tom