How to add a NOP instruction in each basic block for obj code that gcc generates
Hi all, I'm doing an experiment with gcc. I need to modify gcc so that a NOP instruction is inserted into each basic block of the object code that gcc generates. I know this sounds weird, but it is part of my experiment. As I'm unfamiliar with gcc internals, is someone willing to help me out with a quick hack? Thanks in advance, Jeff
Re: How to add a NOP instruction in each basic block for obj code that gcc generates
2006/8/10, Paolo Bonzini <[EMAIL PROTECTED]>:

> jeff jeff wrote:
> > Hi all, I'm doing an experiment with gcc. I need to modify gcc so that
> > a NOP instruction will be inserted into each basic block in binary code
> > that gcc generates. I know this sounds weird but it is part of my
> > experiment. As I'm unfamiliar with gcc, is there someone willing to
> > help me out and do a quick hack for me?
>
> You can probably do this much more easily by modifying the assembly
> language. That is, instead of letting the compiler produce a .o, you
> produce a .s file (using the -S option), run some perl or awk script on
> the assembly, and compile it again (gcc accepts .s as well as .c). If
> you care, you can write a shell script that does these three steps
> automatically and receives the same command line as gcc (or a suitable
> subset).
>
> Otherwise, you can use the "-B" option to replace "as" with your own
> executable or shell script. This shell script would run the perl or awk
> script on the input and call the system "as" on the output.
>
> To understand what's going on (i.e. for debugging), the "-###" option
> shows you which commands the gcc driver is executing ("cc1" is the
> compiler proper, "as" is the assembler, etc.).
>
> What's in the perl or awk script? To find basic block boundaries, search
> for things like "L123". If you need the nop at the beginning, you need
> to look for jump tables and not insert the nop there. If you need the
> nop at the end, you can blindly insert one at the end of a jump table,
> but on the other hand you will have to insert it before jump
> instructions. If you need more information, please ask on
> Perl/awk/shell-scripting newsgroups or mailing lists.
>
> Paolo

Paolo, Thanks a lot for your reply. You provided a good alternative. I need to use gcc to compile a big project in my experiment. In the makefile, some source files are compiled to .o files and others are compiled to .S files. I'm wondering how much effort I need, provided I have the script. I'll look into it further.

It would be cleaner if I knew how to modify the gcc source code and let it insert a nop into each basic block. This shouldn't be a hard job for an experienced gcc developer, should it?

Jeff
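[For anyone trying Paolo's suggestion, a minimal sketch of such a filter, written in C rather than perl/awk. Assumptions: GNU-style numeric local labels such as "L123:" or ".L123:" at the start of a line, and none of the jump-table complications Paolo warns about.]

  #include <stdio.h>
  #include <ctype.h>

  /* Filter a .s file from stdin to stdout, appending a nop after every
     line that starts with a numeric local label ("L123:" or ".L123:").
     Jump tables would need extra care, as noted above.  */
  int
  main (void)
  {
    char line[4096];

    while (fgets (line, sizeof line, stdin))
      {
        const char *p = line;

        fputs (line, stdout);
        if (*p == '.')
          p++;
        if (*p == 'L')
          {
            const char *q = p + 1;
            while (isdigit ((unsigned char) *q))
              q++;
            /* At least one digit, immediately followed by ':'.  */
            if (q > p + 1 && *q == ':')
              fputs ("\tnop\n", stdout);
          }
      }
    return 0;
  }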
Re: How to add a NOP instruction in each basic block for obj code that gcc generates
If you don't want to change the generated code other than inserting the nops, and you can restrict yourself to a processor which does not need to track addresses to avoid out-of-range branches, then you could approximate what you want by emitting a nop in final_scan_insn when you see a CODE_LABEL, after the label. You'll need to also emit a nop in the first basic block in the function. That probably won't be the precise set of basic blocks as the compiler sees them, but basic block is a somewhat nebulous concept and that may be good enough for your purposes, whatever they are.

Thanks, Ian. I don't want the NOPs to affect gcc's optimization. I've found the function final_scan_insn, which is in ./gcc/final.c. Here is the code snippet related to CODE_LABEL:

    case CODE_LABEL:
      /* The target port might emit labels in the output function for
         some insn, e.g. sh.c output_branchy_insn.  */
      if (CODE_LABEL_NUMBER (insn) <= max_labelno)
        {
          int align = LABEL_TO_ALIGNMENT (insn);
  #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
          int max_skip = LABEL_TO_MAX_SKIP (insn);
  #endif

          if (align && NEXT_INSN (insn))
            {
  #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
              ASM_OUTPUT_MAX_SKIP_ALIGN (file, align, max_skip);
  #else
  #ifdef ASM_OUTPUT_ALIGN_WITH_NOP
              ASM_OUTPUT_ALIGN_WITH_NOP (file, align);
  #else
              ASM_OUTPUT_ALIGN (file, align);
  #endif
  #endif
            }
        }

Which function should I use in order to emit a nop? Thanks, Jeff
Re: How to add a NOP instruction in each basic block for obj code that gcc generates
The simplest way is going to be something like

    fprintf (asm_out_file, "\tnop\n");

I added fprintf (asm_out_file, "\tnop\n"); to the end of case CODE_LABEL. Then I recompiled gcc. Unfortunately, it doesn't seem that a NOP was inserted. Any ideas?

    case CODE_LABEL:
      /* The target port might emit labels in the output function for
         some insn, e.g. sh.c output_branchy_insn.  */
      if (CODE_LABEL_NUMBER (insn) <= max_labelno)
        {
          int align = LABEL_TO_ALIGNMENT (insn);
  #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
          int max_skip = LABEL_TO_MAX_SKIP (insn);
  #endif

          if (align && NEXT_INSN (insn))
            {
  #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
              ASM_OUTPUT_MAX_SKIP_ALIGN (file, align, max_skip);
  #else
  #ifdef ASM_OUTPUT_ALIGN_WITH_NOP
              ASM_OUTPUT_ALIGN_WITH_NOP (file, align);
  #else
              ASM_OUTPUT_ALIGN (file, align);
  #endif
  #endif
            }
        }
      fprintf (asm_out_file, "\tnop\n");
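[A sketch of a placement that should be visible in the output, with two caveats: this is against a 4.x-era final.c from memory, and CODE_LABEL insns only exist for jump targets, so a test function with no branches will contain no CODE_LABELs at all (and, per Ian's note above, the function's first basic block has to be handled separately). Also worth double-checking, via the -### option mentioned earlier, that the driver is really running the rebuilt cc1.]

  case CODE_LABEL:
    /* ... existing alignment and jump-table handling ... */

    /* Near the bottom of the case, the label itself is printed,
       roughly like so: */
    targetm.asm_out.internal_label (file, "L", CODE_LABEL_NUMBER (insn));
    /* Emitting the nop here, after the label, puts it inside the
       basic block the label starts:
             L2:
                     nop
       whereas a nop printed before the label text lands at the end of
       the previous block.  */
    fprintf (asm_out_file, "\tnop\n");
    break;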
Solaris 9, GCC 4.1.1 and GCC 3.3.5 ... --disable-shared at build time?
I'm backed into a corner here and really not sure what the proper path out is.

Our production GCC is 3.3.5. It was built with default args. Previously we ran 2.95.3. You can perhaps imagine my surprise when I found that a lot of apps we had built with this GCC 3.3.5 had libgcc_s.so linked dynamically to them. You can perhaps also imagine my surprise when I came to this conclusion after a lot of stuff broke when I expired our GCC 3.3.5 install in favor of 4.1.1.

But okay. This process at least made one thing clear. We need to offer our users multiple GCC versions. Some want 3.3.x, some want to test 4.1.1's pedantic nature, etc.

So I says to myself, "Self, when you go to build the new multiple GCCs in the new production language area, build them with --disable-shared so N tens of apps are not depending on your GCC staying put in order for them to function (!??)." I build GCC 3.3.5 and 4.1.1, both with --disable-shared. I do this for Solaris 9, 10, Linux 2.4 for i686, and Linux 2.4 for AMD64. Yes, a hoot. Weeks pass during this time and the leaves begin to fall. Oh, and the Solaris ones were built to reference the Sun 'as' and 'ld' (/usr/ccs/bin).

In order to redo all of the "broken because they're linked to libgcc_s.so" apps, I set my PATH to use the new compilers (the ones that were built with --disable-shared). I find that my life is hell, as just about half of everything I try to build under Solaris 9 does not build. I get text relocation errors from our built libz.a, I fail to build subversion for mysterious reasons/errors, I get Python 2.4.x to build fine without libgcc_s.so linked to it, then I drop a Modules/Setup.local in place, make again to build the modules, and everything goes to hell with a new ./python that is now magically linked to libgcc_s.so (the old one we have to keep around until our apps are rebuilt).

It would seem that GCC 3.3.5 + Sun as + Sun ld do not play nice at all with libraries previously created with GNU binutils. So... could someone elaborate on what it is I am doing that is so wrong? What is the successful recipe for using GCC 3.3.5 + 4.1.1 and/or binutils under Solaris?
Re: Newlib _ctype_ alias kludge now invalid due to PR middle-end/15700 fix.
Giovanni Bajo wrote:

> Hans-Peter Nilsson <[EMAIL PROTECTED]> wrote:
> > So, the previously-questionable newlib alias-to-offset-in-table kludge
> > is finally judged invalid. This is a heads-up for newlib users. IMHO
> > it's not a GCC bug, though there's surely going to be some commotion.
> > Maybe a NEWS item is called for, I dunno.
>
> It will be in NEWS, since RTH already updated
> http://gcc.gnu.org/gcc-4.0/changes.html. I hope newlib will be promptly
> fixed.
>
> Giovanni Bajo

I have just checked in a patch to newlib that changes the ctype macros to use __ctype_ptr instead of _ctype_. In addition, a configuration check is made to see whether the array aliasing trick can be used or not. The code allows for backward compatibility except in the case where the old code is using negative offsets and the current version of newlib is built with a compiler that does not support the array aliasing trick.

Corinna, if this causes any Cygwin issues, please let me know.

-- Jeff J.
Successful gcc4.0.0 build (Redhat 9, Kernel 2.4.25)
make bootstrap successful build info:

config.guess states: i686-pc-linux-gnu

gcc -v states:
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /tmp/downloads/gcc-4.0.0/configure --prefix=/apps/Linux/gcc400 --program-suffix=400 --with-local-prefix=/usr/include --enable-threads
Thread model: posix
gcc version 4.0.0

/etc/issue states:
Red Hat Linux release 9 (Shrike)
Kernel \r on an \m

uname -a states:
Linux parsley 2.4.25-3dneg #1 SMP Tue Sep 7 11:16:44 BST 2004 i686 i686 i386 GNU/Linux
(This is essentially the 2.4.25 kernel with some nfs patches)

rpm -q glibc states: glibc-2.3.2-11.9

Notes: This build worked for installing sitewide over nfs fine. --prefix and --program-suffix worked fine. --with-local-prefix=/usr/include had to be specified as the default of /usr/local/include is not right for certain Linux distributions (like Redhat).

Jeff Clifford
Successful gcc4.0.1 build (Redhat 9, Kernel 2.4.25)
make bootstrap successful build info:

config.guess states: i686-pc-linux-gnu

gcc -v states:
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /tmp/downloads/gcc-4.0.1/configure -prefix=/apps/Linux/gcc401 --program-suffix=401
Thread model: posix
gcc version 4.0.1

/etc/issue states:
Red Hat Linux release 9 (Shrike)
Kernel \r on an \m

uname -a states:
Linux parsley 2.4.25-3dneg #1 SMP Tue Sep 7 11:16:44 BST 2004 i686 i686 i386 GNU/Linux
(This is essentially the 2.4.25 kernel with some nfs patches)

rpm -q glibc states: glibc-2.3.2-11.9

Notes: This build worked for installing sitewide over nfs fine. --prefix and --program-suffix worked fine.

Jeff Clifford
Successful gcc4.0.2 build (Redhat 9, Kernel 2.4.25)
make bootstrap successful build info:

config.guess states: i686-pc-linux-gnu

gcc -v states:
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /tmp/downloads/gcc-4.0.2/configure -prefix=/apps/Linux/gcc402 --program-suffix=402
Thread model: posix
gcc version 4.0.2

/etc/issue states:
Red Hat Linux release 9 (Shrike)
Kernel \r on an \m

uname -a states:
Linux parsley 2.4.25-3dneg #1 SMP Tue Sep 7 11:16:44 BST 2004 i686 i686 i386 GNU/Linux
(This is essentially the 2.4.25 kernel with some nfs patches)

rpm -q glibc states: glibc-2.3.2-11.9

Notes: This build worked for installing sitewide over nfs fine. --prefix and --program-suffix worked fine.

Jeff Clifford
glibc compilation error
I am trying to cross compile GCC for an AMCC 440SP platform (powerpc-linux). Binutils and bootstrap GCC compile fine, but when I make glibc it errors out with the following:

--- snippet ---
if test -r /opt/luan2/toolchain/build/glibc/csu/abi-tag.h.new; then mv -f /opt/luan2/toolchain/build/glibc/csu/abi-tag.h.new /opt/luan2/toolchain/build/glibc/csu/abi-tag.h; \
else echo >&2 'This configuration not matched in ../abi-tags'; exit 1; fi
gawk -f ../scripts/gen-as-const.awk ../linuxthreads/sysdeps/powerpc/tcb-offsets.sym \
| powerpc-linux-gcc -S -o /opt/luan2/toolchain/build/glibc/tcb-offsets.hT3 -std=gnu99 -O2 -Wall -Winline -Wstrict-prototypes -Wwrite-strings -g -mnew-mnemonics -I../include -I. -I/opt/luan2/toolchain/build/glibc/csu -I.. -I../libio -I/opt/luan2/toolchain/build/glibc -I../sysdeps/powerpc/powerpc32/elf -I../sysdeps/powerpc/elf -I../linuxthreads/sysdeps/unix/sysv/linux/powerpc/powerpc32 -I../linuxthreads/sysdeps/unix/sysv/linux/powerpc -I../linuxthreads/sysdeps/unix/sysv/linux -I../linuxthreads/sysdeps/pthread -I../sysdeps/pthread -I../linuxthreads/sysdeps/unix/sysv -I../linuxthreads/sysdeps/unix -I../linuxthreads/sysdeps/powerpc/powerpc32 -I../linuxthreads/sysdeps/powerpc -I../sysdeps/unix/sysv/linux/powerpc/powerpc32 -I../sysdeps/unix/sysv/linux/powerpc -I../sysdeps/unix/sysv/linux -I../sysdeps/gnu -I../sysdeps/unix/common -I../sysdeps/unix/mman -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/powerpc -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/powerpc/powerpc32/fpu -I../sysdeps/powerpc/powerpc32 -I../sysdeps/wordsize-32 -I../sysdeps/powerpc/soft-fp -I../sysdeps/powerpc/fpu -I../sysdeps/powerpc -I../sysdeps/ieee754/flt-32 -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754 -I../sysdeps/generic/elf -I../sysdeps/generic -nostdinc -isystem /opt/luan2/toolchain/bin/lib/gcc/powerpc-linux/4.0.2/include -isystem /opt/luan2/toolchain/source/linux-2.6.13/include/ -D_LIBC_REENTRANT -include ../include/libc-symbols.h -DHAVE_INITFINI -x c - \
-MD -MP -MF /opt/luan2/toolchain/build/glibc/tcb-offsets.h.dT -MT '/opt/luan2/toolchain/build/glibc/tcb-offsets.h.d /opt/luan2/toolchain/build/glibc/tcb-offsets.h'
<stdin>: In function 'dummy':
<stdin>:11: warning: asm operand 0 probably doesn't match constraints
<stdin>:11: error: impossible constraint in 'asm'
make[2]: *** [/opt/luan2/toolchain/build/glibc/tcb-offsets.h] Error 1
make[2]: Leaving directory `/opt/luan2/toolchain/source/glibc-2.3.5/csu'
make[1]: *** [csu/subdir_lib] Error 2
make[1]: Leaving directory `/opt/luan2/toolchain/source/glibc-2.3.5'
make: *** [all] Error 2
--- snippet ---

Here is the configuration I run from a separate build/glibc directory:

  ../../source/glibc-2.3.5/configure --prefix=/opt/luan2/toolchain/bin --target=powerpc-linux --host=powerpc-linux --enable-add-ons=linuxthreads --with-headers=/opt/luan2/toolchain/source/linux-2.6.13/include/ --with-binutils=/opt/luan2/toolchain/bin/powerpc-linux/bin

This seems to complete without any issues. It seems that gcc is having issues with the following line in gen-as-const.awk:

  printf "asm (\"@@@name@@@%s@@@value@@@%%0@@@end@@@\" : : \"i\" (%s));\n", name, $0;

Is my configure line incorrect, or have I maybe incorrectly configured bootstrap gcc prior to building glibc?

Thanks, Jeff Stevens
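[For context, here is roughly what that awk rule generates for each SYMBOL/EXPR pair in tcb-offsets.sym; struct tcb and both names below are made up for illustration. The asm is never executed: glibc only greps the @@@value@@@ markers back out of the resulting .s file, and the "i" constraint demands a compile-time integer constant, so "impossible constraint" suggests the powerpc-linux-gcc being invoked could not reduce the operand to a constant, which points more at the bootstrap compiler and its flags than at the glibc configure line.]

  #include <stddef.h>

  /* What gen-as-const.awk effectively feeds the compiler for a .sym
     line like "MYFIELD_OFFSET offsetof (struct tcb, myfield)".  */
  struct tcb { int flags; long myfield; };

  void
  dummy (void)
  {
    /* The asm text is only scanned back out of the generated .s; the
       "i" constraint requires a compile-time integer constant.  */
    asm ("@@@name@@@MYFIELD_OFFSET@@@value@@@%0@@@end@@@"
         : : "i" (offsetof (struct tcb, myfield)));
  }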
HowTo Cross Compile GCC on x86 Host for PowerPC Target
Is there a HowTo out there on how to cross compile GCC to run on another platform? I have an x86 host running linux, and an embedded PowerPC 440SP target running linux. I would like to compile GCC to run on the target but am having some difficulties. I have compiled the cross compiler fine, but when I try to compile a native compiler, it acts just like the cross compiler (runs on the host and not the target). All I did was re-run gcc configure and "make all install". Here is the configuration I ran:

  ../../source/gcc-3.4.4/configure --target=powerpc-linux --host=powerpc-linux --prefix=/opt/luan2/toolchain/bin --enable-shared --enable-threads --enable-languages=c

I'm obviously missing something, but can't seem to find anything on the internet that explains cross-compiling gcc for another target.

Thanks, Jeff Stevens
RE: HowTo Cross Compile GCC on x86 Host for PowerPC Target
Yes I added the cross-compiler to the path and created a separate build directory (ppc_gcc). Thanks, Jeff Stevens

--- Dave Korn <[EMAIL PROTECTED]> wrote:

> Dave Korn wrote:
> > Jeff Stevens wrote:
> > > Is there a HowTo out there on how to cross compile GCC to run on
> > > another platform? I have an x86 host running linux, and an embedded
> > > PowerPC 440SP target running linux. I would like to compile GCC to
> > > run on the target but am having some difficulties. I have compiled
> > > the cross compiler fine, but when I try to compile a native compiler,
> > > it acts just like the cross compiler (runs on the host and not the
> > > target). All I
> >
> > *All* compilers "run on the host"; the term "host" is defined as "the
> > machine on which the compiler runs". The target is the machine on which
> > the _generated_ code runs. So for a native compiler, host==target, and
> > for a cross-compiler, host!=target.
>
> Doh. I misread this; I see now that what you mean is you wanted a native
> compiler on the target.
>
> > > did was re-run gcc configure and "make all install". Here is the
> > > configuration I ran:
> > >
> > > ../../source/gcc-3.4.4/configure --target=powerpc-linux
> > > --host=powerpc-linux --prefix=/opt/luan2/toolchain/bin
> > > --enable-shared --enable-threads --enable-languages=c
>
> So, this should have worked. Did you perhaps re-build in the same
> directory that you had already configured the cross-compiler in without
> first running "make clean" perhaps? Was the powerpc-linux cross compiler
> placed in your $PATH setting, so that configure could find the
> powerpc-linux-gcc executable?
>
> [ This is OT for this list really; we really should take it to crossgcc ]
>
> cheers,
> DaveK
> --
> Can't think of a witty .sigline today
Re: Howto Cross Compile GCC to run on PPC Platform
I am using the AMCC 440SP processor. I went and bought "Building Embedded Linux Systems" by Karim Yaghmour. It seems to be a pretty complete book, and I have gotten the cross-compiler completely installed, but it doesn't get into installing a native compiler. However, I tried cross compiling gcc by first running this configure line:

  ../gcc-3.4.4/configure --build=`../gcc-3.4.4/config.guess` --target=powerpc-linux --host=powerpc-linux --prefix=${PREFIX} --enable-languages=c

and then a make all. The make went fine, and completed without any errors. However, when I ran 'make install' I got the following error:

  powerpc-linux-gcc: installation problem, cannot exec `/opt/recorder/tools/libexec/gcc/powerpc-linux/3.4.4/collect2': Exec format error
  make[2]: *** [nof/libgcc_s_nof.so] Error 1
  make[2]: Leaving directory `/opt/recorder/build-tools/build-native-gcc/gcc'
  make[1]: *** [stmp-multilib] Error 2
  make[1]: Leaving directory `/opt/recorder/build-tools/build-native-gcc/gcc'
  make: *** [install-gcc] Error 2

How do I install the native compiler? Thanks, Jeff Stevens

--- Clemens Koller <[EMAIL PROTECTED]> wrote:

> Hello, Jeff!
>
> > I am trying to compile GCC on an x86 platform to run natively on an
> > embedded PPC platform.
>
> What CPU do you use? I am currently working on an mpc8540 natively (from
> harddisk) and have a current toolchain up and running. I can recommend
> the latest Linux-From-Scratch documentation to get an idea of what to do.
>
> > I am able to compile gcc as a cross compiler (to run on x86), but can't
> > seem to get it to cross compile gcc (to run on ppc). Does anyone know
> > of a good HowTo to do this? I'm currently downloading the source distro
> > of ELDK, so if it's already in there I'll find it, but if there is one
> > elsewhere online please let me know.
>
> I've started with the ELDK 3.1 too. And updated it step by step to the
> latest versions.
>
> Greets,
> Clemens Koller
> ___
> R&D Imaging Devices
> Anagramm GmbH
> Rupert-Mayer-Str. 45/1
> 81379 Muenchen
> Germany
> http://www.anagramm.de
> Phone: +49-89-741518-50
> Fax: +49-89-741518-19
Re: Howto Cross Compile GCC to run on PPC Platform
I am creating the target tree on my host, so that I can later transfer it to a USB storage device. I was going to manually move everything, but only saw one binary, xgcc. Is that all, or aren't there some other utilities that go along with it? I just didn't know exactly what to copy and where to copy it to.

When I built glibc, those were built for the target system, but installed to the target directory structure that I am creating. The 'make install' command that I ran for glibc was:

  make install_root=${TARGET_PREFIX} prefix="" install

where TARGET_PREFIX is the target filesystem tree. I used the same make install command for the native gcc that I compiled.

Thanks, Jeff Stevens

--- Kai Ruottu <[EMAIL PROTECTED]> wrote:

> Jeff Stevens wrote:
> > ../gcc-3.4.4/configure --build=`../gcc-3.4.4/config.guess`
> > --target=powerpc-linux --host=powerpc-linux --prefix=${PREFIX}
> > --enable-languages=c
> >
> > and then a make all. The make went fine, and completed without any
> > errors. However, when I ran 'make install' I got the following error:
> >
> > powerpc-linux-gcc: installation problem, cannot exec
> > `/opt/recorder/tools/libexec/gcc/powerpc-linux/3.4.4/collect2':
> > Exec format error
> >
> > How do I install the native compiler?
>
> You shouldn't ask how but where!
>
> You cannot install alien binaries into the native places on your host!
> This is not sane at all...
>
> Ok, one solution is to collect the components from the produced stuff,
> pack them into a '.tar.gz' or something and then ftp or something the
> stuff into the native system.
>
> If you really want to install the stuff into your host, you should know
> the answer to the "where" first, and should read from the "GCC Install"
> manual the chapter 7, "Final installation", and see what option to 'make'
> you should use in order to get the stuff into your chosen "where"...
Re: Need sanity check on DSE vs expander issue
On Fri, 2019-12-20 at 12:08 +0100, Richard Biener wrote:

> On December 20, 2019 8:25:18 AM GMT+01:00, Jeff Law wrote:
> > On Fri, 2019-12-20 at 08:09 +0100, Richard Biener wrote:
> > > On December 20, 2019 3:20:40 AM GMT+01:00, Jeff Law wrote:
> > > > I need a sanity check here.
> > > >
> > > > Given this code:
> > > >
> > > >   typedef union { long double value; unsigned int word[4]; }
> > > >     memory_long_double;
> > > >   static unsigned int ored_words[4];
> > > >   static void add_to_ored_words (long double x)
> > > >   {
> > > >     memory_long_double m;
> > > >     size_t i;
> > > >     memset (&m, 0, sizeof (m));
> > > >     m.value = x;
> > > >     for (i = 0; i < 4; i++)
> > > >       {
> > > >         ored_words[i] |= m.word[i];
> > > >       }
> > > >   }
> > > >
> > > > DSE is removing the memset as it thinks the assignment to m.value is
> > > > going to set the entire union.
> > > >
> > > > But when we translate that into RTL we use XFmode:
> > > >
> > > >   ;; m.value ={v} x_6(D);
> > > >
> > > >   (insn 7 6 0 (set (mem/v/j/c:XF (plus:DI (reg/f:DI 77 virtual-stack-vars)
> > > >               (const_int -16 [0xfff0])) [2 m.value+0 S16 A128])
> > > >           (reg/v:XF 86 [ x ])) "j.c":13:11 -1
> > > >        (nil))
> > > >
> > > > That (of course) only writes 80 bits of data because of XFmode,
> > > > leaving 48 bits uninitialized. We then read those bits, or-ing the
> > > > uninitialized data into ored_words, and all hell breaks loose later.
> > > >
> > > > Am I losing my mind? ISTM that dse and the expander have to agree on
> > > > how much data is written by the store to m.value.
> > >
> > > It looks like MEM_SIZE is wrong here, so you need to figure how we
> > > arrive at this (I guess TYPE_SIZE vs. MODE_SIZE mismatch is biting us
> > > here?) That is, either the MEM should have BLKmode or the mode size
> > > should match MEM_SIZE. Maybe DSE can avoid looking at MEM_SIZE for
> > > non-BLKmode MEMs?
>
> It's gimple DSE that removes the memset, so it shouldn't be mucking
> around with modes at all. stmt_kills_ref_p seems to think the assignment
> to m.value sets all of m.
>
> The ao_ref for memset looks reasonable:
>
>   (gdb) p *ref
>   $14 = {ref = 0x0, base = 0x77ffbea0,
>     offset = {<poly_int_pod<1, long>> = {coeffs = {0}}, <No data fields>},
>     size = {<poly_int_pod<1, long>> = {coeffs = {128}}, <No data fields>},
>     max_size = {<poly_int_pod<1, long>> = {coeffs = {128}}, <No data fields>},
>     ref_alias_set = 0, base_alias_set = 0, volatile_p = false}
>
> 128 bits with a base of VAR_DECL m.
> We're looking to see if this statement will kill the ref:
>
>   (gdb) p debug_gimple_stmt (stmt)
>   # .MEM_8 = VDEF <.MEM_6>
>   m.value ={v} x_7(D);
>   $21 = void
>   (gdb) p debug_tree (lhs)
>   type > volatile XF size unit-size
>     align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffea988690 precision:80>
>   side-effects volatile
>   arg:0 type > sizes-gimplified volatile type_0 BLK size 128> unit-size
>     align:128 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7fffea988348 fields context
>     pointer_to_this
>   side-effects addressable volatile used read BLK j.c:10:31
>   size unit-size 0x7fffea7f3d38 16>
>     align:128 warn_if_not_align:0 context 0x7fffea97bd00 add_to_ored_words>
>   chain 0x7fffea9430a8 size_t>
>   used unsigned read DI j.c:11:10
>   size
>   unit-size
>     align:64 warn_if_not_align:0 context 0x7fffea97bd00 add_to_ored_words>>>
>   arg:1 type XF size unit-size 0x7fffea7f3d38 16>
>     align:128 warn_if_not_align:0 symtab:0 alias-set -
Re: Git ChangeLog policy for GCC Testsuite inquiry
On Fri, 2020-01-24 at 13:49 -0500, David Edelsohn wrote:

> > > > On 1/24/20 8:45 AM, David Edelsohn wrote:
> > > > > There is no ChangeLog entry for the testsuite changes.
> > > >
> > > > I don't believe in ChangeLog entries for testcases, but I'll add one
> > > > for the target-supports.exp change, thanks.
> > >
> > > Is this a general policy change that we want to make? Currently we
> > > still have gcc/testsuite/ChangeLog and developers are updating that
> > > file.
> >
> > I would support formalizing that as policy; currently there is no
> > policy.
> >
> > https://gcc.gnu.org/codingconventions.html#ChangeLogs
> >
> > "There is no established convention on when ChangeLog entries are to be
> > made for testsuite changes."
>
> Do we want to continue with ChangeLog entries for testsuite changes or
> only rely on Git log?

I strongly prefer to move towards relying on the git log.

jeff
Re: Git ChangeLog policy for GCC Testsuite inquiry
On Fri, 2020-01-24 at 20:32 +0100, Eric Botcazou wrote:

> > I strongly prefer to move towards relying on the git log.
>
> In my experience the output of git log is a total mess so cannot replace
> ChangeLogs. But we can well decide to drop ChangeLog for the testsuite.

Well, glibc has moved to extracting them from git, building policies and scripts around that. I'm pretty sure other significant projects are also extracting their ChangeLogs from git.

We could do the same, selecting some magic date as the cutover point after which future ChangeLogs are extracted from git. In fact, that's precisely what I'd like to see us do.

jeff
Re: Git ChangeLog policy for GCC Testsuite inquiry
On Sat, 2020-01-25 at 10:50 -0500, Nathan Sidwell wrote:

> On 1/24/20 4:36 PM, Jeff Law wrote:
> > On Fri, 2020-01-24 at 20:32 +0100, Eric Botcazou wrote:
> > > > I strongly prefer to move towards relying on the git log.
> > >
> > > In my experience the output of git log is a total mess so cannot
> > > replace ChangeLogs. But we can well decide to drop ChangeLog for the
> > > testsuite.
> >
> > Well, glibc has moved to extracting them from git, building policies
> > and scripts around that. I'm pretty sure other significant projects are
> > also extracting their ChangeLogs from git.
> >
> > We could do the same, selecting some magic date as the cutover point
> > after which future ChangeLogs are extracted from git. In fact, that's
> > precisely what I'd like to see us do.
>
> The GCC10 release date would seem a good point to do this. That gives us
> around 3 months to figure the details (and get stakeholder buy-in).

Yup. That would be what I'd recommend we shoot for. As you say, it gives time to work on the details and for folks to start changing their habits.

jeff
Re: Git push account
On Sat, 2020-01-25 at 12:39 +, Feng Xue OS wrote: > Which account should I use to push my local patch to git repo of gcc? > I have a sourceware account that works for svn, but now it doesn't for git. > Actually both below commands were tried, but failed. > git push ssh://f...@sourceware.org/gcc/gcc.git ..., > git push ssh://f...@gcc.gnu.org/gcc/gcc.git ... It shouldn't matter. Under the hood sourceware.org and gcc.gnu.org are the same machine. jeff
Re: SSA Question related to Dominator Trees
On Mon, 2020-01-27 at 10:18 -0500, Nicholas Krause wrote:

> Greetings,
>
> Sorry if this question has been asked before, but do we extend out the
> core tree type for SSA, or is there an actual dominator tree type? It
> seems to me we just extend or override the core tree type parameters,
> but I was unable to verify it by looking in the manual.

There is no type or class for the dominator tree. Having one would be useful.

jeff
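[For reference, a sketch of the existing functional interface from dominance.c, as I remember it: dominance info is computed per-function and queried through these calls rather than through a tree class.]

  /* Sketch only: compute, query and release dominance info for the
     current function.  */
  calculate_dominance_info (CDI_DOMINATORS);

  basic_block bb;
  FOR_EACH_BB_FN (bb, cfun)
    {
      /* Immediate-dominator links are the "tree"; walking them gives
         a path back to the entry block.  */
      basic_block idom = get_immediate_dominator (CDI_DOMINATORS, bb);

      /* Membership queries go through dominated_by_p.  */
      if (idom)
        gcc_assert (dominated_by_p (CDI_DOMINATORS, bb, idom));
    }

  free_dominance_info (CDI_DOMINATORS);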
Re: Aliasing rules for unannotated SYMBOL_REFs
On Sat, 2020-01-25 at 09:31 +, Richard Sandiford wrote:

> TL;DR: if we have two bare SYMBOL_REFs X and Y, neither of which have an
> associated source-level decl and neither of which are in an anchor block:
>
> (Q1) can a valid byte access at X+C alias a valid byte access at Y+C?
>
> (Q2) can a valid byte access at X+C1 alias a valid byte access at Y+C2,
> C1 != C2?
>
> Also:
>
> (Q3) If X has a source-level decl and Y doesn't, and neither of them are
> in an anchor block, can valid accesses based on X alias valid accesses
> based on Y?

So what are the cases where Y won't have a source-level decl but we have a decl in RTL? anchors, other cases?

> (well, OK, that wasn't too short either...)

I would have thought the answer would be "no" across the board. But the code clearly indicates otherwise. Interposition clearly complicates things, as do explicit aliases.

> This part seems obvious enough. But then, apart from the special case of
> forced address alignment, we use an offset-based check even for cmp==-1:
>
>   /* Assume a potential overlap for symbolic addresses that went
>      through alignment adjustments (i.e., that have negative
>      sizes), because we can't know how far they are from each
>      other.  */
>   if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
>     return -1;
>   /* If decls are different or we know by offsets that there is no
>      overlap, we win.  */
>   if (!cmp || !offset_overlap_p (c, xsize, ysize))
>     return 0;
>
> So we seem to be taking cmp==-1 to mean that although we don't know
> the relationship between the symbols, it must be the case that either
> (a) the symbols are equal (e.g. via aliasing) or (b) the accesses are
> to non-overlapping objects. In other words, one of the situations
> described by cmp==1 or cmp==0 must be true, but we don't know which
> at compile time.

Right. That was the conclusion I came to. If a SYMBOL_REF has an alias, the alias must have the same value as the SYMBOL_REF. So they're either equal or there's no valid case for overlap.

> This means that in practice, the answer to (Q1) appears to be "yes"
> but the answer to (Q2) appears to be "no".

That would be my understanding once aliases/interpositioning come into play.

> This somewhat contradicts:
>
>   /* In general we assume that memory locations pointed to by different
>      labels may overlap in undefined ways.  */
>   return -1;
>
> at the end of compare_base_symbol_refs, which seems to be saying
> that the answer to (Q2) ought to be "yes" instead. Which is right?

I'm not sure how we could get to yes in that case. A symbol alias or interposition ultimately still results in two symbols having the same final address. Thus for a byte access, if C1 != C2, then we can't have an overlap.

> In PR92294 we have a symbol X at ANCHOR+OFFSET that's preemptible.
> Under the (Q1)==yes/(Q2)==no assumption, cmp==-1 means that either
> (a) X = ANCHOR+OFFSET or (b) X and ANCHOR reference non-overlapping
> objects. So we should take the offset into account when doing:
>
>   if (!cmp || !offset_overlap_p (c, xsize, ysize))
>     return 0;
>
> Let's call this FIX1.

So this is a really interesting wrinkle. Doesn't this change Q2 to a yes? In particular it changes the "invariant" that the symbols have the same address in the event of a symbol alias or interposition. Of course one could ask the question of whether or not we should handle cases with anchors specially.
> But that then brings us to: why does memrefs_conflict_p return -1
> when one symbol X has a decl and the other symbol Y doesn't, and neither
> of them are block symbols? Is the answer to (Q3) that we allow equality
> but not overlap here too? E.g. a linker script could define Y to X but
> not to a region that contains X at a nonzero offset?

Does digging into the history provide any insights here? I'm not sure, given the issues you've introduced, if I could actually fill out the matrix of answers without more underlying information. I.e., when can we get symbols without source-level decls, anchors+interposition issues, etc.

Jeff
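[A concrete way to see the cmp==-1 situation in plain C: an explicit alias. A linker script or ELF interposition can set up the same thing with no decl in sight.]

  int x[4] = { 1, 2, 3, 4 };

  /* y is another name for the same address as x.  A byte access at
     x+C and one at y+C overlap (Q1 == yes), but accesses at x+C1 and
     y+C2 with C1 != C2 cannot: the two symbols are either equal, as
     here, or name disjoint objects (Q2 == no).  */
  extern int y[4] __attribute__ ((alias ("x")));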
Re: Git ChangeLog policy for GCC Testsuite inquiry
On Mon, 2020-02-03 at 18:55 +, Richard Sandiford wrote:

> "H.J. Lu" writes:
> > On Fri, Jan 24, 2020 at 2:39 PM Paul Smith wrote:
> > > On Fri, 2020-01-24 at 22:45 +0100, Jakub Jelinek wrote:
> > > > > > In my experience the output of git log is a total mess so cannot
> > > > > > replace ChangeLogs. But we can well decide to drop ChangeLog for
> > > > > > the testsuite.
> > > > >
> > > > > Well, glibc has moved to extracting them from git, building
> > > > > policies and scripts around that. I'm pretty sure other
> > > > > significant projects are also extracting their ChangeLogs from
> > > > > git.
> > > > >
> > > > > We could do the same, selecting some magic date as the cutover
> > > > > point after which future ChangeLogs are extracted from GIT. In
> > > > > fact, that's precisely what I'd like to see us do.
> > > >
> > > > We don't have a tool that can do it, not even get the boilerplate
> > > > right. Yes, mklog helps, but it very often gets stuff wrong. Not to
> > > > mention that the text of what actually changed can't be generated
> > > > very easily.
> > >
> > > I don't know if it counts as a significant project, but GNU make has
> > > been doing this for years.
> > >
> > > What I did was take the existing ChangeLogs and rename them to
> > > ChangeLog.1 or whatever, then started with a new ChangeLog generated
> > > from scratch from Git messages.
> > >
> > > I use the gnulib build-aux/gitlog-to-changelog script to do it. It
> > > requires a little bit of discipline to get right; in particular you
> > > have to remember that the Git commit message will be indented 8 spaces
> > > in the ChangeLog, so you have to be careful that your commit messages
> > > wrap at char 70 (or less) in your Git commit.
> > >
> > > If you have Git hooks you could enforce a bit of formatting; for
> > > example any line not indented by space must be <=70 chars long; this
> > > allows people to use long lines for formatted content if they indent
> > > it with a space or two.
> > >
> > > Otherwise, it's the same as writing the ChangeLog and you only have to
> > > do it once.
> > >
> > > Just to note, the above script simply transcribes the commit message
> > > into ChangeLog format. It does NOT try to auto-generate
> > > ChangeLog-style content (files that changed, functions, etc.) from the
> > > Git diff or whatever.
> > >
> > > There are a few special tokens you can add to your Git commit message
> > > that get reformatted to special changelog tokens like "(tiny change)"
> > > etc.
> > >
> > > As mentioned previously, it's very important that the commit message
> > > be provided as part of the code review, and it is very much fair game
> > > for review comments. This is common practice, and a good idea because
> > > bad commit messages are always a bummer, ChangeLog or not.
> >
> > Libgcrypt includes ChangeLog entries in git commit messages:
> >
> > http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git
> >
> > In each patch, the commit log starts with ChangeLog entries without
> > leading TABs, followed by a separator line with -- and then the commit
> > message. They have a script to extract the ChangeLog for a release.
>
> How many people would we be catering for by generating changelogs at
> release time though? It seems too low-level to be useful to users,
> and people wanting to track gcc development history at the source level
> would surely be better off using git (which e.g. makes it much easier to
> track changes to particular pieces of code).
> Perhaps there are practical or policy reasons for not requiring everyone
> who wants to track gcc development history to build or install git.
> But if so, why not just include the output of "git log", with whatever
> options seem best? (Probably --stat at least, to show the affected
> files.)
>
> Like with the svn-to-git conversion, the less we change the way the
> history is presented, the less chance there is of something going wrong.
> And the idea is that git log should be informative enough for upstream
> developers, so surely it should be enough for others too.

I believe the ChangeLog is primarily an FSF requirement, hence generating it from the SCM at release time seems reasonable. And yes, even though I have been a regular ChangeLog user, I rely more and more on the git log these days.

jeff
Fwd: February sourceware.org transition to new server!
This affects gcc.gnu.org as well... Expect weekend outages...

--- Begin Message ---

Community,

The sourceware.org server will be transitioning to a new server over the next 2-4 weeks. The new server will be CentOS 8-based with more CPU and more RAM. Please keep this in mind when planning out your work. Starting in 2 weeks time we might see some weekend outages as Frank Eigler and the overseers team work out the bugs.

Thanks to Frank and all of overseers for their tireless efforts!

Cheers, Carlos.

--- End Message ---
Re: Git ChangeLog policy for GCC Testsuite inquiry
On Wed, 2020-02-05 at 15:18 -0600, Segher Boessenkool wrote:

> On Mon, Feb 03, 2020 at 01:24:04PM -0700, Jeff Law wrote:
> > And yes, even though I have been a regular ChangeLog user, I rely more
> > and more on the git log these days.
>
> As a reviewer, the changelog is priceless still. We shouldn't drop the
> changelog before people write *good* commit messages (and we are still
> quite far from that goal).

I believe the current proposal is not to switch immediately, but to do so after gcc-10 is released. Feel free to suggest improvements to ChangeLogs or summaries in the meantime to get folks to start rethinking what they write in 'em.

And FWIW, we're talking about the ChangeLog *file* here. If folks continued writing the same log messages and put them into git, I personally think that's sufficient to transition away from having a ChangeLog file in the source tree. I don't want to make perfect the enemy of the very very good here, and moving away from a ChangeLog file in the source tree is, IMHO, very very good.

jeff
Re: [EXTERNAL] Re: GCC selftest improvements
On Thu, 2020-02-13 at 22:18 +, Modi Mo wrote:

> > On 2/12/20 8:53 PM, David Malcolm wrote:
> > > Thanks for the patch.
> > >
> > > Some nitpicks:
> > >
> > > Timing-wise, the GCC developer community is focusing on gcc 10
> > > bugfixing right now (aka "stage 4" of the release cycle). So this
> > > patch won't be suitable to commit to master until stage 1 of the
> > > release cycle for gcc 11 (in April, hopefully).
> >
> > Ah, I should've looked a bit harder for timelines before asking:
> > https://gcc.gnu.org/develop.html. Appreciate the response here!
> >
> > > But yes, it's probably a good idea to get feedback on the patch given
> > > the breadth of platforms we support.
> > >
> > > The patch will need an update to the docs; search for "Tools/packages
> > > necessary for building GCC" in gcc/doc/install.texi, which currently
> > > has some paragraphs labelled:
> > >   @item ISO C++98 compiler
> > > that will need changing.
> > >
> > > I think Richi mentioned that the minimum gcc version should be 4.8.2
> > > as he recalled issues with .1, so maybe the error message and docs
> > > should reflect that?
> > >
> > > https://gcc.gnu.org/ml/gcc/2019-10/msg00180.html
> >
> > Segher here suggests 4.8.5 instead of 4.8.2:
> > https://gcc.gnu.org/ml/gcc/2019-11/msg00192.html
> >
> > Looking at release dates, 4.8.5 was in June 2015 while 4.8.2 was in
> > October 2013, which is a pretty big gap. I'm for moving the needle as
> > far as we reasonably can since this is a leap anyways. @Segher do you
> > have a reason in mind for the higher versioning?

I doubt there's a lot of functional difference between 4.8.5 and 4.8.2. It really should just be bugfixes. While I'd prefer 4.8.5 over 4.8.2, I could live with either.

Jeff
Re: Branch instructions that depend on target distance
On Mon, 2020-02-24 at 12:36 +0100, Petr Tesarik wrote:

> On Mon, 24 Feb 2020 11:14:44 + Jozef Lawrynowicz wrote:
> > On Mon, 24 Feb 2020 12:05:28 +0100 Petr Tesarik wrote:
> > > Hi all,
> > >
> > > I'm looking into reviving the efforts to port gcc to VideoCore IV
> > > [1]. One issue I've run into is the need to find out target branch
> > > distance at compile time. I looked around, and it's not the first
> > > architecture with such a requirement, but AFAICS it has never been
> > > solved properly.
> > >
> > > For example, AVR tracks instruction length. Later, ret_cond_branch()
> > > selects between a branch instruction and an inverted branch followed
> > > by an unconditional jump based on these calculated lengths.
> > >
> > > This works great ... until there's some inline asm() statement, for
> > > which gcc cannot keep track of the length attribute, so it is
> > > probably taken as zero. The linker then fails with a cryptic message:
> > >
> > >   relocation truncated to fit: R_AVR_7_PCREL against `no symbol'
> >
> > The MSP430 backend just always generates maximum range branch
> > instructions, except for some special cases. We then rely on the linker
> > to relax branch instructions to shorter range "jump" instructions when
> > the destination is within range.
> >
> > So the compiler output will always work, but not be the smallest
> > possible code size.
> >
> > For that "relocation truncated to fit" error message you want to check
> > that the linker has the ability to relax whatever branch instruction it
> > is failing on to a longer range branch.
>
> But that would change the instruction length, so not really an option
> AFAICS (unless I also switch to LTO).
>
> Anyway, the situation is much worse on the VideoCore IV. The
> alternatives here are:
>
> 1.
>     addcmpbCC rx, 0, imm, target
>     ; usually written as bCC rx, imm, target
>
> 2.
>     cmp rx, imm
>     bCC .+2
>     j target

Yea, this isn't that uncommon. You can describe both of these to the branch shortening pass.

> The tricky part is that the addcmpbCC instruction does NOT modify
> condition codes, while the cmp instruction does. Nothing you could solve
> in the linker...
>
> OK, it seems I'll have to go with the worst-case variant.

You can support both. You output the short case when the target is close enough and the longer variant otherwise.

Jeff
Re: Fwd: Legal Prerequisites contributions
On Sun, 2020-03-01 at 14:37 +0100, Michael de Lang wrote:

> Dear Sir/Madam,
>
> I'm working on implementing pr0980r1 and people in the #gcc channel told
> me to get the legal process started asap. I am willing to sign copyright
> assignments as laid out on https://gcc.gnu.org/contribute.html

Contact ass...@gnu.org to get your paperwork started.

Thanks, Jeff
Away on PTO, expect some delays in GCC patch reviews
I'm away on PTO for the next couple weeks, which likely means that patch review times will suffer. Normally I'd ask Richard Henderson to help cover, but he's going to be on PTO as well. It's probably safe to assume that when I return there will be a bit of a patch backlog and I'll have a higher than usual backlog of non-patch-review things to be doing for Red Hat. So, at least for the next few weeks, patch review will be slower than usual. Please be patient :-) Jeff
Re: ifcvt limitations?
On 06/10/2015 07:36 AM, Kyrill Tkachov wrote:

> Thanks, I've made some progress towards making it more aggressive.
> A question since I'm in the area... noce_try_cmove_arith that I've been
> messing around with has this code:
>
>   /* A conditional move from two memory sources is equivalent to a
>      conditional on their addresses followed by a load.  Don't do this
>      early because it'll screw alias analysis.  Note that we've
>      already checked for no side effects.  */
>   /* ??? FIXME: Magic number 5.  */
>   if (cse_not_expected
>       && MEM_P (a) && MEM_P (b)
>       && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b)
>       && if_info->branch_cost >= 5)
>
> Any ideas on where the rationale for that 5 came from? I see it's been
> there since the very introduction of ifcvt.c. I'd like to replace it
> with something more sane, maybe even remove it?

Richard was working on Itanic at the time. So I can speculate that the transformation wasn't generally profitable on other targets, so he picked a value that was high enough for the code to only trigger on Itanic (and perhaps Alphas, since he was still doing a lot of work on them and knew their properties quite well).

Richard is currently on PTO, so I don't think you're likely to get a quick response from him with further details.

jeff
Re: set_src_cost lying comment
On 06/21/2015 11:57 PM, Alan Modra wrote:

> set_src_cost says it is supposed to
>
>   /* Return the cost of moving X into a register, relative to the
>      cost of a register move.  SPEED_P is true if optimizing for
>      speed rather than size.  */
>
> Now, set_src_cost of a register move (set (reg1) (reg2)) is zero. Why?
> Well, set_src_cost is used just on the right hand side of a SET, so the
> cost is that of (reg2), which is zero according to rtlanal.c rtx_cost.
> targetm.rtx_costs doesn't get a chance to modify this.
>
> Now consider (set (reg1) (ior (reg2) (reg3))), for which set_src_cost on
> rs6000 currently returns COSTS_N_INSNS (1). It seems to me that this
> also ought to return zero, if the set_src_cost comment is to be
> believed. I'd claim the right hand side of this expression costs the
> same as a register move. A register move machine insn "mr reg1,reg2" is
> encoded as "or reg1,reg2,reg2" on rs6000!

Certainly seems inconsistent -- all the costing stuff should be revisited. The basic design for costing dates back to the m68k/vax era.

I certainly agree that the cost of a move, logicals and arithmetic is essentially the same at the chip level for many processors. But a copy has other properties that make it "cheaper" -- namely we can often propagate it away or arrange for the source & dest of the copy to have the same hard register, which achieves the same effect. So one could argue that a copy should have cost 0, as it has a reasonable chance of just going away, while logicals and alu operations on the appropriate chips should have a cost of 1.

jeff
Re: set_src_cost lying comment
On 06/24/2015 03:18 AM, Alan Modra wrote:

> On Tue, Jun 23, 2015 at 11:05:45PM -0600, Jeff Law wrote:
> > I certainly agree that the cost of a move, logicals and arithmetic is
> > essentially the same at the chip level for many processors. But a copy
> > has other properties that make it "cheaper" -- namely we can often
> > propagate it away or arrange for the source & dest of the copy to have
> > the same hard register which achieves the same effect. So one could
> > argue that a copy should have cost 0 as it has a reasonable chance of
> > just going away, while logicals, alu operations on the appropriate
> > chips should have a cost of 1.
>
> That's an interesting point, and perhaps true for rtl expansion. I'm not
> so sure it is correct for later rtl passes where you'd like to
> discourage register moves.

It was the best I could come up with :-) I certainly don't know the history behind the choices.

> Case in point: The rs6000 backend happens to use zero for the cost of
> setting registers to simple constants. That might be an accident, but
> when I fixed this by making (set (reg) (const_int)) cost one insn as it
> actually does for a range of constants, I found some call sequences
> regressed. A call like foo(0,0) is better as
>
>   (set (reg r3) (const_int 0))     li 3,0
>   (set (reg r4) (const_int 0))     li 4,0
>   (call ...)                       bl foo
>
> rather than
>
>   (set (reg r3) (const_int 0))     li 3,0
>   (set (reg r4) (reg r3))          mr 4,3
>   (call ...)                       bl foo
>
> CSE will say the second sequence is cheaper if loading a constant is
> more expensive than a copy. In reality the second sequence is less
> preferable since you have a register dependency.

Agreed 100%.

> A similar problem happens with foo(x+1,x+1) which currently emits
>
>   (set (reg r3) (plus (reg x) (const_int 1)))
>   (set (reg r4) (reg r3))
>
> for the arg setup insns. On modern processors it would be better as
>
>   (set (reg r3) (plus (reg x) (const_int 1)))
>   (set (reg r4) (plus (reg x) (const_int 1)))
>
> So in these examples we'd really like register moves to cost one insn.
> Hmm, at least, moves from hard regs ought to cost something.

Agreed again. These are good examples of things the costing model simply wasn't ever designed to consider -- because they weren't significant issues on the m68k, vax and other ports in the gcc-1 era.

So I don't really know how to tell you to proceed -- I've considered the costing models fundamentally flawed for many years, but haven't ever tried to come up with something that works better.

Jeff
Re: set_src_cost lying comment
On 06/24/2015 03:18 AM, Alan Modra wrote:

> So in these examples we'd really like register moves to cost one insn.
> Hmm, at least, moves from hard regs ought to cost something.

The more I think about it, the more I think that's a reasonable step. Nothing should have cost 0.

Jeff
Re: set_src_cost lying comment
On 06/25/2015 06:28 AM, Richard Earnshaw wrote:

> On 24/06/15 17:47, Jeff Law wrote:
> > On 06/24/2015 03:18 AM, Alan Modra wrote:
> > > So in these examples we'd really like register moves to cost one
> > > insn. Hmm, at least, moves from hard regs ought to cost something.
> >
> > The more I think about it, the more I think that's a reasonable step.
> > Nothing should have cost 0.
>
> It really depends on what you mean by cost here. I think rtx_cost is
> really talking about delta costs much of the time, so when recursing down
> plus (op1, op2) there's a cost from the plus, a cost from op1 and a cost
> from op2.
>
> I believe the idea behind reg being cost 0 is that when op1 and op2 are
> both registers in the above expression the overall cost is just the cost
> of the plus, with no additional cost coming from the operands.

Perhaps, but it's also the case on the PPC and a variety of other architectures that there's no difference between a reg and some constants. And on some architectures the PLUS is no different than a logical operation or even a copy.

> This leaves only one problem: if the entire expression is just reg then
> the overall cost becomes zero. We hit this problem because we only look
> at the source cost, not the overall insn cost. Mostly that's ok, but in
> the specific case of a move instruction it doesn't really generate the
> desired result. Perhaps the best thing to do is to use the OUTER code to
> spot the specific case where you've got a SET and return non-zero in
> that case.

Seems like it's worth an experiment.

jeff
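[A sketch of that experiment. The hook shape follows the GCC 5-era TARGET_RTX_COSTS signature; everything else here is illustrative rather than actual rs6000 code.]

  /* A bare REG as the source operand of a SET (i.e. a plain register
     move) gets a cost of one insn, while a REG nested inside PLUS,
     IOR, etc. still contributes nothing extra.  */
  static bool
  example_rtx_costs (rtx x, int code, int outer_code, int opno,
                     int *total, bool speed)
  {
    if (code == REG && outer_code == SET && opno == 1)
      {
        *total = COSTS_N_INSNS (1);
        return true;   /* Cost is final; don't recurse further.  */
      }
    return false;      /* Fall back to the generic costing.  */
  }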
Re: Proposal for merging scalar-storage-order branch into mainline
On 06/09/2015 04:52 AM, Richard Biener wrote:

> On Tue, Jun 9, 2015 at 12:39 PM, Eric Botcazou wrote:
> > > What's the reason to not expose the byte swapping operations earlier,
> > > like on GIMPLE? (or even on GENERIC?)
> >
> > That would be too heavy, every load and store in GENERIC/GIMPLE would
> > have an associated byte swapping operation, although you don't know if
> > they will be needed in the end. For example, if the structure is
> > scalarized, they are not.
>
> Yes, but I'd expect them to be optimized away (well, hopefully). Anyway,
> I thought use of the feature would be rare, so that "every load and
> store" is still very few?

Seems like it'd be a great way to test the effectiveness of our bswap pass :-)

jeff
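[For anyone who wants to try exactly that: the canonical open-coded byte swap the bswap pass is meant to recognize looks like this.]

  #include <stdint.h>

  /* Open-coded big-endian 32-bit load: the bswap pass should turn
     this into a single load plus (on little-endian targets) a byte
     swap.  */
  uint32_t
  load_be32 (const unsigned char *p)
  {
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
  }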
Re: Proposal for merging scalar-storage-order branch into mainline
On 06/09/2015 10:20 AM, Eric Botcazou wrote:

> > Because some folks don't want to audit their code to where to add
> > byteswaps. I am serious people have legacy big-endian code they want to
> > run little endian. There is a reason this is around in the first place.
> > Developers are lazy.
>
> That's a little rough, but essentially correct in our experience.

Agreed on both points. These legacy codebases can be large and full auditing may not really be that feasible.

jeff
Re: C++ coding style inconsistencies
On 06/25/2015 12:28 PM, Richard Sandiford wrote:

> Sorry in advance for inviting a bikeshed discussion, but while making the
> hashing changes that I just committed, I noticed that the C++ification
> has been done in a variety of different styles. I ended up having to
> follow the "do what the surrounding code does" principle that some code
> bases have, but to me that's always seemed like an admission of failure.
> One of the strengths of the GCC code base was always that it was written
> in a very consistent style. Regardless of what you think of that style
> (I personally like it, but I know others don't at all), it was always
> easy to work on a new area of the compiler without having to learn how
> the surrounding code preferred to format things. It would be a shame if
> we lost that in the rush to make everything "more C++".
>
> The three main inconsistencies I saw were:
>
> (1) Should inline member functions be implemented inside the class or
> outside the class? If inside, should they be formatted like this:
>
>   void foo (args...)
>   {
>     ...;
>   }
>
> or like this:
>
>   void
>   foo (args...)
>   {
>     ...;
>   }
>
> (both have been used). The coding standard is pretty clear about this
> one: Define all members outside the class definition. That is, there are
> no function bodies or member initializers inside the class definition.
> But in-class definitions have become very common. Do we want to revisit
> this? Or do we just need more awareness of what the rule is supposed to
> be? [Personally I like the rule. The danger with in-class definitions
> is that it becomes very hard to see the interface at a glance. It
> obviously makes things more verbose though.]

I'd say let's go with the existing rule. I know that in-class definitions are relatively common, particularly if they are trivial, so do we want an exception for anything that fits on a single line?

> (2) Is there supposed to be a space before a template parameter list?
> I.e. is it:
>
>   foo <T>
>
> or:
>
>   foo<T>
>
> ? Both are widely used. The current coding conventions don't say
> explicitly, but all the examples use the second style. It's also more in
> keeping with convention for function parameters. On the other hand, it
> could be argued that the space in:
>
>   foo <T>::thing
>
> makes the binding confusing and looks silly compared to:
>
>   foo<T>::thing
>
> But there again, the second one might look like two unrelated blobs at
> first glance.

I'd go with whatever gnu-indent does with these things. I'd hate for us to settle on a style that requires non-default behaviour from gnu-indent.

> (3) Do we allow non-const references to be passed and returned by
> non-operator functions? Some review comments have pushed back on that,
> but some uses have crept in. [IMO non-const references are too easy to
> misread as normal parameters.]
>
> In all three cases, whether the answer is A or B is less important than
> whether the answer is the same across the code base.

I could make an argument either way on this one...

jeff
Re: C++ coding style inconsistencies
On 06/26/2015 03:50 AM, Martin Jambor wrote:

> Hi,
>
> On Thu, Jun 25, 2015 at 04:59:51PM -0400, David Malcolm wrote:
> > On Thu, 2015-06-25 at 19:28 +0100, Richard Sandiford wrote:
> > > Sorry in advance for inviting a bikeshed discussion, but while making
> > > the hashing changes that I just committed, I noticed that the
> > > C++ification has been done in a variety of different styles. [...snip...]
> >
> > If we're bike-shedding (sorry, I'm waiting for a bootstrap), do we have
> > a coding standard around the layout of member initialization in
> > constructors?
>
> Yes, https://gcc.gnu.org/codingconventions.html#Member_Form
>
> > i.e. should it be:
> >
> >   foo::foo (int x, int y)
> >   : m_x (x), m_y (y)
> >   {
> >   }
> >
> > vs
> >
> >   foo::foo (int x, int y)
> >     : m_x (x), m_y (y)
> >   {
> >   }
>
> according to the document, the colon should be on the first column if all
> initializers do not fit on one line with the definition. Emacs gnu-style
> indentation does not do that and produces your second case above, which,
> according to some simple grepping, also greatly prevails in the codebase
> now. So perhaps we should change the rule? (how much indentation?)
>
> > https://gcc.gnu.org/wiki/CppConventions
>
> I'd be wary of citing and using this document, IIRC it sometimes
> contradicts the official one and was meant as a basis for discussion when
> we were discussing whether to switch to C++ in the first place.

But it (CppConventions in the wiki) also has a note at the top that makes it explicit that the information on that page is obsolete and refers the reader to the official conventions.

Jeff
Re: Proposal for merging scalar-storage-order branch into mainline
On 06/26/2015 01:56 AM, Richard Biener wrote: On Thu, Jun 25, 2015 at 7:03 PM, Jeff Law wrote: On 06/09/2015 10:20 AM, Eric Botcazou wrote: Because some folks don't want to audit their code to where to add byteswaps. I am serious people have legacy big-endian code they want to run little endian. There is a reason this is around in the first place. Developers are lazy. That's a little rough, but essentially correct in our experience. Agreed on both points. These legacy codebases can be large and full auditing may not really be that feasible. Well - they need a full audit anyway to slap those endian attributes on the appropriate structures. We are not, after all, introducing a -fbig-endian switch. The cases I'm aware of would *love* a -fbig-endian switch :-) "Legacy code base" and "new compiler feature" don't mix in my mind. Assume the legacy platform didn't use GCC (because GCC wasn't a viable option way back when the now legacy platform was state-of-the-art), while the new platform will use GCC and will have an endianness change. In that case features to ease the pain of migration make a goodly amount of sense. jeff
Re: C++ coding style inconsistencies
On 06/27/2015 01:18 AM, Richard Sandiford wrote: Mikhail Maltsev writes: Perhaps one disappointing exception is mixed space/tab indentation. It is often inconsistent (i.e. some parts use space-only indentation). Yeah. Been hitting that recently too. Commit hooks are the solution to this class of problem, IMHO. Either by rejecting code which violates the formatting standards or automagically formatting it for us (via gnu-indent presumably). I believe glibc (for example) is using commit hooks to reject commits that violate certain formatting rules. No reason we couldn't do the same. Not sure what granularity that hook uses -- ie, does it flag preexisting formatting nits or just new ones. Either way works. The former is a bit more burdensome initially, but does get the code base into shape WRT formatting stuff more quickly. Jeff
Re: Multi-version IF-THEN-ELSE conditional
On 06/27/2015 06:10 AM, Ajit Kumar Agarwal wrote: All: The presence of aliases disables many optimizations like CCP (conditional constant propagation), PRE (partial redundancy elimination), and scalar replacement for conditional IF-THEN-ELSE. The presence of aliasing also disables if-conversion. I am proposing multi-versioning of IF-THEN-ELSE: one version without aliasing and the other versions with aliasing. Versioning the IF-THEN-ELSE this way enables CCP, PRE, scalar replacement and if-conversion for the version whose pointer variables do not alias, while the versions that carry alias information are left alone by such optimizations. I don't have examples on hand currently, but I am working to provide them. Probably the hardest part will be the heuristics around this. The optimizations you want to improve happen as separate passes, so there won't necessarily be a good way to predict if the multi-version if-then-else will enable further optimizations. Jeff
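For concreteness, a minimal C sketch of the kind of transform being proposed (the function and the explicit runtime no-alias check are illustrative assumptions, not Ajit's actual implementation):

  void
  f (int *p, int *q, int n)
  {
    if (p + n <= q || q + n <= p)   /* runtime check: p and q don't overlap */
      {
        /* No-alias version: CCP, PRE, scalar replacement and
           if-conversion are free to treat *p and *q as independent.  */
        for (int i = 0; i < n; i++)
          if (q[i] > 0)
            p[i] = q[i];
      }
    else
      {
        /* May-alias version: kept conservative.  */
        for (int i = 0; i < n; i++)
          if (q[i] > 0)
            p[i] = q[i];
      }
  }

The hard part, as Jeff notes, is predicting when the duplication pays for itself.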
Re: gcc feature request / RFC: extra clobbered regs
On 06/30/2015 04:02 PM, H. Peter Anvin wrote: On 06/30/2015 02:55 PM, Andy Lutomirski wrote: On Tue, Jun 30, 2015 at 2:52 PM, H. Peter Anvin wrote: On 06/30/2015 02:48 PM, Andy Lutomirski wrote: On Tue, Jun 30, 2015 at 2:41 PM, H. Peter Anvin wrote: On 06/30/2015 02:37 PM, Jakub Jelinek wrote: I'd say the most natural API for this would be to allow f{fixed,call-{used,saved}}-REG in target attribute. Either that or __attribute__((fixed(rbp,rcx),used(rax,rbx),saved(r11))) ... just to be shorter. Either way, I would consider this to be desirable -- I have myself used this to good effect in a past life (*cough* Transmeta *cough*) -- but not a high priority feature. I think I mean the per-function equivalent of -fcall-used-reg, so hpa's "used" suggestion would do the trick. I guess that clobbering the frame pointer is a non-starter, but five out of six isn't so bad. It would be nice to error out instead of producing "disastrous results", though, if another bad reg is chosen. (Presumably the PIC register on PIC builds would be an example of that.) Clobbering the frame pointer is perfectly fine, as is the PIC register. However, gcc might need to handle them as "fixed" rather than "clobbered". Hmm. True, I guess, although I wouldn't necessarily expect gcc to be able to generate code to call a function like that. No, but you need to be able to call other functions, or you just push the issue down one level. For ia32, the PIC register really isn't special anymore. I'd be surprised if you couldn't clobber it. jeff
Re: Allocation of hotness of data structure with respect to the top of stack.
On 07/05/2015 05:11 AM, Ajit Kumar Agarwal wrote: All: I am wondering whether allocating hot data structures closer to the top of the stack increases the performance of the application. The data structures are identified as hot or cold, all of them are sorted in decreasing order of hotness, and the hot data structures are allocated closer to the top of the stack. Loads and stores accessing stack data will then be faster, with the hot data structures allocated closer to the top of the stack. Based on the above, code is generated with load and store offsets assigned on the stack in decreasing order of hotness. You might want to look at this paper from an old gcc summit conference. Basically they were trying to reorder stack slots to minimize offsets in reg+d addressing for the SH port. It should touch on a number of common issues/goals. ftp://gcc.gnu.org/pub/gcc/summit/2003/Optimal%20Stack%20Slot%20Assignment.pdf I can't recall if they ever tried to submit that work for inclusion. Jeff
Re: rl78 vs cse vs memory_address_addr_space
On 07/01/2015 10:14 PM, DJ Delorie wrote: In this bit of code in explow.c:

  /* By passing constant addresses through registers
     we get a chance to cse them.  */
  if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x))
    x = force_reg (address_mode, x);

On the rl78 it results in code that's a bit too complex for later passes to optimize fully. Is there any way to indicate that the above force_reg() is bad for a particular target? I believe this used to be conditional on -fforce-mem or -fforce-reg or some such option that we deprecated long ago. It'd be helpful if you could be more specific about what can't be handled. combine, for example, was extended to handle larger chains of insns not terribly long ago. jeff
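For what it's worth, the escape hatch DJ is asking about might look something like this (TARGET_WANT_CONST_ADDRESS_CSE is a made-up macro, not an existing target hook; this only sketches the shape of a fix):

  /* explow.c sketch: let a target opt out of CSE-ing constant addresses.
     TARGET_WANT_CONST_ADDRESS_CSE is hypothetical; it would default to 1
     in defaults.h, and rl78 could define it to 0.  */
  if (! cse_not_expected
      && TARGET_WANT_CONST_ADDRESS_CSE
      && CONSTANT_P (x)
      && CONSTANT_ADDRESS_P (x))
    x = force_reg (address_mode, x);

Whether such a knob is acceptable upstream is exactly what Jeff's question about the unhandled code is probing.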
Re: Uninitialized registers handling in the REE pass
On 07/06/2015 09:42 AM, Pierre-Marie de Rodat wrote: Hello, The attached reproducer[1] seems to trigger a code generation issue at least on x86_64-linux: $ gnatmake -q p -O3 -gnatn $ ./p raised PROGRAM_ERROR : p.adb:9 explicit raise Can you please file this as a bug in bugzilla so that it can get tracked? http://gcc.gnu.org/bugzilla jeff
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On 07/07/2015 11:53 AM, Martin Jambor wrote: Hi, I've been asked to look into item one of http://permalink.gmane.org/gmane.linux.kernel/1990397 and found out that at least shrink-wrapping happily moves the prologue past an asm statement, which can be bad if the asm statement contains a call instruction. Am I right in concluding that this is a bug? Looking into the manual and at requires_stack_frame_p() in shrink-wrap.c, I do not see any obvious way of marking the asm statement as requiring the stack frame (but I will not mind being proven wrong). Do we want to create one, such as only disallowing moving the prologue past volatile asm statements? Any other ideas? Shouldn't this be driven by dataflow? jeff
Re: s390: larl for SImode on 64-bit
On 07/08/2015 02:33 PM, DJ Delorie wrote: Is there any reason that LARL can't be used to load a 32-bit symbolic value, in 64-bit mode? On TPF (64-bit) the app has the option of being loaded in the first 4Gb so that all symbols are also valid 32-bit addresses, for backward compatibility. (and if not, the linker would complain) It would seem that we'd want the compiler to know when the app is going to be loaded into that first 4G so that it can use the more efficient addressing modes. Jeff
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On 07/08/2015 02:51 PM, Josh Poimboeuf wrote: On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote: On Wed, Jul 08, 2015 at 05:36:31AM -0500, Segher Boessenkool wrote: On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote: For other archs, e.g. x86-64, you can do

  register void *sp asm("%sp");
  asm volatile("call func" : "+r"(sp));

I've found that putting "sp" in the clobber list also seems to work:

  asm volatile("call func" : : : "sp");

This syntax is nicer because it doesn't need a local variable associated with the register. Do you see any issues with this approach? Given that SP isn't subject to register allocation, I'd expect it's fine. Note that some folks have (loudly) requested that GCC issue an error if an asm tries to clobber sp. The call doesn't actually clobber the stack pointer, does it? ISTM that a use of sp makes more sense and is better "future proof'd" than clobbering sp. Jeff
Re: s390: larl for SImode on 64-bit
On 07/08/2015 03:05 PM, DJ Delorie wrote: In the TPF case, the software has to explicitly mark such pointers as SImode (such things happen only when structures that contain addresses can't change size, for backwards compatibility reasons[1]):

  int * __attribute__((mode(SImode))) ptr;
  ptr = &some_var;

So in effect, we have two pointer sizes, 64 being the default, but we can also get a 32-bit pointer via the syntax above? Wow, I'm surprised that works. And the only time we'd be able to use larl is a dereference of a pointer declared with the syntax above. Right. OK for the trunk with a simple testcase. I think you can just scan the assembler output for the larl instruction. So I wouldn't consider this the "default" case for those apps, just *a* case that needs to be handled "well enough", and the user is already telling the compiler that they assume those addresses are 32-bit (that either the whole app, or at least the part with that object, will be linked below 4Gb). The majority of the addresses are handled as 64-bit. [1] /me refrains from commenting on the worth of such practices, just that they exist and need to be (and have been) supported. Understood, but we also need to make sure that we don't do something that breaks things. Thus I needed to know the tidbit about explicitly declaring those pointers as SImode. jeff
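A sketch of what that simple testcase might look like (the target selector and options are untested guesses; only the attribute syntax and the scan for larl come from the thread):

  /* { dg-do compile { target s390*-*-* } } */
  /* { dg-options "-O2 -m64" } */

  int some_var;
  int * __attribute__((mode(SImode))) ptr;

  void
  foo (void)
  {
    ptr = &some_var;
  }

  /* { dg-final { scan-assembler "larl" } } */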
Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE
On 06/02/2015 10:43 PM, Ajit Kumar Agarwal wrote: -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Tuesday, June 02, 2015 9:19 PM To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote: I don't offhand know if any of the benchmarks you cite above are free-enough to derive a testcase from. But one trick many of us use is to instrument the pass and compile some known free software (often gcc itself) to find triggering code and use that to generate tests for the new transformation. I will add tests in the suite. I could see many existing tests in the suite also get triggered with this optimization. Thanks. For cases in the existing testsuite where you need to change the expected output, it's useful to note why the expected output was changed. Sometimes a test is compromised by a new optimization, sometimes the expected output is changed and is papering over a problem, etc., so it's something we look at reasonably closely. Thanks. I will modify accordingly.

  diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
  index 9faa339..559ca96 100644
  --- a/gcc/cfghooks.c
  +++ b/gcc/cfghooks.c
  @@ -581,7 +581,7 @@ delete_basic_block (basic_block bb)
         /* If we remove the header or the latch of a loop, mark the loop
            for removal.  */
  -      if (loop->latch == bb
  +      if (loop && loop->latch == bb
             || loop->header == bb)
           mark_loop_for_removal (loop);

So what caused you to add this additional test? In general loop structures are supposed to always be available. The change here implies that the loop structures were gone at some point. That seems at first glance a mistake. I was using gimple_duplicate_bb, which will not add the duplicate basic block inside current_loops. That's why the above condition is required. I am now using duplicate_block instead of gimple_duplicate_bb. With this change the above check on loop is not required, as it adds the duplicate basic block inside the loops. OK. Good to hear it's not required anymore.

  diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
  index aed5254..b25e409 100644
  --- a/gcc/tree-cfg.c
  +++ b/gcc/tree-cfg.c
  @@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val)
       }
   }

  +void
  +gimple_threaded_merge_blocks (basic_block a, basic_block b)

If we keep this function it will need a block comment. I say "if" for a couple reasons. First, we already have support routines that know how to merge blocks. If you really need to merge blocks you should try to use them. Second, I'm not sure that you really need to worry about block merging in this pass. Just create the duplicates, wire them into the CFG and let the existing block merging support handle this problem. The above routine is not merging but duplicates the join nodes into its predecessors. If I change the name of the above function to gimple_threaded_duplicating_join_node it should be fine. But you don't need to duplicate into the predecessors. If you create the duplicates and wire them into the CFG properly, the existing code in cfgcleanup should take care of this for you. Certainly I will do it.
  diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
  index 4303a18..2c7d36d 100644
  --- a/gcc/tree-ssa-threadedge.c
  +++ b/gcc/tree-ssa-threadedge.c
  @@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e,
     return 0;
   }

  +static void
  +replace_threaded_uses (basic_block a, basic_block b)

If you keep this function, then it'll need a function comment. It looks like this is just doing const/copy propagation. I think a better structure is to implement your optimization as a distinct pass, then rely on existing passes such as update_ssa, DOM, CCP to handle updating the SSA graph and propagation opportunities exposed by your transformation. Similarly for the other replace_ functions. I think these replace_ functions are required, as the existing DOM, CCP and propagation opportunities don't perform the propagation shown below:

  <bb ...>:
  xk_124 = MIN_EXPR <...>;
  xc_126 = xc_121 - xk_6;
  xm_127 = xm_122 - xk_6;
  xy_128 = xy_123 - xk_6;
  *EritePtr_14 = xc_126;
  MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
  MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
  EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B];
  MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
  i_137 = i_4 + 1;
  goto <bb ...>;

  <bb ...>:
  xk_125 = MIN_EXPR <...>;
  xc_165 = xc_121 - xk_6;
  xm_166 = xm_122 - xk_6;
  xy_167 = xy_123 - xk_6;
  *EritePtr_14 = xc_126;
  MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
  MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
  EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
  MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
  i_173 = i_4 + 1;

These two blocks are the predecessors of the join node.
Re: GCC/JIT and precise garbage collection support?
On 07/10/2015 09:04 AM, Armin Rigo wrote: Hi David, On 10 July 2015 at 16:11, David Malcolm wrote: AIUI, we have CALL_INSN instructions all the way through the RTL phase of the backend, so we can identify which locations in the generated code are calls; presumably we'd need at each CALL_INSN to determine somehow which RTL expressions tagged as being GC-aware are live (perhaps a mixture of registers and fp-offset expressions?) So presumably we could use that information (maybe in the final pass) to write out some metadata describing for each %pc callsite the relevant GC roots. Armin: does this sound like what you need? Not quite. I can understand that you're trying to find some solution with automatic discovery of the live variables of a "GC pointer" type and so on. This is more than we need, and if we had that, then we'd need to work harder to remove the extra stuff. We only want the end result: attach to each CALL_INSN a list of variables which should be stored in the stack map for that call, and be ready to see these locations be modified from outside across the call if a GC occurs. I wonder how much overlap there is between this need and what we're going to need to do for resumable functions which are being discussed in the ISO C++ standards meetings. jeff
Re: Spurious parallel make failures in libgcc.
On 07/15/2015 08:33 AM, Andrew MacLeod wrote: Oh, wait, this isn't a scratch build this time, this is an incremental rebuild I just noticed (I was doing stuff on multiple machines; turns out the other one was a scratch build). make -j16 from the root build directory. And it may have happened because I was putzing around with a .awk file in gcc which required rebuilding... so maybe configure did require re-running somehow? Still seems to be a timing issue though. Maybe if gthr-default.h already existed (as well as config.status), the makefile would spawn the libgcov-interface.c object builds... meanwhile a reconfigure is going on which ends up overwriting gthr-default.h at what turns out to be a poor time? That sort of makes sense, I guess. I'm not sure how we synchronize the parallel bits. I always blow away the target directory (and stage2-* stage3-*) when I do an incremental bootstrap. If I had to guess, something is missing a dependency which allows compilation of libgcov-interface to start prior to configure re-running (since I believe it's configure that sets up gthr-default.h). I bet if you made libgcov-interface.o (and anything else which depends on gthr.h or gthr-default.h) depend on config.status, this problem would go away. I'm not sure what the real fix is, but it's got to be a missing dependency that allows libgcov-interface.c to build prior to configure being completed. jeff
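In Makefile terms, the missing edge Jeff describes would be something like this (the exact rule names in libgcc/Makefile.in are assumptions):

  # Sketch: nothing that includes gthr.h/gthr-default.h may start
  # compiling until configure has finished writing gthr-default.h.
  gthr-default.h: config.status
  libgcov-interface.o: gthr-default.h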
Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression
On 07/28/2015 12:18 PM, Alex Velenko wrote: On 21/04/15 06:27, Jeff Law wrote: On 04/20/2015 01:09 AM, Shiva Chen wrote: Hi, Jeff Thanks for your advice. can_replace_by.patch is the new patch to handle both cases. pr43920-2.c.244r.jump2.ori is the original jump2 rtl dump pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump after patch can_replace_by.patch Could you help me to review the patch? Thanks. This looks pretty good. I expanded the comment for the new function a bit and renamed the function in an effort to clarify its purpose. From reviewing can_replace_by, it seems it should have been handling this case, but clearly wasn't due to implementation details. I then bootstrapped and regression tested the patch on x86_64-linux-gnu where it passed. I also instrumented that compiler to see how often this code triggers. During a bootstrap it triggers a couple hundred times (which is obviously a proxy for cross jumping improvements). So it's triggering regularly on x86_64, which is good. I also verified that this fixes BZ64916 for an arm-none-eabi toolchain configured with --with-arch=armv7. Installed on the trunk. No new testcase as it's covered by existing tests. Thanks, jeff Hi, I see this patch has been committed in r56 on trunk. Is it okay to port this to fsf-5? It's not a regression, so backporting it would be generally frowned upon. If you feel strongly about it, you should ask Jakub, Joseph or Richi (the release managers) for an exception to the general policy. jeff
Bin Cheng as Loop Induction Variable Optimizations maintainer
I am pleased to announce that the GCC Steering Committee has appointed Bin Cheng as the IVopts maintainer. Please join me in congratulating Bin on his new role. Bin, please update your entry in the MAINTAINERS file. I also believe you have some patches to self-approve :-) Thanks, Jeff
Re: Finding insns to reorder using dataflow
On 08/13/2015 05:06 AM, Kyrill Tkachov wrote: Hi all, I'm implementing a target-specific reorg pass, and one thing that I want to do is, for a given insn in the stream, to find an instruction in the stream that I can swap it with, without violating any dataflow dependencies. The candidate instruction could be earlier or later in the stream. I'm stuck on finding an approach to do this. It seems that using some of the dataflow infrastructure is the right way to go, but I can't figure out the details. can_move_insns_across looks relevant, but it looks too heavyweight with quite a lot of arguments. I suppose somehow constructing regions of interchangeable instructions would be the way to go, but I'm not sure how clean/cheap that would be outside the scheduler. Any ideas would be appreciated. I think you want all the dependency analysis done by the scheduler. Which leads to the question, can you model what you're trying to do in the various scheduler hooks -- in particular, walking through the ready list seems appropriate. jeff
Re: Finding insns to reorder using dataflow
On 08/14/2015 03:05 AM, Kyrill Tkachov wrote: The problem I'm trying to solve can be expressed in this way: "An insn that satisfies predicate pred_p (insn) cannot appear exactly N insns apart from another insn 'insn2' that satisfies pred_p (insn2). N is a constant". So, the problem here is that this restriction is not something expressed in terms of cycles or DFA states, but rather distance in the instruction stream. I wasn't really suggesting to model it in DFA states, but instead use the dependency analysis + hooks. The dependency analysis in particular tells you when it's safe to interchange two insns. Given the additional information, I think you'd want to note when an insn fires and satisfies pred_p, and associate a counter with each firing. The active counters are bumped (decremented?) at each firing (so you can track how many insns appear after the one that satisfied pred_p). Note that for insns which generate multiple assembly instructions, you need to decrement the counter by the number of assembly instructions they emit. Then when sorting the ready list, if you have an insn that satisfies pred_p and an active counter has just reached zero, make sure some other insn fires (what if there aren't any other ready insns? Is this a correctness or performance issue?) I don't think I can do this reliably during sched2 because there is still splitting that can be done that will create more insns that will invalidate any bookkeeping that I do there. Right. You need everything split and you need accurate insn length information for every insn in the backend that isn't split. If this is a correctness issue, then you also have to deal with final deleting insns behind your back as well. Many years ago I did something which required 100% accurate length information from the backend. It was painful, very very painful. Ultimately it didn't work out and the code was scrapped. However, during TARGET_MACHINE_DEPENDENT_REORG I can first split all insns and then call schedule_insns () to do another round of scheduling. However, I'm a bit confused by all the different scheduler hooks and when each one is called in relation to the other. You'll have to work through them -- I haven't kept close tabs on the various hooks we have, I just know we have them. I'd need to keep some kind of bitfield recording for the previous N instructions in the stream whether they satisfy pred_p. Where would I record that? Can I just do everything in TARGET_SCHED_REORDER? i.e. given a ready list, check that no pred_p insns in it appear N insns apart from another such insn (using my bitfield as a lookup helper), reorder insns as appropriate and then record the order of the pred_p insns in the bitfield. Would the scheduler respect the order of the insns that was set by TARGET_SCHED_REORDER and not do any further reordering? The problem I see is that once one of these insns fires, other new insns will be added to the ready list. So you have to keep some kind of state about how many instructions back one of these insns fired and consult that data when making a decision about the next instruction to fire. All this will fall apart if this is a correctness issue, since you'd have to issue a nop or somesuch. Though I guess you might be able to arrange to get a nop into the scheduled stream. If this is a correctness issue, tackling it in the assembler may make more sense. Jeff
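If the bookkeeping can live entirely in the ready-list hook, the shape might be something like the following sketch (pred_p, N and insns_since_pred_p are placeholders for the real target logic; the distance counter itself would have to be maintained from TARGET_SCHED_VARIABLE_ISSUE, with all the length caveats Jeff gives above):

  /* Sketch of a TARGET_SCHED_REORDER implementation.  ready[*n_readyp - 1]
     is the insn that issues next; if issuing it now would place two pred_p
     insns exactly N insns apart, swap a safe candidate into that slot.  */
  static int
  example_sched_reorder (FILE *file ATTRIBUTE_UNUSED,
                         int verbose ATTRIBUTE_UNUSED,
                         rtx_insn **ready, int *n_readyp,
                         int clock ATTRIBUTE_UNUSED)
  {
    int n = *n_readyp;

    if (n > 1
        && insns_since_pred_p () == N - 1
        && pred_p (ready[n - 1]))
      for (int i = n - 2; i >= 0; i--)
        if (!pred_p (ready[i]))
          {
            rtx_insn *tmp = ready[i];
            ready[i] = ready[n - 1];
            ready[n - 1] = tmp;
            break;
          }

    /* Issue one insn this cycle; a real port would return its issue rate.  */
    return 1;
  }

As Jeff says, this only helps for the performance interpretation; if no safe insn is ready, a correctness-driven port still has to get a nop into the stream from somewhere.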
Re: Deprecate SH5/SH64
On 08/18/2015 11:11 AM, David Edelsohn wrote: On Tue, Aug 18, 2015 at 1:00 PM, Oleg Endo wrote: Hi all, Kaz and I have been discussing the SH5/SH64 status, which is part of the SH port, every now and then. To our knowledge, there is no real hardware available as of today and we don't think there are any real users for a SH5/SH64 toolchain out there. Moreover, the SH5/SH64 parts of the SH port haven't been touched by anybody for a long time. The only exception is occasional ad-hoc fixes for bug reports from people who build GCC for every architecture that is listed in the Linux kernel. However, we don't actually know whether code compiled for SH5/SH64 still runs at an acceptable level since nobody has been doing any testing for that architecture for a while now. If there are no objections, we would like to deprecate SH5/SH64 support as of GCC 6. Initially this would include an announcement on the changes page and the removal of any documentation related to SH5/SH64. After GCC 6 we might start removing configure options and the respective code paths in the target. +1 Works for me based on what I've heard independently about sh5 hardware situation. Frankly, I think we should be more aggressive about this kind of port/variant pruning across the board. Jeff
Re: Question about "instruction merge" pass when optimizing for size
On 08/19/2015 02:38 PM, DJ Delorie wrote: I've seen this on other targets too, sometimes so bad I write a quick target-specific "stupid move optimizer" pass to clean it up. A generic pass would be much harder, but very useful. More important is to determine *why* we're getting these patterns. In the IRA/LRA world, they should be a lot less common. Jeff
Re: Question about "instruction merge" pass when optimizing for size
On 08/20/2015 01:07 AM, sa...@hederstierna.com wrote: From: Jeff Law More important is to determine *why* we're getting these patterns. In the IRA/LRA world, they should be a lot less common. Yes, I agree this phenomenon seems more common after introducing LRA. Though I was thinking that such a pass could still be relevant. Think hypothetically of an architecture, let's call it cortex-X, and assume this specific target has an opcode for ADD with 5 operands. Optimal code for a = a + b + c + d would be addx Ra,Ra,Rb,Rc,Rd. Where in the optimization process would we introduce the merging into this target-specific instruction? Can the more generic IRA/LRA handle this? Lots of passes could be involved. It's better to work with a real example on a real target for this kind of discussion. Assuming sensible three-address code comes out of the gimple with non-overlapping lifetimes, then I'd expect this to be primarily a combiner issue. And maybe patterns can appear across different BBs, or somewhere that the normal optimizers find hard to see or figure out? Sorry if I'm ignorant, I don't know the internals of the different optimizers, but I'm trying to learn and understand how to move forward on this code-size issue we have. (I tried to file some bugs on it as well: Bug 61578 and Bug 67213.) Unfortunately, 61578 has multiple testcases. Each should be its own bug that can be addressed and tracked individually. Peeking at the last testcase in c#19 is interesting. Presumably the typecasting is necessary to avoid doing multiple comparisons and the assumption is that the casts will be NOPs at the RTL level. That assumption seems to be fine through IRA. The allocation seems sane, except there's a reload needed for thumb1_addsi3_addgeu to ensure operand 1 and operand 0 match due to the matching constraint. That points to two issues. 1. Is IRA correctly tracking the need for those two operands to be the same and accounting for that in its cost model? 2. In the case where IRA still generates code that needs a reload, why was the old reload code able to eliminate the copy while LRA can't? 67213 is probably a costing issue somewhere. Since Richi is already involved, I'll let the two of you dig into the details. jeff
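For the hypothetical cortex-X case, the combiner would need a pattern to collapse into, along these lines (entirely invented, just to show where the merging would be described):

  ;; Hypothetical 5-operand add for the imaginary cortex-X target.
  ;; Given this pattern, combine can merge the chain of three-address
  ;; adds produced for a = a + b + c + d into a single addx insn.
  (define_insn "*addxsi4"
    [(set (match_operand:SI 0 "register_operand" "=r")
          (plus:SI
            (plus:SI
              (plus:SI (match_operand:SI 1 "register_operand" "r")
                       (match_operand:SI 2 "register_operand" "r"))
              (match_operand:SI 3 "register_operand" "r"))
            (match_operand:SI 4 "register_operand" "r")))]
    ""
    "addx\t%0, %1, %2, %3, %4")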
Re: Possible issue with using LAST_INSN_CODE
On 08/20/2015 02:54 AM, Claudiu Zissulescu wrote: Hi, LAST_INSN_CODE is used to mark the last instruction code valid for a particular architecture (e.g., for ARM the value of LAST_INSN_CODE is 3799). Also this code (i.e., 3799) is used by a predicated instruction (e.g., for ARM this code is used by the predicated version of arm_usatsihi => {*p arm_usatsihi}). However, the LAST_INSN_CODE macro is used by lra, recog and tree-vect-stmts to dimension various arrays, which may lead to various errors. For example, when calling preprocess_insn_constraints (recog.c:2444), the compilation may go berserk when evaluating the "if (this_target_recog->x_op_alt[icode])" line when icode is exactly LAST_INSN_CODE, as "this_target_recog->x_op_alt" is dimensioned up to LAST_INSN_CODE (recog.h:397). A possible solution is for the LAST_INSN_CODE value to be exactly the value returned by get_num_insn_codes() (gencodes.c:89). Alternatively, use LAST_INSN_CODE+1 when dimensioning an array. Please can someone confirm my observation, and what would be the best solution for this? It seems to me like something has been broken then. LAST_INSN_CODE is supposed to be higher than any insn defined by the backend. Jeff
Re: Possible issue with using LAST_INSN_CODE
On 08/20/2015 11:28 AM, Claudiu Zissulescu wrote: Hi Jeff, In gencodes.c:89, it explicitly decrements by one the return value of get_num_insn_codes(). While for get_num_insn_codes the following is stated: /* Return the number of possible INSN_CODEs. Only meaningful once the whole file has been processed. */ I can provide an example for the ARC port where it crashes due to the LAST_INSN_CODE issue. Probably it can be reproduced with another, more popular port like ARM. Passing along a test, even for the ARC, is useful. This is something Richard recently changed; it's probably just an oversight on his part. I believe he's in the UK and may be offline for the day. jeff
Re: Moving to git
On 08/20/2015 11:57 AM, Jason Merrill wrote: I hear that at Cauldron people were generally supportive of switching over to git as the primary GCC repository, and talked about me being involved in that transition. Does anyone have more information about this discussion? Our current workflow translates over to a git master pretty easily: basically, in the current git-svn workflow, replace "git svn rebase" and "git svn dcommit" with "git pull --rebase" and "git push". Right. It should be pretty straightforward to use the existing git mirror as the master repository; the main adjustment I'd want to make is rewriting the various subdirectory branches to be properly represented in git. This is straightforward, but we'll want to stop SVN commits to subdirectory branches shortly before the changeover. Seems reasonable. I think we also need to convert our SVN hooks into git hooks, but presumably that'll be easy. I suspect Jakub will strongly want to see some kind of commit hook to associate something similar to an SVN id with each git commit, to support his workflow where the SVN ids are associated with the compiler binaries he keeps around for very fast bisection. I think when we talked about it last year, he just needs an increasing # for each commit, presumably starting with whatever the last SVN ID is when we make the change. It would be good to have a more explicit policy on branch/tag creation, rebasing, and deletion in the git world, where branches are lighter weight and so more transient. Presumably for branch/tag creation the primary concern is the namespace? I think if we define a namespace folks can safely use without getting in the way of the release managers, we get most of what we need. ISTM that within that namespace, folks ought to have the freedom to use whatever works for them. If folks want to create a transient branch, push-rebase-push on that branch, then later remove it, I tend to think, why not let them. Do we want a namespace for branches which are perhaps not as transient in nature, ie longer term projects, projects on-ice or works-in-progress that we don't want to lose? As far as the trunk and release branches, are there any best practices out there that we can draw from? Obviously doing things like push-rebase-push is bad. Presumably there's others. jeff
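As one sketch of how such an increasing id could be derived for that workflow (an illustrative approach, not a decided design):

  # "Revision number" of the current commit: the count of commits
  # reachable from it.  Monotonically increasing on a non-rewritten trunk.
  git rev-list --count HEAD

  # Map such a number back to a commit hash:
  git rev-list --reverse HEAD | sed -n '123456p'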
Re: Moving to git
On 08/20/2015 02:09 PM, Jason Merrill wrote: On 08/20/2015 02:23 PM, Jeff Law wrote: I suspect Jakub will strongly want to see some kind of commit hook to associate something similar to an SVN id with each git commit, to support his workflow where the SVN ids are associated with the compiler binaries he keeps around for very fast bisection. I think when we talked about it last year, he just needs an increasing # for each commit, presumably starting with whatever the last SVN ID is when we make the change. Jakub: How about using git bisect instead, and identify the compiler binaries with the git commit sha1? That would seem to make reasonable sense to me. Jakub is on PTO, so we should re-engage on this tweak to his workflow when he returns. Jeff
Re: Moving to git
On 08/24/2015 02:17 AM, Jakub Jelinek wrote: On Thu, Aug 20, 2015 at 04:09:39PM -0400, Jason Merrill wrote: On 08/20/2015 02:23 PM, Jeff Law wrote: I suspect Jakub will strongly want to see some kind of commit hook to associate something similar to an SVN id with each git commit, to support his workflow where the SVN ids are associated with the compiler binaries he keeps around for very fast bisection. I think when we talked about it last year, he just needs an increasing # for each commit, presumably starting with whatever the last SVN ID is when we make the change. Jakub: How about using git bisect instead, and identify the compiler binaries with the git commit sha1? That is really not useful. While you speed up bisection somewhat by avoiding network traffic and communication with a server, there is still significant time spent on actually building the compiler. I thought the suggestion was to use the git hash to identify the builds you save. So you'd use git bisect merely to get the hash id. Once you've got the git hash, you can then use that to find the right cc1/cc1plus/f95 that you'd previously built. It's not perfect (since you can't just look at git hashes and know which one is newer). Jeff
Re: Moving to git
On 08/24/2015 09:43 AM, Jakub Jelinek wrote: On Mon, Aug 24, 2015 at 09:34:41AM -0600, Jeff Law wrote: On 08/24/2015 02:17 AM, Jakub Jelinek wrote: On Thu, Aug 20, 2015 at 04:09:39PM -0400, Jason Merrill wrote: On 08/20/2015 02:23 PM, Jeff Law wrote: I suspect Jakub will strongly want to see some kind of commit hook to associate something similar to an SVN id with each git commit, to support his workflow where the SVN ids are associated with the compiler binaries he keeps around for very fast bisection. I think when we talked about it last year, he just needs an increasing # for each commit, presumably starting with whatever the last SVN ID is when we make the change. Jakub: How about using git bisect instead, and identify the compiler binaries with the git commit sha1? That is really not useful. While you speed up bisection somewhat by avoiding network traffic and communication with a server, there is still significant time spent on actually building the compiler. I thought the suggestion was to use the git hash to identify the builds you save. So you'd use git bisect merely to get the hash id. Once you've got the git hash, you can then use that to find the right cc1/cc1plus/f95 that you'd previously built. It's not perfect (since you can't just look at git hashes and know which one is newer). But then you are forced to use git bisect all the time, because the hashes don't tell you anything. True. Most often even before writing a script I try a couple of compiler versions by hand if I have some extra info (this used to work 3 years ago, broke in the last couple of days, etc.). A map of key hashes would probably be helpful with this kind of thing. Major releases, key branch->trunk merge points and the like. It'd still be somewhat worse usability-wise for you, but it ought to be manageable. And like I said before, I'd support a git hook which bumped some kind of index at each commit for your workflow. Perhaps I could touch the cc1.sha1hash files with timestamps corresponding to the date/time of the commit, and keep them sorted in some file manager by timestamps; still it would be worse usability-wise. Not to mention we should keep the existing r123456 comments in bugzilla working, and I'm not convinced keeping an SVN version of the repository (frozen) for that purpose is the best idea. I'd like to keep the old ones working, but new references should probably be using the hash id and commit name. As for how to best keep the old r123456 links working, I don't know. Presumably those could be mapped behind the scenes to a git id. Jeff
Re: Offer of help with move to git
On 08/24/2015 01:46 PM, Frank Ch. Eigler wrote: Joseph Myers writes: [...] FWIW, Jason's own trial conversion with reposurgeon got up to at least 45GB memory consumption on a 32GB repository. (The host sourceware.org box has 72GB.) And if Jason really needs it, we've got considerably larger systems in our test farm that he could provision for this task. Jeff
Re: fake/abnormal/eh edge question
On 08/25/2015 12:39 PM, Steve Ellcey wrote: I have a question about FAKE, EH, and ABNORMAL edges. I am not sure I understand all the implications of each type of edge from the description in cfg-flags.def. I am trying to implement dynamic stack alignment for MIPS and I have code that does the following:

  prologue:
    copy incoming $sp to $12 (temp reg)
    align $sp
    copy $sp to $fp (after alignment so that $fp is also aligned)
  entry block:
    copy $12 to virtual reg (DRAP) for accessing args and for restoring $sp
  exit block:
    copy virtual reg (DRAP) back to $12
  epilogue:
    copy $12 to $sp to restore stack pointer

This works fine as long as there is a path from the entry block to the exit block, but in some cases (like gcc.dg/cleanup-8.c) we have a function that always calls abort (a non-returning function) and so there is no path from entry to exit; the exit block and epilogue get removed, and the copy of $sp to $12 also gets removed because GCC sees no uses of $12. I want to preserve the copy of $sp to $12 and I also want to preserve the .cfi pseudo-ops (and code) in the exit block and epilogue in order for exception handling to work correctly. One way I thought of doing this is to create an edge from the entry block to the exit block, but I am unsure of all the implications of creating a fake/eh/abnormal edge to do this and which I would want to use. Presumably it's the RTL DCE pass that's eliminating this stuff? Do you have the FRAME_RELATED bit set on those insns? But what I don't understand is why preserving the code is useful if it can't be reached. Maybe there's something about the dwarf2 unwinding that I simply don't understand -- I've managed to avoid learning about it for years. jeff
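For reference, the bit Jeff asks about gets set on the prologue insns roughly like this (a sketch; hard register 12 and Pmode stand in for the real MIPS details):

  /* In the prologue expander: mark the $sp -> $12 copy as frame-related
     so that a CFI note is generated for it and later passes treat it as
     part of the prologue.  */
  rtx_insn *insn = emit_move_insn (gen_rtx_REG (Pmode, 12),
                                   stack_pointer_rtx);
  RTX_FRAME_RELATED_P (insn) = 1;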
Re: fake/abnormal/eh edge question
On 08/25/2015 03:54 PM, Steve Ellcey wrote: On Tue, 2015-08-25 at 14:44 -0600, Jeff Law wrote: I want to preserve the copy of $sp to $12 and I also want to preserve the .cfi pseudo-ops (and code) in the exit block and epilogue in order for exception handling to work correctly. One way I thought of doing this is to create an edge from the entry block to the exit block, but I am unsure of all the implications of creating a fake/eh/abnormal edge to do this and which I would want to use. Presumably it's the RTL DCE pass that's eliminating this stuff? Actually, it looks like it is peephole2 that is eliminating the instructions (and .cfi pseudo-ops). Strange. I'm not sure why peep2 would be deleting those instructions, except perhaps as a side effect of a cfgcleanup or somesuch. that I simply don't understand -- I've managed to avoid learning about it for years. I am not entirely sure I need the code or if I just need the .cfi pseudo-ops and that I need the code to generate the .cfi stuff. I wish I could avoid the dwarf unwinder, but that seems to be the main problem I am having with stack realignment. Getting the cfi stuff right so that the unwinder works properly is proving very hard. Yea, unfortunately I can't help much there. I see dwarf-anything and my eyes just glaze over and I thank the powers that be that Jakub, Jason and others are around to handle that stuff. jeff
Re: 33 unknowns left
On 08/26/2015 01:31 PM, Eric S. Raymond wrote:

  mib = mib        Michael Bushnell. Again, not active in forever. m...@geech.gnu.ai.mit.edu probably doesn't work anymore.
  miles = miles    Miles Bader. mi...@gnu.ai.mit.edu
  mkoch = mkoch    Michael Koch? konque...@gmx.de/
  moore = moore    Catherine, Tim?
  mycroft = mycroft    Charles Hannum. Hasn't been active in forever. mycr...@gnu.ai.mit.edu probably doesn't work anymore.

Might help if we had a reference to one or more changes from the folks. Just knowing timeframes, for example, would likely resolve them. Jeff
Re: 33 unknowns left
On 08/26/2015 02:09 PM, Eric S. Raymond wrote: Jeff Law : On 08/26/2015 01:31 PM, Eric S. Raymond wrote: mib = mib Michael Bushnell. Aagain, not active in forever. m...@geech.gnu.ai.mit.edu probably doesn't work anymore. miles = miles Miles Bader. mi...@gnu.ai.mit.edu mycroft = mycroft Charles Hannum. Hasn't been active in forever. mycr...@gnu.ai.mit.edu probably doesn't work anymore. Right, I recognize these people as long-time hard-core GNU contributors. It would be a bit surprising if they *weren't* in the history anywhere. Adding them now... That's why those 3 popped out at me. moore = moore Catherine, Tim? The more I think about it, it's more likely Tim. Catherine typically used clm@ and Tim used moore@. Certainly if it was a change to the PA port, then it was Tim. Jeff
Re: 33 unknowns left
On 08/26/2015 02:35 PM, Eric S. Raymond wrote: Joseph Myers : On Wed, 26 Aug 2015, Eric S. Raymond wrote: After comparing with the Subversion history and passwd file, there are 30 unknowns left. Can anyone identify any of these?

  aluchko = aluchko    Aaron Luchko

Aha. I thought that was him. I found his SourceForge account.

  bo = bo    Bo Thorsen

Oh, thank you. I had *no* idea how I was going to pin down that one. It's an unpromising string for web searches.

  ira = ira      Ira Ruben
  irar = irar    Ira Rosen

I pretty much knew these two guys went with these two names, but couldn't figure out which was which. Thanks. [others omitted] All of the above are emails from the time of some commits, not necessarily current. That's OK. Addresses will go stale. The important thing is to preserve as good odds as possible that future data mining will be able to recognize when different name/address pairs with the same name-among-humans refer to the same person. The remaining list is pretty short:

  bson = bson
  fx = fx

fx is active... Francois-Xavier Coudert fxcoud...@gcc.gnu.org. Not sure how I missed that the first time around.
Re: 33 unknowns left
On 08/26/2015 02:44 PM, Ian Lance Taylor wrote:

  friedman = friedman    Noah Friedman (was ).

Yea.

  fx = fx    Dave Love (was ).

Hmm, not Francois-Xavier Coudert? I guess if it's an old commit, then Dave Love is more likely. Given most of the names we're pulling out are from old contributors, Dave is probably the most likely.

  hassey = hassey    John Hassey (was ).

Yes.

  jrv = jrv    James Van Artsdalen (was ).

Yes.

  karl = karl    Karl Berry (was k...@cs.umb.edu).

Yes.

  moore = moore    Timothy Moore (was ). (Catherine Moore is clm).

I think the consensus is Tim. He'll also be moore@*.cs.utah.edu

  wood = wood    Tom Wood (was ).

Yes. jeff
Re: 33 unknowns left
On 08/26/2015 02:50 PM, Eric S. Raymond wrote: Jeff Law : moore = moore Catherine, Tim? The more I think about it, it's more likely Tim. Catherine typically used clm@ and Tim used moore@. Certainly if it was a change to the PA port, then it was Tim. What was his address? Most were @cs.utah.edu, but he left Utah eons ago. He was @redhat.com for a while, not sure after that. About Catherine Moore: One of the things that is quite noticeable about this list is the number of apparently female contributors, which while regrettably small in absolute terms is still rather more than I'm used to seeing in a sample this size. This makes me curious. Was any special effort made to attract female hackers to the project? Is there a prevailing theory about why they showed up in comparatively large numbers? It would be interesting to know what, if any specific thing, was done right here... I can't recall any special effort. Just always trying to encourage anyone to contribute in whatever way they could. jeff
Re: 33 unknowns left
On 08/26/2015 02:54 PM, Joseph Myers wrote: click = click Nick Clifton Wow, never knew 'click' would be Nick. ni...@redhat.com is probably better than ni...@cygnus.com
Re: 33 unknowns left
On 08/26/2015 06:02 PM, Peter Bergner wrote: On Wed, 2015-08-26 at 13:44 -0700, Ian Lance Taylor wrote: On Wed, Aug 26, 2015 at 12:31 PM, Eric S. Raymond wrote: click = click You've got me on that one. Any hints? Just purely looking at the name, did Cliff Click ever contribute to gcc in the past? I don't think so. It was my first thought when I saw click@. jeff
Re: 33 unknowns left
On 08/26/2015 07:37 PM, Joel Sherrill wrote: On August 26, 2015 8:28:40 PM CDT, Jeff Law wrote: On 08/26/2015 06:02 PM, Peter Bergner wrote: On Wed, 2015-08-26 at 13:44 -0700, Ian Lance Taylor wrote: On Wed, Aug 26, 2015 at 12:31 PM, Eric S. Raymond wrote: click = click You've got me on that one. Any hints? Just purely looking at the name, did Cliff Click ever contribute to gcc in the past? I don't think so. It was my first thought when I saw click@. Didn't Amazon get a patent on the one click@? I recall something like that. Seriously the email review has been a walk down memory lane. :) Very much so. Many of those folks pre-date my involvement in GCC. Jeff
Re: Repository for the conversion machinery
On 08/27/2015 10:16 AM, Eric S. Raymond wrote: Paulo Matos : On 27/08/15 16:56, Paulo Matos wrote: I noticed I am not on the list (check commit r225509, user pmatos) either. And thanks for your help on this transition. r188804 | mkuvyrkov Maxim Kuvyrkov jeff
Re: Repository for the conversion machinery
On 08/27/2015 10:04 AM, FX wrote: If the former, then I don't know why they're not in the map. In fact, I can look at the output of “svn log” for the MAINTAINERS file, which probably almost everyone with commit rights has modified. This contains 442 usernames, compared to the map’s 290. And there are probably more, which we’ll miss if we have to rely on manual modifications of that list… How was the map generated? FX PS: I found one username that first escaped my scripts because it contained a period, so I am raising a flag here, so the same doesn’t happen to you: m.hayes (commit 34779). Michael Hayes? Jeff
Re: Ambiguous usernames
On 08/27/2015 11:27 AM, Eric S. Raymond wrote: I'm pretty sure I know who bothner, brendan, drepper, eggert, ian, jimb, meissner, and roland are; they've all had stable handles longer than GCC has existed. Yup. (Raise a glass to Brendan Kehoe; he was a fine hacker and a good man and it's a damn shame we lost him.) Absolutely. Scrutiny should therefore fall particularly on amylaar, bje, bkoz, dje, gavin, kenner, krab, law, meyering, mrs, raeburn, shebs, and wilson. What do you need here? I can confirm that each of those handles corresponds to one specific individual person, with the exception of dje (which we know is Doug Evans and David Edelsohn) and krab, which I don't know the history behind. Jeff
Re: Repository for the conversion machinery
On 08/28/2015 09:26 AM, Joseph Myers wrote: All the cygnus.com addresses are out of date. More current replacements for a few: echristo = Eric Christopher merrill = Jason Merrill (if someone appears with multiple usernames, probably make their address consistent for all of them unless specifically requested otherwise) rsavoye = Rob Savoye Given that I worked for Cygnus and still work with Red Hat, I can make a pass over all the @cygnus.com addresses and probably give something more up-to-date for most of them if that's useful. jeff
Re: Predictive commoning leads to register to register moves through memory.
On 08/28/2015 09:43 AM, Simon Dardis wrote: Following Jeff's advice[1] to extract more information from GCC, I've narrowed the cause down to the predictive commoning pass inserting the load in a loop header style basic block. However, the next pass in GCC, tree-cunroll, promptly removes the loop and joins the loop header to the body of the (non)loop. More oddly, disabling the conditional store elimination pass, or the dominator optimizations pass, or disabling jump-threading with --param max-jump-thread-duplication-stmts=0, nets the above assembly code. Any ideas on an approach for this issue? I'd probably start by looking at the .optimized tree dump in both cases to understand the difference, then (most likely) tracing that through the RTL optimizers into the register allocator. jeff
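Concretely, the comparison Jeff suggests can be driven from dumps like these (all real GCC options; the file name is illustrative):

  # GIMPLE as it leaves the tree optimizers, with and without the pass:
  gcc -O3 -fdump-tree-optimized test.c
  gcc -O3 -fno-predictive-commoning -fdump-tree-optimized test.c

  # Then follow the surviving difference through the RTL passes:
  gcc -O3 -fdump-rtl-all test.c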
Re: Repository for the conversion machinery
On 08/28/2015 09:57 AM, Eric S. Raymond wrote: Jeff Law : Given that I worked for Cygnus and still work with Red Hat, I can make a pass over all the @cygnus.com addresses and probably give something more up-to-date for most of them if that's useful. That would be *very* useful. Here's my stab at all the @cygnus.com and @redhat.com addresses. There's several I lost track of through the years.

  bill = Bill Cox.  Retired.  Not sure of his email address.
  billm = Bill Moyer.  Now at Sonic.net.  bi...@ciar.org
  noer = Geoffrey Noer.  Now at Panasas.  I don't have an email address though.
  raeburn = Ken Raeburn.  Now at Permabit.  raeb...@permabit.com
  abalkiss = Anthony Balkissoon.  No clue on this one, not with Red Hat anymore.
  aluchko = Aaron Luchko.  Grad student at University of Alberta.
  bbooth = Brian Booth.  Student at Simon Fraser University.  b...@sfu.ca
  djee = David Jee.  No clue.  Not with Red Hat anymore.
  fitzsim = Thomas Fitzsimmons.  No clue.  Not with Red Hat anymore.
  hiller = Matthew Hiller.  No clue.  Not with Red Hat anymore.
  jknaggs = Jeff Knaggs.  Not with Red Hat anymore.
  spolk = Syd Polk.  Now at Mozilla.  sydp...@gmail.com
  apbianco = Alexandre Petit-Bianco.  Now at Google.  apbia...@serialhacker.org
  bkoz = Benjamin Kosnik.  b...@gnu.org
  cchavva = Chandra Chavva.  Cavium Networks.  ccha...@caviumnetworks.com
  clm = Catherine Moore.  CodeSourcery.  c...@codesourcery.com
  graydon = Graydon Hoare.  Now at Stellar Development.  I don't know his email address.
  jimb = Jim Blandy.  Now at Mozilla I believe.  j...@red-bean.com
  kgallowa = Kyle Galloway.  He's at Twitter now, but I don't have an email address.
  rsandifo = Richard Sandiford.  ARM.  rdsandif...@googlemail.com
  tromey = Tom Tromey.  Now at Mozilla I believe.  t...@tromey.com
  cagney = Andrew Cagney.  No longer with Red Hat.  Not sure where he is now.
  chastain = Michael Chastain.  Not with Red Hat anymore.  Spent some time at Google, but I don't think he's there anymore either.
  dlindsay = Don Lindsay.  Not with Red Hat anymore.  Was at Cisco for a period of time (linds...@cisco.com).  Not sure if he's still there.
  trix = Tom Rix.  Not with Red Hat anymore.  No idea where he is now.

All these are still current:

  aldyh = Aldy Hernandez
  amacleod = Andrew Macleod
  aoliva = Alexandre Oliva
  aph = Andrew Haley
  brolley = Dave Brolley
  carlos = Carlos O'Donell
  click = Nick Clifton
  davem = David S. Miller
  dj = DJ Delorie
  dmalcolm = David Malcolm
  fche = Frank Ch. Eigler
  fnasser = Fernando Nasser
  gary = Gary Benson
  gavin = Gavin Romig-Koch
  green = Anthony Green
  jakub = Jakub Jelinek
  jason = Jason Merrill
  jkratoch = Jan Kratochvil
  kevinb = Kevin Buettner
  kseitz = Keith Seitz
  ktietz = Kai Tietz
  law = Jeff Law
  merrill = Jason Merrill
  mpolacek = Marek Polacek
  nickc = Nick Clifton
  oliva = Alexandre Oliva
  palves = Pedro Alves
  pmuldoon = Phil Muldoon
  rth = Richard Henderson
  scox = Stan Cox
  tiemann = Michael Tiemann
  torvald = Torvald Riegel
  vmakarov = Vladimir Makarov
  wcohen = William Cohen
Re: Offer of help with move to git
On 08/27/2015 10:13 PM, Eric S. Raymond wrote: I'd like to use the --legacy flag so that old references to SVN commits are easier to look up. Your call, but ... I don't recommend it. It's very cluttery, and I've found the demand for that kind of lookup tends to drop off after conversion faster than people expect it will. I suspect we do this with more regularity than most projects. Hell, I regularly wish we had all the emacs backup files from the old MIT machines (they were unfortunately purged regularly to make space). Jeff
Re: Repository for the conversion machinery
On 08/28/2015 12:29 PM, Eric S. Raymond wrote: Jeff Law : Here's my stab at all the @cygnus.com and @redhat.com addresses. There's several I lost track of through the years. Would you please resend this as a contrib map with the updated addresses in it? I find that when I hand-edit these in I make too many cut'n'paste errors. Given a contrib map, repomapper -u can do it all in one go. If you don't know a current address for the person, we'll just leave the redhat one in place - best we can do. Will do, but won't get to it today. Monday most likely. jeff
Re: reload question about unmet constraints
On 09/01/2015 01:44 AM, DJ Delorie wrote: Given this test case for rl78-elf:

  extern __far int a, b;
  void ffr (int x)
  {
    a = b + x;
  }

I'm trying to use this patch:

  Index: gcc/config/rl78/rl78-virt.md
  ===
  --- gcc/config/rl78/rl78-virt.md  (revision 227360)
  +++ gcc/config/rl78/rl78-virt.md  (working copy)
  @@ -92,15 +92,15 @@
     ]
     "rl78_virt_insns_ok ()"
     "v.inc\t%0, %1, %2"
   )

   (define_insn "*add<mode>3_virt"
  -  [(set (match_operand:QHI 0 "rl78_nonfar_nonimm_operand" "=vY,S")
  -        (plus:QHI (match_operand:QHI 1 "rl78_nonfar_operand" "viY,0")
  -                  (match_operand:QHI 2 "rl78_general_operand" "vim,i")))
  +  [(set (match_operand:QHI 0 "rl78_nonimmediate_operand" "=vY,S,Wfr")
  +        (plus:QHI (match_operand:QHI 1 "rl78_general_operand" "viY,0,0")
  +                  (match_operand:QHI 2 "rl78_general_operand" "vim,i,vi")))
     ]
     "rl78_virt_insns_ok ()"
     "v.add\t%0, %1, %2"
   )

   (define_insn "*sub<mode>3_virt"

to allow the rl78 port to generate the "Wfr/0/r" case (alternative 3). (Wfr = far MEM, v = virtual regs). I expected gcc to see that the operation doesn't meet the constraints, and move operands into registers to make it work (alternative 1, "v/v/v"). That'd be my expectation as well. Note that addXX patterns may be special. I can recall a fair amount of pain with them on oddball ports. Instead, it just complains and dies:

  dj.c:42:1: error: insn does not satisfy its constraints:
   }
   ^
  (insn 10 15 13 2 (set (mem/c:HI (reg:SI 8 r8) [1 a+0 S2 A16 AS2])
          (plus:HI (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                      (const_int 4 [0x4])) [1 x+0 S2 A16])
              (mem/c:HI (symbol_ref:SI ("b") ) [1 b+0 S2 A16 AS2]))) dj.c:41 13 {*addhi3_virt}
       (nil))
  dj.c:42:1: internal compiler error: in extract_constrain_insn, at recog.c:2200

  Reloads for insn # 10
  Reload 0: reload_in (SI) = (symbol_ref:SI ("a") )
      V_REGS, RELOAD_FOR_INPUT (opnum = 0), inc by 2
      reload_in_reg: (symbol_ref:SI ("a") )
      reload_reg_rtx: (reg:SI 8 r8)
  Reload 1: reload_in (HI) = (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                                 (const_int 4 [0x4])) [2 x+0 S2 A16])
      reload_out (HI) = (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                                 (const_int 4 [0x4])) [2 x+0 S2 A16])
      V_REGS, RELOAD_OTHER (opnum = 1), optional
      reload_in_reg: (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                                 (const_int 4 [0x4])) [2 x+0 S2 A16])
      reload_out_reg: (mem/c:HI (plus:HI (reg/f:HI 32 sp)
                                 (const_int 4 [0x4])) [2 x+0 S2 A16])
  Reload 2: reload_in (HI) = (mem/c:HI (symbol_ref:SI ("b") ) [2 b+0 S2 A16 AS2])
      V_REGS, RELOAD_FOR_INPUT (opnum = 2), optional
      reload_in_reg: (mem/c:HI (symbol_ref:SI ("b") ) [2 b+0 S2 A16 AS2])

Note that reload 1 and reload 2 do not have a reload_reg_rtx. My memories of reload are fading fast (thank goodness), but I believe that's an indication that it's not reloading into a hard register. So I'd start with looking at find_reloads/push_reload and figure out why it's not getting a suitable register. It might be good to know what alternative is being targeted by reload, ie, you'll be looking at goal_alternative* in find_reloads. Again, my memories are getting stale here, so double-check the meaning of reload_reg_rtx ;-) jeff
Re: incremental compiler project
On 09/03/2015 10:36 AM, Manuel López-Ibáñez wrote: On 02/09/15 22:44, David Kunsman wrote: Hello, I just read over the incremental compiler project on the GCC wiki, and I am excited to try to finish it. I am just wondering whether it is even wanted anymore, because it is 7-8 years old. Does anybody know if this project is still wanted? The overall goal of the project is worthwhile; however, it is unclear whether the approach envisioned in the wiki page will lead to the desired benefits. See http://tromey.com/blog/?p=420 which is the last status report that I am aware of. In addition, the implementation itself would be incredibly challenging even for someone with many years of experience in GCC. Agreed. I think the Google project went further, but with Lawrence retiring, I think it's been abandoned. Jeff
Re: incremental compiler project
On 09/04/2015 09:40 AM, David Kunsman wrote: What do you think about the sub-project in the wiki: Parallel Compilation: One approach is to make the front end multi-threaded. (I've pretty much abandoned this idea. There are too many mutable tree fields, making this a difficult project. Also, threads do not interact well with fork, which is currently needed by the code generation approach.) You should get in contact with David Malcolm, as these issues are directly related to his JIT work. This will entail removing most global variables, marking some with __thread, and wrapping a few with locks. Yes, but that's work that is already in progress. Right now David's got a big lock and context switch in place, but we really want to drive down the amount of stuff in that context switch. Jeff
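[Editor's sketch] To make the "__thread some globals, lock others" idea concrete, here is a minimal sketch; the global and its type are invented for illustration and are not actual GCC variables:

    #include <pthread.h>

    struct symtab;   /* hypothetical compiler-wide state, for illustration */

    /* Before: one mutable global shared by everything.
       static struct symtab *the_symtab;  */

    /* Option 1: give each compilation thread its own copy.  */
    static __thread struct symtab *per_thread_symtab;

    /* Option 2: keep one shared copy, but serialize access with a lock.  */
    static struct symtab *shared_symtab;
    static pthread_mutex_t symtab_lock = PTHREAD_MUTEX_INITIALIZER;

    static void
    with_shared_symtab (void (*fn) (struct symtab *))
    {
      pthread_mutex_lock (&symtab_lock);
      fn (shared_symtab);
      pthread_mutex_unlock (&symtab_lock);
    }

The "big lock" mentioned above corresponds to option 2 applied coarsely to everything at once; driving down the amount of state behind that lock means migrating globals toward option 1 or eliminating them.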
Re: incremental compiler project
On 09/04/2015 10:14 AM, Jonathan Wakely wrote: On 4 September 2015 at 16:57, Manuel López-Ibáñez wrote: Clang++ is much faster, yet it is doing more and tracking more data than cc1plus. How much faster these days? In my experience, for optimized builds of large files the difference is not so impressive (for unoptimized builds clang is definitely much faster). That would generally indicate that the front end and mandatory parts of the middle/back end are slow for GCC (relative to clang/llvm), but the optimizers in GCC are faster. That wouldn't be a huge surprise given how much time has been spent trying to keep the optimizers fast. jeff
Re: How to allocate memory safely in RTL, preferably on the stack? (relating to the RTL-level if-converter)
On 09/08/2015 12:05 PM, Abe wrote: Dear all, In order to be able to implement this idea for stores, I think I need to make some changes to the RTL if-converter such that it will sometimes add -- to the code being compiled -- a new slot/variable in the stack frame. This memory needs to be addressable via a pointer in the code being generated, so AFAIK just allocating a new pseudo-register won't work, and AFAIK using an RTL "scratch" register also won't work. I also want to do my best to ensure that this memory is thread-local. For those reasons, I'm asking about the stack. Look at assign_stack_local. Jeff
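[Editor's sketch] For what it's worth, a minimal sketch of using it from inside an RTL pass, assuming a GCC 5-era tree: assign_stack_local lives in function.c, its third argument is the requested alignment in bits, and 0 means "use the mode's natural alignment" (double-check the function's block comment in your tree). SImode is purely illustrative:

    /* Grab a fresh slot in the current function's frame.  The result is
       a MEM whose address is frame-relative, hence per-invocation -- and
       so thread-local in the usual stack sense.  */
    rtx slot = assign_stack_local (SImode, GET_MODE_SIZE (SImode), 0);

    /* Materialize a pointer to the slot so the generated code can
       address it.  force_reg emits insns, so this must run in a context
       that is allowed to emit them.  */
    rtx addr = force_reg (Pmode, XEXP (slot, 0));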
Re: Combined top-down and bottom-up instruction scheduler
On 09/08/2015 12:39 PM, Aditya K wrote: IIUC, in haifa-sched.c the default scheduling algorithm seems to be top-down (before reload). Is there a way to schedule the other way (bottom-up), or both ways? Not that I'm aware of. Note that region scheduling allows insns to move between basic blocks to help fill the bubbles that can occur at the end of a block. As a use case for bottom-up or some other heuristic: currently, the first priority in the selection is given to the longest path, which in some cases may produce code with stalls at the end of the basic block, whereas with combined top-down + bottom-up scheduling we would end up having stalls in the middle of the basic block. GCC's original scheduler worked bottom-up until ~1997, when IBM Haifa's work turned it into a top-down model; that was a small but clear improvement. There are certainly better things that can be done than strictly top-down or bottom-up, but revamping the scheduler again hasn't been seen as a major win for the most common processors GCC targets these days, so it hasn't been a significant area of focus. Jeff
Re: Why scheduler do not re-emit REG_DEAD notes?
On 09/07/2015 10:05 AM, Konstantin Vladimirov wrote: Hi, In a private backend for GCC 5.2.0, we have a target-specific scheduler (running in the TARGET_SCHED_FINISH hook) that does some instruction packing/pairing on sched2 and relies on REG_DEAD notes, which should be correct. But they aren't, because inside haifa-sched.c, which is run first in the sched2 pass, the reemit_notes function processes only the REG_SAVE_NOTE case; after this scheduler runs, some insns with REG_DEAD on a register, say r1, might be moved before a previous r1 usage (the input-dependency case) and things become totally wrong. I have applied a minimal patch locally to fix it (just added a separate REG_DEAD case). But maybe it is part of the design, and maybe it is generally true that we can't rely on correct REG_DEAD notes in a platform-specific scheduler? You cannot rely on death notes within the scheduler. That's been part of its design as long as I can remember (circa 1992). jeff
Re: Combined top-down and bottom-up instruction scheduler
On 09/08/2015 01:40 PM, Vladimir Makarov wrote: As I remember, it was written by Mike Tiemann. Correct. A bottom-up scheduler as a rule generates worse code than a top-down one. Indeed, that was one of the key things we were looking to get from the Haifa scheduler, along with improved superscalar support and some support for region scheduling & speculation. Yes, that is true for OOO-execution processors, which can rearrange insns and execute them speculatively, looking through several branches. For such processors, software pipelining is more important, as the processors can look only through a few branches while software pipelining can look through any number of branches. That is why the Intel compiler did not have any insn scheduler (but did have software pipelining) until the introduction of the Intel Atom, which was originally an in-order processor. Correct. Latency scheduling just isn't that important for OOO; instead you look at scheduling to mitigate the costs of large-latency operations (i.e., cache misses and transcendental functions). You might also attack secondary issues, like throughput at the retirement stage, for example. Actually, I believe dealing with the variable/unknown latency of load insns (depending on where data are placed in a cache or memory) would be more important than a bottom-up or hybrid scheduler. Agreed. This is in line with what the HP guys were seeing as they transitioned to the PA8000. Balanced scheduling dealing with this problem was implemented by Alexander Monakov about 7-8 years ago as a Google internship project, but it was not included, as at that time its advantages were not confirmed on SPEC2000. It would be interesting to reconsider and re-evaluate it on modern processors and scientific benchmarks with big data. Agreed. For in-order processors, we also have another scheduler (the selective one) which does additional transformations (like register renaming and non-modulo software pipelining) that could be more important than top-down/bottom-up scheduling. It gave a 1-2% improvement on Itanium SPEC2000 in comparison with the Haifa scheduler. Right. Jeff
Re: Combined top-down and bottom-up instruction scheduler
On 09/08/2015 01:24 PM, Aditya K wrote: [mail headers and the exchange quoted verbatim from the previous message snipped] Do you have pointers on places to look at if I want to explore bottom-up, or maybe a combination of the two? Not immediately handy. I'd comb through PLDI proceedings from the 1990s and early 2000s, and possibly Morgan's compiler book. jeff
Re: Combined top-down and bottom-up instruction scheduler
On 09/08/2015 03:12 PM, Evandro Menezes wrote: cache misses and transcendental functions). You might also attack secondary issues like throughput at the retirement stage, for example. Our motivation stems from the fact that even modern, aggressively OOO processors don't have orthogonal resources. Some insns depend on expensive circuitry (area- or power-wise) that is added only once, making such insns effectively scalar, though most other insns enjoy multiple resources capable of executing them superscalar-fashion. That's why we believe that a hybrid approach might yield good results. We don't have data, as it possibly requires implementing it first. I'd also argue that looking at an OOO pipeline in a steady state is not the only approach. It's also important to consider how quickly the pipeline can be replenished or warmed up to reach a steady state. Which is why I mentioned optimizing for throughput at the retirement stage rather than traditional latency scheduling. That's from a real-world case -- the PA8000, where retirement bandwidth was at a premium (relative to functional-unit bandwidth). jeff
Re: Advertisement in the GCC mirrors list
On 09/09/2015 10:41 AM, Jonathan Wakely wrote: Gerald, I think we've had similar issues with these mirrors in the past as well; shall we just remove them from the list? Please do. jeff
Re: Ubsan build of GCC 6.0 fails with: cp/search.c:1011:41: error: 'otype' may be used uninitialized in this function
On 09/09/2015 01:17 PM, Martin Sebor wrote: On 09/09/2015 12:36 PM, Toon Moene wrote: See: https://gcc.gnu.org/ml/gcc-testresults/2015-09/msg00699.html Full error message:

    /home/toon/compilers/trunk/gcc/cp/search.c: In function 'int accessible_p(tree, tree, bool)':
    /home/toon/compilers/trunk/gcc/cp/search.c:1011:41: error: 'otype' may be used uninitialized in this function [-Werror=maybe-uninitialized]
       dfs_accessible_data d = { decl, otype };
                                             ^

Any ideas? It looks as though GCC assumes that TYPE can be null even though it can't (if it were, TYPE_P (type) would then dereference a null pointer). As a workaround until this is fixed, initializing OTYPE with type instead of in the else block should get rid of the error. Here's a small test case that reproduces the bogus warning:

    cat t.c && /build/gcc-trunk/gcc/xg++ -B /build/gcc-trunk/gcc -Wmaybe-uninitialized -O2 -c -fsanitize=undefined t.c

    struct S { struct S *next; int i; };

    int foo (struct S *s)
    {
      int i;
      if (s->i)
        {
          struct S *p;
          for (p = s; p; p = p->next)
            i = p->i;
        }
      else
        i = 0;
      return i;
    }

    t.c: In function ‘int foo(S*)’:
    t.c:14:12: warning: ‘i’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       return i;

More likely than not, the sanitization bits get in the way of VRP + jump threading rotating the loop. jeff
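[Editor's sketch] For the record, the shape of the workaround Martin describes (initialize at the declaration rather than in the else block), transplanted onto the reduced test case; a sketch only, since the real fix belongs in accessible_p in cp/search.c:

    /* Reuses struct S from the test case above.  With i initialized at
       its declaration, the else arm becomes redundant and the
       -Wmaybe-uninitialized warning should no longer trigger.  */
    int foo (struct S *s)
    {
      int i = 0;
      if (s->i)
        {
          struct S *p;
          for (p = s; p; p = p->next)
            i = p->i;
        }
      return i;
    }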
Re: Using the asm suffix
On 09/07/2015 06:56 PM, David Wohlferd wrote: In order for the doc maintainers to approve this patch, I need to have someone sign off on the technical accuracy. Now that I have included the points we have discussed (attached), hopefully we are there. Original text: https://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html Proposed text: http://limegreensocks.com/gcc/Asm-Labels.html Still pending is the line I removed about 'static variables in registers' that belongs in the Reg Vars section. I have additional changes I want to make to the Reg Vars section, so once this patch is accepted, I'll post that work. dw

    AsmLabels4.patch
    Index: extend.texi
    ===================================================================
    --- extend.texi    (revision 226751)
    +++ extend.texi    (working copy)

OK. Please install. jeff
Re: How to allocate memory safely in RTL, preferably on the stack? (relating to the RTL-level if-converter)
On 09/10/2015 12:28 PM, Abe wrote: On 9/8/15 1:12 PM, Jeff Law wrote: Look at assign_stack_local. Thanks very much! The above was very helpful, and I have started to make some progress on this work. I'll report back when I have much more progress. Would you like me to CC further emails about this work to "l...@redhat.com"? The list is probably more appropriate. I suspect Bernd will probably want to get involved as well now that he's getting back into the swing of things. Jeff
Re: Replacing malloc with alloca.
On 09/13/2015 12:28 PM, Florian Weimer wrote: * Ajit Kumar Agarwal: The replacement of malloc with alloca can be done based on the following analysis: if the lifetime of an object does not stretch beyond its immediate scope, the malloc can be replaced with alloca. This increases performance to a great extent. You also need to make sure that the object is small (less than a page) and that there is no deep recursion going on. Otherwise, the program may no longer work after the transformation with real-world restricted stack sizes. It may even end up with additional security issues. You also have to make sure you're not inside a loop; even a small allocation inside a loop is problematic from a security standpoint. You also need to look at what other objects might be on the stack, and you have to look at the function scope, not the immediate scope, as alloca space isn't returned until the end of a function. jeff
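[Editor's sketch] To illustrate the loop hazard (invented code, not from the thread): every alloca in the loop body stays live until the function returns, so the allocations pile up:

    #include <alloca.h>
    #include <string.h>

    void
    copy_all (const char **strs, int n)
    {
      for (int i = 0; i < n; i++)
        {
          /* alloca space is reclaimed at function exit, not at the end
             of the loop body, so n iterations leave n live allocations
             on the stack.  With attacker-influenced n or string lengths,
             this is an easy way to blow the stack.  */
          char *copy = alloca (strlen (strs[i]) + 1);
          strcpy (copy, strs[i]);
          /* ... use copy ... */
        }
    }

A malloc/free pair with the free at the bottom of the loop body has no such accumulation, which is one reason the transformation is unsafe inside loops.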
Re: Replacing malloc with alloca.
On 09/14/2015 02:14 AM, Richard Earnshaw wrote: On 13/09/15 20:19, Florian Weimer wrote: * Jeff Law: [the exchange quoted verbatim from the previous message snipped] Ah, right, alloca is unscoped (except when there are variable-length arrays). Using a VLA might be the better approach (but the size concerns remain). Introducing VLAs could alter program behavior in the presence of a pre-existing alloca call, leading to premature deallocation. You also have to consider that code generated for functions containing alloca calls can be less efficient than for functions that do not call it (frame pointers cannot be eliminated, for example). So I'm not convinced this would necessarily be a performance win either. Yes, but I suspect that eliminating a single malloc/free pair dwarfs the cost of needing a frame pointer. The problem is proving when it's safe to turn a malloc/free into an alloca; as folks have shown, it's non-trivial once the security aspects are considered. I've speculated that, from a security standpoint, projects ought to just ban alloca, particularly glibc. It's been shown over and over again that folks just don't get it right, and it's ripe for exploitation. It'd be a whole lot easier to convince folks to go in this direction if GCC were good about that kind of optimization. Jeff
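[Editor's sketch] A hypothetical illustration of the premature-deallocation interaction Florian mentions, under the assumption (per his comment) that exiting a VLA's block restores the stack pointer and may reclaim alloca'd storage made inside that block:

    #include <alloca.h>

    void
    mix (int n)             /* assume n > 0 */
    {
      char *p = 0;
      {
        char vla[n];        /* deallocated when this block exits */
        p = alloca (64);    /* allocated after the VLA, same frame */
        vla[0] = 0;
        /* On block exit the compiler may restore the stack pointer to
           its pre-VLA value, reclaiming p's 64 bytes as well -- the
           premature deallocation described above.  */
      }
      /* Dereferencing p here would be unsafe.  */
      (void) p;
    }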
Re: dejagnu version update?
On 09/15/2015 01:23 PM, Bernhard Reutner-Fischer wrote: On September 15, 2015 7:39:39 PM GMT+02:00, Mike Stump wrote: On Sep 14, 2015, at 3:37 PM, Jeff Law wrote: Maybe GCC 6 can bump the required dejagnu version to allow for getting rid of all these superfluous load_gcc_lib? *blink* :) I'd support that as a direction. Certainly dropping the 2001 version from our website in favor of 1.5 (which is what I'm using anyway) would be a step forward. So, even Ubuntu LTS is at 1.5 now. No harm in upgrading the website to 1.5. I don't know of any reason not to update and just require 1.5 at this point. I'm not a fan of feature-chasing dejagnu, but an update every 2-4 years isn't unreasonable. So, let's do it this way: is there any serious and compelling reason not to update to 1.5? If none, let's update to 1.5 in another week or two. My general plan is slow-cycle updates on dejagnu, maybe every 2 years; LTS-style releases should include the version before the requirement is updated. I take this approach as I think this should be the maximal change rate of things like make, gcc, g++, and ld, if possible. Yea, although this means that 1.5.3 (a version with the libdirs tweak), being just 5 months old, will have to wait for another bump, I fear. For my part, going to plain 1.5 is useless WRT the load_lib situation. I see no value in conditionalizing simplified libdir handling on a lucky user with recentish stuff, so I'm just waiting another 2 or 4 years for this very minor cleanup. Given we haven't updated the dejagnu requirements since ~2001, I think stepping forward would be appropriate, and I'd support moving all the way to 1.5.3 with the expectation that we'll be on a cadence of no faster than every 2 years going forward. jeff