Melt-building problem
Hello,

I tried to compile the gcc-melt branch from svn, but I get the following error:

make warmelt1
make[4]: Entering directory `/home/wolfgang/gcc-melt/objects/gcc'
date +"/* empty-file-for-melt.c %c */" > empty-file-for-melt.c-tmp
/bin/bash ../../melt-branch/gcc/../move-if-change empty-file-for-melt.c-tmp empty-file-for-melt.c
make -f ../../melt-branch/gcc/melt-module.mk VPATH=../../melt-branch/gcc:. meltmodule \
  GCCMELT_CFLAGS="-g -O2 -fomit-frame-pointer -g -g -O2 -fomit-frame-pointer -gtoggle -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I." \
  GCCMELT_MODULE_SOURCE=../../melt-branch/gcc/melt/generated/warmelt-first.0.c GCCMELT_MODULE_BINARY=warmelt-first.0.so
make[5]: Entering directory `/home/wolfgang/gcc-melt/objects/gcc'
gcc -g -O2 -fomit-frame-pointer -g -g -O2 -fomit-frame-pointer -gtoggle -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -c -o warmelt-first.0.pic.o ../../melt-branch/gcc/melt/generated//warmelt-first.0.c
cc1: error: unrecognised debug output level "toggle"
make[5]: *** [warmelt-first.0.pic.o] Fehler 1
make[5]: Leaving directory `/home/wolfgang/gcc-melt/objects/gcc'
make[4]: *** [warmelt-first.0.so] Fehler 2
make[4]: Leaving directory `/home/wolfgang/gcc-melt/objects/gcc'
make[3]: *** [melt.encap] Fehler 2
make[3]: Leaving directory `/home/wolfgang/gcc-melt/objects/gcc'
make[2]: *** [all-stage2-gcc] Fehler 2
make[2]: Leaving directory `/home/wolfgang/gcc-melt/objects'
make[1]: *** [stage2-bubble] Fehler 2
make[1]: Leaving directory `/home/wolfgang/gcc-melt/objects'
make: *** [all] Fehler 2
wolfg...@debian:~/gcc-melt/objects$

Any advice?

Thanks
Wolfgang
Re: Melt-building problem
Original message:
> Date: Tue, 25 May 2010 16:20:14 +0200
> From: Basile Starynkevitch
> To: Wolfgang
> CC: gcc@gcc.gnu.org
> Subject: Re: Melt-building problem
>
> On Tue, 2010-05-25 at 12:03 +0200, Wolfgang wrote:
> > Hello,
> >
> > I tried to compile the gcc-melt branch from svn,
>
> A big thanks for testing GCC MELT! What svn revision of the MELT branch
> are you testing? Did you configure gcc-melt with --enable-bootstrap?
>
> Can you reproduce the bug inside a clean build tree? (I mean, removing
> all your build tree, and starting the configure & the build again.)

Yes, it is reproducible... I cleaned everything and tried to build anew.

> > but i get the following error:
> >
> > make warmelt1
> > make[4]: Entering directory `/home/wolfgang/gcc-melt/objects/gcc'
> > date +"/* empty-file-for-melt.c %c */" > empty-file-for-melt.c-tmp
> > /bin/bash ../../melt-branch/gcc/../move-if-change empty-file-for-melt.c-tmp empty-file-for-melt.c
> > make -f ../../melt-branch/gcc/melt-module.mk VPATH=../../melt-branch/gcc:. meltmodule \
> >   GCCMELT_CFLAGS="-g -O2 -fomit-frame-pointer -g -g -O2 -fomit-frame-pointer -gtoggle -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I." \
> >   GCCMELT_MODULE_SOURCE=../../melt-branch/gcc/melt/generated/warmelt-first.0.c GCCMELT_MODULE_BINARY=warmelt-first.0.so
> > make[5]: Entering directory `/home/wolfgang/gcc-melt/objects/gcc'
> > gcc -g -O2 -fomit-frame-pointer -g -g -O2 -fomit-frame-pointer -gtoggle -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -c -o warmelt-first.0.pic.o ../../melt-branch/gcc/melt/generated//warmelt-first.0.c
> > cc1: error: unrecognised debug output level "toggle"
>
> Which gcc is this one? (What does gcc -v tell you?)

At the moment, I have gcc 4.4 installed:

wolfg...@debian:~/gcc-melt/objects$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.4-1' --with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.4 --program-suffix=-4.4 --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc --enable-targets=all --with-arch-32=i486 --with-tune=generic --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.4.4 (Debian 4.4.4-1)

The MELT branch was configured with:

../melt-branch/configure --enable-languages=c,c++ --enable-shared --enable-threads=posix --enable-nls --enable-objc-gc --enable-mpfr --prefix=/usr/lib/test2 --enable-plugin --enable-lto --with-ppl --enable-bootstrap

I am now trying to compile it with the new gcc-4.5.

Thanks
Wolfgang

> I hoped to have corrected this bug by adding the MELTHERE_CFLAGS in GCC
> MELT's gcc/Makefile.in near line 5076. I am not a guru in autoconf + GNU
> make tricks. Apparently, something is still wrong.
>
> A dirty workaround might be to replace every -gtoggle occurrence in the
> build tree gcc/Makefile with -g.
>
> I will try to reproduce that bug!
>
> Thanks for reporting it.
>
> BTW, I am surprised that GCC (even a plain 4.4 or 4.5) issues an error
> for an unrecognised debug output level. I would imagine it would in that
> case issue a warning, and try to do what -g does...
>
> Cheers.
>
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basilestarynkevitchnet mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***
Re: Melt-building problem
Hello,

I built gcc-melt successfully with a new gcc-4.5 compiler from scratch. The svn revision of melt is:

URL: svn://gcc.gnu.org/svn/gcc/branches/melt-branch
Repository Root: svn://gcc.gnu.org/svn/gcc
Repository UUID: 138bc75d-0d04-0410-961f-82ee72b054a4
Revision: 159823
Node Kind: directory
Schedule: normal
Last Changed Author: bstarynk
Last Changed Rev: 159667
Last Changed Date: 2010-05-21 16:44:05 +0200 (Fri, 21 May 2010)

The configuration of melt is:

Using built-in specs.
COLLECT_GCC=./gcc
COLLECT_LTO_WRAPPER=/usr/lib/test/libexec/gcc/i686-pc-linux-gnu/4.6.0/lto-wrapper
Target: i686-pc-linux-gnu
Configured with: ../melt-branch/configure --enable-languages=c,c++ --enable-shared --enable-threads=posix --enable-nls --enable-objc-gc --enable-mpfr --prefix=/usr/lib/test --enable-plugin --enable-lto --enable-checking --enable-tree-browse --enable-tree-checking --with-ppl --disable-bootstrap
Thread model: posix
gcc version 4.6.0 20100406 (experimental) (GCC)

Now I cannot reproduce the error any more.

Thanks a lot for your help
Wolfgang

> > > > but i get the following error:
> > > >
> > > > make warmelt1
> > > > make[4]: Entering directory `/home/wolfgang/gcc-melt/objects/gcc'
> > > > date +"/* empty-file-for-melt.c %c */" > empty-file-for-melt.c-tmp
> > > > /bin/bash ../../melt-branch/gcc/../move-if-change empty-file-for-melt.c-tmp empty-file-for-melt.c
> > > > make -f ../../melt-branch/gcc/melt-module.mk VPATH=../../melt-branch/gcc:. meltmodule \
> > > > GCCMELT_CFLAGS="-g -O2 -fomit-frame-pointer -g -g -O2 -fomit-frame-pointer -gtoggle -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I."
> > > > \
> > > > GCCMELT_MODULE_SOURCE=../../melt-branch/gcc/melt/generated/warmelt-first.0.c GCCMELT_MODULE_BINARY=warmelt-first.0.so
> > > > make[5]: Entering directory `/home/wolfgang/gcc-melt/objects/gcc'
> > > > gcc -g -O2 -fomit-frame-pointer -g -g -O2 -fomit-frame-pointer -gtoggle -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -c -o warmelt-first.0.pic.o ../../melt-branch/gcc/melt/generated//warmelt-first.0.c
> > > > cc1: error: unrecognised debug output level "toggle"

> Perhaps re-merging the current MELT branch into your private branch
> (assuming you have a private MELT variant) might help, because on my
> side with GCCMELT_CC set to gcc-4.5 the make log contains
>
> make warmelt1
> make[4]: Entering directory `/usr/src/Lang/_MeltBoot/Obj/gcc'
> date +"/* empty-file-for-melt.c %c */" > empty-file-for-melt.c-tmp
> /bin/bash ../../melt-branch/gcc/../move-if-change empty-file-for-melt.c-tmp empty-file-for-melt.c
> make -f ../../melt-branch/gcc/melt-module.mk VPATH=../../melt-branch/gcc:. meltmodule \
>   GCCMELT_CFLAGS="-g -fkeep-inline-functions -g -fkeep-inline-functions -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I." \
>   GCCMELT_MODULE_SOURCE=../../melt-branch/gcc/melt/generated/warmelt-first.0.c GCCMELT_MODULE_BINARY=warmelt-first.0.so
> make[5]: Entering directory `/usr/src/Lang/_MeltBoot/Obj/gcc'
> gcc-4.5 -g -fkeep-inline-functions -g -fkeep-inline-functions -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -c -o warmelt-first.0.pic.o ../../melt-branch/gcc/melt/generated//warmelt-first.0.c
> gcc-4.5 -g -fkeep-inline-functions -g -fkeep-inline-functions -DIN_GCC -DHAVE_CONFIG_H -I melt-private-build-include -I. -fPIC -c -o warmelt-first.0+01.pic.o ../../melt-branch/gcc/melt/generated//warmelt-first.0+01.c
> echo '/*' generated file ./warmelt-first.0-stamp.c '*/' > warmelt-first.0-stamp.c-tmp
> date "+const char melt_compiled_timestamp[]=\"%c \";" >> warmelt-first.0-stamp.c-tmp
> echo "const char melt_md5[]=\"\\" >> warmelt-first.0-stamp.c-tmp
> for f in ../../melt-branch/gcc/melt/generated/warmelt-first.0.c ../../melt-branch/gcc/melt/generated/warmelt-first.0+01.c; do \
>   md5line=`md5sum $f` ; \
>   printf "%s\\\n" $md5line >> warmelt-first.0-stamp.c-tmp; \
> done
> echo "\";" >> warmelt-first.0-stamp.c-tmp
> echo "const char melt_csource[]= \"../../melt-branch/gcc/melt/generated/warmelt-first.0.c ../../melt-branch/gcc/melt/generated/warmelt-first.0+01.c\";" >> warmelt-first.0-stamp.c-tmp
> mv warmelt-first.0-stamp.c-tmp warmelt-first.0-stamp.c
> gcc-4.5 -g -fkeep-inline-functio
Code Instrumentation
Hello,

I would like to instrument some existing code. For example, after an ADD-EXPR:

int main() {
  int a = 5;
  int b = 5;
  int c = a + b;
  ...
}

should become:

  ...
  int c = a + b;
  puts("ADD-EXPR");
  ...

I thought writing a GIMPLE pass would be best, but I don't know exactly where to start. I'm able to walk through the GIMPLE statements and debug them, but I'm not able to insert something. Is there any source code available?

Thanks
Wolfgang
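[For the archives: a minimal, untested sketch of what the body of such a pass could look like against the GCC 4.5-era GIMPLE API. It compiles only as part of GCC or a plugin, not standalone; the names used here (FOR_EACH_BB, built_in_decls[], build_string_literal, gsi_insert_after) are from memory of that API and should be checked against the sources. Note that GIMPLE represents additions as PLUS_EXPR.]

```c
/* Untested sketch: walk every statement of the current function and
   insert a call to puts("ADD-EXPR") after each addition.  */
static unsigned int
instrument_adds (void)
{
  basic_block bb;
  FOR_EACH_BB (bb)
    {
      gimple_stmt_iterator gsi;
      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
        {
          gimple stmt = gsi_stmt (gsi);
          if (is_gimple_assign (stmt)
              && gimple_assign_rhs_code (stmt) == PLUS_EXPR)
            {
              /* Build the string argument and the call to puts.  */
              tree msg = build_string_literal (sizeof ("ADD-EXPR"),
                                               "ADD-EXPR");
              gimple call
                = gimple_build_call (built_in_decls[BUILT_IN_PUTS], 1, msg);
              /* Insert after the addition; GSI_NEW_STMT leaves the
                 iterator on the new call, so it is not revisited.  */
              gsi_insert_after (&gsi, call, GSI_NEW_STMT);
            }
        }
    }
  return 0;
}
```

This would be registered as a GIMPLE pass in the usual way (struct gimple_opt_pass plus a pass-manager registration); the plugin examples shipped with GCC show the boilerplate.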
What is the right usage of SAVE_EXPR?
What is the policy concerning the usage of SAVE_EXPRs? Who is responsible for inserting them? I thought the respective language front end was responsible for enclosing any expression with side effects this way, so that later parts of GCC know how to treat these expressions right. However, some of the code translating tree nodes into rtxes, like some functions found in builtins.c, also worries about the re-evaluation of arguments and inserts plenty of SAVE_EXPRs. Why is that necessary?

With best regards,
Wolfgang Gellerich

---
Dr. Wolfgang Gellerich
IBM Deutschland Entwicklung GmbH
Schönaicher Strasse 220
71032 Böblingen, Germany
Tel. +49 / 7031 / 162598
[EMAIL PROTECTED]

===
IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Johann Weihen
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Mainline bootstrap failure in tree-ssa-pre.c:create_value_expr_from
For the last few days, since April 8th, I get bootstrap failures on mainline like this:

stage1/xgcc -Bstage1/ -B/ices/bangerth/tmp/build-gcc/gcc-install/i686-pc-linux-gnu/bin/ -c -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros -Wold-style-definition -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include ../../gcc/gcc/tree-ssa-pre.c -o tree-ssa-pre.o
../../gcc/gcc/tree-ssa-pre.c: In function 'execute_pre':
../../gcc/gcc/tree-ssa-pre.c:1812: sorry, unimplemented: inlining failed in call to 'create_value_expr_from': recursive inlining
../../gcc/gcc/tree-ssa-pre.c:1853: sorry, unimplemented: called from here
make[1]: *** [tree-ssa-pre.o] Error 1

The failure happens in a piece of code that was added here:
http://gcc.gnu.org/ml/gcc-bugs/2005-04/msg00337.html
by Dan Berlin (it introduced the recursive calls), though it certainly isn't the cause, just the trigger. It was also added already on April 4th. Since this has been happening for the last 10 days for me, I start to believe that I may be the only one seeing this. Anyone have any explanations?

Thanks
Wolfgang

PS: The system is "Linux terra.ices.utexas.edu 2.4.25-13mdkenterprise #1 SMP Tue Jan 18 14:02:17 MST 2005 i686 unknown unknown GNU/Linux"; the bootstrap compiler is Mandrake's gcc 3.3.2.

-----
Wolfgang Bangerth email:[EMAIL PROTECTED]
www: http://www.ices.utexas.edu/~bangerth/
Re: Mainline bootstrap failure in tree-ssa-pre.c:create_value_expr_from
> Isn't this the normal always_inline problem from the kernel headers?

Yes, good spot. Thanks for the help!
W.

-----
Wolfgang Bangerth email:[EMAIL PROTECTED]
www: http://www.ices.utexas.edu/~bangerth/
Re: Proposed resolution to aliasing issue.
> In short, the issue is, when given the following code:
>
> struct A {...};
> struct B { ...; struct A a; ...; };
>
> void f() {
>   B b;
>   g(&b.a);
> }
>
> does the compiler have to assume that "g" may access the parts of "b"
> outside of "a".

I understand that you are talking about ISO C, but one relevant case (in C++) to look out for that is similar is this one, which certainly constitutes legitimate and widespread use of language features:

class A {...};
class B : public A { ... };

void f() {
  B b;
  g (static_cast<A*> (&b));
}

void g(A *a) {
  B *b = dynamic_cast<B*>(a);
  // do what you please with the full object B
}

dynamic_cast<> was invented for the particular reason of allowing such constructs. I admit ignorance as to how exactly the C++ front end describes base class information (as opposed to structure member information), but the aliasing code would have to know about the difference.

Best
Wolfgang

-----
Wolfgang Bangerth email:[EMAIL PROTECTED]
www: http://www.ices.utexas.edu/~bangerth/
Re: Proposed resolution to aliasing issue.
Mark,

it occurred to me that asking the question you pose may use language that is more unfamiliar than necessary. How about this question instead -- assume

struct S { int s; };
struct X { int i; struct S s; };

void g(struct S*);

void f() {
  struct X x;
  g(&x.s);
}

Would the compiler be allowed to realize that X::i is never referenced and is therefore a dead variable? I assume the compiler doesn't do that right now, but it would be straightforward for a scalar replacement algorithm to not even allocate stack space for X::i, but only X::s, and hand the address of the only remaining stack object, of type S, to g().

The community at large may have more experience with such "as-if" related questions. It would be interesting to know whether the scalarizers in gcc realize, for example, whether they can/can't get rid of X::i...

Best
Wolfgang

-----
Wolfgang Bangerth email:[EMAIL PROTECTED]
www: http://www.ices.utexas.edu/~bangerth/
No documentation of -rdynamic
Hi all,

in order for the glibc function backtrace() to return something useful, its documentation says one has to use the -rdynamic flag. However, as has been mentioned before here

http://gcc.gnu.org/ml/gcc-help/2002-11/msg00196.html
http://gcc.gnu.org/ml/libstdc++/2002-04/msg00100.html

and probably some other places, there doesn't seem to be any documentation of what this flag does, etc. Is there someone who can give me the gist of its meaning? If I get a reasonable explanation, I may even be willing to write a blurb for the manual...

Best
Wolfgang

-----
Wolfgang Bangerth email:[EMAIL PROTECTED]
www: http://www.ices.utexas.edu/~bangerth/
Bug or feature: symbol names of global/extern variables
Hello,

I don't know whether this is a bug or a feature, and I searched through the mailing lists without success; therefore I write my question this way:

If you have a global variable inside a cpp file and create a library out of that, the symbol name for that global variable does in no way take the variable type into account. A user of that variable can "make" it any type with its extern declaration and thus produce subtle errors. An example:

--- lib.cpp ---
int maximum;
int minimum;

static bool init ( )
{
  maximum = 2;
  minimum = -7;
  return true;
}

static bool initialized = init ( );
---

Create a library out of that lib.cpp file. Then compile the following main.cpp and link it against the library:

--- main.cpp ---
#include <assert.h>

extern double maximum;
extern int    minimum;

int main (int, char**)
{
  // Assume you are on a machine where the sizeof (int) is 4 bytes
  // and the sizeof (double) is 8 bytes.
  assert (minimum == -7);
  maximum = 2342343242343.3;
  assert (minimum == -7);
  return 0;
}
---

The main.o will perfectly link with the library although main.o needs a double variable named maximum and the lib only offers an int variable named maximum. Because the symbol name does in no way reflect the variable type, everything links fine, but in fact the variable minimum gets scrambled in this example because maximum is accessed as if it were a double variable, thus overwriting 4 additional bytes (in this case the 4 bytes of the variable minimum). The assertion will show that.

I tested that on Windows with Visual C++ as well, and there main.obj won't link because the variable type is part of the symbol name, and everything is fine. I think it would be very, very important for the binary interface (ELF here, or?) to have that feature as well. What do you think?

Regards,
Wolfgang Roemer
Re: Bug or feature: symbol names of global/extern variables
Hello Michael,

first of all: thanks for the fast reply!

On Thu Oct 06, 2005 10:33, you wrote:
> [..]
> It's a feature. It is undefined behavior to have conflicting declarations
> in different translation units.
> [...]

Well, but shouldn't there at least be a warning during linking!?

> [..]
> In that case, how does VC++ implement cout, cin construction?
> In libstdc++ (well, at least in gcc-3.4) it is implemented by doing
> something like:
>
> namespace std{
>   ...
>   // Note that this is different from <iostream>'s definition of cin
>   // (it's declared as "extern istream cin" in there).
>   char cin[ sizeof(istream) ];
>   ...
>   ios::Init::Init()
>   {
>     if (count++ == 0)
>       new (&cin) istream(cin_construction_flags);
>   }

I don't know how VC++ implements cout, cin. I just checked the symbol names with the dumpbin.exe tool that is part of the VC++ suite, and there it is clearly marked as "maximum (int)". And during the attempt to link you get an unresolved symbol error saying that main.o needs "maximum (double)" but lib only offers "maximum (int)", and that's very helpful.

I encountered this behaviour on Linux because of a very strange SEGV, and I was finally able to track that down to an extern variable that was used in the wrong way; thus I found the mentioned behaviour. I did not take a look at the VC++ libc implementation etc. I just checked it from the user perspective.

Thanks,
WR
Re: Bug or feature: symbol names of global/extern variables
Hello,

so it seems as if it would be best if I post that to the binutils mailing list. Agreed?

WR

On Thu Oct 06, 2005 11:57, Robert Dewar wrote:
> Michael Veksler wrote:
> > It sounds as if the symbol is still "maximum" and it is annotated with
> > its type (something like debug information). It should be possible to
> > hack the linker to emit a warning for symbols with conflicting debug
> > information.
>
> Nice idea!
>
> > This is the wrong list for linker enhancements. You should look for
> > binutils mailing lists. However "collect2", which is part of gcc and is
> > called before the linker (for C++), could also detect this and give
> > the same warning. I would bet that collect2 is the wrong place for
> > this enhancement because it will work only for C++, not for C.
>
> If the linker did this, then it would even work across languages,
> e.g. importing a C symbol from an Ada unit, and vice versa.
>
> > Michael
Bug or feature: symbol names of global/extern variables
Hello,

I encountered a subtle SEGV in a program and was able to track the problem down to symbol names concerning global/extern variables. I discussed that with some guys from the GCC project (see recipient list) and we came to the conclusion that it would make more sense to share our thoughts with you. Here is the problem:

If you have a global variable inside a cpp file and create a library out of that, the symbol name for that global variable does in no way take the type of the variable into account. A user of that variable can "make" it any type with an "extern" declaration and thus produce subtle errors. An example:

--- lib.cpp ---
int maximum;
int minimum;

static bool init ( )
{
  maximum = 2;
  minimum = -7;
  return true;
}

static bool initialized = init ( );
---

Create a library out of that lib.cpp file. Then compile the following main.cpp and link it against the library:

--- main.cpp ---
#include <assert.h>

extern double maximum;
extern int    minimum;

int main (int, char**)
{
  // Assume you are on a machine where the sizeof (int) is 4 bytes
  // and the sizeof (double) is 8 bytes.
  assert (minimum == -7);
  maximum = 2342343242343.3;
  assert (minimum == -7);
  return 0;
}
---

The main.o will perfectly link with the library although main.o needs a double variable named maximum and the lib only offers an int variable named maximum. Because the symbol name does in no way reflect the variable type, everything links fine, but in fact the variable named "minimum" gets scrambled in this example because "maximum" is accessed as if it were a double variable, thus overwriting 4 additional bytes (in this case the 4 bytes of the variable minimum). The assertion will show that.

I tested that on Windows with Visual C++ as well, and there main.obj doesn't link because the variable type is part of the symbol name, and everything is fine. I think it would be very, very important for the binary interface to have that feature as well.

Regards,
Wolfgang Roemer
Re: Bug or feature: symbol names of global/extern variables
On Thu Oct 06, 2005 14:50, Robert Dewar wrote:
> [..]
> I actually disagree with this, I think attempting to make the link fail
> here would be a mistake.

Why do you think that this would be a mistake?

WR
Re: Bug or feature: symbol names of global/extern variables
Hello Michael,

On Thu Oct 06, 2005 15:54, Michael Veksler wrote:
[..]
> 2. I think that it will break C. As I remember, it is sometimes
>    legal in C (or in some dialects of C) to have conflicting types.
>    You may define in one translation unit:
>        char var[5];
>    and then go on and define in a different translation unit:
>        char var[10];
>    The linker will merge both declarations and allocate at least
>    10 bytes for 'var' (ld's --warn-common will detect this).

That is interesting: if the linker behaved that way, I wouldn't get the error, because the 8 bytes needed for a double would be allocated.

WR
devbranches: ambiguous characterisation of branches
Dear Sir or Madam,

in the repository contents description at <https://gcc.gnu.org/svn.html#olddevbranches>, numerous branch names are listed as inactive, with some further comments. Right at the start there is the longest list of such names, followed by "These branches have been merged into the mainline." Without "preceding" or "following", or at least a leading dash or a trailing colon, I'm at a loss whether that refers to the branches named before or after.

(The somewhat formal address contributed to landing this message in the SPAM pit? Dear me.)

Yours faithfully
Wolfgang Hospital
gcc version 4.4.3 (GCC)
config.guess: i686-pc-linux-gnu

gcc -v:
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ./configure
Thread model: posix
gcc version 4.4.3 (GCC)

packages:
gcc-4.4.3.tar.bz2
gcc-core-4.4.3.tar.bz2
gcc-g++-4.4.3.tar.bz2

linux distribution: Ubuntu 8.04.4 LTS

kernel version: Linux HDHN2432 2.6.24-26-generic #1 SMP Tue Dec 1 18:37:31 UTC 2009 i686 GNU/Linux

glibc version:
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-f/Unpacked/Failed-cfg/Half-inst/t-aWait/T-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name    Version        Description
+++-==-==-===
ii  libc6   2.7-10ubuntu5  GNU C Library: Shared libraries
Tree Browser
Hi,

I've tried to use the tree browser described at http://gcc.gnu.org/projects/tree-ssa/tree-browser.html

I configured gcc-4.5.0 with ... --enable-checking --enable-tree-browse --enable-tree-checking ...

Compilation was OK, and a gcc/tree-browser.o exists. Now I'm able to launch gdb and step through the code, etc. When I type the suggested command, I get

(gdb) p browse_tree (current_function_decl)
No symbol "browse_tree" in current context.
(gdb)

What am I doing wrong? Any ideas?

Thanks,
Wolfgang
Modifying ARM code generator for elimination of 8bit writes - need help
Hello,

I am trying to port big C/C++ programs (see www.dslinux.org) to the Nintendo DS console. The console has 4 MBytes of internal memory, and 32 MBytes of external memory which is *not* 8-bit writable (only 16 and 32 bits). The CPU is an ARM 946.

Using the external memory for ROM (XIP) and the internal memory for data, linux in console mode is possible, but graphical environments are very limited...

The idea to overcome this problem is to
a) activate the data cache in writeback mode for the external memory.
b) modify the gcc code generator: the "strb" opcode is transformed to "swpb". swpb will load the cache because of the read-modify-write, and at cache writeback time, the whole cached half-line will be written back, eliminating the 8-bit write problem.

I have proven the solution with an assembler program, but I think I need some help modifying the compiler.

I found arm.md and the movqi insns, but because of the different addressing modes of strb and swpb, it's not easy to make the change. And there must be a compiler option for this, too.

Could somebody please tell me how to implement this change?

regards
Wolfgang

--
We're back to the times when men were men and wrote their own device drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
On Tuesday 30 May 2006 23:47, Daniel Jacobowitz wrote:
> On Tue, May 30, 2006 at 09:03:54PM +0100, Paul Brook wrote:
> > > I found arm.md and the movqi insns, but because of the different
> > > addressing modes of strb and swpb, its not easy to make the
> > > change. And there must be a compiler option for this, too.
> > >
> > > Could somebody please tell me how to implement this change?
> >
> > Short answer is probably not.
> >
> > There are a couple of complications that spring to mind. The
> > different addressing modes and the fact that swp clobbers a
> > register are the most immediate ones.
> >
> > You'll need to modify at least the movqi insn patterns, memory
> > constraints and the legitimate address stuff. I'm not sure about
> > the clobber, that might need additional reload-related machinery.
>
> I suspect it would be better to make GCC do halfword stores instead
> (read/modify/write).

Hmmm... I have thought about that. But how does the compiler know if the byte address is even or odd? Testing the LSB of the address every time and generating conditional code is no joke...

regards
Wolfgang

--
We're back to the times when men were men and wrote their own device drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Paul,

thank you for commenting...

On Tuesday 30 May 2006 22:03, Paul Brook wrote:
> > I found arm.md and the movqi insns, but because of the different
> > addressing modes of strb and swpb, its not easy to make the change.
> > And there must be a compiler option for this, too.
> >
> > Could somebody please tell me how to implement this change?
>
> Short answer is probably not.
>
> There are a couple of complications that spring to mind. The
> different addressing modes and the fact that swp clobbers a register
> are the most immediate ones.
>
> You'll need to modify at least the movqi insn patterns, memory
> constraints and the legitimate address stuff. I'm not sure about the
> clobber, that might need additional reload-related machinery.

For the first shot, I have changed

(define_insn "*arm_movqi_insn"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
        (match_operand:QI 1 "general_operand" "rI,K,m,r"))]
  "TARGET_ARM
   && ( register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1
   str%?b\\t%1, %0"
  [(set_attr "type" "*,*,load1,store1")
   (set_attr "predicable" "yes")]
)

into

(define_insn "*arm_movqi_insn"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,Q")
        (match_operand:QI 1 "general_operand" "rI,K,m,+r"))]
  "TARGET_ARM
   && ( register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1
   swp%?b\\t%1, %1, [%M0]"
  [(set_attr "type" "*,*,load1,store1")
   (set_attr "predicable" "yes")]
)

changing "m" to "Q" (narrowing the address modes), changing "r" to "+r" (the register is clobbered), and of course making the swpb call.

GCC compiles, but segfaults while compiling ARM programs.

regards
Wolfgang

--
We're back to the times when men were men and wrote their own device drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Rask,

On Thursday 01 June 2006 16:13, Rask Ingemann Lambertsen wrote:
> I think you will need to remove the '+' as already suggested and add
> (clobber (match_scratch:QI "=X,X,X,1")) to tell GCC that the register
> allocated to operand 1 is clobbered by the instruction for this
> particular alternative.

Using

(define_insn "*arm_movqi_insn"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
        (match_operand:QI 1 "general_operand" "rI,K,m,r"))
   (clobber (match_scratch:QI 2 "=X,X,X,1"))]
  "TARGET_ARM
   && ( register_operand (operands[0], QImode)
       || register_operand (operands[1], QImode))"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1
   str%?b\\t%1, %0"
  [(set_attr "type" "*,*,load1,store1")
   (set_attr "predicable" "yes")]
)

(_only_ adding the clobber statement), I get

> /data1/home/wolfgang/Projekte/DSO/devkitpro/buildscripts/newlib-1.14.0/newlib/libc/argz/argz_create_sep.c: In function 'argz_create_sep':
> /data1/home/wolfgang/Projekte/DSO/devkitpro/buildscripts/newlib-1.14.0/newlib/libc/argz/argz_create_sep.c:60: error: unrecognizable insn:
> (insn 192 21 24 0 /data1/home/wolfgang/Projekte/DSO/devkitpro/buildscripts/newlib-1.14.0/newlib/libc/argz/argz_create_sep.c:29 (set (reg:QI 1 r1)
>         (reg:QI 4 r4)) -1 (nil)
>     (nil))
> /data1/home/wolfgang/Projekte/DSO/devkitpro/buildscripts/newlib-1.14.0/newlib/libc/argz/argz_create_sep.c:60: internal compiler error: in extract_insn, at recog.c:2020

What do you mean with

> You will also have to modify any code which
> expands this pattern accordingly.

?

I will use this weekend to dig deeper into the documentation... thank you for your help so far.

Wolfgang

--
We're back to the times when men were men and wrote their own device drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Hello Rask,

On Friday 02 June 2006 09:24, Rask Ingemann Lambertsen wrote:
> There may be a faster way of seeing if the modification is going to
> work for the DS at all. I noticed from the output template
> "swp%?b\\t%1, %1, [%M0]" that "swp" takes three operands. I don't
> know ARM assembler, but you may be able to choose to always clobber a
> specific register. Make it a fixed register (see FIXED_REGISTERS),
> refer to this register directly in the output template and don't add
> a clobber to the movqi patterns. IMHO, that's an acceptable hack at
> an experimental stage. If the resulting code runs correctly on the
> DS, you can then undo the FIXED_REGISTERS change and add the clobber
> statements.

I have tried this. No luck. The problem is the lack of addressing modes for the swp instruction: only a simple pointer in a register is allowed (no offset, no auto-increment).

After reading most of the gcc rtl documentation (and forgetting way too much...) I came to the following conclusion:

Splitting the insn

(define_insn "*arm_movqi_insn"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
        (match_operand:QI 1 "general_operand" "rI,K,m,r"))]

into 4 different insns:

(define_insn "*arm_movqi_insn"
  [(set (match_operand:QI 0 "register_operand" "")
        (match_operand:QI 1 "register_operand" ""))]

(define_insn "*arm_movnqi_insn"
  [(set (match_operand:QI 0 "register_operand" "")
        (match_operand:QI 1 "constant_operand" ""))]

(define_insn "*arm_loadqi_insn"
  [(set (match_operand:QI 0 "register_operand" "")
        (match_operand:QI 1 "memory_operand" ""))]

(define_insn "*arm_storeqi_insn"
  [(set (match_operand:QI 0 "memory_operand" "")
        (match_operand:QI 1 "register_operand" ""))]

This should give the same function as before, but then I can do

(define_insn "*arm_storeqi_insn"
  [(set (match_operand:QI 0 "simple_memory_operand" "")
        (match_operand:QI 1 "register_operand" ""))]

etc. to limit the addressing modes of the store insn to the limits of the swpb instruction.
And then I can recode the

(define_expand "movqi"
  [(set (match_operand:QI 0 "general_operand" "")
        (match_operand:QI 1 "general_operand" ""))]

to cope with the movqi requirements defined in the gcc manual.

Hmmm... I am wondering where all these xxx_operand functions are
defined, and where they are documented... Is this the right way to go?

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Paul,

On Sunday 04 June 2006 13:24, Paul Brook wrote:
> On Sunday 04 June 2006 11:31, Wolfgang Mües wrote:
> > Splitting the insn
> >
> > (define_insn "*arm_movqi_insn"
> >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
> >         (match_operand:QI 1 "general_operand" "rI,K,m,r"))]
> >
> > into 4 different insns:
>
> No. This is completely the wrong approach.

Why? I am learning.

> You should just change the valid QImode memory addresses, adding a new
> constraint if necessary.

Hmmm... I have tried this. I have changed the operand constraint from
"m" to "Q". But these constraints are only used to select the right
alternative inside the insn, not which insn is invoked.

It might be possible to modify "nonimmediate_operand" into something
else, to select this insn only if the address fits in a single
register, without offset or increment. But this will not give me the
freedom to allocate a temporary register. According to the manual, mov
insns are not supposed to clobber a register. I suppose I will have to
allocate these registers in

(define_expand "movqi"
  [(set (match_operand:QI 0 "general_operand" "")
        (match_operand:QI 1 "general_operand" ""))]

So I have to narrow down the constraint "nonimmediate_operand", so that
any memory address not fitting in a single register will not invoke
arm_movqi_insn.

Please correct me if I'm wrong. This is my first encounter with the
inner workings of gcc, and I may have completely missed your point.

> You also need to tweak the reload legitimate address bits to obey the
> new restrictions.

Can you show me what you mean here? What to do where?

> For the record, these hacks are unlikely to ever be acceptable in
> mainline gcc. They're relatively invasive changes whose only purpose
> is to support fundamentally broken hardware.

Paul, this is clear to me. Homebrew software on the DS is not important
enough to justify such a change in mainline gcc. A patch will be fine.
It's a big amount of - sometimes frustrating - work for a gcc newbie to
make this change. I am doing this only because I know it's the only
solution, and to turn the command-line-only DS Linux into a nice
PDA/browser/wireless client machine.

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Paul,

On Sunday 04 June 2006 17:57, Paul Brook wrote:
> Because then you have several different patterns for the same
> operation. The different variants of movsi should be part of the same
> pattern so that the compiler can change its mind about which variant
> it wants to use.

Together with the comments of Rask Ingemann (thanks, Rask!), I now
understand what you mean. But given that swpb needs a temporary
register - or, alternatively, clobbers the input register - how can I
model this behaviour in a single insn?

> You're confusing constraints and predicates. general_operand is the
> predicate. The predicate says under which conditions the insn will
> match. The constraints tell regalloc/reload how to make sure the
> operands of the instruction are valid.

Yes, my wording was incorrect. But I already knew the difference from
the manual.

> Tightening the predicates isn't sufficient (and may not even be
> necessary). You need to set the constraints so that the compiler
> knows *how* to fix invalid instructions.

And if I have 4 different constraint alternatives in a single insn, and
only one of them needs a temporary register, how do I model this? This
may be the biggest problem. And because byte writes are so common, it
deserves a good implementation: I can't waste a temporary register for
every load/store.

> The complication is that while constraints give sufficient
> information for the compiler to generate correct code, they don't
> help generate good code. There are often non-obvious target-specific
> ways of reloading invalid addresses. So reload has additional hooks
> (eg. GO_IF_LEGITIMATE_ADDRESS) to provide clever ways of fixing
> invalid operands.

I will look into this region of code to understand what's going on
there. Thanks, Paul.

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
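[A note on the "only one alternative needs a scratch" question above:
the usual machine-description idiom is a (match_scratch ...) clobber
whose constraint is "X" (nothing required) in the alternatives that do
not need it. The following is a hypothetical sketch only - the insn
name and output templates are made up for illustration and are not
taken from any patch in this thread:]

```lisp
;; Hypothetical sketch: one movqi insn where only the store (mem <- reg)
;; alternative ties up a temporary register.  The "X" entries in the
;; scratch's constraint string mean "no register needed" for the first
;; three alternatives; "&r" requests an earlyclobber scratch only for
;; the store alternative.
(define_insn "*arm_movqi_insn_sketch"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,Q")
        (match_operand:QI 1 "general_operand"       "rI,K,m,r"))
   (clobber (match_scratch:QI 2                     "=X,X,X,&r"))]
  "TARGET_ARM"
  "@
   mov%?\\t%0, %1
   mvn%?\\t%0, #%B1
   ldr%?b\\t%0, %1
   mov%?\\t%2, %1\;swp%?b\\t%2, %2, %0")
```

With this shape, the register allocator only allocates a scratch when
the store alternative is chosen, so the common register-to-register and
load cases pay nothing.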
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Hello Dave ;-)

On Monday 05 June 2006 02:12, Dave Murphy wrote:
> I was just about to ask about this very thing since I'm quite sure
> that there would be interest in adding this to devkitARM.

You are following the process in dslinux, aren't you? In fact,
devkitARM is my current build environment. The first thing that will
happen is a patch to devkitARM.

> How much work would it be to implement these switches?

Good question ;-)

> I assume that the toolchain would need multilibs for these options in
> order to use newlib etc.

I have not looked into library issues yet. The compiler comes first. We
will need an asm macro for 8-bit writes to the hardware registers. And
the devkitARM libraries *must* implement writeback caching for the GBA
slot ROM area.

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Richard,

On Monday 05 June 2006 12:06, Richard Earnshaw wrote:
> I'm confident right now that these will be too invasive to include in
> mainline.

As said before, this is OK for me.

> The changes that tend to get incorporated into the compiler are to
> work around bugs in the CPU, not bugs in some H/W developer's use of
> the CPU. The former affect all users of the processor, the latter
> only that one case.
>
> If we started putting in hacks for the latter the compiler back-ends
> would become unmaintainable in almost no time at all.

Agreed.

> PS. Using swp is a bad idea IMO, this instruction is *very* slow on
> some CPU implementations because of the way it interacts with caches.

Yes, swp forces a cache load. But in this particular case, forcing a
cache load is the ONLY way to circumvent the hardware problem. For a
block of writes, cache loads are forced only once per 32 bytes.

Other possible solutions:

a) Code a 16-bit read-modify-write. This will also cause a cache load,
and will need much more code, because it has to look at the LSB of the
address to know where to insert the byte into the halfword.

b) Use the protection unit and take a data abort for a write to that
memory region. This has the advantage of affecting ONLY the critical
memory region (not all the others), but the disadvantages are big: all
memory writes are affected, and a data abort handler is very slow. This
solution was implemented before; it was 100 times slower than native
access. Unusable.

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Hello,

my first little success... in arm.h, I have changed

> /* Output the address of an operand.  */
> #define ARM_PRINT_OPERAND_ADDRESS(STREAM, X)          \
> {                                                     \
>   int is_minus = GET_CODE (X) == MINUS;               \
>                                                       \
>   if (GET_CODE (X) == REG)                            \
>     asm_fprintf (STREAM, "[%r, #0]", REGNO (X));      \

into

> /* Output the address of an operand.  */
> #define ARM_PRINT_OPERAND_ADDRESS(STREAM, X)          \
> {                                                     \
>   int is_minus = GET_CODE (X) == MINUS;               \
>                                                       \
>   if (GET_CODE (X) == REG)                            \
>     asm_fprintf (STREAM, "[%r]", REGNO (X));          \

I don't know why the form "[%r, #0]" was used before, because the
assembler understands "[%r]" perfectly well for all instructions. The
form "[%r]" has wider applicability because it covers swp too.

On Sunday 04 June 2006 23:36, Rask Ingemann Lambertsen wrote:
> On Wed, May 31, 2006 at 10:49:35PM +0200, Wolfgang Mües wrote:
> > > (define_insn "*arm_movqi_insn"
> > >   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,m")
> > >         (match_operand:QI 1 "general_operand" "rI,K,m,r"))]
>
> I think you should go back to this (i.e. the unmodified version) and
> only change the "m" into "Q" in the fourth alternative of operand 0.
> See if that works, i.e. generates addresses that are valid for the
> swp instruction.
No, that doesn't work:

> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c: In function
> '__register_frame_info_table_bases':
> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:146: error: insn does not
> satisfy its constraints:
> (insn 63 28 29 0 ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:136
>     (set (mem/s/j:QI (plus:SI (reg/v/f:SI 1 r1 [orig:102 ob ] [102])
>                               (const_int 16 [0x10])) [0 S1 A32])
>          (reg:QI 12 ip)) 155 {*arm_movqi_insn} (nil)
>     (nil))
> ../../../gcc-4.0.2/gcc/unwind-dw2-fde.c:146: internal compiler error:
> in reload_cse_simplify_operands, at postreload.c:391

Also, I wonder what the "Q" constraint really means. From the GCC
manual:

> Q
>   A memory reference where the exact address is in a single register
>   ("m" is preferable for asm statements)

but in arm.h:

> /* For the ARM, `Q' means that this is a memory operand that is just
>    an offset from a register.  */
> #define EXTRA_CONSTRAINT_STR_ARM(OP, C, STR)          \
>   ((C) == 'Q') ? (GET_CODE (OP) == MEM                \
>                   && GET_CODE (XEXP (OP, 0)) == REG) : \

Obviously, GCC tries to implement REG+CONSTANT with Q. Maybe I must
define a new constraint?

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Rask,

On Monday 05 June 2006 16:16, Rask Ingemann Lambertsen wrote:
> On Mon, Jun 05, 2006 at 01:47:10PM +0200, Wolfgang Mües wrote:
> Does GCC happen to accept "[%r, #0]" for swp?

No. But it's no problem to change that here.

> I think the comment in arm.h is wrong. The manual seems to agree with
> the code.

Just to make it easy for beginners...

> I tried 'V' instead, but it looks as if reload completely ignores the
> meaning of the constraint. There is already a comment in arm.md about
> that. It should be investigated further.

Hmmm... I have searched for 'Q' in the arm files. It is not used in
arm.md, only for some variants of arm (cirrus). Maybe it is only
implemented for them?

> Meanwhile, I changed arm_legitimate_address_p() to enforce the
> correct address form. This hurts byte loads too, though.

I assume there is no way to tell the direction of the transfer in
arm_legitimate_address_p()? Hmmm.

> Index: gcc/config/arm/arm.opt
> ===
> --- gcc/config/arm/arm.opt    (revision 114119)
> +++ gcc/config/arm/arm.opt    (working copy)
> @@ -153,3 +153,7 @@
>  mwords-little-endian
>  Target Report RejectNegative Mask(LITTLE_WORDS)
>  Assume big endian bytes, little endian words
> +
> +mswp-byte-writes
> +Target Report Mask(SWP_BYTE_WRITES)
> +Use the swp instruction for byte writes

In my environment (gcc 4.0.2), this is different. But I was able to
find the definitions in arm.h and implement these changes. Easier than
expected... (The DSLINUX team is not using gcc 4.1 because of compile
problems with the 2.6.14 kernel.)

> +   swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"

You should get a prize for cleverness here!

> +; Avoid reading the stored value back if we have a spare register.
> +(define_peephole2
> +  [(match_scratch:QI 2 "r")
> +   (set (match_operand:QI 0 "memory_operand" "")
> +        (match_operand:QI 1 "register_operand" ""))]
> +  "TARGET_ARM && TARGET_SWP_BYTE_WRITES"
> +  [(parallel [
> +     (set (match_dup 0) (match_dup 1))
> +     (clobber (match_dup 2))]
> +  )]
> +)

As far as I can tell so far, this works well.
But I think there are many cases in which the source operand is not
needed after the store. Is there a possibility to clobber the source
operand instead of using another register? Hmmm. Most of the code I
have seen in the first tests has no problem with this extra
register... it's available.

> With -O2 -mswp-byte-writes:
>
> bytewritetest:
>     @ args = 0, pretend = 0, frame = 0
>     @ frame_needed = 0, uses_anonymous_args = 0
>     str     lr, [sp, #-4]!
>     add     r2, r0, #4
>     add     lr, r0, #5
>     ldrb    r3, [lr, #0]    @ zero_extendqisi2
>     ldrb    r1, [r2, #0]    @ zero_extendqisi2
>     eor     r2, r1, r3
>     add     r3, r3, r1
>     ldr     ip, [r0, #0]
>     str     r3, [r0, #0]
>     swpb    r3, r2, [lr, #0]
>     str     ip, [r0, #8]
>     ldr     pc, [sp], #4
>
> The register allocator chooses to use the lr register, in turn
> causing link register save elimination to fail, which doesn't help.

I can't understand this without an explanation... is it bad?

Rask, thank you very much for your work.

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Rask,

On Tuesday 06 June 2006 21:33, Rask Ingemann Lambertsen wrote:
> > > +   swp%?b\\t%1, %1, %0\;ldr%?b\\t%1, %0"
> >
> > You should get a prize for cleverness here!
>
> Thanks! Indeed it looks good until you think of volatile variables.

Because volatile variables can change their values from another thread,
and the value read back would be wrong. Oh. gcc honours the volatile
attribute here, I assume?

> > As far as I can tell now, this works well. But I think there are
> > many cases in which the source operand is not needed after the
> > store. Is there a possibility to clobber the source operand and not
> > use another register?
>
> I don't know if (match_scratch ...) might reuse the source operand.
> It can be attempted more specifically with an additional peephole
> definition:
>
> (define_peephole2
>   [(set (match_operand:QI 0 "memory_operand" "")
>         (match_operand:QI 1 "register_operand" ""))]
>   "TARGET_ARM && TARGET_SWP_BYTE_WRITES
>    && peep2_reg_dead_p (1, operands[1])"
>   [(parallel
>      [(set (match_dup 0) (match_dup 1))
>       (clobber (match_dup 1))]
>   )]
> )

I will try this.

> Yet another register which stands a good chance of being reusable is
> the register containing the address.

Yes, but that is not allowed according to the specification of the swp
instruction: the address register must be different from the other two
registers. Is there any chance of gcc violating this constraint?

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Rask,

On Thursday 08 June 2006 20:12, Rask Ingemann Lambertsen wrote:
> Also, undo the change to arm_legitimate_address_p() in arm.c.

Hmmm:

> arm-elf-gcc -g -mswp-byte-writes -Wall -O2 -fomit-frame-pointer
> -ffast-math -mthumb-interwork -isystem
> /usr/lib/devkitpro/libnds/include -mcpu=arm9tdmi -mtune=arm9tdmi
> -DARM9 -S arm9_main.c -o arm9_main.S
> arm9_main.c: In function 'test':
> arm9_main.c:20: error: unable to generate reloads for:
> (insn:HI 20 21 22 1 arm9_main.c:16 (set (mem/v:QI (post_inc:SI
> (reg/v/f:SI 3 r3 [orig:102 p ] [102])) [0 S1 A8]) (subreg/s/u:QI
> (reg:SI 2 r2 [orig:103 c.36 ] [103]) 0)) 157 {*arm_movqi_insn_swp}
> (nil) (expr_list:REG_INC (reg/v/f:SI 3 r3 [orig:102 p ] [102])
> (nil)))
> arm9_main.c:20: internal compiler error: in find_reloads, at
> reload.c:3720

The test case:

void test(void)
{
    static unsigned char c = 20;
    volatile unsigned char *p;
    int i;

    p = (volatile unsigned char *) 0x0800;
    for (i = 0; i < 1000; i++)
        *p++ = c;
    c = 40;
    c = c;
}

Without the change in arm_legitimate_address_p, we get a post-increment
pointer into swpb. The non-working 'Q' constraint...

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Hello,

after getting a "working" version of gcc 4.0.2 with the Nintendo
8-bit-write problem, I have been busy the last weeks trying to adapt
the Linux system (replacing I/O with writeb() macros, removing strb
assembler calls). However, it turned out that the sources of the Linux
kernel are a far more demanding test than any single small test case.

I have tried my very best to implement the last patch from Rask (thank
you very much!). There was one place where I was not sure I had coded
the right solution.

Rask's patch (gcc 4.2.x):

> +;; Match register operands or memory operands of the form (mem (reg ...)),
> +;; as permitted by the "Q" memory constraint.
> +(define_predicate "reg_or_Qmem_operand"
> +  (ior (match_operand 0 "register_operand")
> +       (and (match_code "mem")
> +            (match_code "reg" "0")))
> +)

My patch (without the second operand for match_code):

> ;; Match register operands or memory operands of the form (mem (reg ...)),
> ;; as permitted by the "Q" memory constraint.
> (define_predicate "reg_or_Qmem_operand"
>   (ior (match_operand 0 "register_operand")
>        (and (match_code "mem")
>             (match_test "GET_CODE (XEXP (op, 0)) == REG")))
> )

Is this the right substitution?

If I compile the Linux kernel with this patch, many files get compiled
without problems, but in fs/vfat/namei.c I get:

> fs/vfat/namei.c: In function 'vfat_add_entry':
> fs/vfat/namei.c:694: error: unrecognizable insn:
> (insn 2339 2338 2340 188 (set (mem/s/j:QI (reg:SI 14 lr)
>         [0 .attr+0 S1 A8]) (reg:QI 12 ip)) -1 (nil)
>     (nil))
> fs/vfat/namei.c:694: internal compiler error: in extract_insn, at
> recog.c:2020
> Please submit a full bug report,

I can't see what is going on here...

regards
Wolfgang

The full patch from Rask is appended below:

> Index: gcc/config/arm/arm.h
> ===
> --- gcc/config/arm/arm.h (revision 114119)
> +++ gcc/config/arm/arm.h (working copy)
> @@ -1094,6 +1094,8 @@
>      ? vfp_secondary_reload_class (MODE, X)                      \
>      : TARGET_ARM                                                \
>      ? (((MODE) == HImode && ! arm_arch4 && true_regnum (X) == -1) \
> +       || ((MODE) == QImode && TARGET_ARM && TARGET_SWP_BYTE_WRITES \
> +           && true_regnum (X) == -1)                            \
>      ? GENERAL_REGS : NO_REGS)                                   \
>      : THUMB_SECONDARY_OUTPUT_RELOAD_CLASS (CLASS, MODE, X))
>
> Index: gcc/config/arm/arm.opt
> ===
> --- gcc/config/arm/arm.opt    (revision 114119)
> +++ gcc/config/arm/arm.opt    (working copy)
> @@ -153,3 +153,7 @@
>  mwords-little-endian
>  Target Report RejectNegative Mask(LITTLE_WORDS)
>  Assume big endian bytes, little endian words
> +
> +mswp-byte-writes
> +Target Report Mask(SWP_BYTE_WRITES)
> +Use the swp instruction for byte writes. The default is to use str
>
> Index: gcc/config/arm/predicates.md
> ===
> --- gcc/config/arm/predicates.md (revision 114119)
> +++ gcc/config/arm/predicates.md (working copy)
> @@ -125,6 +125,14 @@
>        || (GET_CODE (op) == REG
>            && REGNO (op) >= FIRST_PSEUDO_REGISTER)))")))
>
> +;; Match register operands or memory operands of the form (mem (reg ...)),
> +;; as permitted by the "Q" memory constraint.
> +(define_predicate "reg_or_Qmem_operand"
> +  (ior (match_operand 0 "register_operand")
> +       (and (match_code "mem")
> +            (match_code "reg" "0")))
> +)
> +
> ;; True for valid operands for the rhs of an floating point insns.
> ;; Allows regs or certain consts on FPA, just regs for everything
> ;; else. (define_predicate "arm_float_rhs_operand"
>
> Index: gcc/config/arm/arm.md
> ===
> --- gcc/config/arm/arm.md (revision 114119)
> +++ gcc/config/arm/arm.md (working copy)
> @@ -5151,6 +5151,16 @@
>        emit_insn (gen_movsi (operands[0], operands[1]));
>        DONE;
>      }
> +  if (TARGET_ARM && TARGET_SWP_BYTE_WRITES)
> +    {
> +      /* Ensure that operands[0] is (mem (reg ...)) if a memory
> +         operand. */
> +      if (MEM_P (operands[0]) && !REG_P (XEXP (operands[0], 0)))
> +        operands[0]
> +          = replace_equiv_address (operands[0],
> +
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Hello Rask,

On Wednesday 19 July 2006 13:24, Rask Ingemann Lambertsen wrote:
> I've spotted a function named emit_set_insn() in arm.c. It might be
> the problem, because it uses gen_rtx_SET() directly.

But it's not the only function which uses gen_rtx_SET. There are also
many places with

> emit_constant_insn (cond,
>                     gen_rtx_SET (VOIDmode, target, source));

Isn't it better to replace gen_rtx_SET?

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Rask,

On Friday 21 July 2006 15:26, Rask Ingemann Lambertsen wrote:
> I found that this peephole optimization improves the code a whole
> lot:

Done.

> Another way of improving the code was to swap the order of the two
> last alternatives of _arm_movqi_insn_swp.

Done.

Anyway, the problems with reload continue... error: unrecognizable
insn.

First, I had a problem with loading a register with a constant (no
clobber). I have solved this problem by adding

> (define_insn "_arm_movqi_insn_const"
>   [(set (match_operand:QI 0 "register_operand" "=r")
>         (match_operand:QI 1 "const_int_operand" ""))]
>   "TARGET_ARM && TARGET_SWP_BYTE_WRITES
>    && (register_operand (operands[0], QImode))"
>   "@
>    mov%?\\t%0, %1"
>   [(set_attr "type" "*")
>    (set_attr "predicable" "yes")]
> )

I am quite sure that this only cures the symptoms, and that it would be
better to fix this in the reload stage, but at least it worked, and I
was able to compile the whole Linux kernel!

After testing that the kernel runs, I tried to compile uClinux. And
there is the next problem:

> ../ncurses/./base/lib_set_term.c: In function '_nc_setupscreen':
> ../ncurses/./base/lib_set_term.c:470: error: unrecognizable insn:
> (insn 1199 1198 696 37 ../ncurses/./base/lib_set_term.c:429 (parallel [
>         (set (mem/s/j:QI (reg/f:SI 3 r3 [491]) [0 ._clear+0 S1 A8])
>              (reg:QI 0 r0))
>         (clobber (subreg:QI (reg:DI 11 fp) 0))
>     ]) -1 (nil)
>     (nil))
> ../ncurses/./base/lib_set_term.c:470: internal compiler error: in
> extract_insn, at recog.c:2020

The source code line is:

> newscr->_clear = TRUE;

Obviously, TRUE is loaded into r0, but I don't know why this construct
(storing a byte into a struct member referenced by a pointer) is not
handled.

I fear that these problems are becoming an endless story - sorry for
generating traffic on this list, since I'm still no gcc expert... On
the other hand, the compiler has now generated code from hundreds of
files, and maybe I'm very near to success now.
regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Rask,

On Sunday 06 August 2006 02:05, Rask Ingemann Lambertsen wrote:
> Yes, it only cures the symptom, but it could take a lot of time to
> find the cause, and the gain is small, so I think it is OK to leave
> it like this for now.

OK.

> This insn was generated from the "reload_outqi" pattern. I don't
> completely understand why it isn't recognized. The (subreg:QI (reg:DI
> 11 fp) 0) part won't be matched by (match_scratch ...), but
> simplify_gen_subreg() should have simplified it to (reg:QI 11 fp)
> since this is one of the main purposes of having
> simplify_(gen_)subreg() in the first place. Try changing
>
>    operands[3] = simplify_gen_subreg (QImode, operands[2], DImode, 0);
>
> into
>
>    operands[3] = gen_rtx_REG (QImode, REGNO (operands[2]));
>
> (in "reload_outqi") and see if that works.

Yes, it works. Kernel and userland are compiling now. I can't find any
errors in the generated code. Many thanks!

regards
Wolfgang
--
We're back to the times when men were men and wrote their own device
drivers. (Linus Torvalds)
Bootstrap failure on SuSE 10.1
I must be doing something extraordinarily stupid, but I can't figure
out what it is: I can't bootstrap anymore with subversion revisions
from early January this year onwards, on a system as widely available
as stock SuSE 10.1.

Here's what's happening: starting with revision 109241

---
config:
2006-01-02  Paolo Bonzini  <[EMAIL PROTECTED]>

        PR target/25259
        * stdint.m4: New.

gcc:
2006-01-02  Paolo Bonzini  <[EMAIL PROTECTED]>

        PR target/25259
        * Makefile.in (DECNUMINC): Include libdecnumber's build
        directory.
[...]
---

I get bootstrap failures like this:

gcc -c -g -DENABLE_CHECKING -DENABLE_ASSERT_CHECKING -DIN_GCC -W -Wall
-Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -pedantic
-Wno-long-long -Wno-variadic-macros -Wold-style-definition
-Wmissing-format-attribute -fno-common -DHAVE_CONFIG_H -I. -I.
-I../../svn-mainline/gcc -I../../svn-mainline/gcc/.
-I../../svn-mainline/gcc/../include
-I../../svn-mainline/gcc/../libcpp/include
-I../../svn-mainline/gcc/../libdecnumber -I../libdecnumber
../../svn-mainline/gcc/c-lang.c -o c-lang.o
In file included from ../../svn-mainline/gcc/input.h:25,
                 from ../../svn-mainline/gcc/tree.h:26,
                 from ../../svn-mainline/gcc/c-lang.c:27:
../../svn-mainline/gcc/../libcpp/include/line-map.h:56: error:
'CHAR_BIT' undeclared here (not in a function)

I can prevent the failure if I remove the -I../libdecnumber from the
command line. The reason is that c-lang.c contains #include "config.h"
at the beginning, and for some reason the preprocessor decides to pick
up the config.h from ../libdecnumber instead of from ./ .
If I run the above command line with -E instead of -c, here is the top
of the preprocessor output with -I../libdecnumber:

# 1 "../../svn-mainline/gcc/c-lang.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "../../svn-mainline/gcc/c-lang.c"
# 23 "../../svn-mainline/gcc/c-lang.c"
# 1 "../libdecnumber/config.h" 1
# 24 "../../svn-mainline/gcc/c-lang.c" 2

On the other hand, when I omit -I../libdecnumber, I get the output that
was probably expected:

# 1 "../../svn-mainline/gcc/c-lang.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "../../svn-mainline/gcc/c-lang.c"
# 23 "../../svn-mainline/gcc/c-lang.c"
# 1 "./config.h" 1 3
# 1 "./auto-host.h" 1 3

This must be something that someone has seen before and knows how to
deal with. Any ideas?

Best
Wolfgang

PS: Just in case, this is how I build:
../svn-mainline/configure --prefix=/home/bangerth/bin/gcc-4.2-pre
--enable-languages=c,c++ && make bootstrap

-
Wolfgang Bangerth    email: [EMAIL PROTECTED]
                     www:   http://www.math.tamu.edu/~bangerth/
Re: Modifying ARM code generator for elimination of 8bit writes - need help
Now it's time to say a big "thank you" to all the people involved,
especially Rask Ingemann Lambertsen for his invaluable help.

When I started this project, I feared that I would never succeed, and
now... the modified compiler has been in use for about 3 months, and
DSLINUX with this crude modification is working fine with 36 MBytes of
RAM available, and has a good future now.

During the last months, 3 issues have come up, all with invalid insns,
but with my newly developed knowledge of the arm code generator I was
able to resolve them.

So, many thanks to all involved, and keep up the good work!

Wolfgang Mües
Re: S/390 as GCC 4.3 secondary platform?
Hello everyone,

> In the criteria for primary platforms I've read that primary
> platforms have to be "popular systems". Reading this as "widely
> used", I think that this is a requirement which mainframes are
> unlikely to meet in the near future, so I propose to make s390 and
> s390x secondary platforms for now. I think this can be important to
> show users that gcc works reliably on S/390 and that it can be
> expected to do so in the future as well.

I agree, and would like to add that with respect to the s390 platform
one should consider that "popular" and "widely used" cannot have the
same meaning as, for example, in the context of computers for personal
use. The s390 back end does not only compile Linux-related software on
IBM System z, but also the system's firmware (the software layer
between operating system and hardware). So every System z machine uses
code generated by gcc, even if the system does not yet run Linux.

Regards,
Wolfgang
Problem with optimization passes management
There is a conflict between the command-line switches that turn off
individual optimization passes and their preconditions. Compiling a
"hello world" with the following options:

-O1 -fno-tree-salias

causes gcc to fail with an internal consistency check. The pass
return_slot has PROP_alias in its preconditions, but alias information
is not generated due to the second option.

Regards,
Wolfgang

---
Dr. Wolfgang Gellerich
IBM Deutschland Entwicklung GmbH
Schönaicher Strasse 220
71032 Böblingen, Germany
Tel. +49 / 7031 / 162598
[EMAIL PROTECTED]

===
IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Johann Weihen
Geschäftsführung: Herbert Kircher
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Re: Problem with optimization passes management
> On 10/10/07, Wolfgang Gellerich <[EMAIL PROTECTED]> wrote:
> >
> > There is a conflict between the command-line switches that turn off
> > individual optimization steps and their preconditions. Compiling a
> > "hello world" with the following options:
>
> This issue is known; it was reported as PR 33092.

Sorry for the duplicate!

Regards,
Wolfgang
gcc bootstrap failure with libcody
I'm seeing a bootstrap failure when I try to build the latest gcc
version (8833eab4461b4b7050f06a231c3311cc1fa87523):

checking whether time.h and sys/time.h may both be included...
checking whether gcc supports -Wmissing-prototypes...
i686-pc-linux-gnu
checking host system type...
make[3]: *** [buffer.o] Error 1
make[3]: Leaving directory `/ssd/fsf/tst-gcc11--base/libcody'
make[2]: *** [all-stage1-libcody] Error 2
make[2]: *** Waiting for unfinished jobs

Going back to a version from before libcody was added seems to build
fine so far.
Re: ARC length attribute patch
On 20/03/15 16:02, Claudiu Zissulescu wrote:
> Hi Joern,
>
> I have a small patch for the ARC backend that fixes the value of the
> instruction length attribute when the instruction is predicated. Ok
> to apply?

Why would the arc_bdr_iscond test have any effect?
arc_predicate_delay_insns should render the issue moot.

Moreover:

- Your patch has no ChangeLog entry.

+extern bool arc_bdr_iscond (rtx);

- New code should use const rtx_insn * .

+  conditionally. */
                 ^
- The GNU coding standard requires two spaces here.

-          (const_int 2))
+          (match_test "GET_CODE (PATTERN (insn)) == COND_EXEC || arc_bdr_iscond (insn)")
+          (const_int 4)]
+         (const_int 2))

- You are mis-formatting the code. (const_int 2) is part of the cond
  clause.
Re: ARC length attribute patch
On 20/03/15 16:02, Claudiu Zissulescu wrote:
> Hi Joern,
>
> I have a small patch for the ARC backend that fixes the value of the
> instruction length attribute when the instruction is predicated. Ok
> to apply?

Assuming you tested it, this patch is OK.
Suitable regression test for vectorizer patches?
I want to submit some vectorizer patches; what would be a suitable
regression test? Preferably some native or cross test that can run on
an i7 x86_64 GNU/Linux machine.

To give an idea what code I'm patching, here are the patches I have so
far:

* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Recognize
  unsigned dot product pattern. Allow widening multiply-add to be
  used for DOT_PROD_EXPR reductions.

* tree-vect-data-refs.c (vect_get_smallest_scalar_type): Treat
  WIDEN_MULT_PLUS_EXPR like WIDEN_SUM_EXPR.
* tree-vect-loop.c (get_initial_def_for_reduction): Likewise.
  Get VECTYPE from STMT_VINFO_VECTYPE.
  (vect_determine_vectorization_factor): Allow vector size
  input/output mismatch for reduction.
  (vect_analyze_scalar_cycles_1): When we find a phi for a reduction,
  put the reduction statement into the phi's STMT_VINFO_RELATED_STMT.

* tree-vect-patterns.c (vect_pattern_recog_1): If DOT_PROD_EXPR
  can't be expanded directly, try to use WIDEN_MULT_PLUS_EXPR
  instead.

Fix bug where a vectorizer reduction split (from
TARGET_VECTORIZE_SPLIT_REDUCTION) would end up not being used.
* tree-vect-loop.c (vect_create_epilog_for_reduction): If we split
  the reduction, use the result in Case 3 too.
Re: Suitable regression test for vectorizer patches? - (need {u,}madd* pattern)
On 30/10/18 08:36, Richard Biener wrote:
> On Mon, Oct 29, 2018 at 7:03 PM Joern Wolfgang Rennecke wrote:
> > I want to submit some vectorizer patches, what would be a suitable
> > regression test?
>
> I am sure you have testcases, no? For new features please make them
> dg-do run ones by checking correctness.

For the dot product / widen_sum -> madd transformations to trigger, I
need an in-tree port with a named pattern matched by smadd_widen_optab
or umadd_widen_optab, with an input matching
PREFERRED_SIMD_VECTOR_MODE, and hence an output twice that size (and
that pattern must not be eclipsed by existing [us]sum_widen_optab and
[us]dot_prod_optab matches). I can't find any such port in the tree -
indeed, not any {u,}madd4 pattern at all.

I've heard that ARM Cortex-M4 hardware actually supports a madd vector
operation (V2HI -> V2SI); is that true? Would the test be suitable if
it made the arm target, with a patch added to add a suitable madd
pattern, and my vectorizer patch added, use that madd pattern?

Or could I add an imaginary madd vector extension instruction to the
arc for that purpose? But then it wouldn't actually execute, as it's
just a made-up instruction; nor would the vectorization test be
included in a test run for an actual.
Garbage collection bugs
We've been running builds/regression tests for GCC 8.2 configured with --enable-checking=all, and have observed some failures related to garbage collection.

First problem: the g++.dg/pr85039-2.C tests (I've looked in detail at -std=c++98, but -std=c++11 and -std=c++14 appear to follow the same pattern) see gcc garbage-collecting a live vector. A subsequent access to the vector with vec_quick_push causes a segmentation fault, as m_vecpfx.m_num is 0xa5a5a5a5. The vec data is also being freed / poisoned.

The vector in question is an auto variable of cp_parser_parenthesized_expression_list, which is declared as:

vec<tree, va_gc> *expression_list;

According to doc/gty.texi: "you should reference all your data from static or external @code{GTY}-ed variables, and it is advised to call @code{ggc_collect} with a shallow call stack." In this case, cgraph_node::finalize_function calls the garbage collector, as we are finishing a member function of a struct. gdb shows a backtrace of 34 frames, which is not really much as far as C++ parsing goes. The caller of finalize_function is expand_or_defer_fn, which uses the expression "function_depth > 1" to compute the no_collect parameter to finalize_function. cp_parser_parenthesized_expression_list is in frame 21 of the backtrace at this point.

So, if we consider this shallow, cp_parser_parenthesized_expression_list either has to refrain from using a vector with garbage-collected allocation, or it has to make the pointer reachable from a GC root - at least if function_depth <= 1. Is the attached patch the right approach?

When looking at regression test results for gcc version 9.0.0 20181028 (experimental), the excess errors test for g++.dg/pr85039-2.C seems to pass, yet I can see no definite reason in the source code why that is so. I tried running the test by hand in order to check if maybe the patch for PR c++/84455 plays a role, but running the test by hand, it crashes again, and gdb shows the telltale a5 pattern in a pointer register.
#0 vec::quick_push (obj=, this=0x705ece60) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/vec.h:974
#1 vec_safe_push (obj=, v=@0x7fffd038: 0x705ece60) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/vec.h:766
#2 cp_parser_parenthesized_expression_list (parser=parser@entry=0x77ff83f0, is_attribute_list=is_attribute_list@entry=0, cast_p=cast_p@entry=false, allow_expansion_p=allow_expansion_p@entry=true, non_constant_p=non_constant_p@entry=0x7fffd103, close_paren_loc=close_paren_loc@entry=0x0, wrap_locations_p=false) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/cp/parser.c:7803
#3 0x006e910d in cp_parser_initializer (parser=parser@entry=0x77ff83f0, is_direct_init=is_direct_init@entry=0x7fffd102, non_constant_p=non_constant_p@entry=0x7fffd103, subexpression_p=subexpression_p@entry=false) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/cp/parser.c:22009
#4 0x0070954e in cp_parser_init_declarator (parser=parser@entry=0x77ff83f0, decl_specifiers=decl_specifiers@entry=0x7fffd1c0, checks=checks@entry=0x0, function_definition_allowed_p=function_definition_allowed_p@entry=true, member_p=member_p@entry=false, declares_class_or_enum=, function_definition_p=0x7fffd250, maybe_range_for_decl=0x0, init_loc=0x7fffd1ac, auto_result=0x7fffd2e0) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/cp/parser.c:19827
#5 0x00711c5d in cp_parser_simple_declaration (parser=0x77ff83f0, function_definition_allowed_p=, maybe_range_for_decl=0x0) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/cp/parser.c:13179
#6 0x00717bb5 in cp_parser_declaration (parser=0x77ff83f0) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/cp/parser.c:12876
#7 0x0071837d in cp_parser_translation_unit (parser=0x77ff83f0) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/cp/parser.c:4631
#8 c_parse_file () at
/data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/cp/parser.c:39108
#9 0x00868db1 in c_common_parse_file () at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/c-family/c-opts.c:1150
#10 0x00e0aaaf in compile_file () at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/toplev.c:455
#11 0x0059248a in do_compile () at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/toplev.c:2172
#12 toplev::main (this=this@entry=0x7fffd54e, argc=, argc@entry=100, argv=, argv@entry=0x7fffd648) at /data/hudson/jobs/gcc-9.0.0-linux/workspace/gcc/build/../gcc/gcc/toplev.c:2307
#13 0x00594b5b in main (argc=100, argv=0x7fffd648) at /data/hudson/jobs/gcc-9.0.0-linu
Build report gcc 4.6.1 on Sparc Solaris 10
...function it appears in
../../../mpc/src/get.c: In function 'mpc_get_ldc':
../../../mpc/src/get.c:39:11: error: 'I' undeclared (first use in this function)

I fixed this with a modification in /usr/include/complex.h (yes, this needs root permissions):

#if !defined(__GNUC__) /* wke mod for mpc 0.9 */
#undef I
#define I _Imaginary_I
#else /* native cc */
#undef I
#define I (__extension__ 1.0iF)
#endif /* end __GNUC__ */

But I do not know if this fix may break stuff in some places. Another restart of make yields a full build, and I am able to install the compiler and use it; it seems to generate proper results (at least for C). In case you have further questions, do not hesitate to contact me via email.

Best regards
--
Wolfgang Kechel mailto:wolfgang.kec...@prs.de
tsvc test iteration count during check-gcc
The tsvc tests take just too long on simulators, particularly if there is little or no vectorization of the test because of compiler limitations, target limitations, or the chosen options. Having 151 tests time out at a quarter of an hour is not fun, and making the timeout go away by upping it might make for better-looking results, but not for better turn-around times.

So I thought to just change the iteration count (which, as currently defined in tsvc.h, results in billions of operations for a single test) to something small, like 10. This requires new expected results, but these were pretty straightforward to auto-generate. The lack of a separate number for s3111 caused me some puzzlement, but it can indeed share a value with s3.

But then, if I want to specifically change the iteration count for simulators, I have to change 151 individual test files to add another dg-additional-options stanza. I can leave the job to grep / bash / ed, but then I get 151 locally changed files, which is a pain to merge. So I wonder if tsvc.h shouldn't really default to a low iteration count.

Is there actually any reason to run the regression tests with a huge iteration count on any host? I mean, if you wanted some regression check on performance, you'd really want something more exact than "wall-clock time doesn't exceed whatever timeout is set". You could set a ulimit for cpu time and fine-tune that for a proper benchmark regression test - but for the purposes of an ordinary gcc regression test, you generally just want the optimizations performed (as in the dump-file tests already present) and the computation performed correctly. And for these, it makes little difference how many iterations you use for the test, as long as you convince GCC that the code is 'hot'.
c-c++-common/analyzer/out-of-bounds-diagram-11.c written type vs toolchain
While on x86_64-pc-linux-gnu the second diagram shows the type written as 'int', as expected, on 16- and 32-bit newlib-based toolchains it is being output as int32_t. And all the formatting is also a bit different, probably due to the change in how the int32_t is displayed.

What do other people see on toolchains where the regression tests actually have I/O functionality? Would it make sense to handle this with one multi-line pattern for newlib-based toolchains, ending with

{ dg-end-multiline-output "" { target *-*-elf } } */

and one for glibc-based toolchains, ending in

{ dg-end-multiline-output "" { target !*-*-elf } } */

? I have no idea what toolchains with different libraries (and hence header files) would see.
Re: c-c++-common/analyzer/out-of-bounds-diagram-11.c written type vs toolchain
On 22/07/2024 16:44, David Malcolm wrote:
> Does it help to hack this change into prune.exp:
>
> diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
> index d00d37f015f7..f467d1a97bc6 100644
> --- a/gcc/testsuite/lib/prune.exp
> +++ b/gcc/testsuite/lib/prune.exp
> @@ -109,7 +109,7 @@ proc prune_gcc_output { text } {
>      # Many tests that use visibility will still pass on platforms that don't support it.
>      regsub -all "(^|\n)\[^\n\]*lto1: warning: visibility attribute not supported in this configuration; ignored\[^\n\]*" $text "" text
> -#send_user "After:$text\n"
> +send_user "After:$text\n"
>      return $text
> }

I'm baffled. Isn't that statement there just to debug prune_gcc_output?

I suppose we could prune the whitespace from the diagram, but prune_gcc_output does not know about types. If there's 'int', that could be int32_t, int16_t, int64_t, ptrdiff_t, or whatever - unless you want to make all integer types be considered equivalent for dejagnu purposes if they appear somewhere between vertical bars.

>> Would it make sense to handle this with one multi-line pattern for newlib based toolchains, ending with { dg-end-multiline-output "" { target *-*-elf } } */ and one for glibc based toolchain, ending in { dg-end-multiline-output "" { target !*-*-elf } } */ ?
> Presumably the only difference is in the top-right hand box of the diagram,

Unfortunately, there's also a lot of whitespace change in the rest of the diagram. I have attached the patch I'm currently using for your perusal.

> whereas my objective for those tests was more about the lower part of the diagram - I wanted to verify how we handle symbolic buffer sizes (e.g. (size * 4) + 3, and other run-time-computed sizes). It's rather awkward to test the diagrams with DejaGnu, alas. Would it make sense to split out that file into three separate tests -11a, -11b, and -11c, and be more aggressive about only running the 2nd test on targets that we know generate "int" in the top-right box?
No, each dg-end-multiline-output stanza already can have its separate target selector; there is no point in putting them in separate files. I guess you could reduce the differences between platforms if you didn't use types as defined by header files directly, as they might be #defines or typedefs or whatever, and instead used your own typedef or struct types.

Index: c-c++-common/analyzer/out-of-bounds-diagram-8.c
===
--- c-c++-common/analyzer/out-of-bounds-diagram-8.c (revision 6640)
+++ c-c++-common/analyzer/out-of-bounds-diagram-8.c (revision 6642)
@@ -17,6 +17,24 @@
 /* { dg-begin-multiline-output "" }
+ ┌───┐
+ │write of '(int32_t) 42'│
+ └───┘
+ │
+ │
+ v
+ ┌───┐ ┌───┐
+ │buffer allocated on heap at (1)│ │ after valid range │
+ └───┘ └───┘
+ ├───┬───┤├─┬──┤├───┬───┤
+ │ │ │
+╭─┴╮ ╭───┴───╮ ╭─┴─╮
+│capacity: 'size * 4' bytes│ │4 bytes│ │overflow of 4 bytes│
+╰──╯ ╰───╯ ╰───╯
+
+ { dg-end-multiline-output "" { target *-*-elf } } */
+/* { dg-begin-multiline-output "" }
+ ┌───┐
 │write of '(int) 42'│
 └───┘
@@ -32,4 +50,4 @@
 │capacity: 'size * 4' bytes│ │4 bytes│ │overflow of 4 bytes│
 ╰──╯ ╰───╯ ╰───╯
- { dg-end-multiline-output "" } */
+ { dg-end-multiline-output "" { target !*-*-elf } } */

Index: c-c++-common/analyzer/out-of-bounds-diagram-11.c
===
--- c-c++-common/analyzer/out-of-bounds-diagram-11.c (revision 6640)
+++ c-c++-common/analyzer/out-of-bounds-diagram-11.c (revision 6642)
@@ -45,8 +45,30 @@
 buf[size] = 42; /* { dg-warning "stack-based buffer overflow" } */
 }
+/* With a newlib toolchain (at least for esirisc), we end up with int32_t
+ being shown as itself. */
 /* { dg-begin-multiline-output "" }
+┌┐
+│write of '(int32_t) 42' │
+
Re: c-c++-common/analyzer/out-of-bounds-diagram-11.c written type vs toolchain
On 22/07/2024 17:13, Joern Wolfgang Rennecke wrote:
> I guess you could reduce the differences between platforms if you didn't use types as defined by header files directly, as they might be #defines or typedefs or whatever, and instead used your own typedef or struct types.

It seems a typedef to int is seen through, even if you chain two of them together. After preprocessing, newlib has:

typedef long int __int32_t;
typedef __int32_t int32_t ;

So the crucial point seems to be to have 'long int', but that is of course not portable for int32_t. So to get portable code and consistent messages, I suppose we should use a struct:

typedef struct { int32_t i; } my_int32;
my_int32 s42 = { 42 };
my_int32 *buf = (my_int32 *) __builtin_alloca (4 * size + 3); /* { dg-warning "allocated buffer size is not a multiple of the pointee's size" } */
buf[size] = s42; /* { dg-warning "stack-based buffer overflow" } */

Now suddenly the diagram is made *more* verbose, with the struct keyword added:

┌─┐
│write of ‘struct my_int32’ (4 bytes) │
└─┘
 │ │
 │ │
 v v
┌───┐ ┌┐
│ buffer allocated on stack at (1)│ │ after valid range│
└───┘ └┘
├───┬───┤ ├───┬┤
 │ │
╭┴───╮ ╭─┴╮
│capacity: ‘(size * 4) + 3’ bytes│ │overflow of 1 byte│
╰╯ ╰──╯
RFD: switch/case statement dispatch using hash
This has come up several times over the years:
https://gcc.gnu.org/legacy-ml/gcc/2006-07/msg00158.html
https://gcc.gnu.org/legacy-ml/gcc/2006-07/msg00155.html
https://gcc.gnu.org/pipermail/gcc/2010-March/190234.html
but maybe now (or maybe a while ago) is the right time to do this, considering the changes in relative costs of basic operations. Multiply and barrel shift are cheap on many modern microarchitectures; control flow and non-linear memory access are expensive. FWIW, [Dietz92] https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1312&context=ecetr mentions multiply in passing as impractical for SPARC because of cost, but modern CPUs can often do a multiply in a single cycle.

Approximating division and scaling, for a case value x, we can calculate an index or offset into a table as

  f(x) = x*C1 >> C2 & M

For an index, M masks off the upper bits so that the index fits into a table whose number of elements is a power of two. For architectures where a non-scaled index is cheaper to use than a scaled one, we compute an offset by having M also mask off the lower bits.

Each table entry contains a jump address (or offset) and a key - at least if both are the same size; for different sizes, it might be cheaper to have two tables. If we have found values for C1 and C2 that give a perfect hash, we can immediately dispatch to the default case for a non-match; otherwise, we can have decision trees at the jump destinations, each using the comparison with the key from the table for the first decision. No separate range check is necessary, so if the multiply is fast enough, this should be close in performance to an ordinary tablejump.

This dispatch method can be used for tables that are too sparse for a tablejump, but have enough cases to justify the overhead (depending on multiple-conditional-branch vs. single-indirect-branch costs, the latter might be a low bar).
I suppose we could make tree-switch-conversion.cc use rtx costs to compare the hash implementation to a decision tree, or have a hook make the decision - and the default for the hook might use rtx costs...
Re: RFD: switch/case statement dispatch using hash
On 23/06/2025 12:31, Florian Weimer wrote:
> Also carry-less multiply presumably. It's challenging to use those instructions for compiling switch statements because they would then be used all over the place.

Not necessarily; you can hide them in an UNSPEC if you are worried that exposing the exact semantics leads to inappropriate uses.
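In a machine description, that could look something like the following sketch (the insn name, UNSPEC tag, and assembly mnemonic are all invented; a real port would pick its own). Because the operation is wrapped in an UNSPEC, the middle end cannot simplify or reuse it as an ordinary multiply:

```
;; Sketch: hide the carry-less multiply used for switch hashing in an
;; UNSPEC so it is only emitted where switch expansion asks for it.
(define_insn "switch_hash_dispatch"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(match_operand:SI 1 "register_operand" "r")
                    (match_operand:SI 2 "register_operand" "r")]
                   UNSPEC_SWITCH_HASH))]
  ""
  "clmul\t%0,%1,%2")
```

The switch expansion code would then generate this insn directly, rather than a MULT rtx that combine could redirect elsewhere.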
Re: scan-*-dump-times across multiple functions considered harmful
On 02/07/2025 18:59, David Malcolm wrote:
> ... Brainstorming some ideas on other possible approaches on making our tests less brittle; for context I did some investigation back in 2018 about implementing "optimization remarks" like clang does: diagnostics about optimization decisions, so you could have a dg directive like this on a particular line:
> foo (); /* { dg-remark "inlined call to 'foo' into 'bar'" } */

I like the idea. However, it seems unlikely that we can make a clean switchover in this decade, unless you find one or more corporate sponsors.

We probably always want dump files without a rigid structure, because that makes it easier to add debug output when you flesh out a new pass or a change to an existing one. We can make the calls that generate the json output also emit output in the dump file, so we won't carry a doubled maintenance burden; however, this means the current ad-hoc messages would become more unified, and thus the testsuite will have to be adjusted. FWIW, even if you were to get rid of the current dump files (which I think would be stifling for GCC development, for the above reasons), you would have to adjust the testsuite.

So, we could use the json framework for new dump output that is contributed before or along with the parts of the testsuite that scan for it, but any legacy dump output that is scanned for in the testsuite requires adjusting the testsuite. There are more than 26K dejagnu scan-*-dump* directives in the gcc15 testsuite. And you'll have a big flag day, or a ton of small ones, plus all the friction that this will create with porting patches up and down gcc versions. That is a lot of thankless work, which I can't imagine doing as a hobby.
And considering people at the start of their career who might think of doing some unpaid drudge work in hope of getting recognition that'll get them some paid work: with paying work for GCC drying up, they would more likely do something for LLVM, which also seems to better align with the skills of recent graduates.

So, unless/until you have (a) corporate sponsor(s) to pay for the work on the existing testsuite - and that work is successfully concluded - we will have to find a way to make the scans of the dump files more maintainable. In fact, if we can solve the maintenance hassle of having multiple scan patterns in a test by making those patterns more specific, so we don't have to split the tests up, that will put us in a better position if/when the transition to a more organized optimization-records system is made.
scan-*-dump-times across multiple functions considered harmful
Quite often I see a test quickly written to test some new feature (bug fix, extension or optimization) that has a couple of functions to cover various aspects of the feature, checked all together with a single scan-tree-dump-times, scan-rtl-dump-times etc. check, using the expected value for the target of the test writer. Or worse, it's all packed into one giant function, with unpredictable interactions between the different pieces of code. I think we have fewer of those recently, but please don't interpret this post as a suggestion to fall back to this practice.

Quite often it turns out that the feature applies only to some of the functions / sites on some targets. The first reaction is often to create multiple copies of the scan-*-dump-times stanza, with mutually exclusive conditions for each copy, which might look harmless when there are only two cases, but as more are added, it quickly turns into an unmaintainable mess of lots of dejagnu directives with complicated conditions. This can get even worse if different targets can get the compiler to emit the pattern multiple times for the same piece of source, as for vectorization that is tried with different vectorization factors.

I think we should discuss what is best practice to address these problems efficiently, and to preferably write new tests avoiding them in the first place.

When each function has a single site per feature, where success is given if the pattern appears at least once, a straightforward solution that has already been used a number of times is to split the test into multiple smaller tests. The main disadvantages of this approach are that a large set of small files can clutter the directory where they appear, making it less maintainable, and that the compiler is invoked more often, generally with the same set of include files read each time, thus making the test runs slower.
Another approach would be to use source line numbers, where present and distinctive, and add them to the scan pattern to make it specific to the site under concern. That should, for instance, work for vectorization scan-tree-dump-times tests. The disadvantage of that approach is that the tests become more brittle, as the line numbers have to be adjusted whenever the line numbers of the source site change, like when new include files, dejagnu directives at the file start, or typedefs are needed.

Maybe we could get the best of both worlds if we add a new dump option? Say, if we make that option add the (for polymorphic languages like C++: mangled) name of the current function to each dumped line that is interesting to scan for. Or just to every line, if that's simpler.
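With such an option, scans could be tagged per function rather than per line; as a sketch (the pass name, function names, counts, and message text here are all invented for illustration):

```
/* { dg-final { scan-tree-dump-times "foo:.*loop vectorized" 1 "vect" } } */
/* { dg-final { scan-tree-dump-times "bar:.*loop vectorized" 2 "vect" } } */
```

Unlike line numbers, the function-name tags survive edits to includes, directives, and declarations earlier in the file.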
P.S. to: scan-*-dump-times across multiple functions considered harmful
P.S.: to get the specificity of line numbers without the brittleness, we could have a pragma for the extra dump-line tag instead - effective either until the next such pragma / EOF, or (if that comes first) until the end of the function.

Disadvantages: It is not actually more specific when the source describes a template, or gets cloned - unless we add the mangled function name too, either unconditionally, per option, or per format (complex, bug-prone). Also, the tag would have to be associated with or included in the source location; that might get complex and be a source of bugs in itself.