Configuration question
I have run into a libstdc++ configuration issue and was wondering if it is a known issue or not.

My build failed because the compiler I am using to build GCC and libstdc++ does not have wchar support and does not define mbstate_t. The compiler (and library) that I am creating, however, do support wchar and do define mbstate_t. Both compilers are GCC; the old one does not include a -D that the new one does, and mbstate_t (defined in the system header files) is only seen when this define is set.

The problem is that the libstdc++ configure script is using the original GCC to check for the existence of mbstate_t (it doesn't find it) and using that information to decide that it needs to define mbstate_t when compiling libstdc++. But libstdc++ is compiled with the newly built GCC, which does have an mbstate_t from the system header files.

Shouldn't the libstdc++ configure script use the new GCC when checking things with AC_TRY_COMPILE? Or is this just not possible? Is this why some tests don't use AC_TRY_COMPILE but say "Fake what AC_TRY_COMPILE does"? See acinclude.m4 for these comments; there is no explanation of why it is faking what AC_TRY_COMPILE does.

Steve Ellcey
[EMAIL PROTECTED]
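To make the failure concrete, the check boils down to compiling a tiny test program with whichever compiler configure has in hand. A minimal sketch (my own illustration, not the actual conftest generated by acinclude.m4), assuming mbstate_t comes from <wchar.h>:

  /* conftest-style probe for mbstate_t; this compiles only if the
     compiler doing the check can see the type in the system headers,
     so it fails with the old GCC and would pass with the new one.  */
  #include <wchar.h>
  int
  main (void)
  {
    mbstate_t state;
    (void) state;
    return 0;
  }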
Re: ICE while bootstrapping trunk on hppa2.0w-hp-hpux11.00
| /raid/tecosim/it/devel/projects/develtools/src/gcc-4.3.0/gcc/libgcc2.c:1970:
| internal compiler error: Segmentation fault
| Please submit a full bug report,
| with preprocessed source if appropriate.
| See <http://gcc.gnu.org/bugs.html> for instructions.

I am seeing this too. I tracked it back to line 5613 of tree-ssa-loop-ivopts.c (rewrite_use_compare). There is a line:

  bound = cp->value;

and cp is null. cp is set with a call to get_use_iv_cost, and that routine does return NULL in some cases, so I think we need to check for a NULL cp before dereferencing it. I changed "if (bound)" to "if (cp && cp->value)" and set bound inside the if, but now I am dying when compiling decNumber.c, so I don't have a bootstrap working yet.

Steve Ellcey
[EMAIL PROTECTED]
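The guard I experimented with looks roughly like this (a sketch only; the variable and function names are approximate, from memory of the surrounding code, not an exact patch):

  /* get_use_iv_cost can return NULL; guard before dereferencing.  */
  struct cost_pair *cp = get_use_iv_cost (data, use, cand);
  if (cp && cp->value)
    {
      bound = cp->value;
      /* ... rewrite the exit condition using bound ... */
    }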
Re: Configuration question
> > Shouldn't the libstdc++ configure script use the new GCC when
> > checking things with AC_TRY_COMPILE.
>
> Yes.
>
> -benjamin

It looks like this has something to do with using autoconf 2.59 at the top level of GCC. I am experimenting with updating the top-level GCC to 2.59 now that all of the GCC and src sub-trees have been updated to 2.59. When I tried this on Linux I had no problems, but on HP-UX (with multilibs) it is not working correctly, and the failure I get is that AC_TRY_COMPILE is not using the right GCC when run. When I undid my top-level change (went back to autoconf 2.14) the libstdc++ configure worked correctly and the right GCC was used by AC_TRY_COMPILE. Most perplexing.

Steve Ellcey
[EMAIL PROTECTED]
Re: failed to compile gcc-4.3-20061209/gcc/varasm.c on OSX 10.3
Andreas Tobler wrote:
> Dominique Dhumieres wrote:
>> ...
>> cc1: warnings being treated as errors
>> ../../gcc-4.3-20061209/gcc/varasm.c: In function 'elf_record_gcc_switches':
>> ../../gcc-4.3-20061209/gcc/varasm.c:6268: warning: format '%llu' expects type
>> 'long long unsigned int', but argument 3 has type 'long int'
>> ../../gcc-4.3-20061209/gcc/varasm.c:6275: warning: format '%llu' expects type
>> 'long long unsigned int', but argument 3 has type 'long int'
>> ../../gcc-4.3-20061209/gcc/varasm.c:6283: warning: format '%llu' expects type
>> 'long long unsigned int', but argument 3 has type 'long int'
>> ../../gcc-4.3-20061209/gcc/varasm.c:6302: warning: format '%llu' expects type
>> 'long long unsigned int', but argument 3 has type 'long int'
>> make[3]: *** [varasm.o] Error 1
>>
>> Any idea around about the cause and/or the way to fix it?
>
> This is known to break on all 32-bit targets (afaik). On 64-bit targets
> it works.
>
> You can either wait until the patch is reverted or the correct fix is done.

Do you know if there is a GCC bug report for this defect? I couldn't find one in bugzilla. I am seeing this problem with IA64 HP-UX on ToT. I tried the workaround you gave, and that makes IA64 HP-UX work but causes other platforms to fail, so I am wondering when there will be a real fix for this bootstrap problem.

Steve Ellcey
[EMAIL PROTECTED]
Running GCC tests on installed compiler
Can someone with some deja-knowledge help me figure out how to run the GCC tests on an installed compiler, without having to do a GCC build?

I started with:

  runtest -tool gcc --srcdir /proj/opensrc/nightly/src/trunk/gcc/testsuite

and that ran the tests, but it ran them with whatever gcc command it found in PATH. I tried setting and exporting CC before running runtest and putting "CC=" on the runtest command line, but neither of those methods seemed to affect what gcc was run by runtest.

So then I tried to create a site.exp file and use that on the command line. In site.exp I put:

  set CC "/proj/opensrc/be/ia64-hp-hpux11.23/bin/gcc"
  set srcdir "/proj/opensrc/nightly/src/trunk/gcc/testsuite"

I also tried using the site.exp file that I got from building GCC, and various combinations of the two, but all these attempts ended with no tests run and the following lines in my log file:

  Running target unix
  Using /proj/opensrc/be/ia64-debian-linux-gnu/share/dejagnu/baseboards/unix.exp as board description file for target.
  Using /proj/opensrc/be/ia64-debian-linux-gnu/share/dejagnu/config/unix.exp as generic interface file for target.
  WARNING: Couldn't find tool config file for unix, using default.

When testing my just-built GCC I was seeing:

  Running target unix
  Using /proj/opensrc/be/ia64-debian-linux-gnu/share/dejagnu/baseboards/unix.exp as board description file for target.
  Using /proj/opensrc/be/ia64-debian-linux-gnu/share/dejagnu/config/unix.exp as generic interface file for target.
  Using /proj/opensrc/nightly/src/trunk/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
  Running /proj/opensrc/nightly/src/trunk/gcc/testsuite/gcc.c-torture/compile/compile.exp ...

The built GCC seems to be picking up an extra .exp file (default.exp), but I am not sure why, or how to fix it so that my non-built compiler runs the same way. Can someone help me out here?

Steve Ellcey
[EMAIL PROTECTED]
store_expr, expr_size, and C++
I am looking at PR target/30826 (an IA64 ABI bug) and have come up with a patch that basically involves turning off the CALL_EXPR_RETURN_SLOT_OPT optimization in some instances, forcing GCC to create a temporary for the (large aggregate) return value of a function and then copy that temporary value to the desired target.

The problem I am running into is with C++ code in store_expr. I get to this if statement:

  if ((! rtx_equal_p (temp, target)
       || (temp != target && (side_effects_p (temp)
                              || side_effects_p (target))))
      && TREE_CODE (exp) != ERROR_MARK
      /* If store_expr stores a DECL whose DECL_RTL(exp) == TARGET,
         but TARGET is not valid memory reference, TEMP will differ
         from TARGET although it is really the same location.  */
      && !(alt_rtl && rtx_equal_p (alt_rtl, target))
      /* If there's nothing to copy, don't bother.  Don't call
         expr_size unless necessary, because some front-ends (C++)
         expr_size-hook must not be given objects that are not
         supposed to be bit-copied or bit-initialized.  */
      && expr_size (exp) != const0_rtx)

and I hit a gcc_assert when calling expr_size(). Even if I avoided this somehow, I would hit it later when calling:

  emit_block_move (target, temp, expr_size (exp),
                   (call_param_p
                    ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));

So my question is: is there a way to handle this copy/assignment in C++ without depending on expr_size? I noticed PR middle-end/30017 (turning memcpys into assignments), which seems to have some of the same issues of getting expr_size for C++ expressions, but that defect is still open, so it doesn't look like there is an answer yet. Anyone have some ideas on this problem?

Steve Ellcey
[EMAIL PROTECTED]
bootstrap failure on real-install-headers-cpio
Has anyone seen this bootstrap failure? I only get it on my hppa*-hp-hpux* builds, not on ia64-hp-hpux* or on Linux builds. I assume it is related to the include-fixed changes, but I don't know why I only get it for some platforms. I get it with parallel and non-parallel builds.

Steve Ellcey
[EMAIL PROTECTED]

. . .
/bin/sh /proj/opensrc/nightly/src/trunk/gcc/../move-if-change tmp-macro_list macro_list
echo timestamp > s-macro_list
rm -rf include-fixed; mkdir include-fixed
chmod a+rx include-fixed
if [ -d ../prev-gcc ]; then \
  cd ../prev-gcc && \
  make real-install-headers-cpio DESTDIR=`pwd`/../gcc/ \
    libsubdir=. ; \
else \
  (TARGET_MACHINE='hppa1.1-hp-hpux11.11'; srcdir=`cd /proj/opensrc/nightly/src/trunk/gcc; ${PWDCMD-pwd}`; \
  SHELL='/bin/sh'; MACRO_LIST=`${PWDCMD-pwd}`/macro_list ; \
  export TARGET_MACHINE srcdir SHELL MACRO_LIST && \
  cd ../build-hppa1.1-hp-hpux11.11/fixincludes && \
  /bin/sh ./fixinc.sh ../../gcc/include-fixed \
    `echo /usr/include | sed -e :a -e 's,[^/]*/\.\.\/,,' -e ta` ); \
  rm -f include-fixed/syslimits.h; \
  if [ -f include-fixed/limits.h ]; then \
    mv include-fixed/limits.h include-fixed/syslimits.h; \
  else \
    cp /proj/opensrc/nightly/src/trunk/gcc/gsyslimits.h include-fixed/syslimits.h; \
  fi; \
fi
make[4]: Entering directory `/proj/opensrc/nightly/build-hppa1.1-hp-hpux11.11-trunk/obj_gcc/prev-gcc'
cd `${PWDCMD-pwd}`/include ; \
find . -print | cpio -pdum /proj/opensrc/nightly/build-hppa1.1-hp-hpux11.11-trunk/obj_gcc/prev-gcc/../gcc/./include
cannot write in
make[4]: *** [real-install-headers-cpio] Error 2
make[4]: Leaving directory `/proj/opensrc/nightly/build-hppa1.1-hp-hpux11.11-trunk/obj_gcc/prev-gcc'
make[3]: *** [stmp-fixinc] Error 2
make[3]: Leaving directory `/proj/opensrc/nightly/build-hppa1.1-hp-hpux11.11-trunk/obj_gcc/gcc'
make[2]: *** [all-stage2-gcc] Error 2
make[2]: Leaving directory `/proj/opensrc/nightly/build-hppa1.1-hp-hpux11.11-trunk/obj_gcc'
make[1]: *** [stage2-bubble] Error 2
make[1]: Leaving directory `/proj/opensrc/nightly/build-hppa1.1-hp-hpux11.11-trunk/obj_gcc'
make: *** [bootstrap] Error 2
Updating libtool in GCC and srctree
Now that autoconf has been updated to 2.59, I would like to update the libtool that GCC and the binutils/gdb/etc. trees use. Unfortunately, I am not having much luck coming up with a patch and figuring out what all needs to be reconfigured. Here is what I have tried so far.

In the libtool documentation it says that to include libtool in your package you need to add config.guess, config.sub, install-sh, and ltmain.sh to your package. We already have the install-sh that is in the latest libtool, and our config.guess and config.sub look to be newer than the ones in libtool, so that just leaves ltmain.sh. I downloaded the 2.1a snapshot of libtool and found ltmain.sh in libltdl/config/ltmain.sh. I copied that to the top level of the src tree and then removed libtool.m4, ltconfig, ltcf-c.sh, ltcf-cxx.sh, and ltcf-gcj.sh.

I was able to run autoconf on the top level of the source tree with no errors, but when I did a configure/make I got the following error while make was in the bfd subdirectory:

  make[3]: Entering directory `/proj/opensrc/sje/svn.libtool/build-ia64-hp-hpux11.23-trunk/obj_src/bfd'
  make[3]: LIBTOOL@: Command not found
  make[3]: *** [archive.lo] Error 127

So I went into bfd and tried to run autoconf there, but I get the errors:

  $ /proj/opensrc/be/ia64-hp-hpux11.23/bin/autoconf
  configure.in:13: error: possibly undefined macro: AM_PROG_LIBTOOL
        If this token and others are legitimate, please use m4_pattern_allow.
        See the Autoconf documentation.
  configure.in:20: error: possibly undefined macro: AM_DISABLE_SHARED

I tried changing the macros to AC_*, but that didn't help. Should I just use m4_pattern_allow, or am I missing a bigger picture here?

Steve Ellcey
[EMAIL PROTECTED]
Re: Updating libtool in GCC and srctree
> > I downloaded the 2.1a snapshot of libtool and found
>
> Are you sure you want to use the (rather oldish) 2.1a snapshot? I think
> you'll be better off using the latest stable release, which is 1.5.22.

I thought that 2.1a was a snapshot of ToT. I have some recollection of someone saying we would want to use ToT libtool as opposed to the latest released one.

> > ltmain.sh in libltdl/config/ltmain.sh, I copied that to the top level of
> > the src tree and then removed libtool.m4,
>
> You'll still need libtool.m4.

Are you sure? According to
<http://www.gnu.org/software/libtool/manual.html#Distributing>
we shouldn't need libtool.m4 in our package.

Steve Ellcey
[EMAIL PROTECTED]
Re: Updating libtool in GCC and srctree
I have made some progress in updating libtool in the src (binutils) tree, and I have attached the various changes (but not the actual new libtool files) to this email in case anyone wants to see what I am doing.

I am having more trouble with the GCC tree. I put the new libtool in the toplevel directory, just like I did in the binutils src tree, and then I went to the boehm-gc (and libffi) directories to try and rerun autoconf. If I just run autoconf I get errors because I am not including the new ltoptions.m4, ltsugar.m4, and ltversion.m4 files. Now, in the binutils tree the acinclude.m4 files had explicit includes of libtool.m4, and I added includes of ltoptions.m4, ltsugar.m4, and ltversion.m4. But boehm-gc has no acinclude.m4 file, and while libffi has an acinclude.m4 file, it doesn't have an include of libtool.m4. So my question is, how is the include of libtool.m4 getting into aclocal.m4? Is it by running aclocal? I tried to run aclocal, but I get errors when I run it:

  $ aclocal
  autom4te: unknown language: Autoconf-without-aclocal-m4
  aclocal: autom4te failed with exit status: 1

This is aclocal 1.9.6. Any idea on what I need to do here to fix this error? Why do some acinclude.m4 files have explicit includes for libtool files (libgfortran, libgomp, etc.) but others don't (libffi, gcc)?

Steve Ellcey
[EMAIL PROTECTED]

Here is what I have done so far in the src/binutils tree:

Top level src tree ChangeLog:

2007-03-09  Steve Ellcey  <[EMAIL PROTECTED]>

	* ltmain.sh: Update from libtool ToT.
	* libtool.m4: Update from libtool ToT.
	* ltsugar.m4: New. Update from libtool ToT.
	* ltversion.m4: New. Update from libtool ToT.
	* ltoptions.m4: New. Update from libtool ToT.
	* ltconfig: Remove.
	* ltcf-c.sh: Remove.
	* ltcf-cxx.sh: Remove.
	* ltcf-gcj.sh: Remove.
	* src-release: Update with new libtool file list.

Index: src-release
===================================================================
RCS file: /cvs/src/src/src-release,v
retrieving revision 1.22
diff -u -r1.22 src-release
--- src-release	9 Feb 2007 15:15:38 -0000	1.22
+++ src-release	9 Mar 2007 23:37:34 -0000
@@ -49,8 +49,8 @@
 DEVO_SUPPORT= README Makefile.in configure configure.ac \
 	config.guess config.sub config move-if-change \
 	COPYING COPYING.LIB install-sh config-ml.in symlink-tree \
-	mkinstalldirs ltconfig ltmain.sh missing ylwrap \
-	libtool.m4 ltcf-c.sh ltcf-cxx.sh ltcf-gcj.sh \
+	mkinstalldirs ltmain.sh missing ylwrap \
+	libtool.m4 ltsugar.m4, ltversion.m4, ltoptions.m4 \
 	Makefile.def Makefile.tpl src-release config.rpath
 
 # Files in devo/etc used in any net release.

bfd/ChangeLog

2007-03-09  Steve Ellcey  <[EMAIL PROTECTED]>

	* acinclude.m4: Add new includes.
	* configure.in: Change macro call order.
	* configure: Regenerate.

Index: acinclude.m4
===================================================================
RCS file: /cvs/src/src/bfd/acinclude.m4,v
retrieving revision 1.16
diff -u -r1.16 acinclude.m4
--- acinclude.m4	31 May 2006 15:14:35 -0000	1.16
+++ acinclude.m4	9 Mar 2007 23:36:49 -0000
@@ -49,6 +49,9 @@
 fi
 AC_SUBST(EXEEXT_FOR_BUILD)])dnl
 
+sinclude(../ltsugar.m4)
+sinclude(../ltversion.m4)
+sinclude(../ltoptions.m4)
 sinclude(../libtool.m4)
 dnl The lines below arrange for aclocal not to bring libtool.m4
 dnl AM_PROG_LIBTOOL into aclocal.m4, while still arranging for automake

Index: configure.in
===================================================================
RCS file: /cvs/src/src/bfd/configure.in,v
retrieving revision 1.222
diff -u -r1.222 configure.in
--- configure.in	1 Mar 2007 15:48:36 -0000	1.222
+++ configure.in	9 Mar 2007 23:37:07 -0000
@@ -19,7 +19,10 @@
 dnl configure option --enable-shared.
 AM_DISABLE_SHARED
-AM_PROG_LIBTOOL
+AC_PROG_CC
+AC_GNU_SOURCE
+
+AC_PROG_LIBTOOL
 
 AC_ARG_ENABLE(64-bit-bfd,
 [  --enable-64-bit-bfd     64-bit support (on hosts with narrower word sizes)],
@@ -95,9 +98,6 @@
 
 # host stuff:
 
-AC_PROG_CC
-AC_GNU_SOURCE
-
 ALL_LINGUAS="fr tr ja es sv da zh_CN ro rw vi"
 ZW_GNU_GETTEXT_SISTER_DIR
 AM_PO_SUBDIRS

binutils/ChangeLog

2007-03-09  Steve Ellcey  <[EMAIL PROTECTED]>

	* configure.in: Change macro call order.
	* configure: Regenerate.

Index: configure.in
===================================================================
RCS file: /cvs/src/src/binutils/configure.in,v
retrieving revision 1.75
diff -u -r1.75 configure.in
--- configure.in	28 Feb 2007 01:29:32 -0000	1.75
+++ configure.in	9 Mar 2007 23:36:12 -0000
@@ -11,7 +11,9 @@
 changequote([,])dnl
 AM_INIT_AUTOMAKE(binutils, ${BFD_VERSION})
 
-AM_PROG_LIBTOOL
+AC_PROG_CC
+AC_GNU_SOURCE
+AC_PROG_LIBTOOL
 
 AC_ARG_ENABLE(targets,
 [  --enable-targets        alternative target configurations],
@@ -53,9 +55,6 @@
 AC_MSG_ERROR(Unrecognized host
Re: Updating libtool in GCC and srctree
> Steve Ellcey <[EMAIL PROTECTED]> writes:
>
> > $ aclocal
> > autom4te: unknown language: Autoconf-without-aclocal-m4
> > aclocal: autom4te failed with exit status: 1
>
> Looks like you have an out-of-date autom4te.cache.
>
> Andreas.

I removed autom4te.cache and reran aclocal. Same results.

Steve Ellcey
[EMAIL PROTECTED]
Re: Updating libtool in GCC and srctree
> So, you need to run aclocal with:
> $ aclocal -I ../config -I ..
>
> --
> albert chin ([EMAIL PROTECTED])

Thanks, that helps a lot. For libstdc++-v3 I actually needed "-I ." as well in order to find linkage.m4, so maybe "-I . -I .. -I ../config" is the best option list to use on aclocal calls in the GCC tree.

libjava is the only subdir I can't seem to get configured with the new libtool:

  $ aclocal -I . -I .. -I ../config
  $ autoconf
  configure:15448: error: possibly undefined macro: AM_PROG_GCJdnl
        If this token and others are legitimate, please use m4_pattern_allow.
        See the Autoconf documentation.

I am not sure why I get this; nothing else seems to be requiring m4_pattern_allow. If I don't use any -I options on aclocal it works, and then I get a different error from autoconf (about TL_AC_GXX_INCLUDE_DIR being possibly undefined). I think I want the -I options, though.

Steve Ellcey
[EMAIL PROTECTED]
Re: Updating libtool in GCC and srctree
> On Mon, Mar 12, 2007 at 04:03:52PM -0700, Steve Ellcey wrote:
> > configure:15448: error: possibly undefined macro: AM_PROG_GCJdnl
>
> Where'd that come from? Wherever it is, it's a bug. Maybe someone
> checked in a typo to the configure file. "dnl" is a comment start
> token in autoconf (that's a very rough approximation of the situation).

It looks like it is coming from the new libtool.m4; I just sent email to bug-libtool@gnu.org about it. In the new libtool.m4 there is:

  # LT_PROG_GCJ
  # -----------
  AC_DEFUN([LT_PROG_GCJ],
  [m4_ifdef([AC_PROG_GCJ], [AC_PROG_GCJ],
    [m4_ifdef([A][M_PROG_GCJ], [A][M_PROG_GCJ],
      [AC_CHECK_TOOL(GCJ, gcj,)
        test "x${GCJFLAGS+set}" = xset || GCJFLAGS="-g -O2"
        AC_SUBST(GCJFLAGS)])])dnl
  ])

And I think the dnl at the end of the AC_SUBST line is the problem. Removing it seems to fix the configure of libjava, anyway.

> Yes, you always want to match ACLOCAL_AMFLAGS from Makefile.am.

Now that is a very useful thing to know.

I am trying to build now and am currently running into a problem building libgfortran. When doing the libtool link of the library I get:

  ld: Can't find library or mismatched ABI for -lgfortranbegin
  Fatal error.
  collect2: ld returned 1 exit status
  make[3]: *** [libgfortran.la] Error 1

I was able to build libstdc++-v3 and other libraries with no problem, but I haven't figured out what is going on here yet.

Steve Ellcey
[EMAIL PROTECTED]
RFC: obsolete __builtin_apply?
I have long been annoyed by the failure of the test builtin-apply4.c on IA64 HP-UX, and I know there are failures of tests using __builtin_apply on other platforms as well. My question is: is it time to obsolete __builtin_apply, __builtin_apply_args, and __builtin_return?

It looks like the main sticking point is that libobjc uses __builtin_apply, __builtin_apply_args, and __builtin_return. There is a FIXME comment about changing this to use libffi. Do any of the objc folks have this on their 'todo' plate? I am not sure how big this task would be. My thinking is that if libobjc was changed, then we could put in a deprecation message on these builtins for 4.3 and maybe remove them for 4.4. Comments?

Steve Ellcey
[EMAIL PROTECTED]
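For anyone who hasn't run into these builtins, this is the sort of argument forwarding they exist for; a minimal sketch (the 128-byte argument-block size is an arbitrary guess for illustration, not what libobjc uses):

  /* Forward whatever arguments we were called with to foo() and
     return whatever foo() returned, without knowing the signature.  */
  double foo (double a, double b) { return a + b; }

  void *
  forwarder (void)
  {
    void *args = __builtin_apply_args ();
    void *ret  = __builtin_apply ((void (*) ()) foo, args, 128);
    __builtin_return (ret);
  }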
libgfortran Makefile question (using latest libtool)
While attempting to build libgfortran with the latest libtool I got the following error:

if /bin/sh ./libtool --mode=compile /proj/opensrc/sje/svn.libtool/build-ia64-hp-hpux11.23-trunk/obj_gcc/./gcc/xgcc \
  -B/proj/opensrc/sje/svn.libtool/build-ia64-hp-hpux11.23-trunk/obj_gcc/./gcc/ \
  -B/proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/bin/ \
  -B/proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/lib/ \
  -isystem /proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/include \
  -isystem /proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/sys-include \
  -DHAVE_CONFIG_H -I. -I/proj/opensrc/sje/svn.libtool/src/trunk/libgfortran -I. \
  -iquote/proj/opensrc/sje/svn.libtool/src/trunk/libgfortran/io \
  -I/proj/opensrc/sje/svn.libtool/src/trunk/libgfortran/../gcc \
  -I/proj/opensrc/sje/svn.libtool/src/trunk/libgfortran/../gcc/config -I../../.././gcc \
  -D_GNU_SOURCE -std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes \
  -Wold-style-definition -Wextra -Wwrite-strings -O2 -g -mlp64 \
  -MT backtrace.lo -MD -MP -MF ".deps/backtrace.Tpo" -c -o backtrace.lo \
  `test -f 'runtime/backtrace.c' || echo '/proj/opensrc/sje/svn.libtool/src/trunk/libgfortran/'`runtime/backtrace.c; \
then mv -f ".deps/backtrace.Tpo" ".deps/backtrace.Plo"; else rm -f ".deps/backtrace.Tpo"; exit 1; fi
libtool: compile: unable to infer tagged configuration
libtool: compile: specify a tag with `--tag'
make[6]: *** [fmain.lo] Error 1

Now, obviously, what I want to do is add --tag=CC to the libtool call. But I can't figure out where to do this. If I look at Makefile.in I see where this is coming from, but Makefile.in is generated from Makefile.am, so I shouldn't be editing Makefile.in. When I look at Makefile.am I don't see how we got this compile line. What do I change to get --tag=CC added to the libtool call?

The libstdc++-v3/src/Makefile has:

  LTCXXCOMPILE = $(LIBTOOL) --tag CXX --mode=compile $(CXX) $(INCLUDES) \
	$(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CXXFLAGS) $(CXXFLAGS)

But libgfortran doesn't have a line like this, so how is it coming up with this compile line?

Steve Ellcey
[EMAIL PROTECTED]
Re: libgfortran Makefile question (using latest libtool)
> I think that should already be the default. Try running ./libtool
> --config and look for the value of CC. That value should match (modulo
> whitespace) the command line that is actually used.
>
> Andreas.

It does not look like this is the default. I don't see any use of --tag in the libtool config output (nor do I see where the -MD, -MP, -MF flags are coming from).

Steve Ellcey
[EMAIL PROTECTED]

% ./libtool --config | grep -e LT -e CC -e tag
LTCC="/proj/opensrc/sje/svn.libtool/build-ia64-hp-hpux11.23-trunk/obj_gcc/./gcc/xgcc -B/proj/opensrc/sje/svn.libtool/build-ia64-hp-hpux11.23-trunk/obj_gcc/./gcc/ -B/proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/bin/ -B/proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/lib/ -isystem /proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/include -isystem /proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/sys-include"
# LTCC compiler flags.
LTCFLAGS="-std=gnu99 -O2 -g -Wunknown-pragmas"
variables_saved_for_relink="PATH LD_LIBRARY_PATH GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH"
CC="/proj/opensrc/sje/svn.libtool/build-ia64-hp-hpux11.23-trunk/obj_gcc/./gcc/xgcc -B/proj/opensrc/sje/svn.libtool/build-ia64-hp-hpux11.23-trunk/obj_gcc/./gcc/ -B/proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/bin/ -B/proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/lib/ -isystem /proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/include -isystem /proj/opensrc/sje/svn.libtool/gcc-ia64-hp-hpux11.23-trunk/ia64-hp-hpux11.23/sys-include"
archive_cmds="\$CC -shared \${wl}+h \${wl}\$soname \${wl}+nodefaultrpath -o \$lib \$libobjs \$deplibs \$compiler_flags"
Re: libgfortran Makefile question (using latest libtool)
> From: Charles Wilson <[EMAIL PROTECTED]>
>
> The --tag option is added by automake-1.9 or automake-1.10, but not 1.8:

Interesting. The Makefile.in in libgfortran claims to be from automake 1.9.6. If I run this automake in a tree with the old (1.4 based) libtool I don't get any --tag options in Makefile.in, but if I run automake in the tree where I have the latest libtool, then I see the --tag option used. So I guess just rerunning automake is sufficient to fix this problem.

Steve Ellcey
[EMAIL PROTECTED]
Re: GCC mini-summit - compiling for a particular architecture
> It came up in a few side conversations. As I understand it, RMS has
> decreed that the -On optimizations shall be architecture independent.
> That said, there are "generic" optimizations which really only apply
> to a single architecture, so there is some precedent for bending this
> rule.
>
> There were also suggestions of making the order of optimizations
> command line configurable and allowing dynamically loaded libraries to
> register new passes.
>
> Ollie

This seems unfortunate. I was hoping I might be able to turn on loop unrolling for IA64 at -O2 to improve performance. I have only started looking into this idea, but it seems to help performance quite a bit, though it is also increasing size quite a bit, so it may need some modification of the unrolling parameters to make it practical.

I notice the OPTIMIZATION_OPTIONS documentation does say:

| You should not use this macro to change options that are not
| machine-specific. These should be uniformly selected by the same
| optimization level on all supported machines. Use this macro to enable
| machine-specific optimizations.

What is the rationale for this? Is it a question of making it easier to reproduce a -O2 bug that happens on one machine on a different one, so that it is easier to find and fix?

Steve Ellcey
[EMAIL PROTECTED]
Re: GCC mini-summit - benchmarks
Jim Wilson wrote:
> Kenneth Hoste wrote:
> > I'm not sure what 'tests' mean here... Are test cases being extracted
> > from the SPEC CPU2006 sources? Or are you referring to the validity tests
> > of the SPEC framework itself (to check whether the output generated by
> > some binary conforms with their reference output)?
>
> The claim is that SPEC CPU2006 has source code bugs that cause it to
> fail when compiled by gcc. We weren't given a specific list of problems.

HJ, can you give us the specifics on the SPEC 2006 failures you were seeing?

I remember the perlbench failure; it was IA64 specific and was due to the SPEC config file spec_config.h, which defines the attribute keyword to be null, thus eliminating all attributes. On IA64 Linux, in the /usr/include/bits/setjmp.h header file, the __jmp_buf buffer is defined to have an aligned attribute on it. If the buffer isn't aligned, the perlbench program fails.

I believe another problem was an uninitialized local variable in a Fortran program, but I don't recall which program or which variable that was.

Steve Ellcey
[EMAIL PROTECTED]
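A sketch of the failure mode as I understand it (the macro and typedef below are illustrative stand-ins, not copied from the SPEC config or from glibc):

  /* If a config header neuters attributes before system headers
     are read ...  */
  #define __attribute__(x)

  /* ... then an alignment request like the one on IA64's __jmp_buf
     silently vanishes, the buffer can end up under-aligned, and
     setjmp/longjmp may fault on it.  */
  typedef long int fake_jmp_buf[70] __attribute__ ((__aligned__ (16)));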
Problem with patch for PR tree-optimization/29789
Richard,

Has anyone reported any problems with your tree-ssa-loop-im.c patch that fixes PR tree-optimization/29789? I have been looking at a failure with the SPECfp2000 173.applu test. I found that if I compile it with version r124041 of the GCC gfortran compiler it works, but if I compile it with version r124042 it fails. The difference between the two is your checkin:

2007-04-22  Richard Guenther  <[EMAIL PROTECTED]>

	PR tree-optimization/29789
	* tree-ssa-loop-im.c (stmt_cost): Adjust cost of shifts.
	(rewrite_reciprocal): New helper split out from
	determine_invariantness_stmt.
	(rewrite_bittest): Likewise.
	(determine_invariantness_stmt): Rewrite (A >> B) & 1 to
	A & (1 << B) if (1 << B) is loop invariant but (A >> B) is not.

To make things harder, the problem only seems to happen if I do a bootstrap build. If I build a non-bootstrap compiler, then the applu test compiles and runs fine. If I build a bootstrap compiler, I can compile applu, but the program core dumps when run.

Do you have any ideas about what might be happening, or what I might try in order to understand what is going wrong?

Steve Ellcey
[EMAIL PROTECTED]
How to handle g++.dg/warn/multiple-overflow-warn-3.C failure
I was wondering if anyone had some advice on how to handle the testcase g++.dg/warn/multiple-overflow-warn-3.C. The test case fails on my HP-UX platforms because the underlying type for wchar_t on HP-UX is 'unsigned int' and not 'int' like it is on Linux. This means that the expression does not overflow, we don't get a warning, and the test fails.

I could just xfail/xskip it for HP-UX, but other platforms use unsigned types for wchar_t and must be failing too, so I was hoping for something a little more elegant. I thought of changing all the wchar_t's to int's, but I think that might negate what the test is trying to check, since there would be no implicit conversions in the code any more and the test would probably never have given multiple overflow warnings in the first place.

Steve Ellcey
[EMAIL PROTECTED]

Test g++.dg/warn/multiple-overflow-warn-3.C:

/* PR 30465 : Test for duplicated warnings in a conversion.  */
/* { dg-do compile } */
/* { dg-options "-Woverflow" } */

wchar_t g (void)
{
  wchar_t wc = ((wchar_t)1 << 31) - 1; /* { dg-bogus "overflow .* overflow" } */
  /* { dg-warning "overflow" "" { target *-*-* } 8 } */
  return wc;
}
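The signedness difference in isolation (my own illustration, assuming a 32-bit int):

  /* Signed case (Linux wchar_t): 1 << 31 exceeds INT_MAX, so the
     constant folds with an overflow and -Woverflow fires.  */
  int si = ((int) 1 << 31) - 1;

  /* Unsigned case (HP-UX wchar_t): 1U << 31 is a well-defined
     2147483648U, so there is nothing to warn about.  */
  unsigned int ui = ((unsigned int) 1 << 31) - 1;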
Re: RFC: obsolete __builtin_apply?
Andrew,

Are you still planning on applying the libobjc patch that removes the use of __builtin_apply?

Steve Ellcey
[EMAIL PROTECTED]
Re: IA64 record alignment rules, and modes?
> Question: If we assume that a TImode would've been a more efficient mode
> to represent the record type above, would it not have been acceptable for
> the compiler to promote the alignment of this type to 128, given there
> are no apparent restrictions otherwise, or are there other C conventions
> at work that dictate otherwise? Is there a configuration tweak that
> would've led to using TImode rather than BLKmode?

I think using TImode might work in this specific example, but there are other cases where it would definitely not work. This is especially true on HP-UX, which is big-endian, and where the alignment of records and integers is different. I.e., passing an integer argument vs. passing a record containing a single integer field is different. And then there is the whole issue of HFAs (homogeneous floating point aggregates) to consider. In general, coming up with a specific set of criteria for when an aggregate doesn't have to be treated as such is difficult on IA64. For more details about the IA64 ABI see:

http://h21007.www2.hp.com/dspp/tech/tech_TechDocumentDetailPage_IDX/1,1701,3309,00.html

Steve Ellcey
[EMAIL PROTECTED]
PR 19893 & array_ref bug
I was looking at PR 19893 (gcc.dg/vect/vect-76 fails on ia64-hpux) and I think it is caused by a non-platform-specific bug, though it may not cause vect-76 to fail on other platforms. I was hoping someone might be able to help me understand what is going on. Here is a cut-down test case (with no vector stuff in it):

typedef int aint __attribute__ ((__aligned__(16)));
aint ib[12];
int ic[12], *x, *y;

int main (void)
{
  x = &ib[4];
  y = &ic[4];
}

If you look at the assembly language generated on IA64 (HP-UX or Linux), or probably on any platform, you will see that 'y' gets correctly set to the address of ic[4], but 'x' gets set to the address of ib[0] instead of ib[4]. Things look good in all the tree dumps, but the first RTL dump looks bad, so I believe things are going wrong during expansion. Looking in tree.def I see:

/* Array indexing.
   Operand 0 is the array; operand 1 is a (single) array index.
   Operand 2, if present, is a copy of TYPE_MIN_VALUE of the index.
   Operand 3, if present, is the element size, measured in units of
   the alignment of the element type.  */
DEFTREECODE (ARRAY_REF, "array_ref", tcc_reference, 4)

Now I think the problem is with operand 3. What value should it have if the alignment is greater than the element size? That is what I have in the test case above, and when I dump the array_ref for ib[4] I see that I have an operand 3 and it is zero, and I think this is causing the test failure. What value should operand 3 have in this situation? Or should it have been left out?

Steve Ellcey
[EMAIL PROTECTED]
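If I am reading the tree.def comment right, the arithmetic for my test case works out like this (my own back-of-the-envelope numbers, assuming a 4-byte int):

  /* element size      = 4 bytes
     element alignment = 16 bytes
     operand 3         = size in units of alignment = 4 / 16 -> 0
     offset of ib[4]   = 4 * operand3 * 16 = 0
     which would explain x pointing at &ib[0] instead of &ib[4].  */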
Re: PR 19893 & array_ref bug
> This program should generate an error; it's illogical. If the alignment
> of the element is greater than the element size, then arrays of such a
> type should be disallowed. Otherwise, stuff in either the compiler or
> the program itself could make the justified assumption that things of
> that type are aligned more strictly than they actually are.
>
> --
> Mark Mitchell

Interesting. I have created a patch (attached) that gives an error whenever we try to create an array of elements and the alignment of the elements is greater than the size of the elements. The problem I have, and the reason I haven't sent it to gcc-patches, is that it generates a bunch of regressions. The regressions are all due to bad tests, but I am not sure how to fix the tests so that I can check in the patch. The regressions are in two places: gcc.dg/compat/struct-layout* and gcc.dg/vect/*.

Most of the gcc.dg/vect/* tests contain something like:

  typedef float afloat __attribute__ ((__aligned__(16)));
  afloat a[N];

The question is, since this is illegal, what should we use instead? I don't know if the alignment is an integral part of what is being tested or not, since the tests have no comments in them. So I am not sure if we should just delete the alignment attribute or make it smaller. If we make it smaller, we need to know the size of float in order to know if a particular alignment is legal or not.

The gcc.dg/compat/struct-layout problems seem to stem from struct-layout-1_generate.c. In generate_fields() it generates random types, some of which are arrays of some base type. Then, based on another random number, we might add an attribute like alignment. There is no check to ensure that the alignment of the base type is less than or equal to the size of the base type in those instances where we are creating an array.

I would be interested in any advice on the best way to fix these tests so that I can add my patch without causing regressions.

Steve Ellcey
[EMAIL PROTECTED]

Here is the patch that checks for the alignment of array elements and that causes the regressions:

2005-03-15  Steve Ellcey  <[EMAIL PROTECTED]>

	PR 19893
	* stor-layout.c (layout_type): Add alignment check.

*** gcc.orig/gcc/stor-layout.c	Fri Mar 11 14:40:03 2005
--- gcc/gcc/stor-layout.c	Tue Mar 15 15:46:02 2005
*************** layout_type (tree type)
*** 1632,1637 ****
--- 1632,1643 ----
  	  build_pointer_type (element);
  
+ 	if (host_integerp (TYPE_SIZE_UNIT (element), 1)
+ 	    && tree_low_cst (TYPE_SIZE_UNIT (element), 1) > 0
+ 	    && (HOST_WIDE_INT) TYPE_ALIGN_UNIT (element)
+ 	       > tree_low_cst (TYPE_SIZE_UNIT (element), 1))
+ 	  error ("alignment of array elements is greater than element size");
+ 
  	/* We need to know both bounds in order to compute the size.  */
  	if (index && TYPE_MAX_VALUE (index) && TYPE_MIN_VALUE (index)
  	    && TYPE_SIZE (element))
Re: PR 19893 & array_ref bug
> > The gcc.dg/compat/struct-layout problems seem to stem from
> > struct-layout-1_generate.c. In generate_fields() it generates random
> > types, some of these are arrays of some base type. Then based on
> > another random number we might add an attribute like alignment. There
> > is no check to ensure that the alignment of the base type is less than or
> > equal to the size of the base type in those instances where we are
> > creating an array.
>
> That could be fixed by adding the check you suggest, and then just
> discarding the attribute.

I don't know if I have enough information to implement a test that ignores the attribute only when the alignment is greater than the size. Some of the attributes use __aligned__ with no value, which defaults to whatever the maximum alignment is for the platform you are running on, and I don't know if I can determine that while running struct-layout-1_generate. The simplest solution would probably be to ignore __aligned__ attributes completely when we have an array. Or to do the change you suggested for the vector tests and have the attribute attached to the array and not the element type.

Steve Ellcey
[EMAIL PROTECTED]
Re: PR 19893 & array_ref bug
What do people think about this idea for changing the vect tests, using gcc.dg/vect/vect-56.c as an example? The arguments (pa, pb, pc) would remain afloat type (vs. float), but the arrays would be changed from 'array of aligned floats' to an array of floats where the actual array itself is aligned. It seems like we are lying about the alignment of the pa, pb, pc arguments, but I don't see a way around this. If we changed GCC to pad the array elements (in order to obey the alignment request), wouldn't we actually break our ability to vectorize things?

Steve Ellcey
[EMAIL PROTECTED]

*** vect-56.c.orig	Wed Mar 16 11:38:49 2005
--- vect-56.c	Wed Mar 16 11:39:46 2005
*************** main1 (afloat * __restrict__ pa, afloat
*** 40,48 ****
  int main (void)
  {
    int i;
!   afloat a[N];
!   afloat b[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57};
!   afloat c[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19};
  
    check_vect ();
--- 40,50 ----
  int main (void)
  {
    int i;
!   float a[N] __attribute__ ((__aligned__(16)));
!   float b[N] __attribute__ ((__aligned__(16))) =
!     {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57};
!   float c[N] __attribute__ ((__aligned__(16))) =
!     {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19};
  
    check_vect ();
Re: PR 19893 & array_ref bug
> From: Gabriel Dos Reis <[EMAIL PROTECTED]>
> |
> | Make them array arguments, instead of pointer arguments. I'm not sure
> | if GCC is smart enough to still vectorize them in that case, but
> | that's the right way to express it. An aligned array-of-floats decays
> | to an aligned pointer-to-float, i.e., the pointer is known to be
> | aligned, but the object pointed to is just a float not an aligned
> | float.
>
> Agreed.
>
> -- Gaby

But as Joseph pointed out, we don't implement attributes on array arguments, so I get a warning when I try to use the __restrict__ attribute on the array arguments. Without the __restrict__ attribute I am sure we would not do any vectorization, and then what is the point of the test?

Steve Ellcey
[EMAIL PROTECTED]
GCC3 to GCC4 performance regression. Bug?
I have been looking at a significant performance regression in the hmmer application between GCC 3.4 and GCC 4.0. I have a small cut-down test case (attached) that demonstrates the problem and which runs more than 10% slower on IA64 (HP-UX or Linux) when compiled with GCC 4.0 than when compiled with GCC 3.4.

At first I thought this was just due to 'better' alias analysis in the P7Viterbi routine and that it was the right thing to do even if it was slower. It looked like GCC 3.4 does not believe that hmm->tsc could alias mmx, but GCC 4.0 thinks they could, and thus GCC 4.0 does more loads inside the inner loop of P7Viterbi. But then I noticed something weird: if I remove the field M (which is unused in my example) from the plan7_s structure, GCC 4.0 runs as fast as GCC 3.4. I don't understand why this would affect things.

Any optimization experts care to take a look at this test case and help me understand what is going on, and whether this change from 3.4 to 4.0 is intentional or not?

Steve Ellcey
[EMAIL PROTECTED]

Test Case
---------

#define L_CONST 500
void *malloc(long size);

struct plan7_s {
  int M;
  int **tsc;     /* transition scores [0..6][1..M-1] */
};

struct dpmatrix_s {
  int **mmx;
};

struct dpmatrix_s *mx;

void AllocPlan7Body(struct plan7_s *hmm, int M)
{
  int i;
  hmm->tsc    = malloc (7 * sizeof(int *));
  hmm->tsc[0] = malloc ((M+16) * sizeof(int));
  mx->mmx = (int **) malloc(sizeof(int *) * (L_CONST+1));
  for (i = 0; i <= L_CONST; i++)
    {
      mx->mmx[i] = malloc (M+2+16);
    }
  return;
}

void P7Viterbi(int L, int M, struct plan7_s *hmm, int **mmx)
{
  int i,k;
  for (i = 1; i <= L; i++)
    {
      for (k = 1; k <= M; k++)
	{
	  mmx[i][k] = mmx[i-1][k-1] + hmm->tsc[0][k-1];
	}
    }
}

main ()
{
  struct plan7_s *hmm;
  char dsq[L_CONST];
  int i;
  hmm = (struct plan7_s *) malloc (sizeof (struct plan7_s));
  mx = (struct dpmatrix_s *) malloc (sizeof (struct dpmatrix_s));
  AllocPlan7Body(hmm, 10);
  for (i = 0; i < 60; i++)
    {
      P7Viterbi(500, 10, hmm, mx->mmx);
    }
}
IA64 Pointer conversion question / convert code already wrong?
I am looking at a bug/oddity in the HP-UX IA64 GCC compiler in ILP32 mode. Here is some code (cut out from libffi):

typedef void *PTR64 __attribute__((mode(DI)));
extern void bar(PTR64);

void foo(void * x)
{
  bar(x);
}

Now the issue is whether or not this is legal and how x should get extended. I am assuming that it is legal and that, on IA64, we would like the pointer extended via the addp4 instruction. When I do not optimize this program I do not get any addp4 instructions; when I do optimize the program, I do get the desired addp4 instructions.

I believe the problem in the unoptimized case is in expand_expr_real_1, where we have:

    case NON_LVALUE_EXPR:
    case NOP_EXPR:
    case CONVERT_EXPR:
      . . .
      else if (modifier == EXPAND_INITIALIZER)
	op0 = gen_rtx_fmt_e (unsignedp ? ZERO_EXTEND : SIGN_EXTEND,
			     mode, op0);
      else if (target == 0)
	op0 = convert_to_mode (mode, op0,
			       TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (exp, 0))));
      else
	{
	  convert_move (target, op0,
			TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (exp, 0))));
	  op0 = target;
	}

The EXPAND_INITIALIZER arm looks wrong (for IA64) because it assumes that ZERO_EXTEND and SIGN_EXTEND are the only possibilities, and if op0 is a pointer then we have a third possibility for IA64. Is the use of gen_rtx_fmt_e an optimization that could be replaced by convert_to_mode or convert_move, or is there some underlying reason why that has to be a gen_rtx_fmt_e call for an initializer?

The existing convert_to_mode and convert_move calls look suspicious to me too, because they use the TYPE_UNSIGNED macro to determine whether to do signed or unsigned extensions, and I am not sure that would be set correctly for pointer types based on a platform's setting of POINTERS_EXTEND_UNSIGNED.

Anyone have any insights?

Steve Ellcey
[EMAIL PROTECTED]
Re: IA64 Pointer conversion question / convert code already wrong?
> This is a conversion between what, two pointer types?

Yes. From 'void *' to 'void * __attribute__((mode(DI)))', where the first is 32 bits (HP-UX ILP32 mode) and the second is 64 bits.

> If so, I think there should be a special case here to check for converting
> between two pointer types and call convert_memory_address if so.

I don't know why I didn't think of using convert_memory_address. I just tried it and it seems to work in my test case. I will do a bootstrap and test overnight to see how that goes.

> Also, I think convert_memory_address ought to have a
>   gcc_assert (GET_MODE (x) == to_mode);
> in the #ifndef case.

OK, I'll toss that in too. It won't be seen on the HP-UX side, but I'll do a Linux build as well.

Steve Ellcey
[EMAIL PROTECTED]
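The shape of the change I am testing in expand_expr_real_1 is roughly this (a sketch of the idea only, not the final patch):

  /* If both the source and destination types are pointers, let the
     target decide how to extend or truncate the address; on IA64
     ILP32 this is what produces addp4 via POINTERS_EXTEND_UNSIGNED.  */
  if (POINTER_TYPE_P (TREE_TYPE (exp))
      && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (exp, 0))))
    op0 = convert_memory_address (mode, op0);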
Re: IA64 Pointer conversion question / convert code already wrong?
> Also, I think convert_memory_address ought to have a
>   gcc_assert (GET_MODE (x) == to_mode);
> in the #ifndef case.

Interesting. I put this assertion in my code and I now cannot bootstrap on HPPA. Looking at the HPPA builds (where POINTERS_EXTEND_UNSIGNED is not defined) I see the assertion fail because I enter convert_memory_address with to_mode set to SImode and x set to '(const_int 0 [0x0])'. The call to convert_memory_address is being made from memory_address (explow.c:404).

I am not sure if this is a bug, or if convert_memory_address should allow this by doing nothing (current behaviour), or if convert_memory_address should be changed so that it does the same conversion on const_int values when POINTERS_EXTEND_UNSIGNED is undefined as it does when POINTERS_EXTEND_UNSIGNED is defined.

Steve Ellcey
[EMAIL PROTECTED]
How can I write an empty conversion instruction
I was wondering if anyone could tell me how to write an (empty) instruction pattern that does a truncate/extend conversion on a register 'in place'. All the conversions I see are like this one in ia64/ia64.md:

(define_insn "extendsfdf2"
  [(set (match_operand:DF 0 "fr_register_operand" "=f")
        (float_extend:DF (match_operand:SF 1 "fr_register_operand" "f")))]
  ""
  "fnorm.d %0 = %1"
  [(set_attr "itanium_class" "fmac")])

where the source and the destination may or may not be the same register. I am trying to create an empty extend operation I can use to 'convert' an SFmode register into a DFmode register without actually generating any code. Since I don't want this extend called in place of the normal one, I defined it as an UNSPEC operation instead of a float_extend operation, and since it doesn't generate any code and it cannot move the result from one register to another, I need to define it with only one operand. But my attempt to do this doesn't seem to work, and I was wondering if anyone could tell me why, or perhaps point me to an example of an instruction that does a conversion in place that might help me understand how to write such an instruction.

My attempt:

(define_insn "nop_extendsfdf"
  [(set (match_operand:DF 0 "fr_register_operand" "+f")
        (unspec:DF [(match_dup:SF 0)] UNSPEC_NOP_EXTEND))]
  ""
  ""
  [(set_attr "itanium_class" "ignore")
   (set_attr "predicable" "no")
   (set_attr "empty" "yes")])

I think the match_dup may be wrong, since I am using it with SF but the original match_operand has DF. Do I need to make this modeless? Or is there some other way to create an empty conversion instruction?

Steve Ellcey
[EMAIL PROTECTED]
Re: How can I write an empty conversion instruction
> You might want to try this instead:
>
> [(set (match_operand:DF 0 "fr_register_operand" "=f")
>       (unspec:DF [(match_operand:SF 0 "fr_register_operand" "0")]
>                  UNSPEC_NOP_EXTEND))]
>
> --
> Daniel Jacobowitz
> CodeSourcery, LLC

Nope. GCC doesn't like seeing two match_operands for operand 0.

Steve Ellcey
[EMAIL PROTECTED]
vector alignment question
I noticed that vectors are always aligned based on their size, i.e. an 8-byte vector has an alignment of 8 bytes, 16-byte vectors an alignment of 16, a 256-byte vector an alignment of 256, etc. Is this really intended? I looked in stor-layout.c and found:

  /* Always naturally align vectors.  This prevents ABI changes
     depending on whether or not native vector modes are supported.  */
  TYPE_ALIGN (type) = tree_low_cst (TYPE_SIZE (type), 0);

so it seems to be intentional, but it still seems odd to me, especially for very large vectors.

Steve Ellcey
[EMAIL PROTECTED]
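A small example of the effect, using the GCC vector extension (sizes assume a 4-byte int):

  typedef int v4si  __attribute__ ((vector_size (16)));
  typedef int v64si __attribute__ ((vector_size (256)));

  /* __alignof__ (v4si) == 16 and __alignof__ (v64si) == 256, even on
     targets whose hardware never needs 256-byte alignment.  */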
Re: vector alignment question
> On Wed, Jun 08, 2005 at 12:50:32PM -0700, Steve Ellcey wrote:
> > I noticed that vectors are always aligned based on their size, i.e. an
> > 8 byte vector has an alignment of 8 bytes, 16 byte vectors an alignment
> > of 16, a 256 byte vector an alignment of 256, etc.
> >
> > Is this really intended?
>
> Yes.
>
> > so it seems to be intentional, but it still seems odd to me, especially
> > for very large vectors.
>
> Hardware usually requires such alignment. Most folk don't use vectors
> larger than some bit of hardware supports. One wouldn't want the ABI
> to depend on whether that bit of hardware were actually present, IMO.
>
> r~

I guess that makes sense, but I wonder if the default alignment should be set to "MIN (size of vector, BIGGEST_ALIGNMENT)" instead, so that we don't default to an alignment larger than we know we can support. Or perhaps there should be a way to override the default alignment for vectors on systems that don't require natural alignment.

Steve Ellcey
[EMAIL PROTECTED]
Re: MEMBER_TYPE_FORCES_BLK on IA-64/HP-UX
> Steve Ellcey defined MEMBER_TYPE_FORCES_BLK when he first implemented
> the ia64-hpux port. At the time, I mentioned using PARALLELs was a
> better solution, but this was a simpler way for him to get the initial
> port working. Since then, there have been a lot of bug fixes to the
> ia64-hpux support by various people: Steve, Zack, Joseph, etc. Looking
> at the current code, it does appear that all cases are now handled by
> PARALLELs, and that the definition of MEMBER_TYPE_FORCES_BLK no longer
> appears to be necessary.
>
> I don't have an ia64-hpux machine, so there is no easy way for me to
> test this change.
> --
> Jim Wilson, GNU Tools Support, http://www.SpecifixInc.com

I am concerned about the use of MEMBER_TYPE_FORCES_BLK in stor-layout.c. I believe that, if MEMBER_TYPE_FORCES_BLK is not defined, this code will change the mode of a structure containing a single field from BLKmode into the mode of the field. I think this might mess up the parameter passing of structures that contain a single field, particularly when that field is smaller than 64 bits, like a single char, an int, or a float. I would definitely want to check the parameter passing of small single-field structures before removing MEMBER_TYPE_FORCES_BLK on ia64-hpux.

Steve Ellcey
[EMAIL PROTECTED]
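The kind of case I am worried about (illustrative only):

  /* One field, four bytes.  Without MEMBER_TYPE_FORCES_BLK,
     stor-layout.c can give this struct SFmode instead of BLKmode,
     which may change whether it is passed like a float or like a
     small aggregate under the IA64 ABI.  */
  struct wrapped_float { float f; };
  extern void g (struct wrapped_float);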
GCC testsuite timeout question (gcc.c-torture/compile/20001226-1.c)
I was looking at a failure of the test gcc.c-torture/compile/20001226-1.c on one of my machines, and I see that it is timing out on a slow machine that I have. I tried to find out how and where the timeout limit is set and could not find it. Can someone explain to me how much time a compile is given and where this limit is set?

By hand, I can compile the test in about 3 1/2 minutes on the machine in question (the machine may have been busier when the failure occurred and thus taken longer).

Steve Ellcey
[EMAIL PROTECTED]
Re: GCC testsuite timeout question (gcc.c-torture/compile/20001226-1.c)
> > By hand, I can compile the test in about 3 1/2 minutes on the machine in
> > question (the machine may have been busier when the failure occurred and
> > thus taken longer).
>
> I think it's a real regression (memory consumption/speed) of the compiler; it
> is timing out on all the slow SPARC machines I have (it is OK with 4.0.x).
> IIRC I observed the same regression between 3.2.x and 3.3.x on even slower
> machines, but 3.4.x fixed it.
>
> --
> Eric Botcazou

Yes, I think you are right. I can see a substantial slowdown in compilation times on IA64 HP-UX at -O2 (though it doesn't time out there):

  gcc 4.0.0 - 81 seconds
  gcc 3.4.1 - 38 seconds
  gcc 3.4.0 - 37 seconds
  gcc 3.3.5 - 89 seconds
  gcc 3.3.1 - 91 seconds

3.3 is slow, 3.4 is faster, and 4.0.0 seems slow again; I don't have 4.0.* hanging around to test. Looking at a timing report based on the 4.0.0 compiler, it looks like half the compile time is spent in the phase "dominance frontiers". I will investigate some more.

Steve Ellcey
[EMAIL PROTECTED]
Request for testsuite help (gcc.dg/compat)
I was wondering if I could get some help/advice from a testsuite expert.

I have a patch that I want to submit that makes sure elements of an array are not given an alignment greater than their size. See http://gcc.gnu.org/ml/gcc/2005-03/msg00729.html

This test was causing a bunch of regressions, most of which have been fixed now by Jakub and Dorit. But the patch still causes a couple of regressions in the gcc.dg/compat tests that I have been unable to fix. The failures I get are:

FAIL: tmpdir-gcc.dg-struct-layout-1/t002 c_compat_x_tst.o compile
FAIL: tmpdir-gcc.dg-struct-layout-1/t002 c_compat_y_tst.o compile
FAIL: tmpdir-gcc.dg-struct-layout-1/t027 c_compat_x_tst.o compile
FAIL: tmpdir-gcc.dg-struct-layout-1/t027 c_compat_y_tst.o compile

There used to be more layout failures, but Jakub submitted a patch earlier (May 2005) that fixed all but these.

I know that the gcc.dg-struct-layout-1_generate program creates a t002_test.h header file and that that file contains:

T(582,void * atal8 a[2];double b;unsigned short int c;,F(582,a[0],(void *)&intarray[78],(void *)&intarray[187])F(582,b,198407.656250,218547.203125)F(582,c,55499U,5980U))

and that atal8 is a define for "__attribute__((aligned (8)))", which means that we get "void * __attribute__((aligned (8))) a[2];", and that is what is causing the problem (8-byte alignment of the elements in an array where the elements are only 4 bytes long).

But what I have not been able to do is figure out how to get gcc.dg-struct-layout-1_generate to stop generating this type. Even after looking at Jakub's patch that fixed the other layout failures, I haven't been able to come up with a fix. Can anyone help me with this?

Steve Ellcey
[EMAIL PROTECTED]
RFC: IPO optimization framework for GCC
I have been given some time by my management to work on creating a framework for IPO optimizations in GCC by creating an intermediate-file reader and writer for GCC. I would like to start by getting any input and advice the members of the GCC community might have for me. I would also like to see if I can get the names of folks who might be interested in helping or advising me on this project. My current thought is that if I can get a start made, I would create a branch for this work in CVS and a project page on the GCC Wiki.

In the meantime I would be interested in any opinions people have on what level we should be writing things out at: GENERIC? GIMPLE? RTL? (Just kidding on that last one.) Also any opinions on what format to write things out in: binary form vs. an ASCII file, XML, ANDF? If you know of any good papers I should read, I would like to hear about those too.

Steve Ellcey
[EMAIL PROTECTED]
Re: RFC: IPO optimization framework for GCC
Thanks to everyone who replied to my mail; I am currently waiting for some follow-ups to replies I got off-list. In the meantime, I wonder if we could talk about Devang's questions on what this might look like to a user.

> From: Devang Patel <[EMAIL PROTECTED]>
>
> It is useful to get clear understanding of few simpler things before
> tackling IL issue.
>
> First question is - What is the user interface ? Few alternatives :
>
> 1) gcc -fenable-ipo input1.c input2.c input3.c -o output
>
> Here, writing IL on the disk, and reading it back, and optimizing it,
> etc.. are all hidden from users.

But at the cost of having to put all the source compiles on one GCC command line. We could probably do this today without reading or writing anything to disk (as long as we didn't run out of memory).

> 2) gcc -fwrite-ipo input1.c -o input1.data
>    gcc -fwrite-ipo input2.c -o input2.data
>    gcc -fwrite-ipo input3.c -o input3.data
>    gcc -fread-ipo input1.data input2.data input3.data -o output
>
> 3) gcc -fwrite-ipo input1.c -o input1.data
>    gcc -fuse-ipo input1.data input2.c -o input2.data
>    gcc -fuse-ipo input2.data input3.c -o output
>
> 4) gcc -fwrite-ipo input1.c -o input1.data
>    gcc -fwrite-ipo input2.c -o input2.data
>    gcc -fwrite-ipo input3.c -o input3.data
>    glo -fread-ipo input1.data input2.data input3.data -o output

Could we just have -fwrite-ipo create a '.o' file that contains the intermediate representation (instead of being a real object file)? Then when the linker is called it would call the compiler with all the files that have intermediate code instead of object code and finish up the compilation. Actually, maybe we could add the restriction that you have to use GCC to call the linker when doing IPO, and that way GCC could finish up the compilations before it calls the linker.

> Second question is - When to put info on the disk? Few alternatives,
> 1) Before gimplification
> 2) Before optimizing tree-ssa
> 3) After tree-ssa optimization is complete
> 4) Immediately after generating RTL
> 5) Halfway through RTL passes
> etc.. And answer to this question largely depend on the optimization
> passes that work on whole program info.

I would think one would want to put the info out before optimizing tree-ssa, since you would hope that the IPO data from other modules would let you do better tree-ssa optimizations.

> I do not know whether these two questions are already answered or not.

I don't think anything has been answered yet.

Steve Ellcey
[EMAIL PROTECTED]
Subversion and firewalls question
Anyone have advice on how to get subversion working through a corporate firewall? Currently I get:

| /usr/local/bin/svn co svn+ssh://gcc.gnu.org:/svn/gcc/trunk
| ssh: gcc.gnu.org:: no address associated with hostname.
| svn: Connection closed unexpectedly

I have cvs working; I ran socksify on cvs and ssh, and that seemed to work fine for those commands, and I can do checkouts/checkins with cvs. When I try to socksify svn, I get an error:

[hpsje - sje_gcc_cmo] (root) $ /opt/socks/bin/socksify /usr/local/bin/svn
/usr/local/bin/svn->/opt/socks/bin/svn ... Found nothing to change.

I think this might be because the library calls that need to be intercepted by socks are not in svn but in a dynamic library that is linked in by svn. It looks like the neon subdirectory in svn understands --with-socks=, but I don't have a socks.h header file as part of my socks installation. Is there a GNU Socks package I can build? I see Dante; is that what I want? Is using --with-socks on my subversion build the right way to be attacking this problem?

I am trying to get this to work from my HP-UX box, if that makes a difference.

Steve Ellcey
[EMAIL PROTECTED]
Re: Subversion and firewalls question
> > Currently I get:
> >
> > | /usr/local/bin/svn co svn+ssh://gcc.gnu.org:/svn/gcc/trunk
> > | ssh: gcc.gnu.org:: no address associated with hostname.
> > | svn: Connection closed unexpectedly
>
> This one might be easy.
>
> You added a : at the end of gcc.gnu.org :)

Blush. It worked.

Steve Ellcey
[EMAIL PROTECTED]
Re: Excess precision problem on IA-64
> > This seems like any other target which has a fused multiply and add
> > instruction like PPC. Maybe a target option to turn on and off the fma
> > instruction like there is for PPC.
>
> I'm under the impression that it's worse on IA-64 because of the "infinite
> precision", but I might be wrong.
>
> --
> Eric Botcazou

The HP compiler generates fused multiply and add by default and has several settings for the +Ofltacc option to control this (and other optimizations that affect floating point accuracy):

+Ofltacc=default
	Allows contractions, such as fused multiply-add (FMA), but
	disallows any other floating point optimization that can result
	in numerical differences.

+Ofltacc=limited
	Like default, but also allows floating point optimizations which
	may affect the generation and propagation of infinities, NaNs,
	and the sign of zero.

+Ofltacc=relaxed
	In addition to the optimizations allowed by limited, permits
	optimizations, such as reordering of expressions, even if
	parenthesized, that may affect rounding error.  This is the same
	as +Onofltacc.

+Ofltacc=strict
	Disallows any floating point optimization that can result in
	numerical differences.  This is the same as +Ofltacc.

It would be easy enough to add an option that turned off the use of the fused multiply and add in GCC but I would hate to see its use turned off by default.

Steve Ellcey
[EMAIL PROTECTED]
Re: Re: Does gcc-3.4.3 for HP-UX 11.23/IA-64 work?
> > As mentioned before, there is a brace missing after the gcc_s_hpux64. > > This brace is needed to close off the shared-libgcc rule before the > > static-libgcc rule starts. You then must delete a brace from the end of > > the !static rule which has one too many. > > Yes, doing so gives the correct 'gcc -shared' output. I am not convinced there is a bug here. I think there may have been a deliberate change between 3.4.* and 4.* about whether or not '-shared' implied '-shared-libgcc', particularly for C code. I notice that if I compile using 3.4.4 and use '-shared -shared-libgcc' instead of just '-shared' then it works as you want. Steve Ellcey [EMAIL PROTECTED]
Re: GMP on IA64-HPUX
> > > So, in short, my questions are: is gmp-4.1.4 supposed to work on
> > > ia64-hpux?
> > >
> > > No, it is not. It might be possible to get either the LP64 or
> > > the ILP32 ABI to work, but even that requires the workaround you
> > > mention. Don't expect any HP compiler to compile GMP correctly
> > > though, unless you switch off optimization.
> > >
> If it's really compiler problems, this is one more reason for pulling
> gmp to the toplevel gcc, so it can be built with a sane compiler.
>
> Richard.

FYI: What I do to compile gmp on IA64 HP-UX is to configure gmp with '--host=none --target=none --build=none'. This avoids all the target specific code. I am sure the performance stinks this way but since it is used by the compiler and not in the run-time I haven't found it to be a problem. Of course I don't compile any big Fortran programs either.

Steve Ellcey
[EMAIL PROTECTED]
GCC 3.4.5 status?
Has GCC 3.4.5 been officially released? I don't recall seeing an announcement in gcc@gcc.gnu.org or [EMAIL PROTECTED], and when I looked on the main GCC page I saw references to GCC 3.4.4 but not 3.4.5. But I do see a 3.4.5 download on the GCC mirror site that I checked and I see a gcc_3_4_5_release tag in the SVN tags directory.

I also notice we have a "Releases" link under "About GCC" in the top left corner of the main GCC page that doesn't look like it has been updated in quite a while for any releases. Should this be updated or removed?

Steve Ellcey
[EMAIL PROTECTED]
Re: GCC can't stop using GNU libiconv if it's in /usr/local
> IMHO, the fact that GCC includes /usr/local/include by default in its
> system header search path is brain damaged, but it's probably way too
> entrenched to revisit that. :-(
>
> --Kaveh
> --
> Kaveh R. Ghazi [EMAIL PROTECTED]

You can stop this by specifying --with-local-prefix=/not-usr-local when configuring GCC. I have built a GCC into a location like /be by specifying both --prefix=/be and --with-local-prefix=/be. This GCC does not look in /usr/local/include (but does search /be/include).

Steve Ellcey
[EMAIL PROTECTED]
Question about DRAP register and reserving hard registers
I have a question about the DRAP register (used for dynamic stack alignment) and about reserving/using hard registers in general. I am trying to understand where, if a DRAP register is allocated, GCC is told not to use it during general register allocation. There must be some code somewhere for this but I cannot find it.

I am trying to implement dynamic stack alignment on MIPS and because there is so much code for the x86 dynamic stack alignment I am trying to incorporate bits of it as I understand what I need instead of just turning it all on at once and getting completely lost. Right now I am using register 16 on MIPS to access incoming arguments in a function that needs dynamic alignment, so it is my DRAP register if my understanding of the x86 code and its use of a DRAP register is correct. I copy the stack pointer into reg 16 before I align the stack pointer (during expand_prologue). So far the only way I have found to stop the register allocator from also using reg 16 and thus messing up its value is to set fixed_regs[16]. But I don't see the x86 doing this for its DRAP register and I was wondering how it is handled there.

I think setting fixed_regs[16] is why C++ tests with exception handling are not working for me, because this register is not getting saved and restored (since it is thought to be fixed) during code that uses throw and catch.

Steve Ellcey
sell...@imgtec.com
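P.S. For concreteness, the only thing that has worked for me so far amounts to something like this in the MIPS backend. This is just a sketch of my local tree, not committed code, and TARGET_ALIGN_STACK is a hypothetical option I added:

/* Sketch only: in the TARGET_CONDITIONAL_REGISTER_USAGE hook, keep $16
   away from the register allocator so the saved incoming stack pointer
   survives from the prologue to the epilogue.  */
static void
mips_conditional_register_usage (void)
{
  /* ... existing code ... */
  if (TARGET_ALIGN_STACK)	/* hypothetical local option */
    fixed_regs[16] = call_used_regs[16] = 1;
}

And as described above, this is what seems to break the C++ exception handling tests.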
Re: Question about DRAP register and reserving hard registers
On Fri, 2015-06-19 at 09:09 -0400, Richard Henderson wrote:
> On 06/16/2015 07:05 PM, Steve Ellcey wrote:
> >
> > I have a question about the DRAP register (used for dynamic stack alignment)
> > and about reserving/using hard registers in general. I am trying to
> > understand
> > where, if a DRAP register is allocated, GCC is told not to use it during
> > general register allocation. There must be some code somewhere for this
> > but I cannot find it.
>
> There isn't. Because the vDRAP register is a pseudo. The DRAP register is
> only live from somewhere in the middle of the prologue to the end of the
> prologue.
>
> See ix86_get_drap_rtx, wherein we coordinate with the to-be-generated
> prologue
> (crtl->drap_reg), allocate the pseudo, and emit the hard-reg-to-pseudo copy
> at
> entry_of_function.
>
>
> r~

OK, that makes more sense now. In my work on MIPS I was trying to cut out some of the complexity of the x86 implementation and just use a hard register as my DRAP register. One of the issues I ran into, and perhaps the one that caused x86 to use a virtual register, was saving and restoring the register during setjmp/longjmp and C++ exception handling usage. I will try switching to a virtual register and see if that works better.

Other than exceptions, the main complexity in dynamic stack alignment seems to involve the debug information. I am still trying to understand the handling of the DRAP register and dynamic stack alignment in dwarf2out.c and dwarf2cfi.c.

Steve Ellcey
sell...@imgtec.com
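P.S. For anyone following this thread, my reading of ix86_get_drap_rtx boils down to roughly the following shape. This is an abridged sketch from memory, recast for MIPS with a hypothetical hook name, not the actual x86 code:

/* Hypothetical MIPS analogue of ix86_get_drap_rtx (sketch only).  */
static rtx
mips_get_drap_rtx (void)
{
  if (!stack_realign_drap)
    return NULL;

  /* Coordinate with the to-be-generated prologue: $16 will hold the
     incoming $sp (the hard DRAP register).  */
  crtl->drap_reg = gen_rtx_REG (Pmode, 16);

  /* Copy the hard reg into a pseudo (the vDRAP) and emit that copy at
     function entry; the body of the function only uses the pseudo.  */
  start_sequence ();
  rtx vdrap = copy_to_reg (crtl->drap_reg);
  rtx_insn *seq = get_insns ();
  end_sequence ();
  emit_insn_before (seq, NEXT_INSN (entry_of_function ()));

  return vdrap;
}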
Re: Question about DRAP register and reserving hard registers
On Fri, 2015-06-19 at 09:09 -0400, Richard Henderson wrote:
> On 06/16/2015 07:05 PM, Steve Ellcey wrote:
> >
> > I have a question about the DRAP register (used for dynamic stack alignment)
> > and about reserving/using hard registers in general. I am trying to
> > understand
> > where, if a DRAP register is allocated, GCC is told not to use it during
> > general register allocation. There must be some code somewhere for this
> > but I cannot find it.
>
> There isn't. Because the vDRAP register is a pseudo. The DRAP register is
> only live from somewhere in the middle of the prologue to the end of the
> prologue.
>
> See ix86_get_drap_rtx, wherein we coordinate with the to-be-generated
> prologue
> (crtl->drap_reg), allocate the pseudo, and emit the hard-reg-to-pseudo copy
> at
> entry_of_function.
>
>
> r~

OK, I think I have this part of the code working on MIPS but crtl->drap_reg is used in the epilogue as well as the prologue even if it is not 'live' in between. If I understand the code correctly, the x86 prologue pushes the DRAP register onto the stack so that the epilogue can pop it off and use it to restore the stack pointer. Is my understanding correct?

I also need the DRAP pointer in the MIPS epilogue but I would like to avoid having to get it from memory. Ideally I would like to restore it from the virtual register that the prologue code / get_drap_rtx code put it into. I tried just doing a move from the virtual DRAP register to the real one in expand_epilogue but that didn't work because it looks like you can't access virtual registers from expand_prologue or expand_epilogue. I guess that is why the code to copy the hard DRAP reg to the virtual DRAP reg is done in get_drap_rtx and not in expand_prologue. I thought about putting code in get_drap_rtx to do this copying but I don't see how to access the end of a function. The hard DRAP reg to virtual DRAP reg copy is inserted into the beginning of a function with:

insn = emit_insn_before (seq, NEXT_INSN (entry_of_function ()));

Is there an equivalent method to insert code at the end of a function? I don't see an 'end_of_function ()' routine anywhere.

Steve Ellcey
sell...@imgtec.com
Re: Question about DRAP register and reserving hard registers
On Mon, 2015-06-29 at 11:10 +0100, Richard Henderson wrote: > > I also need the drap pointer in the MIPS epilogue but I would like to > > avoid having to get it from memory. Ideally I would like to restore it > > from the virtual register that the prologue code / get_drap_rtx code put > > it into. I tried just doing a move from the virtual drap register to > > the real one in expand_epilogue but that didn't work because it looks > > like you can't access virtual registers from expand_prologue or > > expand_epilogue. I guess that is why the code to copy the hard drap reg > > to the virtual drap_reg is done in get_drap_reg and not in > > expand_prologue. I thought about putting code in get_drap_reg to do > > this copying but I don't see how to access the end of a function. The > > hard drap reg to virtual drap reg copy is inserted into the beginning of > > a function with: > > > > insn = emit_insn_before (seq, NEXT_INSN (entry_of_function ())); > > > > Is there an equivalent method to insert code to the end of a function? > > I don't see an 'end_of_function ()' routine anywhere. > > Because, while generating initial rtl for a function, the beginning of a > function has already been emitted, while the end of the function hasn't. > > You'd need to hook into expand_function_end, right at the bottom, before the > call to use_return_register. > > > r~ I ran into an interesting issue while doing this. Right now the expand pass calls construct_exit_block (which calls expand_function_end) before it calls expand_stack_alignment. That means that crtl->drap_reg, etc are not yet set up when in expand_function_end. I moved the expand_stack_alignment call up before construct_exit_block to fix that. I hope moving it up doesn't break anything. Steve Ellcey sell...@imgtec.com
Re: Question about DRAP register and reserving hard registers
On Mon, 2015-06-29 at 11:10 +0100, Richard Henderson wrote: > > OK, I think I have this part of the code working on MIPS but > > crtl->drap_reg is used in the epilogue as well as the prologue even if > > it is not 'live' in between. If I understand the code correctly the x86 > > prologue pushes the drap register on to the stack so that the epilogue > > can pop it off and use it to restore the stack pointer. Is my > > understanding correct? > > Yes. Although that saved copy is also used by unwind info. Do you know how and where this saved copy is used by the unwind info? I don't see any indication that the unwind library knows if a stack has been dynamically realigned and I don't see where unwind makes use of this value. Steve Ellcey sell...@imgtec.com
Basic GCC testing question
I have a basic GCC testing question. I built a native GCC and ran:

make RUNTESTFLAGS='dg.exp' check

Everything passed and according to the log file it used unix.exp as the target board. But if I try running:

make RUNTESTFLAGS='dg.exp --target-board=unix' check

Then I get failures. They both say they are running target unix. If I diff the two log files I see:

1,2c1,3
< Test Run By sellcey on Fri Jul 10 10:13:21 2015
< Native configuration is x86_64-unknown-linux-gnu
---
> Test Run By sellcey on Fri Jul 10 09:52:41 2015
> Target is unix
> Host is x86_64-unknown-linux-gnu
12a14,15
> WARNING: Assuming target board is the local machine (which is probably wrong).
> You may need to set your DEJAGNU environment variable.

The reason I want to specify a target board is so I can then modify it with something like '--target-board=unix/-m32', but I think I need to specify a board before I add any options, don't I?

Steve Ellcey
sell...@imgtec.com
Re: Basic GCC testing question
On Fri, 2015-07-10 at 14:27 -0500, Segher Boessenkool wrote:
> On Fri, Jul 10, 2015 at 10:43:43AM -0700, Steve Ellcey wrote:
> >
> > I have a basic GCC testing question. I built a native GCC and ran:
> >
> > make RUNTESTFLAGS='dg.exp' check
> >
> > Everything passed and according to the log file it used unix.exp
> > as the target board. But if I try running:
> >
> > make RUNTESTFLAGS='dg.exp --target-board=unix' check
>
> Does it work better if you spell --target_board?
>
>
> Segher

Arg, I hate it when I do something stupid like that. It would be nice if runtest gave an error message when it had a bad/unknown argument, but if it does I didn't see it anywhere.

Steve Ellcey
CFI directives and dynamic stack alignment
I don't know if there are any CFI experts out there but I am working on dynamic stack alignment for MIPS. I think I have it working in the 'normal' case, but when I try to do stack unwinding through a routine with an aligned stack I have problems. I was wondering if someone can help me understand what CFI directives to generate to allow stack unwinding.

Using gcc.dg/cleanup-8.c as an example (because it fails with my stack alignment code), if I generate code with no dynamic stack alignment (but forcing the use of the frame pointer), the routine fn2 looks like this on MIPS:

fn2:
	.frame	$fp,32,$31	# vars= 0, regs= 2/0, args= 16, gp= 8
	.mask	0xc000,-4
	.fmask	0x,0
	.set	noreorder
	.set	nomacro
	lui	$2,%hi(null)
	addiu	$sp,$sp,-32
	.cfi_def_cfa_offset 32
	lw	$2,%lo(null)($2)
	sw	$fp,24($sp)
	.cfi_offset 30, -8
	move	$fp,$sp
	.cfi_def_cfa_register 30
	sw	$31,28($sp)
	.cfi_offset 31, -4
	jal	abort
	sb	$0,0($2)

There are .cfi directives when incrementing the stack pointer, saving the frame pointer, and copying the stack pointer to the frame pointer. When I generate code to dynamically align the stack my code looks like this:

fn2:
	.frame	$fp,32,$31	# vars= 0, regs= 2/0, args= 16, gp= 8
	.mask	0xc000,-4
	.fmask	0x,0
	.set	noreorder
	.set	nomacro
	lui	$2,%hi(null)
	li	$3,-16	# 0xfff0
	lw	$2,%lo(null)($2)
	and	$sp,$sp,$3
	addiu	$sp,$sp,-32
	.cfi_def_cfa_offset 32
	sw	$fp,24($sp)
	.cfi_offset 30, -8
	move	$fp,$sp
	.cfi_def_cfa_register 30
	sw	$31,28($sp)
	.cfi_offset 31, -4
	jal	abort
	sb	$0,0($2)

The 'and' instruction is where the stack gets aligned and if I remove that one instruction, everything works. I think I need to put out some new CFI pseudo-ops to handle this but I am not sure what they should be; I am just not very familiar with the CFI directives. I looked at ix86_emit_save_reg_using_mov where there is some special code for handling the DRAP register and for saving registers on a realigned stack but I don't really understand what they are trying to do. Any help?

Steve Ellcey
sell...@imgtec.com

P.S. For completeness' sake I have attached my current dynamic alignment changes in case anyone wants to see them.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 4f9a31d..386c2ce 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5737,6 +5737,29 @@ expand_stack_alignment (void)
   gcc_assert (targetm.calls.get_drap_rtx != NULL);
   drap_rtx = targetm.calls.get_drap_rtx ();

+  /* I am not doing this in get_drap_rtx because we are also calling
+     that from expand_function_end in order to get/set the drap_reg
+     and vdrap_reg variables and doing these instructions at that
+     point is not working.  */
+
+  if (drap_rtx != NULL_RTX)
+    {
+      rtx_insn *insn, *seq;
+
+      start_sequence ();
+      emit_move_insn (crtl->vdrap_reg, crtl->drap_reg);
+      seq = get_insns ();
+      insn = get_last_insn ();
+      end_sequence ();
+      emit_insn_at_entry (seq);
+      if (!optimize)
+        {
+          add_reg_note (insn, REG_CFA_SET_VDRAP, crtl->vdrap_reg);
+          RTX_FRAME_RELATED_P (insn) = 1;
+        }
+    }
+
   /* stack_realign_drap and drap_rtx must match.  */
   gcc_assert ((stack_realign_drap != 0) == (drap_rtx != NULL));

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index ce21a0f..b6ab30a 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -746,6 +746,8 @@ static const struct attribute_spec mips_attribute_table[] = {
   { "use_shadow_register_set", 0, 0, false, true, true, NULL, false },
   { "keep_interrupts_masked", 0, 0, false, true, true, NULL, false },
   { "use_debug_exception_return", 0, 0, false, true, true, NULL, false },
+  { "align_stack", 0, 0, true, false, false, NULL, false },
+  { "no_align_stack", 0, 0, true, false, false, NULL, false },
   { NULL, 0, 0, false, false, false, NULL, false }
 };

@@ -1528,6 +1530,61 @@ mips_merge_decl_attributes (tree olddecl, tree newdecl)
			 DECL_ATTRIBUTES (newdecl));
 }

+static bool
+mips_cfun_has_msa_p (void)
+{
+  /* For now, for testing, assume all functions use MSA
+     (and thus need alignment).  */
+#if 0
+  if (!cfun || !TARGET_MSA)
+    return FALSE;
+
+  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
+    {
+      if (MSA_SUPPORTED_MODE_P (GET_MODE (insn)))
+	return TRUE;
+    }
+
+  return FALSE;
+#else
+  return TRUE;
+#endif
+}
+
+bool
+mips_align_stack_p (void)
+{
+  bool want_alignment = TARGET_ALIGN_STACK &&
Re: CFI directives and dynamic stack alignment
On Tue, 2015-08-11 at 10:05 +0930, Alan Modra wrote:
> > The 'and' instruction is where the stack gets aligned and if I remove that
> > one instruction, everything works. I think I need to put out some new CFI
> > pseudo-ops to handle this but I am not sure what they should be. I am just
> > not very familiar with the CFI directives.
>
> I don't speak mips assembly very well, but it looks to me that you
> have more than just CFI problems. How do you restore sp on return
> from the function, assuming sp wasn't 16-byte aligned to begin with?
> Past that "and $sp,$sp,$3" you don't have any means of calculating
> the original value of sp! (Which of course is why you also can't find
> a way of representing the frame address.)

I have code in expand_prologue that copies the incoming stack pointer to a temporary hard register and then I have code in the entry block to copy that register into a virtual register. In the exit block that virtual register is copied back to a temporary hard register and expand_epilogue copies it back to $sp to restore the stack pointer.

This function (fn2) ends with a call to abort, which is noreturn, so the optimizer sees that the epilogue is dead code and GCC determines that there is no need to save the old stack pointer since it will never get restored. I guess I need to tell GCC to save the stack pointer in expand_prologue even if it never sees a use for it, perhaps by making the temporary register where I save $sp volatile or doing something else so that the assignment (and its associated .cfi) is not deleted by the optimizer.

Steve Ellcey
sell...@imgtec.com
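P.S. The minimal version of this that I am going to try in expand_prologue looks like the following (an untested sketch, not working code):

  /* Save the incoming stack pointer in $12 and mark the copy as
     frame related so dwarf2cfi emits a CFI note saying the old stack
     pointer now lives in $12.  A NULL_RTX REG_CFA_REGISTER note means
     "use the SET inside this insn".  */
  rtx_insn *insn = emit_move_insn (gen_rtx_REG (Pmode, 12),
				   stack_pointer_rtx);
  RTX_FRAME_RELATED_P (insn) = 1;
  add_reg_note (insn, REG_CFA_REGISTER, NULL_RTX);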
Adding an IPA pass question (pass names)
I am trying to create a new IPA pass to scan the routines being compiled by GCC and I thought I would put it in after the last IPA pass (comdats) so I tried to register it with:

  opt_pass *p = make_pass_ipa_frame_header_opt (g);
  static struct register_pass_info f =
    { p, "comdats", 1, PASS_POS_INSERT_AFTER };
  register_pass (&f);

But when I build GCC I get:

/scratch/sellcey/repos/header2/src/gcc/libgcc/libgcc2.c:1:0: fatal error:
pass 'comdats' not found but is referenced by new pass 'frame-header-opt'

Does anyone know why this is the case? "comdats" is what is used for the name of pass_ipa_comdats in ipa-comdats.c.

Steve Ellcey
sell...@imgtec.com
Re: Adding an IPA pass question (pass names)
On Wed, 2015-08-19 at 13:40 -0400, David Malcolm wrote: > Is your pass of the correct type? (presumably IPA_PASS). I've run into > this a few times with custom passes (which seems to be a "gotcha"); > position_pass can fail here: > > /* Check if the current pass is of the same type as the new pass and > matches the name and the instance number of the reference pass. */ > if (pass->type == new_pass_info->pass->type > > > Hope this is helpful > Dave That seems to have been the problem. I made my pass SIMPLE_IPA_PASS and the comdats pass is just IPA_PASS. I changed mine to IPA_PASS and it now registers the pass. Steve Ellcey sell...@imgtec.com
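P.S. In case anyone else trips over the same gotcha: the fix was just the type field in the pass_data. Mine now looks roughly like this (the pass name is from my work in progress, and I have left out the extra ipa_opt_pass_d constructor arguments an IPA_PASS needs):

const pass_data pass_data_ipa_frame_header_opt =
{
  IPA_PASS,		/* type (was SIMPLE_IPA_PASS, which made
			   position_pass reject the "comdats" reference) */
  "frame-header-opt",	/* name */
  OPTGROUP_NONE,	/* optinfo_flags */
  TV_NONE,		/* tv_id */
  0,			/* properties_required */
  0,			/* properties_provided */
  0,			/* properties_destroyed */
  0,			/* todo_flags_start */
  0,			/* todo_flags_finish */
};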
Re: CFI directives and dynamic stack alignment
On Tue, 2015-08-18 at 09:23 +0930, Alan Modra wrote: > On Mon, Aug 17, 2015 at 10:38:22AM -0700, Steve Ellcey wrote: > OK, then you need to emit a .cfi directive to say the frame top is > given by the temp hard reg sometime after that assignment and before > sp is aligned in the prologue, and another .cfi directive when copying > to the pseudo. It's a while since I looked at the CFI code in gcc, > but arranging this might be as simple as setting RTX_FRAME_RELATED_P > on the insns involved. > > If -fasynchronous-unwind-tables, then you'll also need to track the > frame in the epilogue. > > > This function (fn2) ends with a call to abort, which is noreturn, so the > > optimizer sees that the epilogue is dead code and GCC determines that > > there is no need to save the old stack pointer since it will never get > > restored. I guess I need to tell GCC to save the stack pointer in > > expand_prologue even if it never sees a use for it. I guess I need to > > make the temporary register where I save $sp volatile or do something > > else so that the assignment (and its associated .cfi) is not deleted by > > the optimizer. > > Ah, I see. Yes, the temp and pseudo are not really dead if they are > needed for unwinding. Yes, I was originally thinking I just had to make the temp and pseudo regs volatile so that the assignments would not get removed but it appears that I need the epilogue code too (even if I never get there because of a call to abort which GCC knows is non-returning) so that I have the needed .cfi directives there. I am thinking I should add an edge from the entry_block to the exit_block so that the exit block is never removed by the optimizer. I assume this edge would need to be abnormal and/or fake but I am not sure which (if either) of these edges would be appropriate for this. Steve Ellcey sell...@imgtec.com
fake/abnormal/eh edge question
I have a question about FAKE, EH, and ABNORMAL edges. I am not sure I understand all the implications of each type of edge from the description in cfg-flags.def. I am trying to implement dynamic stack alignment for MIPS and I have code that does the following:

prologue
	copy incoming $sp to $12 (temp reg)
	align $sp
	copy $sp to $fp (after alignment so that $fp is also aligned)
entry block
	copy $12 to virtual reg (DRAP) for accessing args and for
	restoring $sp
exit block
	copy virtual reg (DRAP) back to $12
epilogue
	copy $12 to $sp to restore stack pointer

This works fine as long as there is a path from the entry block to the exit block, but in some cases (like gcc.dg/cleanup-8.c) we have a function that always calls abort (a non-returning function), so there is no path from entry to exit; the exit block and epilogue get removed, and the copy of $sp to $12 also gets removed because GCC sees no uses of $12.

I want to preserve the copy of $sp to $12 and I also want to preserve the .cfi pseudo-ops (and code) in the exit block and epilogue in order for exception handling to work correctly. One way I thought of doing this is to create an edge from the entry block to the exit block but I am unsure of all the implications of creating a fake/eh/abnormal edge to do this and which I would want to use.

Steve Ellcey
sell...@imgtec.com
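P.S. For reference, the change I am considering is just this (a sketch; whether EDGE_FAKE is the right flag is really my question):

  /* Keep the exit block (and the epilogue CFI) from being removed by
     connecting entry to exit with a fake edge, similar to what
     connect_infinite_loops_to_exit does for infinite loops.  */
  make_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun),
	     EXIT_BLOCK_PTR_FOR_FN (cfun), EDGE_FAKE);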
Re: fake/abnormal/eh edge question
On Tue, 2015-08-25 at 14:44 -0600, Jeff Law wrote:
> > I want to preserve the copy of $sp to $12 and I also want to preserve the
> > .cfi pseudo-ops (and code) in the exit block and epilogue in order for
> > exception handling to work correctly. One way I thought of doing this
> > is to create an edge from the entry block to the exit block but I am
> > unsure of all the implications of creating a fake/eh/abnormal edge to
> > do this and which I would want to use.
> Presumably it's the RTL DCE pass that's eliminating this stuff?

Actually, it looks like it is peephole2 that is eliminating the instructions (and .cfi pseudo-ops).

> Do you have the FRAME_RELATED bit set on those insns?
>
> But what I don't understand is why preserving the code is useful if it
> can't be reached. Maybe there's something about the dwarf2 unwinding
> that I simply don't understand -- I've managed to avoid learning about
> it for years.

I am not entirely sure whether I need the code itself or just the .cfi pseudo-ops, with the code only there to generate the .cfi stuff. I wish I could avoid the dwarf unwinder but that seems to be the main problem I am having with stack realignment. Getting the cfi stuff right so that the unwinder works properly is proving very hard.

Steve Ellcey
sell...@imgtec.com
GTY / gengtype question - adding a new header file
I have a question about gengtype and GTY. I was looking at adding some code to mips.c and it occurred to me that that file was getting very large (19873 lines). So I wanted to add a new .c file instead but that file needed some types that were defined in mips.c and not in a header file. Specifically it needed the MIPS specific machine_function structure that is defined in mips.c with: struct GTY(()) machine_function { I think I could just move this to mips.h and things would be fine but I didn't want to do that because mips.h is included in tm.h and is visible to the generic GCC code. Currently machine_function is not visible to the generic GCC code and so I wanted to put machine_function in a header file that could only be seen/used by mips specific code. So I created mips-private.h and added it to extra_headers in config.gcc. The problem is that if I include mips-private.h in mips.c instead of having the actual definition of machine_function in mips.c then my build fails and I think it is due to how and where gengtype scans for GTY uses. I couldn't find an example of a platform that has a machine specific header file that was not visible to the generic GCC code and that has GTY types in it so I am not sure what I need to do to get gengtype to scan mips-private.h or if this is even possible (or wise). Steve Ellcey sell...@imgtec.com
Re: GTY / gengtype question - adding a new header file
On Tue, 2015-09-01 at 08:11 +0100, Richard Sandiford wrote: > config.gcc would need to add mips-private.h to target_gtfiles. OK, that was what I missed. > I'm not sure splitting the file is a good idea though. At the moment > the definitions of all target hooks must be visible to a single TU. > Either you'd need to keep all the hooks in one .c file (leading > to an artificial split IMO) or you'd need declare some of them > in the private header. Declaring them in the header file would only be > consistent if the targetm definition was in its own file (so that _every_ > hook had a prototype in the private header). That seems like unnecessary > work though. The code I want to add is actually a separate GCC pass so it breaks out fairly cleanly. It just needs access to the machine_function structure and the types and structures included in that structure (mips_frame_info, mips_int_mask, and mips_shadow_set). It sets a couple of new boolean variables in the machine_function structure which are then used during mips_compute_frame_info. I see what you mean about much of mips.c probably not being splittable due to the target hook structure but machine specific passes may be the exception to that rule. We already have one pass in mips.c (pass_mips_machine_reorg2), that might be something else that could be broken out, though I haven't looked in detail to see what types or structures it would need access to. Steve Ellcey sell...@imgtec.com
Re: GTY / gengtype question - adding a new header file
On Tue, 2015-09-01 at 10:13 +0200, Georg-Johann Lay wrote: > > I'd have a look at what BEs are using non-default target_gtfiles. > > Johann There are a few BEs that add a .c file to target_gtfiles, but no platforms that add a .h file to target_gtfiles. I do see a number of platforms that define the machine_function structure in their header file (aarch64.h, pa.h, i386.h) instead of their .c file though. Maybe that is a better way to go for MIPS instead of doing something completely new. If I move machine_function, mips_frame_info, mips_int_mask, and mips_shadow_set from mips.c to mips.h then I could put my new machine specific pass in a separate .c file from mips.c and not need to do anything with target_gtfiles. The only reason I didn't want to do this was so that machine_function wasn't visible to the rest of GCC but that doesn't seem to have been an issue for other targets. Steve Ellcey sell...@imgtec.com
Build problem with libgomp on ToT?
I just ran into this build failure last night:

/usr/bin/install: cannot create regular file `/scratch/sellcey/repos/nightly/install-mips-mti-linux-gnu/lib/gcc/mips-mti-linux-gnu/6.0.0/finclude/omp_lib_kinds.mod': File exists

This is on a parallel make install (-j 7) with multilibs. I don't see an obvious patch that could have caused this new failure; has anyone else run into this? I couldn't find anything in the bug database or in the mailing lists.

Steve Ellcey
sell...@imgtec.com
TARGET_PROMOTE_PROTOTYPES question
I have a question about the TARGET_PROMOTE_PROTOTYPES macro. This macro says that types like short or char should be promoted to ints when passed as arguments, even if there is a prototype for the argument. Now when I look at the code generated on MIPS or x86 it looks like there is conversion code in both the caller and the callee. For example:

int foo (char a, short b) { return a + b; }
int bar (int a) { return foo (a, a); }

In the rtl expand dump (on MIPS) I see this in bar:

(insn 6 3 7 2 (set (reg:SI 200)
        (sign_extend:SI (subreg:HI (reg/v:SI 199 [ a ]) 2))) x.c:2 -1
     (nil))
(insn 7 6 8 2 (set (reg:SI 201)
        (sign_extend:SI (subreg:QI (reg/v:SI 199 [ a ]) 3))) x.c:2 -1
     (nil))

Which ensures that we pass the arguments as ints. And in foo we have:

(insn 8 9 10 2 (set (reg/v:SI 197 [ a+-3 ])
        (sign_extend:SI (subreg:QI (reg:SI 198) 3))) x.c:1 -1
     (nil))
(insn 10 8 11 2 (set (reg/v:SI 199 [ b+-2 ])
        (sign_extend:SI (subreg:HI (reg:SI 200) 2))) x.c:1 -1
     (nil))

Which makes sure we do a truncate/extend before using the values. Now I know that we can't get rid of these truncations/extensions entirely, but do we need both? It seems like foo could say that if the original registers (198 and 200) are argument registers that were extended to SImode due to TARGET_PROMOTE_PROTOTYPES then we don't need to do the truncation/extension in the callee and could just use the SImode values directly. Am I missing something? Or are we doing both just to have belts and suspenders and want to keep it that way?

Steve Ellcey
sell...@imgtec.com
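P.S. For what it is worth, the experiment I have in mind is just flipping the hook, assuming I am remembering the current mips.c definition correctly:

/* What mips.c does today (roughly): promote in the callee too.  */
#undef TARGET_PROMOTE_PROTOTYPES
#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true

/* The experiment: rely only on the caller-side extension.  */
#undef TARGET_PROMOTE_PROTOTYPES
#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_false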
_Fract types and conversion routines
I have a question about the _Fract types and their conversion routines. If I compile this program:

extern void abort (void);
int main ()
{
  signed char a = -1;
  _Sat unsigned _Fract b = a;
  if (b != 0.0ur)
    abort ();
  return 0;
}

with -O0 and on a MIPS32 system where char is 1 byte and unsigned (int) is 4 bytes, I see a call to '__satfractqiuhq' for the conversion.

Now I think the 'qi' part of the name is for the 'from' type of the conversion, a 1 byte signed type (signed char), and the 'uhq' part is for the 'to' part of the conversion. But 'uhq' would be a 2 byte unsigned fract, and the unsigned fract type on MIPS should be 4 bytes (unsigned int is 4 bytes). So shouldn't GCC have generated a call to __satfractqiusq instead? Or am I confused?

Steve Ellcey
sell...@imgtec.com
Re: _Fract types and conversion routines
On Wed, 2015-10-28 at 13:42 +0100, Richard Biener wrote:
> On Wed, Oct 28, 2015 at 12:23 AM, Steve Ellcey wrote:
> >
> > I have a question about the _Fract types and their conversion routines.
> > If I compile this program:
> >
> > extern void abort (void);
> > int main ()
> > {
> >   signed char a = -1;
> >   _Sat unsigned _Fract b = a;
> >   if (b != 0.0ur)
> >     abort ();
> >   return 0;
> > }
> >
> > with -O0 and on a MIPS32 system where char is 1 byte and unsigned (int)
> > is 4 bytes, I see a call to '__satfractqiuhq' for the conversion.
> >
> > Now I think the 'qi' part of the name is for the 'from' type of the
> > conversion, a 1 byte signed type (signed char), and the 'uhq' part is
> > for the 'to' part of the conversion. But 'uhq' would be a 2 byte
> > unsigned fract, and the unsigned fract type on MIPS should be 4 bytes
> > (unsigned int is 4 bytes). So shouldn't GCC have generated a call to
> > __satfractqiusq instead? Or am I confused?
>
> did it eventually narrow the comparison? Just check some of the tree/RTL
> dumps.
>
> > Steve Ellcey
> > sell...@imgtec.com

Hm, it looks like it optimized this in expand. In the last tree dump it still looks like:

b_2 = (_Sat unsigned _Fract) a_1;

But in the expand phase it becomes:

(call_insn/u 13 12 14 2 (parallel [
            (set (reg:UHQ 2 $2)
                (call (mem:SI (symbol_ref:SI ("__satfractqiuhq") [flags 0x41]) [0 S4 A32])
                    (const_int 16 [0x10])))
            (clobber (reg:SI 31 $31))
        ])

I think this is a legitimate optimization (though I am compiling at -O0 so I wonder if it should really be doing this).

The problem I am looking at is that I want to remove 'TARGET_PROMOTE_PROTOTYPES' because it is causing us to promote/sign-extend types in the caller and the callee. The MIPS ABI requires it be done in the caller so it should not need to be done in the callee as well. See https://gcc.gnu.org/ml/gcc/2015-10/msg00149.html

When I ran the testsuite, I got one regression: gcc.dg/fixed-point/convert-sat.c. When looking at that failure I thought the problem might be that I was calling __satfractqiuhq instead of __satfractqiusq, but that does not seem to be the issue. The call to __satfractqiuhq is correct, and the difference that I see when I don't define TARGET_PROMOTE_PROTOTYPES is that the result of __satfractqiuhq is not truncated/sign-extended to UHQ mode inside of __satfractqiuhq.

I am looking to see if I need to do something with TARGET_PROMOTE_FUNCTION_MODE to handle _Fract types differently than what default_promote_function_mode_always_promote does. I tried updating PROMOTE_MODE to handle _Fract modes (by promoting UHQ to USQ or SQ) but that caused more failures than before. It seems to be only the return of partial word _Fract types that is causing me a problem.

Steve Ellcey
sell...@imgtec.com
Re: _Fract types and conversion routines
You can ignore that last email. I think I finally found where the problem is. In the main program:

extern void abort (void);
int main ()
{
  signed char a = -1;
  _Sat unsigned _Fract b = a;
  if (b != 0.0ur)
    abort ();
  return 0;
}

If I compile with -O0, I see:

	li	$2,-1	# 0x
	sb	$2,24($fp)
	lbu	$4,24($fp)
	jal	__satfractqiuhq

We put -1 in register $2, store the byte, then load the byte as an unsigned char instead of a signed char. When TARGET_PROMOTE_PROTOTYPES was defined it didn't matter because __satfractqiuhq did another sign extend before using the value. When I got rid of TARGET_PROMOTE_PROTOTYPES, that extra sign extend went away and the fact that we are doing a 'lbu' unsigned load instead of a 'lb' signed byte load triggered the bug. Now I just need to find out why we are doing an lbu instead of an lb.

Steve Ellcey
sell...@imgtec.com
Re: _Fract types and conversion routines
OK, I think I understand what is happening with the MIPS failure when converting 'signed char' to '_Sat unsigned _Fract' after I removed the TARGET_PROMOTE_PROTOTYPES macro. This bug is a combination of two factors: one is that calls to library functions (like __satfractqiuhq) don't necessarily get the right type promotion (specifically with regards to signedness) of their arguments, and the other is that __satfractqiuhq doesn't deal with that problem correctly, though I think it is supposed to.

Reading emit_library_call_value_1 I see comments like:

/* Todo, choose the correct decl type of orgfun. Sadly this
   information isn't present here, so we default to native calling
   abi here. */

So I think that when calling a library function like '__satfractqiuhq' which takes a signed char argument, or a library function like __satfractunsqiuhq which takes an unsigned char argument, emit_library_call_value_1 cannot ensure that the right type of extension (signed vs unsigned) is done on the argument when it is put in the argument register. Does this sound like a correct understanding of the limitation in emit_library_call_value_1? I don't see this issue on regular non-library calls, presumably because the compiler has all the information needed to do correct explicit conversions.

When I look at the preprocessed __satfractqiuhq code I see:

unsigned short _Fract
__satfractqiuhq (signed char a)
{
  signed char x = a;
  low = (short) x;

When TARGET_PROMOTE_PROTOTYPES was defined this triggered explicit truncate/sign extend code that took care of the problem I am seeing, but when I removed it, GCC assumed the caller had taken care of the truncation/sign extension and, because this is a library function, that wasn't done correctly — and I don't think it can be done correctly, because emit_library_call_value_1 doesn't have the necessary information.

So should __satfractqiuhq be dealing with the fact that the argument 'a' may not have been sign extended in the correct way? I have tried a few code changes in fixed-bit.c (to no avail) but this code is so heavily macro-ized it is tough to figure out what it should be doing.

Steve Ellcey
sell...@imgtec.com
Question about PR 48814 and ivopts and post-increment
I have a question involving ivopts and PR 48814, which was a fix for the post increment operation. Prior to the fix for PR 48814, MIPS would generate this loop for strcmp (C code from glibc):

$L4:
	lbu	$3,0($4)
	lbu	$2,0($5)
	addiu	$4,$4,1
	beq	$3,$0,$L7
	addiu	$5,$5,1		# This is a branch delay slot

	beq	$3,$2,$L4
	subu	$2,$3,$2	# This is a branch delay slot (only used after loop)

With the current top-of-tree we now generate:

	addiu	$4,$4,1
$L8:
	lbu	$3,-1($4)
	addiu	$5,$5,1
	beq	$3,$0,$L7
	lbu	$2,-1($5)	# This is a branch delay slot

	beq	$3,$2,$L8
	addiu	$4,$4,1		# This is a branch delay slot

	subu	$2,$3,$2	# Done only once now after exiting loop.

The main problem with the new loop is that the load of $2 is immediately before the beq comparing $2 and $3, so there can be a delay due to the time that the load takes. The ideal code would probably be:

	addiu	$4,$4,1
$L8:
	lbu	$3,-1($4)
	lbu	$2,0($5)	# This is a branch delay slot
	beq	$3,$0,$L7
	addiu	$5,$5,1
	beq	$3,$2,$L8
	addiu	$4,$4,1		# This is a branch delay slot

	subu	$2,$3,$2	# Done only once now after exiting loop.

Where we load $2 earlier (using a 0 offset instead of a -1 offset) and then do the increment of $5 after using it in the load. The problem is that this isn't something that can just be done in the instruction scheduler because we are changing one of the instructions (to modify the offset) in addition to rearranging them, and I don't think the instruction scheduler supports that.

It looks like it is the ivopts code that decided to increment the registers first and use the -1 offsets in the loads after, instead of using 0 offsets and then incrementing the registers after the loads, but I can't figure out how or why ivopts made that decision.

Does anyone have any ideas on how I could 'fix' GCC to make it generate the ideal code? Is there some way to do it in the instruction scheduler? Is there some way to modify ivopts to fix this by modifying the cost analysis somehow? Could I (partially) undo the fix for PR 48814? According to the final comment in that bugzilla report the change is really only needed for C11, and the change does degrade the optimizer, so could we go back to the old behaviour for C89/C99? The code in ivopts has changed enough since the patch was applied that I couldn't immediately see how to do that in the ToT sources.

Steve Ellcey
sell...@imgtec.com
Instruction scheduler rewriting instructions?
Can the instruction scheduler actually rewrite instructions? I didn't think so, but when I compile some code on MIPS with:

-O2 -fno-ivopts -fno-peephole2 -fno-schedule-insns2

I get:

$L4:
	lbu	$3,0($4)
	addiu	$4,$4,1
	lbu	$2,0($5)
	beq	$3,$0,$L7
	addiu	$5,$5,1

	beq	$3,$2,$L4
	subu	$2,$3,$2

When I changed -fno-schedule-insns2 to -fschedule-insns2, I get:

$L4:
	lbu	$3,0($4)
	addiu	$5,$5,1
	lbu	$2,-1($5)
	beq	$3,$0,$L7
	addiu	$4,$4,1

	beq	$3,$2,$L4
	subu	$2,$3,$2

I.e. the addiu of $5 and the load using $5 have been swapped around and the load uses a different offset to compensate. I can't see where in the instruction scheduler this would happen. Any help? This is on MIPS if that matters, though I didn't see any MIPS specific code for this.

This issue is related to my earlier question about PR 48814 and ivopts (thus the -fno-ivopts option). The C code I am looking at is the strcmp function from glibc:

int
strcmp (const char *p1, const char *p2)
{
  const unsigned char *s1 = (const unsigned char *) p1;
  const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
	return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}

Steve Ellcey
sell...@imgtec.com
Re: Instruction scheduler rewriting instructions?
On Thu, 2015-12-03 at 19:56 +, Ramana Radhakrishnan wrote:
> IIRC it's because the scheduler *thinks* it can get a tighter schedule
> - probably because it thinks it can dual issue the lbu from $4 and the
> addiu to $5. Can it think so? This may be related -
> https://gcc.gnu.org/ml/gcc-patches/2012-08/msg00155.html
>
> regards
> Ramana

No, the system I am tuning for (MIPS 24k) is single issue according to its description. At least I do see now where the instruction is getting rewritten in the instruction scheduler, so that is helpful.

I am no longer sure the scheduler is where the problem lies though. If I compile with -O2 -mtune=24kc I get this loop:

	addiu	$4,$4,1
$L8:
	addiu	$5,$5,1
	lbu	$3,-1($4)
	beq	$3,$0,$L7
	lbu	$2,-1($5)

	beq	$3,$2,$L8
	addiu	$4,$4,1

If I use -O2 -fno-ivopts -mtune=24kc I get:

	lbu	$3,0($4)
$L8:
	lbu	$2,0($5)
	addiu	$4,$4,1
	beq	$3,$0,$L7
	addiu	$5,$5,1

	beql	$3,$2,$L8
	lbu	$3,0($4)

This second loop is better because there is more time between the loads and where the loaded values are used in the beq instructions. So I think there is something missing or wrong in the cost analysis that ivopts is doing if it decides to do the adds before the loads instead of vice versa. I have tried tweaking the cost of loads in mips_rtx_costs and in the instruction descriptions in 24k.md but that didn't seem to have any effect on the ivopts code.

Steve Ellcey
sell...@imgtec.com
Re: Question about PR 48814 and ivopts and post-increment
On Fri, 2015-12-04 at 16:22 +0800, Bin.Cheng wrote:
> Dump before IVO is as below:
>
> :
> # s1_1 = PHI
> # s2_2 = PHI
> s1_6 = s1_1 + 1;
> c1_8 = *s1_1;
> s2_9 = s2_2 + 1;
> c2_10 = *s2_2;
> if (c1_8 == 0)
>   goto ;
> else
>   goto ;
>
> And the iv candidates are as:
> candidate 1 (important)
>   var_before ivtmp.6
>   var_after ivtmp.6
>   incremented before exit test
>   type unsigned int
>   base (unsigned int) p1_4(D)
>   step 1
>   base object (void *) p1_4(D)
> candidate 2 (important)
>   original biv
>   type const unsigned char *
>   base (const unsigned char *) p1_4(D)
>   step 1
>   base object (void *) p1_4(D)
> candidate 3 (important)
>   var_before ivtmp.7
>   var_after ivtmp.7
>   incremented before exit test
>   type unsigned int
>   base (unsigned int) p2_5(D)
>   step 1
>   base object (void *) p2_5(D)
> candidate 4 (important)
>   original biv
>   type const unsigned char *
>   base (const unsigned char *) p2_5(D)
>   step 1
>   base object (void *) p2_5(D)
>
> Generally GCC would choose normal candidates {1, 3} and insert
> increment before exit condition. This is expected in this case. But
> when there are applicable original candidates {2, 4}, GCC would prefer
> these in order to achieve better debugging. Also as I suspected,
> [reg] and [reg-1] have same address cost on mips, that's why GCC makes
> current decision.
>
> Thanks,
> bin

Yes, I agree that [reg] and [reg-1] have the same address cost, but using [reg-1] means that the increment of reg happens before the access, and that puts the load of [reg-1] closer to the use of the value loaded, which causes a stall. If we used [reg] and incremented it after the load then we would have at least one instruction in between the load and the use and either no stall or a shorter stall. I don't know if ivopts has any way to do this type of analysis when picking the IV.

Steve Ellcey
sell...@imgtec.com
libstdc++ / uclibc question
Is anyone building GCC (and libstdc++ specifically) with uclibc? I haven't done this in a while and when I do it now I get this build failure:

/scratch/sellcey/repos/uclibc-ng/src/gcc/libstdc++-v3/include/ext/random.tcc: In member function '__gnu_cxx::{anonymous}::uniform_on_sphere_helper<_Dimen, _RealType>::result_type __gnu_cxx::{anonymous}::uniform_on_sphere_helper<_Dimen, _RealType>::operator()(_NormalDistribution&, _UniformRandomNumberGenerator&)':
/scratch/sellcey/repos/uclibc-ng/src/gcc/libstdc++-v3/include/ext/random.tcc:1573:44: error: expected unqualified-id before '(' token
   while (__norm == _RealType(0) || ! std::isfinite(__norm));

I am thinking the issue may be isfinite, but I am not sure. I notice there are some tests like 26_numerics/headers/cmath/c99_classification_macros_c++.cc that are xfailed for uclibc and I wonder if this is a related problem. I could not find any uses of isfinite in other C++ files (except cmath) and the tests that use it are the same ones that are xfailed for uclibc.

Steve Ellcey
sell...@imgtec.com
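P.S. To make my guess about isfinite concrete (and it is only a guess about what uclibc's math.h looks like), here is a small testcase that reproduces the same compile error. The macro below is a stand-in for what I suspect the C library leaves visible:

// Stand-in for the C99 classification macro I suspect uclibc's
// <math.h> leaves defined (the real definition will differ):
#define isfinite(x) (sizeof (x) == sizeof (float) ? __finitef (x) : __finite (x))

extern "C" int __finite (double);
extern "C" int __finitef (float);

bool
check (double x)
{
  // The function-like macro is expanded before name lookup ever
  // happens, so this becomes "std::(sizeof (x) == ...)" and g++
  // reports "expected unqualified-id before '(' token" -- the same
  // error as in random.tcc above.
  return std::isfinite (x);
}

libstdc++'s <cmath> normally #undefs these macros, so perhaps the xfailed c99_classification_macros tests mean that cannot be done cleanly on uclibc.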
Re: __builtin_memcpy and alignment assumptions
On Fri, 2016-01-08 at 12:56 +0100, Richard Biener wrote:
> On Fri, Jan 8, 2016 at 12:40 PM, Eric Botcazou
> wrote:
> >> I think we only assume it if the pointer is actually dereferenced,
> >> otherwise
> >> it just breaks too much code in the wild. And while memcpy dereferences,
> >> it dereferences it through a char * cast, and thus only the minimum
> >> alignment is assumed.
> >
> > Yet the compiler was generating the expected code for Steve's testcase on
> > strict-alignment architectures until very recently (GCC 4.5 IIUC) and this
> > worked perfectly.

Yes, I just checked and I did get the better code in GCC 4.5 and I get the current slower code in GCC 4.6.

> Consider
>
> int a[256];
> int
> main()
> {
>   void *p = (char *)a + 1;
>   void *q = (char *)a + 5;
>   __builtin_memcpy (p, q, 4);
>   return 0;
> }
>
> where the ME would be entitled to "drop" the char */void * conversions
> and use &a typed temps.

I am not sure how this works, but I tweaked get_pointer_alignment_1 so that if there was no align info or if get_ptr_info_alignment returned false, then the routine would return type based alignment information instead of the default 'void *' alignment. In that case, and using your example, GCC still accessed p & q as pointers to unaligned data. In fact if I used int pointers:

int a[256];
int main()
{
  int *p = (int *)((char *)a + 1);
  int *q = (int *)((char *)a + 5);
  __builtin_memcpy (p, q, 4);
  return 0;
}

GCC did unaligned accesses when optimizing, but when unoptimized (and with my change) GCC did aligned accesses, which would not work on a strict alignment machine like MIPS. This seems to match what happens with:

int a[256];
int main()
{
  int *p = (int *)((char *)a + 1);
  int *q = (int *)((char *)a + 5);
  *p = *q;
  return 0;
}

When I optimize it, GCC does unaligned accesses, and when unoptimized GCC does aligned accesses which will not work on MIPS.

Steve Ellcey
sell...@imgtec.com
GCC compat testing and simulator question
I have a question about the compatibility tests (gcc.dg/compat and g++.dg/compat). Do they work with remote/simulator testing? I was trying to run them with qemu and even though I am setting ALT_CC_UNDER_TEST and ALT_CXX_UNDER_TEST it doesn't look like my alternative compiler is ever getting run. The README.compat file contains a line about 'make sure they work for testing with a simulator'; does that mean they are known not to work with cross-testing and using a simulator?

I don't get any errors or warnings, and tests are being compiled with GCC and run under qemu, but it doesn't look like the second compiler is ever run to compile anything. I am using the multi-sim dejagnu board.

Steve Ellcey
sell...@imgtec.com
Re: glibc test tst-thread_local1.cc fails to compile with latest GCC
On Fri, 2016-10-21 at 17:03 +0100, Jonathan Wakely wrote: > > > Is there some C++ standard change that I am not aware of or some > > other header file I need to include? > No, what probably happened is GCC didn't detect a usable Pthreads > implementation and so doesn't define std::thread. The header > uses this condition around the definition of std::thread: > > #if defined(_GLIBCXX_HAS_GTHREADS) && > defined(_GLIBCXX_USE_C99_STDINT_TR1) Yes, I finally realized I had built a GCC with '--enable-threads=no' and was using that GCC to build GLIBC. Once I rebuilt GCC with threads I could build GLIBC and not get this error. Steve Ellcey
Question about PR preprocessor/60723
I am trying to understand the status of this bug and the patch that fixes it. It looks like a patch was submitted and checked in for 5.0 to fix the problem reported, and I see the new behavior caused by the patch in GCC 5.X compilers. This behavior caused a number of issues with configures and scripts that examined preprocessed output, as is mentioned in the bug report for PR 60723. There was a later bug, 64864, complaining about the behavior and that was closed as invalid.

But when I look at GCC 6.X or ToT compilers I do not see the same behavior as 5.X. Was this patch reverted or was a new patch submitted that undid some of this patch's behavior? I couldn't find any revert or new patch to replace the original one so I am not sure when or why the code changed back after the 5.X releases.

Here is a test case that I am preprocessing with g++ -E:

#include
class foo
{
  void operator= ( bool bit);
  operator bool() const;
};

GCC 5.4 breaks up the operator declarations with line markers and GCC 6.2 does not.

Steve Ellcey
sell...@caviumnetworks.com
Multilib build question involving MULTILIB_OSDIRNAMES
I have a multilib question that I hope someone can help me with. If I have this multilib setup while building a cross compiler:

MULTILIB_DEFAULTS { "mips32r2" }
MULTILIB_OPTIONS = mips32r2/mips64r2
MULTILIB_OSDIRNAMES = ../lib ../lib64

Everything works the way I want it to. I have mips32r2 system libraries in /lib under my sysroot and mips64r2 system libraries in /lib64 and everything seems fine.

Now I want to make mips64r2 the default compilation mode for GCC but I want to keep my sysroot setup (/lib for mips32r2 and /lib64 for mips64r2) the same. So I change MULTILIB_DEFAULTS to specify "mips64r2" and rebuild. When I do this, a default build (targeting mips64r2) searches for system libraries in /lib instead of /lib64. Is there a way to fix this without having to put mips64r2 system libraries in /lib? Is this the expected behaviour or is this a bug in handling MULTILIB_OSDIRNAMES?

Steve Ellcey
sell...@mips.com
Where does GCC pick passes for different opt. levels
I have a basic question about optimization selection in GCC. There used to be some code in GCC (passes.c?) that would set various optimize pass flags depending on if the 'optimize' flag was > 0, > 1, or > 2; later I think there may have been a table. This code seems gone now and I can't figure out how GCC is selecting what optimization passes to run at what optimization levels (-O1 vs. -O2 vs. -O3). How is this handled in the top-of-tree GCC code? I see passes.def but there doesn't seem to be anything in there to tie specific passes to specific optimization levels. Likewise in common.opt I see flags for various optimization passes but nothing to tie them to -O1 or -O2, etc. I'm probably missing something obvious, but a pointer would be much appreciated. Steve Ellcey
Re: Where does GCC pick passes for different opt. levels
> default_options_table in opts.c. Thanks Andrew and Marc, I knew it would be obvious once I saw it. Steve
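P.S. For anyone else searching for this later, the entries look like the following. This is abridged and from memory, so treat the exact set of flags and levels as approximate:

/* Shape of default_options_table in opts.c: each entry maps one
   option to the -O levels that enable it.  */
static const struct default_options default_options_table[] =
  {
    /* -O1 (and up) optimizations.  */
    { OPT_LEVELS_1_PLUS, OPT_fguess_branch_probability, NULL, 1 },
    { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },

    /* -O2 (and up) optimizations.  */
    { OPT_LEVELS_2_PLUS, OPT_fgcse, NULL, 1 },
    { OPT_LEVELS_2_PLUS, OPT_fschedule_insns2, NULL, 1 },

    /* -O3 (and up) optimizations.  */
    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },

    { OPT_LEVELS_NONE, 0, NULL, 0 }
  };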
ICE in bitmap routines with LRA and inline assembly language
I was wondering if anyone has seen this bug involving LRA and inline assembly code. On MIPS, I am getting the attached ICE. Somehow the 'first' pointer in the live_reload_and_inheritance_pseudos bitmap structure is either getting clobbered or is not being correctly initialized to begin with. I am not sure which yet.

Steve Ellcey
sell...@mips.com

% cat x.c
int
NoBarrier_AtomicIncrement(volatile int* ptr, int increment)
{
  int temp, temp2;
  __asm__ __volatile__(".set push\n"
                       ".set noreorder\n"
                       "1:\n"
                       "ll %0, 0(%3)\n"
                       "addu %1, %0, %2\n"
                       "sc %1, 0(%3)\n"
                       "beqz %1, 1b\n"
                       "addu %1, %0, %2\n"
                       ".set pop\n"
                       : "=&r" (temp), "=&r" (temp2)
                       : "Ir" (increment), "r" (ptr)
                       : "memory");
  return temp2;
}

% mips-mti-linux-gnu-gcc -O1 -c x.c
x.c: In function 'NoBarrier_AtomicIncrement':
x.c:16:1: internal compiler error: Segmentation fault
 }
 ^
0x9b199f crash_signal
	/scratch/sellcey/nightly/src/gcc/gcc/toplev.c:339
0x5d3950 bitmap_element_link
	/scratch/sellcey/nightly/src/gcc/gcc/bitmap.c:456
0x5d3950 bitmap_set_bit(bitmap_head*, int)
	/scratch/sellcey/nightly/src/gcc/gcc/bitmap.c:673
0x87c370 init_live_reload_and_inheritance_pseudos
	/scratch/sellcey/nightly/src/gcc/gcc/lra-assigns.c:413
0x87c370 lra_assign()
	/scratch/sellcey/nightly/src/gcc/gcc/lra-assigns.c:1499
0x877966 lra(_IO_FILE*)
	/scratch/sellcey/nightly/src/gcc/gcc/lra.c:2236
0x8337de do_reload
	/scratch/sellcey/nightly/src/gcc/gcc/ira.c:5311
0x8337de execute
	/scratch/sellcey/nightly/src/gcc/gcc/ira.c:5470
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
dejagnu testsuite bug?
I was looking through my 'make check' output (from a MIPS cross compiler) and saw this error. Has anyone else run into something like this? I am not entirely sure where to start looking for this problem and I am also not sure if this is a new problem or not. Normally I just grep for FAIL and don't examine the testing output that closely. I see the 'usual' C and C++ failures after this error and the rest of the testsuite seems to run fine.

Steve Ellcey
sell...@mips.com

Test Run By sellcey on Fri Sep 5 03:08:58 2014
Native configuration is x86_64-unknown-linux-gnu

=== tests ===

Schedule of variations:
    multi-sim

Running target multi-sim
Using /scratch/sellcey/nightly/src/gcc/dejagnu/testsuite/../config/base-config.exp as tool-and-target-specific interface file.
Using /scratch/sellcey/nightly/src/gcc/./dejagnu/baseboards/multi-sim.exp as board description file for target.
Using /scratch/sellcey/nightly/src/gcc/./dejagnu/config/sim.exp as generic interface file for target.
Using /scratch/sellcey/nightly/src/gcc/./dejagnu/baseboards/basic-sim.exp as board description file for target.
Using /scratch/sellcey/nightly/src/gcc/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
Using /scratch/sellcey/nightly/src/gcc/dejagnu/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /scratch/sellcey/nightly/src/gcc/dejagnu/testsuite/libdejagnu/tunit.exp ...
send: spawn id exp0 not open
    while executing
"send_user -- "$message\n""
    ("default" arm line 2)
    invoked from within
"switch -glob "$firstword" { "PASS:" - "XFAIL:" - "KFAIL:" - "UNRESOLVED:" - "UNSUPPORTED:" - "UNTESTED:" { if {$all_flag} { send_user -- ..."
    (procedure "clone_output" line 10)
    invoked from within
"clone_output "Running $test_file_name ...""
    (procedure "runtest" line 7)
    invoked from within
"runtest $test_name"
    ("foreach" body line 42)
    invoked from within
"foreach test_name [lsort [find ${dir} *.exp]] { if { ${test_name} == "" } { continue } # Ignore this one if asked to. if { ${ignore..."
    ("foreach" body line 54)
    invoked from within
"foreach dir "${test_top_dirs}" { if { ${dir} != ${srcdir} } { # Ignore this directory if is a directory to be # ignored. if {[info..."
    ("foreach" body line 121)
    invoked from within
"foreach pass $multipass { # multipass_name is set for `record_test' to use (see framework.exp). if { [lindex $pass 0] != "" } { set multipass_..."
    ("foreach" body line 51)
    invoked from within
"foreach current_target $target_list { verbose "target is $current_target" set current_target_name $current_target set tlist [split $curren..."
    (file "/scratch/sellcey/nightly/src/gcc/./dejagnu/runtest.exp" line 1627)
make[3]: *** [check-DEJAGNU] Error 1
make[3]: Leaving directory `/scratch/sellcey/nightly/obj-mips-mti-linux-gnu/gcc/dejagnu'
make[2]: *** [check-am] Error 2
make[2]: Target `check' not remade because of errors.
make[2]: Leaving directory `/scratch/sellcey/nightly/obj-mips-mti-linux-gnu/gcc/dejagnu'
make[1]: *** [check-dejagnu] Error 2
MULTILIB_OSDIRNAMES mapping question
I have a question about MULTILIB_OSDIRNAMES and about specifying a mapping in this variable. According to fragments.texi:

    When it is a set of mappings of the form @var{gccdir}=@var{osdir},
    the left side gives the GCC convention and the right gives the
    equivalent OS defined location.

But when I try this it doesn't seem to work for me. If I am reading the config/i386/t-linux64 file correctly, instead of mapping from one of the MULTILIB_DIRNAMES entries, which is what I expected, it seems to map from the MULTILIB_OPTIONS entries instead. That is, the @var{gccdir} entries in config/i386/t-linux64 start with an 'm' like the options do, but the 'm' is not part of the GCC directory names. So is the documentation wrong, or am I misreading it, or is the code wrong? I would actually like the code to match the existing documentation, because on MIPS the ABI options contain equal signs (-mabi=32, -mabi=64), so it would be hard/confusing to map an option to a directory when the option itself contains an equal sign.

Steve Ellcey
sell...@mips.com
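For reference, a simplified paraphrase of the config/i386/t-linux64 mappings in question (the real file also appends multiarch suffixes and conditionalizes the m32 directory, so treat this as illustrative only):

    # Paraphrased from config/i386/t-linux64: the left-hand sides here
    # are option names (m64, m32), not MULTILIB_DIRNAMES entries.
    MULTILIB_OPTIONS    = m64/m32
    MULTILIB_DIRNAMES   = 64 32
    MULTILIB_OSDIRNAMES = m64=../lib64 m32=../lib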
fast-math optimization question
I have a -ffast-math (missing?) optimization question. I noticed on MIPS that if I compiled:

#include <math.h>
extern float x;
void foo() { x = sin(log(x)); }

GCC will extend 'x' to double precision, call the double precision log and sin functions, and then truncate the result to single precision. If instead, I have:

#include <math.h>
extern float x;
void foo()
{
  x = log(x);
  x = sin(x);
}

Then GCC will call the single precision log and sin functions and not do any extensions or truncations. In addition to avoiding the extend/trunc instructions, the single precision log and sin functions are presumably faster than the double precision ones, making the entire code much faster. Is there a reason why GCC couldn't (under -ffast-math) call the single precision routines for the first case?

Steve Ellcey
sell...@mips.com
Re: fast-math optimization question
On Thu, 2014-10-09 at 11:27 -0700, Andrew Pinski wrote:

> > Is there a reason why GCC couldn't (under -ffast-math) call the single
> > precision routines for the first case?
>
> There is no reason why it could not.  The reason why it does not
> currently is because there is no pass which does the demotion and the
> only case of demotion that happens is with a simple
> (float)function((double)float_val);
>
> Thanks,
> Andrew

Do you know which pass does the simple '(float)function((double)float_val)' demotion? Maybe that would be a good place to extend things.

Steve Ellcey
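In concrete terms, the one form that does get demoted today is the direct cast pattern; a minimal example (the function name here is made up for illustration):

    #include <math.h>

    /* Under -ffast-math, GCC already rewrites this to sinf(f),
       because the cast-in/cast-out pattern is matched directly.  */
    float demote_ok(float f)
    {
      return (float) sin((double) f);
    }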
Re: fast-math optimization question
On Thu, 2014-10-09 at 19:50 +0000, Joseph S. Myers wrote:

> On Thu, 9 Oct 2014, Steve Ellcey wrote:
>
> > Do you know which pass does the simple
> > '(float)function((double)float_val)' demotion? Maybe that would be a
> > good place to extend things.
>
> convert.c does such transformations.  Maybe the transformations in there
> could move to the match-and-simplify infrastructure - convert.c is not a
> particularly good place for optimization, and having similar
> transformations scattered around (fold-const, convert.c, front ends, SSA
> optimizers) isn't helpful; hopefully match-and-simplify will allow some
> unification of this sort of optimization.

I did a quick and dirty experiment with the match-and-simplify branch just to get an idea of what it might be like. The branch built for MIPS right out of the box, so that was great, and I added a couple of rules (see below) just to see if it would trigger the optimization I wanted, and it did. I was impressed with the match-and-simplify infrastructure; it seemed to work quite well. Will this branch be included in GCC 5.0?

Steve Ellcey
sell...@mips.com

Code added to match-builtin.pd:

(if (flag_unsafe_math_optimizations)
 /* Optimize "(float) expN(x)" [where x is type double] to
    "expNf((float) x)", i.e. call the 'f' single precision func.  */
 (simplify
  (convert (BUILT_IN_LOG @0))
  (if ((TYPE_MODE (type) == SFmode)
       && (TYPE_MODE (TREE_TYPE (@0)) == DFmode))
   (BUILT_IN_LOGF (convert @0)))))

(if (flag_unsafe_math_optimizations)
 /* Optimize "(float) expN(x)" [where x is type double] to
    "expNf((float) x)", i.e. call the 'f' single precision func.  */
 (simplify
  (convert (BUILT_IN_SIN @0))
  (if ((TYPE_MODE (type) == SFmode)
       && (TYPE_MODE (TREE_TYPE (@0)) == DFmode))
   (BUILT_IN_SINF (convert @0)))))
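With the rules above and -ffast-math, a test along the lines of the earlier example should come out as logf/sinf calls with no double/float conversions (a sketch of such a test, not the exact file used):

    #include <math.h>

    /* Both calls should be demoted: (float) sin(log((double) x))
       becomes sinf(logf(x)) when the rules fire.  */
    extern float x;
    void foo(void) { x = sin(log(x)); }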
Cross compiling and multiple sysroot question
(Reposting from gcc-help since I didn't get any replies there.)

I have a question about SYSROOT_SUFFIX_SPEC, MULTILIB_OSDIRNAMES, and multilib cross compilers. I was experimenting with a multilib cross compiler and was using SYSROOT_SUFFIX_SPEC to specify different sysroots for different multilibs, including big-endian and little-endian with 32 and 64 bits. Now let's say I create two sysroots:

sysroot/be with bin, lib, lib64, etc. directories
sysroot/le with the same set of directories

These would represent the sysroot of either a 64 bit big-endian or a 64 bit little-endian linux system that could also run 32 bit executables. I want my cross compiler to be able to generate code for either system. So I set these macros and SPECs:

# m32 and be are defaults
MULTILIB_OPTIONS = m64 mel		# In makefile fragment
MULTILIB_DIRNAMES = 64 el		# In makefile fragment
MULTILIB_OSDIRNAMES = m64=../lib64	# In makefile fragment
SYSROOT_SUFFIX_SPEC = %{mel:/el;:/eb}	# in header file

What seems to be happening is that the search for system libraries like libc.so works fine. It looks in sysroot/be/lib or sysroot/be/lib64 or in the equivalent little-endian directories. I.e. it searches:

<sysroot><suffix>/lib			# 32 bits
<sysroot><suffix>/lib/../lib64		# 64 bits

But when it looks for libgcc_s.so or libstdc++.so it is searching:

<sysroot>/lib			# 32 bits
<sysroot>/lib/../lib64		# 64 bits

It does not take into account SYSROOT_SUFFIX_SPEC. In fact, when I do my build with this setup, the little-endian libgcc_s.so files wind up overwriting the big-endian libgcc_s.so files, so two of my libgcc_s.so files are completely missing from the install area. Shouldn't SYSROOT_SUFFIX_SPEC be used for the gcc shared libraries as well as the sysroot areas? I.e. install and search for libgcc_s.so.1 in:

<sysroot><suffix>/lib			# 32 bits
<sysroot><suffix>/lib/../lib64		# 64 bits

Steve Ellcey
sell...@imgtec.com
Re: Cross compiling and multiple sysroot question
On Thu, 2015-01-08 at 22:12 +0000, Joseph Myers wrote:

> On Thu, 8 Jan 2015, Steve Ellcey wrote:
>
> > So I set these macros and SPECs:
> > # m32 and be are defaults
> > MULTILIB_OPTIONS = m64 mel		# In makefile fragment
> > MULTILIB_DIRNAMES = 64 el		# In makefile fragment
> > MULTILIB_OSDIRNAMES = m64=../lib64	# In makefile fragment
>
> In my experience, for such cases it's best to list all multilibs
> explicitly in MULTILIB_OSDIRNAMES, and then to specify
> STARTFILE_PREFIX_SPEC as well along the lines of:
>
> #define STARTFILE_PREFIX_SPEC \
>   "%{mabi=32: /usr/local/lib/ /lib/ /usr/lib/} \
>    %{mabi=n32: /usr/local/lib32/ /lib32/ /usr/lib32/} \
>    %{mabi=64: /usr/local/lib64/ /lib64/ /usr/lib64/}"

Thanks for the help Joseph, this combination worked and I was able to build a working GCC using this setup.

> GCC never installs anything inside the sysroot (it could be a read-only
> mount of the target's root filesystem, for example).  Listing all
> multilibs explicitly (multilib=dir or multilib=!dir) in
> MULTILIB_OSDIRNAMES allows you to ensure they don't overwrite each other.

GCC never installs anything inside sysroots, but some tools that people have developed to build cross compiler toolchains copy the shared libgcc libraries (libgcc_s, libstdc++, etc.) from the GCC install area into the sysroot as part of the build of a cross compiler toolchain. I was wondering if I could use the explicit list of MULTILIB_OSDIRNAMES entries to lay out those libraries in a way that would make it easy to copy them into a sysroot if I wanted to. The only thing I am not sure about is whether there is a way to specify where I want the default (no option) libraries to go. I.e. I can use:

MULTILIB_OSDIRNAMES += mips64r2=mipsr2/lib32
MULTILIB_OSDIRNAMES += mips64r2/mabi.64=mipsr2/lib64

to create mipsr2/lib32 and mipsr2/lib64 directories under <install>/lib for libgcc_s, but I would like the default libraries in <install>/lib/mipsr2/lib instead of directly in <install>/lib. That way I could use a single copy to put all of <install>/lib/mipsr2 into my sysroot. Do you know if either of these would work:

MULTILIB_OSDIRNAMES += mips32r2=mipsr2/lib
MULTILIB_OSDIRNAMES += .=mipsr2/lib

I don't think the first one would work because -mips32r2 is the default architecture and is not explicitly listed in MULTILIB_OPTIONS, and I don't think the second form is supported at all, but maybe there is some other way to specify the location of the default libraries?

Steve Ellcey
sell...@imgtec.com
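Following Joseph's suggestion, an explicit listing for the earlier m64/mel example might look like the sketch below; the right-hand directory names are purely illustrative, and the default multilib is left in ../lib:

    # A hypothetical full listing for MULTILIB_OPTIONS = m64 mel,
    # with each non-default combination named explicitly so the
    # endian variants cannot overwrite each other:
    MULTILIB_OSDIRNAMES  = m64=../lib64
    MULTILIB_OSDIRNAMES += mel=../libel
    MULTILIB_OSDIRNAMES += m64/mel=../lib64el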
RE: Cross compiling and multiple sysroot question
On Mon, 2015-01-12 at 20:58 +0000, Joseph Myers wrote:

> On Mon, 12 Jan 2015, Matthew Fortune wrote:
>
> > MIPS does this too for mips64-linux-gnu as it has n32 for the default
> > multilib which gets placed in lib32. I don't honestly know how the multilib
> > spec doesn't end up building 4 multilibs though. I'm assuming the fact
> > that the default ABI is added to the DRIVER_SELF_SPECS may be the reason.
>
> I suspect MULTILIB_DEFAULTS is relevant.

The problem I ran into with MULTILIB_DEFAULTS is that if you have:

MULTILIB_DEFAULTS = { mips32r2 }
MULTILIB_OPTIONS = mips32r2/mips64r2 mabi=64 EL

and you try to use:

MULTILIB_EXCEPTIONS = mips32r2/mabi=64*

it doesn't work. The mips32r2 option seems to be stripped off before MULTILIB_EXCEPTIONS is applied. You need to use this instead:

MULTILIB_EXCEPTIONS = mabi=64*

which is the same as you would use if you didn't specify mips32r2 in MULTILIB_OPTIONS at all. I expect MULTILIB_OSDIRNAMES to work the same way and ignore any mapping entries with the mips32r2 option, but maybe I am wrong (I'm still testing it out).

Steve Ellcey
sell...@imgtec.com
libcc1.so bug/install location question
I have a question about libcc1.so and where it is put in the install directory. My understanding is that GCC install files are either put in a directory containing the target name or have the target name as part of the filename (e.g. mips-linux-gnu-gcc), so that two GCCs with different targets could be installed into the same installation directory and not stomp on each other. I tried this, building cross compilers for mips-mti-linux-gnu and mips-img-linux-gnu, and checked to see if any files overlapped between the two. The only overlap I found was with libcc1. Both cross compilers had a lib directory directly under the install directory that contained libcc1.so, libcc1.so.0, libcc1.so.0.0.0, and libcc1.la files. The files in each install directory were different, which makes sense since I was building for two different targets. Is this overlap of names intended or is it a bug?

Steve Ellcey
Re: Slow gcc.gnu.org/sourceware.org?
On Tue, 2015-01-27 at 08:02 -0800, H.J. Lu wrote:

> For the past couple days, gcc.gnu.org/sourceware.org is
> quite slow for me when accessing git and bugzilla.  Am
> I the only one who has experienced it?

I got some timeouts while updating my glibc git repo yesterday. I had never run into that before.

Steve Ellcey
sell...@imgtec.com
Re: Slow gcc.gnu.org/sourceware.org?
On Tue, 2015-01-27 at 09:36 -0700, Jeff Law wrote:

> On 01/27/15 09:20, Steve Ellcey wrote:
> > On Tue, 2015-01-27 at 08:02 -0800, H.J. Lu wrote:
> >> For the past couple days, gcc.gnu.org/sourceware.org is
> >> quite slow for me when accessing git and bugzilla.  Am
> >> I the only one who has experienced it?
> >
> > I got some timeouts while updating my glibc git repo yesterday.
> > I had never run into that before.
>
> Are you using anonymous mode, or ssh-authenticated?  The former is
> usually throttled as the load rises, the latter is not.
>
> jeff

I was using anonymous mode.

Steve Ellcey
unfused fma question
I have a question about *unfused* fma instructions. MIPS has processors with both fused and unfused multiply-add instructions, and for fused madd's it is clear what to do: define 'fma' instructions in the md file and let convert_mult_to_fma decide whether or not to use them. But for non-fused multiply-adds, it is less clear. One could define '*madd' instructions with the plus and mult operators and let the combine pass convert normal expressions that have these operators into (unfused) instructions. This is what MIPS currently does. Or one could change convert_mult_to_fma to check whether fma is fused vs. non-fused, in addition to the check of flag_fp_contract_mode, in order to decide whether to convert expressions into an fma, and then define fma instructions in the md file.

I was wondering if anyone had an opinion about the advantages or disadvantages of these two approaches.

Steve Ellcey
sell...@imgtec.com
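To make the fused/unfused distinction concrete, here is the difference expressed in C (a sketch; fmaf is the fused reference point, function names are made up):

    #include <math.h>

    /* Unfused madd: the product is rounded to float before the add,
       so the result is bit-identical to separate mul and add
       instructions (two roundings).  */
    float unfused_madd(float a, float b, float c)
    {
      return a + b * c;
    }

    /* Fused madd: b*c is kept exact and only the final result is
       rounded (one rounding), like the C99 fmaf function.  */
    float fused_madd(float a, float b, float c)
    {
      return fmaf(b, c, a);
    }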
RE: unfused fma question
On Sun, 2015-02-22 at 10:30 -0800, Matthew Fortune wrote:

> Steve Ellcey writes:
> > Or one could change convert_mult_to_fma to add a check if fma is fused
> > vs. non-fused in addition to the check for the flag_fp_contract_mode
> > in order to decide whether to convert expressions into an fma and then
> > define fma instructions in the md file.
>
> I was about to say that I see no reason to change how non-fused multiply
> adds work i.e. leave them to pattern matching but I think your point was
> that when both fused and non-fused patterns are available then what
> should we do.

No, I am thinking about the case where there are only non-fused multiply add instructions available. To make sure I am using the right terminology, I am using a non-fused multiply-add to mean a single fma instruction that does '(a + (b * c))' but which rounds the result of '(b * c)' before adding it to 'a', so that there is no difference in the results between using this instruction and using individual add and mult instructions. My understanding is that this is how the mips32r2 madd instruction works.

In this case there seem to be two ways to have GCC generate the fma instruction. One is the current method, using combine_instructions with an instruction defined along these lines (operand modes and predicates omitted):

(define_insn "*madd<mode>"
  [(set (match_operand 0)
	(plus (mult (match_operand 1)
		    (match_operand 2))
	      (match_operand 3)))]
  ...
  "madd.<fmt>\t%0,%3,%1,%2")

The other way would be to extend convert_mult_to_fma so that instead of:

  if (FLOAT_TYPE_P (type)
      && flag_fp_contract_mode == FP_CONTRACT_OFF)
    return false;

it has something like:

  if (FLOAT_TYPE_P (type)
      && flag_fp_contract_mode == FP_CONTRACT_OFF
      && !targetm.fma_does_rounding)
    return false;

and then define an instruction like:

(define_insn "fma<mode>4"
  [(set (match_operand 0)
	(fma (match_operand 1)
	     (match_operand 2)
	     (match_operand 3)))]
  ...
  "madd.<fmt>\t%0,%3,%1,%2")

The question I have is whether one or the other of these two approaches would be better at creating fma instructions (vs. leaving mult/add combinations) or might be preferable for some other reason.

Steve Ellcey
sell...@imgtec.com
LRA spill/fill memory alignment question
I have a question about spilling variables and alignment requirements. There is currently code that allows one to declare local variables with an alignment that is greater than MAX_STACK_ALIGNMENT. In that case expand_stack_vars calls allocate_dynamic_stack_space to create a pointer to properly aligned stack space. (There is actually a bug in this code, PR 65315, but I have submitted a patch.)

But there does not seem to be any way to do spills and fills into memory that has an alignment requirement greater than MAX_STACK_ALIGNMENT. Is that correct? I am looking at MIPS using the LRA allocator. I was hoping there was some way to spill 16 byte registers into a 16 byte aligned spill slot even if MAX_STACK_ALIGNMENT is 8 bytes.

I know x86 has some platform specific code to dynamically increase the stack alignment and I think that is how they handle this situation, but I don't see any other platforms using that technique and I was wondering if there is any more generalized method for spilling registers to memory with an alignment requirement greater than MAX_STACK_ALIGNMENT.

Steve Ellcey
sell...@imgtec.com
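For reference, the over-aligned-local case that expand_stack_vars already handles looks like this minimal sketch (the variable and function names are made up); what has no analogous path is a spill slot with the same alignment requirement:

    /* With MAX_STACK_ALIGNMENT of 8 bytes, a local like this forces
       expand_stack_vars down the allocate_dynamic_stack_space path.  */
    void use_vector(void)
    {
      int v[4] __attribute__ ((aligned (16)));	/* 16-byte aligned local */
      /* ... v could hold a 16 byte register image, but an LRA spill
         of that register gets no such aligned slot ... */
    }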
Questions about dynamic stack realignment
This email is a follow-up to some earlier email I sent about alignment of spills and fills but did not get any replies to.

https://gcc.gnu.org/ml/gcc/2015-03/msg00028.html

After looking into that, I have decided to look more into dynamically realigning the stack so that my spills and fills would be aligned. I have done some experiments with stack realignment and I am trying to understand what hooks already exist and how to use them. Currently mips just has:

#define STACK_BOUNDARY (TARGET_NEWABI ? 128 : 64)

I added:

#define MAX_STACK_ALIGNMENT 128
#define PREFERRED_STACK_BOUNDARY (TARGET_MSA ? 128 : STACK_BOUNDARY)
#define INCOMING_STACK_BOUNDARY STACK_BOUNDARY

to try to get GCC to realign the stack to 128 bits if we are compiling with the -mmsa option. After doing this I found I needed to create a TARGET_GET_DRAP_RTX that would return a register rtx when a drap was needed, so I did that and I got things to compile, but I don't see any code that actually realigned the stack. It is not clear to me from the documentation if there is shared code somewhere that should be trying to realign the stack by changing the stack pointer given these definitions, or if I also need to add my own code to expand_prologue to do the stack realignment myself.

I am also not sure if I understand the drap (Dynamic Realign Argument Pointer) register functionality correctly. My guess/understanding was that the drap was used to access arguments in cases where the regular stack pointer may have been changed in order to be aligned. Is that correct?

Any help/advice on how the hooks for dynamically realigned stacks are supposed to all work together would be appreciated.

Steve Ellcey
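Conceptually, the realignment being asked about can be modeled like the sketch below; 'sp' and 'drap' stand in for hard registers, and this is only an illustration of the general scheme, not actual MIPS or x86 prologue code:

    #include <stdint.h>

    /* Model of a realigning prologue: the drap captures the incoming
       stack pointer so argument accesses still work after sp moves.  */
    uintptr_t realign_prologue(uintptr_t sp, uintptr_t frame_size,
                               uintptr_t *drap)
    {
      *drap = sp;                               /* args reachable via drap */
      sp = (sp - frame_size) & ~(uintptr_t) 15; /* align frame to 16 bytes */
      return sp;   /* body spills/fills relative to the aligned sp;
                      the epilogue restores sp from drap */
    }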