Re: Why auto variables NOT overlap on stack?
There's another funny thing about gcc3 behavior which I've just discovered:

$ gcc -v 2>&1 | grep version
gcc version 3.4.2
$ gcc -o mem mem.c ; ./mem
-1024
$ gcc -o mem1 mem1.c ; ./mem1
0
$ cat mem.c
#include <stdio.h>

int main()
{
  char *p1, *p2;
  { char a[1024]; p1 = a; }
  { char a[1024]; p2 = a; }
  printf("%d\n", p2 - p1);
  return 0;
}
$ cat mem1.c
#include <stdio.h>

static const int N = 1024;

int main()
{
  char *p1, *p2;
  { char a[N]; p1 = a; }
  { char a[N]; p2 = a; }
  printf("%d\n", p2 - p1);
  return 0;
}

Alexey
Re: AC_CHECK_DECLS(basename) (Was: Re: Ping: patches required for --enable-build-with-cxx)
I'm adding autoc...@gnu.org to the destinations, since this is a pretty fundamental problem with AC_CHECK_DECL and C++ On Tue, Feb 9, 2010 at 02:17, Joern Rennecke wrote: > >> On 02/08/2010 09:58 AM, Joern Rennecke wrote: >>> >>> That would only work if every program that uses libiberty uses >>> AC_SYSTEM_EXTENSIONS . >> >> GCC does, gdb (I think, I don't have it checked out) does and nothing >> else uses basename anyway (they use lbasename). If problems come up, >> other users can be patched to use AC_USE_SYSTEM_EXTENSIONS. > > I've tried going down that route, and it turned out that my original patch > only > worked due to a typo. The _GNU_SOURCE inconsistency is a red herring. > > The real problem is that when libcpp is configured, it is configured with > g++ > as the compiler, and the test for a basename declaration fails because > basename is declared in an overloaded way - a const and a non-const > variant - while the test code has: > | int > | main () > | { > | #ifndef basename > | (void) basename; > | #endif > | > | ; > | return 0; > | } > > so g++ complains: > > conftest.cpp: In function 'int main()': > conftest.cpp:78:10: error: void cast cannot resolve address of overloaded > function > > and configure mistakenly assumes that no basename declaration exists. > Thus, when libiberty is included, it 'helpfully' provides another > declaration > for basename, which makes the build fail. > > So, AC_CHECK_DECLS as it is now simply cannot work when configuring with > g++ as compiler for any function that has overloaded declarations. In > order to do a valid positive check, we'd have to use a valid function > signature - which means we have to know a valid function signature first, > which would be specific to the function. > > If we know such a signature, we can use #ifdef __cplusplus to compile > a function call in this case. A C++ compiler should give an error if > the function was not declared. 
> > We could soup up AC_CHECK_DECLS to know all the standard functions by name, > or at least the overloaded ones - but I'm not sure such a complex solution > will really save time in the long term. Paolo
Re: AC_CHECK_DECLS(basename) (Was: Re: Ping: patches required for --enable-build-with-cxx)
Quoting Paolo Bonzini:

> I'm adding autoc...@gnu.org to the destinations, since this is a pretty
> fundamental problem with AC_CHECK_DECL and C++

I've whipped up a patch with a modified version of AC_CHECK_DECLS - I've
called it AC_CHECK_PROTOS - that can optionally have argument types for a
function (without spaces).  Then I've used this new macro in the libcpp
configure.ac to replace AC_CHECK_DECLS.

This bootstrapped and regtested fine on i686-pc-linux-gnu, and it now also
bootstraps with --enable-build-with-cxx (although the regtest results are
still affected by PR testsuite/42843).

As I've seen you rewrote AC_CHECK_DECLS from 2.63 to 2.64 to save on
configure script size, one design criterion was to avoid unnecessary
passing of extra arguments outside of the shell function.  However, I
wonder if there is a better way to do the string processing - I only do
autoconf hacking sporadically, and my code looks somewhat different from
the original style.

2010-02-09  Joern Rennecke

libcpp:
	* aclocal (_AC_CHECK_PROTO_BODY): New shell function.
	(AC_CHECK_PROTO, _AC_CHECK_PROTOS, AC_CHECK_PROTOS): New macros.
	* configure.ac: Use AC_CHECK_PROTOS instead of AC_CHECK_DECLS.
	* configure: Regenerate.

Index: libcpp/configure.ac
===================================================================
--- libcpp/configure.ac	(revision 156598)
+++ libcpp/configure.ac	(working copy)
@@ -81,8 +81,8 @@ define(libcpp_UNLOCKED_FUNCS, clearerr_u
   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked dnl
   putchar_unlocked putc_unlocked)
 AC_CHECK_FUNCS(libcpp_UNLOCKED_FUNCS)
-AC_CHECK_DECLS(m4_split(m4_normalize(abort asprintf basename errno getopt \
-  libcpp_UNLOCKED_FUNCS vasprintf)))
+AC_CHECK_PROTOS(m4_split(m4_normalize(abort asprintf basename(char*) errno \
+  getopt libcpp_UNLOCKED_FUNCS vasprintf)))
 
 # Checks for library functions.
AC_FUNC_ALLOCA

Index: libcpp/aclocal.m4
===================================================================
--- libcpp/aclocal.m4	(revision 156598)
+++ libcpp/aclocal.m4	(working copy)
@@ -22,3 +22,75 @@ m4_include([../config/lib-link.m4])
 m4_include([../config/lib-prefix.m4])
 m4_include([../config/override.m4])
 m4_include([../config/warnings.m4])
+
+## ---------------------------------------------------------------- ##
+## Checking for declared symbols.                                   ##
+## This is like *AC_CHECK_DECL*, except that for c++, we may use a  ##
+## prototype to check for a (possibly overloaded) function.         ##
+## ---------------------------------------------------------------- ##
+
+
+# _AC_CHECK_PROTO_BODY
+# --------------------
+# Shell function body for AC_CHECK_PROTO.
+m4_define([_AC_CHECK_PROTO_BODY],
+[  AS_LINENO_PUSH([$[]1])
+  [as_decl_name=`echo $][2|sed 's/(.*//'`]
+  [as_decl_use=`echo $][2|sed -e 's/(/((/' -e 's/\()\|,\)/) 0&/'`]
+  AC_CACHE_CHECK([whether $as_decl_name is declared], [$[]3],
+  [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([$[]4],
+[@%:@ifndef $[]as_decl_name
+@%:@ifdef __cplusplus
+  $[]as_decl_type
+  (void) $[]as_decl_use;
+@%:@else
+  (void) $[]as_decl_name;
+@%:@endif
+@%:@endif
+])],
+		   [AS_VAR_SET([$[]3], [yes])],
+		   [AS_VAR_SET([$[]3], [no])])])
+  AS_LINENO_POP
+])# _AC_CHECK_PROTO_BODY
+
+# AC_CHECK_PROTO(SYMBOL,
+#                [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND],
+#                [INCLUDES = DEFAULT-INCLUDES])
+# -------------------------------------------------------
+# Check whether SYMBOL (a function, variable, or constant) is declared.
+AC_DEFUN([AC_CHECK_PROTO],
+[AC_REQUIRE_SHELL_FN([ac_fn_]_AC_LANG_ABBREV[_check_proto],
+  [AS_FUNCTION_DESCRIBE([ac_fn_]_AC_LANG_ABBREV[_check_proto],
+    [LINENO SYMBOL VAR],
+    [Tests whether SYMBOL is declared, setting cache variable VAR accordingly.])],
+  [_$0_BODY])]dnl
+[AS_VAR_PUSHDEF([ac_Symbol], [ac_cv_have_decl_$1])]dnl
+[ac_fn_[]_AC_LANG_ABBREV[]_check_proto ]dnl
+["$LINENO" "$1" "ac_Symbol" "AS_ESCAPE([AC_INCLUDES_DEFAULT([$4])], [""])"
+AS_VAR_IF([ac_Symbol], [yes], [$2], [$3])
+AS_VAR_POPDEF([ac_Symbol])dnl
+])# AC_CHECK_PROTO
+
+
+# _AC_CHECK_PROTOS(SYMBOL, ACTION-IF_FOUND, ACTION-IF-NOT-FOUND,
+#                  INCLUDES)
+# --------------------------------------------------------------
+# Helper to AC_CHECK_PROTOS, which generates the check for a single
+# SYMBOL with INCLUDES, performs the AC_DEFINE, then expands
+# ACTION-IF-FOUND or ACTION-IF-NOT-FOUND.
+m4_define([_AC_CHECK_PROTOS],
+[AC_CHECK_PROTO([$1], [ac_have_decl=1], [ac_have_decl=0], [$4])]dnl
+[AC_DEFINE_UNQUOTED(AS_TR_CPP(patsubst(HAVE_DECL_[$1],[(.*])), [$ac_have_decl],
+  [Define to 1 if you have the declaration of `$1',
+   and to 0 if you don't.])]dnl
+[m4_ifvaln([$2$3], [AS_IF([test $ac_have_decl = 1], [$2], [$3])])])
+
+# AC_CHECK_PROTOS(SYMBOLS,
+#                 [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND],
+#                 [INCLUDES = DEFAULT-INCLUDES])
+#
RE: Failure to combine SHIFT with ZERO_EXTEND
Hi Jeff,

Many thanks for the pointers. I will make the changes and attach the patch
to the bugzilla soon.

Cheers,
Rahul

-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com]
Sent: 09 February 2010 00:45
To: Rahul Kharche
Cc: gcc@gcc.gnu.org; sdkteam-gnu
Subject: Re: Failure to combine SHIFT with ZERO_EXTEND

On 02/04/10 08:39, Rahul Kharche wrote:
> Hi All,
>
> On our private port of GCC 4.4.1 we fail to combine successive SHIFT
> operations like in the following case
>
> #include <stdio.h>
> #include <stdlib.h>
>
> void f1 ()
> {
>    unsigned short t1;
>    unsigned short t2;
>
>    t1 = rand();
>    t2 = rand();
>
>    t1<<= 1; t2<<= 1;
>    t1<<= 1; t2<<= 1;
>    t1<<= 1; t2<<= 1;
>    t1<<= 1; t2<<= 1;
>    t1<<= 1; t2<<= 1;
>    t1<<= 1; t2<<= 1;
>
>    printf("%d\n", (t1+t2));
> }
>
> This is a ZERO_EXTEND problem, because combining SHIFTs with whole
> integers works correctly, as do signed values. The problem seems to
> arise in the RTL combiner, which combines the ZERO_EXTEND with the
> SHIFT to generate a SHIFT and an AND. Our architecture does not
> support AND with large constants and hence does not have a matching
> insn pattern (we prefer not doing this, because large constants
> remain hanging at the end of all RTL optimisations and cause needless
> reloads).
>
> Fixing the combiner to convert masking AND operations to ZERO_EXTRACT
> fixes this issue without any obvious regressions. I'm adding the
> patch here against GCC 4.4.1 for any comments and/or suggestions.
>
Good catch.  However, note we are in a regression-bugfix-only phase of
development right now in preparation for branching for GCC 4.5. As a
result the patch can't be checked in at this time; I would recommend you
update the patch to the current sources & attach it to bug #41998 which
contains queued patches for after GCC 4.5 branches.
> Cheers, > Rahul > > > --- combine.c 2009-04-01 21:47:37.0 +0100 > +++ combine.c 2010-02-04 15:04:41.0 + > @@ -446,6 +446,7 @@ > static void record_truncated_values (rtx *, void *); > static bool reg_truncated_to_mode (enum machine_mode, const_rtx); > static rtx gen_lowpart_or_truncate (enum machine_mode, rtx); > +static bool can_zero_extract_p (rtx, rtx, enum machine_mode); > > > > /* It is not safe to use ordinary gen_lowpart in combine. > @@ -6973,6 +6974,16 @@ > make_compound_operation (XEXP (x, 0), > next_code), > i, NULL_RTX, 1, 1, 0, 1); > + else if (can_zero_extract_p (XEXP (x, 0), XEXP (x, 1), mode)) > +{ > + unsigned HOST_WIDE_INT len = HOST_BITS_PER_WIDE_INT > + - CLZ_HWI (UINTVAL (XEXP (x, > 1))); > + new_rtx = make_extraction (mode, > + make_compound_operation (XEXP (x, 0), > + next_code), > + 0, NULL_RTX, len, 1, 0, > + in_code == COMPARE); > There should be a comment prior to this code fragment describing the transformation being performed. Something like: /* Convert (and (shift X Y) MASK) into ... when ... */ That will make it clear in the future when your transformation applies rather than forcing someone scanning the code to read it in detail. > +} > > break; > > @@ -7245,6 +7256,25 @@ > return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x)); > } > > +static bool > +can_zero_extract_p (rtx x, rtx mask_rtx, enum machine_mode mode) > There should be a comment prior to this function which briefly describes what the function does, the parameters & return value. Use comments prior to other functions to guide you. > @@ -8957,7 +8987,6 @@ > op0 = UNKNOWN; > > *pop0 = op0; > - > /* ??? Slightly redundant with the above mask, but not entirely. >Moving this above means we'd have to sign-extend the mode mask >for the final test. */ > Useless diff fragment. Remove this change as it's completely unrelated and useless. You should also write a ChangeLog entry for your patch. ChangeLogs describe what changed, not why something changed. 
So a suitable entry might look something like: Your Name * combine.c (make_compound_operation): Convert shifts and masks into zero_extract in certain cases. (can_zero_extract_p): New function. If you could make those changes and attach the result to PR 41998 they should be able to go in once 4.5 branches from the mainline. Jeff
Re: Exception handling information in the macintosh
Jack Howarth wrote:

> Jacob,
>    Apple's gcc is based on their own branch and is not the same as FSF
> gcc. The first FSF gcc that is validated on darwin10 was gcc 4.4. However
> I would suggest you first start testing against current FSF gcc trunk.
> There are a number of fixes for darwin10 that aren't present in the FSF
> gcc 4.4.x releases yet. In particular, the compilers now link with
> -no_compact_unwind by default on darwin10 to avoid using the new compact
> unwinder. Also, when you build your JVM, I would suggest you stick to the
> FSF gcc trunk compilers you build. In particular, the Apple libstdc++ and
> FSF libstdc++ aren't interchangeable on intel. So you don't want to mix
> c++ code built with the two different compilers.
>              Jack

OK. I downloaded gcc 4.4 and recompiled all the server again with it. Now,
throws within C++ work, but not when they have to pass through the JITted
code.

The problem is that we need to link with quite a lot of libraries:

/usr/local/gcc-4.5/bin/g++ -o Debug/server -m64 -fPIC -fno-omit-frame-pointer \
  -g -O0 -w Debug/service_helper.o Debug/service.o -L../Debug/64 \
  -ldatabaselibrary -lcryptolibrary -lCPThreadLibrary -lshared -lwin32 \
  -L../CryptoLibrary/lib/Darwin/64 -lcrypto -lssl -lsrp -lint128 -ldl -lpthread \
  ../icu/icu3.4/lib/mac/libicui18n.dylib.34 ../icu/icu3.4/lib/mac/libicuuc.dylib.34 \
  ../icu/icu3.4/lib/mac/libicudata.dylib.34 ../CompilerLibrary/mac/libcclib64.a \
  ../Debug/64/libwin32.a

What can I do about libssl.a, libdl.a, libcrypto.a? Those are system
libraries and I do not have the source code. Should I compile those too?

I downloaded gcc 4.5 and the situation is the same...

jacob
RTX costs
Hi all,

After reading the internal docs about rtx_costs I am left wondering what
exactly they are estimating.
- Are they estimating, at the beginning of expand, how many insns will be
  generated from a particular insn by the time the assembler is generated?
- Or are they estimating how many assembler instructions will be generated
  for a particular insn?
- Or something else?

If someone could clear this up, I would be happy. :)

Thanks,
Paulo Matos
Re: RTX costs
Quoting "Paulo J. Matos":

> Hi all,
>
> After reading the internal docs about rtx_costs I am left wondering what
> they exactly are estimating.
> - Are they estimating in the beginning of expand how many insns will be
>   generated from a particular insn until the assembler is generated?
> - or Are they estimating how many assembler instructions will be
>   generated for a particular insn?
> - or something else?

When optimizing for speed, they estimate execution cycles; when optimizing
for space, they estimate code size.  In each case, normalized so that
COSTS_N_INSNS (1) is equivalent to the cost of a simple instruction.

Or at least that's the theoretical goal.  Individual ports might deviate
from that for historical and/or practical reasons.
Re: RTX costs
> After reading the internal docs about rtx_costs I am left wondering what > they exactly are estimating. > - Are they estimating in the beginning of expand how many insns will be > generated from a particular insn until the assembler is generated? > - or Are they estimating how many assembler instructions will be > generated for a particular insn? The latter is what they're *supposed* to be measuring, but of course it's an approximation.
Re: Modulo Scheduling
On Tue, 2 Feb 2010, Cameron Lowell Palmer wrote: Does Modulo Scheduling work on x86 platforms? I have tried adding in various versions of the -fmodulo-sched option and get the exact same output with or without. The application is a very simplistic matrix multiply without dependencies. No, at present SMS is not able to schedule any loops on x86 at all. This is due to implementation detail: SMS operates on loops that end with decrement-and-branch instruction, and GCC does not generate such instructions on x86. Sorry. Alexander Monakov
Re: RTX costs
Joern Rennecke wrote:
> Quoting "Paulo J. Matos":
>> [...]
>
> When optimizing for speed, they estimate execution cycles; when
> optimizing for space, they estimate code size.  In each case, normalized
> so that COSTS_N_INSNS (1) is equivalent to the cost of a simple
> instruction.
>
> Or at least that's the theoretical goal.  Individual ports might deviate
> from that for historical and/or practical reasons.

Thanks for the reply. The name of the macro COSTS_N_INSNS was making me
think I was estimating the number of insns instead of the number of
assembler instructions, even though when optimising for size the latter
makes more sense.
Generating store after fdivd: how to avoid delay slot
I am trying to patch gcc so that after a fdivd the destination register is
stored to the stack:

    fdivd %f0,%f2,%f4; std %f4, [%sp]

I generate the RTL for divdf3 using an emit_insn/DONE sequence in a
define_expand pattern (see below). In the assembler output phase I use a
define_insn and write out "fdivd\t%%1, %%2, %%0; std %%0, %%3" as the
expression string.

My questions:
- How can I mark the pattern so that it will not be scheduled into a delay
  slot? How can I specify that the output will be 2 instructions and hint
  the scheduler about it?
- Is the (set_attr "length" "2") attribute in define_insn divdf3_store
  (below) already sufficient?

--
Greetings
Konrad

;; handle divdf3
(define_expand "divdf3"
  [(parallel [(set (match_operand:DF 0 "register_operand" "=e")
                   (div:DF (match_operand:DF 1 "register_operand" "e")
                           (match_operand:DF 2 "register_operand" "e")))
              (clobber (match_scratch:SI 3 ""))])]
  "TARGET_FPU"
  "{
     output_divdf3_emit (operands[0], operands[1], operands[2], operands[3]);
     DONE;
   }")

(define_insn "divdf3_store"
  [(set (match_operand:DF 0 "register_operand" "=e")
        (div:DF (match_operand:DF 1 "register_operand" "e")
                (match_operand:DF 2 "register_operand" "e")))
   (clobber (match_operand:DF 3 "memory_operand" ""))]
  "TARGET_FPU && TARGET_STORE_AFTER_DIVSQRT"
{
  return output_divdf3 (operands[0], operands[1], operands[2], operands[3]);
}
  [(set_attr "type" "fpdivd")
   (set_attr "fptype" "double")
   (set_attr "length" "2")])

(define_insn "divdf3_nostore"
  [(set (match_operand:DF 0 "register_operand" "=e")
        (div:DF (match_operand:DF 1 "register_operand" "e")
                (match_operand:DF 2 "register_operand" "e")))]
  "TARGET_FPU && (!TARGET_STORE_AFTER_DIVSQRT)"
  "fdivd\t%1, %2, %0"
  [(set_attr "type" "fpdivd")
   (set_attr "fptype" "double")])

/* handle fdivd */
char *
output_divdf3 (rtx op0, rtx op1, rtx dest, rtx scratch)
{
  static char string[128];
  sprintf (string, "fdivd\t%%1, %%2, %%0; std %%0, %%3 !!!");
  return string;
}

void
output_divdf3_emit (rtx dest, rtx op0, rtx op1, rtx scratch)
{
  rtx slot0, div, divsave;

  div = gen_rtx_SET (VOIDmode, dest, gen_rtx_DIV (DFmode, op0, op1));
  if (TARGET_STORE_AFTER_DIVSQRT)
    {
      slot0 = assign_stack_local (DFmode, 8, 8);
      divsave = gen_rtx_SET (VOIDmode, slot0, dest);
      emit_insn (divsave);
      emit_insn (gen_rtx_PARALLEL (VOIDmode,
                                   gen_rtvec (2, div,
                                              gen_rtx_CLOBBER (SImode,
                                                               slot0))));
    }
  else
    emit_insn (div);
}
Re: [GRAPHITE] Re: Loop Transformations Question
On Tue, Feb 9, 2010 at 6:01 PM, Cristianno Martins wrote: > Hi everyone, > > First of all, I already find [and fix] the problem that I had > described in the last email. > Now, I need a help with a pretty intriguing issue, described below. > > Well, such as I told in the last email, I'm working on a > implementation of a heuristic for loop skewing transformation. To > expose the issue, I will show how it happens with an example. > > First, the code used to compile is very simple, and can be seen below > (the matrix dimensions were minimized for simplification). > > #define m 20 > #define n 30 > int a[m][n]; > int main() { > int i, j; > for(i = 0; i < m; i++) > for(j = 0; j < n; j++) > a[i][j] = 1; > > return a[1][1] + a[2][2]; > } > > > After the skewing pass, if I call > > > cloog_prog_clast pc = scop_to_clast (scop); > printf ("\nCLAST generated by CLooG: \n"); > print_clast_stmt (stdout, pc.stmt); > > > I get the following result: > > > CLAST generated by CLooG: > for (scat_1=0;scat_1<=48;scat_1++) { > for (scat_3=max(scat_1-29,0);scat_3<=min(scat_1,19);scat_3++) { > b[scat_3][scat_1-scat_3] = 1 ; > } > } > > > Going ahead with this, such as can be seen in > simple_test.c.105t.graphite [attached], the code is correctly > generated (in gimple), but the important bb is: > > > : > # graphite_IV.5_17 = PHI <0(2), graphite_IV.5_5(9)> > # .MEM_37 = PHI <.MEM_13(D)(2), .MEM_38(9)> > D.3185_12 = graphite_IV.5_17 + 0x0ffe3; > D.3186_11 = MAX_EXPR ; # this is the line in question It looks like D.3185 is unsigned. Richard. > D.3187_23 = MIN_EXPR ; > D.3188_24 = D.3187_23 + 1; > D.3189_25 = D.3186_11 < D.3188_24; > if (D.3189_25 != 0) > goto ; > else > goto ; > > > Curiously, the line marked above doesn't work in assembly. The > D.3186_11 is assigned to -29, although zero is greater than that, and > the code inside the loop body never runs. 
> Moreover, if I get the clast code generated by cloog after the skewing
> (above), and put it on the source file (compiling without skewing),
> the max expr appears in gimple as an if statement, and the code
> executes perfectly.
>
> Does someone have any idea how could I fix it??
>
> Thanks in advance,
>
> On Sun, Feb 7, 2010 at 7:31 PM, Cristianno Martins wrote:
>>
>> Hello everyone,
>>
>> I've been working on graphite lately, and I did a loop skewing
>> implementation, starting from the loop interchange code [in
>> gcc/graphite-interchange.c].
>> However, after the transformation, if I print the clast generated by
>> Cloog, what I get is almost the same loop as the original one.
>> Moreover, if I write the pbb transformed domain and scattering into a
>> file, and run the cloog command with that file, the result is exactly
>> what I want from the beginning.
>> For better comprehension of the problem, some interesting data are shown
>> below.
>>
>> But, first, for short, my question is: am I forgetting something
>> important that I must be doing (like a function call)?
>>
>>> [...]
>>
>> Thanks in advance,
>>
>> --
>> Cristianno Martins
>
> --
> Cristianno Martins
>
Re: [GRAPHITE] Re: Loop Transformations Question
Hi,

Thanks for the fast reply. Only one more thing: is there some way that
I could force it to be signed??

On Tue, Feb 9, 2010 at 3:17 PM, Richard Guenther wrote:
> On Tue, Feb 9, 2010 at 6:01 PM, Cristianno Martins wrote:
>> [...]
>>
>> D.3186_11 = MAX_EXPR ; # this is the line in question
>
> It looks like D.3185 is unsigned.
>
> Richard.
>
>> [...]

--
Cristianno Martins
Mestrando em Computação
Universidade Estadual de Campinas
cristianno.mart...@students.ic.unicamp.br
cel: (19) 8825-5731 [oi]
     (19) 8240-3217 [tim]
skype: cristiannomartins
gTalk: cristiannomartins
msn: cristiannomart...@hotmail.com
Re: [GRAPHITE] Re: Loop Transformations Question
On Tue, Feb 9, 2010 at 12:34, Cristianno Martins wrote: > Hi, > > Thanks for the fast reply. Only one more thing: is there some way that > I could force it to be signed?? I guess that you should wait the fixes from Tobias and Ramakrishna to CLooG and Graphite to have the type of the IV exposed by CLooG, rather than having the original IV type set to the generated IV. Sebastian > > On Tue, Feb 9, 2010 at 3:17 PM, Richard Guenther > wrote: >> On Tue, Feb 9, 2010 at 6:01 PM, Cristianno Martins >> wrote: >>> Hi everyone, >>> >>> First of all, I already find [and fix] the problem that I had >>> described in the last email. >>> Now, I need a help with a pretty intriguing issue, described below. >>> >>> Well, such as I told in the last email, I'm working on a >>> implementation of a heuristic for loop skewing transformation. To >>> expose the issue, I will show how it happens with an example. >>> >>> First, the code used to compile is very simple, and can be seen below >>> (the matrix dimensions were minimized for simplification). 
>>> [...]
>>>
>>> Does someone have any idea how could I fix it??
>>>
>>> Thanks in advance,
>>>
>>> --
>>> Cristianno Martins
>>
>
> --
> Cristianno Martins
> Mestrando em Computação
> Universidade Estadual de Campinas
> cristianno.mart...@students.ic.unicamp.br
>
> cel: (19) 8825-5731 [oi]
>      (19) 8240-3217 [tim]
> skype: cristiannomartins
> gTalk: cristiannomartins
> msn: cristiannomart...@hotmail.com
>
Zero extractions and zero extends
Dear all,

If I consider this code:

typedef struct sTestUnsignedChar
{
  uint64_t a:1;
} STestUnsignedChar;

uint64_t getU (STestUnsignedChar a)
{
  return a.a;
}

I get this in the DCE pass:

(insn 6 3 7 2 bitfield2.c:8 (set (subreg:DI (reg:QI 75) 0)
        (zero_extract:DI (reg/v:DI 73 [ a ])
            (const_int 1 [0x1])
            (const_int 0 [0x0]))) 63 {extzvdi}
    (expr_list:REG_DEAD (reg/v:DI 73 [ a ])
        (nil)))

(insn 7 6 12 2 bitfield2.c:8 (set (reg:DI 74)
        (zero_extend:DI (reg:QI 75))) 51 {zero_extendqidi2}
    (expr_list:REG_DEAD (reg:QI 75)
        (nil)))

(On the x86 port, I get an and instead of the zero_extract.)

However, after the combine pass both stay, whereas on the x86 port the
zero_extend is removed. Where is this decided exactly? I've checked the
costs of the instructions; I have the same thing as the x86 port.

Thanks for all your help,
Jean Christophe Beyler
Re: [GRAPHITE] Re: Loop Transformations Question
On 09.02.2010 19:39, Sebastian Pop wrote:
> On Tue, Feb 9, 2010 at 12:34, Cristianno Martins wrote:
>> Hi,
>>
>> Thanks for the fast reply. Only one more thing: is there some way that
>> I could force it to be signed??
>
> I guess that you should wait the fixes from Tobias and Ramakrishna to
> CLooG and Graphite to have the type of the IV exposed by CLooG, rather
> than having the original IV type set to the generated IV.
>
> Sebastian

Yes, it looks as if this is one of the bugs Ramakrishna and I are working
on. In short, the reason for the bug seems to be that the loop induction
variable is converted to an unsigned int. Later, cloog generates a
statement that is negative and assigned to the unsigned int. Instead of
the expression "iv < -something" being false, the signed value is wrapped
to an unsigned one, therefore the whole expression becomes true.

I will have a look into this one.

Tobias
Re: porting GCC to a micro with a very limited addressing mode --- success with LEGITIMATE / LEGITIMIZE_ADDRESS, stuck with ICE !
Michael Hope wrote:
> Hi Sergio. Any luck so far?

Michael, thanks for your inquiry. I made some progress, in fact. I got the
GO_IF_LEGITIMATE_ADDRESS() macro to correctly detect REG+IMM addresses,
and then the LEGITIMIZE_ADDRESS() macro to force them to be pre-computed
in a register. However, now the compiler freaks out with an ICE.. :-/
I put some details below. Thanks for any clue that you or others can give
me.

Cheers,
Sergio

==

This is a fragment of my LEGITIMIZE_ADDRESS():

---------------------------------------------------------------
rtx
legitimize_address (rtx X, rtx OLDX, enum machine_mode MODE)
{
  rtx op1, op2, op, sum;
  op = NULL;
  ...
  if (GET_CODE (X) == PLUS && !no_new_pseudos)
    {
      op1 = XEXP (X, 0);
      op2 = XEXP (X, 1);
      if (GET_CODE (op1) == CONST_INT
          && (GET_CODE (op2) == REG || GET_CODE (op2) == SUBREG))
        /* base displacement */
        {
          sum = gen_rtx_PLUS (MODE, op1, op2);
          op = force_reg (MODE, sum);
        }
  ...
---------------------------------------------------------------

Now when compiling a simple program such as:

void foobar(int par1, int par2, int parN)
{
  int a, b;
  a = 0x1234;
  b = a;
}

the instructions (n. 8, 12, 13) which compute the addresses in registers
seem to be generated correctly:

---------------------------------------------------------------
;; Function foobar

;; Register dispositions: 37 in 4  38 in 2  39 in 4  40 in 2  41 in 2

;; Hard regs used: 2 4 30

(note 2 0 3 NOTE_INSN_DELETED)
(note 3 2 6 0 NOTE_INSN_FUNCTION_BEG)
;; Start of basic block 1, registers live: 1 [A1] 29 [B13] 30 [B14]
(note 6 3 8 1 [bb 1] NOTE_INSN_BASIC_BLOCK)
(insn 8 6 9 1 (set (reg/f:HI 4 A4 [37])
        (plus:HI (reg/f:HI 30 B14)
            (const_int -16 [0xfff0]))) 9 {addhi3} (nil)
    (nil))
(insn 9 8 10 1 (set (reg:HI 2 A2 [38])
        (const_int 4660 [0x1234])) 5 {*constant_load} (nil)
    (nil))
(insn 10 9 12 1 (set (mem/i:HI (reg/f:HI 4 A4 [37]) [0 a+0 S2 A32])
        (reg:HI 2 A2 [38])) 7 {*store_word} (nil)
    (nil))
(insn 12 10 13 1 (set (reg/f:HI 4 A4 [39])
        (plus:HI (reg/f:HI 30 B14)
            (const_int -14 [0xfff2]))) 9 {addhi3} (nil)
    (nil))
(insn 13 12 14 1 (set (reg/f:HI 2 A2 [40])
        (plus:HI (reg/f:HI 30 B14)
            (const_int -16 [0xfff0]))) 9 {addhi3} (nil)
    (nil))
(insn 14 13 15 1 (set (reg:HI 2 A2 [orig:41 a ] [41])
        (mem/i:HI (reg/f:HI 2 A2 [40]) [0 a+0 S2 A32])) 4 {*load_word} (nil)
    (nil))
(insn 15 14 16 1 (set (mem/i:HI (reg/f:HI 4 A4 [39]) [0 b+0 S2 A16])
        (reg:HI 2 A2 [orig:41 a ] [41])) 7 {*store_word} (nil)
    (nil))
;; End of basic block 1, registers live: 1 [A1] 29 [B13] 30 [B14]
(note 16 15 25 NOTE_INSN_FUNCTION_END)
(note 25 16 0 NOTE_INSN_DELETED)
---------------------------------------------------------------

However, when I compile it

$ hcc -da foobar8.c

I get an ICE at the end of the compilation, and the assembly source is not
produced:

[ lots of my debugging output removed ]

legitimate_address2(non-strict, soft-reg allowed), X=
(reg/f:HI 29 B13)
legitimate_address2() yes: (X)==REG && non_strict_base_reg(REGNO(X))

-MOVHI--- [generating a MOV X, Y insn]
MOVHI: operands[0] (mem:HI (reg/f:HI 29 B13) [0 S2 A8])
MOVHI: operands[1] (reg:HI 31 B15)
MOVHI --- END

[then checking if -2(B13) is legitimate, it is not...]

legitimate_address2(non-strict, soft-reg allowed), X=
(plus:HI (reg/f:HI 29 B13)
    (const_int -2 [0xfffe]))
legitimate_address2(): FOUND register+offset --> FAIL!
legitimate_address2(non-strict, soft-reg allowed), X=
(plus:HI (reg/f:HI 29 B13)
    (const_int -2 [0xfffe]))
legitimate_address2(): FOUND register+offset --> FAIL!
legitimate_address2(non-strict, soft-reg allowed), X=
(plus:HI (reg/f:HI 29 B13)
    (const_int -2 [0xfffe]))
legitimate_address2(): FOUND register+offset --> FAIL!
legitimate_address2(non-strict, soft-reg allowed), X=
(plus:HI (reg/f:HI 29 B13)
    (const_int -2 [0xfffe]))
legitimate_address2(): FOUND register+offset --> FAIL!

[and after four checks of the add above, gcc 4.0.2 freaks out with]

foobar8.c: In function ‘foobar’:
foobar8.c:7: internal compiler error: in change_address_1, at emit-rtl.c:1800
Please submit a full bug report, with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

The failed assertion is in line 1800: some "addr" is not an address.

1784 change_address_1 (rtx memref, enum machine_mode mode, rtx addr, int validate)
1785 {
1786   rtx new;
1787
1788   gcc_assert (MEM_P (memref));
1789   if (mode == VOIDmode)
1790     mode = GET_MODE (memref);
1791   if (addr == 0)
1792     addr = XEXP (memref, 0);
1793   if (mode == GET
Re: Questions about compute_transpout in gcse.c code hoisting implementation
On 01/26/10 16:47, Steven Bosscher wrote:
> Hello Jeff and other interested readers :-)
>
> There is a function compute_transpout() in gcse.c and there are a couple
> of things about this function that I don't understand exactly.
>
> First, the comment before the function says:
>
>   "An expression is transparent to an edge unless it is killed by the
>    edge itself.  This can only happen with abnormal control flow, when
>    the edge is traversed through a call.  This happens with non-local
>    labels and exceptions."
>
> What does this mean, exactly? The implementation of the function simply
> kills all expressions that are MEM_P if a basic block ends with a
> CALL_INSN (with a wrong comment about something flow did in the gcc dark
> ages, say 15 years ago). But I don't see how compute_transpout handles
> non-local labels. And what about non-call exceptions?

I believe rth added this code a long time ago to avoid ever needing to
insert an insn on an abnormal edge during gcse. IIRC the hoisting code
utilizes the same compute_transpout and thus inherits this behaviour.

> Second, it looks like gcc says that an expression can be in VBEOUT, but
> can not be hoisted. In hoist_code this is expressed like so:
>
>   if (TEST_BIT (hoist_vbeout[bb->index], i)
>       && TEST_BIT (transpout[bb->index], i))
>     {
>       /* We've found a potentially hoistable expression, now we look at
>          every block BB dominates to see if it computes the
>          expression.  */
>
> Why does the code hoisting pass not do the same as the LCM-PRE pass:
> eliminate expressions it cannot handle early on? In this case, wouldn't
> it be easier (better?) to eliminate expressions that are not TRANSPOUT
> from VBEOUT in compute_vbeinout?
>
> Would it be OK if I teach compute_vbeinout to eliminate expressions that
> may trap from VBEOUT, if there are exception edges to successor blocks?
> This is similar to what LCM-PRE does in compute_pre_data (well, more or
> less, sort-of, etc.).

Hoisting was a quick transcription of Muchnick's algorithm and hasn't been
touched much since.

I don't see that there's anything to lose by pruning the expression sets
early.

jeff
gcc-4.4-20100209 is now available
Snapshot gcc-4.4-20100209 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100209/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch revision 156635

You'll find:

gcc-4.4-20100209.tar.bz2              Complete GCC (includes all of below)
gcc-core-4.4-20100209.tar.bz2         C front end and core compiler
gcc-ada-4.4-20100209.tar.bz2          Ada front end and runtime
gcc-fortran-4.4-20100209.tar.bz2      Fortran front end and runtime
gcc-g++-4.4-20100209.tar.bz2          C++ front end and runtime
gcc-java-4.4-20100209.tar.bz2         Java front end and runtime
gcc-objc-4.4-20100209.tar.bz2         Objective-C front end and runtime
gcc-testsuite-4.4-20100209.tar.bz2    The GCC testsuite

Diffs from 4.4-20100202 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: insn length attribute and code size optimization
Daniel Jacobowitz writes:
> On Wed, Feb 03, 2010 at 06:23:19AM -0800, Ian Lance Taylor wrote:
>> fanqifei writes:
>>
>>> According to the internal manual, the insn length attribute can be
>>> used to calculate the length of emitted code chunks when verifying
>>> branch distances.
>>> Can it be used in code size optimization?
>>
>> I suppose it could, but it isn't. Instead of asking the backend for
>> the length of instructions, the compiler asks the backend for the cost
>> of instructions. The backend is free to determine that cost however
>> it likes. When using -Os, using the size of the instruction is a good
>> measure of cost.
>
> It seems to me that there's a hard ordering problem here: we can't
> determine insn lengths, using the current framework, until very late.
> We need at least (A) whole instructions, not just RTL expressions; (B)
> register allocation to select alternatives; (C) branch shortening to
> determine branch alternatives.
>
> I'm curious if anyone thinks there's a generic solution to this (that
> doesn't involve a complete instruction selection rewrite :-).

Yeah, it's something I've often wanted too, since at the moment you end up
duplicating a lot of the instruction selection in C code. E.g. the MIPS
port has stuff like:

  if (float_mode_p
      && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
      && TARGET_FUSED_MADD
      && !HONOR_NANS (mode)
      && !HONOR_SIGNED_ZEROS (mode))
    {
      /* See if we can use NMADD or NMSUB.  See mips.md for the
         associated patterns.  */
      rtx op0 = XEXP (x, 0);
      rtx op1 = XEXP (x, 1);
      if (GET_CODE (op0) == MULT && GET_CODE (XEXP (op0, 0)) == NEG)
        {
          *total = (mips_fp_mult_cost (mode)
                    + rtx_cost (XEXP (XEXP (op0, 0), 0), SET, speed)
                    + rtx_cost (XEXP (op0, 1), SET, speed)
                    + rtx_cost (op1, SET, speed));
          return true;
        }
      if (GET_CODE (op1) == MULT)
        {
          *total = (mips_fp_mult_cost (mode)
                    + rtx_cost (op0, SET, speed)
                    + rtx_cost (XEXP (op1, 0), SET, speed)
                    + rtx_cost (XEXP (op1, 1), SET, speed));
          return true;
        }
    }

Ugh!

But I could never see a cure that was better than the disease without (as
you say) a rewrite. I think (A) is the main problem: we already have code
to estimate constraint selection before reload, so we could at least guess
at (B) and (C). (Which is what the costs have to do anyway, really.)

Richard
Linkage order in Linux
Hello. Recently I found a surprising requirement when compiling my own
application against the Horde3D library (http://horde3d.org/), an OpenGL 3D
graphics engine. The Horde3D library links to the shared GL library, but
-lHorde3D must be listed *before* -lGL for any application to work
correctly. If I link the application first to GL, and only then to Horde3D,
it simply segfaults when Horde3D's init calls glCreateShader, a GL library
function. We have several speculations about what causes this particular
linker order: http://horde3d.org/forums/viewtopic.php?f=2&t=384 But I'd
like to know the real reason for this surprising linkage-order requirement.
Thanks.
Re: Linkage order in Linux
michael kapelko writes: > Recently I found out a surprising requirement to compile own > application with Horde3D library (http://horde3d.org/), OpenGL 3D > graphics engine. > Horde3D library links to shared GL library. But -lHorde3D must be > listed *before* -lGL for any application to work correctly. If I link > the application first to GL, and only then to Horde3D, then it merely > segfaults when Horde3D's init calls glCreateShader, a GL library > function. > We have several speculations about what causes this particular order > for the linker: http://horde3d.org/forums/viewtopic.php?f=2&t=384 > But I'd like to know real reason of this surprising order of linkage > requirement. The mailing list gcc@gcc.gnu.org is for gcc developers. Questions about using gcc should be taken to gcc-h...@gcc.gnu.org. Please take any followups to gcc-help. Thanks. Unix linkers are always order dependent, by design. If -lHorde3D requires symbols from -lGL, but links successfully even if -lHorde3D is used after -lGL, then perhaps the references in -lHorde3D are weak for some reason. Or there are a few other possibilities; I'm not familiar with the libraries in question. One thing is for sure: this has nothing to do with gcc. gcc does not include a linker. If you are using GNU/Linux, then your linker is almost certainly coming from the GNU binutils, which is a different project; see http://sourceware.org/binutils/ . Ian