Re: Why auto variables NOT overlap on stack?

2010-02-09 Thread Alexey Salmin
There's another funny thing about gcc3 behavior which I've just discovered:

$ gcc -v 2>&1 | grep version
gcc version 3.4.2

$ gcc -o mem mem.c ; ./mem
-1024
$ gcc -o mem1 mem1.c ; ./mem1
0

$ cat mem.c
#include <stdio.h>

int main() {
char *p1, *p2;
{
char a[1024];
p1 = a;
}
{
char a[1024];
p2 = a;
}
printf("%d\n", p2 - p1);
return 0;
}

$ cat mem1.c
#include <stdio.h>

static const int N = 1024;

int main() {
char *p1, *p2;
{
char a[N];
p1 = a;
}
{
char a[N];
p2 = a;
}
printf("%d\n", p2 - p1);
return 0;
}


Alexey


Re: AC_CHECK_DECLS(basename) (Was: Re: Ping: patches required for --enable-build-with-cxx)

2010-02-09 Thread Paolo Bonzini
I'm adding autoc...@gnu.org to the destinations, since this is a
pretty fundamental problem with AC_CHECK_DECL and C++

On Tue, Feb 9, 2010 at 02:17, Joern Rennecke
 wrote:
>
>> On 02/08/2010 09:58 AM, Joern Rennecke wrote:
>>>
>>> That would only work if every program that uses libiberty uses
>>> AC_SYSTEM_EXTENSIONS .
>>
>> GCC does, gdb (I think, I don't have it checked out) does and nothing
>> else uses basename anyway (they use lbasename).  If problems come up,
>> other users can be patched to use AC_USE_SYSTEM_EXTENSIONS.
>
> I've tried going down that route, and it turned out that my original patch
> only
> worked due to a typo.  The _GNU_SOURCE inconsistency is a red herring.
>
> The real problem is that when libcpp is configured, it is configured with
> g++
> as the compiler, and the test for a basename declaration fails because
> basename is declared in an overloaded way - a const and a non-const
> variant - while the test code has:
> | int
> | main ()
> | {
> | #ifndef basename
> |   (void) basename;
> | #endif
> |
> |   ;
> |   return 0;
> | }
>
> so g++ complains:
>
> conftest.cpp: In function 'int main()':
> conftest.cpp:78:10: error: void cast cannot resolve address of overloaded
> function
>
> and configure mistakenly assumes that no basename declaration exists.
> Thus, when libiberty is included, it 'helpfully' provides another
> declaration
> for basename, which makes the build fail.
>
> So, AC_CHECK_DECLS as it is now simply cannot work when configuring with
> g++ as compiler for any function that has overloaded declarations.  In
> order to do a valid positive check, we'd have to use a valid function
> signature - which means we have to know a valid function signature first,
> which would be specific to the function.
>
> If we know such a signature, we can use #ifdef __cplusplus to compile
> a function call in this case.  A C++ compiler should give an error if
> the function was not declared.
>
> We could soup up AC_CHECK_DECLS to know all the standard functions by name,
> or at least the overloaded ones - but I'm not sure such a complex solution
> will really save time in the long term.

Paolo


Re: AC_CHECK_DECLS(basename) (Was: Re: Ping: patches required for --enable-build-with-cxx)

2010-02-09 Thread Joern Rennecke

Quoting Paolo Bonzini :


I'm adding autoc...@gnu.org to the destinations, since this is a
pretty fundamental problem with AC_CHECK_DECL and C++


I've whipped up a patch with a modified version of AC_CHECK_DECLS -
I've called it AC_CHECK_PROTOS - that can optionally have argument types
for a function (without spaces).  Then I've used this new macro in the
libcpp configure.ac to replace AC_CHECK_DECLS.
This bootstrapped and regtested fine on i686-pc-linux-gnu, and it now
also bootstraps with '--enable-build-with-cxx' (although the regtest results
are still affected by PR testsuite/42843).

As I've seen you re-wrote the AC_CHECK_DECLS from 2.63 to 2.64 to save
on configure script size, one design criterion was to avoid unnecessary
passing of extra arguments outside of the shell function.

However, I wonder if there is a better way to do the string processing -
I only do autoconf hacking sporadically, and my code looks somewhat
different from the original style.

2010-02-09  Joern Rennecke

libcpp:
* aclocal (_AC_CHECK_PROTO_BODY): New shell function.
(AC_CHECK_PROTO, _AC_CHECK_PROTOS, AC_CHECK_PROTOS): New macros.
* configure.ac: Use AC_CHECK_PROTOS instead of AC_CHECK_DECLS.
* configure: Regenerate.

Index: libcpp/configure.ac
===
--- libcpp/configure.ac (revision 156598)
+++ libcpp/configure.ac (working copy)
@@ -81,8 +81,8 @@ define(libcpp_UNLOCKED_FUNCS, clearerr_u
   fread_unlocked fwrite_unlocked getchar_unlocked getc_unlocked dnl
   putchar_unlocked putc_unlocked)
 AC_CHECK_FUNCS(libcpp_UNLOCKED_FUNCS)
-AC_CHECK_DECLS(m4_split(m4_normalize(abort asprintf basename errno getopt \
-  libcpp_UNLOCKED_FUNCS vasprintf)))
+AC_CHECK_PROTOS(m4_split(m4_normalize(abort asprintf basename(char*) errno \
+  getopt libcpp_UNLOCKED_FUNCS vasprintf)))
 
 # Checks for library functions.
 AC_FUNC_ALLOCA
Index: libcpp/aclocal.m4
===
--- libcpp/aclocal.m4   (revision 156598)
+++ libcpp/aclocal.m4   (working copy)
@@ -22,3 +22,75 @@ m4_include([../config/lib-link.m4])
 m4_include([../config/lib-prefix.m4])
 m4_include([../config/override.m4])
 m4_include([../config/warnings.m4])
+
+##  ##
+## Checking for declared symbols.   ##
+## This is like *AC_CHECK_DECL*, except that for c++, we may use a  ##
+## prototype to check for a (possibly overloaded) function. ##
+##  ##
+
+
+# _AC_CHECK_PROTO_BODY
+# ---
+# Shell function body for AC_CHECK_PROTO.
+m4_define([_AC_CHECK_PROTO_BODY],
+[  AS_LINENO_PUSH([$[]1])
+  [as_decl_name=`echo $][2|sed 's/(.*//'`]
+  [as_decl_use=`echo $][2|sed -e 's/(/((/' -e 's/\()\|,\)/) 0&/'`]
+  AC_CACHE_CHECK([whether $as_decl_name is declared], [$[]3],
+  [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([$[]4],
+[@%:@ifndef $[]as_decl_name
+@%:@ifdef __cplusplus
+  $[]as_decl_type
+  (void) $[]as_decl_use;
+@%:@else
+  (void) $[]as_decl_name;
+@%:@endif
+@%:@endif
+])],
+  [AS_VAR_SET([$[]3], [yes])],
+  [AS_VAR_SET([$[]3], [no])])])
+  AS_LINENO_POP
+])# _AC_CHECK_PROTO_BODY
+
+# AC_CHECK_PROTO(SYMBOL,
+#   [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND],
+#   [INCLUDES = DEFAULT-INCLUDES])
+# ---
+# Check whether SYMBOL (a function, variable, or constant) is declared.
+AC_DEFUN([AC_CHECK_PROTO],
+[AC_REQUIRE_SHELL_FN([ac_fn_]_AC_LANG_ABBREV[_check_proto],
+  [AS_FUNCTION_DESCRIBE([ac_fn_]_AC_LANG_ABBREV[_check_proto],
+[LINENO SYMBOL VAR],
+[Tests whether SYMBOL is declared, setting cache variable VAR accordingly.])],
+  [_$0_BODY])]dnl
+[AS_VAR_PUSHDEF([ac_Symbol], [ac_cv_have_decl_$1])]dnl
+[ac_fn_[]_AC_LANG_ABBREV[]_check_proto ]dnl
+["$LINENO" "$1" "ac_Symbol" "AS_ESCAPE([AC_INCLUDES_DEFAULT([$4])], [""])"
+AS_VAR_IF([ac_Symbol], [yes], [$2], [$3])
+AS_VAR_POPDEF([ac_Symbol])dnl
+])# AC_CHECK_PROTO
+
+
+# _AC_CHECK_PROTOS(SYMBOL, ACTION-IF_FOUND, ACTION-IF-NOT-FOUND,
+#  INCLUDES)
+# -
+# Helper to AC_CHECK_PROTOS, which generates the check for a single
+# SYMBOL with INCLUDES, performs the AC_DEFINE, then expands
+# ACTION-IF-FOUND or ACTION-IF-NOT-FOUND.
+m4_define([_AC_CHECK_PROTOS],
+[AC_CHECK_PROTO([$1], [ac_have_decl=1], [ac_have_decl=0], [$4])]dnl
+[AC_DEFINE_UNQUOTED(AS_TR_CPP(patsubst(HAVE_DECL_[$1],[(.*])), [$ac_have_decl],
+  [Define to 1 if you have the declaration of `$1',
+   and to 0 if you don't.])]dnl
+[m4_ifvaln([$2$3], [AS_IF([test $ac_have_decl = 1], [$2], [$3])])])
+
+# AC_CHECK_PROTOS(SYMBOLS,
+# [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND],
+# [INCLUDES = DEFAULT-INCLUDES])
+# 

RE: Failure to combine SHIFT with ZERO_EXTEND

2010-02-09 Thread Rahul Kharche
Hi Jeff,

Many thanks for the pointers. I will make the changes and attach the
patch to the bugzilla soon.

Cheers,
Rahul

-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: 09 February 2010 00:45
To: Rahul Kharche
Cc: gcc@gcc.gnu.org; sdkteam-gnu
Subject: Re: Failure to combine SHIFT with ZERO_EXTEND

On 02/04/10 08:39, Rahul Kharche wrote:
> Hi All,
>
> On our private port of GCC 4.4.1 we fail to combine successive SHIFT
> operations like in the following case
>
> #include <stdio.h>
> #include <stdlib.h>
>
> void f1 ()
> {
>unsigned short t1;
>unsigned short t2;
>
>t1 = rand();
>t2 = rand();
>
>t1<<= 1; t2<<= 1;
>t1<<= 1; t2<<= 1;
>t1<<= 1; t2<<= 1;
>t1<<= 1; t2<<= 1;
>t1<<= 1; t2<<= 1;
>t1<<= 1; t2<<= 1;
>
>printf("%d\n", (t1+t2));
> }
>
> This is a ZERO_EXTEND problem, because combining SHIFTs with whole
> integers works correctly, as does combining them with signed values. The
> problem seems to arise in the RTL combiner, which combines the ZERO_EXTEND
> with the SHIFT to generate a SHIFT and an AND. Our architecture does not
> support AND with large constants and hence does not have a matching
> insn pattern (we prefer not to do this, because large constants
> remain hanging at the end of all RTL optimisations and cause needless
> reloads).
>
> Fixing the combiner to convert masking AND operations to ZERO_EXTRACT
> fixes this issue without any obvious regressions. I'm adding the
> patch here against GCC 4.4.1 for any comments and/or suggestions.
>
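
As a rough illustration of the equivalence under discussion (a sketch only, not part of the patch; all function names here are invented for the example): six one-bit shifts of a 16-bit value, a single shift followed by an AND with 0xffff, and a single shift followed by a 16-bit zero extraction at bit 0 all compute the same result. The extraction helper assumes bits are numbered from the least significant end.

```c
#include <assert.h>
#include <stdint.h>

/* Six single-bit shifts on a 16-bit value, as in f1 () above.  */
static uint16_t shift_six_times(uint16_t t)
{
  for (int i = 0; i < 6; i++)
    t <<= 1;   /* promoted to int, shifted, truncated back to 16 bits */
  return t;
}

/* What the combiner produces today: one shift plus an AND with a
   large constant mask.  */
static uint32_t shift_then_and(uint16_t t)
{
  return ((uint32_t) t << 6) & UINT32_C(0xffff);
}

/* Toy model of a zero extraction of LEN bits starting at bit POS.  */
static uint32_t zero_extract32(uint32_t x, unsigned len, unsigned pos)
{
  assert(len >= 1 && pos < 32 && len <= 32 - pos);
  uint32_t mask = (len == 32) ? ~UINT32_C(0) : ((UINT32_C(1) << len) - 1);
  return (x >> pos) & mask;
}

/* The form the proposed change aims for: one shift plus an
   extraction of the low 16 bits, with no large constant.  */
static uint32_t shift_then_extract(uint16_t t)
{
  return zero_extract32((uint32_t) t << 6, 16, 0);
}
```

The length of 16 matches what the patch computes for a mask of 0xffff (the word width minus the number of leading zero bits in the mask).
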
Good catch.  However, note that we are in a regression-bugfix-only phase of
development right now, in preparation for branching for GCC 4.5.  As a
result the patch can't be checked in at this time; I would recommend you
update the patch to the current sources and attach it to bug #41998, which
contains queued patches for after GCC 4.5 branches.


> Cheers,
> Rahul
>
>
> --- combine.c 2009-04-01 21:47:37.0 +0100
> +++ combine.c 2010-02-04 15:04:41.0 +
> @@ -446,6 +446,7 @@
>   static void record_truncated_values (rtx *, void *);
>   static bool reg_truncated_to_mode (enum machine_mode, const_rtx);
>   static rtx gen_lowpart_or_truncate (enum machine_mode, rtx);
> +static bool can_zero_extract_p (rtx, rtx, enum machine_mode);
>
>
>
>   /* It is not safe to use ordinary gen_lowpart in combine.
> @@ -6973,6 +6974,16 @@
>  make_compound_operation (XEXP (x, 0),
>   next_code),
>  i, NULL_RTX, 1, 1, 0, 1);
> +  else if (can_zero_extract_p (XEXP (x, 0), XEXP (x, 1), mode))
> +{
> +   unsigned HOST_WIDE_INT len =  HOST_BITS_PER_WIDE_INT
> + - CLZ_HWI (UINTVAL (XEXP (x,
> 1)));
> +   new_rtx = make_extraction (mode,
> + make_compound_operation (XEXP (x, 0),
> +  next_code),
> +  0, NULL_RTX, len, 1, 0,
> +  in_code == COMPARE);
>
There should be a comment prior to this code fragment describing the 
transformation being performed.   Something like:

/* Convert (and (shift X Y) MASK) into ... when ... */

That will make it clear in the future when your transformation applies 
rather than forcing someone scanning the code to read it in detail.


> +}
>
> break;
>
> @@ -7245,6 +7256,25 @@
>   return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
>   }
>
> +static bool
> +can_zero_extract_p (rtx x, rtx mask_rtx, enum machine_mode mode)
>
There should be a comment prior to this function which briefly describes
what the function does, the parameters and the return value.  Use comments
prior to other functions to guide you.
> @@ -8957,7 +8987,6 @@
>   op0 = UNKNOWN;
>
> *pop0 = op0;
> -
> /* ??? Slightly redundant with the above mask, but not entirely.
>Moving this above means we'd have to sign-extend the mode mask
>for the final test.  */
>
Useless diff fragment.  Remove this change as it's completely unrelated 
and useless.

You should also write a ChangeLog entry for your patch.  ChangeLogs 
describe what changed, not why something changed.  So a suitable entry 
might look something like:

  Your Name 

 * combine.c (make_compound_operation): Convert shifts and masks
 into zero_extract in certain cases.
 (can_zero_extract_p): New function.


If you could make those changes and attach the result to PR 41998 they 
should be able to go in once 4.5 branches from the mainline.

Jeff



Re: Exception handling information in the macintosh

2010-02-09 Thread jacob navia

Jack Howarth wrote:

Jacob,
   Apple's gcc is based on their own branch and is not the
same as FSF gcc. The first FSF gcc that was validated
on darwin10 was gcc 4.4. However I would suggest you first
start testing against current FSF gcc trunk. There are a
number of fixes for darwin10 that aren't present in the
FSF gcc 4.4.x releases yet. In particular, the compilers
now link with -no_compact_unwind by default on darwin10
to avoid using the new compact unwinder. Also, when you
build your JVM, I would suggest you stick to the FSF gcc
trunk compilers you build. In particular, the Apple libstdc++
and FSF libstdc++ aren't interchangeable on Intel. So you don't
want to mix C++ code built with the two different compilers.
Jack
  

OK. I downloaded gcc 4.4 and recompiled all the server again with it.
Now, throws within C++ work but not when they have to pass through
the JITted code.

The problem is that we need to link with quite a lot of libraries:
/usr/local/gcc-4.5/bin/g++ -o Debug/server -m64 -fPIC 
-fno-omit-frame-pointer -g -O0 -w Debug/service_helper.o Debug/service.o 
-L../Debug/64 -ldatabaselibrary -lcryptolibrary -lCPThreadLibrary 
-lshared -lwin32   -L../CryptoLibrary/lib/Darwin/64  -lcrypto -lssl 
-lsrp -lint128 -ldl -lpthread ../icu/icu3.4/lib/mac/libicui18n.dylib.34 
../icu/icu3.4/lib/mac/libicuuc.dylib.34 
../icu/icu3.4/lib/mac/libicudata.dylib.34 
../CompilerLibrary/mac/libcclib64.a ../Debug/64/libwin32.a


What can I do about libssl.a, libdl.a and libcrypto.a?

Those are system libraries and I do not have the source code.
Should I compile those too?

I downloaded gcc 4.5 and the situation is the same...

jacob




RTX costs

2010-02-09 Thread Paulo J. Matos

Hi all,

After reading the internal docs about rtx_costs I am left wondering what 
they exactly are estimating.
- Are they estimating in the beginning of expand how many insns will be 
generated from a particular insn until the assembler is generated?
- or Are they estimating how many assembler instructions will be 
generated for a particular insn?

- or something else?

If someone could clear this up, I would be happy. :)

Thanks,

Paulo Matos


Re: RTX costs

2010-02-09 Thread Joern Rennecke

Quoting "Paulo J. Matos" :


Hi all,

After reading the internal docs about rtx_costs I am left wondering
what they exactly are estimating.
- Are they estimating in the beginning of expand how many insns will be
generated from a particular insn until the assembler is generated?
- or Are they estimating how many assembler instructions will be
generated for a particular insn?
- or something else?


When optimizing for speed, they estimate execution cycles, when
optimizing for space, they estimate code size.  In each case,
normalized so that COSTS_N_INSNS (1) is equivalent to the cost of
a simple instruction.  Or at least that's the theoretical goal.
Individual ports might deviate from that for historical and/or
practical reasons.
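
To make the normalization concrete, here is a toy sketch. The COSTS_N_INSNS definition mirrors the one in GCC's rtl.h as far as I recall it; everything else (the toy_op enum, toy_rtx_cost, and the latency numbers) is hypothetical and only meant to show the idea, not any real port.

```c
/* Mirrors GCC's rtl.h definition (from memory): costs are scaled so
   that one simple instruction costs COSTS_N_INSNS (1).  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical operations of an imaginary target.  */
enum toy_op { TOY_ADD, TOY_MUL, TOY_DIV };

/* Toy cost hook: when optimizing for speed, return an estimate of
   execution cycles; when optimizing for size, return an estimate of
   code size.  Both are expressed relative to one simple instruction.
   The cycle counts are invented for this example.  */
static int toy_rtx_cost(enum toy_op op, int optimize_for_speed)
{
  if (!optimize_for_speed)
    /* Assumption: every operation is a single, equally sized insn.  */
    return COSTS_N_INSNS(1);

  switch (op) {
  case TOY_ADD:
    return COSTS_N_INSNS(1);    /* as fast as a simple instruction */
  case TOY_MUL:
    return COSTS_N_INSNS(4);    /* assumed 4x slower than an add */
  case TOY_DIV:
    return COSTS_N_INSNS(20);   /* assumed 20x slower than an add */
  }
  return COSTS_N_INSNS(1);
}
```

On this imaginary target a divide is far more expensive than an add when optimizing for speed, but costs the same when optimizing for size, because it is still a single instruction.
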


Re: RTX costs

2010-02-09 Thread Richard Kenner
> After reading the internal docs about rtx_costs I am left wondering what 
> they exactly are estimating.
> - Are they estimating in the beginning of expand how many insns will be 
> generated from a particular insn until the assembler is generated?
> - or Are they estimating how many assembler instructions will be 
> generated for a particular insn?

The latter is what they're *supposed* to be measuring, but of course it's
an approximation.


Re: Modulo Scheduling

2010-02-09 Thread Alexander Monakov



On Tue, 2 Feb 2010, Cameron Lowell Palmer wrote:


Does Modulo Scheduling work on x86 platforms? I have tried adding in
various versions of the -fmodulo-sched option and get the exact same
output with or without. The application is a very simplistic matrix
multiply without dependencies.


No, at present SMS is not able to schedule any loops on x86 at all.  This
is due to an implementation detail: SMS operates on loops that end with a
decrement-and-branch instruction, and GCC does not generate such
instructions on x86.
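
For anyone unfamiliar with the term, the source-level shape that corresponds to a decrement-and-branch loop looks roughly like the second function below. This is only an illustration of the loop shape (function names are invented); whether the compiler actually emits such an instruction depends on the target, and writing the loop this way does not by itself enable SMS.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary up-counting loop: compare the index against the bound.  */
static long sum_up(const int *v, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Count-down form: a trip counter is decremented and tested against
   zero at the bottom of the loop, which is the shape a
   decrement-and-branch instruction implements in one step.  */
static long sum_down(const int *v, size_t n)
{
  assert(v != NULL || n == 0);
  long s = 0;
  size_t remaining = n;
  if (remaining == 0)
    return 0;
  do {
    s += *v++;
  } while (--remaining != 0);
  return s;
}
```
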


Sorry.
Alexander Monakov


Re: RTX costs

2010-02-09 Thread Paulo J. Matos

Joern Rennecke wrote:

Quoting "Paulo J. Matos" :


Hi all,

After reading the internal docs about rtx_costs I am left wondering
what they exactly are estimating.
- Are they estimating in the beginning of expand how many insns will be
generated from a particular insn until the assembler is generated?
- or Are they estimating how many assembler instructions will be
generated for a particular insn?
- or something else?


When optimizing for speed, they estimate execution cycles, when
optimizing for space, they estimate code size.  In each case,
normalized so that COSTS_N_INSNS (1) is equivalent to the cost of
a simple instruction.  Or at least that's the theoretical goal.
Individual ports might deviate from that for historical and/or
practical reasons.


Thanks for the reply. The name of the macro COSTS_N_INSNS was making me 
think I was estimating numbers of insns instead of number of assembler 
instructions even though when optimising for size the latter makes more 
sense.


Generating store after fdivd: how to avoid delay slot

2010-02-09 Thread k e
I am trying to patch gcc so that after an fdivd the destination register is
stored to the stack.

fdivd %f0,%f2,%f4; std %f4, [%sp]

I generate the RTL for divdf3 using an emit_insn/DONE sequence in a
define_expand pattern (see below).

In the assembler output phase I use a define_insn and write
out "fdivd\t%%1, %%2, %%0; std %%0, %%3" as the expression string.

My question:
  - How can I mark the pattern so that it will not be scheduled into a
delay slot? How can I specify that the output will be 2 instructions
and hint the scheduler about it?
  - Is the (set_attr "length" "2") attribute in define_insn divdf3_store
(below) already sufficient?

-- Greetings Konrad


;; handle divdf3 
(define_expand "divdf3"
  [(parallel [(set (match_operand:DF 0 "register_operand" "=e")
  (div:DF (match_operand:DF 1 "register_operand" "e")
(match_operand:DF 2 "register_operand" "e")))
  (clobber (match_scratch:SI 3 ""))])]
  "TARGET_FPU"
  "{
  output_divdf3_emit (operands[0], operands[1], operands[2], operands[3]);
  DONE;
}")

(define_insn "divdf3_store"
  [(set (match_operand:DF 0 "register_operand" "=e")
  (div:DF (match_operand:DF 1 "register_operand" "e")
(match_operand:DF 2 "register_operand" "e")))
  (clobber (match_operand:DF 3 "memory_operand" ""  ))]
  "TARGET_FPU && TARGET_STORE_AFTER_DIVSQRT"
   {
   return output_divdf3 (operands[0], operands[1], operands[2],
operands[3]);
   }
   [(set_attr "type" "fpdivd")
   (set_attr "fptype" "double")
   (set_attr "length" "2")])

(define_insn "divdf3_nostore"
  [(set (match_operand:DF 0 "register_operand" "=e")
(div:DF (match_operand:DF 1 "register_operand" "e")
(match_operand:DF 2 "register_operand" "e")))]
  "TARGET_FPU && (!TARGET_STORE_AFTER_DIVSQRT)"
  "fdivd\t%1, %2, %0"
  [(set_attr "type" "fpdivd")
   (set_attr "fptype" "double")])




/* handle fdivd */
char *
output_divdf3 (rtx op0, rtx op1, rtx dest, rtx scratch)
{
  static char string[128];
  sprintf (string, "fdivd\t%%1, %%2, %%0; std %%0, %%3 !!!");
  return string;
}

void
output_divdf3_emit (rtx dest, rtx op0, rtx op1, rtx scratch)
{
  rtx slot0, div, divsave;

  div = gen_rtx_SET (VOIDmode,
                     dest,
                     gen_rtx_DIV (DFmode,
                                  op0,
                                  op1));

  if (TARGET_STORE_AFTER_DIVSQRT) {
    slot0 = assign_stack_local (DFmode, 8, 8);
    divsave = gen_rtx_SET (VOIDmode, slot0, dest);
    emit_insn (divsave);
    emit_insn (gen_rtx_PARALLEL (VOIDmode,
                                 gen_rtvec (2,
                                            div,
                                            gen_rtx_CLOBBER (SImode,
                                                             slot0))));
  } else {
    emit_insn (div);
  }
}


Re: [GRAPHITE] Re: Loop Transformations Question

2010-02-09 Thread Richard Guenther
On Tue, Feb 9, 2010 at 6:01 PM, Cristianno Martins
 wrote:
> Hi everyone,
>
> First of all, I already found [and fixed] the problem that I had
> described in the last email.
> Now, I need help with a pretty intriguing issue, described below.
>
> Well, as I said in the last email, I'm working on an
> implementation of a heuristic for the loop skewing transformation. To
> expose the issue, I will show how it happens with an example.
>
> First, the code used to compile is very simple, and can be seen below
> (the matrix dimensions were minimized for simplification).
> 
>  #define m 20
>  #define n 30
>  int a[m][n];
>  int main() {
>     int i, j;
>     for(i = 0; i < m; i++)
>        for(j = 0; j < n; j++)
>           a[i][j] = 1;
>
>     return a[1][1] + a[2][2];
>  }
> 
>
> After the skewing pass, if I call
>
> 
>  cloog_prog_clast pc = scop_to_clast (scop);
>  printf ("\nCLAST generated by CLooG: \n");
>  print_clast_stmt (stdout, pc.stmt);
> 
>
> I get the following result:
>
> 
>  CLAST generated by CLooG:
>  for (scat_1=0;scat_1<=48;scat_1++) {
>     for (scat_3=max(scat_1-29,0);scat_3<=min(scat_1,19);scat_3++) {
>        b[scat_3][scat_1-scat_3] = 1 ;
>     }
>  }
> 
>
> Going ahead with this, such as can be seen in
> simple_test.c.105t.graphite [attached], the code is correctly
> generated (in gimple), but the important bb is:
>
> 
>  :
>      # graphite_IV.5_17 = PHI <0(2), graphite_IV.5_5(9)>
>      # .MEM_37 = PHI <.MEM_13(D)(2), .MEM_38(9)>
>      D.3185_12 = graphite_IV.5_17 + 0x0ffe3;
>      D.3186_11 = MAX_EXPR ; # this is the line in question

It looks like D.3185 is unsigned.

Richard.

>      D.3187_23 = MIN_EXPR ;
>      D.3188_24 = D.3187_23 + 1;
>      D.3189_25 = D.3186_11 < D.3188_24;
>      if (D.3189_25 != 0)
>        goto ;
>      else
>        goto ;
> 
>
> Curiously, the line marked above doesn't work in assembly.
> D.3186_11 is assigned -29, although zero is greater than that, and
> the code inside the loop body never runs.
> Moreover, if I take the clast code generated by CLooG after the skewing
> (above) and put it in the source file (compiling without skewing),
> the max expr appears in gimple as an if statement, and the code
> executes perfectly.
>
> Does anyone have any idea how I could fix it?
>
> Thanks in advance,
>
> On Sun, Feb 7, 2010 at 7:31 PM, Cristianno Martins
>  wrote:
>>
>> Hello everyone,
>>
>> I've working on graphite lately, and I did an loop skewing
>> implementation, starting from the loop interchange code [in
>> gcc/graphite-interchange.c].
>> However, after the transformation, if I print the clast generated by
>> Cloog, what I get is a almost the same loop as the original one.
>> Moreover, if I write the pbb transformed domain and scattering into a
>> file, and run the cloog command with that file, the result is exactly
>> what I want from the beggining.
>> For better comprehension of the problem, some interesting data are showed 
>> below.
>>
>> But, first, for short, my question is: am I forgetting something
>> important that I must be doing (like a function call)?
>>
>>
>>> [...]
>>
>> Thanks in advance,
>>
>> --
>> Cristianno Martins
>
> --
> Cristianno Martins
>


Re: [GRAPHITE] Re: Loop Transformations Question

2010-02-09 Thread Cristianno Martins
Hi,

Thanks for the fast reply. Only one more thing: is there some way that
I could force it to be signed??

On Tue, Feb 9, 2010 at 3:17 PM, Richard Guenther
 wrote:
> On Tue, Feb 9, 2010 at 6:01 PM, Cristianno Martins
>  wrote:
>> Hi everyone,
>>
>> First of all, I already find [and fix] the problem that I had
>> described in the last email.
>> Now, I need a help with a pretty intriguing issue, described below.
>>
>> Well, such as I told in the last email, I'm working on a
>> implementation of a heuristic for loop skewing transformation. To
>> expose the issue, I will show how it happens with an example.
>>
>> First, the code used to compile is very simple, and can be seen below
>> (the matrix dimensions were minimized for simplification).
>> 
>>  #define m 20
>>  #define n 30
>>  int a[m][n];
>>  int main() {
>>     int i, j;
>>     for(i = 0; i < m; i++)
>>        for(j = 0; j < n; j++)
>>           a[i][j] = 1;
>>
>>     return a[1][1] + a[2][2];
>>  }
>> 
>>
>> After the skewing pass, if I call
>>
>> 
>>  cloog_prog_clast pc = scop_to_clast (scop);
>>  printf ("\nCLAST generated by CLooG: \n");
>>  print_clast_stmt (stdout, pc.stmt);
>> 
>>
>> I get the following result:
>>
>> 
>>  CLAST generated by CLooG:
>>  for (scat_1=0;scat_1<=48;scat_1++) {
>>     for (scat_3=max(scat_1-29,0);scat_3<=min(scat_1,19);scat_3++) {
>>        b[scat_3][scat_1-scat_3] = 1 ;
>>     }
>>  }
>> 
>>
>> Going ahead with this, such as can be seen in
>> simple_test.c.105t.graphite [attached], the code is correctly
>> generated (in gimple), but the important bb is:
>>
>> 
>>  :
>>      # graphite_IV.5_17 = PHI <0(2), graphite_IV.5_5(9)>
>>      # .MEM_37 = PHI <.MEM_13(D)(2), .MEM_38(9)>
>>      D.3185_12 = graphite_IV.5_17 + 0x0ffe3;
>>      D.3186_11 = MAX_EXPR ; # this is the line in question
>
> It looks like D.3185 is unsigned.
>
> Richard.
>
>>      D.3187_23 = MIN_EXPR ;
>>      D.3188_24 = D.3187_23 + 1;
>>      D.3189_25 = D.3186_11 < D.3188_24;
>>      if (D.3189_25 != 0)
>>        goto ;
>>      else
>>        goto ;
>> 
>>
>> Curiously, the line marked above doesn't work in assembly. The
>> D.3186_11 is assigned to -29, although zero is greater than that, and
>> the code inside the loop body never runs.
>> Moreover, If I get the clast code generated by cloog after the skewing
>> (above), and put it on the source file (compiling without skeking),
>> the max expr appears in gimple as an if statement, and the code
>> executes perfectly.
>>
>> Does someone have any idea how could I fix it??
>>
>> Thanks in advance,
>>
>> On Sun, Feb 7, 2010 at 7:31 PM, Cristianno Martins
>>  wrote:
>>>
>>> Hello everyone,
>>>
>>> I've working on graphite lately, and I did an loop skewing
>>> implementation, starting from the loop interchange code [in
>>> gcc/graphite-interchange.c].
>>> However, after the transformation, if I print the clast generated by
>>> Cloog, what I get is a almost the same loop as the original one.
>>> Moreover, if I write the pbb transformed domain and scattering into a
>>> file, and run the cloog command with that file, the result is exactly
>>> what I want from the beggining.
>>> For better comprehension of the problem, some interesting data are showed 
>>> below.
>>>
>>> But, first, for short, my question is: am I forgetting something
>>> important that I must be doing (like a function call)?
>>>
>>>
>>>> [...]
>>>
>>> Thanks in advance,
>>>
>>> --
>>> Cristianno Martins
>>
>> --
>> Cristianno Martins
>>
>



-- 
Cristianno Martins
Mestrando em Computação
Universidade Estadual de Campinas
cristianno.mart...@students.ic.unicamp.br

cel: (19) 8825-5731 [oi]
  (19) 8240-3217 [tim]
skype: cristiannomartins
gTalk: cristiannomartins
msn: cristiannomart...@hotmail.com


Re: [GRAPHITE] Re: Loop Transformations Question

2010-02-09 Thread Sebastian Pop
On Tue, Feb 9, 2010 at 12:34, Cristianno Martins
 wrote:
> Hi,
>
> Thanks for the fast reply. Only one more thing: is there some way that
> I could force it to be signed??

I guess that you should wait for the fixes from Tobias and Ramakrishna to
CLooG and Graphite to have the type of the IV exposed by CLooG, rather
than having the original IV type set to the generated IV.

Sebastian

>
> On Tue, Feb 9, 2010 at 3:17 PM, Richard Guenther
>  wrote:
>> On Tue, Feb 9, 2010 at 6:01 PM, Cristianno Martins
>>  wrote:
>>> Hi everyone,
>>>
>>> First of all, I already find [and fix] the problem that I had
>>> described in the last email.
>>> Now, I need a help with a pretty intriguing issue, described below.
>>>
>>> Well, such as I told in the last email, I'm working on a
>>> implementation of a heuristic for loop skewing transformation. To
>>> expose the issue, I will show how it happens with an example.
>>>
>>> First, the code used to compile is very simple, and can be seen below
>>> (the matrix dimensions were minimized for simplification).
>>> 
>>>  #define m 20
>>>  #define n 30
>>>  int a[m][n];
>>>  int main() {
>>>     int i, j;
>>>     for(i = 0; i < m; i++)
>>>        for(j = 0; j < n; j++)
>>>           a[i][j] = 1;
>>>
>>>     return a[1][1] + a[2][2];
>>>  }
>>> 
>>>
>>> After the skewing pass, if I call
>>>
>>> 
>>>  cloog_prog_clast pc = scop_to_clast (scop);
>>>  printf ("\nCLAST generated by CLooG: \n");
>>>  print_clast_stmt (stdout, pc.stmt);
>>> 
>>>
>>> I get the following result:
>>>
>>> 
>>>  CLAST generated by CLooG:
>>>  for (scat_1=0;scat_1<=48;scat_1++) {
>>>     for (scat_3=max(scat_1-29,0);scat_3<=min(scat_1,19);scat_3++) {
>>>        b[scat_3][scat_1-scat_3] = 1 ;
>>>     }
>>>  }
>>> 
>>>
>>> Going ahead with this, such as can be seen in
>>> simple_test.c.105t.graphite [attached], the code is correctly
>>> generated (in gimple), but the important bb is:
>>>
>>> 
>>>  :
>>>      # graphite_IV.5_17 = PHI <0(2), graphite_IV.5_5(9)>
>>>      # .MEM_37 = PHI <.MEM_13(D)(2), .MEM_38(9)>
>>>      D.3185_12 = graphite_IV.5_17 + 0x0ffe3;
>>>      D.3186_11 = MAX_EXPR ; # this is the line in question
>>
>> It looks like D.3185 is unsigned.
>>
>> Richard.
>>
>>>      D.3187_23 = MIN_EXPR ;
>>>      D.3188_24 = D.3187_23 + 1;
>>>      D.3189_25 = D.3186_11 < D.3188_24;
>>>      if (D.3189_25 != 0)
>>>        goto ;
>>>      else
>>>        goto ;
>>> 
>>>
>>> Curiously, the line marked above doesn't work in assembly. The
>>> D.3186_11 is assigned to -29, although zero is greater than that, and
>>> the code inside the loop body never runs.
>>> Moreover, If I get the clast code generated by cloog after the skewing
>>> (above), and put it on the source file (compiling without skeking),
>>> the max expr appears in gimple as an if statement, and the code
>>> executes perfectly.
>>>
>>> Does someone have any idea how could I fix it??
>>>
>>> Thanks in advance,
>>>
>>> On Sun, Feb 7, 2010 at 7:31 PM, Cristianno Martins
>>>  wrote:

>>>> Hello everyone,
>>>>
>>>> I've working on graphite lately, and I did an loop skewing
>>>> implementation, starting from the loop interchange code [in
>>>> gcc/graphite-interchange.c].
>>>> However, after the transformation, if I print the clast generated by
>>>> Cloog, what I get is a almost the same loop as the original one.
>>>> Moreover, if I write the pbb transformed domain and scattering into a
>>>> file, and run the cloog command with that file, the result is exactly
>>>> what I want from the beggining.
>>>> For better comprehension of the problem, some interesting data are showed
>>>> below.
>>>>
>>>> But, first, for short, my question is: am I forgetting something
>>>> important that I must be doing (like a function call)?
>>>>
>>>>
>>>>> [...]
>>>>
>>>> Thanks in advance,
>>>>
>>>> --
>>>> Cristianno Martins
>>>
>>> --
>>> Cristianno Martins
>>>
>>
>
>
>
> --
> Cristianno Martins
> Mestrando em Computação
> Universidade Estadual de Campinas
> cristianno.mart...@students.ic.unicamp.br
>
> cel: (19) 8825-5731 [oi]
>      (19) 8240-3217 [tim]
> skype: cristiannomartins
> gTalk: cristiannomartins
> msn: cristiannomart...@hotmail.com
>


Zero extractions and zero extends

2010-02-09 Thread Jean Christophe Beyler
Dear all,

If I consider this code

#include <stdint.h>
typedef struct sTestUnsignedChar {
    uint64_t a:1;
} STestUnsignedChar;

uint64_t getU (STestUnsignedChar a)
{
    return a.a;
}


I get this in the DCE pass :
(insn 6 3 7 2 bitfield2.c:8 (set (subreg:DI (reg:QI 75) 0)
(zero_extract:DI (reg/v:DI 73 [ a ])
(const_int 1 [0x1])
(const_int 0 [0x0]))) 63 {extzvdi} (expr_list:REG_DEAD
(reg/v:DI 73 [ a ])
(nil)))

(insn 7 6 12 2 bitfield2.c:8 (set (reg:DI 74)
(zero_extend:DI (reg:QI 75))) 51 {zero_extendqidi2}
(expr_list:REG_DEAD (reg:QI 75)
(nil)))


(on the x86 port, I get an AND instead of the zero_extract)

However, after the combine pass both insns remain, whereas in the x86 port
the zero_extend is removed. Where exactly is this decided?
I've checked the costs of the instructions, and I have the same thing as
the x86 port.
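
To spell out why the second insn looks redundant, here is a small C model of the two insns above. It is a sketch under assumptions: bits are numbered from the least significant end, and the helper names are made up for this message. Extracting 1 bit at position 0 already yields 0 or 1, so zero-extending that byte to 64 bits cannot change the value, and the pair is equivalent to a single AND with 1.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of (zero_extract:DI x (const_int len) (const_int pos)),
   assuming bit 0 is the least significant bit.  */
static uint64_t zero_extract_di(uint64_t x, unsigned len, unsigned pos)
{
  assert(len >= 1 && pos < 64 && len <= 64 - pos);
  uint64_t mask = (len == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << len) - 1);
  return (x >> pos) & mask;
}

/* Model of insn 6 followed by insn 7.  */
static uint64_t get_u_two_insns(uint64_t a)
{
  /* insn 6: extract one bit; the low byte plays the role of reg:QI 75.  */
  uint8_t qi = (uint8_t) zero_extract_di(a, 1, 0);
  /* insn 7: zero_extend:DI of that byte.  */
  return (uint64_t) qi;
}

/* What the x86 port ends up with: a single AND.  */
static uint64_t get_u_one_and(uint64_t a)
{
  return a & 1;
}
```
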

Thanks for all your help,
Jean Christophe Beyler


Re: [GRAPHITE] Re: Loop Transformations Question

2010-02-09 Thread Tobias Grosser
On 09.02.2010 19:39, Sebastian Pop wrote:
> On Tue, Feb 9, 2010 at 12:34, Cristianno Martins
>  wrote:
>> Hi,
>>
>> Thanks for the fast reply. Only one more thing: is there some way that
>> I could force it to be signed??
> 
> I guess that you should wait the fixes from Tobias and Ramakrishna to
> CLooG and Graphite to have the type of the IV exposed by CLooG, rather
> than having the original IV type set to the generated IV.
> 
> Sebastian

Yes, it looks like this is one of the bugs Ramakrishna and I are working on.

In short, the reason for the bug seems to be that the loop induction
variable is converted to an unsigned int. Later, CLooG generates a
negative value that is assigned to the unsigned int. Instead of the
expression iv < -something being false, the signed value wraps around
to a large unsigned one, so the whole expression becomes true.
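
The wrap-around described above can be reproduced in plain C. This is only a sketch of the arithmetic, not of the Graphite or CLooG code; both function names are made up for illustration.

```c
/* Comparison as it behaves when the induction variable is unsigned:
   the bound is converted to unsigned int, so a negative bound wraps
   around to a very large value. */
int less_than_unsigned_iv(unsigned int iv, int bound)
{
    return iv < (unsigned int) bound;
}

/* The same comparison with a signed induction variable. */
int less_than_signed_iv(int iv, int bound)
{
    return iv < bound;
}
```

With iv = 5 and bound = -3, the signed version is false, while the unsigned version is true because (unsigned int) -3 is UINT_MAX - 2.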

 I will have a look into this one.

Tobias


Re: porting GCC to a micro with a very limited addressing mode --- success with LEGITIMATE / LEGITIMIZE_ADDRESS, stuck with ICE !

2010-02-09 Thread Sergio Ruocco

Michael Hope wrote:
> Hi Sergio.  Any luck so far?

Michael, thanks for your inquiry. I made some progress, in fact.

I got the GO_IF_LEGITIMATE_ADDRESS() macro to detect correctly REG+IMM
addresses, and then the LEGITIMIZE_ADDRESS() macro to force them to be
pre-computed in a register.
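
As a rough illustration of the idea (reject REG+IMM, then rewrite it as a separate add followed by a plain register address), here is a toy model in C. None of these types or functions exist in GCC; they only mimic the roles of the two macros, and the register numbers used in the usage note are taken from the dump further down.

```c
/* Toy model only: these are not GCC data structures. */
typedef struct {
    int base_reg;   /* register holding the base address */
    int offset;     /* constant displacement, 0 for a plain register */
} toy_addr;

typedef struct {
    int dest_reg;   /* register receiving base + imm */
    int src_reg;
    int imm;
} toy_add_insn;

/* Plays the role of GO_IF_LEGITIMATE_ADDRESS: only a bare register
   is accepted as an address. */
int toy_legitimate_address_p(toy_addr a)
{
    return a.offset == 0;
}

/* Plays the role of LEGITIMIZE_ADDRESS: for REG+IMM, record an add
   into new_reg and return a plain register address.  *emitted is set
   to 1 when an add was generated, 0 otherwise. */
toy_addr toy_legitimize_address(toy_addr a, int new_reg,
                                toy_add_insn *insn, int *emitted)
{
    toy_addr result = a;

    *emitted = 0;
    if (!toy_legitimate_address_p(a)) {
        insn->dest_reg = new_reg;
        insn->src_reg = a.base_reg;
        insn->imm = a.offset;
        *emitted = 1;
        result.base_reg = new_reg;
        result.offset = 0;
    }
    return result;
}
```

Legitimizing { 30, -16 } with new_reg = 4 records "reg 4 = reg 30 + (-16)" and returns the address { 4, 0 }, which mirrors the shape of insn 8 in the dump.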

However, now the compiler freaks out with an ICE.. :-/ I put some
details below. Thanks for any clue that you or others can give me.

Cheers,

Sergio

==


This is a fragment of my LEGITIMIZE_ADDRESS():
-

rtx
legitimize_address (rtx X, rtx OLDX, enum machine_mode MODE)
{
  rtx op1, op2, op, sum;
  op = NULL;
  ...
  if (GET_CODE (X) == PLUS && !no_new_pseudos)
    {
      op1 = XEXP (X, 0);
      op2 = XEXP (X, 1);
      if (GET_CODE (op1) == CONST_INT
          && (GET_CODE (op2) == REG
              || GET_CODE (op2) == SUBREG)) // base displacement
        {
          sum = gen_rtx_PLUS (MODE, op1, op2);
          op = force_reg (MODE, sum);
        }
      ...
-


Now when compiling a simple program such as:

void foobar(int par1, int par2, int parN)
{
    int a, b;
    a = 0x1234;
    b = a;
}

the instructions (insns 8, 12 and 13) which compute the addresses in registers
seem to be generated correctly:

-
;; Function foobar

;; Register dispositions:
37 in 4  38 in 2  39 in 4  40 in 2  41 in 2

;; Hard regs used:  2 4 30

(note 2 0 3 NOTE_INSN_DELETED)

(note 3 2 6 0 NOTE_INSN_FUNCTION_BEG)

;; Start of basic block 1, registers live: 1 [A1] 29 [B13] 30 [B14]
(note 6 3 8 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 8 6 9 1 (set (reg/f:HI 4 A4 [37])
(plus:HI (reg/f:HI 30 B14)
(const_int -16 [0xfff0]))) 9 {addhi3} (nil)
(nil))

(insn 9 8 10 1 (set (reg:HI 2 A2 [38])
(const_int 4660 [0x1234])) 5 {*constant_load} (nil)
(nil))

(insn 10 9 12 1 (set (mem/i:HI (reg/f:HI 4 A4 [37]) [0 a+0 S2 A32])
(reg:HI 2 A2 [38])) 7 {*store_word} (nil)
(nil))

(insn 12 10 13 1 (set (reg/f:HI 4 A4 [39])
(plus:HI (reg/f:HI 30 B14)
(const_int -14 [0xfff2]))) 9 {addhi3} (nil)
(nil))

(insn 13 12 14 1 (set (reg/f:HI 2 A2 [40])
(plus:HI (reg/f:HI 30 B14)
(const_int -16 [0xfff0]))) 9 {addhi3} (nil)
(nil))

(insn 14 13 15 1 (set (reg:HI 2 A2 [orig:41 a ] [41])
(mem/i:HI (reg/f:HI 2 A2 [40]) [0 a+0 S2 A32])) 4 {*load_word} (nil)
(nil))

(insn 15 14 16 1 (set (mem/i:HI (reg/f:HI 4 A4 [39]) [0 b+0 S2 A16])
(reg:HI 2 A2 [orig:41 a ] [41])) 7 {*store_word} (nil)
(nil))
;; End of basic block 1, registers live:
 1 [A1] 29 [B13] 30 [B14]

(note 16 15 25 NOTE_INSN_FUNCTION_END)

(note 25 16 0 NOTE_INSN_DELETED)
-

However, when I compile it

$ hcc -da foobar8.c

I get an ICE at the end of the compilation, and the assembly source is
not produced:

[ lots of my debugging output removed ]

legitimate_address2(non-strict, soft-reg allowed), X=
(reg/f:HI 29 B13)
legitimate_address2() yes: (X)==REG && non_strict_base_reg(REGNO(X))

-MOVHI--- [generating a MOV X, Y insn]
MOVHI: operands[0]
(mem:HI (reg/f:HI 29 B13) [0 S2 A8])
MOVHI: operands[1]
(reg:HI 31 B15)
MOVHI --- END


[then checking if -2(B13) is legitimate, it is not...]

legitimate_address2(non-strict, soft-reg allowed), X=
(plus:HI (reg/f:HI 29 B13)
(const_int -2 [0xfffe]))
legitimate_address2(): FOUND register+offset --> FAIL!

legitimate_address2(non-strict, soft-reg allowed), X=
(plus:HI (reg/f:HI 29 B13)
(const_int -2 [0xfffe]))
legitimate_address2(): FOUND register+offset --> FAIL!

legitimate_address2(non-strict, soft-reg allowed), X=
(plus:HI (reg/f:HI 29 B13)
(const_int -2 [0xfffe]))
legitimate_address2(): FOUND register+offset --> FAIL!

legitimate_address2(non-strict, soft-reg allowed), X=
(plus:HI (reg/f:HI 29 B13)
(const_int -2 [0xfffe]))
legitimate_address2(): FOUND register+offset --> FAIL!


[and after four checks of the address above, gcc 4.0.2 freaks out with]

foobar8.c: In function ‘foobar’:
foobar8.c:7: internal compiler error: in change_address_1, at emit-rtl.c:1800
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.


The failed assertion is in line 1800: some "addr" is not an address.

   1784 change_address_1 (rtx memref, enum machine_mode mode, rtx addr, int validate)
   1785 {
   1786   rtx new;
   1787
   1788   gcc_assert (MEM_P (memref));
   1789   if (mode == VOIDmode)
   1790     mode = GET_MODE (memref);
   1791   if (addr == 0)
   1792     addr = XEXP (memref, 0);
   1793   if (mode == GET

Re: Questions about compute_transpout in gcse.c code hoisting implementation

2010-02-09 Thread Jeff Law

On 01/26/10 16:47, Steven Bosscher wrote:

> Hello Jeff and other interested readers :-)
>
> There is a function compute_transpout() in gcse.c and there are a
> couple of things about this function that I don't understand exactly.
>
> First, the comment before the function says:
>
> "An expression is transparent to an edge unless it is killed by
> the edge itself. This can only happen with abnormal control flow,
> when the edge is traversed through a call. This happens with
> non-local labels and exceptions. "
>
> What does this mean, exactly? The implementation of the function
> simply kills all expressions that are MEM_P if a basic block ends with
> a CALL_INSN (with a wrong comment about something flow did in the gcc
> dark ages, say 15 years ago). But I don't see how compute_transpout
> handles non-local labels. And what about non-call exceptions?
   
I believe rth added this code a long time ago to avoid ever needing to
insert an insn on an abnormal edge during gcse.  IIRC the hoisting code
utilizes the same compute_transpout and thus inherits this behaviour.





> Second, it looks like gcc says that an expression can be in VBEOUT,
> but can not be hoisted. In hoist_code this is expressed like so:
>
>           if (TEST_BIT (hoist_vbeout[bb->index], i)
>               && TEST_BIT (transpout[bb->index], i))
>             {
>               /* We've found a potentially hoistable expression, now
>                  we look at every block BB dominates to see if it
>                  computes the expression.  */
>
> Why does the code hoisting pass not do the same as the LCM-PRE pass:
> Eliminate expressions it cannot handle early on? In this case,
> wouldn't it be easier (better?) to eliminate expressions that are not
> TRANSPOUT from VBEOUT in compute_vbeinout? Would it be OK if I teach
> compute_vbeinout to eliminate expressions that may trap from VBEOUT,
> if there are exception edges to successor blocks? This is similar to
> what LCM-PRE does in compute_pre_data (well, more or less, sort-of,
> etc.).
   
Hoisting was a quick transcription of Muchnick's algorithm and hasn't 
been touched too much since.  I don't see that there's anything to lose 
by pruning the expression sets early.
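
To make the two options concrete, here is a small C sketch using plain 32-bit masks in place of GCC's sbitmaps; every name in it is invented for illustration. It only shows the per-block equivalence between testing both sets late and intersecting them early. Pruning inside the dataflow iteration itself would also change what propagates to predecessor blocks, which this sketch does not model.

```c
#include <stdint.h>

/* Toy bit sets: bit i stands for expression i.  Not GCC's sbitmap. */
typedef uint32_t toy_exprset;

/* Late filtering, in the style of the hoist_code test quoted above:
   both sets are consulted for every candidate expression. */
int toy_hoistable_late(toy_exprset vbeout, toy_exprset transpout, unsigned i)
{
    return ((vbeout >> i) & 1u) && ((transpout >> i) & 1u);
}

/* Early pruning: remove non-transparent expressions from VBEOUT once. */
toy_exprset toy_prune_vbeout(toy_exprset vbeout, toy_exprset transpout)
{
    return vbeout & transpout;
}

/* After pruning, a single bit test is enough. */
int toy_hoistable_early(toy_exprset pruned_vbeout, unsigned i)
{
    return (int) ((pruned_vbeout >> i) & 1u);
}
```

For a single block, toy_hoistable_early(toy_prune_vbeout(v, t), i) agrees with toy_hoistable_late(v, t, i) for every expression index i below 32.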


jeff




gcc-4.4-20100209 is now available

2010-02-09 Thread gccadmin
Snapshot gcc-4.4-20100209 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20100209/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 156635

You'll find:

gcc-4.4-20100209.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.4-20100209.tar.bz2       C front end and core compiler

gcc-ada-4.4-20100209.tar.bz2        Ada front end and runtime

gcc-fortran-4.4-20100209.tar.bz2    Fortran front end and runtime

gcc-g++-4.4-20100209.tar.bz2        C++ front end and runtime

gcc-java-4.4-20100209.tar.bz2       Java front end and runtime

gcc-objc-4.4-20100209.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.4-20100209.tar.bz2  The GCC testsuite

Diffs from 4.4-20100202 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: insn length attribute and code size optimization

2010-02-09 Thread Richard Sandiford
Daniel Jacobowitz  writes:
> On Wed, Feb 03, 2010 at 06:23:19AM -0800, Ian Lance Taylor wrote:
>> fanqifei  writes:
>> 
>> > According to the internal manual, insn length attribute can be used to
>> > calculate the length of emitted code chunks when verifying branch
>> > distances.
>> > Can it be used in code size optimization?
>> 
>> I suppose it could, but it isn't.  Instead of asking the backend for
>> the length of instructions, the compiler asks the backend for the cost
>> of instructions.  The backend is free to determine that cost however
>> it likes.  When using -Os, using the size of the instruction is a good
>> measure of cost.
>
> It seems to me that there's a hard ordering problem here: we can't
> determine insn lengths, using the current framework, until very late.
> We need at least (A) whole instructions, not just RTL expressions; (B)
> register allocation to select alternatives; (C) branch shortening to
> determine branch alternatives.
>
> I'm curious if anyone thinks there's a generic solution to this (that
> doesn't involve a complete instruction selection rewrite :-).

Yeah, it's something I've often wanted too, since at the moment you end
up duplicating a lot of the instruction selection in C code.  E.g. the
MIPS port has stuff like:

  if (float_mode_p
      && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
      && TARGET_FUSED_MADD
      && !HONOR_NANS (mode)
      && !HONOR_SIGNED_ZEROS (mode))
    {
      /* See if we can use NMADD or NMSUB.  See mips.md for the
         associated patterns.  */
      rtx op0 = XEXP (x, 0);
      rtx op1 = XEXP (x, 1);
      if (GET_CODE (op0) == MULT && GET_CODE (XEXP (op0, 0)) == NEG)
        {
          *total = (mips_fp_mult_cost (mode)
                    + rtx_cost (XEXP (XEXP (op0, 0), 0), SET, speed)
                    + rtx_cost (XEXP (op0, 1), SET, speed)
                    + rtx_cost (op1, SET, speed));
          return true;
        }
      if (GET_CODE (op1) == MULT)
        {
          *total = (mips_fp_mult_cost (mode)
                    + rtx_cost (op0, SET, speed)
                    + rtx_cost (XEXP (op1, 0), SET, speed)
                    + rtx_cost (XEXP (op1, 1), SET, speed));
          return true;
        }
    }

Ugh!

But I could never see a cure that was better than the disease without
(as you say) a rewrite.  I think (A) is the main problem: we already
have code to estimate constraints selection before reload, so we could
at least guess at (B) and (C).  (Which is what the costs have to do
anyway really.)
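
For readers new to the cost hooks, the following is a toy C sketch of the principle Ian describes in the quoted text: one cost query whose answer depends on whether the compiler is optimizing for speed or for size. The structure and function names are hypothetical and are not part of GCC; the speed parameter only imitates the flag passed to rtx_cost in the fragment above.

```c
/* Toy description of one instruction alternative; not a GCC structure. */
typedef struct {
    int cycles;      /* estimated latency */
    int size_bytes;  /* encoded length */
} toy_insn_info;

/* Cost query: when optimizing for speed use the latency, otherwise
   (as with -Os) use the encoded length as the measure of cost. */
int toy_insn_cost(toy_insn_info insn, int speed)
{
    return speed ? insn.cycles : insn.size_bytes;
}

/* Returns 0 if alternative a is at least as cheap as b, 1 otherwise. */
int toy_pick_cheaper(toy_insn_info a, toy_insn_info b, int speed)
{
    return toy_insn_cost(b, speed) < toy_insn_cost(a, speed) ? 1 : 0;
}
```

Given a fast but long alternative { 1, 8 } and a slow but short one { 3, 2 }, the first wins when speed is nonzero and the second wins when it is zero. The hard part discussed in this thread, knowing the real length before register allocation and branch shortening, is exactly what the toy sidesteps by taking size_bytes as given.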

Richard


Linkage order in Linux

2010-02-09 Thread michael kapelko

Hello.
Recently I found a surprising requirement when compiling my own application
with the Horde3D library (http://horde3d.org/), an OpenGL 3D graphics engine.
The Horde3D library links to the shared GL library, but -lHorde3D must be
listed *before* -lGL for any application to work correctly. If I link the
application first to GL, and only then to Horde3D, it simply
segfaults when Horde3D's init calls glCreateShader, a GL library function.
We have several speculations about what makes this particular order matter
to the linker: http://horde3d.org/forums/viewtopic.php?f=2&t=384
But I'd like to know the real reason for this surprising link order
requirement.

Thanks.



Re: Linkage order in Linux

2010-02-09 Thread Ian Lance Taylor
michael kapelko  writes:

> Recently I found out a surprising requirement to compile own
> application with Horde3D library (http://horde3d.org/), OpenGL 3D
> graphics engine.
> Horde3D library links to shared GL library. But -lHorde3D must be
> listed *before* -lGL for any application to work correctly. If I link
> the application first to GL, and only then to Horde3D, then it merely
> segfaults when Horde3D's init calls glCreateShader, a GL library
> function.
> We have several speculations about what causes this particular order
> for the linker: http://horde3d.org/forums/viewtopic.php?f=2&t=384
> But I'd like to know real reason of this surprising order of linkage
> requirement.

The mailing list gcc@gcc.gnu.org is for gcc developers.  Questions
about using gcc should be taken to gcc-h...@gcc.gnu.org.  Please take
any followups to gcc-help.  Thanks.

Unix linkers are always order dependent, by design.  If -lHorde3D
requires symbols from -lGL, but links successfully even if -lHorde3D
is used after -lGL, then perhaps the references in -lHorde3D are weak
for some reason.  Or there are a few other possibilities; I'm not
familiar with the libraries in question.
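
As an illustration of the weak-reference possibility only (not a diagnosis of Horde3D), here is a small C sketch assuming GCC or Clang on an ELF target such as GNU/Linux. The symbol hypothetical_gl_create_shader is invented and deliberately left undefined; the weak attribute is a compiler extension.

```c
#include <stddef.h>

/* Hypothetical symbol standing in for a GL entry point.  It is never
   defined anywhere.  Because the reference is weak, an ELF linker
   resolves it to a null address instead of reporting an error. */
extern int hypothetical_gl_create_shader(int kind) __attribute__((weak));

/* Returns 1 if the weak symbol was resolved at link time, 0 otherwise. */
int hypothetical_symbol_resolved(void)
{
    return hypothetical_gl_create_shader != NULL;
}

/* Guarded call.  An unguarded call through an unresolved weak
   reference would jump to address 0 and crash at run time. */
int hypothetical_create_shader_checked(int kind)
{
    if (hypothetical_gl_create_shader == NULL)
        return -1;
    return hypothetical_gl_create_shader(kind);
}
```

Under those assumptions the program links without any definition of the symbol, hypothetical_symbol_resolved() reports 0, and the guarded call returns -1. A library that calls such a reference without the check would link cleanly and then segfault on the first call, which matches the shape of the symptom described in the question.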

One thing is for sure: this has nothing to do with gcc.  gcc does not
include a linker.  If you are using GNU/Linux, then your linker is
almost certainly coming from the GNU binutils, which is a different
project; see http://sourceware.org/binutils/ .

Ian