Darthrader (sent by Nabble.com) wrote:
Is the AST[Abstract Syntax Tree] machine dependent?
The nodes that are generated for the program as it is parsed are machine
independent. However:
1) This can only be true if your source code is already preprocessed.
Otherwise, the source code that GCC really sees is different. If you
use the "--save-temps" option, you will find a .i file, which is what GCC
has built the parse tree from.
2) Some tree nodes are implicit in the compiler, and are always created
even if the program does not mention them. You can imagine that every
program has something like this at the top:
typedef void __attribute__ ((mode (QI))) char;
typedef void __attribute__ ((mode (HI))) short;
typedef void __attribute__ ((mode (SI))) int;
typedef void __attribute__ ((mode (__word__))) long;
typedef void __attribute__ ((mode (DI))) long long;
(which is not really valid C code, of course!) and so on. GCC's
syntax tree nodes are heavily annotated with information on the nodes,
and the information for these implicit nodes can be machine dependent:
for example, long may have 4 bytes on some machines and 8 bytes on
others. In your case, "algn" was the alignment required by the type.
3) Some of the information percolates from the types to other tree
nodes, such as the variables. So your "original" dump could contain
some differences in the tree nodes for variables: these come from the
machine dependent nodes I just described.
4) Finally, some minor optimizations in fold-const.c are performed
before the ".original" dump is emitted, so you could see differences
because of that.
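
As a rough sketch of points 1) to 4), consider a small file like the one
below. The file name ast_demo.c and all the function names are made up
for this illustration, and _Alignof assumes a C11-capable compiler.

```c
/* ast_demo.c: hypothetical file name, used only for this illustration. */
#include <limits.h>   /* LONG_MAX is a macro, gone after preprocessing */
#include <stddef.h>   /* size_t */

/* Point 1: the parser sees the expansion of LONG_MAX, not the name. */
long biggest_long(void) { return LONG_MAX; }

/* Point 2: the size and alignment of long are fixed by the target. */
size_t long_size(void)  { return sizeof(long); }
size_t long_align(void) { return _Alignof(long); }

/* Point 3: this variable inherits size and alignment from its type. */
long counter;

/* Point 4: a constant expression that is a candidate for early folding. */
int folded(void) { return 2 * 3 + 4; }
```

Compiling it with something like "gcc -c -save-temps
-fdump-tree-original-raw ast_demo.c" on two different targets and
diffing the results should show the kinds of differences listed above:
the .i file carries the target's expansion of LONG_MAX, the type and
variable nodes carry the target's size and "algn" values, and the body
of folded() should already appear as a single constant. The exact dump
file names and contents depend on the GCC version.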
You probably know that a compiler is roughly divided into a front-end
(reads source code), a middle-end (works on an intermediate
representation), and a back-end (uses the intermediate representation to
emit assembly).
Very roughly speaking, GCC's middle-end starts in fold-const.c. GCC's
middle-end uses more or less four intermediate representations: GENERIC
(a standardized form of syntax trees), GIMPLE (an even more standardized
and simplified form of syntax trees), RTL, and "strict" RTL (where each
instruction and each operand has an almost 1-1 mapping with the assembly
language instructions and operands). In practice it is a bit more
complicated because GENERIC is transformed to GIMPLE "slowly" and gradually.
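
To watch one function travel through these representations, a tiny
input such as the following is enough. The function name is made up for
this illustration.

```c
/* One compound expression in the source. In GIMPLE it is expected to be
   split into several simple statements that use temporaries. */
int scale_and_add(int a, int b, int c)
{
    return (a + b) * c - a;
}
```

Compiling with "-fdump-tree-original -fdump-tree-gimple
-fdump-rtl-expand" should leave one dump file per representation next to
the object file: the tree dumps read much like C, while the RTL dump is
a list of instruction patterns that already depends on the target. The
exact output varies with the GCC version and the machine.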
GCC's syntax trees are pretty much machine independent, independent
enough that they work well as a machine independent intermediate
representation for many optimization passes at the beginning of the
compiler pipeline. As GCC switches from tree nodes to RTL and to strict
RTL, more and more machine dependent pieces creep into its intermediate
representation. In the end, things are so machine dependent that most
non-trivial changes to the RTL middle-end had better be tested on more
than one machine: they can work like a charm on i686 and break, say,
powerpc or arm or s390 or all of these.
Paolo