Darthrader (sent by Nabble.com) wrote:
Is the AST (Abstract Syntax Tree) machine dependent?

The nodes that are generated for the program as it is parsed are machine independent. However:

1) This holds only once your source code has been preprocessed; otherwise, the code GCC actually sees differs from what you wrote. If you use the "-save-temps" option, you will find a .i file, which is the input GCC actually built the parse tree from.

2) Some tree nodes are implicit in the compiler and are always created, even if the program does not mention them. You can imagine that every program began with something like:

   typedef void __attribute__ ((mode (QI))) char;
   typedef void __attribute__ ((mode (HI))) short;
   typedef void __attribute__ ((mode (SI))) int;
   typedef void __attribute__ ((mode (__word__))) long;
   typedef void __attribute__ ((mode (DI))) long long;

and so on (which is not really valid C code, of course!). GCC's syntax tree nodes are heavily annotated, and the annotations on these implicit nodes can be machine dependent: for example, long may be 4 bytes wide on some machines and 8 bytes on others. In your case, "algn" was the alignment required by the type.

3) Some of this information percolates from the types to other tree nodes, such as variables. So your ".original" dump could contain some differences in the tree nodes for variables: these come from the machine-dependent type nodes I just described.

4) Finally, some minor optimizations in fold-const.c are performed before the ".original" dump is emitted, so you could also see differences because of them.

You probably know that a compiler is roughly divided into a front end (which reads the source code), a middle end (which works on an intermediate representation), and a back end (which uses the intermediate representation to emit assembly).

Very roughly speaking, GCC's middle end starts in fold-const.c. It uses more or less four intermediate representations: GENERIC (a standardized form of syntax trees), GIMPLE (an even more standardized and simplified form of syntax trees), RTL, and "strict" RTL (where each instruction and each operand has an almost 1-1 mapping with the assembly language instructions and operands). In practice it is a bit more complicated, because GENERIC is transformed into GIMPLE "slowly" and gradually.

GCC's syntax trees are fairly machine independent, independent enough that they work well as a machine-independent intermediate representation for many optimization passes early in the compiler pipeline. As GCC switches from tree nodes to RTL and then to strict RTL, more and more machine-dependent pieces creep into its intermediate representation. In the end, things are so machine dependent that most non-trivial changes to the RTL middle end had better be tested on more than one machine: they can work like a charm on i686 and break, say, powerpc or arm or s390, or all of these.

Paolo
