Alias-Improvements Branch
=========================

The primary objective of the branch is to transition away from the use
of the virtual operand use-def chains as data-flow representation for
memory optimizations.  Memory optimizers need to transition to the
use of the alias-oracle which is a query based system.  For the
forseeable future they can still rely on the safety-net the virtual
operand use-def chain provides.

I propose to merge the branch at the beginning of stage1 for GCC 4.5.
Bugfixes put on the branch can be merged independently.  Likewise it
should be possible to merge the points-to changes and some of the
call-clobber changes before the rest.
Other changes though are difficult or impossible to separate without
intermediate regressions.

Overview of infrastructure changes done on the branch:

 - There is a single memory symbol (.MEM) per function used to build the
   virtual operand use-def chain.  In trunk terms you can view this as
   equivalent to having a single memory partition containing all symbols.
   Due to the lack of precision computing and keeping this virtual
   operand use-def chain is trivial, all code supporting more complex
   operations has been removed.  In particular it can be assumed that
   at most a single VUSE and VDEF is available per statement.  Most code
   has been simplified according to that assumption.

 - This single-symbol virtual operand use-def chain safety-net is also
   available during early optimizations (in fact, always if the IL is
   in SSA form).  This removes the need for special-casing memory
   statements during that compilation phase and enables to use common
   infrastructure during early (and IPA) optimization phases.  It also
   allows to schedule regular memory optimization passes during early
   optimization.

 - The virtual operand use-def chain is kept up-to-date by the
   operand-scanner on the branch.  There is no need to manually mark
   any symbols for renaming (instead this is now useless and costly,
   code auditing is still necessary here).  As there cannot be magically
   more, less or different memory symbols required (there's only one)
   optimization passes usually know if they are removing a load or a
   store and so can act properly (or rely on the operand scanner which
   updates virtual SSA form manually in the simple cases).

 - The at most two virtual operands (a VUSE and a VDEF) are now real
   operands (there is a vuse and a vdef member in the gimple structure
   for memory statements).  This means they can share the infrastructure
   present for regular operands and immediate uses.  All infrastructure
   dealing with the old virtual operands has been removed.  The operand
   structures are linked in the normal use and defs lists, the special
   lists for virtual uses and defs has been removed.

   This means that operand iterators have been simplified a lot.  The
   possibility to iterate over only virtual operands has been removed
   (there is only at most a single one, directly accessible via
   gimple_v{use,def}_op).  Most bulk changes in optimizers had to be
   done because of this simplification.
 
 - Global data that was not kept in sync and is either unused on the
   branch or very hard to keep up-to-date has been removed.  This includes
   the bitmap of addressable variables and the escape type per variable
   and pointer.

 - Points-to information has been abstracted and is directly used for
   pointer information as well as call-clobber related information.  All
   existing call-clobber related code has been removed.

 - Numerous fixes to the points-to analysis code have been done.

 - The alias-oracle present on the trunk has been extended to also
   use points-to alias information.

 - The FRE and PRE passes using the common SCCVN infrastructure have been
   improved by improving the memory statement walking infrastructure of
   the alias-oracle and its precision.

 - The tree DSE pass has been improved to compensate the loss of DSE
   done by the tree DCE pass on the trunk (but see below).

The alias-oracle and its interface is contained in the tree-ssa-alias.[ch]
files.  Common with the trunk is the refs_may_alias_p type of query
which has been accompanied by statement based clobber / use queries.

Preliminary performance measurements show comparable performance for
SPEC 2000, Polyhedron and our usual set of C++ benchmarks.
SPEC 2006 measurements are on the way, all on x86_64.
Daily testing can be monitored beyond http://gcc.opensuse.org/,
look for "alias-improvements" branch annotation and compare with the
other runs on the frescobaldi machine.  Performance comparisons on other
architectures are welcome.

In general the trade-off with the new scheme is that walking statements
via the virtual use-def links will be slower because possibly (a lot)
more statements are visited.  This is offsetted by leaner operand
iterators, cheaper operand scanning and updating and less memory use
for maintaining the virtual operands.  And most important, a lot less
code to maintain.


Current shortcomings ("regressions") present on the branch, which I do
not plan to work on before the merge.

 - Uninitialized warnings for memory which depend on the virtual operands
   will no longer work (but in very small testcases).
    FAIL: gcc.dg/uninit-B.c uninit i warning (test for warnings, line 12)
    FAIL: gcc.dg/uninit-pr19430.c  (test for warnings, line 32)
    FAIL: gcc.dg/uninit-pr19430.c uninitialized (test for warnings, line 41)

 - A trick on the 4.4 branch shadowed the issue with the C/C++ frontend
   using alias set zero for char and how that relates to char members
   in structures (their alias sets have a zero member).
    FAIL: gcc.dg/tree-ssa/pr27799.c (test for excess errors)

 - DCE no longer removes as many dead stores as on trunk.
    FAIL: gcc.dg/tree-ssa/loop-36.c scan-tree-dump-not dce2 "c.array"
    FAIL: gcc.dg/tree-ssa/ldist-11.c scan-tree-dump-times ldist "distributed: 
split to 2 loops" 1

There are also some XPASSes.  Likewise gazillions of copile-time and
memory-hog related bugs should be fixed (and an unknown amount of new
ones introduced).  A bunch of optimization related PRs are fixed as well.


There are a number of tasks left to do, either on the branch or (prefered)
on trunk after merging the branch.

 - The pass-pipeline needs re-tuning.  We re-compute points-to information
   too often and at non-optimal places (is no longer necessary for correctness
   to "re-compute" alias information).

 - Special "early" passes can get removed, merged or be made use of
   generally in the optimization pipeline.

 - The DCE / DSE situation needs to be resolved.
   On the trunk DCE does remove far more dead stores than DSE does.  On
   the branch dead store removal from DCE is severely pessimized (because
   it still relies on the virtual operands).  DSE is improved on the branch,
   but it runs too late and not often enough and is also very simple.
   The best solution is to remove the DSE pass and do all (and proper)
   dead store removal from within DCE, but this is quite an effort and
   as can be seen from benchmark numbers not a blocker for merging the
   branch.


Comments / Performance measurements / code-reviews welcome.

Thanks,
Richard.

Reply via email to