I sent this discussion of how one might go about large-scale target macro
removal in response to an off-list enquiry last month, but it may be of
more general interest.
--
Joseph S. Myers
jos...@codesourcery.com
---------- Forwarded message ----------
Date: Tue, 20 Jan 2015 18:04:27 +0000 (UTC)
From: Joseph Myers <jos...@codesourcery.com>
Subject: Re: target macro removal
Say you want to convert all (or nearly all) 680 (or thereabouts) target
macros into hooks, and have several person-months to spend on this
conversion. (Much the same applies even if dealing with smaller subsets
such as all target macros used in front ends.)  This won't by itself give
target-independence, even in code that no longer needs to include tm.h - in
particular, option handling involves a global enumeration of all options
and brings defines relating to one part of the compiler into other parts
of the compiler (similarly, insn-* files would also need considering) -
but it's a reasonable starting point and we can discuss further
target-dependence removal after the target macro removal.
Although this would involve 680 conversion patches (except where it makes
sense to convert a set of closely related target macros at once), it
should not need to involve 680 manually-written patches. Rather, if doing
a large-scale target macro removal project I think a good starting point
would be to write a set of robust Python scripts that (a) parse the
structure of GCC source code at the preprocessor level (so understanding,
for example, what macros are used directly in #if / #elif conditions; what
are used indirectly through being in the expansion of another macro used
in such a condition; and, for each macro definition, what #if conditions
apply for that definition to be active), and (b) can carry out
refactorings based on that understanding. The results of such a
refactoring may need manual editing where e.g. it's hard to get the
scripts to get the formatting of new hooks completely right, or where the
English wording of the documentation of the macro, converted to
documentation for a hook, needs fine-tuning, but having refactoring
scripts should save a lot of work with the actual conversions.
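As a rough illustration of part (a), here is a minimal Python sketch of the three pieces of preprocessor-level understanding mentioned above: macros used directly in conditions, macros reached indirectly through expansions, and the conditions enclosing each definition. All function names are hypothetical, and the sketch assumes comments have already been stripped from the text; it is a starting point, not a complete parser.

```python
import re

_IDENT = re.compile(r'\b[A-Za-z_]\w*\b')
_DIRECTIVE = re.compile(r'^[ \t]*#[ \t]*(\w+)[ \t]*(.*)$')


def _logical_lines(text):
    """Splice backslash-continued lines, then split into lines."""
    return re.sub(r'\\[ \t]*\n', '', text).splitlines()


def macros_in_conditions(text):
    """Identifiers used directly in #if/#elif/#ifdef/#ifndef conditions."""
    found = set()
    for line in _logical_lines(text):
        m = _DIRECTIVE.match(line)
        if m and m.group(1) in ('if', 'elif', 'ifdef', 'ifndef'):
            found.update(_IDENT.findall(m.group(2)))
    found.discard('defined')
    return found


def with_indirect_uses(direct, definitions):
    """Close `direct` over macro expansions.

    `definitions` (hypothetical input) maps a macro name to its
    replacement text."""
    seen, todo = set(), list(direct)
    while todo:
        name = todo.pop()
        if name not in seen:
            seen.add(name)
            todo.extend(_IDENT.findall(definitions.get(name, '')))
    return seen


def _render(earlier, current):
    """Describe one open conditional: earlier branches negated, then this one."""
    parts = ['!(%s)' % cond for cond in earlier]
    if current is not None:
        parts.append('(%s)' % current)
    return ' && '.join(parts)


def definition_conditions(text):
    """List (macro, enclosing conditions) for every #define in `text`."""
    stack = []    # one [earlier_branches, current_branch] pair per open #if
    result = []
    for line in _logical_lines(text):
        m = _DIRECTIVE.match(line)
        if not m:
            continue
        kind, rest = m.group(1), m.group(2).strip()
        if kind == 'if':
            stack.append([[], rest])
        elif kind == 'ifdef':
            stack.append([[], 'defined(%s)' % rest])
        elif kind == 'ifndef':
            stack.append([[], '!defined(%s)' % rest])
        elif kind in ('elif', 'else') and stack:
            earlier, current = stack[-1]
            earlier.append(current)
            stack[-1][1] = rest if kind == 'elif' else None
        elif kind == 'endif' and stack:
            stack.pop()
        elif kind == 'define' and _IDENT.match(rest):
            result.append((_IDENT.match(rest).group(0),
                           [_render(e, c) for e, c in stack]))
    return result
```

Include guards show up as ordinary enclosing conditions here; a real tool would probably want to recognise and drop them.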
Now, such refactoring scripts do not need to handle the fully general case
so that one script can handle converting all 680 macros. It's quite
reasonable to have a script that detects problems and gives up, and
separate refactorings to make things ready for that script. And some of
the preparation patches might well be completely manual.
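A "detect problems and give up" check might look something like the following sketch. It assumes a use list in the form the attached scripts appear to emit: one "MACRO path" pair per line, with "#" before the path for a use in #if / #elif and "=" for a file built for the target. The exception and function names are hypothetical.

```python
class NotConvertible(Exception):
    """Raised when a macro still has uses that a hook cannot replace."""


def parse_use(line):
    """Split one use-list line into (macro, path, in_if, target_side)."""
    macro, where = line.split()
    in_if = where.startswith('#')
    where = where.lstrip('#')
    target_side = where.startswith('=')
    return macro, where.lstrip('='), in_if, target_side


def check_convertible(macro, use_lines):
    """Return quietly if `macro` looks convertible, else raise."""
    problems = []
    for line in use_lines:
        name, path, in_if, target_side = parse_use(line)
        if name != macro:
            continue
        if in_if:
            problems.append('%s: used in #if/#elif' % path)
        if target_side:
            problems.append('%s: used in code built for the target' % path)
    if problems:
        raise NotConvertible('%s: %s' % (macro, '; '.join(problems)))
```

A conversion script would call check_convertible first and leave any macro that raises for a separate preparatory patch.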
I listed the main problem cases in
<https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02046.html>. Code built
for the target simply cannot use hooks, which are for the host; tests in
#if, whether direct or indirect, again simply cannot use hooks. The
following approaches apply to fixing such cases to prepare for hook
conversions; I think of these as high priorities because they open the way
to automatic refactorings converting more macros to hooks.
In all cases, for multi-target changes it's a good idea to test, as a
minimum, building cc1 for all affected architectures (with
--enable-werror=always building starting from a native compiler of the
same version, if possible; see contrib/config-list.mk for a more thorough
check, though I don't think the full set of targets there is needed for
every patch), as well as a clean bootstrap and regression test for at
least one configuration.
(a) Where a target macro is used in code built for the host and in code
built for the target, predefine a macro with -fbuilding-libgcc and then
make the code built for the target use that new predefine; in general I
think such patches would be manually written rather than generated by
refactoring scripts. Note that:
(i) Sometimes an existing macro may suffice (e.g. LONG_LONG_TYPE_SIZE in
target code could be changed to __SIZEOF_LONG_LONG__ * __CHAR_BIT__).
(ii) Sometimes a literal one-to-one conversion may not be cleanest (e.g.
<https://gcc.gnu.org/wiki/Top-Level_Libgcc_Migration> lists
GTHREAD_USE_WEAK SUPPORTS_WEAK TARGET_ATTRIBUTE_WEAK - do all really need
separate defines?). But if it's not clear how to do a cleaner conversion,
it may be best to do a literal conversion and leave further cleanup to
later.
(iii) Some source files are used both for the host and for the target.
Consider, for example, libgcc/libgcov.h. The code inside "#ifndef
IN_GCOV_TOOL" is for the target, so it needs e.g. LONG_LONG_TYPE_SIZE
converted as above (and BITS_PER_UNIT - whether BITS_PER_UNIT strictly
counts as a target macro now is unclear, but it comes from host-side code
so is best treated as one for target-side code). But other code is for
the host - or for both host and target and so needs handling accordingly.
Much the same applies to various files under gcc/ada/ - they are built for
both host and target.
(iv) After such a change there may still be other obstacles to converting
to a hook (e.g. the macro may be used in #if on the host). So there may
be multiple incremental changes for a macro before it becomes possible to
convert it to a hook.
(b) Where a target macro is used only in code built for the target, it can
be moved to libgcc_tm.h (the headers in libgcc/config/). It's not
problematic to have target macros defined there. I think it should be
possible to generate such patches by refactoring scripts (or by hand -
there aren't that many). The scripts would need to ensure that the macros
are defined in libgcc_tm.h under the same conditions as in the host tm.h -
this includes making sure the header lists in libgcc/config.host
correspond appropriately to those in gcc/config.gcc (some manual checking
might be involved there) and that any conditionals in the gcc/config/
headers controlling when a macro is defined are also appropriately
reflected in the libgcc/config/ header. Note that:
(i) In some cases, a macro in this category might be so closely related to
one that's also used on the host that it's better to handle both the same
(i.e. define macros with -fbuilding-libgcc instead of moving the macro to
libgcc_tm.h).
(ii) Some libgcc files include tm.h but not libgcc_tm.h (unwind-seh.h and
config/cr16/unwind-cr16.c at least; maybe more). If you change a macro
used in such a file, you need to add the missing libgcc_tm.h includes.
(iii) Some code built for the target outside the libgcc/ directory may not
include libgcc_tm.h - this could include e.g. libobjc and some Ada files.
If such code uses the macro in question and can't readily be made to
include libgcc_tm.h then it may be necessary to use the -fbuilding-libgcc
approach instead for that code.
(iv) Sometimes a macro used for the target may have a definition that
depends on other macros that are also used for the host (whether a
dependency in #if conditionals, or an expansion using those other macros).
(v) When a target macro stops being used on the host it should be poisoned
in system.h - this applies whether it was converted to a hook, moved to
libgcc_tm.h, or eliminated from host code in any other way.
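The poisoning step in (v) is easy to forget, so a refactoring script could check it. This is a minimal sketch, assuming the poisoning is done with "#pragma GCC poison" lines (possibly backslash-continued) and that the caller supplies the list of macros no longer used on the host; the function names are hypothetical.

```python
import re

_POISON = re.compile(
    r'^[ \t]*#[ \t]*pragma[ \t]+GCC[ \t]+poison[ \t]+(.*)$', re.MULTILINE)


def poisoned_identifiers(system_h_text):
    """Collect every identifier named in a #pragma GCC poison line."""
    text = re.sub(r'\\[ \t]*\n', ' ', system_h_text)  # splice continuations
    names = set()
    for m in _POISON.finditer(text):
        names.update(m.group(1).split())
    return names


def missing_poison(removed_macros, system_h_text):
    """Macros removed from host code that are not yet poisoned."""
    return sorted(set(removed_macros) - poisoned_identifiers(system_h_text))
```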
(c) Where a target macro is used in #if or #ifdef or #elif, it's a good
idea to convert to a more restricted pattern of defining the macro to a
default definition if not already defined, with that default going in
defaults.h and all #if uses elsewhere being removed - such a restricted
pattern is more amenable to automatic refactoring into a hook. Of course,
you need to take care not to change the semantics in the process. If a
macro was tested with #ifdef before, definitions to empty or 0 or 1 would
all have had the same effect. If you're changing the macro to be 0/1
valued then existing definitions need to be made to define it to 1 and the
#if tests need to change to C "if (MACRO)" tests.  Or, if you have e.g.

  #ifdef MACRO
    if (MACRO (args))
      {
        lots of code
      }
  #endif

then the new default definition might be "#define MACRO(args) 0".  There
are probably lots of other cases, each of which requires understanding the
code enough to satisfy yourself, and explain in the patch submission
write-up, why the semantics are not changed for any target.
In some cases, the defaults already exist - just not in defaults.h. A
move to defaults.h is simple, but still needs checking that you don't e.g.
have different defaults in different source files, or other source files
using #ifdef/#ifndef on the macro in a way that would be affected by
adding a default definition to defaults.h.
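The check for differing defaults can be partly mechanised. The sketch below assumes defaults are written as a plain #ifndef / #define / #endif triple on consecutive lines; anything more elaborate would be missed and still needs checking by hand. The names and the dict-of-file-contents input are hypothetical.

```python
import re
from collections import defaultdict

_DEFAULT = re.compile(
    r'^[ \t]*#[ \t]*ifndef[ \t]+(\w+)[ \t]*\n'
    r'[ \t]*#[ \t]*define[ \t]+\1\b([^\n]*)\n'
    r'[ \t]*#[ \t]*endif',
    re.MULTILINE)


def find_defaults(text):
    """Yield (macro, normalised default) for each #ifndef/#define/#endif."""
    for m in _DEFAULT.finditer(text):
        yield m.group(1), ' '.join(m.group(2).split())


def conflicting_defaults(files):
    """`files` maps path -> contents.

    Returns {macro: {default: [paths]}} for macros given more than one
    distinct default definition."""
    seen = defaultdict(lambda: defaultdict(list))
    for path, text in files.items():
        for macro, value in find_defaults(text):
            seen[macro][value].append(path)
    return {macro: dict(values)
            for macro, values in seen.items() if len(values) > 1}
```

For function-like macros the parameter list is compared as part of the default, so "MACRO(a) 0" and "MACRO(x) 0" would be reported as different.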
It's likely such patches are largely manually written. Each such patch
reduces the risk of GCC changes breaking the build for targets they
weren't tested for, by reducing the amount of code that's conditionally
compiled (if (0) code still gets checked for syntax, not referring to
undeclared variables, etc., whereas #if 0 code doesn't).
(d) Some target macros are used in contexts such as enum definitions, case
labels and array or bit-field sizes that can't readily be changed to
hooks. Let's ignore these for now. These mainly relate to the RTL parts
of the compiler and we can take it that front ends and GIMPLE optimizers
are higher priority to wean off target macros. A design for
target-independence for these few macros will be harder. It's best also
to ignore BITS_PER_UNIT for now except for target-side code. (It's no
longer defined in tm.h anyway - rather, the definition is output by
genmodes - so uses of BITS_PER_UNIT don't require you to include tm.h.)
(e) Now let's suppose you have a target macro or macros to which the above
issues do not apply - probably hundreds right now, and the vast bulk of
target macros after cleanups (a), (b), (c) are applied to all macros for
which they are applicable. You wish to do a target hook conversion. This
includes moving the documentation of the macro to target.def (CC me on the
patch and say you want docstring relicensing approval), appropriately
edited, with an @hook line going in tm.texi.in.  It includes setting a
default hook definition (typically from one of the files such as hooks.c,
if such a hook is available), and adding hook definitions for each target
whose definition was not the default. You'll need to select the prototype
for the hook manually - but then the replacement of macro calls by
function calls will eliminate a potential source of architecture-specific
build failures from type differences.
(i) Some targets define their target structure at the bottom of <arch>.c.
Others define it near the top of <arch>.c, which requires forward
declarations of all the functions used as hooks. Any refactoring scripts
will need to allow for this variation. (My view would be that all targets
should define it at the bottom of <arch>.c, and generally topologically
sort static functions to reduce the need for forward declarations - if you
get consensus for that on the mailing list you could do a preliminary
refactoring pass to move all targets to that approach, so subsequent
refactorings don't need to deal with this issue.)
(ii) Sometimes the target macro has the same definition for all OS targets
for an architecture. These are the simple cases to convert. Sometimes it
depends on the target OS or other aspects of configuration (e.g. being
defined in <arch>/<os>.h - or being undefined there, or being defined in
one header based on macros defined in another). Refactoring tools will
need to take account of this. Typically hooks are functions not data so
can have conditionals, e.g. "if (IS_LINUX_TARGET) return 2; else return
3;". That is, you can move from a tm.h target macro visible to the whole
compiler to an architecture-specific macro visible only within the back
end. While it would be desirable to eliminate such macros as well (with
e.g. a back-end-specific target structure) I think that's another thing to
defer and separate from the main target macro removal.
(iii) Some target macros are used not just in the compilers proper but
also in the driver, or are used only in the driver. ("driver" includes
collect2 and lto-wrapper for these purposes, and front-end-specific
drivers.) Those used in both places can go in the existing "common"
target structure. Those used only in the driver would go in a new driver
target structure. That driver target structure would probably be defined
in a separate C file including driver_tm.h (given the extent to which such
macros are OS-specific); the driver/config/ refactoring, as I noted in
<https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02046.html>, would be
similar to the move of macros to libgcc/config/ in that the refactoring
tool needs to take care of #if structure of tm.h headers (and in that the
definitions may depend on other tm.h macros used outside the driver). It
might well make sense to defer the move for macros used in the driver,
since they are mostly well-separated from those used elsewhere.
(iv) Sometimes the correct design for hooks is not a direct one-to-one
conversion from macros. If there's a group of closely-related macros,
such as *_TYPE_SIZE or the *_TYPE macros for various typedefs that
currently expand to strings, it's best to start by discussing on the GCC
mailing list what the right design for corresponding hooks is.
(f) For information I attach my scripts for listing and classifying target
macros. Note that these have false positives (and maybe false negatives),
as well as hardcoded paths, but they may be a helpful starting point for
identifying target macros and where they are used. In the semi-automated
refactoring approach I envisage above, I'd expect an early step to be
replacing these scripts by a rather more robust and general set of Python
modules that deal with understanding the target macro structure of GCC
source code.
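As one possible first building block for such modules, here is a Python sketch of the lexing step that the attached Perl script performs: splice continued lines, drop comments, and empty out string and character literals so that later passes only see code. The function name is hypothetical, and unlike the Perl script this version silently skips an unterminated quote instead of warning.

```python
import re

# Alternatives are tried at each position, so whichever construct starts
# first wins: a quote inside a comment, or "//" inside a string, is inert.
_SKIP = re.compile(
    r'//[^\n]*'                        # line comment
    r'|/\*.*?\*/'                      # block comment, possibly multi-line
    r'|"(?:[^"\\\n]|\\[^\n])*"'        # string literal
    r"|'(?:[^'\\\n]|\\[^\n])*'",       # character literal
    re.DOTALL)


def _replace(match):
    text = match.group(0)
    if text.startswith('//'):
        return ''              # the newline after the comment is kept
    if text.startswith('/*'):
        return ' '             # a block comment acts as one space
    return text[0] * 2         # "" or '' in place of the literal


def strip_comments_and_literals(source):
    """Return `source` with comments removed and literals emptied."""
    source = source.replace('\r\n', '\n')
    source = re.sub(r'\\[ \t]*\n', '', source)   # splice continued lines
    return _SKIP.sub(_replace, source)
```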
--
Joseph S. Myers
jos...@codesourcery.com
#! /usr/bin/env bash
# Run this from the toplevel GCC source directory. Produces a list of
# target macros: macros defined in headers in gcc/config/ or in
# defaults.h or in tm_defines in config.gcc and used outside
# gcc/config/ and gcc/common/config/ (or in explicitly listed
# target-side gcc/config/ files). This excludes macros defined only
# in generated files, including those from config.in and from .opt
# files as well as those such as HAVE_<insn> from generator programs.
# It also excludes macros defined via makefiles. For uses, it
# excludes uses inside gcc/config/ except for the explicitly listed
# target-side files. Macros defined in libgcc/config/ are also
# excluded now those headers are no longer included on the host.
set -e
outdir=$HOME/gcc/target-macros
if [ "$1" ]; then
    outdir=$1
fi
script_dir=$(cd $(dirname "$0") && pwd)
process_script=$script_dir/process-source-file
# gcc/config/ headers, plus gcc/defaults.h.
config_file_list=$outdir/tmac-config-files
# Files outside gcc/config/ and gcc/common/config/ that might
# potentially use a macro from a gcc/config/ header.
non_config_file_list=$outdir/tmac-non-config-files
# List of potential target macros.
maybe_target_macro_list=$outdir/tmac-maybe-target-macros
# List of target macro uses.
target_macro_use_list=$outdir/tmac-target-macro-uses
# List of files using tm.h but not directly using target macros.
target_macro_no_use_list=$outdir/tmac-stray-tm-h
# defaults.h is included to catch some target macros with no
# definition, or only defined there by derivation from other macros
# but used in target code.  Intrinsic headers are excluded as they are
# installed target code rather than providing macros saying how to
# configure the libraries or GCC. frv-asm.h likewise.
{
    echo gcc/defaults.h
    find gcc/config -name '*.h'
} | sort | egrep -v \
    '^(gcc/config/(.*/xm-.*\.h|.*[0-9a-z]intrin\.h|arm/arm_neon\.h|m68k/math-68881\.h|i386/cpuid\.h|i386/mm3dnow\.h|i386/cross-stdarg\.h|mips/loongson\.h|rs6000/(ppc-asm|altivec|spe|ppu_intrinsics|paired|spu2vmx|vec_types|si2vmx)\.h|alpha/va_list\.h|sh/.*shmedia\.h|spu/(spu_intrinsics|spu_internals|vmx2spu|spu_mfcio|vec_types|spu_cache)\.h|frv/frv-asm\.h))$' \
    > $config_file_list
{
    # Empirical list of directories using tm.h.
    # .s .S .asm files excluded on the basis that tm.h has C
    # declarations not just macros so cannot be used there.
    find gcc libdecnumber libgcc libobjc \
        -name '*.c' -o -name '*.h' -o -name '*.cc' -o -name '*.def' \
        | sort | egrep -v '^gcc/(config|testsuite|common/config)/'
} | sort > $non_config_file_list
: > $non_config_file_list.new
for f in $(cat $non_config_file_list); do
    target=false
    case $f in
        (lib*)
            target=true
            ;;
        (gcc/coretypes.h | gcc/defaults.h)
            ;;
        (*)
            if egrep -q 'COPYING\.RUNTIME|if *you *link' $f; then
                target=true
            fi
            ;;
    esac
    if $target; then
        echo "=$f" >> $non_config_file_list.new
    else
        echo "$f" >> $non_config_file_list.new
    fi
done
mv $non_config_file_list.new $non_config_file_list
: > $maybe_target_macro_list
for f in $(cat $config_file_list); do
    $process_script $f >> $maybe_target_macro_list
done
tm_defines_list=$(grep tm_defines= gcc/config.gcc | perl -pe \
    's/^.*?tm_defines=//; s/\"//g; s/\$\{?tm_defines\}?//g; s/=\S*//g; s/;;//g; s/'\''//g; s/SUPPORT_\`.*//g; s/\$sh_.*//g;')
for d in $tm_defines_list; do
    echo $d >> $maybe_target_macro_list
done
sort < $maybe_target_macro_list | uniq | egrep -v \
    '^(__int64|ALTIVEC_VECTOR_MODE|PV_FOR|RA_REGNUM|REG_AT|SP_REGNUM|UNW_FLAG_EHANDLER|UNW_LENGTH|UNW_FLAG_UHANDLER|R_LR)$' \
    > $maybe_target_macro_list.new
mv $maybe_target_macro_list.new $maybe_target_macro_list
: > $target_macro_use_list
: > $target_macro_no_use_list
for f in $(cat $non_config_file_list); do
    fname="${f#=}"
    $process_script $fname $maybe_target_macro_list \
        | sed -e "s|\$| $f|" -e "s|# | #|" > $target_macro_use_list.tmp
    cat $target_macro_use_list.tmp >> $target_macro_use_list
    if grep -q '"tm\.h"' $fname && ! [ -s $target_macro_use_list.tmp ]; then
        echo $fname >> $target_macro_no_use_list
    fi
    rm $target_macro_use_list.tmp
done
#! /usr/bin/perl -w
# $ARGV[0] names a source file from which target macros, or macro
# uses, are to be extracted. If $ARGV[1] is defined, it names a file
# with a list of target macros, and uses of those macros should be
# checked for; otherwise a list of definitions should be printed.
undef $/;
$source = $ARGV[0];
if ($#ARGV >= 1) {
    $macro_list_file = $ARGV[1];
    $print_uses = 1;
} else {
    $print_uses = 0;
}
open(SOURCE, "<$source") || die("open $source: $!\n");
$contents = <SOURCE>;
close(SOURCE) || die("close $source: $!\n");
$contents = "\n$contents\n\n";
$contents =~ s/\r\n/\n/g;
$contents =~ s/\\[ \t]*\n//g;
$contents =~ s/[ \t]*\n[ \t]*/\n/g;
$left = "";
while ($contents ne "") {
    $contents =~ s/^((?:[^\/\"\']|\/(?![\/\*]))*)//s;
    $left = "$left$1";
    if ($contents =~ s/^\/\/[^\n]*\n//s) {
        $left = "$left\n";
    } elsif ($contents =~ s/^\/\*.*?\*\///s) {
        $left = "$left ";
    } elsif ($contents =~ s/^\"(?:[^\"\\\n]|\\[^\n])*\"//s) {
        $left = "$left\"\"";
    } elsif ($contents =~ s/^\'(?:[^\'\\\n]|\\[^\n])*\'//s) {
        $left = "$left\'\'";
    } elsif ($contents ne "") {
        warn "Lex error in $source\n";
        $contents = "";
    }
}
$left =~ s/[ \t]*\n[ \t]*/\n/g;
$left =~ s/[ \t]+/ /g;
$left =~ s/\n\# /\n\#/g;
$left =~ s/\n+/\n/g;
if ($print_uses) {
    open(MACROS, "<$macro_list_file") || die("open $macro_list_file: $!\n");
    $maclist_text = <MACROS>;
    close(MACROS) || die("close $macro_list_file: $!\n");
    @maclist = split(/\n/, $maclist_text);
    $left =~ s/\n\#define (\w+)\b/\n/g;
    foreach my $macro (@maclist) {
        if ($left =~ /\n#(if|elif)[^\n]*\b$macro\b/) {
            print "$macro#\n";
        } elsif ($left =~ /\b$macro\b/) {
            print "$macro\n";
        }
    }
} else {
    @lines = split(/\n/, $left);
    foreach my $line (@lines) {
        if ($line =~ /^\#define (\w+)\b/) {
            print "$1\n";
        }
    }
}
#! /usr/bin/env bash
set -e
{
    egrep 'gcc/(ada/|cp/|fortran/|go/|java/|lto/|objc/|objcp/|c-)' \
        tmac-target-macro-uses | grep -v spec | sed -e 's/ .*/ FrontEnd/'
    egrep 'gcc/(ada/|cp/|fortran/|go/|java/|lto/|objc/|objcp/|c-)' \
        tmac-target-macro-uses | grep spec | sed -e 's/ .*/ FrontEndDriver/'
    egrep '=' tmac-target-macro-uses | sed -e 's/ .*/ Target/'
    egrep \
        'gcc/(cppspec|gcc|gccspec|collect2|collect2-aix|tlink|prefix|lto-wrapper)\.[ch]' \
        tmac-target-macro-uses | sed -e 's/ .*/ Driver/'
    egrep 'gcc/defaults\.h' tmac-target-macro-uses | sed -e 's/ .*/ Defaults/'
    egrep -v \
        'gcc/(ada/|cp/|fortran/|go/|java/|lto/|objc/|objcp/|c-)|=|gcc/(cppspec|gcc|gccspec|collect2|collect2-aix|tlink|lto-wrapper)\.[ch]|gcc/defaults\.h' \
        tmac-target-macro-uses | sed -e 's/ .*/ MiddleEnd/'
} | sort | uniq | perl -ne 'chomp;
    if (/^(\S*) (\S*)$/) {
        if (defined($type{$1})) { $type{$1} .= " $2"; } else { $type{$1} = $2; }
    } else {
        die "bad line $_\n";
    }
    END { foreach my $k (sort keys %type) { printf "%-44s %s\n", $k, $type{$k}; } }'
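For comparison, the classification above could be expressed in Python roughly as follows. This is a simplified sketch with hypothetical names, not an exact port: it treats MiddleEnd as "no other category matched", so a use in gcc/prefix.[ch] is reported as Driver only, whereas the shell version also lists it as MiddleEnd.

```python
import re
from collections import defaultdict

_FRONT_END = re.compile(
    r'gcc/(ada/|cp/|fortran/|go/|java/|lto/|objc/|objcp/|c-)')
_DRIVER = re.compile(
    r'gcc/(cppspec|gcc|gccspec|collect2|collect2-aix|tlink|prefix'
    r'|lto-wrapper)\.[ch]')


def classify_uses(lines):
    """Map each macro to the sorted list of places it is used.

    `lines` are "MACRO path" use-list lines, with optional "#" and "="
    markers before the path."""
    kinds = defaultdict(set)
    for line in lines:
        macro = line.split(' ', 1)[0]
        cats = set()
        if _FRONT_END.search(line):
            cats.add('FrontEndDriver' if 'spec' in line else 'FrontEnd')
        if '=' in line:
            cats.add('Target')
        if _DRIVER.search(line):
            cats.add('Driver')
        if 'gcc/defaults.h' in line:
            cats.add('Defaults')
        if not cats:
            cats.add('MiddleEnd')
        kinds[macro] |= cats
    return {macro: sorted(cats) for macro, cats in kinds.items()}
```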