Re: Add a simulate_builin_function_decl langhook

Richard Sandiford Sat, 05 Oct 2019 04:30:07 -0700

Jeff Law <l...@redhat.com> writes:
> On 9/26/19 6:04 AM, Richard Sandiford wrote:
>> Although it's possible to define the SVE intrinsics in a normal header
>> file, it's much more convenient to define them directly in the compiler.
>> This also speeds up compilation and gives better error messages.
>> 
>> The idea is therefore for arm_sve.h (the main intrinsics header file)
>> to have the pragma:
>> 
>>     #pragma GCC aarch64 "arm_sve.h"
>> 
>> telling GCC to define (almost) everything arm_sve.h needs to define.
>> The target then needs a way of injecting new built-in function
>> declarations during compilation.
>> 
>> The main hook for defining built-in functions is add_builtin_function.
>> This is designed for use at start-up, and so has various features that
>> are correct in that context but not for the pragma above:
>> 
>>   (1) the location is always BUILTINS_LOCATION, whereas for arm_sve.h
>>       it ought to be the location of the pragma.
>> 
>>   (2) the function is only immediately visible if it's in the implementation
>>       namespace, whereas the pragma is deliberately injecting functions
>>       into the general namespace.
>> 
>>   (3) there's no attempt to emulate a normal function declaration in
>>       C or C++, whereas functions declared by the pragma should be
>>       checked in the same way as an open-coded declaration would be.
>>       E.g. we should get an error if there was a previous incompatible
>>       declaration.
>> 
>>   (4) in C++, the function is treated as extern "C" and so can't be
>>       overloaded, whereas SVE intrinsics do use function overloading.
>> 
>> This patch therefore adds a hook that targets can use to inject
>> the equivalent of a source-level function declaration, but bound
>> to a BUILT_IN_MD function.
>> 
>> The main SVE intrinsic patch has tests to make sure that we report an
>> error for conflicting definitions that appear either before or after
>> including arm_sve.h.
>> 
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> I'm struggling to see how this is significantly better than a suitable
> arm_sve.h file.    Can you walk me through more of the motivation side?


Sure.  This message is going to go to the other extreme, sorry, but I'm
not sure which part will be the most convincing (if any).

I guess I should start by saying that SVE intrinsics have three types of
predication (zeroing, merging, and "don't care") that combine with the usual
type suffixes seen in arm_neon.h.  This gives 3,736 functions for baseline
SVE (SVE2 adds more).

The vast majority of those functions can't be open-coded using the
core parts of C and C++, even with GNU extensions.  Some could perhaps
be coded using new library extensions, but that just shifts the question
from "how do we implement this feature in arm_sve.h?" to "how we implement
this feature in the library extension?".

An alternative to open coding using the core language is to use asm
statements.  But implementing intrinsics that way produces poor code,
and we wanted to avoid it even as a temporary measure.

So that leaves us using built-in functions for almost all of those 3,736
functions.  With the traditional approach, the target would need to
register the functions at start-up and then define the header file in
terms of them.

There are two approaches to doing that.  One is to define the built-in
functions under their header file name so that they become active once
a prototype is seen.  But that's only appropriate for functions like
printf that have linkage.  The arm_sve.h functions don't have linkage
and there's a chance that non-SVE code could be using the same names for
something else (perhaps even with the same prototype, in the case of
things like

  uint64_t svcntb (void);

that don't mention SVE types).

The other alternative is to define builtins in the "__builtin_"
namespace and wrap them in inline wrappers, which I think is what
most intrinsics header files do.  E.g., from arm_neon.h:

__extension__ extern __inline float64x2_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vabsq_f64 (float64x2_t __a)
{
  return __builtin_aarch64_absv2df (__a);
}

But that's 6 lines per function.  Counting a blank line between
each one, we'd end up with a header file of at least 26,151 lines.
(OK, so arm_neon.h is already longer than that.  But with SVE2 and with
other considerations mentioned below, arm_sve.h could easily end up into
6 figures this way.)

It's very hard to maintain header files like that by hand without
introducing errors, such as forgetting to put the arguments safely
in the implementation namespace ("__a" rather than "a" etc.).  Kyrill
fixed some arm_neon.h instances of this the other week.  And using macros
to handle common patterns just makes the error messages worse, since the
macros then show up in error backtraces.

An alternative to maintaining the header file by hand is to generate
it via a script.  Ideally the script would use the same metadata as
the compiler itself uses when registering the built-in functions.
But this means writing two pieces of code to process the metadata,
one to generate text for the inline wrappers and one to register the
built-ins.  And we still end up with the same very long header file.

A more fundamental problem with inline wrappers is that they make it
harder to honour the spec for constant arguments.  If an instruction
requires a constant operand, Arm has traditionally required the
associated intrinsic argument to be an integer constant expression
(in the C and C++ sense).  GCC has tended to fudge this and instead only
requires an integer constant at expand time, after inlining and constant
propagation have taken place.  But this means that all sorts of other
optimisations have happened too, and so what's constant at expand time
wasn't necessarily constant at the language level.

Admittedly some people like that behaviour :-), but it means that what's
acceptable depends on the vagaries compiler optimisation.  It also means
that code is often not portable between GCC and clang, which implements
the spec more closely.

So the pragma approach seemed better for several reasons:

(1) The compiler registers (almost) the same built-in functions as it
    would anyway, but it registers them when they're actually needed
    rather than at start-up.  This saves the compile-time cost and
    memory footprint associated with registering thousands of functions
    that most TUs won't need.

(2) It avoids the need to maintain a long header file by hand, or maintain
    separate scripts/programs to generate the header.

(3) It avoids the compile-time overhead of parsing a large header file.

(4) The frontend can check that the arguments to a function are conforming
    (using the new TARGET_CHECK_BUILTIN_CALL hook) rather than expand
    having to fill the gap.

    [It might be that even TARGET_CHECK_BUILTIN_CALL doesn't have a true
    idea of which arguments are integer constant expressions, but that's
    something we could iterate on, and it's going to much closer than
    the alternative.]

(5) It improves the diagnostics: there are no inline wrappers, and so
    inline wrappers don't show up in the backtrace of error messages.

That's why I think this approach is better even for traditional
intrinsic headers like arm_neon.h.  But there's more :-)  For SVE
we wanted to provide overloaded functions to reduce verbosity, rather
than require the types of the arguments to be given as part of the
function name.  This is especially useful for gathers and scatters,
where fully specifying the types involved leads to long function names.

For C++, the overloading of course comes as part of the language.
An implementation in the header file could just define the overloaded
functions in the same way as the non-overloaded functions, although that
bulks the header file quite a bit.

But the overloaded intrinsics are available in C too.  Clang has
an "overloadable" attribute that allows the above C++ definitions
to work for C as well, which is one way of implementing the C side.
We don't have that attribute in GCC though.

Another way of implementing the C overloading would be to use _Generic,
and the spec was designed to be compatible with that approach for
implementations that chose to use it.  But _Generic was never going
to be the perfect approach from a QoI perspective, for several reasons:

(a) it has exponential behaviour for nested calls (see PR91937 for a recent
    example of this)

(b) it gives poor error messages for invalid calls

(c) the complicated _Generic macro shows up in the error backtrace

As Jakub says in PR91937, we now implement tgmath.h overloading directly
in GCC, even though that's what _Generic was originally designed for.
And it seemed like that was the best approach for arm_sve.h too.

So what we did was:

(a) For C++, have the pragma define the individual instances of overloaded
    functions in the same way as the header file would, except that
    they're BUILT_IN_MD functions and so have no function body.

(b) For C, have the pragma define a single built-in function per
    overloaded function name and make TARGET_RESOLVE_OVERLOADED_BUILTIN
    resolve calls appropriately.

IMO the error messages we get from (b) are better than the messages we
get from (a).  If there's no matching overload, the C++ frontend just
prints a list of candidate functions, which is usually quite long and
probably not that helpful.  (There might be times when the user genuinely
doesn't know which overloads are available, but more often than not
these kinds of messages come from simple mistakes.)  (b) can instead
work to a higher-level view of what the functions do and so has a clearer
idea what the "real" error is likely to be.

FWIW, the current arm_sve.h is just 7 lines, not including the guard
and licence.

Thanks,
Richard

Re: Add a simulate_builin_function_decl langhook

Reply via email to