Jeff Law <l...@redhat.com> writes: > On 9/26/19 6:04 AM, Richard Sandiford wrote: >> Although it's possible to define the SVE intrinsics in a normal header >> file, it's much more convenient to define them directly in the compiler. >> This also speeds up compilation and gives better error messages. >> >> The idea is therefore for arm_sve.h (the main intrinsics header file) >> to have the pragma: >> >> #pragma GCC aarch64 "arm_sve.h" >> >> telling GCC to define (almost) everything arm_sve.h needs to define. >> The target then needs a way of injecting new built-in function >> declarations during compilation. >> >> The main hook for defining built-in functions is add_builtin_function. >> This is designed for use at start-up, and so has various features that >> are correct in that context but not for the pragma above: >> >> (1) the location is always BUILTINS_LOCATION, whereas for arm_sve.h >> it ought to be the location of the pragma. >> >> (2) the function is only immediately visible if it's in the implementation >> namespace, whereas the pragma is deliberately injecting functions >> into the general namespace. >> >> (3) there's no attempt to emulate a normal function declaration in >> C or C++, whereas functions declared by the pragma should be >> checked in the same way as an open-coded declaration would be. >> E.g. we should get an error if there was a previous incompatible >> declaration. >> >> (4) in C++, the function is treated as extern "C" and so can't be >> overloaded, whereas SVE intrinsics do use function overloading. >> >> This patch therefore adds a hook that targets can use to inject >> the equivalent of a source-level function declaration, but bound >> to a BUILT_IN_MD function. >> >> The main SVE intrinsic patch has tests to make sure that we report an >> error for conflicting definitions that appear either before or after >> including arm_sve.h. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > I'm struggling to see how this is significantly better than a suitable > arm_sve.h file. Can you walk me through more of the motivation side?
Sure. This message is going to go to the other extreme, sorry, but I'm not sure which part will be the most convincing (if any). I guess I should start by saying that SVE intrinsics have three types of predication (zeroing, merging, and "don't care") that combine with the usual type suffixes seen in arm_neon.h. This gives 3,736 functions for baseline SVE (SVE2 adds more). The vast majority of those functions can't be open-coded using the core parts of C and C++, even with GNU extensions. Some could perhaps be coded using new library extensions, but that just shifts the question from "how do we implement this feature in arm_sve.h?" to "how we implement this feature in the library extension?". An alternative to open coding using the core language is to use asm statements. But implementing intrinsics that way produces poor code, and we wanted to avoid it even as a temporary measure. So that leaves us using built-in functions for almost all of those 3,736 functions. With the traditional approach, the target would need to register the functions at start-up and then define the header file in terms of them. There are two approaches to doing that. One is to define the built-in functions under their header file name so that they become active once a prototype is seen. But that's only appropriate for functions like printf that have linkage. The arm_sve.h functions don't have linkage and there's a chance that non-SVE code could be using the same names for something else (perhaps even with the same prototype, in the case of things like uint64_t svcntb (void); that don't mention SVE types). The other alternative is to define builtins in the "__builtin_" namespace and wrap them in inline wrappers, which I think is what most intrinsics header files do. E.g., from arm_neon.h: __extension__ extern __inline float64x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vabsq_f64 (float64x2_t __a) { return __builtin_aarch64_absv2df (__a); } But that's 6 lines per function. Counting a blank line between each one, we'd end up with a header file of at least 26,151 lines. (OK, so arm_neon.h is already longer than that. But with SVE2 and with other considerations mentioned below, arm_sve.h could easily end up into 6 figures this way.) It's very hard to maintain header files like that by hand without introducing errors, such as forgetting to put the arguments safely in the implementation namespace ("__a" rather than "a" etc.). Kyrill fixed some arm_neon.h instances of this the other week. And using macros to handle common patterns just makes the error messages worse, since the macros then show up in error backtraces. An alternative to maintaining the header file by hand is to generate it via a script. Ideally the script would use the same metadata as the compiler itself uses when registering the built-in functions. But this means writing two pieces of code to process the metadata, one to generate text for the inline wrappers and one to register the built-ins. And we still end up with the same very long header file. A more fundamental problem with inline wrappers is that they make it harder to honour the spec for constant arguments. If an instruction requires a constant operand, Arm has traditionally required the associated intrinsic argument to be an integer constant expression (in the C and C++ sense). GCC has tended to fudge this and instead only requires an integer constant at expand time, after inlining and constant propagation have taken place. But this means that all sorts of other optimisations have happened too, and so what's constant at expand time wasn't necessarily constant at the language level. Admittedly some people like that behaviour :-), but it means that what's acceptable depends on the vagaries compiler optimisation. It also means that code is often not portable between GCC and clang, which implements the spec more closely. So the pragma approach seemed better for several reasons: (1) The compiler registers (almost) the same built-in functions as it would anyway, but it registers them when they're actually needed rather than at start-up. This saves the compile-time cost and memory footprint associated with registering thousands of functions that most TUs won't need. (2) It avoids the need to maintain a long header file by hand, or maintain separate scripts/programs to generate the header. (3) It avoids the compile-time overhead of parsing a large header file. (4) The frontend can check that the arguments to a function are conforming (using the new TARGET_CHECK_BUILTIN_CALL hook) rather than expand having to fill the gap. [It might be that even TARGET_CHECK_BUILTIN_CALL doesn't have a true idea of which arguments are integer constant expressions, but that's something we could iterate on, and it's going to much closer than the alternative.] (5) It improves the diagnostics: there are no inline wrappers, and so inline wrappers don't show up in the backtrace of error messages. That's why I think this approach is better even for traditional intrinsic headers like arm_neon.h. But there's more :-) For SVE we wanted to provide overloaded functions to reduce verbosity, rather than require the types of the arguments to be given as part of the function name. This is especially useful for gathers and scatters, where fully specifying the types involved leads to long function names. For C++, the overloading of course comes as part of the language. An implementation in the header file could just define the overloaded functions in the same way as the non-overloaded functions, although that bulks the header file quite a bit. But the overloaded intrinsics are available in C too. Clang has an "overloadable" attribute that allows the above C++ definitions to work for C as well, which is one way of implementing the C side. We don't have that attribute in GCC though. Another way of implementing the C overloading would be to use _Generic, and the spec was designed to be compatible with that approach for implementations that chose to use it. But _Generic was never going to be the perfect approach from a QoI perspective, for several reasons: (a) it has exponential behaviour for nested calls (see PR91937 for a recent example of this) (b) it gives poor error messages for invalid calls (c) the complicated _Generic macro shows up in the error backtrace As Jakub says in PR91937, we now implement tgmath.h overloading directly in GCC, even though that's what _Generic was originally designed for. And it seemed like that was the best approach for arm_sve.h too. So what we did was: (a) For C++, have the pragma define the individual instances of overloaded functions in the same way as the header file would, except that they're BUILT_IN_MD functions and so have no function body. (b) For C, have the pragma define a single built-in function per overloaded function name and make TARGET_RESOLVE_OVERLOADED_BUILTIN resolve calls appropriately. IMO the error messages we get from (b) are better than the messages we get from (a). If there's no matching overload, the C++ frontend just prints a list of candidate functions, which is usually quite long and probably not that helpful. (There might be times when the user genuinely doesn't know which overloads are available, but more often than not these kinds of messages come from simple mistakes.) (b) can instead work to a higher-level view of what the functions do and so has a clearer idea what the "real" error is likely to be. FWIW, the current arm_sve.h is just 7 lines, not including the guard and licence. Thanks, Richard