[RFC] c++: parser - Support for target address spaces in C++
Hi,

Presently, GCC supports target address spaces as a GNU extension (cf.
`info -n "(gcc)Named Address Spaces"`).  This is however supported
only by the C frontend, which is a bit sad, since all the GIMPLE
machinery is readily available and, moreover, LLVM supports this GNU
extension both for C and C++.

Here is a first attempt at a patch to enable the support for target
address spaces in C++.  The basic idea is only to make the parser
recognize address spaces, and lower them into GIMPLE, in the same
fashion as the C parser.  This also makes it possible to merge the
function `c_register_addr_space` into one place, which is better than
before.

The patch still has some drawbacks compared to its C counterpart.
Namely, much like the `__restrict__` keyword, it is always enabled and
-std=c++11 won't disable it (whereas -std=c11 would reject address
spaces).  Note also that the mangler ignores address spaces, much like
it ignores __restrict__.

Depending on the reception, I'll add further testcases and will update
the relevant part of the documentation.

Cheers,
Paul

Author: Paul Iannetta
Date:   Wed Oct 5 16:44:36 2022 +0200

    Add support for custom address spaces in C++

gcc/
        * tree.h (ENCODE_QUAL_ADDR_SPACE): Missing parentheses.

gcc/c/
        * c-decl.c: Remove c_register_addr_space.

gcc/c-family/
        * c-common.c (c_register_addr_space): Imported from c-decl.c
        * c-common.h: Remove the FIXME.

gcc/cp/
        * cp-tree.h (enum cp_decl_spec): Add addr_space support.
        (struct cp_decl_specifier_seq): Likewise.
        * decl.c (get_type_quals): Likewise.
        * parser.c (cp_parser_type_specifier): Likewise.
        (cp_parser_cv_qualifier_seq_opt): Likewise.
        * tree.c: Remove c_register_addr_space stub.
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 064c2f263f0..282ba54ab70 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -2615,6 +2615,25 @@ c_build_bitfield_integer_type (unsigned HOST_WIDE_INT width, int unsignedp)
   return build_nonstandard_integer_type (width, unsignedp);
 }
 
+/* Register reserved keyword WORD as qualifier for address space AS.  */
+
+void
+c_register_addr_space (const char *word, addr_space_t as)
+{
+  int rid = RID_FIRST_ADDR_SPACE + as;
+  tree id;
+
+  /* Address space qualifiers are only supported
+     in C with GNU extensions enabled.  */
+  if (c_dialect_objc () || flag_no_asm)
+    return;
+
+  id = get_identifier (word);
+  C_SET_RID_CODE (id, rid);
+  TREE_LANG_FLAG_0 (id) = 1;
+  ridpointers[rid] = id;
+}
+
 /* The C version of the register_builtin_type langhook.  */
 
 void
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index ed39b7764bf..f2c1df0c8de 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -808,9 +808,6 @@ extern const struct attribute_spec c_common_format_attribute_table[];
 
 extern tree (*make_fname_decl) (location_t, tree, int);
 
-/* In c-decl.c and cp/tree.c.  FIXME.  */
-extern void c_register_addr_space (const char *str, addr_space_t as);
-
 /* In c-common.c.  */
 extern bool in_late_binary_op;
 extern const char *c_addr_space_name (addr_space_t as);
@@ -926,6 +923,7 @@ extern void c_common_finish (void);
 extern void c_common_parse_file (void);
 extern FILE *get_dump_info (int, dump_flags_t *);
 extern alias_set_type c_common_get_alias_set (tree);
+extern void c_register_addr_space (const char *, addr_space_t);
 extern void c_register_builtin_type (tree, const char*);
 extern bool c_promoting_integer_type_p (const_tree);
 extern bool self_promoting_args_p (const_tree);
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 8e24b522ee4..278d1248d1c 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -11927,25 +11927,6 @@ c_parse_final_cleanups (void)
   ext_block = NULL;
 }
 
-/* Register reserved keyword WORD as qualifier for address space AS.  */
-
-void
-c_register_addr_space (const char *word, addr_space_t as)
-{
-  int rid = RID_FIRST_ADDR_SPACE + as;
-  tree id;
-
-  /* Address space qualifiers are only supported
-     in C with GNU extensions enabled.  */
-  if (c_dialect_objc () || flag_no_asm)
-    return;
-
-  id = get_identifier (word);
-  C_SET_RID_CODE (id, rid);
-  C_IS_RESERVED_WORD (id) = 1;
-  ridpointers [rid] = id;
-}
-
 /* Return identifier to look up for omp declare reduction.  */
 
 tree
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 15ec4cd199f..4ae971ccb90 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5974,6 +5974,7 @@ enum cp_decl_spec {
   ds_const,
   ds_volatile,
   ds_restrict,
+  ds_addr_space,
   ds_inline,
   ds_virtual,
   ds_explicit,
@@ -6046,6 +6047,8 @@ struct cp_decl_specifier_seq {
   /* True iff the alternate "__intN__" form of the __intN type has been
      used.  */
   BOOL_BITFIELD int_n_alt: 1;
+  /* The address space that the declaration belongs to.  */
+  addr_space_t address_space;
 };
 
 /* The various kinds of declarators.  */
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.
Re: [RFC] c++: parser - Support for target address spaces in C++
On Thu, Oct 06, 2022 at 09:52:45AM -0700, Andrew Pinski wrote:
> On Thu, Oct 6, 2022 at 7:15 AM Paul Iannetta via Gcc wrote:
> >
> > Hi,
> >
> > Presently, GCC supports target address spaces as a GNU extension (cf.
> > `info -n "(gcc)Named Address Spaces"`).  This is however supported
> > only by the C frontend, which is a bit sad, since all the GIMPLE
> > machinery is readily available and, moreover, LLVM supports this GNU
> > extension both for C and C++.
> >
> > Here is a first attempt at a patch to enable the support for target
> > address spaces in C++.  The basic idea is only to make the parser
> > recognize address spaces, and lower them into GIMPLE, in the same
> > fashion as the C parser.  This also makes it possible to merge the
> > function `c_register_addr_space` into one place, which is better than
> > before.
> >
> > The patch still has some drawbacks compared to its C counterpart.
> > Namely, much like the `__restrict__` keyword, it is always enabled and
> > -std=c++11 won't disable it (whereas -std=c11 would reject address
> > spaces).  Note also that the mangler ignores address spaces, much like
> > it ignores __restrict__.
>
> Without the mangler support or overloading support, there will
> be wrong code emitted as some named address spaces don't overlap with
> the unnamed ones.
> E.g. on SPU (which has since been removed), the named address support
> was added to support automatically DMAing from the PPU memory on the
> cell.  So two address spaces are disjoint.
> This is also true on other targets too.
> This is the biggest reason why this was not added to the C++ front-end.
> If clang supports it, does it do overload/template mangling support?  I
> think we don't want a partial support added which will just break
> under simple usage.

I don't know to what extent clang supports their usage in templates,
but it does support proper overloading/name mangling.  I'll add the
currently missing mangling support, and look at how it behaves with
templates.

> Also note named address support is not exactly a GNU specific
> extension either;
> https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1275.pdf has the
> definition of it for C and GCC tries to follow that.
> Fixed types support in GCC also follows that draft.

Good to know!

Thanks,
Paul

> Thanks,
> Andrew Pinski
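For readers unfamiliar with the extension under discussion, here is a minimal sketch of what the C front end accepts today, assuming an x86 target where the `__seg_gs` named address space exists. The names `GS_QUAL`, `read_gs` and `gs_pointer_size` are hypothetical and only serve the illustration; the guard falls back to the generic address space when the qualifier is not available (other targets, strict ISO mode, or C++, which is the gap the patch tries to close).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: on x86, GCC defines __SEG_GS when the __seg_gs address
   space is available.  The keyword itself is only registered for C
   with GNU extensions enabled, hence the two extra checks.  */
#if defined(__SEG_GS) && !defined(__cplusplus) && !defined(__STRICT_ANSI__)
#define GS_QUAL __seg_gs
#else
#define GS_QUAL /* generic address space */
#endif

/* Load through a pointer whose pointee lives in the named address
   space.  On x86 this is expected to become a %gs-relative load.
   It is never called here, since a valid %gs base is target and
   OS specific.  */
int read_gs(const GS_QUAL int *p)
{
  return *p;
}

/* Size of a pointer into the (possibly named) address space.  Only the
   type is inspected; nothing is dereferenced.  */
size_t gs_pointer_size(void)
{
  return sizeof(GS_QUAL int *);
}
```

The qualifier is part of the pointee type, which is why overloading and mangling matter once C++ is involved: `int *` and `__seg_gs int *` are distinct types that may refer to disjoint memories.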
Re: Announcement: Porting the Docs to Sphinx - 9. November 2022
Hi Martin,

Thank you very much for porting the documentation to Sphinx, it is
very convenient to use, especially the menu on the left and the
search bar.

However, I also regularly browse and search the documentation through
info, especially when I want to search with regexps or need to include
a special character (e.g. +, -, _ or an opening parenthesis; this can
happen when I search for '(define', for example) in the search string.

Does the port to Sphinx mean the end of texinfo?  Or will both be
available, as is the case now with the official texinfo and your
unofficial splichal.eu pages?

Paul

On Mon, Oct 17, 2022 at 03:28:34PM +0200, Martin Liška wrote:
> Hello.
>
> Based on the very positive feedback I was given at the Cauldron Sphinx
> Documentation BoF,
> I'm planning migrating the documentation on 9th November.  There are still
> some minor comments
> from Sandra when it comes to the PDF output, but we can address that once the
> conversion is done.
>
> The reason I'm sending the email now is that I was waiting for latest Sphinx
> release (5.3.0) that
> simplifies reference format for options and results in much simpler Option
> summary section ([1])
>
> The current GCC master (using Sphinx 5.3.0) converted docs can be seen here:
> https://splichal.eu/scripts/sphinx/
>
> If you see any issues with the converted documentation, or have a feedback
> about it,
> please reply to this email.
>
> Cheers,
> Martin
>
> [1] https://github.com/sphinx-doc/sphinx/pull/10840
> [1] https://splichal.eu/scripts/sphinx/gcc/_build/html/gcc-command-options/option-summary.html
Re: Announcement: Porting the Docs to Sphinx - 9. November 2022
On Wed, Oct 19, 2022 at 09:24:06AM +0200, Martin Liška wrote: > On 10/17/22 16:16, Paul Iannetta wrote: > > Hi Martin, > > > > Thank you very much for porting the documentation to Sphinx, it is > > very convenient to use, especially the menu on the left and the > > search bar. > > Thanks, I also like it! > > > > > However, I also regularly browse and search the documentation through > > info, especially when I want to use regexps to search or need to > > include a special character (eg.,+,-,_,(; this can happen when I > > search for '(define' ) for example) in the search string. > > > > Does the port to Sphinx means the end of texinfo? Or, will both be > > available as it is the case now with the official texinfo and your > > unofficial splichal.eu pages. > > It will be still available same as now where manual pages and info pages > are built if you compile GCC from sources. We haven't been publishing manual > pages and info pages on our web pages, people typically get these from > their distribution packages. As long as it is possible to build the info manual with "make info", even through something like rst2texinfo, I would be happy. Would it be possible to see the rst source of the port so as to try rst2texinfo on it? > > Does it help? Or do you expect any change regarding what we publish at: > https://gcc.gnu.org/onlinedocs/ > ? Currently, there is a tarball with texinfo sources for all the manuals for each version. Thanks, Paul > > Cheers, > Martin > > > > > Paul > > > > On Mon, Oct 17, 2022 at 03:28:34PM +0200, Martin Liška wrote: > >> Hello. > >> > >> Based on the very positive feedback I was given at the Cauldron Sphinx > >> Documentation BoF, > >> I'm planning migrating the documentation on 9th November. There are still > >> some minor comments > >> from Sandra when it comes to the PDF output, but we can address that once > >> the conversion is done. 
> >> > >> The reason I'm sending the email now is that I was waiting for latest > >> Sphinx release (5.3.0) that > >> simplifies reference format for options and results in much simpler Option > >> summary section ([1]) > >> > >> The current GCC master (using Sphinx 5.3.0) converted docs can be seen > >> here: > >> https://splichal.eu/scripts/sphinx/ > >> > >> If you see any issues with the converted documentation, or have a feedback > >> about it, > >> please reply to this email. > >> > >> Cheers, > >> Martin > >> > >> [1] https://github.com/sphinx-doc/sphinx/pull/10840 > >> [1] > >> https://splichal.eu/scripts/sphinx/gcc/_build/html/gcc-command-options/option-summary.html > >> > >> > >> > >> > > > > > > > > > > > > >
LTO apparently does not support _FloatNx types
Hi,

I was investigating an ICE (in our yet to be upstreamed back-end, which
has native support for float16) on "gcc.dg/torture/float16-complex.c"
when compiled with LTO:

  ./gcc/build/gcc/xgcc -B./gcc/build/gcc/ ./gcc/gcc/testsuite/gcc.dg/torture/float16-complex.c \
    -O2 -flto -fno-use-linker-plugin -flto-partition=none -lm -o ./float16-complex.exe

I narrowed it down to the fact that lto-lang does not support _FloatNx
types: the functions "lto_type_for_mode" (in gcc/lto/lto-lang.c) and
"c_common_type_for_mode" (in gcc/c-family/c-common.c) are exactly the
same, except that "lto_type_for_mode" does not support _FloatNx.

Is this intentional or an oversight?

Cheers,
Paul
Re: LTO apparently does not support _FloatNx types
On Fri, Jan 13, 2023 at 08:15:34AM +0100, Richard Biener wrote: > On Thu, Jan 12, 2023 at 5:35 PM Richard Biener > wrote: > > > > > > > > > Am 12.01.2023 um 17:18 schrieb Paul Iannetta via Gcc : > > > > > > Hi, > > > > > > I was investigating an ICE (in our yet to be upstreamed back-end which > > > has native support for float16), on "gcc.dg/torture/float16-complex.c" > > > when compiled with lto: > > > > > > ./gcc/build/gcc/xgcc -B./gcc/build/gcc/ > > > ./gcc/gcc/testsuite/gcc.dg/torture/float16-complex.c \ > > > -O2 -flto -fno-use-linker-plugin -flto-partition=none -lm -o > > > ./float16-complex.exe > > > > > > I narrowed it down to the fact that lto-lang does not support _FloatNx > > > types, the function "lto_type_for_mode" (in gcc/lto/lto-lang.c) and > > > "c_common_type_for_mode" (in gcc/c-family/c-common.c) are exactly the > > > same except that "lto_type_for_mode" does not support _FloatNx. > > > > > > Is this intentional or an oversight? > > > > It’s probably an oversight. > > I'm testing a patch to sync them up. > > Richard. > Wonderful, thanks! Paul > > > > > Cheers, > > > Paul > > > > > > > > > > > > > > > >
The macro STACK_BOUNDARY may overflow
Hi,

Currently, the macro STACK_BOUNDARY is defined as

  Macro: STACK_BOUNDARY
     Define this macro to the minimum alignment enforced by hardware for
     the stack pointer on this machine.  The definition is a C
     expression for the desired alignment (measured in bits).  This
     value is used as a default if 'PREFERRED_STACK_BOUNDARY' is not
     defined.  On most machines, this should be the same as
     'PARM_BOUNDARY'.

with no mention of its type or bounds.  However, at some point, the value
of this macro gets assigned to the field "regno_pointer_align" of "struct
emit_status", which points to an "unsigned char"; hence, if STACK_BOUNDARY
gets bigger than 255, it will overflow...  Thankfully, the backend which
defines the highest value is microblaze with 128 < 255.

The assignment happens in "emit-rtl.cc" through the REGNO_POINTER_ALIGN
macro:

in function.h:
  #define REGNO_POINTER_ALIGN(REGNO) (crtl->emit.regno_pointer_align[REGNO])
in emit-rtl.cc:
  REGNO_POINTER_ALIGN (STACK_POINTER_REGNUM) = STACK_BOUNDARY;
  [...]
  REGNO_POINTER_ALIGN (VIRTUAL_OUTGOING_ARGS_REGNUM) = STACK_BOUNDARY;

Would it be possible to either add an explicit bound to STACK_BOUNDARY in
the manual, and/or use an "unsigned int *" rather than an "unsigned char *"
for the field "regno_pointer_align"?

Thanks,
Paul
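The overflow described above can be reproduced outside of GCC with a small sketch. The helper names below are hypothetical (they are not GCC functions) and the code assumes an 8-bit `unsigned char`; it only mimics what happens when an alignment expressed in bits is stored in a slot of a given width.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: store an alignment (in bits) in an
   "unsigned char" slot, the way regno_pointer_align does.  The
   conversion keeps only the low CHAR_BIT bits.  */
unsigned char store_align_uchar(unsigned int align_bits)
{
  return (unsigned char) align_bits;
}

/* Same, with a wider slot.  */
unsigned short store_align_ushort(unsigned int align_bits)
{
  return (unsigned short) align_bits;
}

/* Nonzero when ALIGN_BITS survives the round trip through
   unsigned char.  */
int fits_in_uchar(unsigned int align_bits)
{
  return align_bits <= UCHAR_MAX;
}

/* Small self-check of the behaviour described in the message.  */
void demo_stack_boundary(void)
{
  assert(store_align_uchar(128) == 128);  /* microblaze's value fits */
  assert(store_align_uchar(256) == 0);    /* next power of two wraps to 0 */
  assert(store_align_ushort(256) == 256); /* a wider slot keeps it */
}
```

Since alignments are powers of two, the first value that no longer fits is 256, and it wraps to 0, which the comment on `regno_pointer_align` describes as "no known alignment" rather than a large one.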
Re: Exporting inline functions
Hi,

On Mon, Sep 25, 2023 at 05:46:36PM -0500, Nima Hamidi via Gcc wrote:
> Is there any flag that I can pass to gcc to make it generate dynamic symbols
> for inline functions too? Let’s say I need to lookup an inline function via
> dlopen and call it. Is there an easy way to achieve this?
>

You may want to look at the options -fkeep-inline-functions and
-fkeep-static-functions.  If you use them, inline and static functions
will be emitted into the object file.  Beware that a function marked
__attribute__((always_inline)) won't be kept.

Paul
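As a complement to the flags above, here is a source-level alternative that the reply does not mention, sketched under the assumption of C99 (or later) inline semantics rather than the old gnu89 ones. The function names are made up for the illustration. In C99, a definition is an "inline definition" only if every file-scope declaration of the function in the translation unit says `inline` without `extern`; adding one `extern inline` declaration turns it into a regular external definition, so a symbol is emitted.

```c
#include <assert.h>

/* C99 inline definition: on its own, this does not provide an
   external symbol for "square".  */
inline int square(int x)
{
  return x * x;
}

/* Declaring the function with "extern" in exactly one translation
   unit makes the definition above an external definition, so the
   symbol "square" ends up in the object file.  */
extern inline int square(int x);

/* An ordinary caller; the call may be inlined or may go through the
   emitted symbol, depending on the optimization level.  */
int call_square(int x)
{
  return square(x);
}
```

Whether the symbol is then visible to `dlsym` still depends on how the shared object is built (for instance symbol visibility settings), which this sketch does not cover.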
Re: Complex numbers support: discussions summary
On Tue, Sep 26, 2023 at 09:30:21AM +0200, Richard Biener via Gcc wrote: > On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc wrote: > > > > Hi, > > > > We had very interesting discussions during our presentation with Paul on > > the > > support of complex numbers in gcc at the Cauldron. > > > > Thank you all for your participation ! > > > > Here is a small summary from our viewpoint: > > > > - Replace CONCAT with a backend defined internal representation in RTL > > --> No particular problems > > > > - Allow backend to write patterns for operation on complex modes > > --> No particular problems > > > > - Conditional lowering depending on whether a pattern exists or not > > --> Concerns when the vectorization of split complex operations performs > > better > > than not vectorized unified complex operations > > > > - Centralize complex lowering in cplxlower > > --> No particular problems if it doesn't prevent IEEE compliance and > > optimizations (like const folding) > > > > - Vectorization of complex operations > > --> 2 representations (interleaved and separated real/imag): cannot > > impose one > > if some machines prefer the other > > --> Complex are composite modes, the vectorizer assumes that the inner > > mode is > > scalar to do some optimizations (which ones ?) > > --> Mixed split/unified complex operations cannot be vectorized easely > > --> Assuming that the inner representation of complex vectors is let to > > target > > backends, the vectorizer doesn't know it, which prevent some > > optimizations > > (which ones ?) 
> > > > - Explicit vectors of complex > > --> Cplxlower cannot lower it, and moving veclower before cplxlower is a > > bad > > idea as it prevents some optimizations > > --> Teaching cplxlower how to deal with vectors of complex seems to be a > > reasonable alternative > > --> Concerns about ABI or indexing if the internal representation is let > > to the > > backend and differs from the representation in memory > > > > - Impact of the current SLP pattern matching of complex operations > > --> Only with -ffast-math > > --> It can match user defined operations (not C99) that can be > > simplified with a > > complex instruction > > --> Dedicated opcode and real vector type choosen VS standard opcode and > > complex > > mode in our implementation > > --> Need to preserve SLP pattern matching as too many applications > > redefines > > complex and bypass C99 standard. > > --> So need to harmonize with our implementation > > > > - Support of the pure imaginary type (_Imaginary) > > --> Still not supported by gcc (and llvm), neither in our implementation > > --> Issues comes from the fact that an imaginary is not a complex with > > real part > > set to 0 > > --> The same issue with complex multiplication by a real (which is split > > in the > > frontend, and our implementation hasn't changed it yet) > > --> Idea: Add an attribute to the Tree complex type which specify pure > > real / pure > > imaginary / full complex ? > > > > - Fast pattern for IEEE compliant emulated operations > > --> Not enough time to discuss about it > > > > Don't hesitate to add something or bring more precision if you want. > > > > As I said at the end of the presentation, we have written a paper which > > explains > > our implementation in details. You can find it on the wiki page of the > > Cauldron > > (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf). 
>
> Thanks for the detailed presentation at the Cauldron.
>
> My personal summary is that I'm less convinced delaying lowering is
> the way to go.

This is not only delayed lowering: if the standard pattern names (SPN)
are there, there is no lowering at all.

> I do think that if targets implement complex optabs we should use them but
> eventually re-discovering complex operations from lowered form is going to be
> more useful.

I would not be opposed to rediscovering complex operations, but I think
that, even though rediscovering a + b and a - b is easy and a * b would
still be doable, a / b will be hard.  I doubt we will see a hardware
complex division, but who knows.  However, once lowered, re-associating
a * b * c and more complex expressions is going to be hard.

> That's because as you said, use of _Complex is limited and people
> inventing their own representation.

Yes, this would be a step back at first, but proper support for
_Complex would probably be an incentive for library writers to take
them into account.

> SLP vectorization can discover some ops
> already with the limiting factor being that we don't specifically search for
> only complex operations (plus we expose the result as vector operations,
> requiring target support for the vector ops rather than [SD]Cmode operations).

Our only concern with SLP is that it only works within loops.  If we
want to re-discover complex numbers we could either add a dedicat
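To make the "unified" versus "lowered" distinction in this exchange concrete, here is a small sketch in plain C99. It is not GCC internals: `mul_unified` keeps the product as a single `_Complex` operation (what a back-end pattern could match), while `mul_lowered` spells out the real and imaginary parts by hand, which is the shape a pass would have to recognise to rediscover the multiplication. Both function names are hypothetical, and the lowered form ignores the NaN/infinity recovery that C Annex G asks of the `*` operator.

```c
#include <assert.h>
#include <complex.h>

/* Unified form: one complex multiplication.  */
double complex mul_unified(double complex a, double complex b)
{
  return a * b;
}

/* Lowered form: (ar + i*ai) * (br + i*bi)
     = (ar*br - ai*bi) + i*(ar*bi + ai*br).
   Four real multiplications, one subtraction, one addition.  */
double complex mul_lowered(double ar, double ai, double br, double bi)
{
  double re = ar * br - ai * bi;
  double im = ar * bi + ai * br;
  return re + im * I;
}

/* Self-check on a value where both forms are exact:
   (1 + 2i) * (3 + 4i) = -5 + 10i.  */
void demo_complex_mul(void)
{
  double complex a = 1.0 + 2.0 * I;
  double complex b = 3.0 + 4.0 * I;
  assert(mul_unified(a, b) == mul_lowered(1.0, 2.0, 3.0, 4.0));
}
```

A chained product such as a * b * c expands to many more real operations once lowered, which illustrates why re-association is easier while the operands are still complex values.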
Re: Complex numbers support: discussions summary
On Tue, Sep 26, 2023 at 09:28:08AM +, Tamar Christina wrote: > > -Original Message- > > From: Gcc On Behalf > > Of Paul Iannetta via Gcc > > Sent: Tuesday, September 26, 2023 9:54 AM > > To: Richard Biener > > Cc: Sylvain Noiry ; gcc@gcc.gnu.org; > > sylvain.no...@hotmail.fr > > Subject: Re: Complex numbers support: discussions summary > > > > On Tue, Sep 26, 2023 at 09:30:21AM +0200, Richard Biener via Gcc wrote: > > > On Mon, Sep 25, 2023 at 5:17 PM Sylvain Noiry via Gcc > > wrote: > > > > > > > > Hi, > > > > > > > > We had very interesting discussions during our presentation with > > > > Paul on the support of complex numbers in gcc at the Cauldron. > > > > > > > > Thank you all for your participation ! > > > > > > > > Here is a small summary from our viewpoint: > > > > > > > > - Replace CONCAT with a backend defined internal representation in > > > > RTL > > > > --> No particular problems > > > > > > > > - Allow backend to write patterns for operation on complex modes > > > > --> No particular problems > > > > > > > > - Conditional lowering depending on whether a pattern exists or not > > > > --> Concerns when the vectorization of split complex operations > > > > --> performs > > > > better > > > > than not vectorized unified complex operations > > > > > > > > - Centralize complex lowering in cplxlower > > > > --> No particular problems if it doesn't prevent IEEE compliance and > > > > optimizations (like const folding) > > > > > > > > - Vectorization of complex operations > > > > --> 2 representations (interleaved and separated real/imag): cannot > > > > impose one > > > > if some machines prefer the other > > > > --> Complex are composite modes, the vectorizer assumes that the > > > > --> inner > > > > mode is > > > > scalar to do some optimizations (which ones ?) 
> > > > --> Mixed split/unified complex operations cannot be vectorized > > > > --> easely Assuming that the inner representation of complex vectors > > > > --> is let to > > > > target > > > > backends, the vectorizer doesn't know it, which prevent some > > > > optimizations > > > > (which ones ?) > > > > > > > > - Explicit vectors of complex > > > > --> Cplxlower cannot lower it, and moving veclower before cplxlower > > > > --> is a > > > > bad > > > > idea as it prevents some optimizations > > > > --> Teaching cplxlower how to deal with vectors of complex seems to > > > > --> be a > > > > reasonable alternative > > > > --> Concerns about ABI or indexing if the internal representation is > > > > --> let > > > > to the > > > > backend and differs from the representation in memory > > > > > > > > - Impact of the current SLP pattern matching of complex operations > > > > --> Only with -ffast-math > > > > --> It can match user defined operations (not C99) that can be > > > > simplified with a > > > > complex instruction > > > > --> Dedicated opcode and real vector type choosen VS standard opcode > > > > --> and > > > > complex > > > > mode in our implementation > > > > --> Need to preserve SLP pattern matching as too many applications > > > > redefines > > > > complex and bypass C99 standard. > > > > --> So need to harmonize with our implementation > > > > > > > > - Support of the pure imaginary type (_Imaginary) > > > > --> Still not supported by gcc (and llvm), neither in our > > > > --> implementation Issues comes from the fact that an imaginary is > > > > --> not a complex with > > > > real part > > > > set to 0 > > > > --> The same issue with complex multiplication by a real (which is > > > > --> split > > > > in the > > > > frontend, and our implementation hasn't changed it yet) > > > > --> Idea: Add an attribute to the Tree complex type which specify > > > > --> pure > > > > real / pure > > > >
Re: Complex numbers support: discussions summary
On Tue, Sep 26, 2023 at 08:29:16AM +, Tamar Christina via Gcc wrote: > Hi, > > I tried to find you two on Sunday but couldn't locate you. Thanks for the > presentation! Yes, sadly we could not attend on Sunday because we wanted to be back for Monday. > > > > > > We had very interesting discussions during our presentation with Paul > > > on the support of complex numbers in gcc at the Cauldron. > > > > > > Thank you all for your participation ! > > > > > > Here is a small summary from our viewpoint: > > > > > > - Replace CONCAT with a backend defined internal representation in RTL > > > --> No particular problems > > > > > > - Allow backend to write patterns for operation on complex modes > > > --> No particular problems > > > > > > - Conditional lowering depending on whether a pattern exists or not > > > --> Concerns when the vectorization of split complex operations > > > --> performs > > > better > > > than not vectorized unified complex operations > > > > > > - Centralize complex lowering in cplxlower > > > --> No particular problems if it doesn't prevent IEEE compliance and > > > optimizations (like const folding) > > > > > > - Vectorization of complex operations > > > --> 2 representations (interleaved and separated real/imag): cannot > > > impose one > > > if some machines prefer the other > > > --> Complex are composite modes, the vectorizer assumes that the inner > > > mode is > > > scalar to do some optimizations (which ones ?) > > > --> Mixed split/unified complex operations cannot be vectorized easely > > > --> Assuming that the inner representation of complex vectors is let > > > --> to > > > target > > > backends, the vectorizer doesn't know it, which prevent some > > > optimizations > > > (which ones ?) 
> > > > > > - Explicit vectors of complex > > > --> Cplxlower cannot lower it, and moving veclower before cplxlower is > > > --> a > > > bad > > > idea as it prevents some optimizations > > > --> Teaching cplxlower how to deal with vectors of complex seems to be > > > --> a > > > reasonable alternative > > > --> Concerns about ABI or indexing if the internal representation is > > > --> let > > > to the > > > backend and differs from the representation in memory > > > > > > - Impact of the current SLP pattern matching of complex operations > > > --> Only with -ffast-math > > > --> It can match user defined operations (not C99) that can be > > > simplified with a > > > complex instruction > > > --> Dedicated opcode and real vector type choosen VS standard opcode > > > --> and > > > complex > > > mode in our implementation > > > --> Need to preserve SLP pattern matching as too many applications > > > redefines > > > complex and bypass C99 standard. > > > --> So need to harmonize with our implementation > > > > > > - Support of the pure imaginary type (_Imaginary) > > > --> Still not supported by gcc (and llvm), neither in our > > > --> implementation Issues comes from the fact that an imaginary is not > > > --> a complex with > > > real part > > > set to 0 > > > --> The same issue with complex multiplication by a real (which is > > > --> split > > > in the > > > frontend, and our implementation hasn't changed it yet) > > > --> Idea: Add an attribute to the Tree complex type which specify pure > > > real / pure > > > imaginary / full complex ? > > > > > > - Fast pattern for IEEE compliant emulated operations > > > --> Not enough time to discuss about it > > > > > > Don't hesitate to add something or bring more precision if you want. > > > > > > As I said at the end of the presentation, we have written a paper > > > which explains our implementation in details. 
You can find it on the
> > > wiki page of the Cauldron
> > > (https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=view&target=Exposing+Complex+Numbers+to+Target+Back-ends+%28paper%29.pdf).
> >
> > Thanks for the detailed presentation at the Cauldron.
> >
> > My personal summary is that I'm less convinced delaying lowering is the way
> > to go.
>
> I personally like the delayed lowering for scalar because it allows us to
> properly reassociate as a unit.  That is to say, it's easier to detect
> a * b * c when they are still complex ops.  And the late lowering will
> allow better codegen than today.
>
> However I think we should *unconditionally* not lower them, even in situations
> such as a * b * imag(b).  This situation can happen by late optimizations
> anyway so it has to be dealt with regardless so I don't think it should punt.
>
> I think you can then conditionally lower if the target does *not* implement
> the optab.  i.e. for AArch64 the complex mode wouldn't be useful.
>

Indeed, our current approach in the vectorizer works only if the
complex scalar patterns exist as well, and I agree that it would be
better if the absence of either scalar or vector patterns did not
prevent any optimizations.  Keeping everything unified unti
Re: The macro STACK_BOUNDARY may overflow
On Sat, Mar 25, 2023 at 10:28:02AM -0600, Jeff Law via Gcc wrote:
> On 3/24/23 07:48, Paul Iannetta via Gcc wrote:
> > Hi,
> >
> > Currently, the macro STACK_BOUNDARY is defined as
> >
> >    Macro: STACK_BOUNDARY
> >        Define this macro to the minimum alignment enforced by hardware for
> >        the stack pointer on this machine.  The definition is a C
> >        expression for the desired alignment (measured in bits).  This
> >        value is used as a default if 'PREFERRED_STACK_BOUNDARY' is not
> >        defined.  On most machines, this should be the same as
> >        'PARM_BOUNDARY'.
> >
> > with no mention of its type or bounds.  However, at some point, the value
> > of this macro gets assigned to the field "regno_pointer_align" of "struct
> > emit_status", which points to an "unsigned char"; hence, if STACK_BOUNDARY
> > gets bigger than 255, it will overflow...  Thankfully, the backend which
> > defines the highest value is microblaze with 128 < 255.
> >
> > The assignment happens in "emit-rtl.cc" through the REGNO_POINTER_ALIGN
> > macro:
> >
> > in function.h:
> >    #define REGNO_POINTER_ALIGN(REGNO) (crtl->emit.regno_pointer_align[REGNO])
> > in emit-rtl.cc:
> >    REGNO_POINTER_ALIGN (STACK_POINTER_REGNUM) = STACK_BOUNDARY;
> >    [...]
> >    REGNO_POINTER_ALIGN (VIRTUAL_OUTGOING_ARGS_REGNUM) = STACK_BOUNDARY;
> >
> > Would it be possible to either add an explicit bound to STACK_BOUNDARY in
> > the manual, and/or use an "unsigned int *" rather than an "unsigned char *"
> > for the field "regno_pointer_align"?
> Feel free to send a suitable patch to gcc-patches.  The alignment data isn't
> held in the RTX structures, so it's not critical that it be kept as small as
> possible.  We don't want to waste space, so an unsigned short might be
> better.  A char was good for 30 years, so we don't need to go crazy here.
>
> The alternative would be to change the representation to store the log2 of
> the alignment.  That would be a much larger change and would trade off
> runtime for memory consumption.  I would have suggested this approach if the
> data were in the RTX structures and space at a premium.
>
> While I do see a trend in processor design to reduce/remove the misalignment
> penalties (thus eliminating the driver for increasing data alignment),
> that's been primarily in high end designs.
>
> jeff

Hi,

Here is a patch that changes the type of the variable holding the alignment
to unsigned short for the reasons outlined above.  Should I also mention the
new upper bound for STACK_BOUNDARY in the documentation or leave it
undocumented?

Thanks,
Paul

>8---8<
From: Paul Iannetta
Date: Fri, 12 Jan 2024 10:18:34 +0100
Subject: [PATCH] make regno_pointer_align an unsigned short*

This change allows alignment values greater than 255 to be stored.

	* emit-rtl.cc (emit_status::ensure_regno_capacity): Change
	unsigned char* to unsigned short*.
	(init_emit): Likewise.
	* function.h (struct GTY): Likewise.
---
 gcc/emit-rtl.cc | 6 +++---
 gcc/function.h  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 1856fa4884f..f0f0ad193b5 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -1231,9 +1231,9 @@ emit_status::ensure_regno_capacity ()
   while (reg_rtx_no >= new_size)
     new_size *= 2;
 
-  char *tmp = XRESIZEVEC (char, regno_pointer_align, new_size);
+  short *tmp = XRESIZEVEC (short, regno_pointer_align, new_size);
   memset (tmp + old_size, 0, new_size - old_size);
-  regno_pointer_align = (unsigned char *) tmp;
+  regno_pointer_align = (unsigned short *) tmp;
 
   rtx *new1 = GGC_RESIZEVEC (rtx, regno_reg_rtx, new_size);
   memset (new1 + old_size, 0, (new_size - old_size) * sizeof (rtx));
@@ -5972,7 +5972,7 @@ init_emit (void)
   crtl->emit.regno_pointer_align_length = LAST_VIRTUAL_REGISTER + 101;
 
   crtl->emit.regno_pointer_align
-    = XCNEWVEC (unsigned char, crtl->emit.regno_pointer_align_length);
+    = XCNEWVEC (unsigned short, crtl->emit.regno_pointer_align_length);
 
   regno_reg_rtx
     = ggc_cleared_vec_alloc<rtx> (crtl->emit.regno_pointer_align_length);
diff --git a/gcc/function.h b/gcc/function.h
index 2d775b877fc..c4a20060844 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -72,7 +72,7 @@ struct GTY(()) emit_status {
   /* Indexed by pseudo register number, if nonzero gives the known alignment
      for that pseudo (if REG_POINTER is set in x_regno_reg_rtx).
      Allocated in parallel with x_regno_reg_rtx.  */
-  unsigned char * GTY((skip)) regno_pointer_align;
+  unsigned short * GTY((skip)) regno_pointer_align;
 };
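To make the overflow and the options discussed in this thread concrete, here is a small standalone C sketch. It is not GCC code: the helper names (`store_as_uchar`, `store_as_ushort`, `align_to_log2`, `grow_zeroed`) are hypothetical, an 8-bit char is assumed, and the log2 helper assumes a power-of-two alignment. The last helper also illustrates a point that may be worth double-checking in the first hunk of the patch: memset takes a byte count, so once the element type is two bytes wide, clearing `new_size - old_size` bytes would appear to zero only half of the newly added elements unless the count is scaled by the element size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Store an alignment (in bits) in an unsigned char, as the field does
   before the patch.  Only the low 8 bits survive (assuming CHAR_BIT == 8).  */
unsigned char store_as_uchar (unsigned int align_bits)
{
  return (unsigned char) align_bits;
}

/* Same value stored in an unsigned short, as after the patch.  */
unsigned short store_as_ushort (unsigned int align_bits)
{
  return (unsigned short) align_bits;
}

/* The log2 alternative mentioned in the thread: store the exponent
   instead of the alignment.  Assumes ALIGN_BITS is a power of two.  */
unsigned char align_to_log2 (unsigned int align_bits)
{
  unsigned char log2 = 0;
  while (align_bits > 1)
    {
      align_bits >>= 1;
      log2++;
    }
  return log2;
}

/* Grow a vector of unsigned short from OLD_SIZE to NEW_SIZE elements and
   zero only the new tail.  Hypothetical helper using plain realloc; note
   that the memset length is scaled by the element size.  */
unsigned short *grow_zeroed (unsigned short *v, size_t old_size,
                             size_t new_size)
{
  assert (new_size > 0 && new_size >= old_size);
  unsigned short *tmp = realloc (v, new_size * sizeof *tmp);
  if (tmp == NULL)
    return NULL;
  memset (tmp + old_size, 0, (new_size - old_size) * sizeof *tmp);
  return tmp;
}
```

With these definitions, `store_as_uchar (256)` wraps to 0 while `store_as_ushort (256)` keeps 256, and `align_to_log2 (128)` fits the current microblaze maximum into the value 7.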
Re: [RFC] Linux system call builtins
Hi,

On Mon, Apr 08, 2024 at 06:19:14AM -0300, Matheus Afonso Martins Moreira via Gcc wrote:
> Hello! I'm a beginner when it comes to GCC development.
> I want to learn how it works and start contributing.
> Decided to start by implementing something relatively simple
> but which would still be very useful for me: Linux builtins.
> I sought help in the OFTC IRC channel and it was suggested
> that I discuss it here first and obtain consensus before
> spending more time on it since it might not be acceptable.
>
> I'd like to add GCC builtins for generating Linux system call
> code for all architectures supported by Linux.
>
> They would look like this:
>
>     __builtin_linux_system_call(long n, ...)
>     __builtin_linux_system_call_1(long n, long _1)
>     __builtin_linux_system_call_2(long n, long _1, long _2)
>     /* More definitions, all the way up to 6 arguments */
>

As noted by J. Wakely, you don't need to have one variant for each
number of arguments.  By the way, even if you have multiple variants
you could unify them all under a macro __builtin_linux_system_call by
means such as "overloading macros based on the argument count." [1]

> Calling these builtins will make GCC place all the parameters
> in the correct registers for the system call, emit the appropriate
> instruction for the target architecture and return the result.
> In other words, they would implement the calling convention[1] of
> the Linux system calls.
>
> I'm often asked why anyone should care about this system call stuff,
> and I've been asked why I want this added to GCC in particular.
> My rationale is as follows:
>
> + It's stable
> [snip]

I assume you're talking about the interface which is often abstracted
by functions such as the following, which are often found in libcs or
freestanding libraries.  musl is a typical example (cf. syscall_arch.h
for each architecture, https://git.musl-libc.org/cgit/musl/tree/arch ):

    long linux_system_call_1(long number, long _1)
    {
      register long rax __asm__("rax") = number;
      register long rdi __asm__("rdi") = _1;

      __asm__ volatile
        ("syscall"
         : "+r" (rax)
         : "r" (rdi)
         : "rcx", "r11", "cc", "memory");

      return rax;
    }

>
> + It's a calling convention
>
> GCC already supports many calling conventions
> via function attributes. On x86 alone[3] there's
> cdecl, fastcall, thiscall, stdcall, ms_abi, sysv_abi,
> Win32 specific hot patching hooks. So I believe this
> would not at all be a strange addition to the compiler.

I may be wrong, but I think that at least on sysv x86_64, syscalls have
the same calling conventions as regular functions.  However, the
function descriptor is not an address (or a symbol reference) but a
number.

>
> + It's becoming common
> [snip]
>
> + It doesn't make sense for libraries to support it
> [snip]

At least, it would be nice if not all freestanding libraries had to
reimplement those syscall stubs.

>
> + It allows freestanding software to easily target Linux
>
> + It centralizes functionality in the compiler
>
> + It allows other languages to easily target Linux
>
> + Compilers seem like the proper place for it

I tend to agree with those points.

> Implementation wise, I have managed to define the above builtins
> in my GCC branch and compile it successfully. I have not yet
> figured out how or even where to implement the code generation.
> I was hoping to show up here with patches ready for review
> but it really is a complex project. That's why I would like
> to see what the community thinks before proceeding.
>

I think you could have a look at the function 'expand_call' in calls.cc
to see how regular calls are expanded to RTL and see what you would
need to do to support calls which use a number rather than an address.

Cheers,
Paul

[1]: https://jadlevesque.github.io/PPMP-Iceberg/explanations#overloading-macros-based-on-argument-count
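The macro trick referenced in [1] can be sketched as follows. This is only an illustration of counting arguments in the preprocessor, assuming the variants live outside the compiler. The `demo_syscall*` and `DEMO_*` names are hypothetical stand-ins: the functions just report how many arguments followed the syscall number instead of issuing a real system call, and only zero to three arguments are covered to keep the sketch short.

```c
#include <assert.h>

/* Hypothetical stand-ins for per-arity syscall stubs: each one only
   reports how many arguments followed the syscall number.  */
long demo_syscall_0 (long n) { (void) n; return 0; }
long demo_syscall_1 (long n, long a) { (void) n; (void) a; return 1; }
long demo_syscall_2 (long n, long a, long b)
{
  (void) n; (void) a; (void) b;
  return 2;
}
long demo_syscall_3 (long n, long a, long b, long c)
{
  (void) n; (void) a; (void) b; (void) c;
  return 3;
}

/* The caller's arguments push the markers 3, 2, 1, 0 to the right, so
   the fifth parameter (COUNT) ends up being the number of arguments
   after the syscall number.  The trailing "unused" token keeps the
   variadic part non-empty.  */
#define DEMO_PICK(n_, a_, b_, c_, COUNT, ...) COUNT
#define DEMO_NARGS(...) DEMO_PICK (__VA_ARGS__, 3, 2, 1, 0, unused)

/* Two-level paste so that DEMO_NARGS is expanded before ## is applied.  */
#define DEMO_CAT_(x, y) x##y
#define DEMO_CAT(x, y) DEMO_CAT_ (x, y)

/* Single entry point dispatching on the argument count.  */
#define demo_syscall(...) \
  DEMO_CAT (demo_syscall_, DEMO_NARGS (__VA_ARGS__)) (__VA_ARGS__)

/* Small usage example: each call resolves to the stub of matching arity.  */
void demo_check (void)
{
  assert (demo_syscall (39) == 0);
  assert (demo_syscall (1, 10) == 1);
  assert (demo_syscall (1, 10, 20) == 2);
  assert (demo_syscall (1, 10, 20, 30) == 3);
}
```

Extending this to six arguments would only mean adding more stubs and lengthening the marker list; a builtin implemented inside the compiler would not need any of it.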
Re: [RFC] Linux system call builtins
On Mon, Apr 08, 2024 at 11:26:40AM -0700, Andrew Pinski wrote:
> On Mon, Apr 8, 2024 at 11:20 AM Paul Iannetta via Gcc wrote:
> >
> > Hi,
> >
> > On Mon, Apr 08, 2024 at 06:19:14AM -0300, Matheus Afonso Martins Moreira via Gcc wrote:
> > > Hello! I'm a beginner when it comes to GCC development.
> > > I want to learn how it works and start contributing.
> > > Decided to start by implementing something relatively simple
> > > but which would still be very useful for me: Linux builtins.
> > > I sought help in the OFTC IRC channel and it was suggested
> > > that I discuss it here first and obtain consensus before
> > > spending more time on it since it might not be acceptable.
> > >
> > > I'd like to add GCC builtins for generating Linux system call
> > > code for all architectures supported by Linux.
> > >
> > > They would look like this:
> > >
> > >     __builtin_linux_system_call(long n, ...)
> > >     __builtin_linux_system_call_1(long n, long _1)
> > >     __builtin_linux_system_call_2(long n, long _1, long _2)
> > >     /* More definitions, all the way up to 6 arguments */
> > >
> >
> > As noted by J. Wakely, you don't need to have one variant for each
> > number of arguments.  By the way, even if you have multiple variants
> > you could unify them all under a macro __builtin_linux_system_call by
> > means such as "overloading macros based on the argument count." [1]
>
> Actually you don't need a macro if implemented inside GCC. You can
> count the number of arguments and expand it based on that. No reason
> for macros.

I fully agree here.  I was mentioning the macro solution for the case
where it is supported outside the compiler.

> Now the question comes: is the argument long or some other
> type? E.g. some 32bit ABIs built on top of a 64bit ISA might always
> just pass 32bits or they might allow passing the full 64bits. (x32
> might fall under this and MIPS n32). Or do you split a 64bit argument
> into the lower and upper half registers. Maybe you should warn/error
> out if not passed the correct sized argument.
> Also do you sign or zero extend a 32bit argument for LP64 targets?
> Right now it is not obvious nor documented in your examples.
>

Another case would be targets allowing an immediate argument for their
syscall instruction.  Sign extension is probably always an error; zero
extension may give the expected results.  Emitting an error or a warning
seems a very good idea if the size does not match.

Syscalls can receive both values and pointers (which may not have the
same size as regular values), which may complicate the handling and the
types of the arguments.

However, for most complex ABIs, all the cases you mentioned should be
addressed by each target backend by specializing the call/call_value
standard pattern names (SPNs) in their machine description files, and
specifying the right constraints.

>
> Thanks,
> Andrew Pinski
>
> >
> > > Calling these builtins will make GCC place all the parameters
> > > in the correct registers for the system call, emit the appropriate
> > > instruction for the target architecture and return the result.
> > > In other words, they would implement the calling convention[1] of
> > > the Linux system calls.
> > >
> > > I'm often asked why anyone should care about this system call stuff,
> > > and I've been asked why I want this added to GCC in particular.
> > > My rationale is as follows:
> > >
> > > + It's stable
> > > [snip]
> >
> > I assume you're talking about the interface which is often abstracted
> > by functions such as the following which are often found in libcs or
> > freestanding libraries.  The musl is a typical example (cf syscall_arch.h)
> > for each architecture ( https://git.musl-libc.org/cgit/musl/tree/arch )
> >
> > long linux_system_call_1(long number, long _1)
> > {
> >   register long rax __asm__("rax") = number;
> >   register long rdi __asm__("rdi") = _1;
> >
> >   __asm__ volatile
> >     ("syscall"
> >      : "+r" (rax)
> >      : "r" (rdi)
> >      : "rcx", "r11", "cc", "memory");
> >
> >   return rax;
> > }
> >
> > >
> > > + It's a calling convention
> > >
> > > GCC already supports many calling conventions
> > > via function attributes. On x86 alone[3] there's >