Re: Different ASM for ReLU function between GCC11 and GCC12
On Mon, Jun 19, 2023 at 9:45 PM Jakub Jelinek via Gcc wrote: > > On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote: > > I noticed that a simple function like > > auto relu( float x ) { > > return x > 0.f ? x : 0.f; > > } > > compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On > > -O3 -mavx2 the former compiles above function to > > Such reports should go into gcc.gnu.org/bugzilla/, not to the mailing list, > if you are convinced that loading the constant from memory is faster. > Another possibility is > vxorps xmm1, xmm1, xmm1 > vmaxss xmm0, xmm0, xmm1 > ret > which doesn't need to wait for the memory. > This changed with https://gcc.gnu.org/r12-7693 I guess we previously were able to see that one operand of the comparison was not NaN. Maybe some REG_EQUAL note can come to the rescue here? > > > > relu(float): > > vmaxss xmm0, xmm0, DWORD PTR .LC0[rip] > > ret > > .LC0: > > .long 0 > > > > which is what I would naively expect and what also clang essentially does > > (clang actually uses an xor before the maxss to get the zero). The latter, > > however, compiles the function to > > > > relu(float): > > vxorps xmm1, xmm1, xmm1 > > vcmpltss xmm2, xmm1, xmm0 > > vblendvps xmm0, xmm1, xmm0, xmm2 > > ret > > > > which looks like a missed optimisation. Does anyone know if there's a > > reason for the changed behaviour? > > Jakub >
Re: Different ASM for ReLU function between GCC11 and GCC12
On Tue, Jun 20, 2023 at 10:15:37AM +0200, Richard Biener wrote: > On Mon, Jun 19, 2023 at 9:45 PM Jakub Jelinek via Gcc wrote: > > > > On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote: > > > I noticed that a simple function like > > > auto relu( float x ) { > > > return x > 0.f ? x : 0.f; > > > } > > > compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On > > > -O3 -mavx2 the former compiles above function to > > > > Such reports should go into gcc.gnu.org/bugzilla/, not to the mailing list, > > if you are convinced that loading the constant from memory is faster. > > Another possibility is > > vxorps xmm1, xmm1, xmm1 > > vmaxss xmm0, xmm0, xmm1 > > ret > > which doesn't need to wait for the memory. > > This changed with https://gcc.gnu.org/r12-7693 > > I guess we previously were able to see that one operand of > the comparison was not NaN. Maybe some REG_EQUAL > note can come to the rescue here? ce1 pass results in emit_conditional_move with (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (reg:SF 84) operands in the GCC 11 case and so is successfully matched by ix86_expand_fp_movcc as ix86_expand_sse_fp_minmax. But, in GCC 12+, emit_conditional_move is called with (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (const_double:SF 0.0 [0x0.0p+0]) instead (reg:SF 84 in both cases contains the (const_double:SF 0.0 [0x0.0p+0]) value, in the GCC 11 case loaded from memory, in the GCC 12+ case set directly in a move. The reason for the difference is exactly that because cheap SSE constant can be moved directly into a reg, it is done so instead of reusing a pseudo that contains that value already. In the latter case ix86_expand_fp_movcc is called even not with the const_double because the expander doesn't allow immediates, but with it forced into some other register, so it can't really find out it is actually a minmax. Even if it allowed the cheap SSE constants, it wouldn't know that r84 is also zero (unless the expander checks that it is a pseudo with a single setter and verifies it or something similar). Jakub
Re: Different ASM for ReLU function between GCC11 and GCC12
Hello, On Tue, 20 Jun 2023, Jakub Jelinek via Gcc wrote: > ce1 pass results in emit_conditional_move with > (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (reg:SF 84) > operands in the GCC 11 case and so is successfully matched by > ix86_expand_fp_movcc as ix86_expand_sse_fp_minmax. > But, in GCC 12+, emit_conditional_move is called with > (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (const_double:SF > 0.0 [0x0.0p+0]) > instead (reg:SF 84 in both cases contains the (const_double:SF 0.0 [0x0.0p+0]) > value, in the GCC 11 case loaded from memory, in the GCC 12+ case set > directly in a move. The reason for the difference is exactly that > because cheap SSE constant can be moved directly into a reg, it is done so > instead of reusing a pseudo that contains that value already. But reg84 is _also_ used as operand of emit_conditional_move, so there's no reason to not also use it as third operand. It seems more canonical to call a function with X-containing-B, A, B than with X-containing-B, A, something-equal-to-B-but-not-B so either the (const_double) RTL should be used in both, or reg84 should, but not a mix. Exactly so to ... > actually a minmax. Even if it allowed the cheap SSE constants, > it wouldn't know that r84 is also zero (unless the expander checks that > it is a pseudo with a single setter and verifies it or something similar). ... not have this problem. Ciao, Michael.
Re: Different ASM for ReLU function between GCC11 and GCC12
On Tue, Jun 20, 2023 at 03:03:19PM +, Michael Matz via Gcc wrote: > Hello, > > On Tue, 20 Jun 2023, Jakub Jelinek via Gcc wrote: > > > ce1 pass results in emit_conditional_move with > > (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (reg:SF 84) > > operands in the GCC 11 case and so is successfully matched by > > ix86_expand_fp_movcc as ix86_expand_sse_fp_minmax. > > But, in GCC 12+, emit_conditional_move is called with > > (gt (reg/v:SF 83 [ x ]) (reg:SF 84)), (reg/v:SF 83 [ x ]), (const_double:SF > > 0.0 [0x0.0p+0]) > > instead (reg:SF 84 in both cases contains the (const_double:SF 0.0 > > [0x0.0p+0]) > > value, in the GCC 11 case loaded from memory, in the GCC 12+ case set > > directly in a move. The reason for the difference is exactly that > > because cheap SSE constant can be moved directly into a reg, it is done so > > instead of reusing a pseudo that contains that value already. > > But reg84 is _also_ used as operand of emit_conditional_move, so there's > no reason to not also use it as third operand. It seems more canonical to > call a function with > > X-containing-B, A, B > > than with > > X-containing-B, A, something-equal-to-B-but-not-B > > so either the (const_double) RTL should be used in both, or reg84 should, > but not a mix. Exactly so to ... If ifcvt.cc knows that, then sure, but it can't be (easily) worked around on the backend side... Jakub
Announcing GCC Code of Conduct
I am pleased to announce that the GCC Steering Committee has decided to adopt a Code of Conduct (https://gcc.gnu.org/conduct.html) for interactions in GCC project spaces, including mailing lists, bugzilla, and IRC. The vast majority of the time, the GCC community is a very civil, cooperative space. On the rare occasions that it isn't, it's helpful to have something to point to to remind people of our expectations. It's also good for newcomers to have something to refer to, for both how they are expected to conduct themselves and how they can expect to be treated. More importantly, if there is offensive behavior that isn't corrected immediately, it's important for there to be a way to report that to the project leadership so that we can intervene. At this time the CoC is preliminary: the code itself should be considered active, but the CoC committee (and so the reporting and response procedures) are not yet in place. Specific suggestions for improvement are welcome, either on the gcc-patches post or by email to cond...@gcc.gnu.org. If you are interested in serving on the CoC committee, or would like to suggest someone who you think would be a good candidate, please email cond...@gcc.gnu.org. GCC Code of Conduct FAQ (https://gcc.gnu.org/conduct-faq.html): Why not just refer to the GNU Kind Communication Guidelines? The Guidelines are helpful for establishing the kind of behavior we want to see, but it's also important to have a reporting mechanism to help people feel safe and supported in the community, and to help leadership to hear about problems that might otherwise have escaped their notice. Shouldn't people try to work problems out between themselves first? Certainly, in many cases. And we hope referring to the CoC might be helpful then, as well. If the problem is successfully resolved, no report is necessary, though individuals might still want to let the CoC committee know about the incident just for their information. What about the rights of the reportee? The CoC committee will get their perspective, and any other available information, before taking any action. Besides which, we expect the response to the vast majority of incidents to be email asking those involved to moderate their behavior. That has been the experience of other free software projects after adopting a code of conduct: see the Linux Kernel CoC reports for an example. Is this going to be used to drive out people with "wrong" opinions? No, this is a code of conduct, not a code of philosophy. And it only deals with behavior within the context of the GCC project; for instance, harassment in private email in response to a public discussion is covered, a social media post about politics is not. Can I report incidents from before the adoption of the CoC? Yes. We may take no action if the issue seems to have been resolved, but it can be helpful to have context for future discussions. My question isn't answered here! Please also see the Reporting Guidelines (https://gcc.gnu.org/conduct-report.html) and Response Guide (https://gcc.gnu.org/conduct-response.html). If they don't answer your question either, feel free to ask here or email cond...@gcc.gnu.org with any additional questions or feedback.
Re: [PATCH v5 3/5] p1689r5: initial support
On Mon, Jun 19, 2023 at 17:33:58 -0400, Jason Merrill wrote: > On 5/12/23 10:24, Ben Boeckel wrote: > > `file` can be omitted (the `output_stream` will be used then). I *think* > > I see that adding: > > > > %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}} > > %{!fdeps-file: but yes. > > > would at least do for `-fdeps-file` defaults? I don't know if there's a > > reasonable default for `-fdeps-target=` though given that this command > > line has no information about the object file that will be used (`-o` is > > used for preprocessor output since we're leaning on `-E` here). > > I would think it could default to %b.o? I suppose that could work, yes. > I had quite a few more comments on the v5 patch that you didn't respond > to here or address in the v6 patch; did your mail client hide them from you? Oof. Sorry, I saw large chunks of quoting and apparently assumed the rest was fine (I usually do aggressive trimming when doing that style of review). I see them now. Will go through and include in v7. --Ben
Re: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF
Le 19/06/2023 à 23:34, Jason Merrill a écrit : On 6/6/23 16:50, Ben Boeckel wrote: Unicode does not support such values because they are unrepresentable in UTF-16. Pushed. libcpp/ * charset.cc: Reject encodings of codepoints above 0x10. UTF-16 does not support such codepoints and therefore all Unicode rejects such values. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 7 +++ 1 file changed, 7 insertions(+) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index d7f323b2cd5..3b34d804cf1 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes) int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp); if (err) return false; + + /* Additionally, Unicode declares that all codepoints above 0010 are + invalid because they cannot be represented in UTF-16. + + Reject such values.*/ + if (cp >= 0x10) + return false; } /* No problems encountered. */ return true; Hello, I think the comparison should be ">" instead of ">=" as 0x10 seems a valid value (Unicode says value above 0x10 is invalid). Other tests around same value in this file are using ">". Regards, Damien
Re: [PATCH v5 3/5] p1689r5: initial support
On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote: > On 1/25/23 13:06, Ben Boeckel wrote: > > - header-unit information fields > > > > Header units (including the standard library headers) are 100% > > unsupported right now because the `-E` mechanism wants to import their > > BMIs. A new mode (i.e., something more workable than existing `-E` > > behavior) that mocks up header units as if they were imported purely > > from their path and content would be required. > > I notice that the cpp dependency generation tries (in open_file_failed) > to continue after encountering a missing file, is that not sufficient > for header units? Or adjustable to be sufficient? No. Header units can introduce macros which can be used to modify the set of modules that are imported. Included headers are "discovered" dependencies and don't modify the build graph (just add more files that trigger a rebuild) and can be collected during compilation. Module dependencies are needed to get the build correct in the first place in order to: - order module compilations in the build graph so that imported modules are ready before anything using them; and - computing the set of flags needed for telling the compiler where imported modules' CMI files should be located. > > - non-utf8 paths > > > > The current standard says that paths that are not unambiguously > > represented using UTF-8 are not supported (because these cases are rare > > and the extra complication is not worth it at this time). Future > > versions of the format might have ways of encoding non-UTF-8 paths. For > > now, this patch just doesn't support non-UTF-8 paths (ignoring the > > "unambiguously represetable in UTF-8" case). > > typo "representable" Fixed. > > diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc > > index c68a2a27469..1c14ce3fe8e 100644 > > --- a/gcc/c-family/c-opts.cc > > +++ b/gcc/c-family/c-opts.cc > > @@ -77,6 +77,9 @@ static bool verbose; > > /* Dependency output file. */ > > static const char *deps_file; > > > > +/* Enhanced dependency output file. */ > > Maybe "structured", as in the docs? It isn't really a direct > enhancement of the makefile dependencies. Agreed. I'll also add a link to p1689r5 as a comment for what "structured" means where it is parsed out. > > + if (cpp_opts->deps.format != DEPS_FMT_NONE) > > +{ > > + if (!fdeps_file) > > + fdeps_stream = out_stream; > > + else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0') > > + fdeps_stream = stdout; > > You probably want to check that deps_stream and fdeps_stream don't end > up as the same stream. Hmm. But `stdout` is probably fine to use for both though. Basically: if (fdeps_stream == out_stream && fdeps_stream != stdout) make_diagnostic_noise (); > > @@ -1374,6 +1410,8 @@ handle_deferred_opts (void) > > > > if (opt->code == OPT_MT || opt->code == OPT_MQ) > > deps_add_target (deps, opt->arg, opt->code == OPT_MQ); > > + else if (opt->code == OPT_fdep_output_) > > + deps_add_output (deps, opt->arg, true); > > How about fdeps_add_target? Renamed. > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > > index ef371ca8c26..630781fdf8a 100644 > > --- a/gcc/c-family/c.opt > > +++ b/gcc/c-family/c.opt > > @@ -256,6 +256,18 @@ MT > > C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target > > after %qs) > > -MT Add a target that does not require quoting. > > > > +fdep-format= > > +C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after > > %qs) > > +Format for output dependency information. Supported (\"p1689r5\"). > > I think we want "structured" here, as well. Fixed. > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > > index 06d77983e30..b61c3ebd3ec 100644 > > --- a/gcc/doc/invoke.texi > > +++ b/gcc/doc/invoke.texi > > @@ -2791,6 +2791,21 @@ is @option{-fpermitted-flt-eval-methods=c11}. The > > default when in a GNU > > dialect (@option{-std=gnu11} or similar) is > > @option{-fpermitted-flt-eval-methods=ts-18661-3}. > > > > +@item -fdep-file=@var{file} > > +@opindex fdep-file > > +Where to write structured dependency information. > > + > > +@item -fdep-format=@var{format} > > +@opindex fdep-format > > +The format to use for structured dependency information. @samp{p1689r5} is > > the > > +only supported format right now. Note that when this argument is > > specified, the > > +output of @samp{-MF} is stripped of some information (namely C++ modules) > > so > > +that it does not use extended makefile syntax not understood by most tools. > > + > > +@item -fdep-output=@var{file} > > +@opindex fdep-output > > +Analogous to @option{-MT} but for structured dependency information. > > Please add more detail about how these are intended to be used. Will do. > > diff --git a/gcc/testsuite/g++.dg/modules/p1689-1.C > > b/gcc/testsuite/g++.dg/modules/p1689-1.C > > new file mode 100644 > > index 000..245e30d09ce
Re: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF
On Tue, Jun 20, 2023 at 21:16:40 +0200, Damien Guibouret wrote: > I think the comparison should be ">" instead of ">=" as 0x10 seems a > valid value (Unicode says value above 0x10 is invalid). > Other tests around same value in this file are using ">". Ah, good catch. I'll make a separate patch submission for that. --Ben