Re: Function attribute((optimize(...))) ignored on inline functions?
On Thu, Jul 30, 2015 at 6:40 PM, Matt Turner wrote: > I'd like to tell gcc that it's okay to inline functions (such as > rintf(), to get the SSE4.1 roundss instruction) at particular call > sights without compiling the entire source file or calling function > with different CFLAGS. > > I attempted this by making inline wrapper functions annotated with > attribute((optimize(...))), but it appears that the annotation does > not apply to inline functions? Take for example, ex.c: > > #include > > static inline float __attribute__((optimize("-fno-trapping-math"))) > rintf_wrapper_inline(float x) > { >return rintf(x); > } > > float > rintf_wrapper_inline_call(float x) > { >return rintf(x); > } > > float __attribute__((optimize("-fno-trapping-math"))) > rintf_wrapper(float x) > { >return rintf(x); > } > > % gcc -O2 -msse4.1 -c ex.c > % objdump -d ex.o > > ex.o: file format elf64-x86-64 > > > Disassembly of section .text: > > : >0: e9 00 00 00 00 jmpq 5 >5: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1) >c: 00 00 00 00 > > 0010 : > 10: 66 0f 3a 0a c0 04 roundss $0x4,%xmm0,%xmm0 > 16: c3 retq > > whereas I expected that rintf_wrapper_inline_call would be the same as > rintf_wrapper. > > I've read that per-function optimization is broken [1]. Is this still > the case? Is there a way to accomplish what I want? Not in this way. Once rintf would be inlined the no-trapping-math flag would be gone. The only way is to use SSE intrinsics directly here or have the optimized variant not inlined. Richard. > [1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html
Re: making the new if-converter not mangle IR that is already vectorizer-friendly
On Wed, Jul 29, 2015 at 6:15 PM, Abe wrote: > [Richard wrote:] >> >> Yes, the GIMPLE if-converter should strictly be a vectorization enabler. >> If the "new" if-converter produces code that the vectorizer does not >> handle (on the target targeted) then it has done a bad job. > > > Understood [or at least I _think_ I did ;-)] and agreed. > > OTOH, there _are_ things that apparently the vectorizer > _could_ do better, esp. handling scatter/gather. > That improvement might be pending new vector > instructions in the target ISA[s], though... > > ... YMMV. ;-) > > > [Richard wrote:] >> >> I think I fixed this bug. It was because of an inverted test. >> 2015-07-10 Richard Biener >> PR tree-optimization/66823 >> * tree-if-conv.c (memrefs_read_or_written_unconditionally): >>Fix inverted predicate. > > > I can confirm that as of that just after that commit, the bug is gone. > > If I understand correctly, the bug was in the analysis leading to > the decision to convert or not, such that the fix makes the old > [pre-scratchpad] converter decide "don`t convert". Is that correct? Yes. > Would you like me to include the DejaGNUified form of the related new test > case > in the patch even though this bug no longer exists in the "old" converter? > Maybe it`s nice to keep the test for catching possible future regressions. Yes. > > [Richard wrote:] >> >> It was known to produce store data-races with > >> -ftree-loop-if-convert-stores. >> >> That is something the scratchpad use avoids >> (at the expense of a lot(?) of vectorization). > > > TTBOMK, that "expense" is temporary, pending fixes/improvement > to the new converter. IOW, I don`t think any of the relevant > regressions are as a result of "essential complexity". I don't see how you can fix this without scatter support in the CPU. > [Richard wrote:] >> >> First of all to fix regressions on the load if-conversion side > >> you should stop using a scratchpad for loads (do you have one? >> I seem to remember you do from the design description). > > Yes, we do have one. > > I respectfully disagree with the suggestion to remove all scratchpad use for > loads. > That use is what allows us to if-convert -- and therefor vectorize -- > expressions that > may result in invalid pointer dereferences for "condition is false" > iterations/columns. > AFAIK the old converter -- due to lack of a scratchpad -- must give up and > not > convert such cases, thus resulting in code of that kind not getting > vectorized. Correct. But then you should restrict the if-conversion to targets that can do gather vectorization. > As of now, the relevant "improvement" in/due-to the new converter is not > always > effective, i.e. it does not always result in a vectorization, due to the > current implementation of the scratchpad mechanism being too generic and > adding indirection that confuses the vectorizer. We intend to fix this. This should be fixed before the patch can go in. > > [Richard wrote:] >> >> Then you should IMHO keep the old store path > >> for --param allow-store-data-races=1. > > I have discussed this with Sebastian; > he and I formed a plan to allow the following modes: > > * no if conversion [note: %%] > * if-conversion of loadsonly, the old way > * if-conversion of stores only, the old way > * if-conversion of loads and stores, the old way > * if-conversion of loadsonly, with scratchpads > * if-conversion of stores only, with scratchpads > * if-conversion of loads and stores, with scratchpads > > "%%": to retain as the default in trunk GCC until we have > enough improvement, e.g. a cost model to decide whether > or not it`s good to if-convert a particular loop There is the existing mechanism to if-convert only a copy of the loop guarded with IFN_LOOP_VECOTRIZED so the cost model is that of the vectorizer when it tries to vectorize the if-converted copy. If it doesn't vectorize that copy the non-if-converted loop is preserved, otherwise it is deleted. I've argued in the past that the GIMPLE if-converter should always go that path (thus strictly be a vectorization enabler only). > Once all the known regressions in the new converter have been > fixed and the numbers [e.g. from SPEC] of loops vectorized > looks good, I expect/hope that it will be possible to > remove the old if converter in favor of the new one. > > I expect/hope that we will {enable {if conversion} by default > when autovectorization is enabled} once we have a cost model that > prevents {if conversion that is inadvisable due to it promoting > from conditional to unconditional the evaluation of [pure] > expression[s] that is/are too expensive, thereby making the > conversion a "pessimization" instead of a true optimization}. if-conversion is already on by default if vectorizing. We only do not introduce store-data-races by default. > > [Richard wrote:] >> >> I'd like to see a compariso
Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression
On 29/07/15 23:14, Jeff Law wrote: On 07/28/2015 12:18 PM, Alex Velenko wrote: On 21/04/15 06:27, Jeff Law wrote: On 04/20/2015 01:09 AM, Shiva Chen wrote: Hi, Jeff Thanks for your advice. can_replace_by.patch is the new patch to handle both cases. pr43920-2.c.244r.jump2.ori is the original jump2 rtl dump pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump after patch can_replace_by.patch Could you help me to review the patch? Thanks. This looks pretty good. I expanded the comment for the new function a bit and renamed the function in an effort to clarify its purpose. From reviewing can_replace_by, it seems it should have been handling this case, but clearly wasn't due to implementation details. I then bootstrapped and regression tested the patch on x86_64-linux-gnu where it passed. I also instrumented that compiler to see how often this code triggers. During a bootstrap it triggers a couple hundred times (which is obviously a proxy for cross jumping improvements). So it's triggering regularly on x86_64, which is good. I also verified that this fixes BZ64916 for an arm-non-eabi toolchain configured with --with-arch=armv7. Installed on the trunk. No new testcase as it's covered by existing tests. Thanks,, jeff Hi, I see this patch been committed in r56 on trunk. Is it okay to port this to fsf-5? It's not a regression, so backporting it would be generally frowned upon. If you feel strongly about it, you should ask Jakub, Joseph or Richi (the release managers) for an exception to the general policy. jeff Hi Jakub, Can this commit be ported to fsf-5? It fixed gcc.target/arm/pr43920-2.c at the time, so I think it is a good idea to port. Please, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916 Kind regards, Alex
Nathan Sidwell appointed nvptx maintainer
I am pleased to announce that the GCC Steering Committee has appointed Nathan Sidwell as nvptx maintainer. Please join me in congratulating Nathan on his new role. Nathan, please update your listing in the MAINTAINERS file. Happy hacking! David
Re: Nathan Sidwell appointed nvptx maintainer
On 07/31/15 13:37, David Edelsohn wrote: I am pleased to announce that the GCC Steering Committee has appointed Nathan Sidwell as nvptx maintainer. Please join me in congratulating Nathan on his new role. Nathan, please update your listing in the MAINTAINERS file. thanks! nathan
Re: making the new if-converter not mangle IR that is already vectorizer-friendly
[Abe wrote:] TTBOMK, that "expense" is temporary, pending fixes/improvement to the new converter. IOW, I don`t think any of the relevant regressions are as a result of "essential complexity". [Richard wrote:] I don't see how you can fix this without scatter support in the CPU. Well, for some test cases, I can see in the GIMPLE dumps that a/the scratchpad wasn`t needed here and wasn`t needed there. IOW, the converter that "knows how" to indirect via scratchpads is doing so _always_ rather than only when it`s truly needed. This is causing additional indirection which the vectorizer rejects. Making the new converter "smarter" should fix this. Overall, however, I see your point and agree with it. Gather support is needed for the loads and scatter for the stores, in the general case. [Richard wrote:] But then you should restrict the if-conversion to targets that can do gather vectorization. I think adding this requirement -- to generalize, _two_ req.s: _gather_ support for _load_ if conversion with [a] scratchpad[s], and scatter support for _store_ if conversion with [a] scratchpad[s] -- is a reasonable and probably wise goal. Doing so should eventually enable us to conclude automatically "it is safe to enable this kind of if conversion for/on this target" and enable the relevant transformations automatically when given e.g. "-O3" and autovectorization. [Abe wrote] As of now, the relevant "improvement" in/due-to the new converter is not always effective [...] We intend to fix this. [Richard wrote:] This should be fixed before the patch can go in. The current plan is to add new switches that control which [if any] if converter is active, and to have the old one _and_ the new one be included in GCC trunk for the time being. Then I/we can have more eyes on the new code, helping with bugs etc., without having any impact at all on code that only either is compiled with the old flags [which shall retain their old semantics for now] or with no relevant flags whatsoever. Is that plan OK? [Richard wrote:] There is the existing mechanism to if-convert only a copy of the loop guarded with IFN_LOOP_VECOTRIZED so the cost model is that of the vectorizer when it tries to vectorize the if-converted copy. If it doesn't vectorize that copy the non-if-converted loop is preserved, otherwise it is deleted. I think I found the code to which you are referring here, and I tried to use it [so as to not reduplicate efforts], but my first effort at doing so was unsuccessful AFAIK. Trying again -- this time with help from Jakub, I hope -- is on my work agenda. [Richard wrote:] I've argued in the past that the GIMPLE if-converter should always go that path (thus strictly be a vectorization enabler only). If I understand correctly what you mean by that, then I agree. if-conversion is already on by default if vectorizing. > We only do not introduce store-data-races by default. Thanks. Do you know if the e.g. GCC 5.1 [and 5.2, if bug fix not included] release[s] were "safe" even without your recent bug fix [July 10] for code _not_ compiled with "-ftree-loop-if-convert-stores"? I ask b/c the test case I wrote which was able to cause GCC to generate crashing code [for a correct-under-low-optimization test case] intentionally had a crash due to dereferencing a null pointer in a _load_. IIRC, getting GCC to generate that bad code required "-ftree-loop-if-convert-stores" despite it being a faulty _load_. FYI: I have made some progress on the SPEC testing, but I do not yet have the work at a point where I`m ready to report numbers. Regards, Abe
Bin Cheng as Loop Induction Variable Optimizations maintainer
I am pleased to announce that the GCC Steering Committee has appointed Bin Cheng as the IVopts maintainer. Please join me in congratulating Bin on his new role. Bin, please update your entry in the MAINTAINERS file. I also believe you have some patches to self-approve :-) Thanks, Jeff