Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning
On Sun, 6 Jan 2019, Jan Hubicka wrote:

> Hello,
> while running benchmarks for inliner tuning I also ran benchmarks
> comparing -O2 and -O2 -ftree-vectorize -ftree-slp-vectorize using Martin
> Liska's LNT setup (https://lnt.opensuse.org/). The results are
> summarized below, but you can also see the colorful table produced
> by Martin's LNT magic:
>
> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=3&min_percentage_change=0.02&revisions=746f%2C55f&fbclid=IwAR1EhvEnavV5Fg5g404cTrguOXG2cW7b3mRZZvtYn1qy93zihyAanZ7AiWQ
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10&min_percentage_change=0.02&revisions=746f%2C55f
>
> Overall we got the following SPECrate improvements:
>
> SPECfp2k6   kabylake generic  +7.15%
> SPECfp2k6   kabylake native   +9.36%
> SPECfp2k17  kabylake generic  +5.36%
> SPECfp2k17  kabylake native   +6.03%
> SPECint2k17 kabylake generic  +4.13%
>
> SPECfp2k6   zen generic       +9.98%
> SPECfp2k6   zen native        +7.04%
> SPECfp2k17  zen generic       +6.11%
> SPECfp2k17  zen native        +5.46%
> SPECint2k17 zen generic       +3.61%
> SPECint2k17 zen native        +5.18%
>
> The performance results seem surprisingly strongly in favor of
> vectorization. Martin's setup also checks code size, which goes up
> by as much as 26% on leslie3d, but since many of the benchmarks are small,
> this is not very representative of the overall code size/compile time costs
> of vectorization.
>
> I measured compile time/size on larger programs I have available, with
> notable changes on DealII, but otherwise sub-1% increases. I also
> benchmarked Firefox, but there are no significant differences because
> the build system already uses -O3 for places where it matters (graphics
> library etc.)

Well, as much as compile-time/size of spec is not representative, the
performance improvements are.
>                    Compile time   code segment size
> Firefox mainline   in noise       0.8%
> gcc from spec2k6   0.5%           0.6%
> gdb                0.8%           0.3%
> crafty             0%             0%
> DealII             3.2%           4%
>
> Note that I benchmarked -ftree-slp-vectorize separately before and
> results were hit/miss, so perhaps enabling only -ftree-vectorize would
> give better compile time tradeoffs. I was worried about partial memory
> stalls, but I will benchmark it and also benchmark the difference between
> cost models.
>
> There are some performance regressions, most notably in SPEC
>  - exchange (all settings),
>  - gamess (all settings),
>  - calculix (Zen native only),
>  - bwaves (Zen native)
> and induct2 on all settings and ffft2 (Zen only) from Polyhedron. Botan
> seems very noisy, but it is rather special code.
>
> Exchange can be fixed by adding a heuristic that it is a bad idea to
> vectorize within a loop nest of 10 containing a recursive call. I believe
> gamess and calculix are understood and I can look into the remaining
> cases.
>
> Overall I am surprised how many improvements vectorization at -O2 can bring
> - clearly more parallel CPUs depend on it. In my experience
> from analyzing regressions of gcc -O2 compared to clang -O2 builds,
> vectorization is one of the most common reasons. Having gcc -O2 producing
> lower SPEC scores and comparably large binaries to clang -O2 does not
> feel OK and I think the problem is not limited just to artificial
> benchmarks.
>
> Even though it is late in the release cycle, I wonder if we can do that for
> GCC 9? Performance of vectorization is very architecture specific; I
> would propose enabling vectorization for Zen, Core-based chips and
> generic in x86-64. I can also run benchmarks on Bulldozer. I can then
> tune down the cheap model to avoid some of the more expensive
> transformations.

I'd rather not do this now, it's _way_ too late (also considering
you are again doing inliner tuning so late). See our last attempts
at this btw.

Richard.
> Honza
>
>
> Kabylake Spec2k6, generic tuning
>
> improvements:
> SPEC2006/FP/481.wrf         -31.33%
> SPEC2006/FP/436.cactusADM   -28.17%
> SPEC2006/FP/437.leslie3d    -17.21%
> SPEC2006/FP/434.zeusmp      -12.90%
> SPEC2006/FP/454.calculix     -6.44%
> SPEC2006/FP/433.milc         -6.03%
> SPEC2006/FP/459.GemsFDTD     -4.65%
> SPEC2006/FP/450.soplex       -2.11%
> SPEC2006/INT/403.gcc         -6.54%
> SPEC2006/INT/456.hmmer       -5.45%
> SPEC2006/INT/464.h264ref     -2.23%
> regressions:
> SPEC2006/FP/416.gamess        8.51%
> SPEC2006/FP/447.dealII        2.73%
>
> Kabylake spec2k6 -march=native
>
> improvements:
> SPEC2006/FP/436.cactusADM   -45.52%
> SPEC2006/FP/481.wrf         -34.13%
> SPEC2006/FP/434.zeusmp      -20.25%
> SPEC2006/FP/437.leslie3d -1
Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning
> Note that I benchmarked -ftree-slp-vectorize separately before and
> results were hit/miss, so perhaps enabling only -ftree-vectorize would
> give better compile time tradeoffs. I was worried about partial memory
> stalls, but I will benchmark it and also benchmark the difference between
> cost models.

; Alias to enable both -ftree-loop-vectorize and -ftree-slp-vectorize.
ftree-vectorize
Common Report Optimization
Enable vectorization on trees.

--
Eric Botcazou
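To make the distinction between the two sub-options concrete, here is a minimal C sketch of the kind of code each one targets. The function names `saxpy` and `add4` are hypothetical and not taken from the thread; whether GCC actually vectorizes either function depends on the GCC version, the target and the cost model, so treat this as an illustration rather than a guaranteed result.

```c
#include <stddef.h>

/* Candidate for the loop vectorizer (-ftree-loop-vectorize):
   every iteration is independent, and `restrict` promises the
   compiler that x and y do not overlap. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Candidate for the SLP vectorizer (-ftree-slp-vectorize):
   four isomorphic statements in straight-line code that could be
   combined into one vector addition. */
void add4(float *restrict out, const float *restrict a,
          const float *restrict b)
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```

Compiling this file once with `gcc -O2 -ftree-loop-vectorize -fopt-info-vec` and once with `gcc -O2 -ftree-slp-vectorize -fopt-info-vec` should report which of the two functions each pass picked up, if any; `-ftree-vectorize` alone enables both.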
Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning
> > Note that I benchmarked -ftree-slp-vectorize separately before and
> > results were hit/miss, so perhaps enabling only -ftree-vectorize would
> > give better compile time tradeoffs. I was worried about partial memory
> > stalls, but I will benchmark it and also benchmark the difference between
> > cost models.
>
> ; Alias to enable both -ftree-loop-vectorize and -ftree-slp-vectorize.
> ftree-vectorize
> Common Report Optimization
> Enable vectorization on trees.

Thanks! I would probably have fallen into that trap and run the same set
of benchmarks again.

Honza
>
> --
> Eric Botcazou
Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning
On Mon, Jan 07, 2019 at 09:29:09AM +0100, Richard Biener wrote:
> On Sun, 6 Jan 2019, Jan Hubicka wrote:
> > Even though it is late in the release cycle, I wonder if we can do that for
> > GCC 9? Performance of vectorization is very architecture specific; I
> > would propose enabling vectorization for Zen, Core-based chips and
> > generic in x86-64. I can also run benchmarks on Bulldozer. I can then
> > tune down the cheap model to avoid some of the more expensive
> > transformations.
>
> I'd rather not do this now, it's _way_ too late (also considering
> you are again doing inliner tuning so late).

This probably should be more generic than just x86 really; we have similar
problems on Power (-O3 is almost always faster than -O2, which is bad).
Likely other archs have the same problems.

But yes, too late for GCC 9.


Segher
Re: Enabling vectorization at -O2 for x86 generic, core and zen tuning
> On Mon, Jan 07, 2019 at 09:29:09AM +0100, Richard Biener wrote:
> > On Sun, 6 Jan 2019, Jan Hubicka wrote:
> > > Even though it is late in the release cycle, I wonder if we can do that for
> > > GCC 9? Performance of vectorization is very architecture specific; I
> > > would propose enabling vectorization for Zen, Core-based chips and
> > > generic in x86-64. I can also run benchmarks on Bulldozer. I can then
> > > tune down the cheap model to avoid some of the more expensive
> > > transformations.
> >
> > I'd rather not do this now, it's _way_ too late (also considering
> > you are again doing inliner tuning so late).
>
> This probably should be more generic than just x86 really; we have similar
> problems on Power (-O3 is almost always faster than -O2, which is bad).
> Likely other archs have the same problems.
>
> But yes, too late for GCC 9.

Yep, I guessed so, still wanted to ask :)

I think this is similar to schedule-insns(2), where it is subtarget
specific whether it is a win or not. So I think it is good to leave it up
to the target to enable the pass - we probably have fewer targets that do
want vectorizing than those that don't.

I would suggest enabling it on x86 early next stage1 and trying to do
similar benchmarks on ppc and arm. We can then try to tune the code
size/speed tradeoffs.

Honza
>
>
> Segher
GCC 9 Status report (2019-01-07), trunk in regression and documentation fixes mode
Status
======

Stage 3 is done now. Changes to GCC trunk should now be restricted
to regression and documentation fixes. That is, it is in the same
mode as the open release branches we have.

As soon as the count of P1 bugs drops to zero (and un-categorized,
aka P3 bugs have been categorized) you can expect trunk to branch
and stage 1 open for general development of GCC 10. Do not hold
your breath though, history suggests you'll have to wait until
mid-April for that to happen. You can make it happen faster
by fixing regressions.

Please also give your favorite target production-level quality
testing and make sure to file bugs about regressions you encounter.


Quality Data
============

Priority          #   Change from GCC 8 stage3 -> stage4 transition
--------        ---   ---
P1               42   +   6
P2              187   +  54
P3               47   -  10
P4              182   +  24
P5               25   -   2
--------        ---   ---
Total P1-P3     276   +  50
Total           483   +  72


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2018-11/msg00067.html
Patch Resend
Greetings All, I was wondering as I sent a patch before the holidays if I should resend it as I did not get any replies. Thanks, Nick
Re: Patch Resend
On Mon, 7 Jan 2019 at 15:42, nick wrote:
>
> Greetings All,
>
> I was wondering as I sent a patch before the holidays if I should resend it
> as I did not get any replies.

Which patch? I don't see any patch from you that didn't get some replies.
Re: Patch Resend
On 2019-01-07 10:44 a.m., Jonathan Wakely wrote:
> On Mon, 7 Jan 2019 at 15:42, nick wrote:
>>
>> Greetings All,
>>
>> I was wondering as I sent a patch before the holidays if I should resend it
>> as I did not get any replies.
>
> Which patch? I don't see any patch from you that didn't get some replies.
>
Sorry, this is what I was talking about; it's a fix for a bad patch:

This fixes the bug id, 71176 to use the proper known
code print formatter type, %lu for size_t rather than
%d which is considered best practice for print statements.

Signed-off-by: Nicholas Krause
---
 fixincludes/fixincl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index 6dba2f6e830..5b8b77a77f0 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -158,11 +158,11 @@ main (int argc, char** argv)
   if (VLEVEL( VERB_PROGRESS )) {
     tSCC zFmt[] = "\
-Processed %5d files containing %d bytes\n\
+Processed %5d files containing %lu bytes\n\
 Applying %5d fixes to %d files\n\
 Altering %5d of them\n";
-fprintf (stderr, zFmt, process_ct, ttl_data_size, apply_ct,
+fprintf (stderr, zFmt, process_ct, (unsigned int long) ttl_data_size, apply_ct,
          fixed_ct, altered_ct);
   }
 #endif /* DO_STATS */
--
2.17.1

Nick
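For readers unfamiliar with the format-specifier issue the patch addresses, the following standalone C sketch shows two common ways to print a `size_t` without a format/argument mismatch. It assumes, as the patch description states, that the value being printed is a `size_t`; the helper names `format_bytes_zu` and `format_bytes_lu` are hypothetical and do not appear in fixincludes. Which form suits fixincl.c depends on the host compilers GCC has to build with, which this sketch does not attempt to settle.

```c
#include <stddef.h>
#include <stdio.h>

/* C99 and later: %zu is the conversion specifier for size_t,
   so no cast is needed. Returns the snprintf result. */
int format_bytes_zu(char *buf, size_t len, size_t nbytes)
{
    return snprintf(buf, len, "%zu bytes", nbytes);
}

/* Pre-C99 style, as in the patch: cast to unsigned long and use %lu.
   The cast can truncate on platforms where size_t is wider than
   unsigned long. */
int format_bytes_lu(char *buf, size_t len, size_t nbytes)
{
    return snprintf(buf, len, "%lu bytes", (unsigned long) nbytes);
}
```

Passing a `size_t` directly to `%d`, as the original code did, is undefined behavior whenever `size_t` and `int` differ in size or signedness, which is why either form above is preferable.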
Re: Patch Resend
On Mon, 7 Jan 2019 at 15:51, nick wrote:
>
>
>
> On 2019-01-07 10:44 a.m., Jonathan Wakely wrote:
> > On Mon, 7 Jan 2019 at 15:42, nick wrote:
> >>
> >> Greetings All,
> >>
> >> I was wondering as I sent a patch before the holidays if I should resend it
> >> as I did not get any replies.
> >
> > Which patch? I don't see any patch from you that didn't get some replies.
> >
> Sorry this is what I was talking about it's a fix for a bad patch:

Ah yes thanks, I see it now, at
https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01511.html
LLVM/GCC social in Nanjing China: Jan 19, 2019
Hi all,

The 5th LLVM/GCC social in Nanjing will happen on Jan 19, 2019.
Everyone interested in LLVM/GCC/Toolchain/IDE related projects is
invited to join.

Event details are at https://mp.weixin.qq.com/s/7jupkPiRrlxjYEuglMbvFA

BoF style. Presentations are welcome :-)

Looking forward to meeting you!

--
Best wishes,
Wei Wu (吴伟)