Reposurgeon update
Translation from Python to Go is now 90% complete. Early results indicate a speedup of at least 3-4x on fast-import-stream reads; if I get a similar speedup on Subversion stream reads (which I think is very likely) that should pull my GCC test runs down below three hours each. Further speed gains seem quite likely, as I've only spent a couple of hours on performance tuning so far. I don't know yet because most of the remaining 10% is in fact the Subversion dump stream reader. At least the major blocker I described in my last update has been removed; it's just effort and testing from here. Realistically, another serious attack on the repository conversion is probably about two months out. But progress is being made.
--
Eric S. Raymond (http://www.catb.org/~esr/)
"The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner
Spectre V1 diagnostic / mitigation
Hi,

in the past weeks I've been looking into prototyping both Spectre V1 (speculative array bound bypass) diagnostics and mitigation in an architecture-independent manner to assess feasibility and some kind of upper bound on the performance impact one can expect. https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is an interesting read in this context as well.

For simplicity I have implemented mitigation on GIMPLE right before RTL expansion and have chosen TLS to do mitigation across function boundaries. Diagnostics sit in the same place, but the two do not depend on each other in any way.

The mitigation strategy chosen is that of tracking speculation state via a mask that can be used to zero parts of the addresses that leak the actual data. That's similar to what aarch64 does with -mtrack-speculation (but oddly there's no mitigation there).

I've optimized things to the point that is reasonable when working target-independently on GIMPLE, but I've only looked at x86 assembly and performance. I expect any "final" mitigation, if we choose to implement and integrate such, would be after RTL expansion, since RTL expansion can end up introducing quite some control flow whose speculation state is not properly tracked by the prototype.

I'm cut&pasting single runs of SPEC INT 2006/2017 here; the runs were done with -O2 [-fspectre-v1={2,3}], where =2 is function-local mitigation and =3 does global mitigation, passing the state via TLS memory.

The following was measured on a Haswell desktop CPU:

-O2 vs. -O2 -fspectre-v1=2

                              Estimated                    Estimated
                 Base     Base     Base       Peak     Peak     Peak
Benchmarks       Ref. Run Time    Ratio       Ref. Run Time    Ratio
-------------- ------ -------- --------     ------ -------- --------
400.perlbench    9770      245     39.8 *     9770      452     21.6 *  184%
401.bzip2        9650      378     25.5 *     9650      726     13.3 *  192%
403.gcc          8050      236     34.2 *     8050      352     22.8 *  149%
429.mcf          9120      223     40.9 *     9120      656     13.9 *  294%
445.gobmk       10490      400     26.2 *    10490      666     15.8 *  167%
456.hmmer        9330      388     24.1 *     9330      536     17.4 *  138%
458.sjeng       12100      437     27.7 *    12100      661     18.3 *  151%
462.libquantum  20720      300     69.1 *    20720      384     53.9 *  128%
464.h264ref     22130      451     49.1 *    22130      586     37.8 *  130%
471.omnetpp      6250      291     21.5 *     6250      398     15.7 *  137%
473.astar        7020      334     21.0 *     7020      522     13.5 *  156%
483.xalancbmk    6900      182     37.9 *     6900      306     22.6 *  168%
 Est. SPECint_base2006                   --
 Est. SPECint2006                                                    --

-O2 -fspectre-v1=3

                              Estimated                    Estimated
                 Base     Base     Base       Peak     Peak     Peak
Benchmarks       Ref. Run Time    Ratio       Ref. Run Time    Ratio
-------------- ------ -------- --------     ------ -------- --------
400.perlbench                                 9770      497     19.6 *  203%
401.bzip2                                     9650      772     12.5 *  204%
403.gcc                                       8050      427     18.9 *  181%
429.mcf                                       9120      696     13.1 *  312%
445.gobmk                                    10490      726     14.4 *  181%
456.hmmer                                     9330      537     17.4 *  138%
458.sjeng                                    12100      721     16.8 *  165%
462.libquantum                               20720      446     46.4 *  149%
464.h264ref                                  22130      613     36.1 *  136%
471.omnetpp                                   6250      471     13.3 *  162%
473.astar                                     7020      579     12.1 *  173%
483.xalancbmk                                 6900      350     19.7 *  192%
 Est. SPECint(R)_base2006           Not Run
 Est. SPECint2006                                                    --

While the following was measured on a Zen Epyc server:

-O2 vs -O2 -fspectre-v1=2

                               Estimated                    Estimated
                  Base     Base     Base       Base     Peak     Peak
                                               Peak
Benchmarks      Copies Run Time     Rate     Copies Run Time     Rate
--------------- ------ -------- --------     ------ -------- --------
500.perlbench_r      1      499     3.19 *        1      621     2.56 *  124%
502.gcc_r            1      286     4.95 *        1
Re: Spectre V1 diagnostic / mitigation
On 12/18/18 8:36 AM, Richard Biener wrote:
>
> Hi,
>
> in the past weeks I've been looking into prototyping both spectre V1
> (speculative array bound bypass) diagnostics and mitigation in an
> architecture independent manner to assess feasibility and some kind
> of upper bound on the performance impact one can expect.
> https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> an interesting read in this context as well.
>
> For simplicity I have implemented mitigation on GIMPLE right before
> RTL expansion and have chosen TLS to do mitigation across function
> boundaries. Diagnostics sit in the same place but both are not in
> any way dependent on each other.
>
> The mitigation strategy chosen is that of tracking speculation
> state via a mask that can be used to zero parts of the addresses
> that leak the actual data. That's similar to what aarch64 does
> with -mtrack-speculation (but oddly there's no mitigation there).
>
> I've optimized things to the point that is reasonable when working
> target independent on GIMPLE but I've only looked at x86 assembly
> and performance. I expect any "final" mitigation if we choose to
> implement and integrate such would be after RTL expansion since
> RTL expansion can end up introducing quite some control flow whose
> speculation state is not properly tracked by the prototype.
>
> I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
> were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
> mitigation and =3 does mitigation global with passing the state
> via TLS memory.
>
> The following was measured on a Haswell desktop CPU:
[ ... ]

Interesting. So we'd been kicking this issue around a bit internally.

The number of packages where we'd want to turn this on was very small and thus it was difficult to justify burning resources in this space. LLVM might be an option for those limited packages, but LLVM is missing other security things we don't want to lose (such as stack clash mitigation).

In the end we punted for the immediate future. We'll almost certainly revisit at some point and your prototype would obviously factor into the calculus around future decisions.

[ ... ]
>
>
> The patch relies heavily on RTL optimizations for DCE purposes. At the
> same time we rely on RTL not statically computing the mask (RTL has no
> conditional constant propagation). Full instrumentation of the classic
> Spectre V1 testcase
Right. But it does do constant propagation into arms of conditionals as well as jump threading. I'd fear they might compromise things. Obviously we'd need to look further into those issues. But even if they do, something like what you've done may mitigate enough vulnerable sequences that it's worth doing, even if there are some gaps due to "over" optimization in the RTL space.

[ ... ]
>
> so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
>
> Patch below for reference (and your own testing in case you are curious).
> I do not plan to pursue this further at this point.
Understood. Thanks for posting it. We're not currently working in this space, but again, we may re-evaluate that stance in the future.

jeff
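
For readers unfamiliar with the mask approach discussed in this thread, the following is a minimal C sketch of the idea, applied by hand to the classic bounds-check pattern. It is not the prototype's generated code: the names `bounds_mask` and `load_masked` are made up for illustration, and the sketch assumes the mask computation survives optimization, which is exactly the concern raised above about constant propagation into the arms of conditionals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: all-ones when idx < len, zero otherwise.
   Written as arithmetic rather than a branch, although nothing in C
   guarantees that the compiler keeps it branch-free. */
static size_t bounds_mask(size_t idx, size_t len)
{
    return (size_t)0 - (size_t)(idx < len);
}

/* Hypothetical hand-instrumented load from a bounds-checked array. */
static uint8_t load_masked(const uint8_t *array, size_t len, size_t idx)
{
    assert(array != NULL);
    if (idx < len) {
        /* Architecturally the mask is all-ones here, so idx is unchanged.
           The intent is that under a mispredicted branch the mask would
           be zero and the load would be redirected to element 0.  An
           optimizer may fold the mask to all-ones inside this branch,
           which defeats the purpose unless the computation is kept
           opaque to it. */
        size_t mask = bounds_mask(idx, len);
        return array[idx & mask];
    }
    return 0;
}
```

On the architecturally executed path the masked index equals the original index, so the observable behaviour is the same as the unmasked code; only the speculative path is meant to differ.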
Re: Spectre V1 diagnostic / mitigation
On 18/12/2018 15:36, Richard Biener wrote:
>
> Hi,
>
> in the past weeks I've been looking into prototyping both spectre V1
> (speculative array bound bypass) diagnostics and mitigation in an
> architecture independent manner to assess feasibility and some kind
> of upper bound on the performance impact one can expect.
> https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> an interesting read in this context as well.

Interesting, thanks for posting this.

>
> For simplicity I have implemented mitigation on GIMPLE right before
> RTL expansion and have chosen TLS to do mitigation across function
> boundaries. Diagnostics sit in the same place but both are not in
> any way dependent on each other.

We considered using TLS for propagating the state across call-boundaries on AArch64, but rejected it for several reasons.

- It's quite expensive to have to set up the TLS state in every function;
- It requires some global code to initialize the state variable - that's kind of ABI;
- It also seems likely to be vulnerable to Spectre variant 4 - unless the CPU can always correctly store-to-load forward the speculation state, then you have the situation where the load may see an old value of the state - and that's almost certain to say "we're not speculating".

The last one is really the killer here.

>
> The mitigation strategy chosen is that of tracking speculation
> state via a mask that can be used to zero parts of the addresses
> that leak the actual data. That's similar to what aarch64 does
> with -mtrack-speculation (but oddly there's no mitigation there).

We rely on the user inserting the new builtin, which we can more effectively optimize if the compiler is generating speculation state tracking data. That doesn't preclude a full solution at a later date, but it looked like it was likely overkill for protecting every load and safely pruning the loads is not an easy problem to solve.
Of course, the builtin does require the programmer to do some work to identify which memory accesses might be vulnerable.

R.
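
As an illustration of the programmer-inserted approach described in this reply, here is a small C sketch. It assumes the builtin in question is GCC's `__builtin_speculation_safe_value (val, failval)` and that GCC defines `__HAVE_SPECULATION_SAFE_VALUE` when it is available; the message does not name the builtin, so treat both as assumptions. The `SAFE_INDEX` macro and `read_element` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: __HAVE_SPECULATION_SAFE_VALUE signals that the builtin is
   available.  Without it this sketch falls back to the plain value,
   which compiles everywhere but gives NO protection. */
#ifdef __HAVE_SPECULATION_SAFE_VALUE
#define SAFE_INDEX(i) __builtin_speculation_safe_value((i), (size_t)0)
#else
#define SAFE_INDEX(i) (i)
#endif

/* Hypothetical accessor: the programmer has decided that this load, fed
   by an untrusted index, is the one that needs protecting. */
static uint8_t read_element(const uint8_t *array, size_t len,
                            size_t untrusted_idx)
{
    assert(array != NULL);
    if (untrusted_idx < len)
        return array[SAFE_INDEX(untrusted_idx)];
    return 0;
}
```

The design trade-off is the one stated above: only the marked access pays for the protection, but someone has to identify which accesses to mark.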
hi Allie
Hi Allie, We are all meeting at Khaleesi's later to watch the game. Remember, it finishes at roughly 14:23, so be there early if possible.
Trophex Show 2k19
Hi, I am following up to check if your company is interested in acquiring the Attendees List of "The Trophex Show 2019"; the total count will be 7,500. We have a special discount price for Christmas & New Year. Data fields include: Company name, Contact name, Title, Email address, Website, Industry, Telephone number, etc. Note: The counts given above are GDPR compliant with all the rules and regulations, with 100% opt-in contacts. Awaiting your reply. Thanks & Regards, Helen Stovall, Demand Generator. To opt out, please reply with Leave Out in the Subject Line.
OpenACC 2.6 "host_data" construct, "if_present" clause
Hi Jakub!

OpenACC 2.6 adds a new clause to the "host_data" construct: 2.8.3. "if_present clause". Gergő (in CC) is working on that.

    When an 'if_present' clause appears on the directive, the compiler will only change the address of any variable or array which appears in _var-list_ that is present on the current device.

So, basically:

--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1130,13 +1130,17 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
           else if ((kind & typemask) == GOMP_MAP_USE_DEVICE_PTR)
             {
               cur_node.host_start = (uintptr_t) hostaddrs[i];
               cur_node.host_end = cur_node.host_start;
               splay_tree_key n = gomp_map_lookup (mem_map, &cur_node);
               if (n == NULL)
                 {
+                  if ([...])
+                    /* No error, continue using the host address.  */
+                    continue;
                   gomp_mutex_unlock (&devicep->lock);
                   gomp_fatal ("use_device_ptr pointer wasn't mapped");
                 }

Note that this clause applies to *all* "use_device" ("GOMP_MAP_USE_DEVICE_PTR") clauses present on the "host_data" construct, so it's just a single bit flag for the construct.

Do you suggest we still add a new mapping kind "GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT" for that? And, any preference about the specific value to use? Gergő proposed:

--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -80,6 +80,10 @@ enum gomp_map_kind
     GOMP_MAP_DEVICE_RESIDENT =          (GOMP_MAP_FLAG_SPECIAL_1 | 1),
     /* OpenACC link.  */
     GOMP_MAP_LINK =                     (GOMP_MAP_FLAG_SPECIAL_1 | 2),
+    /* Like GOMP_MAP_USE_DEVICE_PTR below, translate a host to a device
+       address.  If translation fails because the target is not mapped,
+       continue using the host address.  */
+    GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT = (GOMP_MAP_FLAG_SPECIAL_1 | 3),
     /* Allocate.  */
     GOMP_MAP_FIRSTPRIVATE =             (GOMP_MAP_FLAG_SPECIAL | 0),
     /* Similarly, but store the value in the pointer rather than

Or, I had the idea that we could avoid that, instead continue using "GOMP_MAP_USE_DEVICE_PTR", and transmit the "if_present" flag through the "int device" argument of "GOACC_data_start" (making sure that old executables continue to function as before). For OpenACC, that argument is only ever set to "GOMP_DEVICE_ICV" or "GOMP_DEVICE_HOST_FALLBACK" (for "if" clause evaluating to "false"), so has some bits to spare for that. However, I've not been able to convince myself that this solution would be any prettier than adding a new mapping kind... ;-)

Grüße
 Thomas
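
To make the intended "if_present" behaviour concrete, here is a self-contained C sketch of the lookup-with-fallback logic described in this message. It is only a toy model and not libgomp code: `struct toy_mapping` and `toy_use_device_ptr` are hypothetical names, a linear array stands in for the splay tree, and a `false` return stands in for the `gomp_fatal` call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one host-to-device mapping entry. */
struct toy_mapping {
    uintptr_t host_start;
    uintptr_t host_end;   /* one past the last mapped host byte */
    uintptr_t dev_start;
};

/* Translate HOST to a device address and store it in *OUT.
   If HOST is not mapped and IF_PRESENT is set, keep using the host
   address and report success.  If HOST is not mapped and IF_PRESENT is
   clear, report failure (where the real code would raise a fatal
   error). */
static bool toy_use_device_ptr(const struct toy_mapping *maps, size_t n_maps,
                               uintptr_t host, bool if_present,
                               uintptr_t *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n_maps; i++) {
        if (host >= maps[i].host_start && host < maps[i].host_end) {
            *out = maps[i].dev_start + (host - maps[i].host_start);
            return true;
        }
    }
    if (if_present) {
        *out = host;   /* no error, continue using the host address */
        return true;
    }
    return false;
}
```

Because the flag applies to the whole construct, the sketch passes it as a single parameter rather than storing it per mapping; either of the two encodings discussed above would end up feeding a check like this one.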
Re: Segfault Question
On 2018-12-17 11:12 a.m., nick wrote:
>
>
> On 2018-12-17 10:23 a.m., Nathan Sidwell wrote:
>> On 12/17/18 10:11 AM, Jonathan Wakely wrote:
>>
>>> The second snippet is his suggested fix for the caller of tsubst_expr
>>> in expand_concept. That would have been a lot more helpful as a patch:
>>>
>>> --- a/gcc/cp/constraint.cc
>>> +++ b/gcc/cp/constraint.cc
>>> @@ -563,7 +563,7 @@ expand_concept (tree decl, tree args)
>>>    ++processing_template_decl;
>>>    tree result = tsubst_expr (def, args, tf_none, NULL_TREE, true);
>>>    --processing_template_decl;
>>> -  if (result == error_mark_node)
>>> +  if (result == error_mark_node || t == NULL_TREE)
>>>      return error_mark_node;
>>>
>>>    /* And lastly, normalize it, check for implications, and save
>>>
>>> The point is that tsubst_expr can return NULL_TREE, we should check for it.
>>
>> Are there cases that tsubst_expr returns NULL when the incoming T is
>> non-null? I.e. I'm hypothesizing DEF is NULL already.
>>
>> nathan
>>
>
> Sorry about my miscommunication before it.
> As for Nathan's comment you could be right. But the bug reports
> two concept calls in gdb where only one crashes according to it.
> However I managed to track down the differences to this occurring with the
> seg fault caller:
> #45 0x008f3dfa in (anonymous
> namespace)::satisfy_associated_constraints (args=0x770ca4a0,
> ci=0x770ca3c0) at ../../gcc/gcc/cp/cp-tree.h:1446
> versus without:
> #30 0x008f55f2 in (anonymous namespace)::tsubst_compound_requirement
> (in_decl=0x0, complain=0, args=0x770bfde8, t=0x770bf528) at
> ../../gcc/gcc/tree.h:3658
>
> Don't know why this would cause issues:
> #define OMP_CLAUSE_PRIVATE_DEBUG(NODE) \
> (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PRIVATE)->base.public_flag)
>
> in gcc/tree.h on line 1448. Any ideas?
>
> Nick
>
I tried working on it more today and it seems that this may be wrong in call.c:

  for (ix = 1; args->iterate (ix, &arg); ++ix)
    tempvec->quick_push (arg);

for add_candidates. I don't know why we aren't setting it like:

  tempvec->quick_push((*arg[ix]));

If you would prefer I send a patch to show the proposed fix, just let me know.

Nick
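
For reference when reading the loop quoted above, here is a simplified C model of that iteration idiom. It assumes that `iterate (ix, &arg)` stores element `ix` into `arg` and returns false once `ix` runs past the end, which is how the vector helper in GCC's vec.h appears to be documented; `struct toy_vec`, `toy_iterate` and `toy_copy_tail` are hypothetical names and not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy vector of pointers, loosely modelled on the iterate() idiom. */
struct toy_vec {
    void **data;
    size_t len;
};

/* Store element IX in *PTR and return true while IX is in range;
   otherwise set *PTR to NULL and return false. */
static bool toy_iterate(const struct toy_vec *v, size_t ix, void **ptr)
{
    assert(v != NULL && ptr != NULL);
    if (ix < v->len) {
        *ptr = v->data[ix];
        return true;
    }
    *ptr = NULL;
    return false;
}

/* Copy every element except the first into OUT, which the caller must
   size to hold at least v->len - 1 entries.  Returns the number of
   elements copied. */
static size_t toy_copy_tail(const struct toy_vec *v, void **out)
{
    void *arg;
    size_t n = 0;
    for (size_t ix = 1; toy_iterate(v, ix, &arg); ++ix)
        out[n++] = arg;   /* arg already holds element ix */
    return n;
}
```

Under that assumption, `arg` is the element itself by the time the loop body runs, so pushing `arg` copies elements 1 through the end in order.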