Reposurgeon update

2018-12-18 Thread Eric S. Raymond
Translation from Python to Go is now 90% complete.  Early results
indicate a speedup of at least 3-4x on fast-import-stream reads; if I
get a similar speedup on Subversion stream reads (which I think is
very likely) that should pull my GCC test runs down below three hours
each.  And more speed gains than that seem quite likely as I've only
spent a couple of hours on performance tuning so far.

I don't know yet because most of the remaining 10% is in fact the
Subversion dump stream reader. At least the major blocker I described
in my last update has been removed; it's just effort and testing from
here.

Realistically, another serious attack on the repository conversion is
probably about two months out.  But progress is being made.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>

"The state calls its own violence `law', but that of the individual `crime'"
-- Max Stirner


Spectre V1 diagnostic / mitigation

2018-12-18 Thread Richard Biener


Hi,

in the past weeks I've been looking into prototyping both spectre V1 
(speculative array bound bypass) diagnostics and mitigation in an
architecture independent manner to assess feasibility and some kind
of upper bound on the performance impact one can expect.
https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
an interesting read in this context as well.

For simplicity I have implemented mitigation on GIMPLE right before
RTL expansion and have chosen TLS to do mitigation across function
boundaries.  Diagnostics sit in the same place, but the two do not
depend on each other in any way.

The mitigation strategy chosen is that of tracking speculation
state via a mask that can be used to zero parts of the addresses
that leak the actual data.  That's similar to what aarch64 does
with -mtrack-speculation (but oddly there's no mitigation there).
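
To make the mask idea concrete, here is a minimal source-level C sketch, assuming a single bounds-checked table lookup. It is not the prototype's GIMPLE instrumentation, and the helper name `speculation_mask` is hypothetical. The mask is derived from the comparison as data (all ones when the index is in bounds, zero otherwise) and ANDed into the index, so a mispredicted bounds check can only reach `table[0]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint8_t table[16] = { 1, 2, 3, 4, 5, 6, 7, 8,
                                   9, 10, 11, 12, 13, 14, 15, 16 };

/* Hypothetical helper: all ones when idx < size, zero otherwise.  It is
   meant to be computed from data (a compare-and-set), not from a branch,
   so it reflects the real comparison even on a mispredicted path.  */
static inline size_t
speculation_mask (size_t idx, size_t size)
{
  return (size_t) 0 - (size_t) (idx < size);
}

uint8_t
load_checked (size_t idx)
{
  size_t mask = speculation_mask (idx, sizeof table);
  if (idx < sizeof table)
    /* On a mispredicted path idx >= 16 but mask == 0, so the address
       collapses to &table[0] instead of an attacker-chosen location.
       Caveat: a compiler that knows idx < 16 here may fold mask to all
       ones; this is a source-level illustration only.  */
    return table[idx & mask];
  return 0;
}
```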

I've optimized things to the point that is reasonable when working
target independent on GIMPLE but I've only looked at x86 assembly
and performance.  I expect any "final" mitigation if we choose to
implement and integrate such would be after RTL expansion since
RTL expansion can end up introducing quite a lot of control flow whose
speculation state is not properly tracked by the prototype.

I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
mitigation and =3 does global mitigation, passing the state
via TLS memory.
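
The cross-function hand-off in the =3 mode can be pictured with a hand-written C sketch. It assumes the mode behaves roughly like this; the TLS variable `spec_state_mask` and both functions are hypothetical. The caller stores its mask in a `_Thread_local` slot before the call, and the callee reads it on entry. A purely function-local (=2) scheme presumably has no such hand-off, so each callee would have to start from "not speculating".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread speculation state: SIZE_MAX means "not
   speculating", 0 means "on a mispredicted path".  */
static _Thread_local size_t spec_state_mask = SIZE_MAX;

static const int data[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };

/* Callee: reads the caller's state from TLS on entry rather than
   assuming it starts on a correct path.  This read is itself a load that
   a speculative store bypass (Spectre V4) might satisfy with a stale
   value.  */
static int
callee_load (size_t idx)
{
  size_t mask = spec_state_mask;
  return data[idx & mask];
}

int
caller (size_t idx)
{
  /* All ones when idx is in bounds, zero otherwise.  */
  size_t mask = (size_t) 0 - (size_t) (idx < 8);
  if (idx < 8)
    {
      /* Hand the state to the callee, then restore "not speculating".  */
      spec_state_mask = mask;
      int v = callee_load (idx);
      spec_state_mask = SIZE_MAX;
      return v;
    }
  return -1;
}
```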

The following was measured on a Haswell desktop CPU:

-O2 vs. -O2 -fspectre-v1=2

                        Estimated                   Estimated
                  Base      Base   Base       Peak      Peak   Peak     Time
Benchmarks        Ref.  Run Time  Ratio       Ref.  Run Time  Ratio     vs. -O2
--------------  ------  --------  -----     ------  --------  -----     -------
400.perlbench     9770       245   39.8 *     9770       452   21.6 *   184%
401.bzip2         9650       378   25.5 *     9650       726   13.3 *   192%
403.gcc           8050       236   34.2 *     8050       352   22.8 *   149%
429.mcf           9120       223   40.9 *     9120       656   13.9 *   294%
445.gobmk        10490       400   26.2 *    10490       666   15.8 *   167%
456.hmmer         9330       388   24.1 *     9330       536   17.4 *   138%
458.sjeng        12100       437   27.7 *    12100       661   18.3 *   151%
462.libquantum   20720       300   69.1 *    20720       384   53.9 *   128%
464.h264ref      22130       451   49.1 *    22130       586   37.8 *   130%
471.omnetpp       6250       291   21.5 *     6250       398   15.7 *   137%
473.astar         7020       334   21.0 *     7020       522   13.5 *   156%
483.xalancbmk     6900       182   37.9 *     6900       306   22.6 *   168%
 Est. SPECint_base2006   --
 Est. SPECint2006        --

   -O2 -fspectre-v1=3

                        Estimated
                  Peak      Peak   Peak     Time
Benchmarks        Ref.  Run Time  Ratio     vs. -O2
--------------  ------  --------  -----     -------
400.perlbench     9770       497   19.6 *   203%
401.bzip2         9650       772   12.5 *   204%
403.gcc           8050       427   18.9 *   181%
429.mcf           9120       696   13.1 *   312%
445.gobmk        10490       726   14.4 *   181%
456.hmmer         9330       537   17.4 *   138%
458.sjeng        12100       721   16.8 *   165%
462.libquantum   20720       446   46.4 *   149%
464.h264ref      22130       613   36.1 *   136%
471.omnetpp       6250       471   13.3 *   162%
473.astar         7020       579   12.1 *   173%
483.xalancbmk     6900       350   19.7 *   192%
 Est. SPECint(R)_base2006   Not Run
 Est. SPECint2006           --
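
The percentage column in the tables above appears to be the mitigated run time divided by the plain -O2 base run time (for example, 452 s / 245 s is about 184% for 400.perlbench). A small C helper to recompute it from the listed whole-second times, rounding half up with integer arithmetic; the function name is hypothetical:

```c
#include <assert.h>

/* Mitigated run time as a whole percentage of the baseline run time,
   rounded half up.  Integer arithmetic avoids floating-point surprises
   on exact halves.  */
unsigned
relative_time_percent (unsigned base_seconds, unsigned mitigated_seconds)
{
  assert (base_seconds > 0);
  return (200u * mitigated_seconds + base_seconds) / (2u * base_seconds);
}
```

Recomputed values can differ by one point from the reported ones, presumably because the reports used unrounded times: for 445.gobmk at =3, 726 / 400 is exactly 181.5%, which this helper rounds to 182 while the table shows 181%.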


While the following was measured on a Zen Epyc server:

-O2 vs -O2 -fspectre-v1=2

                        Estimated                   Estimated
                  Base      Base   Base       Peak      Peak   Peak     Time
Benchmarks      Copies  Run Time   Rate     Copies  Run Time   Rate     vs. -O2
--------------  ------  --------  -----     ------  --------  -----     -------
500.perlbench_r      1       499   3.19 *        1       621   2.56 *   124%
502.gcc_r            1       286   4.95 *        1

Re: Spectre V1 diagnostic / mitigation

2018-12-18 Thread Jeff Law
On 12/18/18 8:36 AM, Richard Biener wrote:
[ ... ]
Interesting.  So we'd been kicking this issue around a bit internally.

The number of packages where we'd want to turn this on was very small
and thus it was difficult to justify burning resources in this space.
LLVM might be an option for those limited packages, but LLVM is missing
other security things we don't want to lose (such as stack clash
mitigation).

In the end we punted for the immediate future.  We'll almost certainly
revisit at some point and your prototype would obviously factor into the
calculus around future decisions.

[ ... ]


> 
> 
> The patch relies heavily on RTL optimizations for DCE purposes.  At the
> same time we rely on RTL not statically computing the mask (RTL has no
> conditional constant propagation).  Full instrumentation of the classic
> Spectre V1 testcase
Right. But it does do constant propagation into arms of conditionals as
well as jump threading.  I'd fear they might compromise things.
Obviously we'd need to look further into those issues.  But even if they
do, something like what you've done may mitigate enough vulnerable
sequences that it's worth doing, even if there are some gaps due to "over"
optimization in the RTL space.
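
The concern can be illustrated with a masked lookup in C: inside `if (idx < 8)` the compiler knows the condition holds, so it may fold a mask computed from that same comparison to all ones and drop the AND entirely. One common source-level countermeasure, sketched here assuming GCC or Clang since it uses the GNU inline-asm extension, is to pass the mask through an empty asm so the optimizer cannot see its value. The helper and array names are hypothetical, and this hinders rather than rules out such folding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const int squares[8] = { 0, 1, 4, 9, 16, 25, 36, 49 };

/* GNU C extension: an empty asm that claims to modify V, so the optimizer
   cannot assume it knows the value that comes out.  volatile stops the
   asm from being deleted or merged with another copy.  This is no
   guarantee: GCC may still move or duplicate volatile asm.  */
static inline size_t
hide_from_optimizer (size_t v)
{
  __asm__ __volatile__ ("" : "+r" (v));
  return v;
}

int
load_square (size_t idx)
{
  /* Without the barrier, inside the guarded arm the compiler knows
     idx < 8, so it could fold the mask to all ones and drop the AND.  */
  size_t mask = hide_from_optimizer ((size_t) 0 - (size_t) (idx < 8));
  if (idx < 8)
    return squares[idx & mask];
  return -1;
}
```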

[  ... ]

> 
> so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
> 
> Patch below for reference (and your own testing in case you are curious).
> I do not plan to pursue this further at this point.
Understood.  Thanks for posting it.  We're not currently working in this
space, but again, we may re-evaluate that stance in the future.

jeff


Re: Spectre V1 diagnostic / mitigation

2018-12-18 Thread Richard Earnshaw (lists)
On 18/12/2018 15:36, Richard Biener wrote:
> 
> Hi,
> 
> in the past weeks I've been looking into prototyping both spectre V1 
> (speculative array bound bypass) diagnostics and mitigation in an
> architecture independent manner to assess feasability and some kind
> of upper bound on the performance impact one can expect.
> https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> an interesting read in this context as well.

Interesting, thanks for posting this.

> 
> For simplicity I have implemented mitigation on GIMPLE right before
> RTL expansion and have chosen TLS to do mitigation across function
> boundaries.  Diagnostics sit in the same place but both are not in
> any way dependent on each other.

We considered using TLS for propagating the state across call-boundaries
on AArch64, but rejected it for several reasons.

- It's quite expensive to have to set up the TLS state in every function;
- It requires some global code to initialize the state variable - that's
kind of ABI;
- It also seems likely to be vulnerable to Spectre variant 4: unless
the CPU can always correctly store-to-load forward the speculation
state, the load may see an old value of the state, and that value is
almost certain to say "we're not speculating".

The last one is really the killer here.

> 
> The mitigation strategy chosen is that of tracking speculation
> state via a mask that can be used to zero parts of the addresses
> that leak the actual data.  That's similar to what aarch64 does
> with -mtrack-speculation (but oddly there's no mitigation there).

We rely on the user inserting the new builtin, which we can more
effectively optimize if the compiler is generating speculation state
tracking data.  That doesn't preclude a full solution at a later date,
but protecting every load looked likely to be overkill, and safely
pruning the loads is not an easy problem to solve.  Of course,
the builtin does require the programmer to do some work to identify
which memory accesses might be vulnerable.
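
A sketch of the user-inserted approach, assuming the builtin referred to here is GCC's `__builtin_speculation_safe_value` (GCC 9 and later) and that its availability is signalled by the `__HAVE_SPECULATION_SAFE_VALUE` macro. The `SPEC_SAFE` wrapper is a hypothetical convenience that degrades to the plain, unprotected value on other compilers so the file still builds.

```c
#include <assert.h>
#include <stddef.h>

/* Use the builtin where the compiler advertises it; otherwise fall back
   to the plain value (no speculation protection at all).  */
#ifdef __HAVE_SPECULATION_SAFE_VALUE
# define SPEC_SAFE(x) __builtin_speculation_safe_value (x)
#else
# define SPEC_SAFE(x) (x)
#endif

static const int array[8] = { 0, 10, 20, 30, 40, 50, 60, 70 };

int
load_array (size_t untrusted_index)
{
  if (untrusted_index < sizeof array / sizeof array[0])
    /* The programmer has identified this access as potentially
       vulnerable and marks the index explicitly.  */
    return array[SPEC_SAFE (untrusted_index)];
  return 0;
}
```

Only the marked access pays for the protection, which matches the trade-off described above: less overhead than instrumenting every load, at the cost of programmer effort.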

R.



hi Allie

2018-12-18 Thread Britney Collier
Hi Allie,



We are all meeting at Khaleesi's later to watch the game.  Remember, it 
finished at roughly 14:23 so be there early if possible.




Trophex Show 2k19

2018-12-18 Thread Helen Stovall via gcc
Hi,

 

I am following up to check if your company is interested in acquiring
the attendee list of "The Trophex Show 2019"; the total count will be
7,500.

 

We have a special discount price for Christmas & New Year.

 

Data fields include: Company name, Contact name, Title, Email address,
Website, Industry, Telephone number, etc. 

 

 Note: The counts given above are GDPR compliant with all the rules and
regulations with 100% opt-in contacts

 

Awaiting your reply

 

Thanks & Regards,

Helen Stovall

Demand Generator

 

To opt out, please reply with Leave Out in the Subject Line.

 



OpenACC 2.6 "host_data" construct, "if_present" clause

2018-12-18 Thread Thomas Schwinge
Hi Jakub!

OpenACC 2.6 adds a new clause to the "host_data" construct, described in
section 2.8.3, "if_present clause".  Gergő (in CC) is working on that.

When an 'if_present' clause appears on the directive, the compiler
will only change the address of any variable or array which appears
in _var-list_ that is present on the current device.

So, basically:

--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1130,13 +1130,17 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
       else if ((kind & typemask) == GOMP_MAP_USE_DEVICE_PTR)
         {
           cur_node.host_start = (uintptr_t) hostaddrs[i];
           cur_node.host_end = cur_node.host_start;
           splay_tree_key n = gomp_map_lookup (mem_map, &cur_node);
           if (n == NULL)
             {
+              if ([...])
+                /* No error, continue using the host address.  */
+                continue;
               gomp_mutex_unlock (&devicep->lock);
               gomp_fatal ("use_device_ptr pointer wasn't mapped");
             }

Note that this clause applies to *all* "use_device"
("GOMP_MAP_USE_DEVICE_PTR") clauses present on the "host_data" construct,
so it's just a single bit flag for the construct.
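
The semantics can be modeled in a few lines of C. This is a toy sketch, not libgomp code: `struct mapping`, `lookup`, and `translate_use_device` are hypothetical stand-ins for the splay tree, `gomp_map_lookup`, and the `GOMP_MAP_USE_DEVICE_PTR` handling in `gomp_map_vars_async`, and the lookup matches exact addresses only. It illustrates that one `if_present` decision covers every `use_device` operand of the construct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy host-to-device address mapping.  */
struct mapping
{
  uintptr_t host;
  uintptr_t device;
};

static const struct mapping *
lookup (const struct mapping *map, size_t n, uintptr_t host)
{
  for (size_t i = 0; i < n; i++)
    if (map[i].host == host)
      return &map[i];
  return NULL;
}

/* Translate each use_device address; the single if_present flag applies
   to all of them.  */
static void
translate_use_device (uintptr_t *addrs, size_t count,
                      const struct mapping *map, size_t n, bool if_present)
{
  for (size_t i = 0; i < count; i++)
    {
      const struct mapping *m = lookup (map, n, addrs[i]);
      if (m == NULL)
        {
          if (if_present)
            continue;   /* No error, keep using the host address.  */
          fprintf (stderr, "use_device_ptr pointer wasn't mapped\n");
          exit (EXIT_FAILURE);
        }
      addrs[i] = m->device;
    }
}
```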

Do you suggest we yet add a new mapping kind
"GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT" for that?  And, any preference about
the specific value to use?  Gergő proposed:

--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -80,6 +80,10 @@ enum gomp_map_kind
     GOMP_MAP_DEVICE_RESIDENT =           (GOMP_MAP_FLAG_SPECIAL_1 | 1),
     /* OpenACC link.  */
     GOMP_MAP_LINK =                      (GOMP_MAP_FLAG_SPECIAL_1 | 2),
+    /* Like GOMP_MAP_USE_DEVICE_PTR below, translate a host to a device
+       address.  If translation fails because the target is not mapped,
+       continue using the host address. */
+    GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT = (GOMP_MAP_FLAG_SPECIAL_1 | 3),
     /* Allocate.  */
     GOMP_MAP_FIRSTPRIVATE =              (GOMP_MAP_FLAG_SPECIAL | 0),
     /* Similarly, but store the value in the pointer rather than

Or, I had the idea that we could avoid that, instead continue using
"GOMP_MAP_USE_DEVICE_PTR", and transmit the "if_present" flag through the
"int device" argument of "GOACC_data_start" (making sure that old
executables continue to function as before).  For OpenACC, that argument
is only ever set to "GOMP_DEVICE_ICV" or "GOMP_DEVICE_HOST_FALLBACK" (for
"if" clause evaluating to "false"), so it has some bits to spare for that.
However, I've not been able to convince myself that this solution would
be any prettier than adding a new mapping kind...  ;-)
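
One way the flag could ride along in "device" without disturbing old executables is to store flag bits bitwise-inverted, so that the two values already in use decode to "no flags" and "host fallback". The sketch assumes that encoding; `ACC_FLAG_*`, `marshal_flags`, and the predicates are hypothetical names, and only the values of `GOMP_DEVICE_ICV` (-1) and `GOMP_DEVICE_HOST_FALLBACK` (-2) come from `include/gomp-constants.h`.

```c
#include <assert.h>
#include <stdbool.h>

/* Values libgomp already passes in the "device" argument.  */
#define GOMP_DEVICE_ICV            (-1)
#define GOMP_DEVICE_HOST_FALLBACK  (-2)

/* Hypothetical flag bits, stored inverted so that existing values keep
   their meaning: ~(-1) == 0 (no flags), ~(-2) == 1 (host fallback).  */
#define ACC_FLAG_HOST_FALLBACK  (1 << 0)
#define ACC_FLAG_IF_PRESENT     (1 << 1)

static int marshal_flags (int flags)    { return ~flags; }
static int unmarshal_flags (int device) { return ~device; }

static bool
host_fallback_p (int device)
{
  return (unmarshal_flags (device) & ACC_FLAG_HOST_FALLBACK) != 0;
}

static bool
if_present_p (int device)
{
  return (unmarshal_flags (device) & ACC_FLAG_IF_PRESENT) != 0;
}
```

Old executables only ever pass -1 or -2, which decode exactly as before; new ones can additionally set the if_present bit.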


Grüße
 Thomas


Re: Segfault Question

2018-12-18 Thread nick



On 2018-12-17 11:12 a.m., nick wrote:
> 
> 
> On 2018-12-17 10:23 a.m., Nathan Sidwell wrote:
>> On 12/17/18 10:11 AM, Jonathan Wakely wrote:
>>
>>> The second snippet is his suggested fix for the caller of tsubst_expr
>>> in expand_concept. That would have been a lot more helpful as a patch:
>>>
>>> --- a/gcc/cp/constraint.cc
>>> +++ b/gcc/cp/constraint.cc
>>> @@ -563,7 +563,7 @@ expand_concept (tree decl, tree args)
>>>     ++processing_template_decl;
>>>     tree result = tsubst_expr (def, args, tf_none, NULL_TREE, true);
>>>     --processing_template_decl;
>>> -  if (result == error_mark_node)
>>> +  if (result == error_mark_node || result == NULL_TREE)
>>>   return error_mark_node;
>>>
>>>     /* And lastly, normalize it, check for implications, and save
>>>
>>> The point is that tsubst_expr can return NULL_TREE, we should check for it.
>>
>> Are there cases that tsubst_expr returns NULL when the incoming T is 
>> non-null?  I.e. I'm hypothesizing DEF is NULL already.
>>
>> nathan
>>
> 
> Sorry about my earlier miscommunication.
> As for Nathan's comment, you could be right. But the bug report shows
> two concept calls in gdb, and according to it only one of them crashes.
> However, I managed to track the difference down to this, occurring with
> the segfaulting caller:
> #45 0x008f3dfa in (anonymous namespace)::satisfy_associated_constraints (args=0x770ca4a0, ci=0x770ca3c0) at ../../gcc/gcc/cp/cp-tree.h:1446
> versus without the segfault:
> #30 0x008f55f2 in (anonymous namespace)::tsubst_compound_requirement (in_decl=0x0, complain=0, args=0x770bfde8, t=0x770bf528) at ../../gcc/gcc/tree.h:3658
> 
> Don't know why this would cause issues:
> #define OMP_CLAUSE_PRIVATE_DEBUG(NODE) \
> (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_PRIVATE)->base.public_flag)
> 
> in gcc/tree.h on line 1448. Any ideas?
> 
> Nick
> 

I tried working on it more today, and it seems that this may be wrong in call.c:

  for (ix = 1; args->iterate (ix, &arg); ++ix)
    tempvec->quick_push (arg);

in add_candidates. I don't know why we aren't setting it like:

  tempvec->quick_push ((*args)[ix]);
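
Assuming the intended alternative is `(*args)[ix]`, both forms should push the same element, because `iterate` already stores element `ix` in `arg` before the body runs. A toy C model of the loop, assuming `vec::iterate` behaves as the comment in GCC's `vec.h` describes (store element `ix` in `*ptr` and return true while `ix` is in range, otherwise store 0 and return false); `toy_vec`, `toy_iterate`, and `copy_tail` are hypothetical stand-ins, with strings playing the role of trees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a GCC vec of trees.  */
struct toy_vec
{
  const char *elems[8];
  size_t length;
};

/* In range: store element IX in *PTR and return true.
   Out of range: store NULL and return false.  */
static bool
toy_iterate (const struct toy_vec *v, size_t ix, const char **ptr)
{
  if (ix < v->length)
    {
      *ptr = v->elems[ix];
      return true;
    }
  *ptr = NULL;
  return false;
}

/* Same shape as the call.c loop quoted above: copy elements 1..N-1.  */
static void
copy_tail (const struct toy_vec *args, struct toy_vec *tempvec)
{
  const char *arg;
  for (size_t ix = 1; toy_iterate (args, ix, &arg); ++ix)
    {
      assert (tempvec->length < 8);
      /* arg already equals args->elems[ix], i.e. (*args)[ix].  */
      tempvec->elems[tempvec->length++] = arg;
    }
}
```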

If you would prefer that I send a patch showing the proposed fix, just let me know.

Nick