On 11/6/20 11:13 AM, Martin Sebor wrote:
On 11/5/20 8:08 PM, Andrew MacLeod wrote:
On 11/5/20 7:50 PM, Martin Sebor wrote:
On 11/5/20 5:02 PM, Andrew MacLeod wrote:
On 11/5/20 4:02 PM, Martin Sebor wrote:
On 11/5/20 12:29 PM, Martin Sebor wrote:
On 10/1/20 11:25 AM, Martin Sebor wrote:
On 10/1/20 9:34 AM, Aldy Hernandez wrote:
On 10/1/20 3:22 PM, Andrew MacLeod wrote:
> On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>> Thanks for doing all this! There isn't anything I don't
understand
>>> in the sprintf changes so no questions from me (well,
almost none).
>>> Just some comments:
>> Thanks for your comments on the sprintf/strlen API conversion.
>>
>>> The current call statement is available in all functions
that take
>>> a directive argument, as dir->info.callstmt. There should
be no need
>>> to also add it as a new argument to the functions that now
need it.
>> Fixed.
>>
>>> The change adds code along these lines in a bunch of places:
>>>
>>> + value_range vr;
>>> + if (!query->range_of_expr (vr, arg, stmt))
>>> + vr.set_varying (TREE_TYPE (arg));
>>>
>>> I thought under the new Ranger APIs when a range couldn't be
>>> determined it would be automatically set to the maximum for
>>> the type. I like that and have been moving in that direction
>>> with my code myself (rather than having an API fail, have it
>>> set the max range and succeed).
>> I went through all the above idioms and noticed all are
being used on
>> supported types (integers or pointers). So range_of_expr
will always
>> return true. I've removed the if() and the set_varying.
>>
>>> Since that isn't so in this case, I think it would still
be nice
>>> if the added code could be written as if the range were
set to
>>> varying in this case and (ideally) reduced to just
initialization:
>>>
>>> value_range vr = some-function (query, stmt, arg);
>>>
>>> some-function could be an inline helper defined just for
the sprintf
>>> pass (and maybe also strlen which also seems to use the
same pattern),
>>> or it could be a value_range AKA irange ctor, or it could
be a member
>>> of range_query, whatever is the most appropriate.
>>>
>>> (If assigning/copying a value_range is thought to be too
expensive,
>>> declaring it first and then passing it to that helper to
set it
>>> would work too).
>>>
>>> In strlen, is the removed comment no longer relevant?
(I.e., does
>>> the ranger solve the problem?)
>>>
>>> - /* The range below may be "inaccurate" if a
constant has been
>>> - substituted earlier for VAL by this pass that
hasn't been
>>> - propagated through the CFG. This shoud be fixed
by the new
>>> - on-demand VRP if/when it becomes available
(hopefully in
>>> - GCC 11). */
>> It should.
>>
>>> I'm wondering about the comment added to
get_range_strlen_dynamic
>>> and other places:
>>>
>>> + // FIXME: Use range_query instead of global ranges.
>>>
>>> Is that something you're planning to do in a followup or
should
>>> I remember to do it at some point?
>> I'm not planning on doing it. It's just a reminder that it
would be
>> beneficial to do so.
>>
>>> Otherwise I have no concern with the changes.
>> It's not cleared whether Andrew approved all 3 parts of the
patchset
>> or just the valuation part. I'll wait for his nod before
committing
>> this chunk.
>>
>> Aldy
>>
> I have no issue with it, so OK.
Pushed all 3 patches.
>
> Just an observation that should be pointed out, I believe
Aldy has all
> the code for converting to a ranger, but we have not pursued
that any
> further yet since there is a regression due to our lack of
equivalence
> processing I think? That should be resolved in the coming
month, but at
> the moment is a holdback/concern for converting these
passes... iirc.
Yes. Martin, the take away here is that the strlen/sprintf
pass has been converted to the new API, but ranger is still not
up and running on it (even on the branch).
With the new API, all you have to do is remove all instances of
evrp_range_analyzer and replace them with a ranger. That's it.
Below is an untested patch that would convert you to a ranger
once it's contributed.
IIRC when I enabled the ranger for your pass a while back,
there was one or two regressions due to missing equivalences,
and the rest were because the tests were expecting an actual
specific range, and the ranger returned a slightly
different/better one. You'll need to adjust your tests.
Ack. I'll be on the lookout for the ranger commit (if you hppen
to remember and CC me on it just in case I might miss it that would
be great).
I have applied the patch and ran some tests. There are quite
a few failures (see the list below). I have only looked at
a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
boils down to the following test case. There should be no warning
for either sprintf call. The one in h() is a false positive and
the reason for at least some of the regressions. Somehow,
the conversions between int and char are causing Ranger to lose
the range.
$ cat t.c && gcc -O2 -S -Wall t.c
char a[2];
extern int x;
signed char f (int min, int max)
{
signed char i = x;
return i < min || max < i ? min : i;
}
void ff (signed char i)
{
__builtin_sprintf (a, "%i", f (0, 9)); // okay
}
signed char g (signed char min, signed char max)
{
signed char i = x;
return i < min || max < i ? min : i;
}
void gg (void)
{
__builtin_sprintf (a, "%i", g (0, 9)); // bogus warning
}
t.c: In function ‘gg’:
t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes
into a region of size 2 [-Wformat-overflow=]
24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning
| ^~
t.c:24:25: note: directive argument in the range [-128, 127]
24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning
| ^~~~
t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes
into a destination of size 2
24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Another failure (the one in builtin-sprintf-warn-22.c) is this where
the false positive is due to the strlen pass somehow having lost
the upper bound on the sum of the two string lengths.
I'm guessing this might have something to do with the strlen pass
calling set_range_info() on the nonconstant results of strlen calls
it can determine the range for. How would I go about doing it with
ranger? I see ranger_cache::set_global_range() and the class ctor
takes a gimple_ranger as argument. Would calling the function be
the right way to do it? (I couldn't find anything on the Wiki.)
I haven't had time to write a new modern wiki yet... probably won't
until well after stage 1 closes either.
And no, thats the internal cache of the ranger, which you dont have
access to.
At the moment, there isnt a way to inject a "new" global range out
of the blue. It hasnt been a requirement before, but wouldn't be
particularly difficult.
Short answer, there is no current equivalent of set_range_info()
because we dont have a persistent global cache in the same way, and
the ranger doesnt set the RANGE_INFO field at this point.. we
havent gotten to the point of reworking that yet.
So the question is, what exactly are you trying to do? I presume
you are analyzing a strlen statement, and are making range
conclusions about either the result or some of the operands are
that are not otherwise exposed in the code?
And you are looking to have those values injected into the rangers
calculations as refinements?
Yes, exactly. This is used both for optimization and for warnings
(both to enable them and to suppress false positives).
Besides folding strlen and sprintf calls into constants, the strlen
pass sets the range of the results of the calls when the lengths of
the arguments/output are known to be at least some number of bytes.
Besides exposing optimization opportunities downstream, the pass
uses this information to either issue or suppress warnings like in
the sprintf case above. -Wformat-overflow issues warnings either
based on string lengths (or ranges) if they're known, or based on
the sizes of the arrays the strings are stored in when nothing is
known about the lengths. The array size is used as the worst case
estimate, which can cause false positives.
Here's an example of a downstream optimization made possible by
this. Because i's range is known, the range of the snprintf result
is also known (the range is computed for all snprintf arguments).
The strlen pass sets the range and subsequent passes eliminate
the unreachable code.
void f (int i)
{
if (i < 100 || 9999 < i)
i = 100;
int n = __builtin_snprintf (0, 0, "%i", i);
if (n < 3 || 4 < n)
__builtin_abort ();
}
Martin
That seems like somethings that the range code for builtins should
take care of rather than be calculated externally and injected
they are just like any other builtins that if the range of certain
arguments are known, you can produce a result?
The strlen optimization is more involved in that it keeps track of
the state of the strings as they're being created by calls to string
functions like strcpy (or single and multi-byte stores). So the only
way for Ranger to figure out the range of the result of a strlen call
is to query the strlen pass. That might work via a callback but it
would of course only work while the strlen pass is running. The same
goes for the sprintf results (which are based on the strlen data).
It might be good enough as long as the ranges can also be exported
to downstream passes.
It might be possible to compute this on demand too (and introduce
something like a strlen_range_query) but it would need reworking
the strlen pass. I like the on-demand design but in the strlen
case I'm not sure the benefits would justify the costs.
I gather the range of i being known to be [100,9999] means we know
something about n. The the ranger would be able to do that anywhere
and everywhere, not just within this pass. One of the ranger goals
is to centralize all range tracking
and the warning pass can continue to check the ranges and issue
warnings when the ranges arent what it likes.
In this simple case, yes. But consider a more involved example:
char a[8];
void f (int i)
{
if (i < 100 || 9999 < i) i = 100;
memcpy (a, "123", 3); // strlen(a) is in [3, 7]
int n = snprintf (0, 0, "%s %i", a, i); // n is in [7, 12]
if (n < 7 || 12 < n) // folded to false
abort ();
}
To compute n's range you need to know not only that of i but also
the length of a (or its range in this case). That's what the strlen
optimization enables.
Thats why we need precise examples :-)
Ive attached a thouroughly untested patch which you can apply to the
latest trunk stuff I've checked in and try. It gives you an entry
point into the ranger:
bool import_global_range (tree name, irange &r);
NAME is the ssa_name you are updating the global value for, and
R is the new range you wish to add. Note it is not a replacement, but
rather an intergration. ie, and extra source of information the ranger
can't see.
you can call it with your newly calculated value,and it will enhance the
value the ranger already has with the extra information.. so above, you
would call it with
ranger->import_global_range (n_6, [7,12])
this will set the range for n_6 to the INTERSECTION of [7,12] and
whatever the ranger already knows. If its varying, then it'll set it to
[7,12]. If the ranger had already figured it was [9,20] for some reason,
then the result will be [9,12] aftewr this call, and it will reurn FALSE
indicating the new range is different than the range you passed in. if
you care.
It should also push the range of n_6 forward into pother blocks, so on
entry to the block with the abort(), it should have a range of [],
indicating it can't eb reached.
There are a couple of caveats.
1) It may or may not affect other dependent value at this point
depending on the order you visit things since the propagation of
dependency chains is not complete yet. ie if the abort() block has
h_4 = n_6 + 2
g_2 = h_4 + 6 , g_2 may not be reevaluated with the new range... It
might.. but I cant guarantee it just now.
2) If you wish this value to be exported into SSA_NAME_RANGE_INFO, you
will have to continue doing that manually. The ranger cache is not
currently ever exported the RANGE_INFO fields. We've run into numerous
issues with "differences": that occur when a multi-range is converted to
a value_range and although the range is correct, it may be different
that the orignal and produce different results. We are waiting until
we replace the SSA_NAME_RANGE_INFO mechanism.
Anyway, let me known if this works. I havent had a chance to test it
yet. If it works we can add it to the full API
Andrew
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 92a6335bec5..1568dee10d7 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -1038,6 +1038,27 @@ gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
return true;
}
+// Update the current global value for NAME, and then intersect R with this
+// to produce a new global range, and push it thru the cache.
+// Return true if the new range is the same as the one passed in.
+
+bool
+gimple_ranger::import_global_range (tree name, irange &r)
+{
+ int_range_max tmp;
+ gimple *s = SSA_NAME_DEF_STMT (name);
+ // Get the current range for NAME.
+ if (range_of_stmt (tmp, s, name))
+ {
+ // Combine it with the imported value, and set the new global range.
+ tmp.intersect (r);
+ m_cache.set_global_range (name, tmp);
+ return tmp == r;
+ }
+ m_cache.set_global_range (name, r);
+ return true;
+}
+
// This routine will export whatever global ranges are known to GCC
// SSA_RANGE_NAME_INFO fields.
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 0aa6d4672ee..ecf1b35acac 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -53,6 +53,7 @@ public:
virtual void range_on_entry (irange &r, basic_block bb, tree name);
virtual void range_on_exit (irange &r, basic_block bb, tree name);
void export_global_ranges ();
+ bool import_global_range (tree name, irange &r);
void dump (FILE *f);
protected:
bool calc_stmt (irange &r, gimple *s, tree name = NULL_TREE);