Re: [RFC] Only warn for maybe-uninitialized SRAed bits in -Wextra (PR 80635)

Martin Jambor Mon, 11 Nov 2019 14:08:14 -0800

Hi,

On Mon, Nov 11 2019, Martin Sebor wrote:
> On 11/11/19 10:29 AM, Martin Jambor wrote:
>> On Mon, Nov 11 2019, Martin Sebor wrote:
>>> On 11/8/19 5:41 AM, Martin Jambor wrote:
>>>> Hi,
>>>>
>>>> this patch is an attempt to implement my idea from a previous thread
>>>> about moving -Wmaybe-uninitialized to -Wextra:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00220.html
>>>>
>>>> Specifically, it attempts to split -Wmaybe-uninitialized into those that
>>>> are about SRA DECLs and those which are not, and move to -Wextra only
>>>> the former ones.  The main idea is that false -Wmaybe-uninitialized
>>>> warnings about values that are scalar in user's code are easy to silence
>>>> by initializing them to zero or something, as opposed to bits of
>>>> aggregates such as a value in std::optional which are not.  Therefore,
>>>> the warnings about user-scalars can remain in -Wall but warnings about
>>>> SRAed pieces should be moved to -Wextra.
>>>>
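Below is a minimal C sketch of the kind of aggregate this paragraph has in mind. It assumes a hand-rolled optional-like type; `struct opt_int`, `opt_some`, `opt_none` and `opt_value_or` are hypothetical names and are not part of the patch. As with an empty `std::optional`, the payload of an empty value is never written. After inlining, SRA may turn `o.value` into a scalar replacement. A false maybe-uninitialized warning about it would then point at something the caller never declared, and the caller could not silence it without changing the helper that builds the aggregate.

```c
#include <stdbool.h>

/* Hypothetical optional-like aggregate, similar in spirit to std::optional<int>. */
struct opt_int {
  bool has_value;
  int value;            /* meaningful only when has_value is true */
};

struct opt_int opt_some(int v)
{
  struct opt_int o;
  o.has_value = true;
  o.value = v;
  return o;
}

struct opt_int opt_none(void)
{
  struct opt_int o;
  o.has_value = false;  /* o.value is intentionally left unset */
  return o;
}

/* o.value is read only when has_value is true, so it is never really
   uninitialized.  After inlining and SRA, though, the scalar that
   replaces o.value may not be provably set on every path the
   uninitialized-use analysis considers.  */
int opt_value_or(struct opt_int o, int dflt)
{
  return o.has_value ? o.value : dflt;
}
```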


...

>>>> +This option enables the same warning as @option{-Wmaybe-uninitialized} but
>>>> +for parts of aggregates (i.e.@: structures, arrays or unions) that GCC
>>>> +optimizers can track.  These warnings are only possible in optimizing
>>>> +compilation, because otherwise GCC does not keep track of the state of
>>>> +variables.  This warning is enabled by @option{-Wextra}.
>>>> +
>>>
>>> Let me ask a question.  Suppose I have code like this:
>>>
>>>     #include <stdlib.h>
>>>     #include <string.h>
>>>
>>>     struct S { char a[4], b[4]; };
>>>
>>>     void* g (void)
>>>     {
>>>       struct S *p = malloc (sizeof *p);
>>>       strcpy (p->a + 1, p->b + 1);
>>>       return p;
>>>     }
>>>
>>> (I include the offsets only because they make an interesting
>>> difference in the internal representation.  My question is
>>> the same even without them.)
>>>
>>> With this new warning, would the appropriate diagnostic to
>>> issue be -Wmaybe-uninitialized-aggregates or -Wuninitialized?
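For comparison, here is a well-defined variant of the example above. This is a sketch only; the function name `g_init` and the literal `"xyz"` are illustrative additions. It checks the result of `malloc` and gives `p->b` a value before `strcpy` reads from `p->b + 1`. After that initialization, neither `-Wuninitialized` nor a maybe-uninitialized warning has a true positive to report here.

```c
#include <stdlib.h>
#include <string.h>

struct S { char a[4], b[4]; };

/* Same shape as g() above, but p->b is written before it is read.  */
void *g_init(void)
{
  struct S *p = malloc(sizeof *p);
  if (!p)
    return NULL;
  strcpy(p->b, "xyz");          /* p->b = {'x', 'y', 'z', '\0'} */
  strcpy(p->a + 1, p->b + 1);   /* copies "yz" plus nul into p->a[1..3] */
  return p;
}
```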
>> 
>> The patch should not change the behavior of -Wuninitialized, so if
>> an uninitialized value use is detected at a spot which is always
>> executed, a warning should be emitted regardless of whether the
>> underpinning DECL is a result of SRA or not - the user should fix
>> their code, not silence a spurious warning, because it is not spurious.
>> 
>> It's the tricky maybe-uninitialized cases where I wanted to mitigate a
>> common source of false positives which are difficult to silence.
>
> Understood.
>
>> 
>> As far as the strcpy example is concerned, ideally it would be emitted
>> as part of either -Wuninitialized or -Wmaybe-uninitialized-aggregates,
>> depending on whether it is a maybe warning or not, but not as part of
>> plain -Wmaybe-uninitialized.
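Here is a sketch of the "maybe" counterpart to the strcpy example. It assumes a hypothetical `init` flag and a function name `g_maybe`. Because `p->b` is written on only one path, the later read is conditionally uninitialized. Under the proposal, such a read would fall under -Wmaybe-uninitialized-aggregates, while the unconditional read in `g()` would be a -Wuninitialized case. This only illustrates the classification. Whether GCC actually diagnoses either form depends on what its analysis tracks for `malloc`ed memory.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct S { char a[4], b[4]; };

/* p->b is initialized only when init is true.  Calling this with
   init == false reads indeterminate bytes (a real bug): that is the
   "maybe uninitialized" situation discussed above.  */
void *g_maybe(bool init)
{
  struct S *p = malloc(sizeof *p);
  if (!p)
    return NULL;
  if (init)
    strcpy(p->b, "xyz");
  strcpy(p->a + 1, p->b + 1);
  return p;
}
```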
>
> I see.  This makes sense for the simple example above.
>
>> 
>>>
>>> The description makes it sound like the former but I'm not
>>> sure that's what I would want, either as an implementer of
>>> the uninitialized strcpy warning (I plan to add one) or as
>>> its user.
>> 
>> What are the problems for the user?  I think that the distinction
>> between maybe uninitialized and always uninitialized is a genuinely
>> useful one.  And as an implementor of a new similar warning, don't you
>> need to distinguish between them even now?
>
> Yes, the distinction between the "maybe" and "definitely" kinds
> of warnings is useful and (IMO) clear, and not my concern. (Sorry,
> I think my example may not have been as helpful as I wanted it to
> be.)
>
> To be useful, I think the distinction between -Wmaybe-uninitialized
> and -Wmaybe-uninitialized-aggregates will also need to be made very
> clear.  But I'm not sure that it will be possible.  In the strcpy
> example, the GIMPLE looks like this:
>
>    _1 = &MEM <char[4]> [(void *)p_5 + 5B];
>    _2 = &MEM <char[4]> [(void *)p_5 + 1B];
>    strcpy (_2, _1);
>
> i.e., it's clear the accesses are to distinct parts of the same
> object, but it's not necessarily as clear whether they are
> to distinct subobjects of the same aggregate.
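The constant offsets in this GIMPLE follow from the layout of `struct S`. `a` starts at byte 0 and `b` at byte 4, because both members are `char` arrays and common ABIs insert no padding between them. So `p->a + 1` and `p->b + 1` become `p_5 + 1B` and `p_5 + 5B`. The small C sketch below computes the same numbers with `offsetof`; the helper names are hypothetical.

```c
#include <stddef.h>

struct S { char a[4], b[4]; };

/* Byte offsets, from the start of *p, of the two strcpy arguments.  */
size_t dst_offset(void) { return offsetof(struct S, a) + 1; }  /* p->a + 1 */
size_t src_offset(void) { return offsetof(struct S, b) + 1; }  /* p->b + 1 */
```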

In my simple world, anything that is not an outright scalar is an
aggregate (well, naked complex numbers and vectors are perhaps in the
grey area).  I would say that once you are looking at a MEM_REF, you are
dealing with an aggregate.

>
> It becomes even less clear if one or both offsets are non-constant
> (but in a known range), or when only parts of the larger objects
> are initialized (e.g., just some members, or some initial bytes
> of a chunk of memory allocated by malloc as often happens with
> string functions).  It's also unclear when the access involves
> a PHI node some of whose operands are aggregates and the others
> are not.
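The partially initialized case can be sketched as follows, using hypothetical helpers `make_buf` and `byte_at`. Only the first bytes of a `malloc`ed chunk are written, which is the usual pattern with string functions. Reads within the string are well defined. A read at a non-constant index that is only known to lie somewhere in the chunk may or may not land in the initialized prefix, which makes it a maybe-uninitialized read of part of the same object.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates n bytes but initializes only the first strlen(s) + 1 of
   them; the remaining bytes stay indeterminate.  */
char *make_buf(const char *s, size_t n)
{
  size_t len = strlen(s) + 1;
  if (len > n)
    return NULL;
  char *buf = malloc(n);
  if (buf)
    memcpy(buf, s, len);
  return buf;
}

/* Well defined only for i < strlen(s) + 1.  If i is known only to lie
   in [0, n), the access may or may not hit the initialized prefix.  */
char byte_at(const char *buf, size_t i)
{
  return buf[i];
}
```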
>
> I haven't had time to think about it very hard but it seems that
> the answer in each of these cases might come down to a judgment
> call.  That will blur the distinction between the two and make
> the new warning option less useful.
>
>>> On the other hand, if the answer is the latter (or that it
>>> depends) then introducing an option for it would seem like
>>> exposing an interface to an internal detail (limitation)
>>> with unspecified effects.
>> 
>> I agree that describing the new option would be tricky and I am open
>> to suggestions for improving or overhauling that.  On the other hand, the
>> options already depend on internal details and limitations because they
>> only work on stuff that GCC can track.  E.g. they stop working as soon
>> as a variable - even a scalar one - happens to be addressable, for
>> example if we do less inlining.  Likewise if SRA suddenly decided not to
>> scalarize something the warning for that bit would be gone too.
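The addressability point can be sketched like this, with hypothetical names `maybe_set` and `use_flag`. `x` has its address taken, and the callee is kept out of line, so `x` stays in memory rather than becoming a register-like scalar that the uninitialized-use analysis follows. Whether a maybe-uninitialized warning appears for `x` therefore depends on inlining decisions, not on the source. The `noinline` attribute is a GCC extension used here only to force that situation.

```c
/* Kept out of line so that &x escapes from use_flag; if this were
   inlined, x could again become a scalar the analysis tracks.  */
__attribute__((noinline))
static void maybe_set(int c, int *out)
{
  if (c)
    *out = 1;
}

int use_flag(int c)
{
  int x;               /* deliberately not initialized */
  maybe_set(c, &x);
  return c ? x : 0;    /* x is read only when maybe_set wrote it */
}
```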
>
> Sure, all late warnings are limited by what GCC tracks and
> how well.  What I meant is that I'm not sure it's a good
> idea to introduce a distinction between uninitialized objects
> and their subobjects only because GCC happens to have an easier
> time avoiding false positives in one than in the other.

No, no, no!  I do not want to introduce the split of
-Wmaybe-uninitialized into -Wmaybe-uninitialized and
-Wmaybe-uninitialized-aggregates because the latter has more false
positives.  I don't think it does, I can't see a reason why it would (in
its current form).  I want to split them because the former can be
easily silenced by initializing the scalar variable when the user
defines it while the latter cannot.  That's why the former should be in
-Wall and the latter only in -Wextra.
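The "easily silenced" side of this argument looks like the following C sketch, where `scaled` is a hypothetical function. If the compiler cannot correlate the two `if (c)` tests, it may report `x` as maybe uninitialized. Because `x` is a scalar the user declared, adding an initializer at its definition silences the warning without changing behavior. User code has no equivalent one-line fix for a scalarized member of an aggregate built inside a helper it does not own.

```c
int scaled(int c, int a)
{
  int x = 0;       /* "= 0" only silences a possible false positive */
  if (c)
    x = a * 2;
  /* ... unrelated code ... */
  if (c)
    return x;      /* x is always set on this path */
  return -1;
}
```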

> Especially if it's uncertain that it will be possible to make
> the difference clear in all cases, or that the balance won't
> swing in the other direction in the future.

I understand this concern, but see my note above about MEM_REF basically
always implying some aggregateness.  The aim is to distinguish easily
silenceable cases from not so easily silenceable ones.  Describing that
to the user would be tricky, but making the distinction itself is IMHO
not that hard.

> My general concern is that if the new option is documented to
> control reads of uninitialized aggregates and users learn to
> disable it because it suppresses common false positives, they
> will also disable all the true positives in warnings that may
> not be susceptible to the same false positive problem, either
> present or future.

Well, -Wno-error=maybe-uninitialized was suggested as the best
workaround in PR 80635 and doing that is not so far away from outright
-Wno-maybe-uninitialized.  This patch is actually an attempt to persuade
people not to do either.

Martin
