On 08/10/2018 10:56 AM, Martin Sebor wrote:
> On 08/08/2018 11:36 PM, Jeff Law wrote:
>> On 08/02/2018 09:42 AM, Martin Sebor wrote:
>>
>>> The warning bits are definitely not okay by me. The purpose
>>> of the warnings (-W{format,sprintf}-{overflow,truncation} is
>>> to detect buffer overflows. When a warning doesn't have access
>>> to string length information for dynamically created strings
>>> (like the strlen pass does) it uses array sizes as a proxy.
>>> This is useful both to detect possible buffer overflows and
>>> to prevent false positives for overflows that cannot happen
>>> in correctly written programs.
>> So how much of this falling-back to array sizes as a proxy would become
>> unnecessary if sprintf had access to the strlen pass as an analysis
>> module?
>>
>> As you know we've been kicking that around and from my investigations
>> that doesn't really look hard to do. Encapsulate the data structures in
>> a class, break up the statement handling into analysis and optimization
>> and we should be good to go. I'm hoping to start prototyping this week.
>>
>> If we think that has a reasonable chance to eliminate the array-size
>> fallback, then that seems like the most promising path forward.
>
> We discussed this idea this morning so let me respond here and
> reiterate the answer. Using the strlen data will help detect
> buffer overflow where the array size isn't available but it
> cannot replace the array size heuristic. Here's a simple
> example:
>
> struct S { char a[8]; };
>
> char d[8];
> void f (struct S *s, int i)
> {
> sprintf (d, "%s-%i", s[i].a, i);
> }
>
> We don't know the length of s->a but we do know that it can
> be up to 7 bytes long (assuming it's nul-terminated of course)
> so we know the sprintf call can overflow. Conversely, if
> the size of the destination is increased to 20 the sprintf
> call cannot overflow so the diagnostic can be avoided.
>
> Removing the array size heuristic would force us to either give
> up on diagnosing the first case or issue false positives for
> the second case. I think the second alternative would make
> the warning too noisy to be useful.
>
> The strlen pass will help detect buffer overflows in cases
> where the array size isn't known (e.g., with dynamically
> allocated buffers) but where the length of the string store
> in the array is known. It will also help avoid false positives
> in cases where the string stored in an array of known size is
> known to be too short to cause an overflow. For instance here:
>
> struct S { char a[8]; };
>
> char d[8];
> void f (struct S *s, int i)
> {
> if (strlen (s->a) < 4 && i >= 0 && i < 100)
> sprintf (d, "%s-%i", s->a, i);
> }
ACK. Thanks for explaining things here too. I can't speak for others,
but seeing examples along with the explanation is easier for me to absorb.
For Bernd and others -- after kicking things around a bit with Martin,
what we're recommending is that compute_string_length we compute the
bounds using both GIMPLE and C semantics and return both.
Anything which influences code generation or optimization must use the
GIMPLE semantics. Warnings may use the C semantics in an effort to
improve preciseness.
Martin has some other stuff to flush out of his queue, then he'll be
focused on the changes to compute_string_length.
Jeff