> On Apr 7, 2025, at 22:15, Siddhesh Poyarekar <[email protected]> wrote:
>
> On 2025-04-07 14:53, Qing Zhao wrote:
>>> Is there a reason to do this at the very end like this and not in the
>>> GIMPLE_ASSIGN case in the switch block? Something like this:
>>>
>>>   tree rhs = gimple_assign_rhs1 (stmt);
>>>   tree counted_by_ref = NULL_TREE;
>>>   if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
>>>       || (gimple_assign_rhs_code (stmt) == ADDR_EXPR
>>>           && TREE_CODE (TREE_OPERAND (rhs, 0)) == MEM_REF))
>>>     reexamine = plus_stmt_object_size (osi, var, stmt);
>>>   else if (gimple_assign_rhs_code (stmt) == COND_EXPR)
>>>     reexamine = cond_expr_object_size (osi, var, stmt);
>>>   else if (gimple_assign_single_p (stmt)
>>>            || gimple_assign_unary_nop_p (stmt))
>>>     {
>>>       if (TREE_CODE (rhs) == SSA_NAME
>>>           && POINTER_TYPE_P (TREE_TYPE (rhs)))
>>>         reexamine = merge_object_sizes (osi, var, rhs);
>>>       else
>>>         expr_object_size (osi, var, rhs);
>>>     }
>>> +  else if ((counted_by_ref = fam_struct_with_counted_by (rhs)))
>>> +    record_fam_object_size (osi, var, counted_by_ref);
>>>   else
>>>     unknown_object_size (osi, var);
>>>
>>> where you can then fold in all your gating conditions, including getting
>>> the counted_by ref into the fam_struct_with_counted_by and then limit the
>>> record_fam_object_size to just evaluating the type size + counted_by size.
>>>
>>> This may even help avoid the insertion order hack you have to do, i.e. the
>>> gsi_insert_seq_before vs gsi_insert_seq_after.
>> This is a good suggestion. I will try this to see any issue there.
>> My initial thought is to give the counted_by information the lowest
>> priority when other information (for example, from malloc) is available.
>> Do you see any issue here?
>
> No, that is the right idea, but I don't know if you'll actually need to
> choose. AFAICT, you'll either be able to trace the pointer to an allocation,
> in which case the fallback is unnecessary, or you'll trace it to one of the
> following:
>
> 1. An assignment from an expression that returns the pointer
> 2. A NOP with a var_decl, which is handled in the GIMPLE_NOP case; you'd need
> to add a similar hook there.
>
> I can't think of any other cases off the top of my head, how about you?
>
>>> Also, it seems like simply making build_counted_by_ref available may be
>>> unnecessary and maybe you could explore associating the counted_by
>>> component_ref with the parent type somehow. Alternatively, how about
>>> building an .ACCESS_WITH_SIZE for types with FAM that have a counted_by?
>>> That would allow the current access_with_size_object_size() to work out of
>>> the box.
>> I am not sure about this though.
>> Our initial design is to change every component_ref that refers to a FAM
>> with a counted_by attribute in the data flow of the routine into a call to
>> .ACCESS_WITH_SIZE, and then put that call into the data flow of the
>> routine.
>> Now, if we instead associate the counted_by information with the parent
>> type, how can we add that information to the data flow of the routine when
>> there is no explicit reference to the array itself?
>
> I'm not entirely sure, but maybe whenever there is an access on a ptr to the
> parent struct, add a call to .ACCESS_WITH_SIZE there, with a built expression
> for its size? e.g for:
>
> struct S
> {
>   size_t c;
>   char a[] __counted_by (c);
> };
>
> void foo (struct S *s)
> {
>   ...
>   sz = __builtin_dynamic_object_size (s, 0);
>   ...
> }
>
> we could generate:
>
> void foo (struct S *s)
> {
>   ...
>   sz_exp = c + sizeof (struct S);
>   s_1 = .ACCESS_WITH_SIZE (&s..., &c, ...);
>   ...
>   sz = __builtin_dynamic_object_size (*s_1, 0);
> }
>
Yes, I prefer this approach.
For the definition of .ACCESS_WITH_SIZE:

  ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
                    TYPE_OF_SIZE, ACCESS_MODE)

which returns the REF_TO_OBJ, same as the 1st argument;

  1st argument REF_TO_OBJ: the reference to the object;
  2nd argument REF_TO_SIZE: the reference to the size of the object;
  3rd argument CLASS_OF_SIZE: what the size referenced by REF_TO_SIZE
    represents:
      0: the number of bytes;
      1: the number of elements of the object type;
  4th argument TYPE_OF_SIZE: a constant 0 whose TYPE is the same as the
    TYPE of the object referenced by REF_TO_SIZE;
  5th argument ACCESS_MODE:
     -1: unknown access semantics
      0: none
      1: read_only
      2: write_only
      3: read_write
  6th argument: a constant 0 with the pointer TYPE to the original flexible
    array type.
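
For example (just a sketch; the exact operand spelling in the IL may
differ), given

  struct S { size_t c; char a[] __counted_by (c); };

a reference p->a could be instrumented roughly as:

  _1 = .ACCESS_WITH_SIZE (&p->a, &p->c, 1, (size_t) 0, -1, (char *) 0);

i.e., the 3rd argument 1 says p->c counts elements, the 4th argument is a
zero constant carrying the TYPE of p->c (size_t), and the 6th argument is a
zero constant carrying the pointer type of the flexible array member.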
Currently, we only generate .ACCESS_WITH_SIZE for p->array when it is a FAM
with counted_by; the 3rd argument of this call is 1 (the number of elements
of the object type), and this information can be used both by
__builtin_object_size and by the array bound sanitizer.

For a reference to a structure with a FAM, such as p, we can generate a
call to .ACCESS_WITH_SIZE whose 3rd argument is 0 (the number of bytes).
This information can be used by __builtin_object_size, but not by the array
bound sanitizer.
So, for your above example:

  struct S
  {
    size_t c;
    char a[] __counted_by (c);
  };

  void foo (struct S *s)
  {
    ...
    sz = __builtin_dynamic_object_size (s, 0);
    ...
  }
the C FE could generate:

  void foo (struct S *s)
  {
    ...
    if (!type_unsigned (typeof (s->c)))
      s->c = (s->c < 0) ? 0 : s->c;
    sz_exp = MAX (sizeof (struct S),
                  offsetof (struct S, a) + s->c * sizeof (char));
    s_1 = .ACCESS_WITH_SIZE (&s, &sz_exp, 0, ...);
    ...
    sz = __builtin_dynamic_object_size (*s_1, 0);
  }
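
(As a plain-C sanity check of the sz_exp computation above; this is only an
illustrative sketch with a hypothetical helper name, not FE-generated code:)

  #include <stddef.h>

  struct S { size_t c; char a[]; };

  /* Compute the byte size that sz_exp above evaluates to.  */
  static size_t
  fam_struct_size (const struct S *s)
  {
    /* s->c is unsigned here; for a signed counter the FE would first
       clamp negative values to 0, as in the pseudo code above.  */
    size_t fam_bytes = offsetof (struct S, a) + s->c * sizeof (char);
    return fam_bytes > sizeof (struct S) ? fam_bytes : sizeof (struct S);
  }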
Then, in tree-object-size.cc, we can use the path to
access_with_size_object_size () directly.
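
(Roughly, and only as a sketch of the idea rather than of the actual GCC
code, the pass would then combine the loaded size with the 3rd argument:)

  /* Hypothetical helper, not the real access_with_size_object_size ():
     compute the object size in bytes from an .ACCESS_WITH_SIZE call.  */
  static size_t
  size_from_access_with_size (size_t size_value,   /* loaded via REF_TO_SIZE */
                              int class_of_size,   /* 3rd argument: 0 or 1 */
                              size_t element_size) /* size of one element */
  {
    if (class_of_size == 0)
      return size_value;                /* REF_TO_SIZE already counts bytes */
    return size_value * element_size;   /* REF_TO_SIZE counts elements */
  }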
What do you think?
Thanks a lot for your suggestion.
Qing
> or something like that. But like I said, it's an alternative idea to avoid
> special-casing in tree-object-size, one that should provide size
> information across all passes, not just for object size.
>
> Thanks,
> Sid