> On Apr 7, 2025, at 22:15, Siddhesh Poyarekar <[email protected]> wrote:
>
> On 2025-04-07 14:53, Qing Zhao wrote:
>>> Is there a reason to do this at the very end like this and not in the
>>> GIMPLE_ASSIGN case in the switch block? Something like this:
>>>
>>>   tree rhs = gimple_assign_rhs1 (stmt);
>>>   tree counted_by_ref = NULL_TREE;
>>>   if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
>>>       || (gimple_assign_rhs_code (stmt) == ADDR_EXPR
>>>           && TREE_CODE (TREE_OPERAND (rhs, 0)) == MEM_REF))
>>>     reexamine = plus_stmt_object_size (osi, var, stmt);
>>>   else if (gimple_assign_rhs_code (stmt) == COND_EXPR)
>>>     reexamine = cond_expr_object_size (osi, var, stmt);
>>>   else if (gimple_assign_single_p (stmt)
>>>            || gimple_assign_unary_nop_p (stmt))
>>>     {
>>>       if (TREE_CODE (rhs) == SSA_NAME
>>>           && POINTER_TYPE_P (TREE_TYPE (rhs)))
>>>         reexamine = merge_object_sizes (osi, var, rhs);
>>>       else
>>>         expr_object_size (osi, var, rhs);
>>>     }
>>> +  else if ((counted_by_ref = fam_struct_with_counted_by (rhs)))
>>> +    record_fam_object_size (osi, var, counted_by_ref);
>>>   else
>>>     unknown_object_size (osi, var);
>>>
>>> where you can then fold in all your gating conditions, including getting
>>> the counted_by ref into the fam_struct_with_counted_by and then limit the
>>> record_fam_object_size to just evaluating the type size + counted_by size.
>>>
>>> This may even help avoid the insertion order hack you have to do, i.e. the
>>> gsi_insert_seq_before vs gsi_insert_seq_after.
>> This is a good suggestion. I will try this to see any issue there.
>> My initial thought is to give the counted_by information the lowest
>> priority when other information (for example, from malloc) is available.
>> Do you see any issue here?
>
> No, that is the right idea, but I don't know if you'll actually need to
> choose. AFAICT, you'll either be able to trace the pointer to an allocation,
> in which case the fallback is unnecessary, or you'll trace it to one of the
> following:
>
> 1. An assignment from an expression that returns the pointer
> 2. A NOP with a var_decl, which is handled in the GIMPLE_NOP case; you'd need
> to add a similar hook there.
>
> I can't think of any other cases off the top of my head, how about you?
>
>>> Also, it seems like simply making build_counted_by_ref available may be
>>> unnecessary and maybe you could explore associating the counted_by
>>> component_ref with the parent type somehow. Alternatively, how about
>>> building an .ACCESS_WITH_SIZE for types with FAM that have a counted_by?
>>> That would allow the current access_with_size_object_size() to work out of
>>> the box.
>> I am not sure about this though.
>> Our initial design is to change every component_ref that refers to a FAM
>> with a counted_by attribute in the data flow of the routine into a call to
>> .ACCESS_WITH_SIZE, and then put that call into the data flow of the
>> routine.
>> Now, if we instead associate the counted_by information with the parent
>> type, how can we add that information to the data flow of the routine when
>> there is no explicit reference to the array itself?
>
> I'm not entirely sure, but maybe whenever there is an access on a ptr to the
> parent struct, add a call to .ACCESS_WITH_SIZE there, with a built expression
> for its size? e.g for:
>
> struct S
> {
>   size_t c;
>   char a[] __counted_by (c);
> };
>
> void foo (struct S *s)
> {
>   ...
>   sz = __builtin_dynamic_object_size (s, 0);
>   ...
> }
>
> we could generate:
>
> void foo (struct S *s)
> {
>   ...
>   sz_exp = c + sizeof (struct S);
>   s_1 = .ACCESS_WITH_SIZE (&s..., &c, ...);
>   ...
>   sz = __builtin_dynamic_object_size (*s_1, 0);
> }
>
Yes, I prefer this approach.
For the definition of .ACCESS_WITH_SIZE:

  ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
                    TYPE_OF_SIZE, ACCESS_MODE)

which returns the REF_TO_OBJ, same as the 1st argument;

  1st argument REF_TO_OBJ: the reference to the object;
  2nd argument REF_TO_SIZE: the reference to the size of the object;
  3rd argument CLASS_OF_SIZE: what the size referenced by REF_TO_SIZE
    represents:
      0: the number of bytes;
      1: the number of elements of the object type;
  4th argument TYPE_OF_SIZE: a constant 0 whose TYPE is the same as the
    TYPE of the object referenced by REF_TO_SIZE;
  5th argument ACCESS_MODE:
     -1: unknown access semantics
      0: none
      1: read_only
      2: write_only
      3: read_write
  6th argument: a constant 0 with the pointer TYPE to the original flexible
    array type.
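
For example (just a sketch; the exact operand spelling in the IL may
differ), given

  struct S { size_t c; char a[] __counted_by (c); };

a reference p->a could be instrumented roughly as:

  _1 = .ACCESS_WITH_SIZE (&p->a, &p->c, 1, (size_t) 0, -1, (char *) 0);

i.e., the 3rd argument 1 says p->c counts elements, the 4th argument is a
zero constant carrying the TYPE of p->c (size_t), and the 6th argument is a
zero constant carrying the pointer type of the flexible array member.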
Currently, we only generate .ACCESS_WITH_SIZE for p->array when it is a FAM
with counted_by; the 3rd argument of this call is 1 (the number of elements
of the object type), and this information can be used both by
__builtin_object_size and by the array bound sanitizer.

For a reference to a structure with a FAM, such as p, we can generate a
call to .ACCESS_WITH_SIZE whose 3rd argument is 0 (the number of bytes).
This information can be used by __builtin_object_size, but not by the array
bound sanitizer.
So, for your above example:

  struct S
  {
    size_t c;
    char a[] __counted_by (c);
  };

  void foo (struct S *s)
  {
    ...
    sz = __builtin_dynamic_object_size (s, 0);
    ...
  }
the C FE could generate:

  void foo (struct S *s)
  {
    ...
    if (!type_unsigned (typeof (s->c)))
      s->c = (s->c < 0) ? 0 : s->c;
    sz_exp = MAX (sizeof (struct S),
                  offsetof (struct S, a) + s->c * sizeof (char));
    s_1 = .ACCESS_WITH_SIZE (&s, &sz_exp, 0, ...);
    ...
    sz = __builtin_dynamic_object_size (*s_1, 0);
  }
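
(As a plain-C sanity check of the sz_exp computation above; this is only an
illustrative sketch with a hypothetical helper name, not FE-generated code:)

  #include <stddef.h>

  struct S { size_t c; char a[]; };

  /* Compute the byte size that sz_exp above evaluates to.  */
  static size_t
  fam_struct_size (const struct S *s)
  {
    /* s->c is unsigned here; for a signed counter the FE would first
       clamp negative values to 0, as in the pseudo code above.  */
    size_t fam_bytes = offsetof (struct S, a) + s->c * sizeof (char);
    return fam_bytes > sizeof (struct S) ? fam_bytes : sizeof (struct S);
  }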
Then, in tree-object-size.cc, we can use the path to
access_with_size_object_size () directly.
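
(Roughly, and only as a sketch of the idea rather than of the actual GCC
code, the pass would then combine the loaded size with the 3rd argument:)

  /* Hypothetical helper, not the real access_with_size_object_size ():
     compute the object size in bytes from an .ACCESS_WITH_SIZE call.  */
  static size_t
  size_from_access_with_size (size_t size_value,   /* loaded via REF_TO_SIZE */
                              int class_of_size,   /* 3rd argument: 0 or 1 */
                              size_t element_size) /* size of one element */
  {
    if (class_of_size == 0)
      return size_value;                /* REF_TO_SIZE already counts bytes */
    return size_value * element_size;   /* REF_TO_SIZE counts elements */
  }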
What do you think?
Thanks a lot for your suggestion.
Qing
> or something like that. But like I said, it's an alternative idea to avoid
> special-casing in tree-object-size, one that should provide size
> information across all passes, not just for object size.
>
> Thanks,
> Sid