> On Apr 7, 2025, at 22:15, Siddhesh Poyarekar <siddh...@gotplt.org> wrote:
>
> On 2025-04-07 14:53, Qing Zhao wrote:
>>> Is there a reason to do this at the very end like this and not in the
>>> GIMPLE_ASSIGN case in the switch block?  Something like this:
>>>
>>>   tree rhs = gimple_assign_rhs1 (stmt);
>>>   tree counted_by_ref = NULL_TREE;
>>>   if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
>>>       || (gimple_assign_rhs_code (stmt) == ADDR_EXPR
>>>           && TREE_CODE (TREE_OPERAND (rhs, 0)) == MEM_REF))
>>>     reexamine = plus_stmt_object_size (osi, var, stmt);
>>>   else if (gimple_assign_rhs_code (stmt) == COND_EXPR)
>>>     reexamine = cond_expr_object_size (osi, var, stmt);
>>>   else if (gimple_assign_single_p (stmt)
>>>            || gimple_assign_unary_nop_p (stmt))
>>>     {
>>>       if (TREE_CODE (rhs) == SSA_NAME
>>>           && POINTER_TYPE_P (TREE_TYPE (rhs)))
>>>         reexamine = merge_object_sizes (osi, var, rhs);
>>>       else
>>>         expr_object_size (osi, var, rhs);
>>>     }
>>> +  else if ((counted_by_ref = fam_struct_with_counted_by (rhs)))
>>> +    record_fam_object_size (osi, var, counted_by_ref);
>>>   else
>>>     unknown_object_size (osi, var);
>>>
>>> where you can then fold in all your gating conditions, including getting
>>> the counted_by ref into fam_struct_with_counted_by, and then limit
>>> record_fam_object_size to just evaluating the type size + counted_by size.
>>>
>>> This may even help avoid the insertion order hack you have to do, i.e. the
>>> gsi_insert_seq_before vs gsi_insert_seq_after.
>>
>> This is a good suggestion. I will try this to see whether there is any issue.
>> My initial thought is to give the counted_by information the lowest priority
>> when other information (for example, from malloc) is available.
>> Do you see any issue here?
>
> No, that is the right idea, but I don't know if you'll actually need to
> choose.  AFAICT, either you'll be able to trace the pointer to an allocation,
> in which case the fallback is unnecessary, or you'll trace it to one of the
> following:
>
> 1. An assignment from an expression that returns the pointer
> 2. A NOP with a var_decl, which is handled in the GIMPLE_NOP case; you'd
>    need to add a similar hook there.
>
> I can't think of any other cases off the top of my head, how about you?
>
>>> Also, it seems like simply making build_counted_by_ref available may be
>>> unnecessary, and maybe you could explore associating the counted_by
>>> component_ref with the parent type somehow.  Alternatively, how about
>>> building an .ACCESS_WITH_SIZE for types with a FAM that has a counted_by?
>>> That would allow the current access_with_size_object_size () to work out
>>> of the box.
>>
>> I am not sure about this though.
>> Our initial design is to change every component_ref (which is a reference
>> to the FAM) that has a counted_by attribute in the data flow of the routine
>> to a call to .ACCESS_WITH_SIZE, then put this call to .ACCESS_WITH_SIZE
>> into the data flow of the routine.
>> Now, if we try to associate the counted_by information with the parent
>> type, how can we add such information to the data flow of the routine if
>> there is no explicit reference to the array itself?
>
> I'm not entirely sure, but maybe whenever there is an access through a ptr
> to the parent struct, add a call to .ACCESS_WITH_SIZE there, with a built
> expression for its size?  e.g. for:
>
> struct S
> {
>   size_t c;
>   char a[] __counted_by (c);
> };
>
> void foo (struct S *s)
> {
>   ...
>   sz = __builtin_dynamic_object_size (s, 0);
>   ...
> }
>
> we could generate:
>
> void foo (struct S *s)
> {
>   ...
>   sz_exp = c + sizeof (struct S);
>   s_1 = .ACCESS_WITH_SIZE (&s..., &c, ...);
>   ...
>   sz = __builtin_dynamic_object_size (*s_1, 0);
> }
Yes, I prefer this approach.

For the definition of .ACCESS_WITH_SIZE:

  ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, CLASS_OF_SIZE,
                    TYPE_OF_SIZE, ACCESS_MODE)

which returns the REF_TO_OBJ, the same as the 1st argument:

  1st argument REF_TO_OBJ: the reference to the object;
  2nd argument REF_TO_SIZE: the reference to the size of the object;
  3rd argument CLASS_OF_SIZE: what the size referenced by REF_TO_SIZE
    represents:
      0: the number of bytes;
      1: the number of elements of the object type;
  4th argument TYPE_OF_SIZE: a constant 0 whose TYPE is the same as the
    TYPE of the object referenced by REF_TO_SIZE;
  5th argument ACCESS_MODE:
      -1: unknown access semantics
       0: none
       1: read_only
       2: write_only
       3: read_write
  6th argument: a constant 0 with the pointer TYPE to the original
    flexible array type.

Currently, we only generate .ACCESS_WITH_SIZE for p->array if it's a FAM
with counted_by; the 3rd argument of this call is 1 (the number of
elements of the object type).  This information can be used both in
__builtin_object_size and in the array bound sanitizer.

For a reference to a structure with a FAM, such as p, we can generate a
call to .ACCESS_WITH_SIZE whose 3rd argument is 0 (the number of bytes).
This information can be used in __builtin_object_size, but not in the
array bound sanitizer.

So, for your above example:

struct S
{
  size_t c;
  char a[] __counted_by (c);
};

void foo (struct S *s)
{
  ...
  sz = __builtin_dynamic_object_size (s, 0);
  ...
}

the C FE could generate:

void foo (struct S *s)
{
  ...
  if (!type_unsigned (typeof (s->c)))
    s->c = (s->c < 0) ? 0 : s->c;
  sz_exp = MAX (sizeof (struct S),
                offsetof (struct S, a) + s->c * sizeof (char));
  s_1 = .ACCESS_WITH_SIZE (&s, &sz_exp, 0, ...);
  ...
  sz = __builtin_dynamic_object_size (*s_1, 0);
}

Then, in tree-object-size.cc, we can use the path to
access_with_size_object_size () directly.

What do you think?

Thanks a lot for your suggestion.

Qing

> or something like that.
> But like I said, it's an alternative idea to avoid special-casing in
> tree-object-size, which should provide size information across all
> passes, not just for object size.
>
> Thanks,
> Sid