On 6/25/19 5:03 PM, Martin Sebor wrote:
>
> The caller ensures that handle_char_store is only called for stores
> to arrays (MEM_REF) or single elements as wide as char.
Where? I don't see it, even after fixing the formatting in
strlen_check_and_optimize_stmt :-)
> gimple *stmt = gsi_stmt (*gsi);
>
> if (is_gimple_call (stmt))
I think we can agree that we don't have a call, so this is false.
> else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
> {
> tree lhs = gimple_assign_lhs (stmt);
This should be true IIUC, so we'll go into its THEN block.
> if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (lhs)))
Should be false.
> else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE
> (lhs)))
Should also be false.
> else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
Should be true since LHS is a MEM_REF.
> {
> tree type = TREE_TYPE (lhs);
> if (TREE_CODE (type) == ARRAY_TYPE)
> type = TREE_TYPE (type);
> if (TREE_CODE (type) == INTEGER_TYPE
> && TYPE_MODE (type) == TYPE_MODE (char_type_node)
> && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node))
> {
> if (! handle_char_store (gsi))
> return false;
> }
> }
If TREE_TYPE (type) is an ARRAY_TYPE, we strip the ARRAY_TYPE. We then
verify that TYPE is a single character type. _But_ we stripped away the
ARRAY_TYPE. So ISTM that we allow either an array of chars or a single
char on the LHS.
So how does this avoid multiple character stores?!? We could have had
an ARRAY_REF on the LHS and we never check the number of elements in the
array. There's no check on the RHS either. SO I don't see how we
guarantee that we're dealing with a single character store.
What am I missing here?
>
> What you describe sounds like
>
> char a[N];
> *(int*)a = 0x31323300;
>
> which is represented as
>
> MEM[(int *)&b] = 825373440;This would be closer (I realize it's not C):
char a[N];
a[0..3] = 0x313233300;
>
> The LHS type of that is int so the function doesn't get called.
I'm concerned about the case where the LHS is an array.
>> And if the NUL byte in the original was at byte offset 2, then didn't we
>> just change the length by overwriting where the NUL is?
>
> No, because cmp is the result of compare_nonzero_chars and cmp > 0
> means:
>
> 1 if SI is known to start with more than OFF nonzero characters
>
> i.e., the character is being stored before the terminating nul.
> This is the basis of the original optimization:
>
> /* If si->nonzero_chars > OFFSET, we aren't overwriting '\0',
> and if we aren't storing '\0', we know that the length of the
> string and any other zero terminated string in memory remains
> the same.
But all this is predicated on the assumption that we're dealing with a
single character memory store. I don't see what enforces that precondition.