On 08/21/18 10:33, Richard Biener wrote:
> On Mon, 20 Aug 2018, Bernd Edlinger wrote:
> 
>> On 08/20/18 15:19, Richard Biener wrote:
>>> On Mon, 20 Aug 2018, Bernd Edlinger wrote:
>>>
>>>> On 08/20/18 13:01, Richard Biener wrote:
>>>>> On Wed, Aug 1, 2018 at 3:05 PM Bernd Edlinger <bernd.edlin...@hotmail.de> 
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 08/01/18 11:29, Richard Biener wrote:
>>>>>>>
>>>>>>> Hmm.  I think it would be nice if TREE_STRING_LENGTH would
>>>>>>> match char[2] and TYPE_SIZE_UNIT even if that is inconvenient
>>>>>>> for your check above.  Because the '\0' doesn't belong to the
>>>>>>> string.  Then build_string internally appends a '\0' outside
>>>>>>> of TREE_STRING_LENGTH.
>>>>>>>
>>>>>>
>>>>>> Hmm. Yes, but the outside-0 byte is just one byte, not a wide
>>>>>> character.
>>>>>
>>>>> That could be fixed though (a wide 0 is just N 0s).  Add a elsz = 1
>>>>> parameter to build_string and allocate as many extra 0s as needed.
>>>>>
>>>>>      There are STRING_CSTs which are not string literals,
>>>>>> for instance attribute tags, Pragmas, asm constrants, etc.
>>>>>> They use the '\0' outside, and have probably no TREE_TYPE.
>>>>>>
>>>>>>>
>>>>>>>> So I would like to be able to assume that the STRING_CST objects
>>>>>>>> are internally always generated properly by the front end.
>>>>>>>
>>>>>>> Yeah, I guess we need to define what "properly" is ;)
>>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>> And that the ARRAY_TYPE of the string literal either has the
>>>>>>>> same length than the TREE_STRING_LENGTH or if it is shorter,
>>>>>>>> this is always exactly one (wide) character size less than 
>>>>>>>> TREE_STRING_LENGTH
>>>>>>>
>>>>>>> I think it should be always the same...
>>>>>>>
>>>>>>
>>>>>> One could not differentiate between "\0" without zero-termination
>>>>>> and "" with zero-termination, theoretically.
>>>>>
>>>>> Is that important?  Doesn't the C standard say how to parse string 
>>>>> literals?
>>>>>
>>>>>> We also have char x[100] = "ab";
>>>>>> that is TREE_STRING_LENGTH=3, and TYPE_SIZE_UNIT(TREE_TYPE(x)) = 100.
>>>>>> Of course one could create it with a TREE_STRING_LENGTH = 100,
>>>>>> but imagine char x[100000000000] = "ab"
>>>>>
>>>>> The question is more about TYPE_SIZE_UNIT (TREE_TYPE ("ab")) which I
>>>>> hope matches "ab" and not 'x'.  If it matches 'x' then I'd rather have it
>>>>> unconditionally be [], thus incomplete (because the literals "size" 
>>>>> depends
>>>>> on the context/LHS it is used on).
>>>>>
>>>>
>>>> Sorry, but I must say, it is not at all like that.
>>>>
>>>> If I compile x.c:
>>>> const char x[100] = "ab";
>>>>
>>>> and set a breakpoint at output_constant:
>>>>
>>>> Breakpoint 1, output_constant (exp=0x7ffff6ff9dc8, size=100, align=256,
>>>>        reverse=false) at ../../gcc-trunk/gcc/varasm.c:4821
>>>> 4821         if (size == 0 || flag_syntax_only)
>>>> (gdb) p size
>>>> $1 = 100
>>>> (gdb) call debug(exp)
>>>> "ab"
>>>> (gdb) p *exp
>>>> $2 = {base = {code = STRING_CST, side_effects_flag = 0, constant_flag = 1,
>>>> (gdb) p exp->typed.type->type_common.size_unit
>>>> $5 = (tree) 0x7ffff6ff9d80
>>>> (gdb) call debug(exp->typed.type->type_common.size_unit)
>>>> 100
>>>> (gdb) p exp->string.length
>>>> $6 = 3
>>>> (gdb) p exp->string.str[0]
>>>> $8 = 97 'a'
>>>> (gdb) p exp->string.str[1]
>>>> $9 = 98 'b'
>>>> (gdb) p exp->string.str[2]
>>>> $10 = 0 '\000'
>>>> (gdb) p exp->string.str[3]
>>>> $11 = 0 '\000'
>>>>
>>>>
>>>> This is an important property of string_cst objects, that is used in 
>>>> c_strlen:
>>>>
>>>> It folds c_strlen(&x[4]) directly to 0, because every byte beyond 
>>>> TREE_STRING_LENGTH
>>>> is guaranteed to be zero up to the type size.
>>>>
>>>> I would not have spent one thought on implementing an optimization like 
>>>> that,
>>>> but that's how it is right now.
>>>
>>> Huh.  So somebody interpreted STRING_CSTs similar to CONSTRUCTORs aka
>>> they have zero-padding up to its type size.  I don't see this documented
>>> anywhere and it would suggest to "optimize" "ab\0\0\0\0" to "ab\0"
>>> with appropriate TYPE_SIZE.
>>>
>>> This is also a relatively new thing on trunk (coming out of the added
>>> mem_size parameter of string_constant).  That this looks at the STRING_CST
>>> type like
>>>
>>>     if (TREE_CODE (array) == STRING_CST)
>>>       {
>>>         *ptr_offset = fold_convert (sizetype, offset);
>>>         if (mem_size)
>>>           *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (array));
>>>         return array;
>>>
>>> I'd call simply a bug.  As said, interpretation of STRING_CSTs should
>>> depend on the context.  And for
>>>
>>
>> This use of the TYPE_SIZE_UNIT was pre-existing to r263607
>> before that (but not in the gcc-8-branch) we had this in c_strlen:
>>
>>     HOST_WIDE_INT maxelts = strelts;
>>     tree type = TREE_TYPE (src);
>>     if (tree size = TYPE_SIZE_UNIT (type))
>>       if (tree_fits_shwi_p (size))
>>         {
>>          maxelts = tree_to_uhwi (size);
>>          maxelts = maxelts / eltsize - 1;
>>         }
>>
>> which I moved to string_constant and return the result through memsize.
>>
>> It seems to be working somehow, and I'd bet removing that will immediately
>> break twenty or thirty of the strlenopt tests.
>>
>> Obviously the tree string objects allow way too much variations,
>> and it has to be restricted in some way or another.
>>
>> In the moment my approach is: look at what most string constants do
>> and add assertions to ensure that there are no exceptions.
>>
>>
>>> char a[4] = "abc";
>>> char b[5] = "abc";
>>>
>>> we should better be able to share STRING_CSTs [you can see LTO
>>> sharing the nodes if you do b[4] but not when b[5] I suppose].
>>>
>>>> All I want to do, is make sure that all string constants have the same 
>>>> look and feel
>>>> in the middle-end, and restrict the variations that are allowed by the 
>>>> current
>>>> implementation.
>>>
>>> Sure, I understand that.  But I'd like to simplify things and not add
>>> complications like looking at TYPE_SIZE vs. TREE_STRING_LENGTH to decide
>>> whether sth is 0-terminated.
>>>
>>
>> This is not only about 0-terminated. (*)
>>
>> It is also about when you get a STRING_CST with a TREE_STRING_LENGTH,
>> there are places in the middle-end that assume that the object contains
>> _all_ bytes up to TREE_STRING_LENGTH without looking at TYPE_SIZE_UINT.
>> Those I want to protect.
> 
> Well, but string_constant handles &STRING_CST just fine but in that
> context there's no "object" we look at.
> 
> IMHO whenever string_constant ends up with a DECL, looking at
> ctor_for_folding and we end up with a STRING_CST that doesn't fit
> in the DECL we looked at we have a bug (non-truncated STRING_CST)
> and either should do the truncation or simply return NULL.
> 
> So we should look at DECL_SIZE_UNIT and compare that with
> TREE_STRING_LENGTH.  Otherwise you are relying on TYPE_SIZE_UNIT
> of the STRING_CST being "correct" (fitting the decl context).
> 
>> Bernd.
>>
>>
>> *: I called the parameter memsize, and yes, I wrote: "If MEM_SIZE is zero,
>> the string is only returned when it is properly zero terminated.",
>> but maybe I should have written:
>> "If MEM_SIZE is zero, the string is only returned when the storage size
>> of the string object is at least TREE_STRING_LENGTH."
>> That's at least exactly what the check does.
> 
> Well, as said above
> 
>    if (TREE_CODE (array) == STRING_CST)
>      {
>        *ptr_offset = fold_convert (sizetype, offset);
>        if (mem_size)
>          *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (array));
>        return array;
>      }
> 
> simply assumes the "storage size" of a STRING_CST is determined
> by its TYPE_SIZE_UNIT.  That may be true as you noted in the folloup,
> but clearly in the above case there's nothing to protect?  And in
> the case we pulled the STRING_CST from some DECL_INITIAL it doesn't
> protect from overflow of the FIELD_DECL unless frontends never
> generate "wrong" STRING_CSTs?
> 

Hmm, I digged some more in varasm.c

Normal STRING_CST use get_constant_size to allocate
MAX(TYPE_SIZE_UNIT, TREE_STRING_LENGTH)
thus they can have excess zero bytes.

while STRING_CST that are in DECL_INITIALIZERs
are shrinked to DECL_SIZE_UNIT.

At least that difference looks unnecessary to me.


But compare_constant does not compare the TYPE_SIZE_UNIT:

     case STRING_CST:
       if (TYPE_MODE (TREE_TYPE (t1)) != TYPE_MODE (TREE_TYPE (t2)))
         return 0;

       return (TREE_STRING_LENGTH (t1) == TREE_STRING_LENGTH (t2)
               && ! memcmp (TREE_STRING_POINTER (t1), TREE_STRING_POINTER (t2),
                          TREE_STRING_LENGTH (t1)));


This looks now like a bug.


> Btw, get_constant_size / mergeable_string_section suggsts that
> we may view STRING_CST as general target-encoded byte blob.
> That may be useful to compress CONSTRUCTORs memory use.
> 
> It looks like mergeable_string_section wrongly would reject
> a char[] typed string because int_size_in_bytes returns -1
> for incomplete types.  I still believe using char[] for most
> STRING_CSTs generated by FEs would be a good thing for
> IL memory use.  Shouldn't the mem_size initializations use
> sth like get_constant_size as well?
> 

I think incomplete types exist only in structs with variable
length:

struct {
   int x;
   char y[];
} s = { 1, "test" };

this is no candidate for a mergeable string.

but
char y[] = "test";

is fixed by the FE to:
char y[5] = "test";

So that does not make problems with mergeable_string_section.


> Also
> 
>    if (mem_size)
>      *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (init));
>    else if (compare_tree_int (array_size, length + 1) < 0)
>      return NULL_TREE;
> 
> the latter check doesn't seem to honor 'offset' and *mem_size
> is again looking at the STRING_CST, not at the FIELD_DECL or
> wherever we got the STRING_CST from.
> 

the caller compares offset with *mem_size, but we do not have
the FIELD_DECL at hand here (Or I did not know where it was).
I'd rather fail the cases when the TYPE_SIZE_UNIT is null,
or something non-constant.

All those are variations of vla with initializer and similar.
Once the STRING_CST has a type, I would like to use it,
when it doesn't the whole thing is probably not worth it
anyway.


Bernd.

Reply via email to