On Mon, Dec 12, 2016 at 12:20 PM, FX <fxcoud...@gmail.com> wrote:
> Hi Janne,
>
> This is an ABI change, so it is serious… it will require people to
> recompile older code and libraries with the new compiler. Do we
> already plan to break the ABI in this cycle, or is this the first
> ABI-breaking patch of the cycle?

As Andre mentioned, the ABI has already been broken; GFortran 7 will
have libgfortran.so.4. However, this will also affect people doing
C->Fortran calls the old-fashioned way without ISO_C_BINDING, as they
will have to change the hidden string length argument from int to
size_t in their prototypes. Then again, Intel Fortran made this change
some years ago, so I guess at least people who care about portability
across several compilers are aware of it.
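To make that concrete, here is a minimal sketch of such a prototype
change (the subroutine name fsub and the calling code are invented for
illustration, and the hidden-length convention is described as I
understand it, so treat this as an assumption rather than a normative
example):

  /* Sketch only: fsub_ stands for some Fortran subroutine
         subroutine fsub(msg)
           character(*) :: msg
         end subroutine
     gfortran appends a hidden character-length argument, passed by
     value, after the regular arguments.  */
  #include <stddef.h>
  #include <string.h>

  /* Old prototype (libgfortran.so.3 and earlier): hidden length is int. */
  /* extern void fsub_ (char *msg, int msg_len); */

  /* New prototype (libgfortran.so.4): hidden length becomes size_t.     */
  extern void fsub_ (char *msg, size_t msg_len);

  int main (void)
  {
    char msg[] = "hello from C";
    fsub_ (msg, strlen (msg));  /* Fortran strings carry an explicit
                                   length and are not NUL-terminated.  */
    return 0;
  }

A C caller that keeps the old int prototype would pass a 32-bit length
where the Fortran side now expects a size_t, which is exactly the
mismatch that makes updating such prototypes necessary.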
> And do we have real-life examples of character strings larger than 2 GB?

Well, people who have needed such strings will have figured out some
work-around, since we haven't supported them, so how would we know? :)
It could be splitting the data into several strings, switching to
ifort, using C instead of Fortran, or something else. In any case, I
don't expect character variables larger than 2 GB to be common
(particularly with the Fortran standard-mandated behaviour of
space-padding to the end in many cases), but as the ABI has been
broken anyway, we might as well fix it. IIRC at some point there was
some discussion of this on comp.lang.fortran, and somebody mentioned
analysis of genomic data as a use case where large characters can be
useful. I don't have any personal use case, though, at least at the
moment.

>> Also, as there are some places in the frontend where negative character
>> lengths are used as special flag values, in the frontend the character
>> length is handled as a signed variable of the same size as a size_t,
>> although in the runtime library it really is size_t.
>
> First, I thought: we should really make it size_t, and have the negative
> values be well-defined constants, e.g. (size_t) -1

I tried that, but in addition to the issue with negative character
lengths being used as flag values, there are problems like
gfc_get_int_expr(), which takes a kind value and an integer constant
and produces a gfc_expr, but doesn't understand unsigned types. So in
the end I decided it's better to get this patch into working shape and
merged along with the other ABI changes; the unsigned-ness can be
fixed later (in the end it's just a factor of two in the sizes we can
handle, so not a huge deal).

> On the other hand, there is the problem of the case where the front-end has
> different size_t than the target: think 32-bit on 64-bit i386 (front-end
> size_t larger than target size_t), or cross-compiling for 64-bit on a 32-bit
> machine (front-end size_t smaller than target size_t). So the charlen type
> bounds need to be determined when the front-end runs, not when it is compiled
> (i.e. it is not a fixed type).

True. Although things like gfc_charlen_type_node should be correct for
the target, the type gfc_charlen_t that I introduced in the frontend
might be too small if one is doing a 32->64 bit cross-compile. So that
should be changed from a typedef of ptrdiff_t to a typedef of
HOST_WIDE_INT, which AFAIK is guaranteed to be 64-bit everywhere.

> In iresolve.c, the "Why is this fixup needed?" comment is kinda scary.

Hmm, I think it's a leftover from some earlier experimentation; it
should be removed.

>> I haven't changed the character length variables for the co-array
>> intrinsics, as this is something that may need to be synchronized with
>> OpenCoarrays.
>
> Won't that mean that coarray programs will fail due to ABI mismatch?

No, the co-array intrinsics are, well, intrinsics, so they're handled
specially in the frontend and don't need to follow the normal
argument-passing conventions. But I think it'd be easier if they did,
and it might prevent some obscure corner-case bugs. Say, create a
character variable with length 2**31+9; truncating that to a plain int
when calling the intrinsic would wrap around, and the library would
see a negative length.
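For what it's worth, a small stand-alone C sketch (not gfortran or
OpenCoarrays code, just an illustration of the truncation) shows what
that wraparound looks like:

  #include <stdio.h>
  #include <stddef.h>

  int main (void)
  {
    size_t len = ((size_t) 1 << 31) + 9;  /* 2**31 + 9 = 2147483657 */
    int as_int = (int) len;   /* out-of-range conversion; on the usual
                                 two's-complement targets it wraps */
    printf ("size_t length: %zu\n", len);     /* 2147483657 */
    printf ("as int:        %d\n", as_int);   /* typically -2147483639 */
    return 0;
  }

which is why the library would end up seeing a negative length.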
--
Janne Blomqvist