On Mon, Dec 12, 2016 at 12:20 PM, FX <fxcoud...@gmail.com> wrote:
> Hi Janne,
>
> This is an ABI change, so it is serious… it will require people to
> recompile older code and libraries with the new compiler. Do we
> already plan to break the ABI in this cycle, or is this the first
> ABI-breaking patch of the cycle?

As Andre mentioned, the ABI has already been broken; GFortran 7 will
have libgfortran.so.4. However, this will also affect people doing
C->Fortran calls the old-fashioned way without ISO_C_BINDING, as they
will have to change the hidden string length argument from int to
size_t in their prototypes. Then again, Intel Fortran made this change
some years ago, so I guess at least people who care about portability
across several compilers are aware of it.
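To make that concrete, here is a minimal sketch of such a prototype
change (the subroutine name fsub and the calling code are invented for
illustration, and the hidden-length convention is described as I
understand it, so treat this as an assumption rather than a normative
example):

  /* Sketch only: fsub_ stands for some Fortran subroutine
         subroutine fsub(msg)
           character(*) :: msg
         end subroutine
     gfortran appends a hidden character-length argument, passed by
     value, after the regular arguments.  */
  #include <stddef.h>
  #include <string.h>

  /* Old prototype (libgfortran.so.3 and earlier): hidden length is int. */
  /* extern void fsub_ (char *msg, int msg_len); */

  /* New prototype (libgfortran.so.4): hidden length becomes size_t.     */
  extern void fsub_ (char *msg, size_t msg_len);

  int main (void)
  {
    char msg[] = "hello from C";
    fsub_ (msg, strlen (msg));  /* Fortran strings carry an explicit
                                   length and are not NUL-terminated.  */
    return 0;
  }

A C caller that keeps the old int prototype would pass a 32-bit length
where the Fortran side now expects a size_t, which is exactly the
mismatch that makes updating such prototypes necessary.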
> And do we have real-life examples of character strings larger than 2 GB?

Well, people who have needed such strings will have figured out some
work-around, since we haven't supported them, so how would we know? :)
It could be splitting the data into several strings, switching to
ifort, using C instead of Fortran, or something else. In any case, I
don't expect character variables larger than 2 GB to be common
(particularly with the Fortran standard-mandated behaviour of
space-padding to the end in many cases), but as the ABI has been
broken anyway, we might as well fix it. IIRC at some point there was
some discussion of this on comp.lang.fortran, and somebody mentioned
analysis of genomic data as a use case where large characters can be
useful. I don't have any personal use case, though, at least at the
moment.

>> Also, as there are some places in the frontend where negative character
>> lengths are used as special flag values, in the frontend the character
>> length is handled as a signed variable of the same size as a size_t,
>> although in the runtime library it really is size_t.
>
> First, I thought: we should really make it size_t, and have the negative
> values be well-defined constants, e.g. (size_t) -1

I tried that, but in addition to the issue with negative character
lengths being used as flag values, there are problems like
gfc_get_int_expr(), which takes a kind value and an integer constant
and produces a gfc_expr, but doesn't understand unsigned types. So in
the end I decided it's better to get this patch into working shape and
merged along with the other ABI changes; the unsigned-ness can be
fixed later (in the end it's just a factor of two in the sizes we can
handle, so not a huge deal).

> On the other hand, there is the problem of the case where the front-end has
> different size_t than the target: think 32-bit on 64-bit i386 (front-end
> size_t larger than target size_t), or cross-compiling for 64-bit on a 32-bit
> machine (front-end size_t smaller than target size_t). So the charlen type
> bounds need to be determined when the front-end runs, not when it is compiled
> (i.e. it is not a fixed type).

True. Although things like gfc_charlen_type_node should be correct for
the target, the type gfc_charlen_t that I introduced in the frontend
might be too small if one is doing a 32->64 bit cross-compile. So that
should be changed from a typedef of ptrdiff_t to a typedef of
HOST_WIDE_INT, which AFAIK is guaranteed to be 64-bit everywhere.

> In iresolve.c, the "Why is this fixup needed?" comment is kinda scary.

Hmm, I think it's a leftover from some earlier experimentation; it
should be removed.

>> I haven't changed the character length variables for the co-array
>> intrinsics, as this is something that may need to be synchronized with
>> OpenCoarrays.
>
> Won't that mean that coarray programs will fail due to ABI mismatch?

No, the co-array intrinsics are, well, intrinsics, so they're handled
specially in the frontend and don't need to follow the normal
argument-passing conventions. But I think it'd be easier if they did,
and it might prevent some obscure corner-case bugs. Say, create a
character variable with length 2**31+9; truncating that to a plain int
when calling the intrinsic would wrap around, and the library would
see a negative length.
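For what it's worth, a small stand-alone C sketch (not gfortran or
OpenCoarrays code, just an illustration of the truncation) shows what
that wraparound looks like:

  #include <stdio.h>
  #include <stddef.h>

  int main (void)
  {
    size_t len = ((size_t) 1 << 31) + 9;  /* 2**31 + 9 = 2147483657 */
    int as_int = (int) len;   /* out-of-range conversion; on the usual
                                 two's-complement targets it wraps */
    printf ("size_t length: %zu\n", len);     /* 2147483657 */
    printf ("as int:        %d\n", as_int);   /* typically -2147483639 */
    return 0;
  }

which is why the library would end up seeing a negative length.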
--
Janne Blomqvist