On Tue, Apr 22, 2014 at 11:51 AM, Richard Sandiford <rdsandif...@googlemail.com> wrote: > Richard Biener <richard.guent...@gmail.com> writes: >> On Tue, Apr 22, 2014 at 11:15 AM, Richard Sandiford >> <rdsandif...@googlemail.com> wrote: >>> Richard Biener <richard.guent...@gmail.com> writes: >>>> On Tue, Apr 22, 2014 at 10:43 AM, Richard Sandiford >>>> <rdsandif...@googlemail.com> wrote: >>>>> Richard Biener <richard.guent...@gmail.com> writes: >>>>>> On Sat, Apr 19, 2014 at 9:51 AM, Richard Sandiford >>>>>> <rdsandif...@googlemail.com> wrote: >>>>>>> wide-int fails to build libitm because of a bad interaction between: >>>>>>> >>>>>>> /* Keep the OI and XI modes from confusing the compiler into thinking >>>>>>> that these modes could actually be used for computation. They are >>>>>>> only holders for vectors during data movement. */ >>>>>>> #define MAX_BITSIZE_MODE_ANY_INT (128) >>>>>>> >>>>>>> and the memcpy folding code: >>>>>>> >>>>>>> /* Make sure we are not copying using a floating-point mode or >>>>>>> a type whose size possibly does not match its precision. */ >>>>>>> if (FLOAT_MODE_P (TYPE_MODE (desttype)) >>>>>>> || TREE_CODE (desttype) == BOOLEAN_TYPE >>>>>>> || TREE_CODE (desttype) == ENUMERAL_TYPE) >>>>>>> { >>>>>>> /* A more suitable int_mode_for_mode would return a vector >>>>>>> integer mode for a vector float mode or a integer complex >>>>>>> mode for a float complex mode if there isn't a regular >>>>>>> integer mode covering the mode of desttype. */ >>>>>>> enum machine_mode mode = int_mode_for_mode (TYPE_MODE >>>>>>> (desttype)); >>>>>>> if (mode == BLKmode) >>>>>>> desttype = NULL_TREE; >>>>>>> else >>>>>>> desttype = build_nonstandard_integer_type (GET_MODE_BITSIZE >>>>>>> (mode), >>>>>>> 1); >>>>>>> } >>>>>>> if (FLOAT_MODE_P (TYPE_MODE (srctype)) >>>>>>> || TREE_CODE (srctype) == BOOLEAN_TYPE >>>>>>> || TREE_CODE (srctype) == ENUMERAL_TYPE) >>>>>>> { >>>>>>> enum machine_mode mode = int_mode_for_mode (TYPE_MODE >>>>>>> (srctype)); >>>>>>> if (mode == BLKmode) >>>>>>> srctype = NULL_TREE; >>>>>>> else >>>>>>> srctype = build_nonstandard_integer_type (GET_MODE_BITSIZE >>>>>>> (mode), >>>>>>> 1); >>>>>>> } >>>>>>> >>>>>>> The failure occurs for complex long double, which we try to copy as >>>>>>> a 256-bit integer type (OImode). >>>>>>> >>>>>>> This patch tries to do what the comment suggests by introducing a new >>>>>>> form of int_mode_for_mode that replaces vector modes with vector modes >>>>>>> and complex modes with complex modes. The fallback case of using a >>>>>>> MODE_INT is limited by MAX_FIXED_MODE_SIZE, so can never go above >>>>>>> 128 bits on x86_64. >>>>>>> >>>>>>> The question then is what to do about 128-bit types for i386. >>>>>>> MAX_FIXED_MODE_SIZE is 64 there, which says that int128_t shouldn't be >>>>>>> used for optimisation. However, gcc.target/i386/pr49168-1.c only passes >>>>>>> for -m32 -msse2 because we use int128_t to copy a float128_t. >>>>>>> >>>>>>> I handled that by allowing MODE_VECTOR_INT to be used instead of >>>>>>> MODE_INT if the mode size is greater than MAX_FIXED_MODE_SIZE, >>>>>>> even if the original type wasn't a vector. >>>>>> >>>>>> Hmm. Sounds reasonable unless there are very weird targets that >>>>>> cannot efficiently load/store vectors unaligned but can handle >>>>>> efficient load/store of unaligned scalars. >>>>> >>>>> Yeah, in general there's no guarantee that even int_mode_for_mode >>>>> will return a mode with the same alignment as the original. Callers >>>>> need to check that (like the memcpy folder does). >>>>> >>>>>>> It might be that other callers to int_mode_for_mode should use >>>>>>> the new function too, but I'll look at that separately. >>>>>>> >>>>>>> I used the attached testcase (with printfs added to gcc) to check that >>>>>>> the right modes and types were being chosen. The patch fixes the >>>>>>> complex float and complex double cases, since the integer type that we >>>>>>> previously picked had a larger alignment than the original complex type. >>>>>> >>>>>> As of complex int modes - are we positively sure that targets even >>>>>> try to do sth "optimal" for loads/stores of those? >>>>> >>>>> Complex modes usually aren't handled directly by .md patterns, >>>>> either int or float. They're really treated as a pair of values. >>>>> So IMO it still makes sense to fold this case. >>>>> >>>>>>> One possibly subtle side-effect of FLOAT_MODE_P (TYPE_MODE (desttype)) >>>>>>> is that vectors are copied as integer vectors if the target supports >>>>>>> them directly but are copied as float vectors otherwise, since in the >>>>>>> latter case the mode will be BLKmode. E.g. the 1024-bit vectors in the >>>>>>> test are copied as vector floats and vector doubles both before and >>>>>>> after the patch. >>>>>> >>>>>> That wasn't intended ... the folding should have failed if we can't >>>>>> copy using an integer mode ... >>>>> >>>>> Does that mean that the fold give up if TYPE_MODE is BLKmode? >>>>> I can do that as a separate patch if so. >>>> >>>> Looking at the code again it should always choose an integer mode/type >>>> via setting desttype/srctype to NULL for BLKmode and >>>> >>>> if (!srctype) >>>> srctype = desttype; >>>> if (!desttype) >>>> desttype = srctype; >>>> if (!srctype) >>>> return NULL_TREE; >>>> >>>> no? Thus if we can't get a integer type for either src or dest then >>>> we fail. But we should never end up with srctype or desttype >>>> being a float mode. No? >>> >>> Right, they don't have a float _mode_, because the target doesn't >>> support 1024-bit modes. Like I say, they have BLKmode instead. >>> But they have a float type. >> >> Ok. I clearly used FLOAT_MODE_P (TYPE_MODE ()) because >> I assumed that only float-modes will eventually end up in FP >> registers (that's what the bug was about - we load into a FP register >> and that load can trigger a FP exception). > > Yeah. I just flagged it up because it wasn't obvious that native vector > float types need to be copied as vector ints while non-native ones don't. > A non-native vector float might be scalarised into individual float > operations and thus might end up using an FP register load.
That's true - we should fail for that case. Richard. > Thanks, > Richard