Hi,
This is a fix for a pretty serious regression in GCC 4.7 onwards, where
GCC is likely to emit wrong alignment specifiers for the NEON
intrinsics. These specifiers appear to be much larger than the alignment
specifiers the architecture allows for the memory sizes that the
instructions access.
The part of the backend that was emitting the alignment specifiers
wasn't really wrong in what it was doing; it's just that the MEM_SIZE
information for the memory being accessed was already wrong when
neon_dereference_pointer constructed these MEM_REFs in the first place.
There are 2 fundamental problems in the way in which the builtin
expanders and neon_dereference_pointer construct these memory references.
The first problem is that neon_dereference_pointer, in the case where
reg_mode and mem_mode are identical, doesn't take into account the number
of bytes that elem_type actually uses. The logic in
neon_dereference_pointer essentially specifies that the memory accessed
by the intrinsic is an array of type elem_type with number of elements
equal to the number of elements in the vector.
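To make the size arithmetic concrete, here is a minimal standalone sketch in plain C. It is not the GCC code: access_nelems and access_bytes are hypothetical helpers, and it assumes the adjusted element count is the total register size in bytes divided by the size of elem_type, which is how I read "Adjust nelems by element size" in the ChangeLog.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: number of elements covered by a
   full-width access of NVECTORS vectors of VECTOR_BYTES bytes each, when
   every element occupies ELEM_BYTES bytes.  */
static size_t
access_nelems (size_t vector_bytes, size_t nvectors, size_t elem_bytes)
{
  assert (elem_bytes > 0);
  assert ((vector_bytes * nvectors) % elem_bytes == 0);
  return vector_bytes * nvectors / elem_bytes;
}

/* Hypothetical helper: size in bytes of an array type with NELEMS
   elements of ELEM_BYTES bytes each, i.e. the size the memory
   reference would advertise.  */
static size_t
access_bytes (size_t nelems, size_t elem_bytes)
{
  return nelems * elem_bytes;
}
```

For one 16-byte Q register of float elements, access_nelems (16, 1, 4) gives 4 elements and access_bytes (4, 4) gives a 16-byte access. If the count were left unscaled (for instance, if the byte count 16 were used as the element count, which is an assumption about the old behaviour), access_bytes (16, 4) would describe 64 bytes, and an alignment derived from that size would be too large.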
The second problem, and something more fundamental in
neon_dereference_pointer, is that it attempts to figure out the
underlying type of the element being accessed by looking at the actual
parameter for the load or the store. However, this is not guaranteed to
work, as the underlying type could itself be an array type, causing the
logic in neon_dereference_pointer to end up constructing a
multi-dimensional array of the basic type. The way I spotted this was to
construct a testcase from the original PR but using the vld3q_lane_f32
style intrinsics. In these cases the memory reference produced appeared
to be loading a 2-dimensional array of 6 float values instead of just 3
float values. Ouch!
The correct method ought to be to use the underlying type from the
formal parameter, which is what this patch attempts to do.
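The difference between the two choices of element type can be sketched with ordinary C array types. This is an illustration only, not the testcase from the PR: the typedef names are made up, and the inner dimension of 2 for the actual argument is an assumption chosen to reproduce the six floats mentioned above. The formal parameter of vld3q_lane_f32 is a pointer to float32_t, modelled here as plain float.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: the actual argument points at float[2].  */
typedef float actual_elem_t[2];

/* Element type taken from the formal parameter of the intrinsic.  */
typedef float formal_elem_t;

/* A 3-vector lane load touches one element per vector.  */
enum { NVECTORS = 3 };

/* Array type built from the actual argument's element type: float[3][2].  */
typedef actual_elem_t from_actual_t[NVECTORS];

/* Array type built from the formal parameter's element type: float[3].  */
typedef formal_elem_t from_formal_t[NVECTORS];

/* Number of floats each array type claims the access covers.  */
static size_t
floats_from_actual (void)
{
  return sizeof (from_actual_t) / sizeof (float);
}

static size_t
floats_from_formal (void)
{
  return sizeof (from_formal_t) / sizeof (float);
}
```

Under these assumptions floats_from_actual () returns 6 and floats_from_formal () returns 3; only the latter matches the three floats that a lane load of three vectors actually reads.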
Cross-tested with no regressions on arm-linux-gnueabi with the relevant
configury, and tested with a number of handwritten tests; the observed
sizes of the memory accesses look sane.
Applied to trunk; I will wait a few days before backporting to the 4.7
branch.
regards,
Ramana
2012-08-29  Ramana Radhakrishnan  <ramana.radhakrish...@arm.com>
	    Richard Earnshaw  <richard.earns...@arm.com>

	PR target/54252
	* config/arm/arm.c (neon_dereference_pointer): Adjust nelems by
	element size. Use elem_type from the formal parameter. New parameter
	fcode.
	(neon_expand_args): Adjust call to neon_dereference_pointer.