On 7/3/24 2:36 AM, Kewen.Lin wrote:
Hi Carl,
on 2024/6/27 01:05, Carl Love wrote:
GCC maintainers:
The following patch updates the user documentation for the vec_ld, vec_lde,
vec_st and vec_ste built-ins to make it clearer that there are data alignment
requirements for these built-ins. If the data alignment requirements are not
followed, the data loaded or stored by these built-ins will be wrong.
Please let me know if this patch is acceptable for mainline. Thanks.
Carl
----------------------------------------------------
rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation
Use of the vec_ld and vec_st built-ins require that the data be 16-byte
aligned to work properly. Add some additional text to the existing
documentation to make this clearer to the user.
Similarly, the vec_lde and vec_ste built-ins also have data alignment
requirements based on the size of the vector element. Update the
documentation to make this clear to the user.
gcc/ChangeLog:
* doc/extend.texi: Add clarification for the use of the vec_ld
vec_st, vec_lde and vec_ste built-ins.
---
gcc/doc/extend.texi | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
@end smallexample
Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available. The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX}, and @samp{STVX} instructions. The
This change removed "even if the VSX instruction set is available.", I think
it's
not intentional? vec_ld and vec_st are well defined in PVIPR, this paragraph is
not to document them IMHO. Since we document vec_vsx_ld and vec_vsx_st here, it
aims to note the difference between these two pairs. But I'm not opposed to add
more words to emphasis the special masking off, I prefer to use the same words
to
PVIPR "ignoring the four low-order bits of the calculated address". And IMHO we
should not say "it requires the data to be 16-byte aligned to work properly" in
case the users are aware of this behavior well and have some no 16-byte aligned
data and expect it to behave like that, it's arguable to define "it" as not work
properly.
Yea, probably should have left "even if the VSX instruction set is
available."
I was looking to make it clear that if the data is not 16-bye aligned
you may not get the expected data loaded/stored.
So how about the following instead:
Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even
if the VSX
instruction set is available. The instructions mask off the lower
4-bits of
the calculated address. The use of these instructions on data that
is not
16-byte aligned may result in unexpected bytes being loaded or stored.
+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly. The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float. The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly. The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.
As above, there was a reason to mention vec_ld and vec_st here, but not one for
vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and
users
should read the description in PVIPR instead (it's more recommended).
The goal of mentioning the vec_lde and vec_ste built-ins was to give the
user a pointer to built-ins that will work as expected on unaligned
data. It will probably save them a lot of time an frustration if they
are given a hint of what built-ins they should look at. So, how about
the following:
See the PVIPR description of the vec_lde and vec_ste for loading and
storing
data that is not 16-byte aligned.
Carl