pengfei accepted this revision.
pengfei added inline comments.
This revision is now accepted and ready to land.


================
Comment at: clang/lib/Headers/fmaintrin.h:22
+/// Computes a multiply-add of 128-bit vectors of [4 x float].
+///    For each element, computes <c> (__A * __B) + __C </c>.
+///
----------------
probinson wrote:
> pengfei wrote:
> > We are using a special format to describute the function in a pseudo code 
> > to share it with the intrinsic guide, e.g.,
> > https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/avx512fintrin.h#L9604-L9610
> > https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_i32logather_pd&ig_expand=4077
> > 
> > There's no strong requirement to follow it, but it would be better to adopt 
> > a uniform format.
> Is a FOR loop with computed bit offsets really clearer than "For each 
> element" ? Is it valuable to repeat information that can be found in the 
> instruction reference? 
> I can accept answers of "yes" and "yes" because I am not someone who ever 
> deals with vector data, but I would be a little surprised by those answers.
> 
We have internal tools to verify them according to these offsets (but of course 
not for 100% intrinsics), which means we can trust such information in most 
time.

The suggestion is based on:
1. Correctness. It's easy to be errorprone for tedious documentations and we 
don't have other verifications but eyes;
2. Unity. Using the same format is not only better for clean code but also 
distinct for future developers to follow;

As I said before, I'm not forcing this way. You decide it.


================
Comment at: clang/lib/Headers/fmaintrin.h:26
+///
+/// This intrinsic corresponds to the <c> VFMADD213PS </c> instruction.
+///
----------------
probinson wrote:
> pengfei wrote:
> > It would be odd to user given this is just 1/3 instructions the intrinsic 
> > may generate, but I don't have a good idea here.
> I listed the 213 version because that's the one that multiplies the first two 
> operands, and the intrinsic multiplies the first two operands. So it's the 
> instruction that most closely corresponds to the intrinsic.
> We don't guarantee that the "corresponding" instruction is what is actually 
> generated, in general. I know this point has come up before regarding 
> intrinsic descriptions. My thinking is that the "corresponding instruction" 
> gives the reader a place to look in the instruction reference manual, so 
> listing only one is again okay.
Note, intrinsics are force inlined. It's rare user will wrap it in a simple 
function with the same arguments order. In reality, each version may be 
generated. https://godbolt.org/z/q7j8Wxrnb

However, I'm not against this approach if we don't have any better way to 
describe it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148021/new/

https://reviews.llvm.org/D148021

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to