Re: RFC: C extension to support variable-length vector types

Richard Sandiford Thu, 03 Aug 2017 10:05:32 -0700

Torvald Riegel <trie...@redhat.com> writes:
> On Wed, 2017-08-02 at 17:59 +0100, Richard Sandiford wrote:
>> Torvald Riegel <trie...@redhat.com> writes:
>> > On Wed, 2017-08-02 at 14:09 +0100, Richard Sandiford wrote:
>> >>   (1) Does the approach seem reasonable?
>> >> 
>> >>   (2) Would it be acceptable in principle to add this extension to the
>> >>       GCC C frontend (only enabled where necessary)?
>> >> 
>> >>   (3) Should we submit this to the standards committee?
>> >
>> > I hadn't have time to look at the proposal in detail.  I think it would
>> > be good to have the standards committees review this.  I doubt you could
>> > find consensus in the C++ for type system changes unless you have a
>> > really good reason.  Have you considered how you could use the ARM
>> > extensions from http://wg21.link/p0214r4 ?
>> 
>> Yeah, we've been following that proposal, but I don't think it helps
>> as-is with SVE.  datapar<T> is "an array of target-specific size,
>> with elements of type T, ..." and for SVE the natural target-specific
>> size would be a runtime value.  The core language would still need to
>> provide a way of creating that array.
>
> I think the actual data will have a size -- your proposal seems to try
> to express a control structure (ie, SIMD / loop-like) by changing the
> type system.  This seems odd to me.
>
> Why can't you keep the underlying data have a size (ie, be an array),
> and change your operations to work on arrays or slices of arrays?  That
> won't help with automatic-storage-duration variables that one would need
> as temporaries, but perhaps it would be better to let programmers
> declare these variables as large vectors and have the compiler figure
> out what size they really need to be if only accessed through the SVE
> operations as temporary storage?


The types only really exist for objects of automatic storage duration
and for passing to and returning from functions.  Like you say, the
original input and final result will be normal arrays.

For example, the vector function underlying:

    #pragma omp declare simd
    double sin(double);

would be:

    svfloat64_t mangled_sin(svfloat64_t, svbool_t);

(The svbool_t is because SVE functions should be predicated by default,
to avoid the need for a scalar tail.)

These svfloat64_t and svbool_t types have no fixed size at compile time:
they represent one SVE register's worth of data, however big that register
happens to be.  Making datapar<T> be an array of a specific size would
make it unsuitable here.

To put it another way: the calling conventions do have the concept
of a register-sized vector that can be passed and returned efficiently.
These ACLE types are the C manifestations of those register-sized ABI types.
If instead we said that SVE vectors should be implicitly extracted from
a larger array, the ABI type would not have a direct representation in C.
I can't think of another case where that's true.

Leaving aside the question of vector library functions, if functions
used arrays for temporary results, and the ACLE intrinsics only operated
on slices of those arrays, it wouldn't always be obvious how big the
arrays should be.  For example, here's a naive ACLE implementation of a
step-1 daxpy (quoting only to show the use of the types, since a
walkthrough of the behaviour might be off-topic):

    void daxpy_1_1(int64_t n, double da, double *dx, double *dy)
    {
      int64_t i = 0;
      svbool_t pg = svwhilelt_b64(i, n);
      do
        {
          svfloat64_t dx_vec = svld1(pg, &dx[i]);
          svfloat64_t dy_vec = svld1(pg, &dy[i]);
          svst1(pg, &dy[i], svmla_x(pg, dy_vec, dx_vec, da));
          i += svcntd();
          pg = svwhilelt_b64(i, n);
        }
      while (svptest_any(svptrue_b64(), pg));
    }

This isn't a good motivating example for why the ACLE is needed,
since the compiler ought to produce similar code from a simple scalar loop.
But if you were writing a less naive implementation for SVE, it would use
the ACLE in a similar way.

The point is that this implementation supports any vector length.
There's no hard limit on the size of the temporaries.

A perhaps more useful example is a naive implementation of a loop that
converts non-printable ASCII characters to '.' (obviously not a common
time-critical operation, but it has the advantage of being short and
using a few SVE-specific features):

    void f(uint8_t *a)
    {
      svbool_t trueb = svptrue_b8();
      svuint8_t dots = svdup_u8('.');
      svbool_t terminators;
      do
        {
          svwrffr(trueb);
          svuint8_t data = svldff1(trueb, a);
          svbool_t ld_mask = svrdffr();
          svbool_t nonascii = svcmplt(ld_mask, data, ' '-1);
          terminators = svcmpeq(ld_mask, data, 0);
          svbool_t st_mask = svbrkb_z(nonascii, terminators);
          svst1(st_mask, a, dots);
          a += svcntp_b8(trueb, ld_mask);
        }
      while (!svptest_any(trueb, terminators));
    }

Again, a walkthrough of the code might be off-topic, but the point
is that the trip count of this loop is data-dependent and the loop
doesn't necessarily operate on the same number of elements in each
iteration.  It can't simply be written as a vector extension of
a scalar operation.

Also, it's possible to rewrite this in a way that should be more
efficient in common cases.  The point of the ACLE is that the
programmer has direct control over that kind of decision, rather
than leaving the control flow to the compiler.

>> Similarly to other vector architectures (including AdvSIMD), the SVE
>> intrinsics and their types are more geared towards people who want
>> to optimise specifically for SVE without having to resort to assembly.
>> That's an important use case for us, and I think there's always going to
>> be a need for it alongside generic SIMD and parallel-programming models
>> (which of course are a good thing to have too).
>> 
>> Being able to use SVE features from C is also important.  Not all
>> projects are prepared to convert to C++.
>
> I'd doubt that the sizeless types would find consensus in the C++
> committee.  The C committee may perhaps be more open to that, given that
> C is more restricted and thus has to use language extensions more often.
>
> If they don't find uptake in ISO C/C++, this will always be a
> vendor-specific thing.  You seem to say that this may be okay for you,
> but are there enough non-library-implementer developers out there that
> would use it to justify extending the type system?

We'd certainly like to get ACLE support into GCC and clang if possible.
It just seemed like the two ways of doing that were to get the type
system changes accepted by the standards committee or to get them
accepted as an extension by both compilers individually (similarly to
how both compilers support many GNU extensions).

[From another message]

> BTW, have you also looked at P0546 and P0122?

Thanks for the pointer.  I hadn't seen those before, but I'm not sure they
would help.  P0122 says:

    The span type is an abstraction that provides a view over a contiguous
    sequence of objects, the storage of which is owned by some other
    object.

whereas we want the ACLE types to be self-contained types that own
the underlying storage.

Thanks,
Richard

Re: RFC: C extension to support variable-length vector types

Reply via email to