On 11/08/2017 11:28 AM, Richard Sandiford wrote:
Martin Sebor <mse...@gmail.com> writes:
On 11/08/2017 09:51 AM, Richard Sandiford wrote:
Martin Sebor <mse...@gmail.com> writes:
On 11/08/2017 02:32 AM, Richard Sandiford wrote:
Martin Sebor <mse...@gmail.com> writes:
I haven't done anywhere near a thorough review, but the dtor
followed by the placement new in the POLY_SET_COEFF() macro caught
my eye, so I thought I'd ask sooner rather than later.  Given the
macro definition:

+   The dummy comparison against a null C * is just a way of checking
+   that C gives the right type.  */
+#define POLY_SET_COEFF(C, RES, I, VALUE) \
+  ((void) (&(RES).coeffs[0] == (C *) 0), \
+   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
+   ? (void) ((RES).coeffs[I] = VALUE) \
+   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))

is the following use well-defined?

+template<unsigned int N, typename C>
+inline poly_int_pod<N, C>&
+poly_int_pod<N, C>::operator <<= (unsigned int a)
+{
+  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);

It looks to me as though the VALUE argument in the ctor invoked
by the placement new expression is evaluated after the dtor has
destroyed the very array element the VALUE argument expands to.
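
A reduced, self-contained illustration of the hazard (T here is
just a hypothetical stand-in for C, not anything from the patch):

  #include <cstdio>
  #include <new>

  struct T
  {
    int val;
    bool live;
    T (int v) : val (v), live (true) {}
    ~T () { live = false; }
  };

  int main ()
  {
    T t (2);
    /* Same shape as the macro expansion: the comma operator runs the
       destructor first and only then evaluates the constructor
       argument, so "t.val << 1" reads an already-destroyed object --
       formally undefined, even if it happens to "work" in practice.  */
    (t.~T (), new (&t) T (t.val << 1));
    std::printf ("val = %d\n", t.val);
  }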

Good catch!  It should simply have been doing <<= on each coefficient --
I must have got carried away when converting to POLY_SET_COEFF.

I double-checked the other uses and think that's the only one.

Whether or not it is, in fact, a problem, it seems to me that using
a function template rather than a macro would be a clearer and
safer way to do the same thing.  (Safer in that the macro also
evaluates its arguments multiple times, which is often a source
of subtle bugs.)
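
The multiple-evaluation point is easy to see in a reduced sketch
(this is not the poly_int code itself, just the generic hazard;
POLY_SET_COEFF expands its I argument twice in the dtor/placement-new
branch):

  #include <cstdio>

  /* Stand-in for a macro that, like the dtor/placement-new branch of
     POLY_SET_COEFF, mentions its index argument twice.  */
  #define SET_TWICE(a, i, v) ((a)[i] = (v), (a)[i] += 0)

  int main ()
  {
    int a[4] = { 0, 0, 0, 0 };
    int i = 0;
    SET_TWICE (a, i++, 42);        /* the i++ is expanded twice...  */
    std::printf ("i == %d\n", i);  /* ...so this prints 2, not 1 */
  }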

That would slow down -O0 builds though, by introducing an extra
function call and set of temporaries even when the coefficients
are primitive integers.

Would decorating the function template with attribute always_inline
help?

It would remove the call itself, but we'd still have the extra temporary
objects that were the function argument and return value.

Sorry, I do not want to get into another long discussion about
trade-offs between safety and efficiency, but I'm not sure I see
what extra temporaries it would create.  It seems to me that an
inline function template that takes arguments of user-defined types
by reference and others by value should be just as efficient as
a macro.

From GCC's own manual:

   6.43 An Inline Function is As Fast As a Macro
   https://gcc.gnu.org/onlinedocs/gcc/Inline.html

You can see the difference with something like:

  inline
  void __attribute__((always_inline))
  f(int &dst, const int &src) { dst = src; }

  int g1(const int &y) { int x; f(x, y); return x; }
  int g2(const int &y) { int x; x = y; return x; }
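
(The difference being, I believe, that at -O0 the inlined copy of f
still works through stack temporaries for its reference parameters,
so g1 comes out a few instructions longer than g2.)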

Let me say at the outset that I struggle to comprehend how a few
instructions can even be a consideration when not optimizing,
especially in light of the bug the macro caused that would have been
prevented by using a function instead.  But...

...I don't think your example above is representative of using
the POLY_SET_COEFF macro.  The function template I'm suggesting
might look something like this:

  template <unsigned N, class C>
  inline void __attribute__ ((always_inline))
  poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
  {
    /* The same type check as in the macro.  */
    (void) (&p->coeffs[0] == (C *) 0);
    if (wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION)
      p->coeffs[idx] = val;
    else
      {
        p->coeffs[idx].~C ();
        new (&p->coeffs[idx]) C (val);
      }
  }
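
Used this way, the operator<<= bug could not arise, because the
shifted value is evaluated and copied into VAL before the setter
touches the coefficient.  A sketch of the caller (reusing the
poly_int_pod names from the patch, as an illustration rather than
the patch's actual code):

  template<unsigned int N, typename C>
  inline poly_int_pod<N, C>&
  poly_int_pod<N, C>::operator <<= (unsigned int a)
  {
    for (unsigned int i = 0; i < N; i++)
      /* The shifted value is a function argument, so it is fully
         evaluated before the destructor inside poly_set_coeff runs.  */
      poly_set_coeff (this, i, this->coeffs[i] << a);
    return *this;
  }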

To compare apples to apples, I suggest instead comparing the shift
operator (or any other poly_int function that uses the macro), with
the bug fixed, against a version that uses the function template.
I see a difference of 2 instructions on x86_64 (21 vs. 23) for
operator<<=.

Are two assembly instructions even worth talking about?

If that's not the case and there is a significant performance
penalty associated with inline functions at -O0 then GCC should
be fixed to avoid it.

I think those docs are really talking about inline functions being as
fast as macros when optimisation is enabled.  I don't think we make
any guarantees about -O0 code quality.

Sure, but you are using an unsafe macro in preference to a safer
inline function even with optimization, introducing a bug as
a result, and arguing that the performance impact of a few
instructions when not optimizing should drive the decision between
the two in all situations.  With all respect, I fail to see the
logic in this line of reasoning.  By that argument we could never
define any inline functions.

That being said, if the performance implications of using inline
functions without optimization are so serious here, then I suggest
you should be concerned about introducing the poly_int API in its
current form at all: every access to the class goes through an
inline function.

On a more serious/constructive note, if you really are worried
about efficiency at this level, then introducing an intrinsic
primitive into the compiler instead of a set of classes might be
worth thinking about.  It would only benefit GCC itself, but it
might lay a foundation for all sorts of infinite-precision integer
classes (including the C++ proposal that was pointed out in the
other thread).

Martin
