On 03/07/18 23:40 +0200, Jakub Jelinek wrote:
On Tue, Jul 03, 2018 at 10:02:47PM +0100, Jonathan Wakely wrote:
+#ifndef _GLIBCXX_BIT
+#define _GLIBCXX_BIT 1
+
+#pragma GCC system_header
+
+#if __cplusplus >= 201402L
+
+#include <type_traits>
+#include <limits>
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  template<typename _Tp>
+    constexpr _Tp
+    __rotl(_Tp __x, unsigned int __s) noexcept
+    {
+      constexpr auto _Nd = numeric_limits<_Tp>::digits;
+      const unsigned __sN = __s % _Nd;
+      if (__sN)
+        return (__x << __sN) | (__x >> (_Nd - __sN));
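The excerpt above stops right after the if statement. As a minimal standalone C++14 sketch, assuming the function returns __x unchanged when the reduced count is zero (the rest of the patch is not quoted here), the branching form looks like this; rotl_branch is a hypothetical name. The branch is what keeps the right-shift count below the width of T: with a reduced count of zero the expression would shift by nd bits, which is undefined for a type such as uint32_t that is not promoted to a wider type.

```cpp
#include <cstdint>
#include <limits>

// Branching rotate-left modelled on the patch excerpt (hypothetical name).
// Returning x when r == 0 is an assumption: the quoted patch is truncated.
template<typename T>
constexpr T rotl_branch(T x, unsigned int s) noexcept
{
  constexpr auto nd = std::numeric_limits<T>::digits;  // value bits of T
  const unsigned r = s % nd;                           // reduce the count
  if (r)
    // 0 < r < nd, so both shift counts are strictly less than nd.
    return static_cast<T>((x << r) | (x >> (nd - r)));
  return x;  // rotating by a multiple of nd is the identity
}
```

Usage: rotl_branch<std::uint8_t>(0x12, 4) yields 0x21, and any count that is a multiple of the width returns the input unchanged.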
Wouldn't it be better to use some branchless pattern that
GCC can also optimize well, like:
return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
(assuming _Nd is always a power of two),
_Nd is 20 for one of the INT_N types on msp430, but we could have a
special case for the rare integer types with unusual sizes.
or perhaps
return (__x << __sN) | (__x >> ((-__sN) % _Nd));
which is going to be folded into the above one for power-of-two constants?
That looks good.
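A sketch of the agreed branchless form, and of why the power-of-two restriction matters. Assumptions: rotl_pow2, rotl_bits and Count are hypothetical names, and since __int20 is target-specific, a 20-bit type is emulated by keeping the value in the low bits of a std::uint32_t. With unsigned r, -r wraps to 2^w - r (w being the width of unsigned), so (-r) % nd equals (nd - r) % nd exactly when nd divides 2^w, that is, when nd is a power of two; only then does it also coincide with (-r) & (nd - 1).

```cpp
#include <cstdint>
#include <limits>

// Branchless rotate-left using the (-r) % nd form (hypothetical name).
// Valid only when numeric_limits<T>::digits is a power of two.
template<typename T>
constexpr T rotl_pow2(T x, unsigned s) noexcept
{
  constexpr unsigned nd = std::numeric_limits<T>::digits;
  static_assert((nd & (nd - 1)) == 0, "digits must be a power of two");
  const unsigned r = s % nd;
  // For r == 0 the right-shift count is 0, not nd, so no undefined shift.
  return static_cast<T>((x << r) | (x >> ((-r) % nd)));
}

// How the right-shift count is derived from r = s % nd (hypothetical names).
enum class Count { exact, mask, modulo };

// Rotate the low nd bits of x left by s, for 0 < nd < 32. This emulates an
// unusual width such as msp430's 20-bit __int20 inside a uint32_t.
constexpr std::uint32_t rotl_bits(std::uint32_t x, unsigned s, unsigned nd,
                                  Count c = Count::exact) noexcept
{
  const std::uint32_t mask = (std::uint32_t{1} << nd) - 1;  // low nd bits
  x &= mask;
  const unsigned r = s % nd;
  const unsigned right =
      (c == Count::mask)     ? ((-r) & (nd - 1))   // power-of-two shortcut
    : (c == Count::modulo)   ? ((-r) % nd)         // also power-of-two only
    :                          ((nd - r) % nd);    // correct for any nd
  // Bits shifted past nd are discarded by the final mask.
  return ((x << r) | (x >> right)) & mask;
}
```

With nd == 20, rotating bit 19 left by 5 should land on bit 4 (0x10), but the mask form yields 0x1. The modulo form, rotating the same bit by 1, yields 0x10 instead of 0x1, because 2^32 % 20 == 16. For power-of-two widths all three variants agree, which is what makes a separate special case for the rare odd-sized types a reasonable design.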
E.g. ia32intrin.h also uses:
/* 64bit rol */
extern __inline unsigned long long
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
__rolq (unsigned long long __X, int __C)
{
  __C &= 63;
  return (__X << __C) | (__X >> (-__C & 63));
}
etc.
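For comparison, a portable restatement of the quoted __rolq body as a C++14 constexpr function (rolq_sketch is a hypothetical name, not the GCC intrinsic). Because the signed count is masked with 63 first, the right-shift count -__C & 63 is always in 0..63 and never 64, and a negative count such as -1 becomes 63, which amounts to a rotate right by one (assuming two's complement, as GCC uses).

```cpp
#include <cstdint>

// Mirrors the quoted __rolq: mask the signed count, then use the
// branchless (-c & 63) right-shift count.
constexpr std::uint64_t rolq_sketch(std::uint64_t x, int c) noexcept
{
  c &= 63;                              // now 0 <= c <= 63
  return (x << c) | (x >> (-c & 63));   // c == 0 gives a zero right shift
}
```

Usage: rolq_sketch(1, -1) yields 0x8000000000000000, the same as rotating 1 right by one bit.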
Should we delegate to those intrinsics for x86, so that
__builtin_ia32_rolqi and __builtin_ia32_rolhi can be used when
relevant?