On Tue, Jul 03, 2018 at 10:02:47PM +0100, Jonathan Wakely wrote:
> +#ifndef _GLIBCXX_BIT
> +#define _GLIBCXX_BIT 1
> +
> +#pragma GCC system_header
> +
> +#if __cplusplus >= 201402L
> +
> +#include <type_traits>
> +#include <limits>
> +
> +namespace std _GLIBCXX_VISIBILITY(default)
> +{
> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
> +
> +  template<typename _Tp>
> +    constexpr _Tp
> +    __rotl(_Tp __x, unsigned int __s) noexcept
> +    {
> +      constexpr auto _Nd = numeric_limits<_Tp>::digits;
> +      const unsigned __sN = __s % _Nd;
> +      if (__sN)
> +        return (__x << __sN) | (__x >> (_Nd - __sN));

Wouldn't it be better to use some branchless pattern that GCC can also
optimize well, like:
  return (__x << __sN) | (__x >> ((-__sN) & (_Nd - 1)));
(iff _Nd is always a power of two), or perhaps
  return (__x << __sN) | (__x >> ((-__sN) % _Nd));
which is going to be folded into the above one for power of two constants?

E.g. ia32intrin.h also uses:
/* 64bit rol */
extern __inline unsigned long long
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
__rolq (unsigned long long __X, int __C)
{
  __C &= 63;
  return (__X << __C) | (__X >> (-__C & 63));
}
etc.

	Jakub
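
For illustration, here is a minimal standalone sketch of the two forms side by side. The names rotl_branchy, rotl_branchless and check_rotl_variants are hypothetical and are not the libstdc++ internals. The sketch assumes unsigned types whose digit count is a power of two, and it assumes the branchy version returns __x unchanged when the reduced shift is zero, since that part of the patch is not quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Branchy form, modelled on the quoted patch. The fall-through
// "return x" for a zero shift is an assumption; the quote ends
// before that part of the function.
template<typename T>
constexpr T rotl_branchy(T x, unsigned s) noexcept
{
  constexpr unsigned Nd = std::numeric_limits<T>::digits;
  const unsigned sN = s % Nd;
  if (sN)
    return (x << sN) | (x >> (Nd - sN));
  return x;
}

// Branchless form. Because sN is unsigned, -sN wraps modulo 2^N,
// so (-sN) & (Nd - 1) equals (Nd - sN) % Nd when Nd is a power of
// two. For sN == 0 both shift counts are 0 and the result is x.
template<typename T>
constexpr T rotl_branchless(T x, unsigned s) noexcept
{
  constexpr unsigned Nd = std::numeric_limits<T>::digits;
  static_assert((Nd & (Nd - 1)) == 0, "digits must be a power of two");
  const unsigned sN = s % Nd;
  return (x << sN) | (x >> ((-sN) & (Nd - 1)));
}

// Cross-check both variants, including shift counts that are zero,
// equal to the width, or larger than the width.
inline void check_rotl_variants(std::uint32_t x)
{
  for (unsigned s = 0; s < 100; ++s)
    assert(rotl_branchy(x, s) == rotl_branchless(x, s));
}
```

Calling check_rotl_variants(0xdeadbeefu) exercises the zero-shift case that the branch exists to guard against: with a plain (_Nd - __sN) the right shift count would equal the type width, which is undefined, whereas the masked count is always in range.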