howard.hinnant added a comment.
I created a top-level `gcd` which did nothing but make everything unsigned and
do the abs once, and then called a `__gcd` specialized for unsigned and only
one kind of unsigned, and got measurably faster results (about 10%).
template<class _Tp>
constexpr
std::enable_if_t<is_unsigned<_Tp>{}, _Tp> _LIBCPP_INLINE_VISIBILITY
__gcd(_Tp __m, _Tp __n)
{
return __n == 0 ? __m : __gcd(__n, __m % __n);
}
template<class _Tp, class _Up>
constexpr
common_type_t<_Tp,_Up> _LIBCPP_INLINE_VISIBILITY
gcd(_Tp __m, _Up __n)
{
static_assert((is_integral<_Tp>::value && is_integral<_Up>::value),
"Arguments to gcd must be integer types");
using _Rp = common_type_t<_Tp,_Up>;
using _Wp = make_unsigned_t<_Rp>;
return static_cast<_Rp>(__gcd(static_cast<_Wp>(__abs<_Tp>()(__m)),
static_cast<_Wp>(__abs<_Up>()(__n))));
}
Here is the test driver I used:
#include <chrono>
#include <iostream>
int
main()
{
std::uint64_t x = 0;
auto t0 = std::chrono::steady_clock::now();
for (auto i = 1'999'990'000; i <= 2'000'000'000; ++i)
{
for (auto j = 1'999'990'000; j <= 2'000'000'000; ++j)
{
x += gcd(i, j);
}
}
auto t1 = std::chrono::steady_clock::now();
std::cout << std::chrono::duration<double>(t1-t0).count() << '\n';
return x;
}
I also tried the traditional iterative solution (as opposed to recursive) and
was surprised that it was not faster. I then tried "binary" gcd
(https://en.wikipedia.org/wiki/Binary_GCD_algorithm) and was again surprised it
wasn't faster. Both of these alternatives were in the ball park though, and it
could be that they might measure faster over other domains. The binary variant
used `__m >>= std::__ctz(__m);` instead of a while loop to remove factors of 2
(in two places).
http://reviews.llvm.org/D21343
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits