https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
Andrew Pinski changed:
What|Removed |Added
Target Milestone|--- |11.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #26 from Thomas Koenig ---
Yep, it's implemented and works great.
For a simple "sum of digits" program in base ten, it's an acceleration
by more than a factor of two.
Thanks!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
Jakub Jelinek changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #24 from CVS Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:037fe26ee1ac18581bf0ad646d48591c97d10bd3
commit r11-5648-g037fe26ee1ac18581bf0ad646d48591c97d10bd3
Author: Jakub Jelinek
Date: W
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #23 from CVS Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:855bb43f6d0bee5a74b5d3739456ca34b4609a50
commit r11-5614-g855bb43f6d0bee5a74b5d3739456ca34b4609a50
Author: Jakub Jelinek
Date: T
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #22 from CVS Commits ---
The master branch has been updated by Jakub Jelinek :
https://gcc.gnu.org/g:4d87bd39bafae86747944b2f8c53fdbc43b8dac3
commit r11-5533-g4d87bd39bafae86747944b2f8c53fdbc43b8dac3
Author: Jakub Jelinek
Date: M
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #21 from Jakub Jelinek ---
Created attachment 49639
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49639&action=edit
gcc11-pr97459-alt.patch
Variant version of the patch which avoids the final adjustment from unsigned to
signed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
Jakub Jelinek changed:
What|Removed |Added
Attachment #49634|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
Jakub Jelinek changed:
What|Removed |Added
Attachment #49633|0 |1
is obsolete|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #18 from Jakub Jelinek ---
Created attachment 49633
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49633&action=edit
gcc11-pr97459-wip.patch
Completely untested patch, so far only for double-word unsigned modulo.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #17 from Thomas Koenig ---
To be compilable, my previous code lacks
typedef __uint128_t mytype;
> #define ONE ((__uint128_t) 1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #16 from Thomas Koenig ---
(In reply to Jakub Jelinek from comment #15)
> I plan to work on this early in stage3.
> And we really shouldn't use any tables, GCC should figure it all out.
> So, for double-word modulo by constant that wo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #15 from Jakub Jelinek ---
I plan to work on this early in stage3.
And we really shouldn't use any tables, GCC should figure it all out.
So, for double-word modulo by constant that would be expanded using a libcall,
go for x from the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #14 from Thomas Koenig ---
Created attachment 49520
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49520&action=edit
Numbers a, b so that 2^b ≡ 1 mod a up to b=64, larger b taken if several
solutions exist, plus the multiplicat
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
Thomas Koenig changed:
What|Removed |Added
Attachment #49438|divisiontable.dat |divisiontable.txt
filename|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #12 from Thomas Koenig ---
(In reply to Thomas Koenig from comment #11)
> Created attachment 49438 [details]
> Numbers a, b so that 2^b ≡ 1 mod a up to b=64, larger b taken if several
> solutions exist
>
A quick check that all numbe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #11 from Thomas Koenig ---
Created attachment 49438
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49438&action=edit
Numbers a, b so that 2^b ≡ 1 mod a up to b=64, larger b taken if several
solutions exist
Here is the promised
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #10 from Thomas Koenig ---
There are a couple of more constants for this could be tried.
Base 7:
static unsigned
rem_7_v2 (mytype n)
{
unsigned long a, b, c, d;
a = n & MASK_48;
b = (n >> 48) & MASK_48;
c = n >> 96;
retur
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #9 from Thomas Koenig ---
(In reply to Jakub Jelinek from comment #7)
> So, can we use this for anything but modulo 3, or 5, or 17, or 257 (all of
> those have 2^32 mod N == 2^64 mod N == 2^128 mod N == 1)
I think so, too.
> probabl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #8 from Jakub Jelinek ---
And perhaps for other (but constant and not power of two) modulos use that
unsigned long long a;
a = (n >> 96) * (unsigned long long) (((__uint128_t 1) << 96) % c);
a += ((n >> 64) & 0xULL) * (u
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #7 from Jakub Jelinek ---
So, can we use this for anything but modulo 3, or 5, or 17, or 257 (all of
those have 2^32 mod N == 2^64 mod N == 2^128 mod N == 1), probably also keyed
on the target having corresponding uaddv4_optab handler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #6 from Jakub Jelinek ---
I wasn't sure if the v5 version would be always correct. But the highest
possible halves of the uint128_t number are 0x +
0x, which gives carry 1 and 0xfffeULL and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #5 from Thomas Koenig ---
OK, so here is a benchmark with its function names corrected. It also
includes one version (_v5) which is a bit faster.
(Note I increased the number of iterations to get more accuracy out
of the cycle count,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #4 from Thomas Koenig ---
Here's a complete program for benchmarks on x86_64, using Jakub's
functions (so they are indeed correct):
#include
#include
#include
#include
#include
#include
unsigned r3_128u_v2 (__uint128_t n)
{
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #3 from Jakub Jelinek ---
Or:
unsigned r3_128u_v4 (__uint128_t n)
{
unsigned long a;
a = (n >> 96);
a += (n >> 64) & 0xULL;
a += (n >> 32) & 0xULL;
a += (n & 0xULL);
return a % 3;
}
if the target do
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
--- Comment #2 from Jakub Jelinek ---
E.g.
unsigned r3_128u_v3 (__uint128_t n)
{
unsigned long a;
a = (n >> 88);
a += (n >> 44) & 0xfffULL;
a += (n & 0xfffULL);
return a % 3;
}
could work, but haven't measured how fast i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment #1
27 matches
Mail list logo