[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2021-08-15 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |11.0

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-12-03 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #26 from Thomas Koenig --- Yep, it's implemented and works great. For a simple "sum of digits" program in base ten, it's an acceleration by more than a factor of two. Thanks!

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-12-02 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-12-02 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #24 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:037fe26ee1ac18581bf0ad646d48591c97d10bd3 commit r11-5648-g037fe26ee1ac18581bf0ad646d48591c97d10bd3 Author: Jakub Jelinek Date: W

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-12-01 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #23 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:855bb43f6d0bee5a74b5d3739456ca34b4609a50 commit r11-5614-g855bb43f6d0bee5a74b5d3739456ca34b4609a50 Author: Jakub Jelinek Date: T

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-30 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #22 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:4d87bd39bafae86747944b2f8c53fdbc43b8dac3 commit r11-5533-g4d87bd39bafae86747944b2f8c53fdbc43b8dac3 Author: Jakub Jelinek Date: M

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #21 from Jakub Jelinek --- Created attachment 49639 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49639&action=edit gcc11-pr97459-alt.patch Variant version of the patch which avoids the final adjustment from unsigned to signed

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 Jakub Jelinek changed: What|Removed |Added Attachment #49634|0 |1 is obsolete|

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 Jakub Jelinek changed: What|Removed |Added Attachment #49633|0 |1 is obsolete|

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #18 from Jakub Jelinek --- Created attachment 49633 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49633&action=edit gcc11-pr97459-wip.patch Completely untested patch, so far only for double-word unsigned modulo.

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-08 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #17 from Thomas Koenig --- To be compilable, my previous code lacks typedef __uint128_t mytype; > #define ONE ((__uint128_t) 1)

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-08 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #16 from Thomas Koenig --- (In reply to Jakub Jelinek from comment #15) > I plan to work on this early in stage3. > And we really shouldn't use any tables, GCC should figure it all out. > So, for double-word modulo by constant that wo

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-08 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #15 from Jakub Jelinek --- I plan to work on this early in stage3. And we really shouldn't use any tables, GCC should figure it all out. So, for double-word modulo by constant that would be expanded using a libcall, go for x from the

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-11-08 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #14 from Thomas Koenig --- Created attachment 49520 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49520&action=edit Numbers a, b so that 2^b ≡ 1 mod a up to b=64, larger b taken if several solutions exist, plus the multiplicat

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-25 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 Thomas Koenig changed: What|Removed |Added Attachment #49438|divisiontable.dat |divisiontable.txt filename|

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-24 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #12 from Thomas Koenig --- (In reply to Thomas Koenig from comment #11) > Created attachment 49438 [details] > Numbers a, b so that 2^b ≡ 1 mod a up to b=64, larger b taken if several > solutions exist > A quick check that all numbe

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-24 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #11 from Thomas Koenig --- Created attachment 49438 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49438&action=edit Numbers a, b so that 2^b ≡ 1 mod a up to b=64, larger b taken if several solutions exist Here is the promised

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-23 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #10 from Thomas Koenig --- There are a couple of more constants for this could be tried. Base 7: static unsigned rem_7_v2 (mytype n) { unsigned long a, b, c, d; a = n & MASK_48; b = (n >> 48) & MASK_48; c = n >> 96; retur

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-20 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #9 from Thomas Koenig --- (In reply to Jakub Jelinek from comment #7) > So, can we use this for anything but modulo 3, or 5, or 17, or 257 (all of > those have 2^32 mod N == 2^64 mod N == 2^128 mod N == 1) I think so, too. > probabl

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-19 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #8 from Jakub Jelinek --- And perhaps for other (but constant and not power of two) modulos use that unsigned long long a; a = (n >> 96) * (unsigned long long) (((__uint128_t 1) << 96) % c); a += ((n >> 64) & 0xULL) * (u

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-19 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #7 from Jakub Jelinek --- So, can we use this for anything but modulo 3, or 5, or 17, or 257 (all of those have 2^32 mod N == 2^64 mod N == 2^128 mod N == 1), probably also keyed on the target having corresponding uaddv4_optab handler

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-18 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #6 from Jakub Jelinek --- I wasn't sure if the v5 version would be always correct. But the highest possible halves of the uint128_t number are 0x + 0x, which gives carry 1 and 0xfffeULL and

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-18 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #5 from Thomas Koenig --- OK, so here is a benchmark with its function names corrected. It also includes one version (_v5) which is a bit faster. (Note I increased the number of iterations to get more accuracy out of the cycle count,

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-17 Thread tkoenig at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #4 from Thomas Koenig --- Here's a complete program for benchmarks on x86_64, using Jakub's functions (so they are indeed correct): #include #include #include #include #include #include unsigned r3_128u_v2 (__uint128_t n) {

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-16 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #3 from Jakub Jelinek --- Or: unsigned r3_128u_v4 (__uint128_t n) { unsigned long a; a = (n >> 96); a += (n >> 64) & 0xULL; a += (n >> 32) & 0xULL; a += (n & 0xULL); return a % 3; } if the target do

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-16 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 --- Comment #2 from Jakub Jelinek --- E.g. unsigned r3_128u_v3 (__uint128_t n) { unsigned long a; a = (n >> 88); a += (n >> 44) & 0xfffULL; a += (n & 0xfffULL); return a % 3; } could work, but haven't measured how fast i

[Bug rtl-optimization/97459] __uint128_t remainder for division by 3

2020-10-16 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97459 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1