https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107827
Bug ID: 107827
Summary: switch table compression could shrink large tables (but missing), testcase cerf-lib
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: aleks at physik dot tu-berlin.de
Target Milestone: ---

The cerf library (a self-contained numeric library that provides an efficient and accurate implementation of complex error functions) contains two switch blocks with about 100 cases each. Each case evaluates one piece of a piecewise Taylor-polynomial approximation (aiming at nearly machine precision, type double).

https://jugit.fz-juelich.de/mlz/libcerf/-/tree/main/lib

Compiler Explorer (https://godbolt.org/, with -O3 for x86-64 gcc 12) shows assembly output containing huge jump tables. Although the code compiles fine with gcc 11 on my PC, in Compiler Explorer you have to shrink the switch to 20 cases to see any output (some webservice limit, but that is not the issue here).

Let's focus on two static functions.

1. erfcx.c / static double erfcx_y100() / case (int) 0..99 (100 cases, no gap; each case is a Taylor polynomial of degree 6). This produces a jump table of 100 x 8 bytes, plus code. The table could easily be compressed or replaced by a linear formula.

a) Assuming no single function ever exceeds 4 GB of code, 4-byte offsets (jump distances) would always be enough, and for 99% of functions a 65 KB range (2-byte offsets) would suffice (see the dispatch sketch after the PS). At -O2 this clearly makes sense, but even at -O3 it could be faster because it reduces memory/cache usage; this could become a new option to control it.

b) For this erfcx_y100 sample the offset could even be computed by a linear formula, since all case bodies have the same size.

c) Of course the coefficients could be moved into a separate lookup table, but that gives no speedup, for other reasons (I tried it these days). gcc could detect the similarity itself (the cases are identical apart from the 7 hard-coded coefficients), but that would be another change request.

Sample (I cut the coefficient digits for readability):

  switch ((int) y100) {
  case 0: {
    double t = 2*y100 - 1;
    return 0.7 + (0.7 + (0.3 + (0.1 + (0.8 + (0.3 + 0.1*t) *t) *t) *t) *t) *t;
  }
  case 1: {
    double t = 2*y100 - 3;
    return 0.2 + (0.7 + (0.3 + (0.1 + (0.8 + (0.3 + 0.1*t) *t) *t) *t) *t) *t;
  }
  ... until case 99:
  // instead of an 8 x 100 byte table this could be a 2 x 100 byte table, or a 10-byte linear equation.

2. im_w_of_x.c / static double w_im_y100() / case (int) 0..96 (97 cases, no gap; similar polynomials of degree 6 to 8, plus one unique final formula). Here the case bodies are not identical even ignoring the coefficients, but they are still similar. Address-offset compression could still be used (just not a linear formula all the way).

d) Same as a) + b).

e) Side note: "2*y100" does not seem to be detected as a common subexpression and computed once at the start of the switch.

f) Detection of similar code (same equation, different parameters) could be used to shrink the object size (.text): only 4 kinds of equations occur here, so only the offset into some generated "const double parameters[]" array would differ between cases (see the coefficient-table sketch after the PS).

PS: No code is wrong or produces different results; this is purely a read-only-memory-consumption and timing issue. Thanks for reading.
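To make a) concrete, here is a minimal source-level sketch of a compressed dispatch, written with GCC's labels-as-values extension (the offset-array pattern from the "Labels as Values" section of the GCC manual). The function name, labels, and coefficient values are placeholders, not the real libcerf code; it only illustrates a 4-bytes-per-case table in place of 8-byte absolute pointers.

  /* Hypothetical sketch of a): offsets relative to a base label,
     4 bytes per case instead of an 8-byte pointer (2 bytes would do
     if all offsets fit in 16 bits).  GCC extension; made-up names,
     placeholder coefficients. */
  static double erfcx_y100_compressed(double y100)
  {
      static const int off[] = { &&c0 - &&c0, &&c1 - &&c0 /* , ... one entry per case */ };
      double t;
      goto *(&&c0 + off[(int) y100]);
  c0: t = 2*y100 - 1;
      return 0.7 + (0.7 + (0.3 + (0.1 + (0.8 + (0.3 + 0.1*t) *t) *t) *t) *t) *t;
  c1: t = 2*y100 - 3;
      return 0.2 + (0.7 + (0.3 + (0.1 + (0.8 + (0.3 + 0.1*t) *t) *t) *t) *t) *t;
      /* ... cases 2..99 ... */
  }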
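And here is roughly what I mean by c)/f) at source level: all coefficients in one flat const array, evaluated by a single Horner loop. The array name and coefficient values are placeholders. As said above, I measured no speedup from this form, so it mainly illustrates the size argument: ~100 x 7 doubles of data plus one small function instead of 100 inlined polynomials.

  /* Sketch of c)/f): one coefficient row per case, single Horner loop.
     Values are placeholders, not the real libcerf coefficients. */
  static const double y100_coef[100][7] = {
      /* case 0 */ { 0.7, 0.7, 0.3, 0.1, 0.8, 0.3, 0.1 },
      /* case 1 */ { 0.2, 0.7, 0.3, 0.1, 0.8, 0.3, 0.1 },
      /* ... cases 2..99 ... */
  };

  static double erfcx_y100_tabled(double y100)
  {
      int i = (int) y100;
      double t = 2*y100 - (2*i + 1);   /* case 0: t = 2*y100 - 1, case 1: - 3, ... */
      const double *c = y100_coef[i];
      double r = c[6];
      for (int k = 5; k >= 0; --k)     /* Horner evaluation, degree 6 */
          r = r*t + c[k];
      return r;
  }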