https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96955
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
> And if possible, optimize, so that if one does say
> int *p = (int *)__builtin_thread_pointer ();
> return p[4];
> or
> return p[i];
> it will not read %fs:0 into a register and read 16(%reg), but rather read
> %fs:16
> etc. (of course only if not -mno-tls-direct-seg-refs) or not read
> 16(%reg,%regI,4) but %fs:16(,%regI,4) etc.

This optimization already exists in i386/x86-64 backend.
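
For reference, the two access patterns from comment #1 can be written out as a small test file. This is only a sketch: the function names read_tp_const and read_tp_index are made up for illustration, and it assumes a compiler that provides __builtin_thread_pointer on the target (on x86, a GCC that includes the change tracked in this bug). What is actually stored at a given offset from the thread pointer depends on the C library's TCB layout, so the returned values are not meaningful on their own.

```c
#include <stddef.h>

/* Hypothetical helper: constant index, as in the quoted snippet.
   Reads the int at byte offset 16 from the thread pointer.  */
int
read_tp_const (void)
{
  int *p = (int *) __builtin_thread_pointer ();
  return p[4];
}

/* Hypothetical helper: variable index.
   Reads the int at byte offset 4 * i from the thread pointer.  */
int
read_tp_index (size_t i)
{
  int *p = (int *) __builtin_thread_pointer ();
  return p[i];
}
```

Compiling this with something like "gcc -O2 -S" on x86-64 and looking at the assembly should show whether the loads are emitted as %fs-relative accesses. With -mno-tls-direct-seg-refs one would instead expect the thread pointer to be loaded into a register first.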