https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81501

--- Comment #15 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <h...@gcc.gnu.org>:

https://gcc.gnu.org/g:5cf1b9a03ec5b617af8c50c1e9c0d223083fd7f2

commit r16-3190-g5cf1b9a03ec5b617af8c50c1e9c0d223083fd7f2
Author: H.J. Lu <hjl.to...@gmail.com>
Date:   Fri Aug 19 11:50:41 2022 -0700

    x86-64: Remove redundant TLS calls

    For TLS calls:

    1. UNSPEC_TLS_GD:

      (parallel [
        (set (reg:DI 0 ax)
             (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
                      (const_int 0 [0])))
        (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
                    (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
        (clobber (reg:DI 5 di))])

    2. UNSPEC_TLS_LD_BASE:

      (parallel [
        (set (reg:DI 0 ax)
             (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
                      (const_int 0 [0])))
        (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])

    3. UNSPEC_TLSDESC:

      (parallel [
         (set (reg/f:DI 104)
               (plus:DI (unspec:DI [
                           (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
                           (reg:DI 114)
                           (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
                        (const:DI (unspec:DI [
                                     (symbol_ref:DI ("e") [flags 0x1a])
                                  ] UNSPEC_DTPOFF))))
         (clobber (reg:CC 17 flags))])

      (parallel [
        (set (reg:DI 101)
             (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
                         (reg:DI 112)
                         (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
        (clobber (reg:CC 17 flags))])

    all of these return the same value for the same input value.  But
    multiple calls with the same input value may be generated for simple
    programs like:

    void a(long *);
    int b(void);
    void c(void);
    static __thread long e;
    long
    d(void)
    {
      a(&e);
      if (b())
        c();
      return e;
    }

    When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
    generated:

            .type   d, @function
    d:
    .LFB0:
            .cfi_startproc
            pushq   %rbx
            .cfi_def_cfa_offset 16
            .cfi_offset 3, -16
            leaq    e@TLSDESC(%rip), %rbx
            movq    %rbx, %rax
            call    *e@TLSCALL(%rax)
            addq    %fs:0, %rax
            movq    %rax, %rdi
            call    a@PLT
            call    b@PLT
            testl   %eax, %eax
            jne     .L8
            movq    %rbx, %rax
            call    *e@TLSCALL(%rax)
            popq    %rbx
            .cfi_remember_state
            .cfi_def_cfa_offset 8
            movq    %fs:(%rax), %rax
            ret
            .p2align 4,,10
            .p2align 3
    .L8:
            .cfi_restore_state
            call    c@PLT
            movq    %rbx, %rax
            call    *e@TLSCALL(%rax)
            popq    %rbx
            .cfi_def_cfa_offset 8
            movq    %fs:(%rax), %rax
            ret
            .cfi_endproc

    There are 3 "call *e@TLSCALL(%rax)" instructions, and they all return
    the same value.  Rename the remove_redundant_vector_load pass to the
    x86_cse pass and, for 64-bit, extend it to also remove redundant TLS
    calls, generating:

    d:
    .LFB0:
            .cfi_startproc
            pushq   %rbx
            .cfi_def_cfa_offset 16
            .cfi_offset 3, -16
            leaq    e@TLSDESC(%rip), %rax
            movq    %fs:0, %rdi
            call    *e@TLSCALL(%rax)
            addq    %rax, %rdi
            movq    %rax, %rbx
            call    a@PLT
            call    b@PLT
            testl   %eax, %eax
            jne     .L8
            movq    %fs:(%rbx), %rax
            popq    %rbx
            .cfi_remember_state
            .cfi_def_cfa_offset 8
            ret
            .p2align 4,,10
            .p2align 3
    .L8:
            .cfi_restore_state
            call    c@PLT
            movq    %fs:(%rbx), %rax
            popq    %rbx
            .cfi_def_cfa_offset 8
            ret
            .cfi_endproc

    with only one "call *e@TLSCALL(%rax)".  This reduces the number of
    __tls_get_addr calls in libgcc.a by 72%:

    __tls_get_addr calls     before         after
    libgcc.a                 868            243
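
At the source level, the effect is roughly that of taking the address of
"e" once and reusing it on every path.  The sketch below only illustrates
that idea; it is not the RTL transformation the pass performs.  The bodies
of a(), b() and c() and the names d_original() and d_hoisted() are
hypothetical stand-ins, added so the example compiles on its own.

```c
#include <assert.h>

static __thread long e;

/* Hypothetical stand-ins for the external a(), b() and c() in the
   example above; the real functions are not part of the commit.  */
static void a(long *p) { *p = 42; }
static int b(void) { return 1; }
static void c(void) { e += 1; }

/* Shape of the original source: each use of "e" may need its own
   TLS address computation when built with -fPIC.  */
static long
d_original(void)
{
  a(&e);
  if (b())
    c();
  return e;
}

/* Hand-written analogue of the optimized code: the address of "e" is
   computed once and reused on both paths.  This is valid because the
   address of a thread-local variable does not change within a thread.  */
static long
d_hoisted(void)
{
  long *p = &e;
  a(p);
  if (b())
    c();
  return *p;
}
```

Both functions return the same value when called on the same thread.  With
the pass, source written like d_original() should no longer need rewriting
by hand into the d_hoisted() form to avoid the repeated calls.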

    gcc/

            PR target/81501
            * config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD,
            X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
            (redundant_load): Renamed to ...
            (redundant_pattern): This.
            (ix86_place_single_vector_set): Replace redundant_load with
            redundant_pattern.
            (replace_tls_call): New.
            (ix86_place_single_tls_call): Likewise.
            (pass_remove_redundant_vector_load): Renamed to ...
            (pass_x86_cse): This.  Add val, def_insn, mode, scalar_mode, kind,
            x86_cse, candidate_gnu_tls_p, candidate_gnu2_tls_p and
            candidate_vector_p.
            (pass_x86_cse::candidate_gnu_tls_p): New.
            (pass_x86_cse::candidate_gnu2_tls_p): Likewise.
            (pass_x86_cse::candidate_vector_p): Likewise.
            (remove_redundant_vector_load): Renamed to ...
            (pass_x86_cse::x86_cse): This.  Extend to remove redundant TLS
            calls.
            (make_pass_remove_redundant_vector_load): Renamed to ...
            (make_pass_x86_cse): This.
            * config/i386/i386-passes.def: Replace
            pass_remove_redundant_vector_load with pass_x86_cse.
            * config/i386/i386-protos.h (ix86_tls_get_addr): New.
            (make_pass_remove_redundant_vector_load): Renamed to ...
            (make_pass_x86_cse): This.
            * config/i386/i386.cc (ix86_tls_get_addr): Remove static.
            * config/i386/i386.h (machine_function): Add
            tls_descriptor_call_multiple_p.
            * config/i386/i386.md (tls64): New attribute.
            (@tls_global_dynamic_64_<mode>): Set
            tls_descriptor_call_multiple_p.
            (@tls_local_dynamic_base_64_<mode>): Likewise.
            (@tls_dynamic_gnu2_64_<mode>): Likewise.
            (*tls_global_dynamic_64_<mode>): Set tls64 attribute to gd.
            (*tls_local_dynamic_base_64_<mode>): Set tls64 attribute to
            ld_base.
            (*tls_dynamic_gnu2_lea_64_<mode>): Set tls64 attribute to lea.
            (*tls_dynamic_gnu2_call_64_<mode>): Set tls64 attribute to call.
            (*tls_dynamic_gnu2_combine_64_<mode>): Set tls64 attribute to
            combine.

    gcc/testsuite/

            PR target/81501
            * g++.target/i386/pr81501-1.C: New test.
            * gcc.target/i386/pr81501-1a.c: Likewise.
            * gcc.target/i386/pr81501-1b.c: Likewise.
            * gcc.target/i386/pr81501-2a.c: Likewise.
            * gcc.target/i386/pr81501-2b.c: Likewise.
            * gcc.target/i386/pr81501-3.c: Likewise.
            * gcc.target/i386/pr81501-4a.c: Likewise.
            * gcc.target/i386/pr81501-4b.c: Likewise.
            * gcc.target/i386/pr81501-5.c: Likewise.
            * gcc.target/i386/pr81501-6a.c: Likewise.
            * gcc.target/i386/pr81501-6b.c: Likewise.
            * gcc.target/i386/pr81501-7.c: Likewise.
            * gcc.target/i386/pr81501-8a.c: Likewise.
            * gcc.target/i386/pr81501-8b.c: Likewise.
            * gcc.target/i386/pr81501-9a.c: Likewise.
            * gcc.target/i386/pr81501-9b.c: Likewise.
            * gcc.target/i386/pr81501-10a.c: Likewise.
            * gcc.target/i386/pr81501-10b.c: Likewise.

    Signed-off-by: H.J. Lu <hjl.to...@gmail.com>