https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522

Brendan Dolan-Gavitt <brendandg at nyu dot edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |brendandg at nyu dot edu

--- Comment #19 from Brendan Dolan-Gavitt <brendandg at nyu dot edu> ---
I read through the crtfastmath.c implementations for the other affected targets
and confirmed in this thread that they all set flush-to-zero:

https://threadreaderapp.com/thread/1567612053363347461.html

I agree that there should be a way for a shared library to link crtfastmath.o
if it wants that behavior. But is there a reason -l:crtfastmath.o isn't
sufficient in that case? Why does it need to be enabled automatically when
-Ofast/-ffast-math/-funsafe-math-optimizations is turned on?

The other note I would add is that in multi-threaded applications,
crtfastmath.o already does not behave as intended: FTZ/DAZ are set only in the
CPU state of the thread that loaded the shared library. It's hard to imagine a
case where a user wants individual threads to have different FTZ/DAZ settings
(unless they explicitly manage that by hand). Example:

$ cat baz.c
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>
#include <pthread.h>

void loadlib() {
    void *handle = dlopen("./gofast.so", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
    }
}

#define MXCSR_DAZ (1 << 6)  /* Enable denormals are zero mode */
#define MXCSR_FTZ (1 << 15) /* Enable flush to zero mode */
void printftz(int i) {
    unsigned int mxcsr = __builtin_ia32_stmxcsr ();
    printf("[%d] mxcsr.FTZ = %d, mxcsr.DAZ = %d\n", i,
           !!(mxcsr & MXCSR_FTZ), !!(mxcsr & MXCSR_DAZ));
    return;
}

void *thread(void *arg) {
    // Thread index passed in by main
    int i = *(int *)arg;
    // Only thread 0 loads the library; the others just report their MXCSR
    if (i == 0) loadlib();
    sleep(1);
    printftz(i);
    return NULL;
}

int main(int argc, char **argv) {
    // Create 4 threads
    pthread_t threads[4];
    int tids[4];
    for (int i = 0; i < 4; i++) {
        tids[i] = i;
        pthread_create(&threads[i], NULL, thread, &tids[i]);
    }
    // Wait for all threads to finish
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}

$ touch gofast.c
$ gcc -Ofast -fpic -shared gofast.c -o gofast.so
$ gcc -pthread baz.c -o baz -ldl

$ ./baz
[3] mxcsr.FTZ = 0, mxcsr.DAZ = 0
[0] mxcsr.FTZ = 1, mxcsr.DAZ = 1
[2] mxcsr.FTZ = 0, mxcsr.DAZ = 0
[1] mxcsr.FTZ = 0, mxcsr.DAZ = 0
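
For comparison, here is a rough sketch of what managing FTZ/DAZ "by hand"
might look like on x86 with SSE, where each thread sets the bits in its own
MXCSR. The helper names (enable_ftz_daz, disable_ftz_daz, ftz_daz_enabled,
worker) are hypothetical and are not part of GCC or crtfastmath.c; the sketch
also assumes a CPU that supports the DAZ bit.

```c
#include <stddef.h>     /* NULL */
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (x86 SSE only) */

#define MXCSR_DAZ (1u << 6)  /* Denormals-are-zero */
#define MXCSR_FTZ (1u << 15) /* Flush-to-zero */

/* Hypothetical helper: set FTZ and DAZ for the calling thread only. */
void enable_ftz_daz(void) {
    _mm_setcsr(_mm_getcsr() | MXCSR_FTZ | MXCSR_DAZ);
}

/* Hypothetical helper: clear FTZ and DAZ for the calling thread only. */
void disable_ftz_daz(void) {
    _mm_setcsr(_mm_getcsr() & ~(MXCSR_FTZ | MXCSR_DAZ));
}

/* Hypothetical helper: nonzero if both bits are set in this thread. */
int ftz_daz_enabled(void) {
    unsigned int mxcsr = _mm_getcsr();
    return (mxcsr & MXCSR_FTZ) && (mxcsr & MXCSR_DAZ);
}

/* Hypothetical thread entry point: each thread opts in explicitly
 * before doing any floating-point work. */
void *worker(void *arg) {
    (void)arg;
    enable_ftz_daz();
    /* ... floating-point work that tolerates flushed denormals ... */
    return NULL;
}
```

A function like worker could be passed to pthread_create in place of thread
in baz.c, so that every thread ends up with the same setting regardless of
which thread happened to load a library.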
