Hello,

I'm using the following version of GCC:
artpol@artpol-ThinkPad-T430 ~ $ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I'm facing a problem when compiling Slurm:
https://github.com/SchedMD/slurm
I'm building the master branch, and the problem is related to the
following construct:
https://github.com/SchedMD/slurm/blob/master/src/slurmd/slurmd/req.c#L6171:L6172

The problem I see is that rc ends up as "+1" instead of "-1" after "connect"
fails. And I can't reproduce this with the standalone code:
https://github.com/artpol84/poc/blob/master/linux-internals/syscalls/connect/cli_c1.c#L37

Here is what GDB shows for the two programs:

SLURM's req.c:
0x43e966 <_rpc_forward_data+326>        lea    0x2(%rax),%edx
0x43e969 <_rpc_forward_data+329>        lea    -0x80(%rbp),%rcx
0x43e96d <_rpc_forward_data+333>        mov    -0xc4(%rbp),%eax
0x43e973 <_rpc_forward_data+339>        mov    %rcx,%rsi
0x43e976 <_rpc_forward_data+342>        mov    %eax,%edi

0x43e978 <_rpc_forward_data+344>        callq  0x4287e0 <connect@plt>
# eax == -1

0x43e97d <_rpc_forward_data+349>        shr    $0x1f,%eax
# eax == 1

0x43e980 <_rpc_forward_data+352>        movzbl %al,%eax
# eax == 1

0x43e983 <_rpc_forward_data+355>        mov    %eax,-0xc0(%rbp)
# (gdb) p $rbp
# $17 = (void *) 0x7fcb8e15ee60
# (gdb) p &rc
# $15 = (int *) 0x7fcb8e15eda0
# $rbp - 0xC0 == &rc
        
0x43e989 <_rpc_forward_data+361>        cmpl   $0x0,-0xc0(%rbp)
# if comparison


Standalone program:

0x400863 <usock_connect+198>    mov    %rcx,%rsi
0x400866 <usock_connect+201>    mov    %eax,%edi
0x400868 <usock_connect+203>    callq  0x400690 <connect@plt>
# eax == -1

0x40086d <usock_connect+208>    mov    %eax,-0x94(%rbp)
# (gdb) p &rc
# $1 = (int *) 0x7fffffffdeac
# (gdb) p $rbp - 0x94
# $2 = (void *) 0x7fffffffdeac
# => we just move ( $eax == -1 ) to the rc.

0x400873 <usock_connect+214>    cmpl   $0x0,-0x94(%rbp)
0x40087a <usock_connect+221>    jns    0x400888 <usock_connect+235>  

So for some reason the compiler emits different code for what looks like
equivalent source. The compilation flags used were:

req.c:
gcc -DHAVE_CONFIG_H -I. -I../../.. -I../../../slurm  
-DLIBSLURM_SO=\"/lxc-data/sandbox/slurm/lib/libslurm.so\" -I../../..    -g -O2 
-pthread -Wall -g -O0 -fno-strict-aliasing -MT req.o -MD -MP -MF .deps/req.Tpo 
-c -o req.o req.c

standalone (I tried to use the same flags):
gcc -g -O2 -pthread -Wall -g -O0 -fno-strict-aliasing -c -o cli_c1.o cli_c1.c

FIX that changes the situation in SLURM: if I apply the attached patch, which
slightly changes the C construct, to the SLURM sources:
-   while ((rc = connect(fd, (struct sockaddr *)&sa, SUN_LEN(&sa)) < 0) &&
-          (errno == EINTR));
+
+   do {
+       rc = connect(fd, (struct sockaddr *)&sa, SUN_LEN(&sa));
+   } while( (rc < 0) && (errno == EINTR) );

I get the expected code:
0x43e971 <_rpc_forward_data+337>        mov    %eax,%edi
0x43e973 <_rpc_forward_data+339>        callq  0x4287e0 <connect@plt>
0x43e978 <_rpc_forward_data+344>        mov    %eax,-0xc0(%rbp)
0x43e97e <_rpc_forward_data+350>        cmpl   $0x0,-0xc0(%rbp)



     
 ----
  
Best regards, Artem  Y. Polyakov 
HPC Engineer, Mellanox Ltd.