Corinna Vinschen wrote:

So I hope you won't mind that I attached a short test program you can easily compile with gcc to reproduce the bug.



Cool, that's exactly what I was asking for. I was immediately able to reproduce the problem, and it turned out that on fork() the socket duplication from the parent to the child process for some reason occupied address space in the child which, in the parent, is occupied by the shared memory returned by shmat().

Consequently, the duplication of the shared memory couldn't occupy the
same address as in the parent.  That's a fatal error, so the forked child
terminated itself with error 487, which basically means "Invalid address".

I've changed fork() so that the shared memory is duplicated before sockets
are duplicated, which is ok because sockets don't have special requirements
for memory addresses.  That works fine for me, but it would be good if you
could test the next snapshot, which I just uploaded, nevertheless.

It's just incredible that nobody found this problem before.
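
The requirement behind this fix can be illustrated with a small sketch. It is not the Cygwin code, only a check of the behaviour a program relies on: after fork(), the child must find an attached System V segment at the same address as in the parent, so a pointer obtained before the fork stays valid. The helper name shm_survives_fork() and the values 42 and 43 are made up for the illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a forked child can read and write the
 * parent's shared memory through the pointer inherited across fork(),
 * and a non-zero value otherwise. */
int shm_survives_fork(void)
{
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    int *p = shmat(shmid, NULL, 0);
    if (p == (void *) -1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    *p = 42;                      /* written by the parent before fork() */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: reuse the parent's pointer without calling shmat() again. */
        int seen = *p;
        *p = 43;                  /* write back so the parent can verify sharing */
        _exit(seen == 42 ? 0 : 1);
    }

    int status = -1;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid
        && WIFEXITED(status) && WEXITSTATUS(status) == 0
        && *p == 43)
        result = 0;

    shmdt(p);
    shmctl(shmid, IPC_RMID, NULL); /* remove the segment again */
    return result;
}
```

If the segment could not be mapped at the parent's address in the child, the dereference in the child would not be meaningful, which is why the failed duplication described above has to be treated as fatal.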



Yes, I find this incredible too, since any Unix server that uses IPC (instead of threads, for example) will want to support multiple connections at a time and so will use these mechanisms.
I doubt that we're the only ones using shared memory, sockets and multiple processes!


Anyway, BIG THANKS for resolving the problem so quickly.
I recompiled from the Cygwin CVS, and it solved my problem; my master now runs well.


However, there is still a problem, sorry ;)

This time it is with semaphores (also part of IPC). It's less important for me, as the master can run without them, but it's better to have them.
So I updated the test case to show what happens.


I added semaphore lock/release functions that I call in the child process, so each child tries to take the lock before accepting a connection and releases it when the connection is finished.

For one child it is OK, but when the second child starts, the semaphore lock operation (semop() with sem_flg=SEM_UNDO and sem_op=-1) makes cygserver hang!
Then I get "lost connection to cygserver" errors from my process, plus some "error getting signal_arrived to server(6)" errors from the cygserver process.


So, instead of waiting for the semaphore to be released (semval going back from 0 to 1), semop() returns even though the semaphore is locked, and the program continues as if it had acquired the semaphore, although another child still holds it.

Moreover, the semaphore value is decremented at each semaphore_lock call, so it reaches -1 at the third call, whereas we want it to be either 0 (locked) or 1 (unlocked). Then it stops there because cygserver has hung, and there is no more news from the following children (I set 10 children in the example).

Under OS X, for example, you see the first child locking the semaphore, then all the other children wait for the semaphore to be released (semop() blocks until the release), and the semaphore value only goes from 1 to 0.
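
As a minimal sketch of the semantics expected here (separate from the attached test case), the following takes a one-semaphore set initialised to 1, locks it once, and then tries a second lock with IPC_NOWAIT so that the call reports the situation instead of blocking. On a conforming implementation the second attempt should fail with EAGAIN and the value should stay at 0, never going negative. The names try_lock(), check_binary_semaphore() and union semun_arg are made up for the illustration; the union is declared locally under an assumed name because some systems leave the definition of the semctl() argument to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Locally declared argument type for semctl(SETVAL); hypothetical name. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Try to take the lock without blocking.
 * Returns 1 if acquired, 0 if it is already held, -1 on any other error. */
int try_lock(int semid)
{
    struct sembuf op = {
        .sem_num = 0,
        .sem_op  = -1,
        .sem_flg = SEM_UNDO | IPC_NOWAIT
    };

    if (semop(semid, &op, 1) == 0)
        return 1;
    return (errno == EAGAIN) ? 0 : -1;
}

/* Returns 0 if the semaphore behaves like a mutex: the first lock succeeds,
 * the second is refused, and the value is 0 after both attempts. */
int check_binary_semaphore(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    union semun_arg arg;
    arg.val = 1;                               /* 1 means unlocked */
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    int first  = try_lock(semid);              /* expected: 1 (acquired) */
    int val1   = semctl(semid, 0, GETVAL);     /* expected: 0 */
    int second = try_lock(semid);              /* expected: 0 (already held) */
    int val2   = semctl(semid, 0, GETVAL);     /* expected: still 0 */

    semctl(semid, 0, IPC_RMID);                /* remove the set again */

    return (first == 1 && val1 == 0 && second == 0 && val2 == 0) ? 0 : 1;
}
```

Without IPC_NOWAIT, as in the test case below, the second semop() should simply block until the holder releases the semaphore, which matches the behaviour seen under OS X.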

I hope this will help,
thank you again for your fix.

Vincent

PS: the same conditions as before apply to this test (Windows version, Cygwin DLL containing your fix_shm_after_fork update).

------------------------
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>

#define USE_IPC
#define USE_SEM
//define BIND_AFTER_FORK 

#define BUFFERLEN 256

struct  database
{
        int             shmid;
        int     semid;
        int     test1;
        int     test2;
}
*wdb;

int                     get_shared_memory(char *path_key)
{
        key_t   key;
        int             shmid;
        int             shmflg;
        char    file[BUFFERLEN];

  snprintf(file, BUFFERLEN-1, "%s.exe", path_key);
        if ((key = ftok(file, 'Z')) == -1)
        {
                perror("Getting key for shared memory");
                exit(1);
        }
        shmflg = IPC_CREAT|0600;
        if ((shmid = shmget(key, sizeof(struct database), shmflg)) == -1)
        {
                perror ("Getting shared memory");
                exit(1);
        }
        fprintf(stderr,"shmid: %i\n", shmid);
        return (shmid);
}

int                                     get_semaphores(char *path_key)
{
        key_t                   key;
        int                             semid;
        struct sembuf   op;
        int                             semflg;
        char                    file[BUFFERLEN];

  snprintf(file, BUFFERLEN-1, "%s.exe", path_key);
        if ((key = ftok(file, 'Z')) == -1)
        {
                perror("Getting key for semaphores");
                exit(1);
        }
        semflg = IPC_CREAT|0600;
        if ((semid = semget(key, 1, semflg)) == -1)
        {
                perror("Getting semaphores");
                exit(1);
        }
        if (semctl(semid, 0, SETVAL, 1) == -1)
        {
                perror("semctl SETVAL -> 1");
                exit(1);
        }
        if (semctl(semid, 0, GETVAL) == 0)
        {
                op.sem_num = 0;
                op.sem_op = 1;
                op.sem_flg = 0;
                if (semop(semid, &op, 1) == -1)
                {
                        perror("semaphore_release");
                        exit(1);
                }
        }
        fprintf(stderr,"semval: %i semid: %i\n", semctl (semid, 0, GETVAL), semid);
        return (semid);
}

void            *attach_shared_memory(int shmid)
{
        void    *rv; // return value

        if ((rv = shmat(shmid, 0, 0)) == (void *) -1)
        {
                perror("shmat");
                return ((void *) -1);
        }

        return (rv);
}

int             detach_shared_memory(void *shmaddr)
{
        int     rv; // return value

        if ((rv = shmdt(shmaddr)) == -1)
        {
                perror("shmdt");
                return (-1);
        }

        return (rv);
}

void                                    set_signal_handlers (void)
{
        struct sigaction        ignore;

        ignore.sa_handler = SIG_IGN;
        sigemptyset(&ignore.sa_mask);
        ignore.sa_flags = 0;
        sigaction(SIGHUP, &ignore, NULL); // So we keep running as a daemon
}

int                                             get_socket(short port)
{
        int                                     sfd; //socket file descriptor
        struct sockaddr_in      addr;
        int                                     opt;

        opt = 1;
        sfd = socket(PF_INET, SOCK_STREAM, 0);
        if (sfd == -1)
        {
                perror("socket");
                exit(1);
        }
        else
        {
                if (setsockopt(sfd, SOL_SOCKET, SO_REUSEADDR, (int *) &opt, sizeof(opt)) == -1)
                        perror ("setsockopt");
                addr.sin_family = AF_INET;
                addr.sin_port = htons(port);
                addr.sin_addr.s_addr = htonl(INADDR_ANY);
                if (bind(sfd, (struct sockaddr *) &addr, sizeof (addr)) == -1)
                {
                        perror("bind");
                        sfd = -1;
                } else {
                        listen (sfd, 5);
                }
        }
        return (sfd);
}

int             accept_socket   (int sfd, struct sockaddr_in *addr)
{
  int   fd;
  socklen_t len = sizeof(struct sockaddr_in); // accept() expects a socklen_t *

        if ((fd = accept(sfd, (struct sockaddr *) addr, &len)) == -1)
  {
    perror("Accepting connection\n");
    exit(1);
  }
  return (fd);
}

void                    semaphore_lock(int semid)
{
  struct sembuf op;

  op.sem_num = 0;
  op.sem_op = -1;
  op.sem_flg = SEM_UNDO;

  fprintf(stderr,"Locking... semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
  if (semop(semid, &op, 1) == -1)
  {
        perror("semaphore_lock");
        printf("%i\n",errno);
        exit(0);
  }
  fprintf(stderr,"Locked !!! semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
}

void                    semaphore_release(int semid)
{
  struct sembuf op;

  fprintf(stderr,"Unlocking... semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
  op.sem_num = 0;
  op.sem_op = 1;
  op.sem_flg = SEM_UNDO;
  if (semop(semid, &op, 1) == -1)
  {
    perror ("semaphore_release");
        printf("%i\n",errno);
        exit(0);
  }
  fprintf(stderr,"Unlocked !!! semval: %i semid: %i\n",semctl (semid,0,GETVAL),semid);
}

int                                             main(int argc, char *argv[])
{
        int                                     sfd; // socket file descriptor
        int                                     csfd; // child sfd, the socket once accepted
        int                                     shmid; // shared memory id
        int                                     semid; // semaphore id
        struct sockaddr_in      addr; // Address of the remote host
        pid_t                           child;
        pid_t                           child_wait;
        int                                     n_children;
        int                                     rc; // Return code
        int                                     i; // For loops

        n_children = 0;
        set_signal_handlers();
        
#ifdef USE_IPC
        shmid = get_shared_memory(argv[0]);
        semid = get_semaphores(argv[0]);
        if ((wdb = attach_shared_memory(shmid)) == (void *) -1)
                exit (1);
        wdb->shmid = shmid;
        wdb->semid = semid;
#endif

#ifndef BIND_AFTER_FORK
        if ((sfd = get_socket(1234)) == -1)
                exit(0);
#endif

        printf ("Waiting for connections...\n");
        while (1)
        {
                if (n_children < 10)
                {
                        if ((child = fork()) == 0)
                        {
#ifdef BIND_AFTER_FORK
                                if ((sfd = get_socket(1234)) == -1)
                                        exit(0);
#endif
#ifdef USE_SEM
                                semaphore_lock(wdb->semid);
#endif
                                if ((csfd = accept_socket(sfd, &addr)) != -1)
                                {
                                        close(sfd);
                                        // handle connection here
                                        close(csfd);
                                }
                                else
                                        perror("Accepting connection\n");
#ifdef USE_SEM
                                semaphore_release(wdb->semid);
#endif
                                exit(0);
                        }
                        else if (child != -1)
                                n_children++;
                        else
                                perror("Forking\n");
                }
                else
                {
                        if ((child_wait = wait (&rc)) != -1)
                                n_children--;
                }
        }
        exit(0);
}

shmid: 65536
semval: 1 semid: 65536
Waiting for connections...
Locking... semval: 1 semid: 65536
Locked !!! semval: 0 semid: 65536
Locking... semval: 0 semid: 65536
     13 [main] a 2468 transport_layer_pipes::connect: lost connection to cygserver, error = 2
Locked !!! semval: -1 semid: 65536
     10 [main] a 4120 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 1092 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4616 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 4844 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     11 [main] a 4024 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     15 [main] a 4596 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 4368 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4448 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3800 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2212 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5192 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 588 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5876 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4940 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 2304 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      4 [main] a 6080 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1488 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4076 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     10 [main] a 2980 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4152 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      6 [main] a 1836 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      6 [main] a 3660 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      7 [main] a 5408 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4720 transport_layer_pipes::connect: lost connection to cygserver, error = 2
     10 [main] a 460 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5444 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1752 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      4 [main] a 1944 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 5796 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2928 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5068 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 1096 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4156 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3720 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5992 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      9 [main] a 5052 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3424 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 364 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4360 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4440 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5548 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 3832 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 2756 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 5148 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      9 [main] a 3880 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      5 [main] a 4356 transport_layer_pipes::connect: lost connection to cygserver, error = 2
      8 [main] a 5836 transport_layer_pipes::connect: lost connection to cygserver, error = 2
