"socket descriptor leak"

bilouteQNX · July 26, 2006, 9:44am

I’m writing a network program (under QNX 6.3.0 SP2), which uses several threads. Each thread is a TCP server.

Here is my problem:

I create a local socket to listen for incoming connections. When this socket is created, a file descriptor (IPC channel) is allocated to the io-net server. When I close this socket, the file descriptor is released to the system. So far, no problem.

But when I call accept(), another file descriptor (IPC channel) is allocated to the io-net server. accept() then blocks because it is waiting for a client to connect. If I cancel the thread at this point, the file descriptor gets lost (it is not released to the system). Of course, I do close the listening socket.

As a consequence, creating and cancelling threads leads to a “file descriptor leak”, and after some time the application fails because there are no file descriptors left in the system.

My question is how to release those file descriptors?

Any idea would help.

Thanks in advance,

Armand

Tim · July 26, 2006, 1:57pm

Armand,

How are you canceling the thread (internally from the thread itself or from another thread/process)?

Regardless of how you cancel the thread (short of using SIGKILL), take a look at the pthread_cleanup_push() family of functions. It lets you specify a function (which you supply) to be called when a thread is being destroyed. Inside this function you can close your open sockets to stop the resource leak.
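For reference, here is a minimal sketch of that pattern. The WorkerState struct and the function names are made up for illustration and are not from Armand's program; the worker just sleeps so that it has a cancellation point to be cancelled at.

```cpp
#include <cassert>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

// Hypothetical per-thread state: the descriptor the worker owns.
struct WorkerState {
    int  fd;         // descriptor to release on cancellation, -1 if none
    bool cleanedUp;  // set by the handler so the caller can verify it ran
};

// Cleanup handler: runs when the thread is cancelled or calls pthread_exit().
static void closeOnCancel(void *arg)
{
    WorkerState *state = static_cast<WorkerState *>(arg);
    if (state->fd != -1) {
        close(state->fd);
        state->fd = -1;
    }
    state->cleanedUp = true;
}

static void *worker(void *arg)
{
    WorkerState *state = static_cast<WorkerState *>(arg);
    pthread_cleanup_push(closeOnCancel, state);
    for (;;) {
        sleep(1);  // sleep() is a cancellation point
    }
    pthread_cleanup_pop(1);  // never reached, but push/pop must be paired
    return NULL;
}

// Starts the worker, cancels it, and reports whether the handler ran
// and released the descriptor.
bool cancelAndCheckCleanup()
{
    WorkerState state;
    state.fd = open("/dev/null", O_RDONLY);  // stand-in for a socket
    state.cleanedUp = false;
    assert(state.fd != -1);

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, &state) != 0) {
        close(state.fd);
        return false;
    }
    pthread_cancel(tid);
    pthread_join(tid, NULL);  // wait for the handler to finish
    return state.cleanedUp && state.fd == -1;
}
```

Note that a handler like this can only close descriptors the thread knows about, which is exactly the limitation discussed in the rest of this thread.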

Tim

bilouteQNX · July 26, 2006, 3:20pm

This is exactly what I do. I have a thread clean-up function (passed to pthread_cleanup_push()) which looks at all the open sockets and closes them. The problem is that when you cancel the thread while accept() is waiting for a connection to arrive, the socket has already been allocated by the system, but you don’t know about it because accept() hasn’t returned, which is normal since no client has connected.
If a client connects to the server, accept() returns the socket id and then there is no problem, because the socket is closed by the thread clean-up function.

The question is: why is accept() allocating a socket before a client tries to connect?

Armand

Tim · July 26, 2006, 5:33pm

Armand,

Accept does not allocate a socket before it returns. So the leak can’t be there.

Can you post your code here so we can take a look at it?

Normally the code works as follows:

Create a socket (creates 1 socket)
Accept (creates 1 NEW socket each time it returns)

Now each time accept returns, it creates a NEW socket. So if accept returns 5 times (as 5 clients connect) you’ll have a total of 6 sockets open.

So when you do your thread shutdown you need to close all 6 sockets, not just 1 (assuming you’re also shutting down the other threads that process the client connections).

Tim

bilouteQNX · July 27, 2006, 1:44pm

    //----- listen for connections (max 2 pending connections) -----
    if( 0 == listen( _sid_local, 2 ) )
    {
        TRACE("%s: listen on port %u was successful\n", _service_name.c_str(), ntohs(local_addr.sin_port) );

        while(1)
        {
            //----- init sockaddr_in with zero -----
            struct sockaddr_in  peer_addr;
            socklen_t           peer_addr_len = sizeof(peer_addr);   // accept() expects a socklen_t*
            memset(&peer_addr, 0, sizeof(peer_addr));

            // NOTE: UNTIL HERE ONLY ONE FILE DESCRIPTOR IS ALLOCATED TO IO-NET
            // (rw IOFLAGS and /dev/socket/2 resource name are set)

            //----- accept connection -----
            TRACE("%s: ready to accept new connection on port %u\n", _service_name.c_str(), ntohs(local_addr.sin_port) );
            _sid_peer = accept(_sid_local, (struct sockaddr*)&peer_addr, &peer_addr_len );
            // NOTE: WHILE accept() IS BLOCKED WAITING FOR A CLIENT, A NEW FILE DESCRIPTOR
            // IS ALLOCATED (no IOFLAGS and no resource name set).
            // IF A CLIENT CONNECTS, THAT FILE DESCRIPTOR IS RETURNED; IOFLAGS is then
            // set to rw and the resource name to /dev/socket/2.
            //
            // IF THE THREAD IS CANCELLED BEFORE accept() RETURNS, BECAUSE NO CLIENT TRIED
            // TO CONNECT, THE FILE DESCRIPTOR ALLOCATED BY accept() IS LOST.

            //----- check for error -----
            if( -1 == _sid_peer )
            {
                TRACE("%s: accept() FAILED, %s\n", _service_name.c_str(), strerror(errno) );
                break;  // exit function
            }
            else
            {
                // PROCESSING OF CLIENT GOES HERE
            }
        }
    }
    else{
        TRACE("%s: listen() FAILED, %s\n", _service_name.c_str(), strerror(errno) );
    }

Tim · July 27, 2006, 5:28pm

Armand,

I just tested with a small test program and I see the problem there too. This is the program I used to test with:

#include <stdio.h>
#include <stdlib.h>    // exit(), system()
#include <string.h>    // memset()
#include <iostream>
#include <errno.h>
#include <unistd.h>
#include <netdb.h>
#include <time.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <pthread.h>

using namespace std;

const int MAX_SOCKETS=20;

int socketList[MAX_SOCKETS];   // Incoming socket connections
int socketFd;                  // Our socket

bool cleaned;                  // Make sure cleanup took place

void cleanUp(void *args)
{
	cleaned = true;
	for (int i=0; i<MAX_SOCKETS; i++)
	{
		if (socketList[i] !=0)
		{
			close(socketList[i]);
		}
	}
	close (socketFd);
}

void *testThread(void *args)
{
	int incomingFd;
    struct sockaddr_in myAddr;
    struct sockaddr_in theirAddr;
    socklen_t sin_size;   // accept() expects a socklen_t*
    int yes=1;

	// Register the cleanup handler
	pthread_cleanup_push(&cleanUp, NULL);
	
    // Initialize socket list
	for (int i=0; i<MAX_SOCKETS; i++)
	{
		socketList[i] = 0;
	}
	
    // Open socket
    if ((socketFd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
    {
        printf("Error %d occurred while creating socket\n", errno);
        exit(-1);
    }

    if (setsockopt(socketFd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
    {
        printf("Error %d occurred while setting socket options\n", errno);
        exit(-1);
    }
        
    myAddr.sin_family = AF_INET;         // Host byte order
    myAddr.sin_port = htons(10000);// Short, network byte order
    myAddr.sin_addr.s_addr = INADDR_ANY; // Automatically fill with my IP
    memset(&(myAddr.sin_zero), '\0', 8); // Zero the rest of the struct

    // Bind the socket to this port
    if (bind(socketFd, (struct sockaddr *)&myAddr, sizeof(struct sockaddr)) == -1)
    {
        printf("Error %d occurred while binding socket\n", errno);
        exit(-1);
    }

    // Listen for an incoming connection (max 2 pending)
    if (listen(socketFd, 2) == -1)
    {
        printf("Error %d occurred while listening on the socket connection\n", errno);
        exit(-1);
    }

    // Wait for incoming connections
	while (1)
	{
		sin_size = sizeof(struct sockaddr_in);
		if ((incomingFd = accept(socketFd, (struct sockaddr *)&theirAddr, &sin_size)) == -1)
		{
			printf("Error %d occurred while accepting connection from remote socket\n", errno);
			exit(-1);
		}
		
		printf("Got connection from %s\n", inet_ntoa(theirAddr.sin_addr));
		
		// Save socket id in the first free slot and spawn a thread to handle it (not done for test)
		for (int i=0; i<MAX_SOCKETS; i++)
		{
			if (socketList[i] == 0)
			{
				socketList[i] = incomingFd;
				break;   // store the fd once; without this every free slot gets the same fd
			}
		}
	}
	
	pthread_cleanup_pop(1);
	return NULL;   // not reached: the loop above only ends via exit() or cancellation
}
					
int main(void) 
{
	pthread_t pid;   // thread ID (pthread_create() expects a pthread_t, not an int)
	
	// Initialize cleanup flag
	cleaned = false;
	
    // Initialize socket list
	for (int i=0; i<MAX_SOCKETS; i++)
	{
		socketList[i] = 0;
	}
	
	// See how many open fd's there are
	system("sin fd > start");
	
	// Spawn the thread
    pthread_create(&pid, NULL, &testThread, NULL);
	
	// Wait a second for the accept to be blocked
	sleep(1);
	
	// See how many open fd's there are while accept is waiting
	system("sin fd > waiting");
	
	// Cancel the thread
	pthread_cancel(pid);
	
	// Wait a second for the cleanup to finish
	sleep(1);

	if (!cleaned)
	{
		printf ("No cleanup took place. Test is invalid\n");
	}
	
	// See how many open fd's there are after thread cleanup
	system("sin fd > finish");
}

Here’s the output of the 3 files:

start: (only stdin, stdout and stderr are open)
a.out 9744437 8K 208K 8K 516K 1
0 102415 WR 0/0 /dev/ttyp5
1 102415 WR 0/0 /dev/ttyp5
2 102415 WR 0/0 /dev/ttyp5
0s 1

waiting: (now the socket is open #3 and the accept #4)
a.out 9744437 8K 208K 12K 648K 1
0 102415 WR 0/0 /dev/ttyp5
1 102415 WR 0/0 /dev/ttyp5
2 102415 WR 0/0 /dev/ttyp5
3 73742 WR 0/0 I4T *.10000 .
4 73742
0s 1

finish: (socket was closed but NOT the accept)
a.out 9744437 8K 208K 8K 516K 1
0 102415 WR 0/0 /dev/ttyp5
1 102415 WR 0/0 /dev/ttyp5
2 102415 WR 0/0 /dev/ttyp5
4 73742
0s 1

So there’s clearly a resource leak here. My guess is that whoever implemented accept did the ‘dup’ call on the file descriptor BEFORE waiting for the incoming connection to be satisfied. If they had done it afterward, this wouldn’t be a problem. So if you have tech support, I’d advise you to contact them, send this test program to demonstrate the problem and wait for a future release (likely a LONG time).

Alternatively I can see 2 other solutions you can consider implementing:

Solution 1:
This is a hack, but when I added a ‘close(4)’ statement to the cleanUp routine, the code properly closed the fd allocated by accept. What I’d do is add a global variable that holds a high-water mark: the LARGEST fd ever returned to you from a socket/open call, including the socket id returned from accept. Then in the cleanup code you can simply loop from 3 (don’t close stdin, stdout, stderr) to that number+1 (+1 for the missing accept) and do a close() on every number. A hack to be sure, but it will close everything.
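Roughly, the high-water-mark idea could look like the sketch below. The names trackFd(), isOpen() and closeTrackedRange() are invented for this example, and the code assumes that fds 0 to 2 are the standard streams and that the hidden accept fd is always the tracked maximum plus one, which is only what the test output above suggests.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Highest descriptor the program has been handed so far (2 = stderr).
static int highestFd = 2;

// Wrap every socket()/open()/accept() result in this call, e.g.
//   int s = trackFd(socket(AF_INET, SOCK_STREAM, 0));
// A failed call (-1) passes through unchanged.
int trackFd(int fd)
{
    if (fd > highestFd) {
        highestFd = fd;
    }
    return fd;
}

// Helper for checking the result: true if fd refers to an open descriptor.
bool isOpen(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

// Closes every descriptor from 3 up to highestFd + 1 (the +1 covers the
// descriptor that a blocked accept() may have allocated without returning it).
// Returns the number of descriptors that were actually closed.
int closeTrackedRange()
{
    assert(highestFd >= 2);
    int closed = 0;
    for (int fd = 3; fd <= highestFd + 1; ++fd) {
        if (close(fd) == 0) {   // EBADF for unused numbers is ignored
            ++closed;
        }
    }
    highestFd = 2;
    return closed;
}
```

Keep in mind that descriptors are shared by the whole process, so a loop like this also closes files and sockets that belong to other threads. It only makes sense if the cancelled thread really owns everything above fd 2.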

Solution 2:
This is more of a design issue. To get the behavior you are seeing I had to spawn off testThread as a separate thread where the accept resides and then destroy that thread. Is there any reason you can’t move that accept code into the ‘main’ thread, which never gets destroyed and hence never leaks the descriptor? I guess my question is: why do you need to destroy the thread that is listening for incoming socket connections? Usually that thread is very long lived (because it’s a resource manager) and only the child threads that get spawned off to handle each remote connection get destroyed as the remote side disconnects.
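As a rough sketch of that layout: one accept loop that lives as long as the process, plus a detached thread per client that closes its own socket when the peer disconnects. The echo "protocol" and the names handleClient(), spawnHandler() and acceptLoop() are placeholders for this example, not anything from the programs above.

```cpp
#include <cassert>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

// Per-connection worker (placeholder protocol: echo until the peer closes).
// It owns clientFd and closes it itself, so it never has to be cancelled.
static void *handleClient(void *arg)
{
    int clientFd = static_cast<int>(reinterpret_cast<intptr_t>(arg));
    char buf[256];
    ssize_t n;
    while ((n = read(clientFd, buf, sizeof buf)) > 0) {
        if (write(clientFd, buf, n) != n) {
            break;
        }
    }
    close(clientFd);
    return NULL;
}

// Hands clientFd to a detached handler thread. Returns 0 on success;
// on failure the descriptor is closed here so that it cannot leak.
int spawnHandler(int clientFd)
{
    assert(clientFd >= 0);
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    pthread_t tid;
    void *arg = reinterpret_cast<void *>(static_cast<intptr_t>(clientFd));
    int rc = pthread_create(&tid, &attr, handleClient, arg);
    pthread_attr_destroy(&attr);
    if (rc != 0) {
        close(clientFd);
    }
    return rc;
}

// Long-lived accept loop, meant to run in main() or in a thread that is
// never cancelled. It returns only when accept() fails.
void acceptLoop(int listenFd)
{
    for (;;) {
        int clientFd = accept(listenFd, NULL, NULL);
        if (clientFd == -1) {
            break;
        }
        spawnHandler(clientFd);
    }
}
```

With this split, the only thread that ever sits in accept() is one that is never cancelled, so the descriptor that accept() holds while waiting is released when the process exits.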

Tim

bilouteQNX · August 4, 2006, 7:50am

Hi Tim,

I found a workaround: I created a function called waitForClientToConnect() and call it before accept().
In this function I use select() to wait for a client connection.

QNX doc says:

This way I only call accept() when a client is ready to connect, so the thread is never cancelled while it is blocked inside accept().
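A minimal sketch of what such a function might look like follows. Only the name waitForClientToConnect() comes from this post; the signature, the millisecond timeout and the acceptWhenReady() helper are assumptions made for the example.

```cpp
#include <cassert>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

// Waits until listenFd becomes readable, which for a listening socket means
// that a connection is pending. A negative timeoutMs blocks indefinitely.
// Returns 1 if a connection is pending, 0 on timeout, -1 on error.
int waitForClientToConnect(int listenFd, int timeoutMs)
{
    assert(listenFd >= 0 && listenFd < FD_SETSIZE);

    fd_set readFds;
    FD_ZERO(&readFds);
    FD_SET(listenFd, &readFds);

    struct timeval tv;
    struct timeval *tvp = NULL;   // NULL means "no timeout" for select()
    if (timeoutMs >= 0) {
        tv.tv_sec  = timeoutMs / 1000;
        tv.tv_usec = (timeoutMs % 1000) * 1000;
        tvp = &tv;
    }

    int rc = select(listenFd + 1, &readFds, NULL, NULL, tvp);
    if (rc <= 0) {
        return rc;                // 0 = timeout, -1 = error (see errno)
    }
    return FD_ISSET(listenFd, &readFds) ? 1 : 0;
}

// Calls accept() only once a connection is pending.
// Returns the new socket, or -1 on timeout or error.
int acceptWhenReady(int listenFd, int timeoutMs)
{
    if (waitForClientToConnect(listenFd, timeoutMs) != 1) {
        return -1;
    }
    return accept(listenFd, NULL, NULL);
}
```

select() is itself a cancellation point, so the thread can still be cancelled while it waits. One caveat: a client can disconnect between select() returning and accept() being called, in which case accept() may block after all; making the listening socket non-blocking is the usual guard against that.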

Regards,

Armand
