Armand,
I just tested with a small test program and I see the problem there too. This is the program I used to test with:
#include <stdio.h>
#include <stdlib.h> // exit(), system()
#include <string.h> // memset()
#include <iostream>
#include <errno.h>
#include <unistd.h>
#include <netdb.h>
#include <time.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <pthread.h>
using namespace std;
const int MAX_SOCKETS=20;
int socketList[MAX_SOCKETS]; // Incoming socket connections
int socketFd; // Our socket
bool cleaned; // Make sure cleanup took place
void cleanUp(void *args)
{
cleaned = true;
for (int i=0; i<MAX_SOCKETS; i++)
{
if (socketList[i] !=0)
{
close(socketList[i]);
}
}
close (socketFd);
}
void *testThread(void *args)
{
int incomingFd;
struct sockaddr_in myAddr;
struct sockaddr_in theirAddr;
socklen_t sin_size; // accept() expects a socklen_t *
int yes=1;
// Register the cleanup handler
pthread_cleanup_push(&cleanUp, NULL);
// Initialize socket list
for (int i=0; i<MAX_SOCKETS; i++)
{
socketList[i] = 0;
}
// Open socket
if ((socketFd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
{
printf("Error %d occurred while creating socket\n", errno);
exit(-1);
}
if (setsockopt(socketFd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1)
{
printf("Error %d occurred while setting socket options\n", errno);
exit(-1);
}
myAddr.sin_family = AF_INET; // Host byte order
myAddr.sin_port = htons(10000);// Short, network byte order
myAddr.sin_addr.s_addr = INADDR_ANY; // Automatically fill with my IP
memset(&(myAddr.sin_zero), '\0', 8); // Zero the rest of the struct
// Bind the socket to this port
if (bind(socketFd, (struct sockaddr *)&myAddr, sizeof(struct sockaddr)) == -1)
{
printf("Error %d occurred while binding socket\n", errno);
exit(-1);
}
// Listen for an incoming connection (max 2 pending)
if (listen(socketFd, 2) == -1)
{
printf("Error %d occurred while listening on the socket connection\n", errno);
exit(-1);
}
// Wait for incoming connections
while (1)
{
sin_size = sizeof(struct sockaddr_in);
if ((incomingFd = accept(socketFd, (struct sockaddr *)&theirAddr, &sin_size)) == -1)
{
printf("Error %d occurred while accepting connection from remote socket\n", errno);
exit(-1);
}
printf("Got connection from %s\n", inet_ntoa(theirAddr.sin_addr));
// Save socket id and spawn a thread to handle it (not done for test)
for (int i=0; i<MAX_SOCKETS; i++)
{
if (socketList[i] == 0)
{
socketList[i] = incomingFd;
break; // Use only the first free slot
}
}
}
pthread_cleanup_pop(1);
return NULL;
}
int main(void)
{
pthread_t pid; // Thread id of testThread
// Initialize cleanup flag
cleaned = false;
// Initialize socket list
for (int i=0; i<MAX_SOCKETS; i++)
{
socketList[i] = 0;
}
// See how many open fd's there are
system("sin fd > start");
// Spawn the thread
pthread_create(&pid, NULL, &testThread, NULL);
// Wait a second for the accept to be blocked
sleep(1);
// See how many open fd's there are while accept is waiting
system("sin fd > waiting");
// Cancel the thread
pthread_cancel(pid);
// Wait a second for the cleanup to finish
sleep(1);
if (!cleaned)
{
printf ("No cleanup took place. Test is invalid\n");
}
// See how many open fd's there are after thread cleanup
system("sin fd > finish");
}
Here’s the output of the 3 files:
start: (only stdin, stdout and stderr are open)
a.out 9744437 8K 208K 8K 516K 1
0 102415 WR 0/0 /dev/ttyp5
1 102415 WR 0/0 /dev/ttyp5
2 102415 WR 0/0 /dev/ttyp5
0s 1
waiting: (the listening socket is now open as fd 3, and accept has allocated fd 4)
a.out 9744437 8K 208K 12K 648K 1
0 102415 WR 0/0 /dev/ttyp5
1 102415 WR 0/0 /dev/ttyp5
2 102415 WR 0/0 /dev/ttyp5
3 73742 WR 0/0 I4T *.10000 .
4 73742
0s 1
finish: (socket was closed but NOT the accept)
a.out 9744437 8K 208K 8K 516K 1
0 102415 WR 0/0 /dev/ttyp5
1 102415 WR 0/0 /dev/ttyp5
2 102415 WR 0/0 /dev/ttyp5
4 73742
0s 1
So there’s clearly a resource leak here. My guess is that whoever implemented accept did the ‘dup’ call on the file descriptor BEFORE waiting for the incoming connection, so cancelling the thread while it is blocked leaves that descriptor allocated. If they had done it afterward, this wouldn’t be a problem. If you have tech support, I’d advise you to contact them, send this test program to demonstrate the problem, and wait for a fix in a future release (likely a LONG time).
Alternatively I can see 2 other solutions you can consider implementing:
Solution 1:
This is a hack. When I added a ‘close(4)’ statement to the cleanUp routine, the code properly closed the fd allocated by accept. What I’d do is add a global variable that holds a high-water mark: the LARGEST fd ever returned to you from a socket/open file call, including the socket id returned by accept. Then in the cleanup code you can simply loop from 3 (don’t close stdin, stdout, stderr) to that number+1 (+1 for the missing accept) and do a close() on every number. A hack to be sure, but it will close everything.
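A rough sketch of that hack, in case it helps. The names trackFd, closeTrackedFds and highestFd are made up for illustration and are not in the test program above; it also assumes the leaked accept descriptor always lands at the highest fd + 1, which is what I saw here but is not guaranteed.

```cpp
#include <unistd.h>

// Hypothetical helpers, not part of the original test program.
static int highestFd = 2; // High-water mark; 0-2 are stdin, stdout, stderr

// Pass the return value of every socket(), open(), accept(), etc. through this
int trackFd(int fd)
{
    if (fd > highestFd)
    {
        highestFd = fd;
    }
    return fd;
}

// Close everything from 3 up to highestFd + 1. The +1 is an assumption:
// it covers the descriptor accept() reserved but never returned.
// Returns the number of descriptors that were actually open.
int closeTrackedFds(void)
{
    int closed = 0;
    for (int fd = 3; fd <= highestFd + 1; fd++)
    {
        if (close(fd) == 0) // Fails harmlessly (EBADF) on unused numbers
        {
            closed++;
        }
    }
    highestFd = 2;
    return closed;
}
```

You would wrap the calls in the test program, e.g. socketFd = trackFd(socket(AF_INET, SOCK_STREAM, 0)), and call closeTrackedFds() from cleanUp. Be aware that it closes every fd in that range, including any opened by other parts of the process.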
Solution 2:
This is more of a design issue. To get the behavior you are seeing I had to spawn off testThread as a separate thread where the accept resides and then destroy that thread. Is there any reason you can’t move that accept code into the ‘main’ thread, which never gets destroyed and hence never leaks the descriptor? I guess my question is, why do you need to destroy the thread that is listening for incoming socket connections? Usually that thread is very long lived (because it’s a resource manager) and only the child threads that get spawned off to handle each remote connection get destroyed as the remote side disconnects.
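Roughly what I have in mind, as a sketch only. acceptLoop and handleConnection are hypothetical names, and the handler just echoes bytes back as a stand-in for whatever your real per-connection work is. The listening socket is assumed to be set up (socket/bind/listen) exactly as in the test program.

```cpp
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical per-connection worker: echoes data until the remote side
// disconnects, then closes its own descriptor and exits normally.
void *handleConnection(void *arg)
{
    int fd = (int)(intptr_t)arg;
    char buf[256];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
    {
        if (write(fd, buf, n) != n)
        {
            break;
        }
    }
    close(fd);
    return NULL;
}

// Meant to run in the long-lived main thread, which is never cancelled,
// so a blocked accept() is never interrupted. Returns -1 if accept fails.
int acceptLoop(int listenFd)
{
    while (1)
    {
        int fd = accept(listenFd, NULL, NULL);
        if (fd == -1)
        {
            return -1;
        }
        pthread_t tid;
        if (pthread_create(&tid, NULL, &handleConnection,
                           (void *)(intptr_t)fd) != 0)
        {
            close(fd); // Could not start a worker; drop this connection
            continue;
        }
        pthread_detach(tid); // Worker cleans up after itself
    }
}
```

With this layout the only threads that ever go away are the workers, and they end by returning after the remote side disconnects rather than by being cancelled inside accept.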
Tim