io-net dumped core on me last night

Hi all,

My QNX6 box went off the 'net last night. When I got in this morning and
went to the console itself, I found that io-net had dumped core (a core
file I still have, BTW). Is there, perhaps, a bug in io-net about which
someone at QNX would like to learn more?

Alternatively, can someone suggest how I might figure out why io-net
died and prevent it from happening again?

Thanks in advance,
Eric

Which driver are you running?

-seanb

Eric Berdahl <berdahl@intelligentparadigm.com> wrote:
: Hi all,

: My QNX6 box went off the 'net last night. When I got in this morning and
: went to the console itself, I found that io-net had dumped core (a core
: file I still have, BTW). Is there, perhaps, a bug in io-net about which
: someone at QNX would like to learn more?

: Alternatively, can someone suggest how I might figure out why io-net
: died and prevent it from happening again?

: Thanks in advance,
: Eric

From the machine now that it is restarted and running happily:

$ pidin ar | grep io-net
81936 io-net -pttcpip -ppppmgr
$ pidin -p 81936 mem
  pid tid name         prio STATE        code  data  stack
81936   1 sbin/io-net   10o SIGWAITINFO   36K  368K  8192(516K)*
81936   2 sbin/io-net   10o RECEIVE       36K  368K  4096(12K)
81936   3 sbin/io-net   10o RECEIVE       36K  368K  8192(12K)
81936   4 sbin/io-net   21o RECEIVE       36K  368K  4096(132K)
81936   5 sbin/io-net   10o RECEIVE       36K  368K  4096(12K)
81936   6 sbin/io-net   17f CONDVAR       36K  368K  4096(132K)
81936   7 sbin/io-net   21r RECEIVE       36K  368K  4096(132K)
81936   8 sbin/io-net   10o RECEIVE       36K  368K  4096(12K)
81936   9 sbin/io-net   19f CONDVAR       36K  368K  4096(132K)
81936  12 sbin/io-net   18f CONDVAR       36K  368K  4096(132K)
          ldqnx.so.1     @b0300000  300K   12K
          npm-ttcpip.so  @b034e000   76K  4096
          devn-rtl.so    @b0362000   44K  4096
          npm-pppmgr.so  @b036e000   20K  8192

So, I’m guessing the answer to your question is ‘devn-rtl’. If not, you
have more information available. Also, I still have the core dump from
last night’s crash if there are details in it you might find interesting.

Does this help you understand more of what’s going on?

Eric



In article <95v0s5$scl$1@nntp.qnx.com>, Sean Boudreau <seanb@qnx.com>
wrote:

> Which driver are you running?
>
> -seanb
>
> Eric Berdahl <berdahl@intelligentparadigm.com> wrote:
> : Hi all,
>
> : My QNX6 box went off the 'net last night. When I got in this morning and
> : went to the console itself, I found that io-net had dumped core (a core
> : file I still have, BTW). Is there, perhaps, a bug in io-net about which
> : someone at QNX would like to learn more?
>
> : Alternatively, can someone suggest how I might figure out why io-net
> : died and prevent it from happening again?
>
> : Thanks in advance,
> : Eric

This is probably a bug in the rtl driver which has been fixed internally.
You can get an interim copy of the driver from http://www.qnx.com/~cdm
until the next patch.

-seanb


Eric Berdahl <berdahl@intelligentparadigm.com> wrote:
: From the machine now that it is restarted and running happily:

: So, I’m guessing the answer to your question is ‘devn-rtl’. If not, you
: have more information available. Also, I still have the core dump from
: last night’s crash if there are details in it you might find interesting.

: Does this help you understand more of what’s going on?

: Eric

In article <960us5$3pu$1@nntp.qnx.com>, Sean Boudreau <seanb@qnx.com>
wrote:

> This is probably a bug in the rtl driver which has been fixed internally.
> You can get an interim copy of the driver from http://www.qnx.com/~cdm
> until the next patch.

I picked up this patch when I was running the 26 September release. I am
now running with the latest patch to QNX6 (from ?? January). Your web
page indicates that the network driver is for use with the 26 September
release. Did your fix not get into the January patch? Should I still use
the driver on your web site?

Confused…

This fix has not been released yet.

-seanb

Eric Berdahl <berdahl@intelligentparadigm.com> wrote:
: In article <960us5$3pu$1@nntp.qnx.com>, Sean Boudreau <seanb@qnx.com>
: wrote:

:> This is probably a bug in the rtl driver which has been fixed internally.
:> You can get an interim copy of the driver from http://www.qnx.com/~cdm
:> until the next patch.

: I picked up this patch when I was running the 26 September release. I am
: now running with the latest patch to QNX6 (from ?? January). Your web
: page indicates that the network driver is for use with the 26 September
: release. Did your fix not get into the January patch? Should I still use
: the driver on your web site?

: Confused…

I’m having the same problem: io-net dumps core for no apparent reason,
and there is no fixed interval between core dumps. I have a “network
checker” script that runs every 5 minutes and restarts io-net if it is no
longer running, but there are also situations where io-net cannot be
restarted because the rtl driver fails to initialise.

Another problem I’m facing is that io-net’s memory usage increases over
time, by approximately 200K per day (checked with “pidin mem | grep
io-net”). Sometimes my entire system hangs as well (I’m not sure whether
that is related to io-net). The only cause I can think of for the memory
growth is the following code, which runs every 5 minutes to tell my
backend database that this device is ‘alive’.

The function is called as follows:
SocketOpen("10.22.22.22", "updateDB.cgi?machine=01&state=001");


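#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define PORT 80

int SocketOpen(char *IP, char *URL){
    int sockfd;
    int counter;
    int conn, n;
    int err = 0;
    socklen_t errlen = sizeof(err);
    char buffer[1024];
    struct sockaddr_in serv_addr;
    struct timeval tv;
    fd_set wfd;

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = inet_addr(IP);
    serv_addr.sin_port = htons(PORT);
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        printf("Error\n");
        return -1;
    }

    // Set this socket to NON BLOCK mode
    fcntl(sockfd, F_SETFL, O_NONBLOCK);

    // A non-blocking connect() normally returns -1 with EINPROGRESS;
    // any other error is a real failure
    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0
        && errno != EINPROGRESS)
    {
        printf("Connect Error[%s]\n", strerror(errno));
        close(sockfd);
        return -1;
    }

    // Give 5 sec for the connection to complete (socket becomes writable)
    FD_ZERO(&wfd);
    FD_SET(sockfd, &wfd);

    tv.tv_sec = 5;
    tv.tv_usec = 0;

    switch (n = select(1 + sockfd, 0, &wfd, 0, &tv)) {
    case -1:
        perror("select");
        close(sockfd);
        return -1;
    case 0:
        printf("Connect timed out\n");
        close(sockfd);
        return -1;
    default:
        printf("%d descriptors ready ...\n", n);
        if (FD_ISSET(sockfd, &wfd))
            printf("Socket ready to write\n");
    }

    // Writable does not prove the connect succeeded: check the socket error
    if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0 || err != 0)
    {
        printf("Connect Error[%s]\n", strerror(err ? err : errno));
        close(sockfd);
        return -1;
    }

    // Request line without an HTTP version, so the reply is the body only
    snprintf(buffer, sizeof(buffer), "GET %s\r\n\r\n", URL);
    counter = write(sockfd, buffer, strlen(buffer));
    if (counter < 0)
    {
        printf("Write error\n");
        close(sockfd);
        return -1;
    }

    // Give up to 5 sec for reading from remote server
    counter = 0;
    while ((conn = read(sockfd, buffer, sizeof(buffer) - 1)) < 0 && counter < 5)
    {
        printf("Read Error[%s]\n", strerror(errno));
        sleep(1);
        counter++;
    }
    if (conn < 0)   // the old test (counter > 5) could never be true
    {
        close(sockfd);
        return -1;
    }
    buffer[conn] = '\0';   // terminate the reply before atoi()

    counter = atoi(buffer);
    close(sockfd);
    return counter;
}//end SocketOpen()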
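
The read loop above polls with sleep(1). Because the socket is non-blocking, read() returns -1 with EAGAIN whenever the reply has not arrived yet, so most of those "Read Error" lines probably just mean "no data yet". A minimal sketch of waiting with select() instead, assuming a POSIX environment; read_with_timeout is a hypothetical helper name, not something from the code above or from QNX:

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable, then read once. Returns the number of bytes read (0 on EOF),
 * or -1 with errno set (ETIMEDOUT if nothing arrived in time). */
ssize_t read_with_timeout(int fd, char *buf, size_t len, int timeout_sec)
{
    fd_set rfd;
    struct timeval tv;
    int n;

    FD_ZERO(&rfd);
    FD_SET(fd, &rfd);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    n = select(fd + 1, &rfd, NULL, NULL, &tv);
    if (n < 0)
        return -1;            /* select() failed, errno is already set */
    if (n == 0) {
        errno = ETIMEDOUT;    /* no data within the timeout */
        return -1;
    }
    return read(fd, buf, len);
}
```

In SocketOpen() this would stand in for the sleep/retry loop, for example read_with_timeout(sockfd, buffer, sizeof(buffer) - 1, 5), with the buffer terminated before atoi() is called on it.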



As we need this system to be up 24 hours a day, I would appreciate more
information on the rtl driver fix. Should we be using it? I now have the
following three versions of devn-rtl.so:

54405 16 Jun 2000 - early version
46636 13 Oct 2000 - version from the 18 Jan patch
54807 18 Dec 2000 - the unofficial one

A driver bug like this is making our system unstable. Any other
suggestions or recommendations would be appreciated.
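
To put a number on the io-net memory growth mentioned above, size fields from two "pidin mem" samples can be converted to bytes and compared. A rough sketch, assuming the fields look like the ones in the pidin output earlier in this thread ("368K", "4096", "8192(516K)*") and that K means 1024 bytes; parse_size and growth_per_day are hypothetical names, not part of any QNX tool:

```c
#include <stdlib.h>

/* Hypothetical helper: convert a pidin-style size field such as "368K" or
 * "4096" to bytes. Assumes K = 1024 and M = 1024 * 1024. Anything after
 * the first number and suffix is ignored. Returns -1 if the field does
 * not start with a digit. */
long parse_size(const char *field)
{
    char *end;
    long value;

    if (field[0] < '0' || field[0] > '9')
        return -1;
    value = strtol(field, &end, 10);
    if (*end == 'K')
        return value * 1024L;
    if (*end == 'M')
        return value * 1024L * 1024L;
    return value;
}

/* Hypothetical helper: average growth in bytes per day between two samples
 * taken seconds_apart seconds apart (seconds_apart must be positive). */
double growth_per_day(long before, long after, double seconds_apart)
{
    return (double)(after - before) * 86400.0 / seconds_apart;
}
```

With made-up samples of 368K and 568K taken one day apart, growth_per_day() gives 204800 bytes per day, which is the 200K-per-day scale described above.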

Thanks,
Eugene


Sean Boudreau <seanb@qnx.com> wrote in message
news:961nfs$h2f$1@nntp.qnx.com

> This fix has not been released yet.
>
> -seanb
>
> Eric Berdahl <berdahl@intelligentparadigm.com> wrote:
> : In article <960us5$3pu$1@nntp.qnx.com>, Sean Boudreau <seanb@qnx.com>
> : wrote:
>
> :> This is probably a bug in the rtl driver which has been fixed internally.
> :> You can get an interim copy of the driver from http://www.qnx.com/~cdm
> :> until the next patch.
>
> : I picked up this patch when I was running the 26 September release. I am
> : now running with the latest patch to QNX6 (from ?? January). Your web
> : page indicates that the network driver is for use with the 26 September
> : release. Did your fix not get into the January patch? Should I still use
> : the driver on your web site?
>
> : Confused…