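Which function can I use to detect that the io-net process has exited abnormally?

My system sometimes has io-net exit unexpectedly, and I would like a daemon process to restart it.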

http://sendreceivereply.wordpress.com/2008/05/12/daddy-the-network-is-down/

xtang, I can't open that link.

Daddy, the network is down…
Posted May 12, 2008
Filed under: QNX
The gateway of my home network is not one of those “broadband routers”. Instead, it’s an old Pentium 200MHz machine in my basement, running, of course, QNX. Why am I doing this? I think it’s one of those “because I can” things. Since I compiled my own TCPIP stack, I can really know every detail of the packets going in and out of my gateway. Another reason is, of course, that I like to “live on the edge”.

Yes, it’s really “bleeding edge”. Although there is a lot of benefit and fun in running the HEAD branch stack, one of the disadvantages is that, while it’s in its early stage, the stack “crashes”. The good thing is that I have a core dump I can look at; the bad thing is that’s also when my kids start shouting at me.

Those of you who have managed a home network will really understand how stressful this is. :) Fortunately, my kids soon found the “engineering way” to fix the problem. They went down to the basement, pressed the little reset button on the old Pentium, gave it a couple of minutes, and voilà, everything came back.

This worked well for a while, but one day while I was at home alone, the stack on the gateway went down again. I had to get off my comfortable couch, go down to the basement and reset it myself. I said to myself, “why can’t I just write a program to restart the network if it has crashed?” After all, QNX is all about Micro Kernel and Modular System, isn’t it?

That’s where my “sockmon” program comes from. Once started, it keeps monitoring whether the TCPIP stack is still running. If the stack disappears, “sockmon” executes a shell script you gave it on the command line to restart the network. If the restart somehow fails after several tries, it simply reboots the system.

You may wonder, “how do you know if the TCPIP stack is there or not?” Well, QNX resource managers have a built-in notification to all connected clients when their server disappears. So all you need to do is establish a connection to the TCPIP stack (by calling socket()) and set up to wait for the notification events.

I have included the source here. The “notification” described above works for ALL resource managers (unless the manager is written in a way that turns this feature off), so you can easily extend my program to any resource manager. Just give it a config file that says which resource manager (which namespace you care about) to watch, and what to do (which script to execute) if the manager goes away.

I will leave this as an exercise for the reader, but if you do it, you will realize you have just got yourself a simple, basic HA program.

-xtang


#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <process.h>
#include <signal.h>
#include <string.h>
#include <syslog.h>
#include <sys/procmgr.h>
#include <sys/sysmgr.h>
#include <sys/neutrino.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
        char *script;
        int sd, chid, fcount;
        struct _pulse pulse;

        if (argc < 2) {
                fprintf(stderr, "sockmon <re-start script>\n");
                return -1;
        }

        script = argv[1];
        if (access(script, X_OK) != 0) {
                fprintf(stderr, "access('%s'): %s\n", script, strerror(errno));
                return -1;
        }

        /* create a channel to receive the COIDDEATH pulse */
        if ((chid = ChannelCreate(_NTO_CHF_COID_DISCONNECT)) == -1) {
                perror("ChannelCreate");
                return -1;
        }

        /* don't care about the child's exit status */
        signal(SIGCHLD, SIG_IGN);

#ifdef NDEBUG
        if (procmgr_daemon(0, PROCMGR_DAEMON_NOCLOSE) == -1) {
                perror("procmgr_daemon");
                return -1;
        }
#endif

        openlog("sockmon", LOG_PID, LOG_DAEMON);
        setlogmask(LOG_UPTO(LOG_INFO));

        for (;;)
        {
                fcount = 0;

                /* connect to tcpip to monitor it; give it roughly 30 seconds
                 * (6 tries, 5 seconds apart), and if it still can't connect,
                 * reboot the system
                 */
                while ((sd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
                        if (++fcount >= 6) {
                                syslog(LOG_ERR, "Can't connect to socket after 30 seconds, reboot...");
                                spawnl(P_NOWAIT, "/bin/slay", "slay", "-f", "syslogd", NULL);
                                sleep(1);
                                sysmgr_reboot();
                                return 0;
                        }
                        /* log before sleeping so %m still reflects socket()'s errno */
                        syslog(LOG_INFO, "Connect to Socket failed: %m");
                        sleep(5);
                }

                syslog(LOG_INFO, "Connected to Network, start monitoring...");
                if (MsgReceivePulse(chid, &pulse, sizeof(pulse), NULL) == -1) {
                        syslog(LOG_ERR, "MsgReceivePulse(): %m");
                        return -1;
                }

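                if (pulse.code != _PULSE_CODE_COIDDEATH) {
                        /* not an errno failure, so report the code instead of %m */
                        syslog(LOG_ERR, "Unexpected pulse code %d", pulse.code);
                        return -1;
                }

                if (pulse.value.sival_int != sd) {
                        syslog(LOG_ERR, "COIDDEATH pulse for %d", pulse.value.sival_int);
                        close(sd);      /* the loop opens a fresh socket, so don't leak this one */
                        continue;
                }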

                syslog(LOG_INFO, "Network gone, restarting...");
                close(sd);      /* release the dead connection before reconnecting */
                spawnl(P_WAIT, "/bin/ksh", "/bin/ksh", script, NULL);
        }

        return 0;
}
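
For the config-file extension that the post leaves as an exercise, here is a rough sketch of just the parsing half, in portable C. Everything in it is an assumption made for illustration: the one-entry-per-line `path script` format and the names `struct watch`, `parse_watch_line` and `load_watches` are hypothetical and do not come from xtang's program. The monitoring half would presumably open() each path where sockmon calls socket(), then match the COIDDEATH pulse value against the resulting descriptors.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WATCHES 16
#define MAX_FIELD   128

/* One watched resource manager: the path it owns and the script that restarts it. */
struct watch {
    char path[MAX_FIELD];
    char script[MAX_FIELD];
};

/* Parse one "path script" line (hypothetical format).
 * Returns 1 if an entry was parsed, 0 for a blank or '#' comment line,
 * and -1 if the line does not hold exactly two fields. */
static int parse_watch_line(const char *line, struct watch *w)
{
    char extra[2];
    int n;

    line += strspn(line, " \t\r\n");        /* skip leading whitespace */
    if (*line == '\0' || *line == '#')
        return 0;

    /* The third conversion only succeeds if there is trailing junk. */
    n = sscanf(line, "%127s %127s %1s", w->path, w->script, extra);
    return (n == 2) ? 1 : -1;
}

/* Read up to max entries from fp into tab.
 * Returns the number of entries loaded, or -1 on the first malformed line. */
static int load_watches(FILE *fp, struct watch *tab, int max)
{
    char line[512];
    int count = 0;

    while (count < max && fgets(line, sizeof(line), fp) != NULL) {
        int rc = parse_watch_line(line, &tab[count]);
        if (rc < 0)
            return -1;
        count += rc;
    }
    return count;
}
```

A config line such as `/dev/socket /etc/restart-net.sh` (both paths made up here) would then give one table entry per resource manager to watch.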

xtang, what is the argv[1] argument for this program? When I pass io-net, I get a memory fault.

It's a shell script.

Once started, it keeps monitoring whether the TCPIP stack is still running. If it disappears, “sockmon” executes a shell script you gave it on the command line to restart the network.

What does this script contain?

“Whatever it takes to restart io-net”, for example:

slay -f io-net inetd dhcp.client
io-net -d xxxx -p xxxx
waitfor /dev/socket/1
dhcp.client
inetd
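
The `waitfor /dev/socket/1` line in the script holds off the remaining commands until the stack's path shows up. If you wanted the same check from C, for example to confirm that a restart worked before going back to monitoring, a minimal sketch could look like the following. The helper name `wait_for_path` and the one-second polling step are assumptions of mine, not part of sockmon or of QNX.

```c
#include <unistd.h>

/* Poll once per second until path exists or timeout_s seconds have passed.
 * Returns 0 as soon as the path is visible, -1 on timeout.
 * (Hypothetical helper; the name and the polling interval are illustrative.) */
static int wait_for_path(const char *path, unsigned timeout_s)
{
    unsigned waited;

    for (waited = 0; ; waited++) {
        if (access(path, F_OK) == 0)
            return 0;
        if (waited >= timeout_s)
            return -1;
        sleep(1);
    }
}
```

A call such as `wait_for_path("/dev/socket/1", 10)` after the script returns would tell the monitor whether the stack registered its path within ten seconds.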