Unable to make connection via inetd after a few hours

Hi again,

I just built a small x86 unit using the same installation procedure and image (QNX 6.5.0 SP1) as 28 others (and yes, all with valid licenses!). Same CPU type - same everything (NICs etc.). The units are all set up to accept telnet and Phindows connections via inetd.

In this latest build, however, I lose the ability to connect to either service after a few hours of uptime. Restarting inetd does not help. Starting inetd with debug (-d) doesn’t even show an attempted connection. It appears to be a problem only when connecting over the LAN (wired in this case) - a “telnet localhost” or “telnet [my IP address]” is fine, but connections fail from any other machine on the same LAN, be they Windows, Linux, QNX6 or QNX4.

A reboot fixes it - for a few more hours.

I have never seen this before. I have 28 of these builds out in the field and while it is fortunate that none have [so far] exhibited this problem, it makes me nervous not knowing what causes it - a time bomb waiting to go off perhaps.

It’s not super critical - only one of them is connected to the internet so that I can gain remote access. It’s mainly a facility for me when I visit them and want to connect in from my laptop, mostly using Phindows. The prospect of having to reboot a system simply to connect in does not do much for me…

Any ideas?

Geoff.
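
One way to pin down the symptom described above (local connects succeed, LAN connects fail) is to attempt a plain TCP connect to each service port, once on the unit itself and once from another machine. The following is only a minimal Python sketch and not something from the thread: `probe` is a hypothetical helper, port 23 is the standard telnet port, and 4868 is assumed to be the phrelay (Phindows) port, so check /etc/services and inetd.conf on the unit before relying on it.

```python
import socket


def probe(host, port, timeout=3.0):
    """Attempt one TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered with a reset: nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: the SYN or its reply was lost or dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"


# Example use (addresses are the ones quoted in the thread; ports are
# assumptions, see the note above):
#   for port in (23, 4868):
#       print(port, probe("192.168.5.48", port))
```

If the probe reports "open" when run on the unit and "timeout" from another machine, that would suggest the connection attempt never reaches inetd at all, which would fit the empty debug output. A "refused" result would point more towards the listener itself.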

Where are you seeing this problem? At customer sites or in your office/lab? If it’s only at a customer site, what about setting up a 29th in your lab and seeing if the problem happens there?

How are you assigning your IP’s to your machines? Are they static (ie all your 28 machines use say 192.168.0.1) or dynamic?

I am assuming the “telnet localhost” or “telnet [my IP address]” that you mention is being done directly on your machine. It makes sense that this always works, since inetd is running. However, can your machine ping anything else on the LAN when the problem occurs?

Do you have any nameserver (DNS) specified in your net.cfg file?

Tim

Occurs in my office.

static. All 192.168.5.0/24 with aliases of 192.168.0.x. For example,

ifconfig en1 192.168.5.48 netmask 255.255.255.0
ifconfig en1 alias 192.168.0.48

Yes. Either through an attached keyboard/monitor console or an already established Phindows (or telnet) session.

Pings to/from the device work fine, as does QNET (also configured). So it doesn’t seem to be simply a bad LAN.

Good question - I’ll check when I get in the office shortly. I don’t set net.cfg myself (hence my vagueness). I set up the networks in rc.local and use (as implied above) ifconfig. I also set the required environment variables using setconf (domain, hostname, and resolve).

I know I specify a nameserver in resolv.conf after a “lookup file bind”. The units generally have a requirement to use only the /etc/hosts file - that is mainly set up to support QNET. This problem unit is set up no differently to my working development targets (two of them) that have never exhibited any such problem, all on the same LAN.

I left the unit on overnight. It will be just my luck that, after raising this on this forum, it will be fine when I get in to my office in the next half hour or so!

Geoff.

Geoff,

Any luck tracking it down?

You may also want to look at this older topic about aliasing. It may be what’s happening: even though it’s QNX 4, the TCP/IP stuff may be the same in QNX 6. I’ve never done any work with aliasing before myself, but you should be able to confirm whether your non-QNX machines have default gateways set and whether that’s indeed the problem.

openqnx.com/phpbbforum/viewtopic.php?t=9091

Tim

Hi Tim,

As it turned out I did go into my office that morning and yes the problem was not apparent. I was able to connect a telnet and second Phindows session.

However, I left the machine on and the following day, Saturday I think, I tried re-connections again and they failed. It is still in this condition.

It also occurred to me that I can’t really say that this problem doesn’t occur on the other systems. Their normal operation is for only a few hours at a time (4 or less) and during that time no-one attempts such connections. I am essentially the only person with need or even ability to attempt a remote connection. So the problem might be across the whole lot!

I am now confident that the problem will occur after a few hours or even perhaps a couple of days, if it is going to occur.

I am going to remove the alias and see what happens, because that is really the only “non-standard” action I can see in the TCP/IP setup. In all other respects it is “stock-standard” - as far as I can tell anyway. Perhaps the only other possible “abnormal” aspect of the setup is that the CPU has two separate NICs - one being 192.168.5.0/24 (that has the alias of 192.168.0.n) and the other 192.168.100.0/20 (with no aliasing).

If it works OK at the end of the week I will then be inclined to think that it is in fact a problem associated with the aliasing.

FYI, on a Linux box I alias extensively, and have done for years, with no such problems. Perhaps QNX is unique in more ways than one!

Geoff.

Geoff,

If you have 2 NIC’s with these addresses:
192.168.100.0/20
192.168.0.n (alias)

What does the rest of your gateway/routing information look like? If, for example, 192.168.100.0 can ‘see’ the ‘192.168.0.n’ network, it’s very possible that the 192.168.0.n connection isn’t doing anything at all because all packets are routed out the other NIC.

You might want to run ‘nicinfo’ (located in /usr/sbin) and see the byte counts for the 2 cards. This might help confirm if this is the problem.

Tim
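
Whether the two NICs' subnets can "see" each other at the addressing level can be worked out from the prefixes alone. The sketch below uses Python's standard `ipaddress` module with the addresses quoted in the thread. One assumption is labeled in the code: the alias is treated as a /24, since the `ifconfig en1 alias` line gives no netmask. It says nothing about default gateways or static routes, which would need to be checked on the unit itself.

```python
import ipaddress

# Networks quoted in the thread. strict=False lets the /20 be written as
# 192.168.100.0 even though that is not the true network address.
EN0_WIDE = ipaddress.ip_network("192.168.100.0/20", strict=False)
EN1_PRIMARY = ipaddress.ip_network("192.168.5.0/24")
# Assumption: the alias uses a /24 mask (no netmask is given in the thread).
EN1_ALIAS = ipaddress.ip_network("192.168.0.0/24")


def describe(net):
    """Return the normalised network plus its first and last address."""
    return f"{net} covers {net[0]} to {net[-1]}"


for name, net in [("en0 (/20 mode)", EN0_WIDE),
                  ("en1 primary", EN1_PRIMARY),
                  ("en1 alias", EN1_ALIAS)]:
    print(f"{name}: {describe(net)}")

print("en0 overlaps en1 primary:", EN0_WIDE.overlaps(EN1_PRIMARY))
print("en0 overlaps en1 alias:", EN0_WIDE.overlaps(EN1_ALIAS))
```

With these prefixes the /20 normalises to 192.168.96.0/20 (192.168.96.0 to 192.168.111.255), so it overlaps neither en1 range. If that matches the real units, the connected-interface routes alone would not pull 192.168.0.n traffic out of the other NIC, but a default gateway or static route still could, so the nicinfo byte counts remain worth checking.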

Hi Tim,

EN0 was in fact 192.168.100.0/24.

The machines come up in one of two modes - the other mode (not used in this case) has the 20 bit mask.

So, I have left EN0 as is, removed the alias from EN1 and made EN1 what was the alias (does that make sense?) :) Essentially, the setup is as follows:

ifconfig en1 192.168.0.48 netmask 255.255.255.0 up
ifconfig en0 192.168.100.48 netmask 255.255.255.0 up

Thanks for the pointer to nicinfo (reminder actually - I had forgotten it was there!).

I will see how this goes. If I can still connect on Friday then I will take it that it is to do with the alias, and figure out an alternative approach that meets my needs.

Thanks,

Geoff.

PS: (and BTW) EN0 is actually connected to a wifi device that is configured as a client to an upstream access point. I run QNET over the wifi system - and it is proving VERY effective (and fast). To date I haven’t had any issues - I was initially worried about timing problems. The setup was a bit tricky to figure out so if anyone wants to have a shortcut to the secrets they can feel free to contact me directly.

In my system I set up a node that sends a short (minimal size) message to a process running on another node, and I measure the time it takes to get the (NULL) reply. On the wired LAN (via a switch) it takes < 1 mSec to get the response. Using the wifi it is typically 4 mSecs, with occasional rises above that (depending, I guess, on wifi traffic at the time). The best I was able to achieve with a UDP/IP method (using a resource manager) was 28 mSecs. The effectiveness and efficiency of QNET when used in this way is my “secret weapon”.

To do this I do need to run gns. This works fine with 6.5.0 SP1 but apparently it doesn’t with 6.6. Another reason for me to stay where I am within the QNX version releases! (After 27 years with QNX it looks like I won’t be able to progress beyond 6.5.0, now that Photon has been discontinued along with my preferred development environment.)
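
The round-trip measurement described above (send a minimal message, time the empty reply) follows a simple pattern that can be sketched in a few lines. The Python below is only an illustration of the timing harness, using UDP datagrams as a stand-in transport. It is not the QNET message passing used in the post, the function names are hypothetical, and any numbers it produces say nothing about QNET itself.

```python
import socket
import time


def echo_server(sock, count):
    """Answer `count` datagrams with an empty reply (the "NULL" reply)."""
    for _ in range(count):
        _, peer = sock.recvfrom(64)
        sock.sendto(b"", peer)


def measure_round_trips(target, count=100, timeout=1.0):
    """Send a 1-byte datagram `count` times; return each round trip in ms."""
    times = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for _ in range(count):
            start = time.perf_counter()
            client.sendto(b"\x00", target)
            client.recvfrom(64)  # block until the empty reply arrives
            times.append((time.perf_counter() - start) * 1000.0)
    return times


def summarize(times):
    """Reduce a list of round-trip times to min, median and max."""
    ordered = sorted(times)
    return {
        "min": ordered[0],
        "median": ordered[len(ordered) // 2],
        "max": ordered[-1],
    }
```

To try it, bind a UDP socket on one machine, run `echo_server` on it in a thread or separate process, and call `measure_round_trips((host, port))` from the other. Reporting the median alongside the maximum makes the occasional spikes mentioned for wifi visible instead of averaging them away.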