How to make qnet more deterministic

Yes, the last time we were starting a contract, about 1.5 yrs ago, I worked with Mike Martin (I think), and he had a nice fellow come down and address some issues we were having getting dual head and 3D support working.

But to be fair, this time around I am evaluating OSs for use across our organization - not just for 1 contract - and I don’t want to get everyone’s sales ppl all over me (no offense). I’m evaluating several alternatives and they pretty much need to stand or fall on their own merit. I don’t want to try to deal w/ tech support on 4 different real time OSs pre-sales, and it’s probably not fair to them either to take up their time.

There’s a reason eval versions are out there :slight_smile: They also reflect the quality of the software, I’ve found.

The distributed IO thing was a big deal though, it was the one thing that set QNX apart from the others with respect to our applications.

Plus if guys like Mario and maschoen can’t figure it out :slight_smile: who can :slight_smile: Those guys are like icons answering QNX questions from way back when I was in grad school :slight_smile:

ncostes,

Well maybe it was my vanity, but your praise got me curious about this. I set this up using your code and had an interesting time of it. For some strange reason, I had a lot of trouble seeing the data bits flip on the parallel port. This happened on two machines. Occasionally when I would jiggle something on the scope I would see the square wave pattern for a brief instant, but then it was gone. I don't know what this was about, but it is probably not important. I fooled around and found I could see one of the control bits flipping if I did out8()'s to 0x37A, which serves the same purpose.

I note that I'm using the "compat" version of qnet. I wasn't willing to mess with this right now. The square wave coming out of the parallel port was a little junky looking, but was fairly stable otherwise. If there was any jitter in it at all, it was conservatively less than 5%. That is not all, however. Every second or two I could see a major loss of signal. A square wave about 40 ms long would appear. It's as if QNET went off somewhere every once in a while. I should also point out that this was not an isolated network, so there could be a reasonable explanation for the jump unrelated to either of these machines. I wanted to check these results with you before I look any deeper. Is that what you are seeing, or is it jitter between individual waves?

Both of these machines are running 6.3 SP2. Both are multiprocessors that are barely loaded.

I didn’t use the parallel port. I use ClockCycles to measure timing.

That’s what I was seeing, every second or so a long (32-48 ms) square wave. Like you, I had almost nothing running, and unlike you, I had this on an isolated network (just the 2 qnx nodes on 1 Fast Ethernet switch) which, for me anyway, rules out the possibility of other machines causing the behavior.

For me every 20-40 pulses one would be double the length followed by a very short one.
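
For anyone wanting to reproduce the ClockCycles measurement, here is a minimal sketch of one way to timestamp pulses and count the long ones. It is not the test program from this thread, and the helper names (now_cycles, cycles_per_second, cycles_to_ms, count_long_intervals) are made up for illustration. On QNX it reads ClockCycles() and the cycles_per_sec value from the system page; the non-QNX branch is only a stand-in so the arithmetic can be checked off-target. Keep in mind that on SMP machines the counter can differ between cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h>   /* ClockCycles() */
#include <sys/syspage.h>    /* SYSPAGE_ENTRY() */
#else
#include <time.h>           /* clock(): off-target stand-in only */
#endif

/* Current value of the free-running counter (hypothetical helper). */
uint64_t now_cycles(void)
{
#ifdef __QNXNTO__
    return ClockCycles();
#else
    return (uint64_t)clock();   /* CPU time, NOT a real cycle counter */
#endif
}

/* Counter frequency, needed to turn cycle counts into time. */
uint64_t cycles_per_second(void)
{
#ifdef __QNXNTO__
    return SYSPAGE_ENTRY(qtime)->cycles_per_sec;
#else
    return (uint64_t)CLOCKS_PER_SEC;
#endif
}

/* Convert a cycle count to milliseconds for a given counter frequency. */
double cycles_to_ms(uint64_t cycles, uint64_t cps)
{
    assert(cps > 0);
    return 1000.0 * (double)cycles / (double)cps;
}

/* Count gaps between consecutive timestamps longer than factor * nominal. */
size_t count_long_intervals(const uint64_t *stamps, size_t n,
                            uint64_t nominal, double factor)
{
    assert(stamps != NULL || n == 0);
    size_t late = 0;
    for (size_t i = 1; i < n; i++) {
        uint64_t gap = stamps[i] - stamps[i - 1];
        if ((double)gap > factor * (double)nominal)
            late++;
    }
    return late;
}
```

The idea is to call now_cycles() each time a message arrives, store the values in an array, and run count_long_intervals() over it afterwards, so the measurement itself adds as little as possible to the timing being measured.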

Mario,

What is the speed of your processors, specifically the one on the receiving end? In my setup the receiver is a relatively slow 500 MHz, while the sender is 1.5 GHz. I'm suggesting that since I'm getting much larger pulses, the problem may be on the receiving end. I guess I should reverse the direction and try again.

Mitchell

One is a Xeon with HT at 3.8 GHz and the other is an AMD X2 at 2.4 GHz. I tried swapping direction (the multi-core can play tricks on you with ClockCycles) but that didn’t change the behavior. Both machines were running the SMP kernel.

I reversed the direction and found what looks like no difference. Recall that the processor on one side is at least 3 times the speed of the other.
This suggests that the delay is not a matter of QNET or any other process getting starved of CPU cycles for the period of the delay. Obviously this could be tracked further, e.g. by putting the scope on the Ethernet cable, but I think it is clear that something is wrong with QNET, or at the very least with some common piece of a NIC driver. I guess we’ll have to wait for someone to have a vested interest in getting this looked at by QSSL.

True, but if you don’t actually evaluate it, you are possibly cheating yourself (and your company) out of selecting the best product.

This issue is just crying out to be analyzed with the System Profiler. Have you tried analyzing this with it? If you haven’t, you are missing out on seeing how good the product is at debugging. You should be able to figure out clearly what is going on, and when you do you will have shown:

  • The product has seriously effective debugging tools

and

  • That the network stack has some blindingly huge hole in it, that none of the thousands of customers who actually have deployed product on it have noticed.

or

  • Exactly what the problem is with your test.

Whatever the result is, you can then honestly say that you evaluated the product properly (if you found the big hole, you’ve prevented your company from making a mistake, and if you didn’t then you now have a really good idea that the product can be used to build a solid product and that you’ll have the tools you need to debug real problems when they -invariably- occur in your development). Either way you’re a hero!

To be fair, not many of QSSL’s current targeted customers care about the deterministic properties of QNET, if they use it at all. This is a big change over QNX 4. I do however agree about your evaluation point.

If you decide against QNX without giving QNX the chance to address your problem, this would be unfair to the company.

This is a little off-topic, but I am trying to send output through my parallel port’s data register, and cannot get it to work. Could you forward me the code you were using? Thanks.

Code is not handy, but here’s a quick primer.

  1. Disable the printer driver if it is running
  2. Run your program as super user
  3. Use ThreadCtl(_NTO_TCTL_IO, 0) to enable I/O
  4. The normal base addresses are 0x378, 0x278, 0x3BC
  5. The DATA register is at offset 0, INFO (status) at 1, CTRL at 2.
  6. The normal printer protocol is to check the INFO register for any blocks, then out data to the DATA register and then set and unset the strobe bit,
    I think this is 0x01 in CTRL. You need a small delay after setting.
    There are other data protocols. And there are other pitfalls.
    There’s a bit to reverse the data direction in CTRL, which can appear in two locations.
    If it’s set wrong, you can get but not put data.
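
The handshake in step 6 can be sketched roughly as below. This is an illustration under assumptions, not the code asked for above: par_put_byte and the PAR_*/STATUS_*/CTRL_* names are made up, the bit values are the usual SPP layout (status bit 0x80 reads 1 when the peripheral is not busy, control bit 0x01 is the strobe, control bit 0x20 is the direction bit on bidirectional ports), and the caller is assumed to have already called ThreadCtl(_NTO_TCTL_IO, 0). The block under #ifndef __QNXNTO__ is a fake register file so the logic can be exercised off-target; it is not hardware access.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __QNXNTO__
#include <hw/inout.h>       /* in8(), out8() */
#endif

enum { PAR_DATA = 0, PAR_STATUS = 1, PAR_CTRL = 2 };   /* offsets from base */
enum { STATUS_NOT_BUSY = 0x80, CTRL_STROBE = 0x01, CTRL_DIR_INPUT = 0x20 };

#ifndef __QNXNTO__
/* Off-target stand-ins: a fake register file, valid only with base == 0. */
static uint8_t fake_regs[3] = { 0x00, STATUS_NOT_BUSY, 0x00 };
static unsigned strobe_pulses;   /* times the strobe bit went from 0 to 1 */

static uint8_t in8(uintptr_t port)
{
    assert(port < 3);
    return fake_regs[port];
}

static void out8(uintptr_t port, uint8_t val)
{
    assert(port < 3);
    if (port == PAR_CTRL && (val & CTRL_STROBE) &&
        !(fake_regs[PAR_CTRL] & CTRL_STROBE))
        strobe_pulses++;
    fake_regs[port] = val;
}
#endif

/* Send one byte: check busy, write data, pulse strobe.
 * Returns 0 on success, -1 if the peripheral reports busy. */
int par_put_byte(uintptr_t base, uint8_t byte)
{
    if (!(in8(base + PAR_STATUS) & STATUS_NOT_BUSY))
        return -1;

    /* Clear the direction bit so the data pins are outputs. */
    uint8_t ctrl = (uint8_t)(in8(base + PAR_CTRL) & ~CTRL_DIR_INPUT);

    out8(base + PAR_CTRL, (uint8_t)(ctrl & ~CTRL_STROBE));  /* strobe idle */
    out8(base + PAR_DATA, byte);                            /* present data */
    out8(base + PAR_CTRL, (uint8_t)(ctrl | CTRL_STROBE));   /* assert strobe */
    (void)in8(base + PAR_STATUS);   /* crude short delay: one extra I/O read */
    out8(base + PAR_CTRL, (uint8_t)(ctrl & ~CTRL_STROBE));  /* release strobe */
    return 0;
}
```

With a printer or other device on a standard port this would be called as par_put_byte(0x378, c) for each byte. If nothing is attached to drive the busy line, the busy check may need to be skipped.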

Hi,

this is what I have done:

  1. login as “root”
  2. open terminal and “slay devc-par”
  3. create program as follows:

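#include <stdio.h>
#include <stdlib.h>
#include <sys/neutrino.h>   /* ThreadCtl() */
#include <hw/inout.h>       /* out8() */

int main()
{
    if ( ThreadCtl( _NTO_TCTL_IO, 0 ) == -1 ) exit(0);
    else printf("good start\n");
    out8( 0x378, 0xFF );
    return 0;
}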

The data bits all go to 1, as expected. If I run the program again, but change the value sent to the port, say out8(0x378,0x00), nothing happens! It appears it will only send info to the port once. The status register has a value of 0x7F, while control has 0xC6.

I tried your suggestion of sending a falling edge to the strobe bit, however this did nothing either.

If I do NOT slay devc-par, I can run the program multiple times to change the value of the bits, and it works. However, I cannot add multiple out8()'s within the same program to change the port’s value.

Any help??? I’m really stuck now.

Rennie,

I’m not looking to be a hero, I know QNX is the best choice, I have an uphill fight to demonstrate this to the powers that be that think Linux is a real time operating system…

I’m asking QNX to do something that no other OS can do (but I know it can be done because I’ve done it with QNX 4).

I don’t know how to use the profiler, I will look into it when I get a chance. There’s not much code there to profile if you’ve looked at my sample, and it all comes from the QNX docs. I suppose if the profiler lets me see if the NIC driver or Qnet is misbehaving then that would be good.

I would not be surprised that none of the thousands of ppl who’ve installed Qnet have run into this - how many of those apps are doing distributed IO at 64Hz or higher and are actually looking at the output on a scope to see if there are delays?

In addition we have three people on this forum now, 2 of them competent (that excludes me :slight_smile:) who’ve duplicated these results using the test programs I attached.

If I have time I’ll get back to this but right now either I’m not getting the same behavior I got in QNX 4.25 with FLEET - or else I’m having overly fond memories of the deterministic behavior of FLEET :slight_smile: Though I doubt it’s the latter because I was just as much of a stickler back then - we were doing a direct visual servo experiment and the top of the line P2/400s back then couldn’t handle doing the capture AND the image processing AND the trajectory gen/inv kinematics/motor control, so I set up 1 PC to capture/process images and send the target coords to the other PC, which then controlled the robot. Same network topology (isolated 100Mb full duplex segment). We were running at 955Hz (frame rate of the camera), so the control period was about 1.05 ms, and I’m sure I’d have complained back then too if we were getting 32-48ms delays (and the control would probably have gone unstable - that’s roughly 31-46 control periods of delay).

I’m using the same network topology, but MUCH faster CPUs with the current QNX6 experiment.

Thanks for the replies guys. I’ll check back into this when I get a little free time. I probably will have to contact QSSL because my demo key is going to run out before I can get back to this.

I have another project to work on that doesn’t rely on deterministic network access that I will try to get them to use QNX on, if they go for it on that project it will give me time to check this out further.

You are not memory mapping the ports…

// Allow this thread to access IO ports and map the parallel
// port's registers.
ThreadCtl (_NTO_TCTL_IO, 0);
parallel_port = mmap_device_io (1, 0x378);

if (!strncmp(buf, "on", 2))
    out8 (parallel_port, 0xff);
else
    out8 (parallel_port, 0x00);

This works for me.
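
A slightly fuller version of the same idea, with return values checked and several writes in one program, might look like the sketch below. The names par_map and par_toggle are made up for illustration, and the block under #ifndef __QNXNTO__ only stands in for the QNX calls so the loop can be tested off-target. As far as I know, on x86 mmap_device_io() hands back the port number unchanged, which the stand-in mimics, so this shows the documented calling pattern and is not a claimed fix for the stuck data register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#ifdef __QNXNTO__
#include <sys/neutrino.h>   /* ThreadCtl(), _NTO_TCTL_IO */
#include <sys/mman.h>       /* mmap_device_io(), MAP_DEVICE_FAILED */
#include <hw/inout.h>       /* out8() */
#else
/* Off-target stand-ins (assumptions, not the real QNX implementations). */
#define MAP_DEVICE_FAILED ((uintptr_t)-1)
static unsigned writes;        /* number of out8() calls so far */
static uint8_t last_written;   /* value of the most recent out8() */

static uintptr_t mmap_device_io(size_t len, uint64_t io)
{
    (void)len;
    return (uintptr_t)io;
}

static void out8(uintptr_t port, uint8_t val)
{
    (void)port;
    last_written = val;
    writes++;
}
#endif

/* Request I/O privilege for the calling thread (needs root on QNX). */
static int enable_io(void)
{
#ifdef __QNXNTO__
    return ThreadCtl(_NTO_TCTL_IO, 0);
#else
    return 0;
#endif
}

/* Returns a handle for the data register, or MAP_DEVICE_FAILED on error. */
uintptr_t par_map(uint64_t io_base)
{
    if (enable_io() == -1) {
        perror("ThreadCtl(_NTO_TCTL_IO)");
        return MAP_DEVICE_FAILED;
    }
    uintptr_t port = mmap_device_io(1, io_base);
    if (port == MAP_DEVICE_FAILED)
        perror("mmap_device_io");
    return port;
}

/* Write an alternating 0xFF / 0x00 pattern; returns the number of writes. */
unsigned par_toggle(uintptr_t port, unsigned count)
{
    assert(port != MAP_DEVICE_FAILED);
    for (unsigned i = 0; i < count; i++)
        out8(port, (i % 2 == 0) ? 0xFF : 0x00);
    return count;
}
```

On real hardware a delay between the writes is needed to see anything on a scope, and checking the return value of ThreadCtl() at least rules out a silent privilege failure.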

m2asselli,

Well you got my interest again. I think this was happening to me too when I tried to do the same thing. I wasn't interested in the problem at that time, so I just started using a bit on the control register. I'll recreate this and see if I can figure out what is wrong. I have a vested interest, as I'll be working on a parallel port Zip driver next month.

MS

Memory mapping still doesn’t do the trick. Could there be something wrong with my “slay devc-par”??? Can I disable it from starting when QNX boots? I am able to run the program once and get the correct data on the port, but afterwards, nothing, unless I reboot the computer.

Thanks again.

Thanks MASCHOEN. It’s really bothering me also. I’ve started another thread for this topic, if you want to post results there.

Memory mapping ???