QNX4 and Tcpip 5.00A

When I create a normal blocking stream socket, connect to a host, and write more than 100 bytes, why do my writes become multiple TCP packets?

For example, if I write 37 bytes, it goes out in one data packet.

If I write 141 bytes, it goes out in a 100-byte data packet and a 41-byte data packet.

This is causing problems with Tandem systems I talk to, which assume they can get the whole request in a single read.

This worked in Socket/Socklet, but doesn’t work in Tcpip. I like Tcpip because it fixes disastrous problems in Socket/Socklet, but now I have this one irritation.
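In case it matters, the write side is nothing exotic. Stripped down (the host address and port below are placeholders and error handling is trimmed), it looks roughly like this:

```c
/* Minimal sketch of the write pattern described above.
 * Host/port are placeholders; error handling is trimmed. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    struct sockaddr_in addr;
    char request[141];                        /* e.g. a 141-byte request */
    int fd = socket(AF_INET, SOCK_STREAM, 0); /* normal blocking socket  */

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(7000);                   /* placeholder port */
    addr.sin_addr.s_addr = inet_addr("10.0.0.1");  /* placeholder host */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        perror("connect");
        return 1;
    }

    memset(request, 'A', sizeof(request));
    /* One write of 141 bytes; with Tcpip 5.00A it goes out as a
     * 100-byte segment followed by a 41-byte segment. */
    write(fd, request, sizeof(request));

    close(fd);
    return 0;
}
```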

Your Tandem system is broken. TCP/IP stacks are free to split or join write and read requests as they see fit. TCP/IP is a streaming protocol and has no concept of packets from a user’s point of view.

I do not know of a way to control that, sorry.

I don’t know how to check or change it, but see if you can look at the MTU size. In QNX6 you can see it with ifconfig; I don’t have QNX4 accessible right now to see what has to be done there. If for some reason it is set low, it would force the stack to break the packets into smaller-than-required pieces.

It should be around 1400-1500 for Ethernet.

I thought about MTU, but I don’t see a way to check that.

The funny thing is, if I write, say, 252 bytes to someone else, it goes through as one packet.

Would the stack remember a separate MTU per host/gateway?

That would imply that you have a bridge or something that is breaking the packets up. Is there a bridge or some kind of medium change with a smaller MTU in the path to your destination?

We have the QNX4 box and the router to a 56K link on the same switch. The router reports an MTU of 1500 on all interfaces.

By the way, the QNX4 box is breaking my writes up, not intermediate hardware.

OK, I have no idea then. Maybe someone else will comment on this. Mario is correct that the code on the other end is broken in assuming it will get the whole message in one read, but I assume that is beyond your control to fix.

Sadly, yes, it’s beyond my control. These people have had the same Tandems running for years without incident, then we up and change our stack, and lo and behold, it stops working for us. They see it as our problem, of course. And because of this, only our test system is running the new stack.

One other tidbit: on my socket, I set TCP_NODELAY, just to see what would happen. Oddly enough, the QNX stack now sends two TCP packets (100 bytes and 41 bytes, respectively) back-to-back without waiting for an ACK from the other side.

With TCP_NODELAY off, the first 100 bytes are sent, then an ACK is received and the remaining 41 bytes are sent.
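For reference, the only thing I changed for that experiment was the standard socket option; a sketch (error checking omitted):

```c
/* Sketch: disable the Nagle algorithm on an already-connected socket.
 * With this set, the stack sent the 100-byte and 41-byte segments
 * back-to-back instead of waiting for the ACK in between. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static int disable_nagle(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```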

I noticed netstat only shows “-” in the MTU field for each entry in the routing table. Does anyone know whether that means the field is just not implemented?

The latest stack (Tcpip) has new -s and -p options (from memory) that could change the behavior. That’s a stab in the dark…

Get the Tandem people to read a book on TCP/IP; that’s where the problem is. Any other solution is based on black magic and is asking for trouble.

I agree that the Tandem people have a problem, but the QNX stack obviously has a problem as well. If nothing else, it’s communicating inefficiently by sending two packets when one would do.

I’d be interested to see the latest stack. Is it beta or released?

All right, this is puzzling me, so I went and dug out my copy of Stevens’ TCP/IP Illustrated, Volume 1 and tried to do a little research.

The MSS (Maximum Segment Size) is established at connection setup. Each end announces what its MSS is, and the smaller of the two values is used. It is possible that QNX never paid attention to it before (or there is a bug in the new version). If you have a packet analyzer, the SYN packet should contain the MSS for the initiator and the SYN/ACK should contain the MSS for the server.

It might be interesting to see what values are going each way for this particular connection. It does explain why the size is particular to each host you talk to.
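If the QNX stack supports the BSD TCP_MAXSEG socket option (I haven’t verified that Tcpip 5.00A does, so treat this as a guess), you might also be able to read the value the stack is actually using straight off the connected socket:

```c
/* Sketch: query the MSS in use on a connected socket.
 * TCP_MAXSEG is a BSD-ism; whether the QNX4 stack exposes it is an
 * assumption, so treat this as something to try, not a sure thing. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static void print_mss(int fd)
{
    int mss = 0;
    socklen_t len = sizeof(mss);   /* older headers may want plain int */

    if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == 0)
        printf("MSS in use: %d\n", mss);
    else
        perror("getsockopt(TCP_MAXSEG)");
}
```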

As it turns out, the reason the 252-byte write isn’t becoming two packets is because it’s more than 208 bytes, not because it’s a different connection.

We wrote a test program that sends data to an echo server on our local network, then reads the data back in. We cycled from one byte to 300.
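Stripped down, the test loop looks something like this (the echo server address is a placeholder and error handling is trimmed):

```c
/* Sketch of the echo test described above: write n bytes, then count
 * how many read() calls it takes to get them all back. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    char buf[300];
    int n;

    memset(buf, 'A', sizeof(buf));

    for (n = 1; n <= 300; n++) {
        struct sockaddr_in addr;
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int got = 0, reads = 0;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(7);                         /* echo service   */
        addr.sin_addr.s_addr = inet_addr("192.168.1.10"); /* placeholder IP */

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
            perror("connect");
            return 1;
        }

        write(fd, buf, n);
        while (got < n) {
            int r = read(fd, buf, sizeof(buf));
            if (r <= 0)
                break;
            got += r;
            reads++;
        }
        printf("%3d bytes written -> %d read(s)\n", n, reads);
        close(fd);
    }
    return 0;
}
```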

With writes via Socket/Socklet, we always get one read for every write.

With writes via Tcpip 5.00A, the same is true except for the range of 101 through 208 bytes, where we get two reads for every write, with the first read being 100 bytes and the second read being the remainder.

FYI, the negotiated MSS was 1460 with my connections to the Tandem.

As others mentioned, the TCP/IP stack is free to split or join any write to a stream socket.

Your observation is also correct. The reason a small write like 200 bytes would break into two is that the stack decided it could use two small structures (mbufs) to store the data instead of one big piece of memory (an mcluster). I think it is actually explained in Stevens’ book somewhere.

I don’t think there is any way to change this behavior from the outside.

I don’t think you understood me fully, so I’ll paraphrase:

For writes under 101 bytes, one TCP data packet is sent.
For writes from 101 bytes through 208 bytes, two TCP data packets are sent.
For writes from 209 bytes through at least 300 bytes, one TCP data packet is sent.

Sure, the stack is free to split/join wherever it likes, but I can’t think of a good reason why it would do it like this. There must be some problem with the algorithm used to split/join.

There is no way for me to change this from the outside. You are correct. The change, if any, must be made to the stack.

I think I’ve given clear evidence that there’s cause for someone at QNX to examine what the stack is doing here to see whether it’s a symptom of a more serious problem.

The reason it breaks is that 208 bytes of user data can actually be put into two mbufs instead of using one big mcluster (internally). The stack did this in the hope of keeping a small footprint.

It is designed that way because using 2048 bytes of memory to store 209 bytes was a BIG WASTE in those days.

The two mbufs leading to two packets is a performance loss, but it’s a memory win. At the time the stack was designed, I guess “memory” weighed more heavily than “performance”. :-)

But one is okay for 100 bytes? Or 209 bytes?

I’m sorry, I’m not seeing the logic here. It seems like an arbitrary range.

Those days??? Isn’t the 5.0 stack based on a more recent version of somebody else’s stack? (I don’t remember the name.)

That being said, I’m still a big believer in not relying on any particular behavior of a TCP/IP stack. If you need things to be deterministic, time- or packet-wise, look at UDP or a totally different protocol.
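To illustrate the difference, with a datagram socket each send is delivered as exactly one message, so the boundary problem goes away entirely. A minimal sketch (address and port are placeholders):

```c
/* Sketch: with SOCK_DGRAM, one sendto() is one datagram on the wire,
 * so the receiver gets the whole message in a single recvfrom().
 * Address and port are placeholders. */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static int send_one_message(const char *msg, size_t len)
{
    struct sockaddr_in peer;
    int fd, rc;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(7000);                   /* placeholder port */
    peer.sin_addr.s_addr = inet_addr("10.0.0.1");  /* placeholder host */

    /* The stack will not split this into two the way it splits the
     * 141-byte stream write; of course, UDP gives up reliability. */
    rc = sendto(fd, msg, len, 0, (struct sockaddr *)&peer, sizeof(peer));
    close(fd);
    return rc;
}
```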

sptanley, the fix MUST NOT be made to the stack. It’s not broken. It’s not working the way you want it to, but it ain’t broken, so it doesn’t need to be fixed. ;-))) Somebody has written code that does not follow the rules, and it’s Tandem. If the stack decided to send data one byte per packet it would still be legal, although very inefficient. ;-)

Well, I’ve already stated that I’m connecting to Tandems that won’t be changing their errant behavior. Really, you’re preaching to the choir here. But I had a conversation with one of their techies, who stated that:

  1. It was our behavior that changed, not theirs;
  2. Nobody else who connects to them has any problems.

These are both true statements, but they obviously sidestep the issue, which is getting them to make a change. Sadly, we need to communicate with them, but we can’t be a squeaky enough wheel to get them to grease us.

All I’m asking is that someone at QNX take a peek at the code to see whether this behavior is a symptom of a larger problem that needs fixing after all. Would someone take a look, if only to explain why it does what it does?

So far, our only alternative is to communicate through an intermediate BSD box, which exhibits no such odd behavior.

As Xiodan (xtang) already explained, a choice was made by the implementor to trade performance for a smaller memory footprint. Since he works for QNX and is very familiar with the networking side of it (he is the QNET implementor in QNX6), I trust that his explanation is accurate.

As was also pointed out, although unusual, this behavior is completely compatible with the standard. From QNX’s point of view there is no problem and therefore nothing to fix.
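For what it’s worth, the boundaries sptanley measured aren’t arbitrary either; they fall straight out of the standard BSD mbuf sizes. Assuming the QNX stack kept the stock constants (which I haven’t verified), the arithmetic looks like this:

```c
/* Stock BSD mbuf arithmetic on a 32-bit build.  Whether Tcpip 5.00A
 * uses exactly these values is an assumption, but they line up with
 * the observed 100/208 boundaries. */
#define MSIZE  128            /* size of one mbuf                         */
#define MLEN   (MSIZE - 20)   /* 108: data bytes in an ordinary mbuf      */
                              /*      (20 ~ sizeof(struct m_hdr))         */
#define MHLEN  (MLEN - 8)     /* 100: data bytes in the first mbuf, which */
                              /*      also carries the packet header      */
                              /*      (8 ~ sizeof(struct pkthdr))         */
/*
 * Roughly what the send path does with a single write():
 *   write <= 100 bytes      -> one packet-header mbuf       -> 1 segment
 *   write of 101-208 bytes  -> header mbuf + one more mbuf  -> 2 segments
 *   write >  208 bytes      -> one 2048-byte mcluster       -> 1 segment
 */
```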

An alternative for you would be to contact QSS Custom Engineering and tell them you want a modified version of their TCP stack. They will determine how much effort this will take and give you a quote. If you are a large enough customer, you may try to convince your sales guy that they should throw the customization in for free, but I suspect the only way you will get a change is to pay them for it.

If there is no money available, then you might as well start building the BSD box now. :-(

Well said, Rick. Funnily enough, about a year ago I was working on a problem where a customer had upgraded to a more recent stack (not 5.0) and it was consuming more memory, creating a problem because their embedded system was running out of RAM. I believe their solution was to pay QSS to implement some options limiting how much RAM the stack was using (at the risk of losing packets).