TCP Timing problem

I have a question about TCP timing.

I have an application using TCP in QNX4.25, shuttling lots of streaming
data from local hardware to the network. Normally the TCP driver takes
my data, which consists of lots of relatively small messages,
concatenates them and transmits them quite quickly, not usually waiting
for an ACK from the receiver.

The receiver’s window size is about 60K, and we are sending typically
30-120 KB/sec. The MSS is 1460.

When the data stops for one second or more and then restarts, here is what
happens:

  • Only the first message (a small message, about 35 bytes) is sent on
    the network.
  • The QNX machine waits for an ACK before sending any more data, which
    usually takes about 100 ms.
  • In the meantime the application continues to send messages, but they
    often fill the buffer up before the ACK is received.
  • When the ACK is finally received, the buffered messages get sent, and
    the system works fine from that point on.

(Note: this does NOT happen when the silence period is less than one
second.)

I believe what is happening is that the driver is timing out during the
period of inactivity and “forgetting” the window advertised in the last ACK
from the receiver. Therefore the driver only attempts to send a small
message, fully anticipating an ACK sometime later to inform it of the
actual window size. Unfortunately for our application we often don’t
have that 100 ms to spare.

We suspect that the Nagle algorithm may be at work here, but it does
not fit the Nagle algorithm description completely. Although the Nagle
algorithm does indicate that subsequent data is not sent until an ACK
arrives, my interpretation is that once enough data is received by the
TCP driver to fill a full TCP packet, that packet is sent immediately
without waiting for an ACK.
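
For reference, the textbook Nagle rule (RFC 896) can be sketched roughly as
follows. This is a simplification for discussion only, not the QNX driver’s
actual logic, and the helper name is purely illustrative:

#include <stddef.h>

/*
 * Rough sketch of the classic Nagle decision (RFC 896), simplified.
 * Returns nonzero if a send would be allowed right away.
 */
int nagle_ok_to_send(size_t queued_bytes, size_t mss,
                     size_t unacked_bytes_in_flight, int nodelay_set)
{
    if (nodelay_set)                    /* TCP_NODELAY disables the rule    */
        return 1;
    if (queued_bytes >= mss)            /* a full segment goes out at once  */
        return 1;
    if (unacked_bytes_in_flight == 0)   /* nothing outstanding: send now    */
        return 1;
    return 0;                           /* small data + unacked data: wait  */
}

Under that rule a full-sized segment should go out without waiting for an
ACK, which is why the behaviour we see after an idle period doesn’t look
like plain Nagle to me.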

Here is what I have tried (a minimal sketch of the first two socket options
appears after this list):

  • Set the TCP_NODELAY option. No effect.
  • Increase the send buffer size to the maximum. This works for most
    situations, but in some cases the buffer still fills up.
  • Buffer individual messages in the application for a large atomic send
    to the network, thinking that this might defeat the Nagle algorithm by
    presenting a message large enough for a full TCP packet. This did not
    work.
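
For completeness, this is roughly how the first two options were applied.
It is a minimal sketch rather than our production code; the socket is
assumed to be already connected, and the helper name and the 60 KB buffer
size are just for illustration:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Disable the Nagle algorithm and enlarge the send buffer on an
 * already-connected TCP socket. Returns 0 on success, -1 on failure.
 */
int tune_socket(int s)
{
    int on = 1;
    int sndbuf = 60 * 1024;   /* example size; adjust to your stack's limit */

    if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
                   (char *)&on, sizeof on) == -1) {
        perror("setsockopt TCP_NODELAY");
        return -1;
    }
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                   (char *)&sndbuf, sizeof sndbuf) == -1) {
        perror("setsockopt SO_SNDBUF");
        return -1;
    }
    return 0;
}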

Does anyone have:

  • A better explanation of what is going on here?
  • A better suggestion for overcoming this problem?

Thanks
Matt Schrier

I’ve been able to reproduce this here using the following code.
It appears to be fixed in the tcprt5.0 suite which is close to
being released (this is why I couldn’t reproduce it under RTP
(same source base)).

-seanb


#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int
main(void)
{
    int s, i, j;
    struct sockaddr_in sad;
    unsigned char buff[35];

    if((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
    {
        perror("socket");
        return 1;
    }

    memset(&sad, 0x00, sizeof sad);

    sad.sin_family = AF_INET;
    sad.sin_len = sizeof sad;
    sad.sin_port = htons(200);
    inet_aton("10.163", &sad.sin_addr);

    if(connect(s, (struct sockaddr *)&sad, sizeof sad) == -1)
    {
        perror("connect");
        return 1;
    }

    /* switch to nonblocking writes */
    if(fcntl(s, F_SETFL, fcntl(s, F_GETFL) | O_NONBLOCK) == -1)
    {
        perror("fcntl");
        return 1;
    }

    for(i = 0;; i++)
    {
        for(j = 0; j < 200; j++)
        {
            /*
             * 35 * 200 = 7000 (don't fill up the send
             * buffer if starting from empty)
             */
            if(write(s, buff, sizeof buff) <= 0)
            {
                perror("write");
                return 1;
            }
        }
        if(!(i % 200))
        {
            delay(3000);
        }
        else
        {
            /*
             * Neglecting the time for the 200 writes (we're nonblocking),
             * most times this works out to 200*35/.06 = 117 KB/sec.
             */
            delay(60);
        }
    }

    close(s);
    return 0;
}



#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/uio.h>
#include <netinet/in.h>

int
main(void)
{
    int s, f, n;
    struct sockaddr_in sad;
    unsigned char buff[1024];

    if((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
    {
        perror("socket");
        return 1;
    }

    memset(&sad, 0x00, sizeof sad);

    sad.sin_family = AF_INET;
    sad.sin_len = sizeof sad;
    sad.sin_port = htons(200);

    if(bind(s, (struct sockaddr *)&sad, sizeof sad) == -1)
    {
        perror("bind");
        return 1;
    }

    listen(s, 5);
    if((f = accept(s, NULL, 0)) == -1)
    {
        perror("accept");
        return 1;
    }

    /* drain everything the sender writes */
    while((n = read(f, buff, sizeof buff)) > 0)
        ;

    close(f);
    close(s);
    return 0;
}
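
(To reproduce with this pair: start the receiver first, since it binds port
200 and simply drains incoming data, then run the sender against the
receiver’s address. In the sender, the delay(3000) branch models the silence
period of more than one second, while the delay(60) branch approximates the
normal streaming rate of roughly 200*35/0.06 ≈ 117 KB/sec.)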




Matthew Schrier <mschrier@wgate.com> wrote:
: I have a question about TCP timing.

: I have an application using TCP in QNX4.25, shuttling lots of streaming
: data from local hardware to the network. Normally the TCP driver takes
: my data, which consists of lots of relatively small messages,
: concatenates them and transmits them quite quickly, not usually waiting
: for an ACK from the receiver.

: The receiver’s window size is about 60K, and we are sending typically
: 30-120 KB/sec. The MSS is 1460.

: When the data stops for one second or more then restarts here is what
: happens:
: - The first message only (e.g. small message, about 35 bytes) is sent on
: the network
: - The QNX machine waits for an ACK before sending any more data, which
: usually takes about 100 ms.
: - In the meantime the application continues to send messages, but they
: often fill the buffer up before the ACK is received.
: - When the ACK is finally received the buffered messages get sent, and
: the system works fine from that point on.

: (Note: This does NOT happen when the silence period is less than one
: second)

: I believe what is happening is that the driver is timing out during the
: period of inactivity and “forgetting” the window sent in the last ACK
: from the receiver. Therefore the driver only attempts to send a small
: message, fully anticipating an ACK sometime later to inform it of the
: actual window size. Unfortunately for our application we often don’t
: have that 100 ms. to spare.

: We suspect that maybe the Nagle algorithm is working here, but it does
: not fit the Nagle algorithm description completely. Although the Nagle
: algorithm does indicate that subsequent data is not sent until an ACK
: arrives, my interpretation is that once enough data is received by the
: TCP driver to fill a full TCP packet that the packet is sent immediately
: and an ACK is not waited for.

: Here is what I have tried:
: - Set the TCP_NODELAY option. No effect.
: - Increase the send buffer size to the max. This works for most
: situations, but in some cases the buffer still fills up.
: - Buffer individual messages in the application for a large atomic send
: to the network, thinking that this might defeat the Nagle algorithm by
: presenting a message large enough for a full TCP packet. This did not
: work.

: Does anyone have:
: - A better explanation of what is going on here?
: - A better suggestion for overcoming this problem?

: Thanks
: Matt Schrier

Sean,

Thanks for checking that out.


Sean Boudreau wrote:

I’ve been able to reproduce this here using the following code.
It appears to be fixed in the tcprt5.0 suite which is close to
being released (this is why I couldn’t reproduce it under RTP
(same source base)).

-seanb

[snipped: Sean’s reproduction code and the original message, quoted in full
above]

We have, I believe, run into the same problem:

We send a large buffer (>8000 bytes) with a single send request at the ‘C’
socket level…
Almost immediately, the TCP stack sends out a minimum of a single frame,
sometimes several frames.
However, the window size is not exhausted, nor is all of the queued data
transmitted.
Before any more frames are sent, the QNX TCP stack seems to go to sleep.
An ACK from the peer system will “wake up” the stack, and some additional
message(s) will then be transmitted.
This goes on, taking much longer than necessary to transfer data to a peer.
But if (in bad network conditions) the ACK response is lost, the delay
becomes very noticeable, with timeouts and retransmissions sometimes taking
more than 10 seconds for a single message.

This seems to be a problem in 4.25. It did not seem to be a problem (or at
least not as pronounced) in the QNX 4.24 version that we replaced.

What is the availability of a stack (Beta or otherwise) which will fix this
problem?

Thanks,

Jim Johnson

PS: We also see much (3x to 10x) slower response to Phindows 1.20 with this
4.25 stack. I am not sure whether this is related or whether we need to
look for another cause.



“Matthew Schrier” <mschrier@wgate.com> wrote in message
news:3A71E1DC.558DA7BA@wgate.com

[snipped: Matt’s reply, Sean’s reproduction code, and the original message,
all quoted in full above]