Thread usage

When an io-net self function is called, when does the function return? For
example, when the down-producer calls tx_down(), will the convert, filter,
and Ethernet driver all act (rx_down(), tx_down()) before the
down-producer’s tx_down() returns? I expect this is the way it works, but
I’m doing some timing analysis of my code and want to make sure I understand
the model correctly.

Also, when the last layer (a filter in my case) calls tx_down(), the next
step will be the Ethernet driver. How much work is done before tx_down()
returns? Is the packet simply queued in software at this point, or has it
been written to the NIC already?

I’ll give you some background on the reasons for my questions. I developed
my io-net module on a PIII-750 and it worked fine. Now I’m trying to move it
to an embedded P-266 and am running into performance bottlenecks. I’m not
sure if it’s related to CPU, bus, or NIC, but it’s hurting me. I used
ClockCycles() to time the execution time of tx_down() in the down-producer.
On the PIII-750 it takes 3us on average. On the P-266 it takes 40us on
average. This difference seems far greater than I expected.

Thanks,
Shaun

Shaun Jackman <sjackman@nospam.vortek.com> wrote:

When an io-net self function is called, when does the function return? For
example, when the down-producer calls tx_down(), will the convert, filter,
and Ethernet driver all act (rx_down(), tx_down()) before the
down-producer’s tx_down() returns? I expect this is the way it works, but
I’m doing some timing analysis of my code and want to make sure I understand
the model correctly.

Yes or no. It depends entirely on what each module that gets the packet decides to do.

tx_down() -> io-net (finds the module below) -> below module's rx_down()

The module below (suppose it is a converter) could choose to:

  1. simply use the same thread to process the packet and call
     tx_down() again;
  2. queue the packet and return, and schedule another thread to
     dequeue and process it;
  3. call tx_done() -> io-net (finds the modules that registered via
     reg_tx_done()) -> module->tx_done().

So one module has no control over the "tx_down" thread. This
is usually fine, because a module usually doesn't have its own
thread anyway.

If you don't want your thread to be used by the lower layers,
you have to queue the packet and use another thread to call tx_down().

Also, when the last layer (a filter in my case) calls tx_down(), the next
step will be the Ethernet driver. How much work is done before tx_down()
returns? Is the packet simply queued in software at this point, or has it
been written to the NIC already?

It depends on the driver. Usually the driver puts the packet into the
hardware if it can; otherwise it queues the packet and returns.

I’ll give you some background on the reasons for my questions. I developed
my io-net module on a PIII-750 and it worked fine. Now I’m trying to move it
to an embedded P-266 and am running into performance bottlenecks. I’m not
sure if it’s related to CPU, bus, or NIC, but it’s hurting me. I used
ClockCycles() to time the execution time of tx_down() in the down-producer.
On the PIII-750 it takes 3us on average. On the P-266 it takes 40us on
average. This difference seems far greater than I expected.

I don’t have a clean explanation for this, but in the past io-net
was known to be “CPU hungry.” Our driver people did a lot of
performance measurement, and to my understanding the drivers in 6.2
are much better.

Also, sometimes changing the NIC helps (some chips are buggy
and slow, some are not).

-xtang

Thank you for your answers. They helped solidify my understanding of io-net.

I’ll give you some background on the reasons for my questions. I developed
my io-net module on a PIII-750 and it worked fine. Now I’m trying to move it
to an embedded P-266 and am running into performance bottlenecks. I’m not
sure if it’s related to CPU, bus, or NIC, but it’s hurting me. I used
ClockCycles() to time the execution time of tx_down() in the down-producer.
On the PIII-750 it takes 3us on average. On the P-266 it takes 40us on
average. This difference seems far greater than I expected.

I don’t have a clean explanation for this, but in the past io-net
was known to be “CPU hungry.” Our driver people did a lot of
performance measurement, and to my understanding the drivers in 6.2
are much better.

I measured the time between the tx_down() of the down-producer and the
rx_down() of the converter. I found that on the PIII-750 the mean time is
1.69us. On the P-266, though, this time jumps fourteen-fold to 22.80us. Packets
leaving the down-producer of type “vn” can only go one place: the “vn_en”
converter. What could io-net be doing in this time?

Also, sometimes changing the NIC helps (some chips are buggy
and slow, some are not).

The time to tx_down() a packet from the filter to the ethernet driver is 5us
on the PIII-750 and 28us on the P-266. This 6x jump is closer to what I was
expecting as a performance difference. So I don’t think the NIC (Intel 82557)
is the bottleneck in my case.

Thanks,
Shaun

Shaun Jackman <sjackman@nospam.vortek.com> wrote:

Thank you for your answers. They helped solidify my understanding of io-net.

I’ll give you some background on the reasons for my questions. I developed
my io-net module on a PIII-750 and it worked fine. Now I’m trying to move it
to an embedded P-266 and am running into performance bottlenecks. I’m not
sure if it’s related to CPU, bus, or NIC, but it’s hurting me. I used
ClockCycles() to time the execution time of tx_down() in the down-producer.
On the PIII-750 it takes 3us on average. On the P-266 it takes 40us on
average. This difference seems far greater than I expected.

I don’t have a clean explanation for this, but in the past io-net
was known to be “CPU hungry.” Our driver people did a lot of
performance measurement, and to my understanding the drivers in 6.2
are much better.

I measured the time between the tx_down() of the down-producer and the
rx_down() of the converter. I found that on the PIII-750 the mean time is
1.69us. On the P-266, though, this time jumps fourteen-fold to 22.80us. Packets
leaving the down-producer of type “vn” can only go one place: the “vn_en”
converter. What could io-net be doing in this time?

Are you receiving packets at this time? This generally happens at
priority 21. Try increasing your priority.

-seanb

Are you receiving packets at this time? This generally happens at
priority 21. Try increasing your priority.

I’m not receiving at this time. I’m running at priority 29. I don’t think
the increased time is due to being pre-empted. The measured time is pretty
constant across all the samples.

As a comparison, upgoing packets on the same box take 7.78 us from tx_up()
of the filter to rx_up() of the converter, and 2.03 us from the tx_up() of
the converter to rx_up() of the down-producer.

Cheers,
Shaun


I measured the time between the tx_down() of the down-producer and the
rx_down() of the converter. I found that on the PIII-750 the mean time is
1.69us. On the P-266, though, this time jumps fourteen-fold to 22.80us.
Packets leaving the down-producer of type “vn” can only go one place: the
“vn_en” converter. What could io-net be doing in this time?

I made a mistake in my measurements and had lumped together the time of the
reg_tx_done() along with the tx_down(). So, here’s the breakdown:
alloc_down_pkt: 24.00 us
reg_tx_done: 10.01 us
tx_down to rx_down (down-producer to converter): 5.05 us
tx_down to rx_down (converter to filter): 5.09 us
tx_down (filter to ethernet): 31.55 us
I’m posting these times mostly for reference. Hopefully if somebody else
ever finds the need to start counting cycles in io-net they can have an idea
of the relative cost of each call.

The cost of the tx_down (filter to ethernet) still poses a problem for me
though. The other function calls I can put in places that aren’t super time
critical, but this last tx_down() is right in the middle of my critical
path. I’d like to be able to send a number of packets in quick succession.
At 100 Mb/s the minimum size packet takes only 6.7us to travel on the wire,
which means if I want to be able to send packets back-to-back tx_down() must
return to me in under this time. My preferred solution would be to send a
linked list of packets (linked by the next field of npkt_t) to tx_down() so
that the driver can assure me of back-to-back performance. Can I submit this
as a feature request? In the mean time, do you see any solution to my
problem?

Thanks,
Shaun

