SMC EtherPower II 9432TX Problems

Our typical system consists of 13 nodes (Compaq ProLiant 1600R). Each node
has 2 SMC9432TX (chipset 171) NICs installed in slots 1 (LAN1) and 2
(LAN2). Each NIC is connected to a Cisco Catalyst 2900XL series switch:
LAN1 connects to one switch, LAN2 to another. With Net.epic we use the
options “-IX -lY -t2 -n2” so that the driver negotiates speed and duplex
mode with the switch. The switch is set for auto/auto on every port. The
cables have been tested OK with a cable analyzer.

Most of the cards work fine under light loads, but when I perform a remote
dcheck I see quite a few cards reporting receive errors. Using the remote
dcheck as my test, 25% of the cards work great: they negotiate 100/Full and
report 0 errors (netinfo -l). The other 75% have show-stopper problems: they
negotiate 100/Full, operate OK for a while, then report some form of receive
error (CRC or alignment) and renegotiate the link. If I am telnet’d in, I
lose my connection on one of these errors and am forced to telnet in again.
I tried forcing both the card and the switch port to 100/Full, and while the
receive errors still occurred, the connection was definitely more resilient.
In this setting the NIC reported roughly 75 CRC errors while the switch
reported 5, and I did not lose my telnet connection once.
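For reference, a CRC error means the receiving side recomputed the frame check sequence over the frame it actually got and found it did not match the value the sender appended. Each end counts errors only on frames it receives, so the NIC's count (about 75) presumably reflects the switch-to-NIC direction and the switch's count (5) the NIC-to-switch direction. A minimal Python sketch of that check, using zlib.crc32 as a stand-in for the Ethernet FCS (both use the IEEE 802.3 CRC-32 polynomial) and a made-up 64-byte frame body; fcs and receiver_accepts are hypothetical helper names:

```python
import zlib


def fcs(frame: bytes) -> int:
    # CRC-32 over the frame bytes. zlib.crc32 uses the same IEEE 802.3
    # polynomial as the Ethernet frame check sequence (used here as a stand-in).
    return zlib.crc32(frame) & 0xFFFFFFFF


def receiver_accepts(frame: bytes, sent_fcs: int) -> bool:
    # The receiver recomputes the CRC over what actually arrived; a mismatch
    # with the FCS appended by the sender is what gets counted as a CRC error.
    return fcs(frame) == sent_fcs


payload = bytes(range(64))      # hypothetical 64-byte frame body
sent = fcs(payload)             # FCS computed by the transmitting side

corrupted = bytearray(payload)
corrupted[10] ^= 0x01           # a single bit flipped in transit

print(receiver_accepts(payload, sent))            # True: frame accepted
print(receiver_accepts(bytes(corrupted), sent))   # False: counted as a CRC error
```

A CRC-32 catches every single-bit error, so the corrupted copy is always rejected; which end logs the error depends only on which direction the damaged frame was travelling.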

I think this sounds like a hardware problem on the NIC side, but given the
number of NICs that behave this way (roughly 75%), I question that
assumption. What do I need to do to convince myself and the manufacturer
that these NICs are faulty? Are there any performance test tools available
that can show that these cards fail at X load?
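One way to put a number on “fail @ X load” is to step the offered load up, note how many frames were received and how many receive errors (CRC plus alignment, as shown by netinfo -l) accumulated at each step, and report the lowest load where the error ratio crosses a chosen threshold. The Python sketch below does only that bookkeeping; the Step record, the failure_load helper, the 1e-6 threshold and the sample numbers are all hypothetical, and generating the load and reading the counters are left to whatever tools are available.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Step:
    """One load step (hypothetical record): offered load plus counter deltas."""
    load_mbps: float
    rx_frames: int   # frames received during this step
    rx_errors: int   # e.g. CRC + alignment errors accumulated during this step


def error_ratio(step: Step) -> float:
    # Errors per received frame; 0.0 if nothing was received.
    return step.rx_errors / step.rx_frames if step.rx_frames else 0.0


def failure_load(steps: Sequence[Step], threshold: float = 1e-6) -> Optional[float]:
    """Return the lowest offered load whose error ratio exceeds threshold, or None."""
    for step in sorted(steps, key=lambda s: s.load_mbps):
        if error_ratio(step) > threshold:
            return step.load_mbps
    return None


if __name__ == "__main__":
    # Invented numbers for illustration only, not measurements from this thread.
    samples = [
        Step(10, 1_000_000, 0),
        Step(40, 4_000_000, 0),
        Step(70, 7_000_000, 12),
        Step(95, 9_500_000, 75),
    ]
    print(failure_load(samples))
```

With these invented samples it prints 70, the first step whose error ratio (12 in 7,000,000, about 1.7e-6) exceeds the threshold. Running the same stepped test on a “good” card and a “bad” card gives the manufacturer a concrete comparison rather than a pass/fail impression.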

Whoa - task switch, another thought just entered my micro-brain:
Could this also be caused by insufficient power? I will investigate this
avenue.

  • Richard

Previously, Brown, Richard wrote in qdn.public.qnx4:

Whoa - task switch, another thought just entered my micro-brain:
Could this also be caused by insufficient power? I will investigate this
avenue.

I doubt whether you could have power supply problems with just 2 NICs
installed. Is it possible to eliminate some of the switches and try hubs
in their place?

Also, why are you reducing the transmit retry ticks to 2 (-t2)? This will
also cause you problems. What version of the epic driver are you using?


Hugh Brown <hsbrown@qnx.com> wrote in article
<Voyager.000821081700.28984A@qnx.com>…

I doubt whether you could have power supply problems with just 2 NICs
installed.

I also doubt it, since the Compaqs have 325 W supplies. However, our typical
configuration includes 2 Diamond Stealth II S220 video cards as well as the
2 NICs. I say typical because we have two additional configurations: one
that uses 3 of these video cards and one that uses 1.

Is it possible to eliminate some of the switches and try hubs
in their place?

The project that I am working on does not have any 100 Mbps hubs, but I could
try to find one or two within the company if you think the switches are the
problem.

Also why are you reducing the transmit retry ticks to 2 (-t2)? This will
also cause you problems.

I saw another posting a while back that recommended the tight settings -t2 -n2
for use on a dual redundant LAN. The reasoning behind these settings is the
desire for a fairly short timeout, in order to retry sooner on the
redundant NIC.

What version of the epic driver are you using?

We are using Net.epic 4.25F from the QNX 4.25C archive.

  • Richard

We had the same problem. I purchased 22 SMC9432TX boards and about 15 didn’t work reliably. This is a known problem with the older boards. SMC now ships SMC9432MP boards when you order SMC9432TX; the SMC9432MP boards are drop-in replacements. Contact SMC customer service. The SMC9432MP boards operate 30% faster than the “good” SMC9432TX boards.


Jack Rosenbloom
Webcraft Mail Systems
Manager, Process Control Systems Engineering
4371 County Line Road
Chalfont PA. 18914
Phone: (215) 997-5269
FAX: (215) 997-5455
E-Mail jrosenbloom@webcraft.com

Following up with SMC, we have learned that the 171 chipset becomes unstable
after a period of time. We are returning our cards for 172 chipset models.

This stinks, because when we originally purchased the cards we got 170
chipsets. At the time we didn’t have a 100 Mbps connection, so they looked
fine. Then we went to 100 Mbps, found these cards were unstable, and
contacted SMC, who agreed and said they had not had any problems reported
with the 171 chipset. All our cards were exchanged for 171 chipset models,
and now we have to do it all over again.

I hope this new set works (fingers crossed).

  • Richard

Previously, Brown, Richard wrote in qdn.public.qnx4:

Following up with SMC we have learned that the 171 chipset becomes unstable
after a period of time. We are returning our cards for 172 chipset models.


Please let us know if the new chipset solves your problem, as this is the
sort of information we can pass on to other customers. I have also
recently fixed a bug in the epic driver that has to do with oversize packets.
One of our customers is busy testing this new driver and as soon as we are
happy that it is stable, we will be releasing it.

Hugh.

Sure, I will be glad to post our findings. This RMA process may take a while.
Hopefully I can share some positive news within a couple of weeks.

In the meantime, could I please get a copy of this beta driver?

  • Richard

After discussions with other SMC reps, they now say that the 172 chipset is
not that different from the 171. They now say the 170 was unstable and the
171/172 are stable. We tried their DOS-based test software (EZ-Start) and
it reported failures. They sent us a newer version, and now their test
software reports that the cards pass. I wonder what this new version of the
test software does differently from the previous version.

Now I’m not sure where this is going. I don’t know if these cards are being
returned or not.

Jack, if you’re still following this thread, can you tell me what chipset your
cards have and whether they are working reliably @ 100/Full? I believe there
should be a large chip on the card stamped with a number; three digits in
the middle should read 170, 171 or 172. If any other listeners out there
are using these cards, I would appreciate your feedback about chipset and
reliability @ 100/Full.

Hugh, I checked with our IT department and they have no 100 Mbps hubs, so I
will test a few cards with a crossover cable. In the meantime we are looking
at getting cards from other vendors for comparison.


Previously, Brown, Richard wrote in qdn.public.qnx4:

After discussions with other SMC reps they now say that the 172 chipset is
not that different than the 171. They now say the 170 was unstable and the
171/172 are stable.

I have been running tests here for another customer with dual epic cards
in each machine. I ran our network tests for over a week at 100Mbps and
had no failures with over 4,000,000,000 messages per network! This was
done with the 83C171A2QF network chip.
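As a rough sense of scale for that result, assuming exactly one week and exactly 4,000,000,000 messages (both figures are stated as lower bounds), the Python sketch below computes the average message rate and, treating each message as an independent trial, the 95% upper bound on the per-message failure probability implied by zero observed failures. The helper names are made up for illustration; the bound is the standard zero-failure binomial bound, which the “rule of three” (3/n) approximates.

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 s


def average_rate(messages: float, seconds: float) -> float:
    """Mean messages per second over the test window."""
    return messages / seconds


def zero_failure_upper_bound(trials: float, confidence: float = 0.95) -> float:
    """One-sided upper bound on the per-message failure probability p when
    0 failures are seen in `trials` independent messages: solve
    (1 - p) ** trials = 1 - confidence for p. expm1/log keep the result
    accurate when p is tiny."""
    return -math.expm1(math.log(1.0 - confidence) / trials)


messages = 4_000_000_000  # "over 4,000,000,000 messages per network"
print(f"{average_rate(messages, SECONDS_PER_WEEK):.0f} messages/s")
print(f"p_fail < {zero_failure_upper_bound(messages):.2e} (95% confidence)")
print(f"rule of three: 3/n = {3 / messages:.2e}")
```

The figures come out at roughly 6,600 messages per second per network and a failure probability below about 7.5e-10 per message; since the real counts were higher, this is a conservative reading of the test.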

Brown, Richard <brownr@aecl.ca> wrote in article
<01c00dfc$f10721b0$b5a7e184@spw1296>…

Sure, I will be glad to post our findings. This RMA process may take a
while. Hopefully I can share some positive news within a couple of weeks.

The big chip in the middle is an 83C171A2QF. The board part # is SMC9432TX/MP. Most of my systems have 2 of these boards. I run an FTP client through one board, averaging 1.5 Mbytes/sec for hours on end with no issues; the speed is limited by my mainframe server. I have achieved over 3 Mbytes/sec using a QNX server (also with the same SMC board) on a private network. The second board, also at 100 Mbits/sec, isn’t pushed very hard. I’m very happy with the current situation, and SMC was very helpful in resolving the problem.

Jack
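To put Jack’s throughput figures in context, here is a quick conversion to utilization of the 100 Mbit/s link, assuming 1 Mbyte = 10^6 bytes and ignoring Ethernet/IP/TCP header overhead. This is a Python sketch; the helper names are hypothetical.

```python
LINE_RATE_MBIT_S = 100.0  # nominal Fast Ethernet line rate


def mbytes_to_mbits(mbytes_per_s: float) -> float:
    # Assumes "Mbytes" means 10**6 bytes, so 1 Mbyte/s = 8 Mbit/s.
    return mbytes_per_s * 8.0


def link_utilization(mbytes_per_s: float) -> float:
    # Fraction of the nominal line rate used by application data
    # (protocol header overhead is ignored).
    return mbytes_to_mbits(mbytes_per_s) / LINE_RATE_MBIT_S


for label, rate in [("FTP to mainframe", 1.5), ("QNX server, private net", 3.0)]:
    print(f"{label}: {mbytes_to_mbits(rate):.0f} Mbit/s, "
          f"{link_utilization(rate):.0%} of line rate")
```

That works out to about 12% and 24% of the nominal line rate, so these transfers run the boards well below full 100 Mbit/s load.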