advice please!

Rennie Allen wrote:

Rennie Allen wrote:

In a single master configuration doesn’t the master share the token
… that means there is no reconfiguration possible and no
suffering will happen > :slight_smile:

Yes, but any serious real-world application will use multi-masters, so
this is only a theoretical advantage.

That is simply wrong. The typical PROFIBUS DP installation is a SINGLE
master installation … also the proposed configuration is a SINGLE
master one.

Then the typical Profibus installation is not a serious control
application.

Nonsense … 99% of the typical control systems aren’t fault
tolerant.

Almost every single system we ship has redundant masters.

… because it is required by your special control application.

Control systems that don’t have redundant masters are not fault
tolerant, and hence are not “typical” real-world control systems.

Might be correct for your “real-world”.

As far as the proposed configuration is concerned, it seems quite clear
to me that a multi-master (or peer-to-peer) system is being proposed.

A multi-master configuration offers fault tolerance only for the
case that a “master” PLC fails (hot swap …).

The RTUs and the bus aren’t fault tolerant.

Weak multi-master based fault tolerance only makes sense for the
PLC vendors :slight_smile:

Armin

What is the bigger picture actually? Are you building some kind of
neural network?

If the network is the bottleneck, then why not use fewer nodes, as Kris
Warkentin suggests. Or to be more extreme: how about one big powerful
node with lots of sensors. In other words, you bundle the processes that
would run on the separate nodes into one node and use internal IPC to
pass the data around (see the sketch below).
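To make the suggestion concrete, here is a rough sketch of what that internal IPC could look like under QNX 4 (written from memory of the Send/Receive/Reply kernel calls in <sys/kernel.h>, so check the prototypes against your docs; the 20-byte payload just mirrors the figure quoted later in the thread):

```c
#include <sys/kernel.h>   /* Send(), Receive(), Reply() - QNX 4 kernel calls */
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

#define PAYLOAD 20        /* roughly the per-transfer size mentioned below */

int main(void)
{
    char msg[PAYLOAD], reply[PAYLOAD];
    pid_t child, sender;

    if ((child = fork()) == 0) {
        /* "server" process: block until a message arrives, then reply */
        sender = Receive(0, msg, sizeof(msg));   /* 0 = receive from anyone */
        /* ... do the per-node data processing here ... */
        Reply(sender, msg, sizeof(msg));
        return 0;
    }

    /* "client" process: Send() blocks until the server has replied */
    memset(msg, 0x55, sizeof(msg));
    if (Send(child, msg, reply, sizeof(msg), sizeof(reply)) == -1)
        perror("Send");
    else
        printf("got a %d-byte reply\n", PAYLOAD);
    return 0;
}
```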

Maybe not elegant, but it might save you money on network hardware, and
you might meet your real time targets more easily. But again, this
depends on the bigger picture, here.

regards,
rick

ycao wrote:

Oh, I am really thankful for so many zealous advisers.
You bring me so many possibilities and impossibilities, although some of
them may be contradictory.

I’ll give a more detailed description of my system. But my English is so
shabby. > :slight_smile:

1. The 20 IPCs can be gathered in one room.

2. We have only 1.3 milliseconds to transfer data from one node to the
others, including the data processing on every node.

Q: I want to know the general kernel operation timings, for example:
sem_post, fork, spawn, etc. Benchmark results on a CPU faster than a P500
would be better. I think the data processing cannot be neglected, though
it is not so important.

3. The data transferred from one node to another is about 20 bytes.
Q: I want to know what size the header typically takes in one packet!

4. Within the control cycle the transfer from one node to another may
happen 20 times. Some can be parallelized. Some are not so demanding. And
some nodes are listening nodes, not needing a reply. Shall I use a
‘virtual proxy’? Does it save half of the time?

Q: We have a simple test: on a network connected with 100 Mbit 3c905
Ethernet cards, each node running QNX 4.24, transferring a message from one
node to the other takes about 250 us per transfer. You see, we have no
precise on-chip timer, and the QNX ticksize resolution may range from
0.5 ms to 50 ms, so we use a statistical method: we send the message 2000
times and compute an average time; maybe that is not the exact figure. But
message lengths ranging from 10 bytes to 1000 bytes don’t show obvious
differences. However, as <<An Architectural Overview of QNX>> reports in
Appendix B, that should not be the case.
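For what it’s worth, the averaging approach described above might look roughly like this (a minimal sketch; round_trip() is a hypothetical stand-in for whatever Send/Reply or socket exchange is actually being timed, and gettimeofday() stands in for any wall-clock source with microsecond reporting):

```c
#include <stdio.h>
#include <sys/time.h>

#define ITERATIONS 2000   /* same sample count as in the test above */

/* hypothetical placeholder: one message exchange with the remote node */
static void round_trip(void)
{
    /* replace with the real Send()/Reply() or socket round trip */
}

int main(void)
{
    struct timeval start, end;
    double elapsed_us;
    int i;

    gettimeofday(&start, NULL);
    for (i = 0; i < ITERATIONS; i++)
        round_trip();
    gettimeofday(&end, NULL);

    elapsed_us = (end.tv_sec - start.tv_sec) * 1e6
               + (end.tv_usec - start.tv_usec);

    /* the coarse tick granularity is spread over 2000 samples, so the */
    /* average is far finer than the 0.5-50 ms timer resolution        */
    printf("average per round trip: %.1f us\n", elapsed_us / ITERATIONS);
    return 0;
}
```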

<<An Architectural Overview of QNX>> Appendix B:
QNX 4.24 System Performance Numbers

Hardware Environment
Processor: Intel 133 MHz Pentium (Triton chipset)
RAM: 16 Megabytes
Disk Drive: 4 Gbyte Barracuda Wide SCSI
Disk Controller: Adaptec 2940 Wide SCSI (PCI-bus)
Network Cards: 10 Mbit ISA-bus NE2000 Ethernet; 100 Mbit PCI-bus Digital
21040 Ethernet

Network Throughput
10 Mbit Ethernet: 1.1 Mbytes/second
100 Mbit Ethernet: 7.5 Mbytes/second

Message-Passing Throughput
100-byte message: 1.0 Mbytes/second
1000-byte message: 6.0 Mbytes/second
4000-byte message: 8.5 Mbytes/second
Note: As the size of the message exceeds that of the processor’s cache,
throughput will drop off because of cache-miss overhead.

As it also says:
Since primitives copy data directly from process to process without
queuing, message delivery performance approaches the memory bandwidth of the
underlying hardware.

5. To avoid collisions we may use a switch.

Q: I want to know the delay time a switch may bring! (We want to use a
3c16456 switch.)

I would think that switches do not directly connect port to port, or
they would be pretty much a bunch of wires in a box. Maybe I’m wrong
here

That’s exactly what they are (on a microscopic scale of course), which
is why the chips that they use are called switch fabrics.

but I would expect that switches actually allow all ports to transmit
at the same time, by streaming all ports to internal buffers first.
Whether those transmissions will be passed to destinations immediately
or not would depend on what the destinations are.

This is basically correct, although the buffer need only be big enough
to handle the destination address before the route can be instantiated
in the switch fabric (in the cases where the destination port is not
already in use - i.e. no contention).

If 10 ports transmit to 10 other ports, yes. If 19
ports transmit to 1 then the switch would have to store (and possibly
bundle) incoming packets and then spill them out as the receiver is
ready to handle the next portion. You still have contention of course,
but in this case you have a bounded worst case latency because the
arbitration mechanism (queuing) is deterministic.

Is it deterministic ? How do you know ? Is it a priority ordered queue
? If it is FIFO how is this deterministic (since presumably the queue
can be of indeterminate size when any particular packet arrives) ?

Again, I think that you are probably (99.X%) right in saying that a
packet will get through vendor XYZ’s switch in 1.3 milliseconds
@100Mbit, however, this is a probabilistic not deterministic statement.
The probabilistic nature of this is evident when you consider what
happens if the media rate is dropped to 10Mbit; do you think that at
10Mbit the chances are higher, lower, or the same that a packet will get
through in 1.3 milliseconds (if the Ethernet were deterministic you
would be able to state unequivocally that 10Mbit is either fast enough
to support a 1.3 millisecond latency - based on the packet size - or
not).

Rennie

A multi-master configuration offers fault tolerance only for the
case that a “master” PLC fails (hot swap …).

True.

The RTUs and the bus aren’t fault tolerant.

Are you saying that Profibus I/O modules don’t support redundant points
? And that there are no Profibus modules that support redundant cabling
(clearly a Profibus master is capable of supporting redundant cabling,
and redundant points) ?

This makes the market for Profibus extremely limited.

Weak multi-master based fault tolerance only makes sense for the
PLC vendors > :slight_smile:

It is far weaker to have no master redundancy than to have only I/O and
cable redundancy, although I agree that for any critical system there
should be master redundancy, cable redundancy and I/O redundancy.

Research in safety critical systems is pretty conclusive that the most
likely points for failure are in the areas with the highest
concentration of software (i.e. the masters).

Rennie

“Rennie Allen” <RAllen@csical.com> wrote in message
news:D4907B331846D31198090050046F80C904B6A9@exchangecal.hq.csical.com

portion. You still have contention of course, but in this case you have
a bounded worst case latency because the arbitration mechanism (queuing)
is deterministic.

Is it deterministic ? How do you know ? Is it a priority ordered queue
? If it is FIFO how is this deterministic (since presumably the queue
can be of indeterminate size when any particular packet arrives) ?

I don’t know if Ethernet packets can carry priority. Even if they could, it
would have to be integrated with the OS to make any sense and I don’t believe
Ethernet works that way. Just don’t get started on priority inversion over
the network :wink:

I’d expect it to be FIFO with a limited max size. Since memory is limited and
packets can’t live forever anyway, you can’t have (and there is no point in
having) arbitrarily long queues. Thus, a worst case can be calculated and it
won’t be ‘infinity’.
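As a back-of-the-envelope illustration of that argument (my own numbers, not from any switch datasheet): once a maximum queue depth is known, the worst case is just that depth times the time to serialize one maximum-size frame. At 100 Mbit/s a full-size frame plus preamble and inter-frame gap (1538 byte times on the wire) takes about 123 us:

```c
#include <stdio.h>

int main(void)
{
    const double bits_per_sec = 100e6;           /* 100 Mbit Ethernet          */
    const double frame_bytes  = 1518 + 8 + 12;   /* max frame + preamble + IFG */
    const int    queue_depth  = 32;              /* hypothetical switch spec   */

    double frame_us = frame_bytes * 8.0 / bits_per_sec * 1e6;

    /* worst case: our frame arrives behind a full queue of max-size frames */
    printf("one max-size frame: %.1f us\n", frame_us);
    printf("worst-case wait   : %.1f us at queue depth %d\n",
           frame_us * queue_depth, queue_depth);
    return 0;
}
```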

Again, I think that you are probably (99.X%) right in saying that a
packet will get through vendor XYZ’s switch in 1.3 milliseconds
@100Mbit, however, this is a probabilistic not deterministic statement.
The probabilistic nature of this is evident when you consider what
happens if the media rate is dropped to 10Mbit; do you think that at
10Mbit the chances are higher, lower, or the same that a packet will get
through in 1.3 milliseconds (if the Ethernet were deterministic you
would be able to state unequivocally that 10Mbit is either fast enough
to support a 1.3 millisecond latency - based on the packet size - or
not).

Given a max possible queue size you could. If your traffic overloads that
queue then your system is not deterministic by definition, since load exceeds
capacity.

  • Igor

Rennie Allen wrote:

A multi-master configuration offers fault tolerance only for the
case that a “master” PLC fails (hot swap …).

True.

The RTUs and the bus aren’t fault tolerant.

Are you saying that Profibus I/O modules don’t support redundant
points ?

YES … and this is the fact for e.g. DeviceNet, ControlNet, CAN,
CANopen, Interbus, FlexI/O, ASI, SERIPLEX, LON and so on based
I/O modules and 99% of all PLCs !

And that there are no Profibus modules that support redundant cabling
(clearly a Profibus master is capable of supporting redundant cabling,
and redundant points) ?

YES … and this is also the fact for e.g. DeviceNet, ControlNet,
CAN, CANopen, Interbus, FlexI/O, SERIPLEX, Modbus, ASI
and LON based I/O modules !

This makes the market for Profibus extremely limited.

The growing PROFIBUS market is all around the globe … that’s the
only limitation :slight_smile:

Weak multi-master based fault tolerance only makes sense for
the PLC vendors > :slight_smile:

It is far weaker to have no master redundancy than to have only I/O and
cable redundancy, although I agree that for any critical system there
should be master redundancy, cable redundancy and I/O redundancy.

Redundancy is not sufficient for fault tolerance … PROFIBUS I/O
modules generally have a lot of intelligent mechanisms (watchdogs,
detailed diagnostic info and so on) to realize fault tolerant behavior
and preventive diagnosis … even if they don’t have redundant bus
connections :slight_smile:

Adding hardware redundancy means adding hardware → which means
adding sources of faults, especially when the hardware becomes old
… software doesn’t age :slight_smile:

Therefore replacing hardware by software is the better strategy !

Research in safety critical systems is pretty conclusive that the
most likely points for failure are in the areas with the highest
concentration of software (i.e. the masters).

Correct … we have a lot of DUAL (hot standby) PROFIBUS master
systems running in the field (e.g. oil platforms).

However, the existence of ONE second master doesn’t slow down
the bus … just to come back to the root of our discussion :slight_smile:

Armin

I don’t know if ethernet packets can carry priority. Even if they
could it
would have to be integrated with OS to make any sense and I don’t
believe
ethernet works that way. Just don’t get started on priority inversion
over
network > :wink:

Ethernet packets don’t carry priority (you aren’t claiming that as a
point in Ethernet’s favor, are you? :slight_smile:). What 100VG did was have two
priorities (real-time, and not real-time). IMO this is generally
sufficient priority data for the MAC layer (I’m not sure how the QNX
driver worked, but ideally it would be a command line switch to the
driver that says anything above system prio X is real-time). A network
does not have to have any priority information to be deterministic, but
a simple two priority system allows the spec to have low latencies for
RT traffic, and still perform well for bulk transfers (non RT).

As far as priority inversion over the network, that is definitely one of
my buttons :slight_smile:

I’d expect it it be FIFO with limited max. size. Since memory is
limited and
packets can’t live forever anyway you can’t (and there is no point) to
have
arbitrary long queues. Thus, a worst case can be calculated and it
won’t be
‘infinity’.

Firstly, I am not arguing that someone could not make a deterministic
switch, only that no one (that I know of) does; secondly, limiting the
queue size simply means that at some point a packet will be dropped.
What happens then (there are several scenarios, but what does the
Ethernet standard have to say about this)?

Given max. possible queue size you could. If your traffic overloads
that
queue then your system is not deterministic by definition since load
exceeds
capacity.

Certainly, given a specified maximum queue size (do you know the max
queue size for your switch ?), and a specified arbitration mechanism
(to ensure that ports can’t be starved), one can calculate a worst case
switch latency, but these specs are exactly what don’t exist (yet). As
I said, AFAIK there is ongoing work to try and address these issues.

Rennie

Redundancy is not sufficient for fault tolerance … PROFIBUS I/O
modules have in general a lot of intelligent mechanism (watch dogs,
detailed diagnostic infos a.s.o) to realize fault tolerant behavior
and preventive diagnosis … even if they don’t have redundant bus
connections > :slight_smile:

Redundancy is not sufficient for fault tolerance, but it is required
for fault tolerance. None of the features you described above enable
fault-tolerance (they simply report faults). Of course, you need to
know when there is a fault to be fault tolerant, so the features you
describe are also required in a fault tolerant system, but fault
tolerance may only be realized by detecting failures, and compensating
for them (being able to compensate implies some degree of redundancy).

Adding hardware redundacy means adding hardware → which means
adding sources of faults, especially when the hardware becomes old
… software doesn’t get elder > :slight_smile:

Fault tolerance is not about reducing the failure rate of components; it
is about coping with component failures. A system that has five times as
many component failures as a non-redundant system, but copes with
each of these failures successfully, is an infinitely superior system
(from a fault tolerance POV) to a system that has one fifth the
component failure rate but responds catastrophically when a failure
does occur.

Therefore replacing hardware by software is the better strategy !

That’s not a strategy, it sounds more like the mating call of a software
salesman :wink:

Correct … we have a lot of DUAL (hot standby) PROFIBUS master
systems running in the field (e.g. oil platforms).

However, the existence of ONE second master doesn’t slow down
the bus … just to come back to the root of our discussion > :slight_smile:

Originally, I had asked what the bus reconfiguration time (there will
have to be a bus reconfiguration when the secondary master comes back
on-line after the failure) is. I don’t doubt that it is deterministic,
I was simply wondering if you know what it is ?

Rennie

Rennie Allen wrote:

[ clip … ]

Fault tolerance is not about reducing the failure rate of components, it
is about coping with component failures,

Fewer components mean fewer failures … so why should we add
components??

Therefore replacing hardware by software is the better strategy !

That’s not a strategy,

A piece of hardware based on VHDL ‘software’ in e.g. FPGAs needs
fewer components and is therefore more reliable. Virtual devices
realized with a 100 MIPS microprocessor are absolutely reliable …
that’s a simple and real strategy.

it sounds more like the mating call of a software salesman > :wink:

Do you have problems hearing the right things … ?

Correct … we have a lot of DUAL (hot standby) PROFIBUS master
systems running in the field (e.g. oil platforms).

However, the existence of ONE second master doesn’t slow down
the bus … just to come back to the root of our discussion > :slight_smile:

Originally, I had asked what the bus reconfiguration time … is

In the case of a hot standby … there is no reconfiguration in the
case of a software failure. In the case of a hardware failure, the
reconfiguration of a dual master system takes several microseconds.

I don’t doubt that it is deterministic,

The reconfiguration process is deterministic.

I was simply wondering if you know what it is ?

Don’t worry … you are asking yourself the wrong questions :slight_smile:

Armin

Armin Steinhoff wrote:

Rennie Allen wrote:

[ clip … ]

Fault tolerance is not about reducing the failure rate of components, it
is about coping with component failures,

Fewer components mean fewer failures … so why should we add
components??

“CPU not detected, do you want to turn on software emulation?”

You are being unreasonable, Armin. Yes, by increasing the number of hardware
components you increase the probability of a hardware failure, but that is
only a problem when every component is unique (and therefore critical). If
the new components are redundant copies of existing ones you actually
decrease the probability of whole-system failure, because it now takes the
failure of not just any component but of 2 (or more) instances of the same
component. Statistically, the odds are against such a combo. It does not take
much math to reach this conclusion, just plain logic.

Have you ever played poker? Let’s say your system consists of 13
components (the number of unique cards in the game). Every time you draw you
get 5 cards. Let’s imagine that those 5 cards represent 5 failing
components. Your system would fail every time you draw… Now consider
that we actually have 4 cards of each denomination (spades, hearts,
diamonds …) and they represent redundant copies of your 13 unique
components, so you have 52 cards altogether. Does your system fail now
every time you draw? Nope, because ‘4 of a kind’ is a VERY rare
combination. And you are really lucky in the game of poker if you get
it… If you don’t believe me, go to a casino and see how much you get
for a ‘high pair’ and how much for ‘4 of a kind’. Keep in mind those
people are really good with probabilities, since this is what they make
money with …
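For the record (my arithmetic, not part of Igor’s post): there are C(52,5) = 2,598,960 possible five-card hands and 13 * 48 = 624 of them contain four of a kind, so the chance is 624 / 2,598,960, roughly 0.024%, or about 1 hand in 4,165 - which is exactly the rarity the casino payouts reflect.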

In real life every really mission-critical system has redundant
hardware; look at airplanes for example - some of their systems have
4-fold redundancy, just like in the cards example. Remember the story of
the US-Chinese air collision? Why did the jet crash while the spy plane
landed? I bet because the jet had only one engine and the spy plane had 4.

Do you have problems to hear the right things … ?

LOL. Self-confidence is certainly not one of your problems Armin :slight_smile:
One might as well say self-righteousness…

I don’t doubt that it is deterministic,

The reconfiguration process is deterministic.

I was simply wondering if you know what it is ?

Don’t worry … you are asking yourself the wrong questions > :slight_smile:

I think he asked for a number. Can you tell us or not?

  • igor

“Igor Kovalenko” <Igor.Kovalenko@motorola.com> wrote in message
news:3B16DFB6.2F8CEF30@motorola.com

“CPU not detected, do you want to turn on software emulation?”

OK, Igor. I know you were being sarcastic here but I swear that I’ve seen
more than one BIOS that said, “Keyboard Not Detected – Hit <F1> to
Continue”


Bill Caroselli - Sattel Global Networks
1-818-709-6201 ext 122

Igor Kovalenko wrote:

Armin Steinhoff wrote:

Rennie Allen wrote:

[ clip … ]

Fault tolerance is not about reducing the failure rate of components,
it is about coping with component failures,

less components mean less failures … so why should we add
components??


“CPU not detected, do you want to turn on software emulation?”

You are being unreasonable Armin.

It depends on your view :slight_smile:

Yes increasing number of hardware
components you increase probability of hardware failure, but that is
only wrong when every component is unique (and therefore critical).

If I understand you right … you mean a unique component has a
higher failure rate than two of them?

If new components are redundant copies of existing ones you actually
decrease probability of whole system failure because it now takes
failure of not just any component but 2 (or more) instances of the same
component. Statistically odds are against such combo. It does not take
much of math to get this conclusion, just plain logic.

The computer system of the ISS (International Space Station) is four
times redundant … but it has failed completely because of a four
times duplicated hardware/software failure :slight_smile:

[ clip … I don’t play poker :wink: ]

In real life every really mission-critical system has redundant
hardware, look at airplanes for example - some of their systems have
4-fold redundancy just like with cards example. Remember the story with
US-Chineese air collision? Why the jet crashed but spy plane landed? I
bet because jet had only one engine and spy plane had 4.

OK Igor … my statement above was just a rhetorical one which
should show the contradiction between simply duplicating hardware
for fault tolerance and the increased probability of failures at
the component level.

If you build a fault tolerant system which works in hot or active
standby operation, it is good practice to first minimize the
number of hardware components. This is mostly done by placing logic
components into programmable hardware (CPLDs, FPGAs …), which means
you are replacing hardware with software.
You are a winner if the duplicated hardware doesn’t have more
components than the previous original system.


Another example: build an embedded system with 4 serial ports.

a) use a PIC micro with several external serial components
b) use a Scenix micro with 100 MIPS and 4 virtual serial devices
(virtual devices are built from software + CPU power …)

Then build a hot standby system using a) or b) …

Which one would be more reliable??
I believe option b) would be the real strategy. Software
doesn’t have e.g. thermal problems or problems with radiation …

Do you have problems to hear the right things … ?

LOL. Self-confidence is certainly not one of your problems Armin > :slight_smile:
One might as well say self-righteousness …

<philosophy mode on>

Filtered perception in human communication is quite normal. Everyone
has that problem and it depends on the individual experiences,
prejudices, emotional constitution a.s.o. … it’s sometimes hard to
remove such filters.

But this is not a problem as long as we are able to discuss …

</philosophy mode off>

[ clip …]

I think he asked for number. Can you tell us or not?

I don’t have precise numbers at hand, but it’s in the range of microseconds.

Armin

“Armin Steinhoff” <A-Steinhoff@web_.de> wrote in message
news:3B176834.8EA234E6@web_.de…

Yes increasing number of hardware
components you increase probability of hardware failure, but that is
only wrong when every component is unique (and therefore critical).

If I understand you right. … you mean a unique component has a
higher failure rate than two of it?

No, they have the same failure rate. However probability of
2_components_of_the_same_kind failing at the same time will be lower than
probability of any 2 arbitrary components failing at the same time.

OK Igor … my statement above was just a rethorical one which
should show the contradiction between simple duplicating hardware
for fault tolerance and the increased probability of failures at
component level.

If you build a fault tolerant system which works in a hot or active
standby operation, it is a good practice to minimize at first the
number of hardware components. It is mostly be done by placing logic
components into programmable hardware (CPLDs, FPGAs …) that means
you are replaceing hardware by software.
You are a winner if the duplicated hardware doesn’t have more
components than the previous original system.

Not that simple. Without redundancy the probability of total system failure is
equal to the probability of any single component failure. As you start to add
redundant components, those 2 probabilities become separate functions. Since
plain logic obviously does not help, let’s try some math. I’m not a big
mathematician, so if I make a mistake someone will correct me :slight_smile:

Let’s call the level of redundancy K
Then the number of unique types of components is M
The total number of components is N = K*M
The number of all possible unique K-size subsets of the set of N will be
C(N,K) = N!/((N-K)!*K!)

Probability of a single component failure will increase linearly: N*x
Probability of total system failure will be inversely proportional to C(N,K):
((N-K)!*K!/N!)*x
Now since we can’t increase K without increasing N, we should rewrite it as:
((K*M-K)!*K!/(K*M)!)*x

To spare you the calculations, here is an example: C(12,2) = 66, C(18,3) = 816,
C(24,4) = 10626. So by adding 2-fold redundancy to a 6-unique-component
system you increase the probability of a component failure 2 times, but
decrease the probability of a system failure 66 times.
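A few lines of C reproduce those numbers, in case anyone wants to try other (M, K) combinations (this only evaluates C(N,K); it doesn’t validate the underlying failure model):

```c
#include <stdio.h>

/* binomial coefficient C(n, k), computed iteratively to avoid huge factorials */
static double choose(int n, int k)
{
    double c = 1.0;
    int i;
    for (i = 1; i <= k; i++)
        c = c * (n - k + i) / i;
    return c;
}

int main(void)
{
    int m = 6;   /* 6 unique component types, as in the example above */
    int k;

    for (k = 2; k <= 4; k++)
        printf("K=%d: N=%2d, C(N,K) = %.0f\n", k, k * m, choose(k * m, k));
    return 0;
}
```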

Since C(N,K) is a non-linear function there will be a ‘saturation’ effect
somewhere (i.e., at some point adding more redundancy will have almost no
effect on the system failure probability while the component failure
probability will still grow linearly). Which means the redundancy level should
be optimized. Fresh college graduates might jump in here, take derivatives and
solve the system to find the theoretical optimum redundancy level…
but I wanna bet cost considerations will limit redundancy before the
saturation effect does :wink:

  • Igor

Previously, Armin Steinhoff wrote in qdn.public.qnxrtp.advocacy:

If I understand you right. … you mean a unique component has a
higher failure rate than two of it?

Am I misunderstanding something really obvious here? It sounds like
you are arguing that redundancy is of no value? In the context of
what Igor is saying, the answer to your question is yes. The issue
is not component failure, but system failure.

A) If a single critical component fails, the system fails.

B) If one of two redundant components fails, the system does not fail.

C) If both of two redundant components fail, the system fails.

The probability of the previous items is in this order:

C < A < B

So yes, it is more probable that one of two redundant components will fail
than that a single component will, but it is less likely that both will fail
than that a single component will.
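To put numbers on it (my example, assuming independent failures with a 1% chance per component over some interval): P(A) = 0.01, P(B) = 1 - 0.99^2 ≈ 0.0199, and P(C) = 0.01^2 = 0.0001, which gives exactly the C < A < B ordering above.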


There are other annoying and technical issues that one must deal with
to improve reliability. For example, suppose we are talking about
disk drives. You buy a pair of redundant disk drives and, unbeknownst
to you, they both will die after the same amount of run time. Now
if the primary dies, the probability that the secondary will die before
you have a chance to replace it is very high.



Mitchell Schoenbrun --------- maschoen@pobox.com

An other instance: build an embedded system with 4 serial ports.

a) use a PIC micro with several serial external components
b) use a Scenix Micro with 100Mips and 4 serial virtual devices
(virtual devices are build by software + CPU power …)

Then build a hot standby system using a) or b) …

Which one would be more reliable??
I believe using point b) would be the real strategy. Software
doesn’t have e.g. thermal problems or problems with radiation …

Armin, I have got to get you talking with the FDA (the scene from
Ghostbusters springs to mind, where Bill Murray says “We have GOT to get
these two together” - and Harold Ramis replies “That would be
extraordinarily dangerous” :wink:). They are of the opinion that if the
application is safety critical the device must have no software at all.
I suspect that they (the FDA) are right in a fanatical sense, in that
software tends to be more complex and less rigorously engineered than
hardware; however, I believe that their position is as unrealistic from
the standpoint of economics as is your position (that everything should
be software).

I don’t have precise numbers at hand, but it’s in the range of microseconds.

Thanks, I actually need to know this (this thread hasn’t been a total
waste of time :slight_smile:). As long as it is less than 10 ms I’m happy.

Rennie

No, they have the same failure rate. However probability of
2_components_of_the_same_kind failing at the same time will be lower
than
probability of any 2 arbitrary components failing at the same time.

I don’t want to get involved in the rest of the discussion but
the above statement does not sound right, using common sense
logic.

I would say that the probability of 2_components_of_the_same_kind
failing at the same time is higher than for 2 arbitrary components.
Theoretically, 2 identical components will wear out at the same
time, as they have the same construction and materials, etc.

Look at candles. :slight_smile:
When lighting two candles of the same type, they burn for 2
hours, and then “die” within 5 seconds of each other.
That’s not the case for two different candles.



Mats Byggmastar
http://www.multi.fi/~mbc

I think what Igor means is that when comparing two different components,
the probability of system-wide failure is higher. Reason being that
those two components are each single points of failure. OTOH, with
redundant components you get a warning that something has failed, even
though it isn’t system-wide (yet). So the probability of system-wide
failure is lower.

In your candle analogy, you would notice that one candle has expired,
but you still have five seconds to light another one quickly before
total darkness :slight_smile:

regards,
rick

Mats Byggmastar wrote:

No, they have the same failure rate. However probability of
2_components_of_the_same_kind failing at the same time will be lower
than
probability of any 2 arbitrary components failing at the same time.

I don’t want to get involved in the rest of the discussion but
the above statement does not sound right, using common sense
logic.

I would say that probability of 2_components_of_the_same_kind
failing at the same time is higher than 2 arbitrary components.
Theorectically, 2 identical components will wear out at the same
time as they have the same construction and materials, etc.

Look at candles. > :slight_smile:
When lighting two candles of the same type, they burn for 2
hours, and then “dies” within 5 seconds from each other.
That’s not the case for two different candles.


Mats Byggmastar
http://www.multi.fi/~mbc

I think what Igor means is that when comparing two different components,
the probability of system-wide failure is higher. Reason being that
those two components are each single points of failure. OTOH, with
redundant components you get a warning that something has failed, even
though it isn’t system-wide (yet). So the probability of system-wide
failure is lower.

OK, yes. I took the statement out of context.


In your candle analogy, you would notice that one candle has expired,
but you still have five seconds to light another one quickly before
total darkness > :slight_smile:

Hmmm… But two candles of different types would give
me a longer time to light a new one, as they probably won’t
expire within 5 seconds of each other.

So what I had in mind was that when adding a redundant component,
don’t choose exactly the same type, as both might wear out at the
same time or break at the same time due to some external event.
But this theory is perhaps of minor importance…



Mats Byggmastar
http://www.multi.fi/~mbc

Somewhat off topic, but I don’t think that the candle analogy is really that
good. After all, even the simplest component in a computer is orders of
magnitude more complex than a candle. There is no way to reliably determine
when a single component is going to fail, never mind two or more. I wouldn’t
think that the probability of two identical components failing at the same
time would statistically be any different than for different components.

Kris

Mats Byggmastar <mats.byggmastar@multi.nojunk.fi> wrote:

I think what Igor means is that when comparing two different components,
the probability of system-wide failure is higher. Reason being that
those two components are each single points of failure. OTOH, with
redundant components you get a warning that something has failed, even
though it isn’t system-wide (yet). So the probability of system-wide
failure is lower.

OK, yes. I took the statement out of content.



In your candle analogy, you would notice that one candle has expired,
but you still have five seconds to light another one quickly before
total darkness > :slight_smile:

Hmmm… But two candles of different types would give
me longer time to light a new one as they probably wont
expire withing 5 seconds of each other.

So what I had in mind was that when adding a redundant component,
don’t choose exactly the same type as both might wear out at the
same time or break at the same time due to some external event.
But this theory is perhaps of minor importanse…


Mats Byggmastar
http://www.multi.fi/~mbc



Kris Warkentin
kewarken@qnx.com
(613)591-0836 x9368
“You’re bound to be unhappy if you optimize everything” - Donald Knuth

In your candle analogy, you would notice that one candle has expired,
but you still have five seconds to light another one quickly before
total darkness > :slight_smile:

Hmmm… But two candles of different types would give
me longer time to light a new one as they probably wont
expire withing 5 seconds of each other.

So what I had in mind was that when adding a redundant component,
don’t choose exactly the same type as both might wear out at the
same time or break at the same time due to some external event.
But this theory is perhaps of minor importanse…

Mats,
There is a big flaw in this analogy. Those 2 different candles would
correspond to 2 unique types of components. Therefore if even one of them
goes out your system has already failed, no matter whether the other one is
still alive :wink:

Not to mention that the MTBF concept does not even apply to candles, since
they have a predictable life span. The idea that 2 redundant computer
components would fail at roughly the same time is plain wrong. First, they do
not wear out at the same rate, because one of them is usually in ‘standby’
mode and goes into active duty only when the other one fails. Second, even if
they were wearing out at the same rate, their life spans would still be rather
different, because no 2 computer components are really the ‘same’ (each
consists of numerous subcomponents, each having its own MTBF). While the MTBF
would be the same, it is still a mean time between failures, so there would be
deviations, and rather wide deviations.

  • igor