Over the past 3 days we have had 4 “lock-ups” of the “Standby” node.
For the first two days we ran Node 1 as “Active” and Node 2 as “Standby” most of the time, since the communications lines were only connected to Node 1.
During this time, on 3 occasions RealFlex reported that the “Standby” had failed. Each time, Node 2 could not be reached over the network (e.g. it did not respond to ping) and the console had “locked up” (i.e. the screen was either black or frozen, and the mouse and keyboard had no effect). The server had to be powered off and on again.
We thought this might be a hardware issue with Node 2. To test this, today we swapped the communications lines over to Node 2 and ran Node 1 as “Standby” and Node 2 as “Active”. After about 5 hours Node 1 locked up in exactly the same manner as Node 2 had.
So the issue appears to follow the Standby role rather than a particular machine: whichever node is Standby locks up after running for a number of hours.
We have not had this issue on our in-house test system running the same software and database. However, our system does not have the same system load (i.e. we don’t have actual RTUs connected etc.).
Platform: DELL R200 server, QNX 6.3, RealFlex 6.4.76.
Any suggestions on how to debug this issue?