verifying that cp worked

We sometimes have network issues here as test hardware is slapped up and
nodes are reconfigured, etc., and every now and then it bites us with
corruption when copying large (10-20 MB) files across the network.

Is there an automatic way to copy and verify the copy? I.e., we don’t want to
use cp and then diff to make sure the cp worked. Is there a verifying cp out
there, or an option to cp that will report an error if the destination
doesn’t match the source?

Sheldon Parkes
sheldon@onlinedata.com

Do a cksum on the source file, and then on the destination. A different cksum
means a different binary :)

-Adam

I remember this from long, long ago. Wasn’t there a bug fix for this?
I don’t remember what was fixed. But cp should recognize a network
error and abort with an error message.

What versions of the OS, Net, Net.driver and cp are you running?
You may want to upgrade.


“Bill Caroselli” <qtps@earthlink.net> wrote in message
news:b89do7$lhi$2@inn.qnx.com

I remember this from long, long ago. Wasn’t there a bug fix for this?
I don’t remember what was fixed. But cp should recognize a network
error and abort with an error message.

This problem can be caused by a hardware failure in which data is corrupted
while being moved from the network card to system memory. Unlike TCP/IP,
FLEET doesn’t do a checksum on the data once it’s in memory, so corrupted
data can leak into files and applications. I guess it would be possible to
modify cp to perform a checksum, but that would only plug one very small
hole. I remember QSS saying it would be impossible to modify FLEET to include
some sort of checksum at a higher level without breaking compatibility.

Mario Charest <postmaster@127.0.0.1> wrote in message
news:b89etr$n5r$1@inn.qnx.com

This problem can be caused by a hardware failure when data is corrupted when
moved from the network card to system memory. Unlike TCP/IP, FLEET doesn’t
do a checksum on the data once it’s in memory, thus corrupted data can leak
to files/application.

Exactly, you can’t count on the upper layers detecting what the lower
layers are corrupting. Hence doing the cksum after the copy seems like the
easiest way (without requiring a modification to cp).

Below is a dirt-dumb shell script to do the verification (it’s quite
limited, as you can only specify one src file).

-Adam

---vcp.sh---
#!/bin/sh
# Copy one file, then compare the cksum of the source and the destination.
CKSUM=`which cksum`
CP=`which cp`
CUT=`which cut`
ERROR_RET=1
EXIT_OK=0

SRC=$1
DST=$2

if [ "$SRC" = "" -o "$DST" = "" ]; then
    echo "Need a src and a dst file"
    exit $ERROR_RET
fi

#echo "Copying $SRC -> $DST"
# Stop right away if cp itself reports a failure.
$CP "$SRC" "$DST" || exit $ERROR_RET

#echo "Verifying $DST ..."
# cksum prints "<crc> <size> <name>"; keep only the CRC field.
CKSUM1=`$CKSUM "$SRC" | $CUT -f1 -d" "`
CKSUM2=`$CKSUM "$DST" | $CUT -f1 -d" "`

#echo "C1 : $CKSUM1 C2 : $CKSUM2"
if [ "$CKSUM1" != "$CKSUM2" ]; then
    echo "Bad Copy!"
    exit $ERROR_RET
fi

echo "Good Copy!"
exit $EXIT_OK
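
One way around the one-source-file limitation mentioned for vcp.sh is to wrap the copy-and-compare step in a function and loop over the arguments. The following is only a sketch under assumptions: verify_copy and copy_all are made-up names, the destination must be an existing directory, and it reads each file through "cksum < file" so that only the CRC and size are compared.

```shell
#!/bin/sh
# Sketch only: verify_copy and copy_all are hypothetical helper names,
# not part of the original vcp.sh.

# Copy $1 to $2, then compare checksums. Returns 0 only if they match.
verify_copy() {
    cp "$1" "$2" || return 1
    # Reading from stdin makes cksum print "<crc> <size>" without a name.
    sum1=$(cksum < "$1") || return 1
    sum2=$(cksum < "$2") || return 1
    [ "$sum1" = "$sum2" ]
}

# copy_all DESTDIR FILE...
# Copies every FILE into DESTDIR and verifies each one.
# Returns 0 if all copies verified, 1 if any copy failed.
copy_all() {
    dest=$1
    shift
    failed=0
    for f in "$@"; do
        name=$(basename "$f")
        if verify_copy "$f" "$dest/$name"; then
            echo "Good Copy: $f"
        else
            echo "Bad Copy: $f" >&2
            failed=1
        fi
    done
    return $failed
}
```

Something like "copy_all //2/tmp big1.dat big2.dat" would then report each file separately and return non-zero if any of them failed to verify.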

I hate doing things through the network that don’t have to be done through
the network, especially when the issue is detecting obscure network
transmission errors. Since QNX4 allows “//x cksum” (or, if you prefer,
“on -nx cksum”), each cksum can run on the node that holds the file. I would
add the following lines and change the two cksum lines as follows:

Change these two lines:

SRC=`fullpath -t $1`
DST=`fullpath -t $2`

Add these two lines:

SRC_NODE=${SRC%%/[a-z,A-Z]*}
DST_NODE=${DST%%/[a-z,A-Z]*}

Change these two lines:

CKSUM1=`$SRC_NODE $CKSUM $SRC | $CUT -f1 -d" "`
CKSUM2=`$DST_NODE $CKSUM $DST | $CUT -f1 -d" "`
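
To see what the node-prefix step is meant to produce, here is a small sketch that can be tried in any POSIX shell. It assumes paths of the form //node/dir/file, and node_of is a made-up helper name. Unlike a pattern that looks for the first component starting with a letter, this variant just takes everything up to the first "/" after the leading "//", so directories whose names start with a digit or a dot are handled as well.

```shell
#!/bin/sh
# Sketch only: node_of is a hypothetical helper, and the //node/path form
# is assumed to be what the full path looks like.

# Print the //node prefix of a path, or nothing if there is none.
node_of() {
    case $1 in
        //*)
            rest=${1#//}            # drop the leading "//"
            echo "//${rest%%/*}"    # keep everything before the next "/"
            ;;
        *)
            echo ""
            ;;
    esac
}
```

With this, "node_of //2/home/data.bin" prints "//2", and a path without a node prefix gives an empty string, in which case the cksum simply runs locally.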


Thanks all :)

Now, what would be a good systematic way of finding out exactly where in
the network the problem is, since it works 99.9 percent of the time without
a hitch?

Sheldon Parkes
sheldon@onlinedata.com



Way back we wrote a simple pair of programs to send random
data across the network and back and verify that the data
survived the round trip. I found this to be a very effective
way of identifying bad hardware which would introduce this
type of data error.

-Norton Allen
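
The original pair of programs isn't shown here, but the same idea can be roughly approximated with plain file copies. The sketch below rests on several assumptions: roundtrip_test is a made-up name, /dev/urandom is available as a source of random data, and the first argument is a directory on the remote node under test. It swaps the send-and-echo programs for a copy out and a copy back, so caching on either side may hide some errors.

```shell
#!/bin/sh
# Sketch only: roundtrip_test is a hypothetical helper, and /dev/urandom
# is assumed to exist on the machine running it.

# roundtrip_test REMOTE_DIR PASSES [SIZE_KB]
# Writes random data locally, copies it to REMOTE_DIR and back,
# and compares the result with the original on every pass.
roundtrip_test() {
    remote_dir=$1
    count=$2
    size_kb=${3:-1024}
    work=${TMPDIR:-/tmp}/roundtrip.$$
    mkdir -p "$work" || return 2
    errors=0
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/urandom of="$work/out" bs=1024 count="$size_kb" 2>/dev/null
        cp "$work/out" "$remote_dir/roundtrip.$$" &&
        cp "$remote_dir/roundtrip.$$" "$work/back" &&
        cmp -s "$work/out" "$work/back" || {
            echo "pass $i: data mismatch or copy error" >&2
            errors=$((errors + 1))
        }
        i=$((i + 1))
    done
    rm -rf "$work" "$remote_dir/roundtrip.$$"
    echo "$errors errors in $count passes"
    [ "$errors" -eq 0 ]
}
```

Running it against each node in turn, for example "roundtrip_test //3/tmp 500 10240", and noting which nodes ever report a mismatch is one way to narrow down where the bad hardware sits.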