QNX 6.3 copy protection won't allow safe mode - cannot run c

First, my apologies for not getting to this sooner. I only
became aware today…

At first glance I would agree with Colin that the license
issue is not related (it would be related to the mounting
diskboot does when running in safe mode). It would appear
that diskboot is broken w.r.t. the debug shell and/or
safe mode.

I have run in safe mode to get to a standard login many times,
typically when I mess up my Photon settings. I have yet to
see this problem, though it looks easy to reproduce.

I’m looking into it and will post my findings shortly.

I will also investigate how to improve the usability of the safe
mode to make it really safe. I’m thinking along the lines of
standard *nix single-user mode. One console, no extra drivers
beyond what is in the boot image.

BTW, there is a sample build file in the helpviewer documentation
which performs the same basic functionality as diskboot. I believe
it is in the Neutrino User’s Guide Appendix (Examples). This would
not help in this case since the drivers and other services are started
via etc/system/sysinit, not diskboot.

Thanks,
Keith Russell

John Nagle <nagle@downside.com> wrote:

We haven’t heard anything.

This is apparently a problem with overly aggressive
copy protection in “diskboot”, which runs so early
that a workaround is tough. We may have to wait
for the next release of QNX. Our QNX 6.3 test system
sits idle, awaiting a fix.

We continue to use QNX 6.21 for real work. It’s working
well for us. We demonstrated our robot vehicle to about
30 people today, in our first public demo. Some well-known
Silicon Valley names were present.

We may just stay with QNX 6.21, rather than fighting
with the copy protection in 6.3. It eats up too much
of our time.

John Nagle
Team Overbot

Evan Hillas wrote:

John Nagle wrote:

F1 F3 yields a mode in which the file systems are mounted.
As the help page says:

The chkfsys utility should be used only when the filesystem is stable.
There should be NO files open for writing when chkfsys is running.

You CAN run “chkfsys” from the F1 F3 state, but it does NOT lead
to a clean file system. I’m unable to reach a state where
“chkfsys” reports no errors. If I reboot and rerun it,
I see problems caused by daemons which have been started up.
The F1 F3 “safe mode” mounts all file systems, runs
the “rc.local” file, and even starts up “lpd”.
That’s not “safe” enough for “chkfsys”.

So it seems that if a QNX 6.3 system crashes, no simple
recovery is possible.



Resolved?

Thanks for looking into this.

The problem is getting to the safe mode in which no file
systems are mounted, so you can run file recovery. You
can get to a text console mode, but the file systems are
mounted and everything started by the rc files is running,
so you can’t safely run file recovery. The help documentation
says that “chkfsys” should never be run on a mounted
file system.

“chkfsys” doesn’t check for mounted file
systems, which is a separate but related problem.
It’s possible for a user to run “chkfsys” on a live
file system and make things worse, while getting messages
that the file system was successfully repaired. The safe
mode problem in QNX 6.3, coupled with the multiplicity of
safe modes and a confusing user interface, encourages
users to run “chkfsys” in the wrong safe mode, which may
lead to invisible file corruption.
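Since “chkfsys” itself does not refuse to run on a mounted partition, one mitigation is a wrapper that checks the mount table first. The sketch below is a minimal Python illustration, not a QNX tool: it assumes a mount listing with one entry per line in the common Unix form “<device> on <mountpoint> type <fstype>”, and the function name is_mounted and the sample listing are hypothetical. Whether QNX 6.3's “mount” output matches this format is an assumption.

```python
# Hypothetical guard: is a device currently mounted?
# ASSUMPTION: the listing has one mount per line in the form
# "<device> on <mountpoint> type <fstype> ...". This format is
# not confirmed for QNX 6.3.


def is_mounted(device: str, mount_listing: str) -> bool:
    """Return True if `device` is the source of any listed mount."""
    for line in mount_listing.splitlines():
        fields = line.split()
        # Need at least "<device> on <mountpoint>".
        if len(fields) >= 3 and fields[1] == "on" and fields[0] == device:
            return True
    return False


# Hypothetical listing, for illustration only.
SAMPLE_LISTING = """\
/dev/hd0t79 on / type qnx4
"""

if __name__ == "__main__":
    print(is_mounted("/dev/hd0t79", SAMPLE_LISTING))  # True
    print(is_mounted("/dev/hd1t79", SAMPLE_LISTING))  # False
```

Because the F5 workaround discussed in this thread deliberately runs “chkfsys” on a mounted but idle file system, a real guard would probably warn rather than refuse.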

I agree that the usability of “safe mode” needs some
attention. File system recovery for QNX needs to work at
least as well as it does on Linux and Windows. I’d suggest
that, by default, a check run at every boot on every file system
not marked as “clean”, which is the industry standard.
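The boot-time policy suggested above (check only what was not left clean) can be sketched as a simple filter. This is a hypothetical Python illustration: the “clean” flag, the FileSystem record, and the device names are assumptions, and the thread does not say whether or how the QNX 4 file system records such a flag.

```python
from dataclasses import dataclass


@dataclass
class FileSystem:
    device: str
    clean: bool  # ASSUMPTION: set when the file system was unmounted cleanly


def filesystems_to_check(filesystems: list[FileSystem]) -> list[str]:
    """Devices a boot-time check would visit: those not marked clean."""
    return [fs.device for fs in filesystems if not fs.clean]


if __name__ == "__main__":
    # Hypothetical state after a crash.
    mounts = [
        FileSystem("/dev/hd0t79", clean=False),
        FileSystem("/dev/hd1t79", clean=True),
    ]
    print(filesystems_to_check(mounts))  # ['/dev/hd0t79']
```

Skipping clean file systems keeps normal boots fast; only a crash or other unclean shutdown triggers the slower check.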

Thanks.

John Nagle
Team Overbot

“QNX - building a more reliable world”

John Nagle wrote:

users to run “chkfsys” in the wrong safe mode, which may
lead to invisible file corruption.

Usually the opposite happens: chkfsys has a habit of totaling it.

I took a quick look, and what you are really looking for is the F5
option (the debug shell). That simply starts fesh just after mounting
the filesystems. So after the shell starts, run ‘/sbin/chkfsys /’.
The filesystem will be totally idle since nothing is started.

In the meantime I’m still looking at ways to improve the usability
of the safe mode. Just as a quick explanation: the main purpose of
safe mode was to allow you to revert to a known configuration of
packages when we used the package file system. In 6.3.0 we no longer
use that filesystem for self-hosted systems, so the safe modes are
really a legacy thing. The “safe configuration” is now the same as your
current configuration.

The debug shell is the closest thing to standard *nix single-user
mode, with the caveat that no paths are set. Until I get things
sorted out with safe mode this is your best bet.

Thanks for the heads up,
Keith Russell


Keith Russell wrote:

I took a quick look and what you are really looking for is the F5
option (the debug shell). That simply starts fesh just after mounting
the filesystems. So after the shell starts, run ‘/sbin/chkfsys /’.
The filesystem will be totally idle since nothing is started.

OK. Thanks for the workaround. That worked, and we now
have a valid QNX 6.3 file system.

It looks like the proper recovery procedure after a QNX 6.3
crash is as follows:

  1. Reset system and let boot process start.
  2. Wait for “press space bar for boot modes” and do so.
  3. Press F5 for a debug shell, then ENTER.
  4. You get a shell prompt as root. File systems are mounted
    but very little is running.
  5. While in this mode, do NOT do anything that could possibly
    cause a write to any file.
  6. Run “/sbin/chkfsys /dev/hd0t79”. Answer the usual prompts
    (typically with ENTER). File system recovery may report
    an incorrect bitmap and should fix it. Note that after
    running “chkfsys” on a mounted file system, the in-memory
    info about the mounted file system may be out of sync and
    another reboot is needed.
  7. Run “/bin/shutdown”. System should reboot.
  8. Repeat steps 2-5 to get a debug shell again.
  9. Run “/sbin/chkfsys -f /dev/hd0t79”. It should run without any
    errors and report that the file system bitmaps match.
    If it fails to do so, the file system is seriously corrupted.
  10. Type “exit” to allow the system to come up normally.

Is this correct?

I realize there’s probably a more l33t way to do this, unmounting
and remounting file systems manually at the debug shell prompt.
But this straightforward approach seems less error-prone,
even though it does take two reboots.
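The two-pass structure of the procedure (repair in step 6, verify after a reboot in step 9) amounts to a small decision rule. Here it is as a hypothetical Python sketch; the function and verdict names are illustrative, and the inputs are simply what the operator observed on each “chkfsys” run.

```python
from enum import Enum


class Verdict(Enum):
    CLEAN = "no errors found on either pass"
    REPAIRED = "pass 1 fixed errors; pass 2 verified"
    SERIOUSLY_CORRUPTED = "pass 2 still reports errors"


def interpret_two_pass(pass1_fixed_errors: bool, pass2_reported_errors: bool) -> Verdict:
    """Combine the results of the two chkfsys runs (steps 6 and 9).

    Pass 2 decides: pass 1 ran while the file system was mounted, so
    its in-memory state may have been out of sync until the reboot.
    """
    if pass2_reported_errors:
        return Verdict.SERIOUSLY_CORRUPTED
    if pass1_fixed_errors:
        return Verdict.REPAIRED
    return Verdict.CLEAN


if __name__ == "__main__":
    # e.g. pass 1 fixed an incorrect bitmap, pass 2 reported no errors
    print(interpret_two_pass(True, False).name)  # REPAIRED
```

Giving the second pass the final word reflects the note in step 6: a result obtained on a mounted file system is only trusted once a fresh boot confirms it.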

A more user-friendly startup mechanism would be desirable.

John Nagle
Team Overbot

Evan Hillas wrote:

Oops, looks like I botched it. I don’t know what I was doing wrong. I
must have run it 4 or 5 times back then. You are correct though, F5
debug shell does indeed work.

You have to be very careful here. If you use safe mode F5 and
run “chkfsys”, you recover the file system. If you use safe
mode F4 and run “chkfsys”, you damage the file system. If you
use safe mode F5 and do anything that changes the file system
before or after running “chkfsys”, you damage the file system.
In all cases, you get the same messages, indicating
successful repair.
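The cases above can be written down as a table, which makes the hazard explicit: the message stays the same while the actual outcome changes. A hypothetical Python sketch; the outcome strings and REPORTED_MESSAGE are placeholders (the exact wording “chkfsys” prints is not quoted in the thread), and only the F4 and F5 cases described here are included.

```python
# Observed effect of running chkfsys, keyed by
# (safe mode, whether anything wrote to the file system before or
# after the check). Only the cases reported in this thread are listed.
OBSERVED_OUTCOME = {
    ("F5", False): "recovered",
    ("F5", True): "damaged",
    ("F4", False): "damaged",
    ("F4", True): "damaged",
}

# PLACEHOLDER wording: chkfsys reports success in every case above.
REPORTED_MESSAGE = "repaired successfully"


def check_run(mode: str, writes: bool) -> tuple[str, str]:
    """Return (what chkfsys reports, what actually happened)."""
    return REPORTED_MESSAGE, OBSERVED_OUTCOME[(mode, writes)]


if __name__ == "__main__":
    for mode, writes in OBSERVED_OUTCOME:
        shown, actual = check_run(mode, writes)
        print(f"{mode}, writes={writes}: reports {shown!r}, actually {actual}")
```

Only one row leads to recovery, and nothing in the output distinguishes it, so the procedure has to be followed exactly.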

So be very careful recovering QNX 6.3 systems using diskboot
until there’s a fix.

John Nagle
Team Overbot

That would explain the many failures I’ve had, as I had usually exited the first prompt of F5 safe mode before running chkfsys.

When you say don’t modify anything afterward, does that mean hitting the reset button is the best shutdown approach?

John Nagle <nagle@overbot.com> wrote:

Is this correct?

That’s about it.

I realize there’s probably a more l33t way to do this, unmounting
and remounting file systems manually at the debug shell prompt.
But this straightforward approach seems less error-prone,
even though it does take two reboots.

For now, I’d have to agree with you that this is the safest way.

A more user-friendly startup mechanism would be desirable.

That’s what I’m looking at.

John Nagle
Team Overbot