tech-userlevel archive
Small tip proposal for headless systems boot resiliency
Hello,
Headless systems, or boxes that may be far away, are a pain to get back
online if they get stuck during the boot sequence (embedded/headless systems
do not have an IP-KVM or other remote console).
* The point is: with a headless system, you really want it back online. Even
if it needs maintenance, you need ssh (or some other remote access) to get your
hands dirty. So don't stop the boot sequence. If the system is damaged beyond
basic/rescue usability, letting it continue makes no difference anyway.
Example: the stop_boot() function in /etc/rc.subr has stopped the boot here a
few times, and these scripts may call it:
grep stop_boot /etc/rc.d/*
/etc/rc.d/ipfilter: stop_boot
/etc/rc.d/ipsec: stop_boot
/etc/rc.d/pf: stop_boot
/etc/rc.d/pf_boot: stop_boot
(As you can see, these are network-related scripts: ones you may play with
remotely, and that can get the boot stuck for many reasons. Let's tackle this
case in particular.)
A failed fsck may call it too, but this can be mitigated with
fsck_flags="-p -y -P" in your rc.conf.
* Proposal:
Add a "headless" flag to rc.conf, and alter the stop_boot function as follows:
/etc/rc.conf:
headless=yes
diff -u /mnt/sd0d/backup/rc.subr /etc/rc.subr
--- /mnt/sd0d/backup/rc.subr 2013-12-21 23:29:01.000000000 +0100
+++ /etc/rc.subr 2013-12-31 00:10:18.000000000 +0100
@@ -100,14 +100,24 @@
 # If booting directly to multiuser, send SIGTERM to
 # the parent (/etc/rc) to abort the boot.
 # Otherwise just exit.
+# OR
+# If this is a headless system, just send a warning, pause to give a hint,
+# and try resuming the boot sequence.
 #
 stop_boot()
 {
-	if [ "$autoboot" = yes ]; then
-		echo "ERROR: ABORTING BOOT (sending SIGTERM to parent)!"
-		kill -TERM ${RC_PID}
+	if [ "$headless" = yes ] || [ "$headless" = YES ]; then
+		echo "WARNING: BOOT *SHOULD* HAVE BEEN STOPPED"
+		echo "Resuming boot sequence in 30s, the system may be unusable."
+		sleep 30
+		touch /CHECK_BOOT_LOG.warn
+	else
+		if [ "$autoboot" = yes ]; then
+			echo "ERROR: ABORTING BOOT (sending SIGTERM to parent)!"
+			kill -TERM ${RC_PID}
+		fi
+		exit 1
 	fi
-	exit 1
 }
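For anyone who wants to try the idea outside /etc/rc.subr first, here is a
minimal standalone sketch of the same logic in plain sh. It is not the patch
itself: it matches the headless value with a single case pattern (so yes, YES
or Yes all work), and it adds two hypothetical variables, headless_delay and
headless_marker, which do not exist in rc.conf. They only make the sketch
testable without a 30 second wait or writing to /.

```sh
#!/bin/sh
# Standalone sketch of the proposed stop_boot() (not the rc.subr code itself).
# headless, autoboot and RC_PID are expected to be set by the caller, as in rc.
# headless_delay and headless_marker are HYPOTHETICAL knobs added only so the
# sketch can be tried without a 30s wait or writing to /.

stop_boot()
{
	# One case pattern accepts yes, YES, Yes, ... instead of two [ ] tests.
	case "$headless" in
	[Yy][Ee][Ss])
		echo "WARNING: BOOT *SHOULD* HAVE BEEN STOPPED"
		echo "Resuming boot sequence in ${headless_delay:-30}s, the system may be unusable."
		sleep "${headless_delay:-30}"
		# Leave a marker file as a hint for whoever logs in later.
		touch "${headless_marker:-/CHECK_BOOT_LOG.warn}"
		# Return (status 0) so the calling rc.d script carries on.
		return 0
		;;
	esac

	# Not headless: original behaviour, abort /etc/rc when autobooting.
	if [ "$autoboot" = yes ]; then
		echo "ERROR: ABORTING BOOT (sending SIGTERM to parent)!"
		kill -TERM "${RC_PID}"
	fi
	exit 1
}
```

To try it, source the function into a shell, set headless=yes and
headless_delay=0, and call stop_boot: it should print the warning, create the
marker and return 0. With headless unset and autoboot=no it exits with status
1, so run that case in a subshell, e.g. (stop_boot), to keep your shell alive.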
Happy end of year with your unstoppable NetBSD systems ;)
Kind regards,
Mat.