convert rst to asciidoc
for i in *.rst ; do pandoc -f rst -t asciidoc -o `basename $i .rst`.adoc $i ;done
This commit is contained in:
352
doc/admin.adoc
Normal file
352
doc/admin.adoc
Normal file
@@ -0,0 +1,352 @@
|
||||
== System Administration
|
||||
|
||||
=== Services on a running system
|
||||
|
||||
Liminix services are built on s6-rc, which is itself layered on s6.
|
||||
Services are defined at build time in your configuration (see
|
||||
`+configuration-services+` for information) and can't be added
|
||||
to/changed at runtime, but to monitor events or diagnose problems you
|
||||
may need to inspect them on the running system. Here are some of the
|
||||
most commonly used s6,-rc commands:
|
||||
|
||||
.Service management quick reference
|
||||
[width="100%",cols="55%,45%",options="header",]
|
||||
|===
|
||||
|What |How
|
||||
|List all running services |`+s6-rc -a list+`
|
||||
|
||||
|List all services that are *not* running |`+s6-rc -da list+`
|
||||
|
||||
|List services that `+wombat+` depends on
|
||||
|`+s6-rc-db dependencies wombat+`
|
||||
|
||||
|... transitively |`+s6-rc-db all-dependencies wombat+`
|
||||
|
||||
|List services that depend on service `+wombat+`
|
||||
|`+s6-rc-db -d dependencies wombat+`
|
||||
|
||||
|... transitively |`+s6-rc-db -d all-dependencies wombat+`
|
||||
|
||||
|Stop service `+wombat+` and everything depending on it
|
||||
|`+s6-rc -d change wombat+`
|
||||
|
||||
|Start service `+wombat+` (but not any services depending on it)
|
||||
|`+s6-rc -u change wombat+`
|
||||
|
||||
|Start service `+wombat+` and all* services depending on it
|
||||
|`+s6-rc-up-tree wombat+`
|
||||
|===
|
||||
|
||||
`+s6-rc-up-tree+` brings up a service and all services that depend on
|
||||
it, except for any services that depend on a "controlled" service that
|
||||
is not currently running. Controlled services are not started at boot
|
||||
time but in response to external events (e.g. plugging in a particular
|
||||
piece of hardware) so you probably don't want to be starting them by
|
||||
hand if the conditions aren't there.
|
||||
|
||||
A service may be *up* or *down* (there are no intermediate states like
|
||||
"started" or "stopping" or "dying" or "cogitating"). Some (but not all)
|
||||
services have "readiness" notifications: the dependents of a service
|
||||
with a readiness notification won't be started until the service signals
|
||||
(by writing to a nominated file descriptor) that it's prepared to start
|
||||
work. Most services defined by Liminix also have a `+timeout-up+`
|
||||
parameter, which means that if a service has readiness notifications and
|
||||
doesn't become ready in the allotted time (defaults 20 seconds) it will
|
||||
be terminated and its state set to *down*.
|
||||
|
||||
If the process providing a service dies, it will be restarted
|
||||
automatically. Liminix does not automatically set it to *down*.
|
||||
|
||||
(If the process providing a service dies without ever notifying
|
||||
readiness, Liminix will restart it as many times as it has to until the
|
||||
timeout period elapses, and then stop it and mark it down.)
|
||||
|
||||
==== Controlled services
|
||||
|
||||
*Controlled* services are those which are started/stopped on demand by a
|
||||
*controller* (another service) instead of being started at boot time.
|
||||
For example:
|
||||
|
||||
* `+svc.uevent-rule.build+` creates a controlled service which is active
|
||||
when a particular hardware device (identified by uevent/sysfs directory)
|
||||
is present.
|
||||
* `+svc.round-robin.build+` creates a service controller that invokes
|
||||
two or more services in turn, running the next one when the process
|
||||
providing the previous one exits. We use this for failover from one
|
||||
network connection to a backup connection, for example.
|
||||
* `+svc.health-check.build+` creates a service controller that runs a
|
||||
controlled service and periodically tests whether it is healthy by
|
||||
running an external health check command or script. If the check command
|
||||
repeatedly fails, the controlled service is restarted.
|
||||
+
|
||||
The Configuration section of the manual describes controlled services in
|
||||
more detail. Some operational considerations
|
||||
* `+round-robin+` detects a service status by looking at its `+outputs+`
|
||||
directory, so it won't work unless the service creates some outputs.
|
||||
This is considered a bug and will be fixed in a future release
|
||||
* `+health-check+` works for longruns but not for oneshots, as it
|
||||
internally relies on `+s6-svc+` to restart the process
|
||||
|
||||
==== Logs
|
||||
|
||||
Logs for all services are collated into `+/run/log/current+`. The log
|
||||
file is rotated when it reaches a threshold size, into another file in
|
||||
the same directory whose name contains a TAI64 timestamp.
|
||||
|
||||
Each log line is prefixed with a TAI64 timestamp and the name of the
|
||||
service, if it is a longrun. If it is a oneshot, a timestamp and the
|
||||
name of some other service. To convert the timestamp into a
|
||||
human-readable format, use `+s6-tai64nlocal+`.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
# ls -l /run/log/
|
||||
-rw-r--r-- 1 0 lock
|
||||
-rw-r--r-- 1 0 state
|
||||
-rwxr--r-- 1 98059 @4000000000025cb629c311ac.s
|
||||
-rwxr--r-- 1 98061 @40000000000260f7309c7fb4.s
|
||||
-rwxr--r-- 1 98041 @40000000000265233a6cc0b6.s
|
||||
-rwxr--r-- 1 98019 @400000000002695d10c06929.s
|
||||
-rwxr--r-- 1 98064 @4000000000026d84189559e0.s
|
||||
-rwxr--r-- 1 98055 @40000000000271ce1e031d91.s
|
||||
-rwxr--r-- 1 98054 @400000000002760229733626.s
|
||||
-rwxr--r-- 1 98104 @4000000000027a2e3b6f4e12.s
|
||||
-rwxr--r-- 1 98023 @4000000000027e6f0ed24a6c.s
|
||||
-rw-r--r-- 1 42374 current
|
||||
|
||||
# tail -2 /run/log/current
|
||||
@40000000000284f130747343 wan.link.pppoe Connect: ppp0 <--> /dev/pts/0
|
||||
@40000000000284f230acc669 wan.link.pppoe sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x667a9594> <pcomp> <accomp>]
|
||||
# tail -2 /run/log/current | s6-tai64nlocal
|
||||
1970-01-02 21:51:45.828598156 wan.link.pppoe sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x667a9594> <pcomp> <accom
|
||||
p>]
|
||||
1970-01-02 21:51:48.832588765 wan.link.pppoe sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0x667a9594> <pcomp> <accom
|
||||
p>]
|
||||
----
|
||||
|
||||
===== Log persistence
|
||||
|
||||
Logs written to `+/run/log/+` will not survive a reboot or crash, as it
|
||||
is an ephemeral filesystem.
|
||||
|
||||
On supported hardware you can enable logging to
|
||||
https://www.kernel.org/doc/Documentation/ABI/testing/pstore[pstore]
|
||||
which means the most recent log messages will be preserved on reboot.
|
||||
Set the config option `+logging.persistent.enable = true+` to log
|
||||
messages to `+/dev/pmsg0+` as well as to the regular log. This is a
|
||||
circular buffer, so when it fills up newer messages will overwrite the
|
||||
oldest messages.
|
||||
|
||||
Logs found in pstore after a reboot will be moved at startup to
|
||||
`+/run/log/previous-boot+`
|
||||
|
||||
=== Updating an installed system
|
||||
|
||||
If your system has a writable root filesystem (JFFS2, btrfs etc
|
||||
-anything but squashfs), we have mechanisms for in-places updates
|
||||
analogous to `+nixos-rebuild+`, but the operation is a bit different
|
||||
because it expects to run on a build machine and then copy to the host
|
||||
device using `+ssh+`.
|
||||
|
||||
To use this, build the `+outputs.updater+` target and then run the
|
||||
`+update.sh+` script it generates.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
nix-build -I liminix-config=./my-configuration.nix \
|
||||
--arg device "import ./devices/mydevice" \
|
||||
-A outputs.updater
|
||||
./result/bin/update.sh root@the-device
|
||||
----
|
||||
|
||||
The update script uses min-copy-closure to copy new or changed packages
|
||||
to the device, then (perhaps) reboots it. The reboot behaviour can be
|
||||
affected by flags:
|
||||
|
||||
* [.title-ref]#--no-reboot# will cause it not to reboot at all, if you
|
||||
would rather do that yourself. Note that none of the newly-installed or
|
||||
updated services will be running until you do.
|
||||
* [.title-ref]#--fast# causes it tn not do a full reboot, but instead to
|
||||
restart only the services that have been changed. This will restart all
|
||||
of the services that have updated store paths (and anything that depends
|
||||
on them), but will not affect services that haven't changed.
|
||||
|
||||
It doesn't delete old packages automatically: to do that run
|
||||
`+min-collect-garbage+`, which will delete any packages not in the
|
||||
current system closure. Note that Liminix does not have the NixOS
|
||||
concept of environments or generations, and there is no way back from
|
||||
this except for building the previous configuration again.
|
||||
|
||||
==== Caveats
|
||||
|
||||
* it needs there to be enough free space on the device for all the new
|
||||
packages in addition to all the packages already on it - which may be a
|
||||
problem if there is little flash storage or if a lot of things have
|
||||
changed (e.g. a new version of nixpkgs).
|
||||
* it may not be able to upgrade the kernel: this is device-dependent. If
|
||||
your device boots from a kernel image on a raw MTD partition or or UBI
|
||||
volume, update.sh is unable to alter the kernel partition. If your
|
||||
device boots from a kernel inside the filesystem (e.g. using
|
||||
bootloader.extlinux or bootloder.fit) then the kernel will be upgraded
|
||||
along with the userland
|
||||
|
||||
==== Recovery/downgrades
|
||||
|
||||
The `+update.sh+` script also creates a timestamped symlink on the
|
||||
device which points to the system configuration it installs. If you
|
||||
install a configuration that doesn't work, you can revert to any other
|
||||
installed configuration by
|
||||
|
||||
[arabic]
|
||||
. booting to some kind of rescue or recovery system (which may be some
|
||||
vendor-provided rescue option, or your own recovery system perhaps based
|
||||
on `+examples/recovery.nix+`) and mounting your Liminix filesystem on
|
||||
/mnt
|
||||
. picking another previously-installed configuration that _link:[did]
|
||||
work, and switching back to it:
|
||||
|
||||
[source,console]
|
||||
----
|
||||
# ls -ld /mnt/*configuration
|
||||
lrwxrwxrwx 1 90 /mnt/20252102T182104.configuration -> nix/store/v1w0h4zw65ah4c2r0k7nyy125qrxhq78-system-configuration-aarch64-unknown-linux-musl
|
||||
lrwxrwxrwx 1 90 /mnt/20251802T181822.configuration -> nix/store/wqjl9s9xljl2wg8257292zghws9ssidk-system-configuration-aarch64-unknown-linux-musl
|
||||
# : 20251802T181822 is the working system, so reinstall it
|
||||
# /mnt/20251802T181822.configuration/bin/install /mnt
|
||||
# umount /mnt
|
||||
# reboot
|
||||
----
|
||||
|
||||
This will install the previous configuration's activation binary into
|
||||
/bin, and copy its kernel and initramfs into /boot. Note that it depends
|
||||
on the previous system not having been garbage-collected.
|
||||
|
||||
[[levitate]]
|
||||
==== Adding packages
|
||||
|
||||
If you simply wish to add a package without any change to services, you
|
||||
can call `+min-copy-closure+` directly to install any package in Nixpkgs
|
||||
or in the Liminix overlay
|
||||
|
||||
[source,console]
|
||||
----
|
||||
nix-build -I liminix-config=./my-configuration.nix \
|
||||
--arg device "import ./devices/mydevice" -A pkgs.tcpdump
|
||||
|
||||
nix-shell -p min-copy-closure root@the-device result/
|
||||
----
|
||||
|
||||
Note that this only copies the package and its dependencies to the
|
||||
device: it doesn't update any profile to add it to `+$PATH+`
|
||||
|
||||
[[rebuilding the system]]
|
||||
=== Reinstalling on a running system
|
||||
|
||||
Liminix is initially installed from a monolithic `+firmware.bin+` - and
|
||||
unless you're running a writable filesystem, the only way to update it
|
||||
is to build and install a whole new `+firmware.bin+`. However, you
|
||||
probably would prefer not to have to remove it from its installation
|
||||
site, unplug it from the network and stick serial cables in it all over
|
||||
again.
|
||||
|
||||
It is not (generally) safe to install a new firmware onto the flash
|
||||
partitions that the active system is running on. To address this we have
|
||||
`+levitate+`, which a way for a running Liminix system to "soft restart"
|
||||
into a ramdisk running only a limited set of services, so that the main
|
||||
partitions can then be safely flashed.
|
||||
|
||||
==== Configuration
|
||||
|
||||
Levitate _needs to be configured when you create the initial system_ to
|
||||
specify which services/packages/etc to run in maintenance mode. Most
|
||||
likely you want to configure a network interface and an ssh for example
|
||||
so that you can login to reflash it.
|
||||
|
||||
[source,nix]
|
||||
----
|
||||
defaultProfile.packages = with pkgs; [
|
||||
...
|
||||
(levitate.override {
|
||||
config = {
|
||||
services = {
|
||||
inherit (config.services) dhcpc sshd watchdog;
|
||||
};
|
||||
defaultProfile.packages = [ mtdutils ];
|
||||
users.root = config.users.root;
|
||||
};
|
||||
})
|
||||
];
|
||||
----
|
||||
|
||||
==== Use
|
||||
|
||||
Connect (with ssh, probably) to the running Liminix system that you wish
|
||||
to upgrade.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
bash$ ssh root@the-device
|
||||
----
|
||||
|
||||
Run `+levitate+`. This takes a little while (perhaps a few tens of
|
||||
seconds) to execute, and copies all config required for maintenance mode
|
||||
to `+/run/maintenance+`.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
# levitate
|
||||
----
|
||||
|
||||
Reboot into maintenance mode. You will be logged out
|
||||
|
||||
[source,console]
|
||||
----
|
||||
# reboot
|
||||
----
|
||||
|
||||
Connect to the device again - note that the ssh host key will have
|
||||
changed.
|
||||
|
||||
[source,console]
|
||||
----
|
||||
# ssh -o UserKnownHostsFile=/dev/null root@the-device
|
||||
----
|
||||
|
||||
Check we're in maintenance mode
|
||||
|
||||
[source,console]
|
||||
----
|
||||
# cat /etc/banner
|
||||
|
||||
LADIES AND GENTLEMEN WE ARE FLOATING IN SPACE
|
||||
|
||||
Most services are disabled. The system is operating
|
||||
with a ram-based root filesystem, making it safe to
|
||||
overwrite the flash devices in order to perform
|
||||
upgrades and maintenance.
|
||||
|
||||
Don't forget to reboot when you have finished.
|
||||
----
|
||||
|
||||
Perform the upgrade, using flashcp. This is an example, your device will
|
||||
differ
|
||||
|
||||
[source,console]
|
||||
----
|
||||
# cat /proc/mtd
|
||||
dev: size erasesize name
|
||||
mtd0: 00030000 00010000 "u-boot"
|
||||
mtd1: 00010000 00010000 "u-boot-env"
|
||||
mtd2: 00010000 00010000 "factory"
|
||||
mtd3: 00f80000 00010000 "firmware"
|
||||
mtd4: 00220000 00010000 "kernel"
|
||||
mtd5: 00d60000 00010000 "rootfs"
|
||||
mtd6: 00010000 00010000 "art"
|
||||
# flashcp -v firmware.bin mtd:firmware
|
||||
----
|
||||
|
||||
All done
|
||||
|
||||
[source,console]
|
||||
----
|
||||
# reboot
|
||||
----
|
Reference in New Issue
Block a user