| Thu Sep 22 00:12:42 BST 2022
 | ||
| 
 | ||
| Making quite reasonable progress, though only running under emulation.
 | ||
| Since almost everything so far has been a recap of nixwrt, that's to
 | ||
| be expected.
 | ||
| 
 | ||
| The example config starts some services at boot, or at least attempts
 | ||
| to. Next we should
 | ||
| 
 | ||
|  - add some network config to run-qemu
 | ||
|  - implement udhcp and odhcp properly to write outputs
 | ||
|   and create resolv.conf and all that
 | ||
|  - write some kind of test so we can refactor the crap
 | ||
|  - not let the tests write random junk everywhere
 | ||
| 
 | ||
| Thu Sep 22 12:46:36 BST 2022
 | ||
| 
 | ||
| We can store outputs in the s6 scan directory, it seems:
 | ||
| 
 | ||
| > There is, however, a guarantee that s6-supervise will never touch subdirectories named data or env. So if you need to store user information in the service directory with the guarantee that it will never be mistaken for a configuration file, no matter the version of s6, you should store that information in the data or env subdirectories of the service directory.
 | ||
| 
 | ||
| https://skarnet.org/software/s6/servicedir.html
 | ||
| 
 | ||
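A minimal sketch of what "storing outputs in the service directory" could look like, assuming each output is one small file under the `data` subdirectory that s6-supervise promises not to touch. The function name `write_output` and the one-file-per-output layout are assumptions for illustration, not something the s6 docs prescribe.

```python
import os
import tempfile
from pathlib import Path


def write_output(service_dir, name, value):
    """Write one service output as <service_dir>/data/<name>.

    The value is written to a temporary file in the same directory and
    then renamed into place, so a reader never sees a half-written file.
    """
    data_dir = Path(service_dir) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, prefix=f".{name}.")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(f"{value}\n")
    target = data_dir / name
    os.replace(tmp_path, target)  # atomic within one filesystem
    return target
```

A dhcp client hook could then call something like `write_output(service_dir, "address", lease_ip)`, where both arguments are whatever the hook has to hand.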
| > process 'store/pj0b27l5728cypa5mmagz0q8ibzpik0h-execline-mips-unknown-linux-musl-2.9.0.1-bin/bin/execlineb' started with executable stack
 | ||
| 
 | ||
| https://skarnet.org/lists/skaware/1550.html
 | ||
| 
 | ||
| 
 | ||
| Thu Sep 22 16:14:49 BST 2022
 | ||
| 
 | ||
| what network peers do we want to model for testing?
 | ||
| 
 | ||
| - wan: pppoe
 | ||
| - wan: ip over ethernet, w/ dhcp service provided
 | ||
| - wan: l2tp over (ip over ethernet, w/ dhcp service provided)
 | ||
| - lan: something with a dhcp client
 | ||
| 
 | ||
| https://accel-ppp.readthedocs.io/en/latest/ could use this for testing
 | ||
| pppoe and l2tp?
 | ||
| 
 | ||
| 
 | ||
| Thu Sep 22 22:57:47 BST 2022
 | ||
| 
 | ||
| To build a nixos vm with accel-ppp installed (not yet configured)
 | ||
| 
 | ||
|   nix-build '<nixpkgs/nixos>' -A vm -I nixos-config=./tests/ppp-server-configuration.nix -o ppp-server
 | ||
|   QEMU_OPTS="-display none -serial mon:stdio -nographic" ./ppp-server/bin/run-nixos-vm
 | ||
| 
 | ||
| To test it's configured I thought I'd run it against an OpenWrt qemu
 | ||
| install, so, fun with qemu networking ensues. This config in ../openwrt-qemu
 | ||
| is using two multicast socket networks -
 | ||
| 
 | ||
| nix-shell -p qemu --run "./run.sh ./openwrt-22.03.0-x86-64-generic-kernel.bin openwrt-22.03.0-x86-64-generic-ext4-rootfs.img "
 | ||
| 
 | ||
| so hopefully we can spin up other VMs connected either to its lan or
 | ||
| its wan: *however* we do first need to configure its wan to use pppoe
 | ||
| 
 | ||
| uci set network.wan=interface
 | ||
| uci set network.wan.device='eth1'
 | ||
| uci set network.wan.proto='pppoe'
 | ||
| uci set network.wan.username='db123@a.1'
 | ||
| uci set network.wan.password='NotReallyTheSecret'
 | ||
| 
 | ||
| (it's ext4 so this will probably stick)
 | ||
| 
 | ||
| 
 | ||
| Fri Sep 23 10:27:22 BST 2022
 | ||
| 
 | ||
| * mcast=230.0.0.1:1234  : access (interconnect between router and isp)
 | ||
| * mcast=230.0.0.1:1235  : lan
 | ||
| * mcast=230.0.0.1:1236  : world (the internet)
 | ||
| 
 | ||
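A small sketch of turning the three socket networks above into qemu arguments, so every VM agrees on which multicast port means what. The `-netdev socket,mcast=` and `-device ...,netdev=` options are standard qemu; the helper name `qemu_net_args` and the default NIC model are assumptions and may not match what run-qemu actually does.

```python
# Socket networks from the notes: name -> multicast group and port.
NETWORKS = {
    "access": "230.0.0.1:1234",  # interconnect between router and isp
    "lan": "230.0.0.1:1235",
    "world": "230.0.0.1:1236",   # the internet
}


def qemu_net_args(names, model="virtio-net-pci"):
    """Return qemu arguments attaching one NIC per named network."""
    args = []
    for name in names:
        args += [
            "-netdev", f"socket,id={name},mcast={NETWORKS[name]}",
            "-device", f"{model},netdev={name}",
        ]
    return args
```

For the router under test, `qemu_net_args(["access", "lan"])` would give it one interface on each side; an emulated ISP would get `["access", "world"]`.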
| 
 | ||
| Sun Sep 25 20:56:28 BST 2022
 | ||
| 
 | ||
| TODO - bugs, missing bits, other infelicities as they occur to me:
 | ||
| 
 | ||
| DONE 1) shutdown doesn't work as it's using the busybox one not s6.
 | ||
| 
 | ||
| 2) perhaps we shouldn't have process-based services like dhcp, ppp
 | ||
| implement "address provider interface" - instead have a separate
 | ||
| service for interface address that depends on the service and uses its
 | ||
| output
 | ||
| 
 | ||
| * ppp is not like dhcp because dhcp finds addresses for an existing
 | ||
|   interface but ppp makes a new one
 | ||
| 
 | ||
| 3) when I killed ppp it restarted, but I don't think it reran
 | ||
| defaultroute which is supposed to depend on it. (Might be important
 | ||
| e.g. if we'd been assigned a different IP address). Investigate
 | ||
| semantics of s6-rc service dependencies
 | ||
| 
 | ||
| DONE 4) make the pppoe test run unattended
 | ||
| 
 | ||
| 5) write a test for udhcp
 | ||
| 
 | ||
| 6) squashfs size is ~ 14MB for a configuration with not much in it,
 | ||
| look for obvious wastes of space
 | ||
| 
 | ||
| 7) some of the pppoe config should be moved into a ppp service
 | ||
| 
 | ||
| 8) some of configuration.nix (e.g. defining routes) should be moved into
 | ||
| tools
 | ||
| 
 | ||
| DONE 9) split tools up instead of having it all one file
 | ||
| 
 | ||
| 10) is it OK to depend on squashfs pseudofiles if we might want to
 | ||
| switch to ubifs? will there always be a squashfs underneath? might
 | ||
| we want to change the pseudofiles in an overlay?
 | ||
| 
 | ||
| 11) haven't done (overlayfs) overlays at all
 | ||
| 
 | ||
| 12) overlay.nix needs splitting up
 | ||
| 
 | ||
| 13) upgrade ppp to something with an ipv6-up-script option
 | ||
| 
 | ||
| 14) add ipv6 support generally
 | ||
| 
 | ||
| 15) "ip address add" seems to magically recognise v4 vs v6 but
 | ||
| is that specified or fluke?
 | ||
| 
 | ||
| 16) tighten up the module specs. (DONE) services.foo should be a s6-rc
 | ||
| service, (DONE) kernel config should be checked in some way
 | ||
| 
 | ||
| DONE 17) rename nixwrt references in kernel builder
 | ||
| 
 | ||
| 18) maybe stop suffixing all the service names with .service
 | ||
| 
 | ||
| 19) syslogd - use busybox or s6?
 | ||
| 
 | ||
| chat -s -S ogin:--ogin: root / "ip address show dev ppp0 | grep ppp0" 192.168.100.1  "/nix/store/*-s6-linux-init-*/bin/s6-linux-init-hpr -p"
 | ||
| 
 | ||
| 
 | ||
| Working towards a general goal of having a derivation we can
 | ||
| usefully run `nix path-info` on - or some other tool that will
 | ||
| tell us what's making the images big. The squashfs doesn't
 | ||
| have this information.
 | ||
| 
 | ||
| Towards that end (really? can't remember how ...) what would be a
 | ||
| way for packages to declare "I want to add files to /etc"? Is that
 | ||
| even a good idea?
 | ||
| 
 | ||
| Thinking we should turn s6-init-files back into a real derivation.
 | ||
| 
 | ||
| Tue Sep 27 00:31:45 BST 2022
 | ||
| 
 | ||
| > Thinking we should turn s6-init-files back into a real derivation.
 | ||
| 
 | ||
| This turns out to be Not That Simple, because it contains weird shit
 | ||
| (sticky bits and fifos).
 | ||
| 
 | ||
| Tue Sep 27 09:50:44 BST 2022
 | ||
| 
 | ||
| * allow modules to register activation scripts that are run on the
 | ||
| root filesystem once all packages are installed
 | ||
| 
 | ||
|   - do they run on build or on host? if we're upgrading in place
 | ||
|   how do we ship filesystem changes to the host?
 | ||
| 
 | ||
| or:
 | ||
| 
 | ||
| * allow modules to declare environment.*, use pseudofile on build and
 | ||
| create real files on host. will need to keep the implementation on
 | ||
|   host fairly simple because restricted environment
 | ||
| 
 | ||
| Tue Sep 27 16:14:18 BST 2022
 | ||
| 
 | ||
| TODO list is getting both longer and shorter, though longer on
 | ||
| average.
 | ||
| 
 | ||
| 2) perhaps we shouldn't use process-based services like [ou]dhcp as
 | ||
| queryable endpoint for interface addresses (e.g. when adding routes).
 | ||
| Instead have a separate service for interface address that depends on
 | ||
| the *dhcp and uses its output
 | ||
| 
 | ||
| 3) when I killed ppp it restarted, but I don't think it reran
 | ||
| defaultroute which is supposed to depend on it. (Might be important
 | ||
| e.g. if we'd been assigned a different IP address). Investigate
 | ||
| semantics of s6-rc service dependencies
 | ||
| 
 | ||
| 4) figure out a nice way to fit ppp into this model as it actually
 | ||
| creates the interface instead of using an existing unconfigured one
 | ||
| 
 | ||
| 5) write a test for udhcp
 | ||
| 
 | ||
| 7) some of the pppoe config should be moved into a ppp service
 | ||
| 
 | ||
| 11) haven't done (overlayfs) overlays at all
 | ||
| 
 | ||
| 13) upgrade ppp to something with an ipv6-up-script option, move ppp and pppoe derivations into their own files
 | ||
| 
 | ||
| 14) add ipv6 support generally
 | ||
| 
 | ||
| 15) "ip address add" seems to magically recognise v4 vs v6 but
 | ||
| is that specified or fluke?
 | ||
| 
 | ||
| 19) ship logs somehow to log collection system
 | ||
| 
 | ||
| 21) dhcp, dns, hostap service for lan
 | ||
| 
 | ||
| 22) support real hardware
 | ||
| 
 | ||
| Tue Sep 27 22:00:36 BST 2022
 | ||
| 
 | ||
| Found the cause of huge image size: rp-pppoe ships with scripts that
 | ||
| reference build-time packages, so we have x86-64 glibc in there
 | ||
| 
 | ||
| We don't need syslog just to accommodate ppp, there's an underdocumented
 | ||
| option for it to log to a file descriptor
 | ||
| 
 | ||
| Wed Sep 28 16:04:02 BST 2022
 | ||
| 
 | ||
| Based on https://unix.stackexchange.com/a/431953 if we can forge
 | ||
| ethernet packets we might be able to write tests for e.g. "is the vm
 | ||
| running a dhcp server"
 | ||
| 
 | ||
| Wed Sep 28 21:29:05 BST 2022
 | ||
| 
 | ||
| We can use Python "scapy" to generate dhcp request packets, and Python
 | ||
| 'socket' module to send them encapsulated in UDP. Win
 | ||
| 
 | ||
| It's extremely janky python
 | ||
| 
 | ||
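For comparison, a standard-library-only sketch of the same idea, so it is not the scapy script referred to above: build a DHCPDISCOVER by hand, wrap it in UDP/IP/Ethernet headers, and send the whole frame as one datagram to the "lan" multicast socket. It assumes qemu's mcast socket backend carries one raw ethernet frame per UDP datagram; the function names and the MAC address in the usage note are made up.

```python
import socket
import struct

DHCP_MAGIC = b"\x63\x82\x53\x63"


def ip_checksum(header):
    """RFC 1071 ones' complement sum over an even-length header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def dhcp_discover(mac, xid):
    """Return a 300-byte BOOTP/DHCP payload asking for a broadcast reply."""
    bootp = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1, 1, 6, 0,          # op=BOOTREQUEST, htype=ethernet, hlen, hops
        xid, 0, 0x8000,      # transaction id, secs, flags (broadcast)
        bytes(4), bytes(4), bytes(4), bytes(4),  # ciaddr yiaddr siaddr giaddr
        mac, bytes(64), bytes(128),              # chaddr, sname, file
    )
    options = DHCP_MAGIC + bytes([53, 1, 1, 255])  # message type=DISCOVER, end
    return bootp + options.ljust(64, b"\0")


def ethernet_frame(mac, payload):
    """Wrap a DHCP payload in UDP (68 -> 67), IPv4 and Ethernet broadcast."""
    udp = struct.pack("!HHHH", 68, 67, 8 + len(payload), 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp), 0, 0,
                     64, 17, 0, bytes(4), b"\xff" * 4)
    ip = ip[:10] + struct.pack("!H", ip_checksum(ip)) + ip[12:]
    return b"\xff" * 6 + mac + struct.pack("!H", 0x0800) + ip + udp


def send_to_lan(frame, group="230.0.0.1", port=1235):
    """Send one frame to the 'lan' socket network (port taken from the notes)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(frame, (group, port))
```

A test would call `send_to_lan(ethernet_frame(mac, dhcp_discover(mac, xid)))` with some locally administered MAC and then listen on the same group for a DHCPOFFER.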
| Thu Sep 29 15:24:37 BST 2022
 | ||
| 
 | ||
| Two points to ponder
 | ||
| 
 | ||
| 1) where service config depends on outputs of other services, we
 | ||
| do that rather ugly "$(cat ${output ....})" construct. Can we improve on
 | ||
| that? Maybe we could have some kind of tooling to read them as environment
 | ||
| variables ...
 | ||
| 
 | ||
| 2) we have given no consideration yet to secrets. we want the secrets to
 | ||
| be not in the store; we want some way of refreshing them when they change
 | ||
| 
 | ||
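On the first point, one possible shape for "read them as environment variables": a helper that turns a directory of output files into an environment mapping, much as s6-envdir does for an env directory. This is a sketch under assumptions; the name `outputs_as_env`, the one-file-per-output layout and the upper-casing rule are all hypothetical.

```python
import os
from pathlib import Path


def outputs_as_env(outputs_dir, prefix=""):
    """Map each file in outputs_dir to PREFIX + NAME = contents.

    File names are upper-cased and '-' becomes '_' so they are usable
    as shell variable names; a trailing newline is stripped.
    """
    env = {}
    for entry in sorted(Path(outputs_dir).iterdir()):
        if entry.is_file():
            key = prefix + entry.name.upper().replace("-", "_")
            env[key] = entry.read_text().rstrip("\n")
    return env


def child_environment(outputs_dir, prefix=""):
    """Current environment plus the outputs, ready to pass to a child."""
    return {**os.environ, **outputs_as_env(outputs_dir, prefix)}
```

A service script could then refer to `$PPP_ADDRESS` instead of spelling out the `$(cat ...)` for each value.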
| Sat Oct  1 14:24:21 BST 2022
 | ||
| 
 | ||
| The MAC80211_HWSIM kernel config creates virtual wlan[01] devices
 | ||
| which hostapd will work with, and a hwsim0 which we can use to monitor
 | ||
| (though not inject) traffic. Could we use this for wifi tests? How do
 | ||
| we make the guest hwsim0 visible to the host?
 | ||
| 
 | ||
| 
 | ||
| Sat Oct  1 18:41:31 BST 2022
 | ||
| 
 | ||
| virtual serial ports: I struggled with qemu for ages to get this to work.
 | ||
| You also need the unhelpfully named CONFIG_VIRTIO_CONSOLE option in
 | ||
| kconfig
 | ||
| 
 | ||
| QEMU_OPTIONS="-nodefaults  -chardev socket,path=/tmp/wlan,server=on,wait=off,id=wlan  -device virtio-serial-pci -device virtserialport,name=wlan,chardev=wlan"
 | ||
| 
 | ||
| Sun Oct  2 09:34:48 BST 2022
 | ||
| 
 | ||
| We could implement the secrets store as a service, then the secrets
 | ||
| are outputs.
 | ||
| 
 | ||
| Things we can do in qemu
 | ||
| 
 | ||
| 1) make interface address service that depends on dhcp, instead of
 | ||
|   being set by it directly
 | ||
| 2) check out restart behaviour of dependent services when depended-on
 | ||
|   service dies
 | ||
| 3) pppd _creates_ an interface, work out how to fit it into this model
 | ||
| 5) add bridge support for lan
 | ||
| 8) upgrade ppp to something with an ipv6-up-script option, move ppp and pppoe derivations into their own files
 | ||
| 9) get ipv6 address from pppoe
 | ||
| 10) get ipv6 delegation from pppoe and add prefix to lan
 | ||
| 11) support dhcp6 in dnsmasq, and advertise prefix on lan
 | ||
| 12) firewalling and nat
 | ||
|  - default deny or zero trust?
 | ||
| 14) write secrets holder as a service with outputs
 | ||
| 20) should we check that references to outputs actually correspond with
 | ||
|   those provided by a service
 | ||
| 
 | ||
| Things we probably do on hardware
 | ||
| 
 | ||
| 6) writable filesystem (ubifs?)
 | ||
| 7) overlay with squashfs/ubifs - useful? think about workflows for
 | ||
| how this thing is installed
 | ||
| 16) gl-ar750
 | ||
| 17) mediatek device - gl-mt300 or whatever I have lying around
 | ||
| 18) some kind of arm (banana pi router?)
 | ||
| 19) should we give routeros a hardware ethernet and maybe an l2tp upstream,
 | ||
|  then we could dogfood the hardware devices.  we could run an l2tp service
 | ||
|  at mythic-beasts, got a /48 there
 | ||
| 
 | ||
| 
 | ||
| 
 | ||
| https://skarnet.org/software/s6/s6-fghack.html looks like a handy thing
 | ||
| we hope we'll never have to use
 | ||
| 
 | ||
| Sun Oct  2 22:22:17 BST 2022
 | ||
| 
 | ||
| > make interface address service that depends on dhcp, instead of being set by it directly
 | ||
| 
 | ||
| We can do this for dhcp, but we can't do it for ppp. Running the ppp service
 | ||
| creates a ppp[012n] interface and assigns it an ipv4 address and there's not
 | ||
| a whole lot we can easily do to unbundle that.
 | ||
| 
 | ||
| So
 | ||
| 
 | ||
| - the ppp service needs to behave as if it were a "link" service
 | ||
| - either it *also* needs to behave as an address service, or we could
 | ||
|   have an address service that subscribes to it and does nothing other than
 | ||
|   translate output formats
 | ||
| 
 | ||
| Note regarding that second bullet: at the moment the static address
 | ||
| service has no outputs anyway!
 | ||
| 
 | ||
| 
 | ||
| Tue Oct  4 22:43:02 BST 2022
 | ||
| 
 | ||
| While trying to make the TFTP workflow not awful I seem to have written
 | ||
| a TFTP server.
 | ||
| 
 | ||
| 
 | ||
| Thu Oct  6 19:26:40 BST 2022
 | ||
| 
 | ||
| We have a booting kernel on gl-ar750, but we aren't at a point that it can
 | ||
| find a root filesystem
 | ||
| 
 | ||
| I'd *like* to be able to use the same delivery mechanism (kernel uimage
 | ||
| concatenated monolithic
 | ||
| 
 | ||
| 
 | ||
| Sat Oct  8 11:12:09 BST 2022
 | ||
| 
 | ||
| We have it booting on hardware, mounting root fs, running getty :-)
 | ||
| 
 | ||
| For NixWRT TFTP boots we used a single image with both kernel and squashfs, and
 | ||
| relied on CONFIG_MTD_SPLIT_FIRMWARE to identify where the boundary was and create
 | ||
| /dev/mtdN devices at the right offsets so that the kernel could find the
 | ||
| squashfs
 | ||
| 
 | ||
| For Liminix we're not going to do that.
 | ||
| 
 | ||
| * CONFIG_MTD_SPLIT_FIRMWARE is only available in OpenWrt patches
 | ||
| * it's an uncomfortable level of automagic just to save us doing two TFTPs
 | ||
|   instead of one
 | ||
| * the generated image is anyway not the one we'd write to flash (has unneeded
 | ||
|    PHRAM support)
 | ||
| * it means we need to memmap out enough ram for the whole image inc kernel when really
 | ||
|   all we need to reserve is the rootfs bit
 | ||
| 
 | ||
| 
 | ||
| Sat Oct  8 11:23:08 BST 2022
 | ||
| 
 | ||
| "halt" and "reboot" don't work on gl-ar750
 | ||
| 
 | ||
| Sat Oct  8 13:10:00 BST 2022
 | ||
| 
 | ||
| Where do we go with this ar750?
 | ||
| 
 | ||
| - wired networking
 | ||
| - wifi
 | ||
| 
 | ||
| 
 | ||
| Sun Oct  9 09:57:35 BST 2022
 | ||
| 
 | ||
| We want to be able to package kernel modules as regular derivations, so that
 | ||
| they get added to the filesystem
 | ||
| 
 | ||
| This means they need access to kernel.modulesupport
 | ||
| 
 | ||
| This means  kernel.modulesupport needs to be in pkgs too?
 | ||
| 
 | ||
| This is fine, probably, but we'd like to avoid closing over vmlinux because
 | ||
| there's no need for it to be in the filesystem
 | ||
| 
 | ||
| Mon Oct 10 22:57:23 BST 2022
 | ||
| 
 | ||
| The problem is that kernel kconfig options are manipulated in the
 | ||
| liminix modules, which means that data must be (transitively) available
 | ||
| to modules, so they can't be regular packages as they're tied so tightly
 | ||
| to the exact config. Unless we define a second overlay that references
 | ||
| the configuration object, but my head hurts when I start to think about that
 | ||
| so maybe not.
 | ||
| 
 | ||
| Tue Oct 11 00:00:13 BST 2022
 | ||
| 
 | ||
| Building ag71xx (ethernet driver) as a module doesn't work because
 | ||
| it references a symbol ath79_pll_base in the kernel that hasn't been
 | ||
| marked with EXPORT_SYMBOL.
 | ||
| 
 | ||
| We could forge an object file that "declares" it with a gross and disgusting hack like this
 | ||
| 
 | ||
| $ echo > empty # not actually "empty", objcopy complains about that
 | ||
| $ grep ath79_pll_base /nix/store/jcc114cd13xa8aa4mil35rlnmxnlmv09-vmlinux-mips-unknown-linux-musl-modulesupport/System.map
 | ||
| ffffffff807b2094 B ath79_pll_base
 | ||
| $ mips-unknown-linux-musl-objcopy   -I binary -O elf32-big --add-section .bss=empty  --add-symbol ath79_pll_base=.bss:0x807b2094  empty f.o
 | ||
| 
 | ||
| I don't claim this is a good idea, just an idea. Thought was that we would not
 | ||
| have to declare its type this way. Also it might not work with kaslr
 | ||
| https://stackoverflow.com/a/68903503
 | ||
| 
 | ||
| 
 | ||
| Backstory: why are we trying to build this as a module? because the
 | ||
| openwrt fork of it seems to be a bit more advanced than the mainline,
 | ||
| and I *suspect* that the mainline version doesn't work with our
 | ||
| openwrt-based device tree which has the mdio as a nested node inside
 | ||
| the ag71xx node - in mainline the driver seems to have all the mdio
 | ||
| stuff inline. So, could we build the openwrt driver without patching
 | ||
| the crap out of our kernel
 | ||
| 
 | ||
| Sun Oct 16 15:25:33 BST 2022
 | ||
| 
 | ||
| Executive decision: let's use the openwrt kernel (at least for
 | ||
| gl-ar750).  Mainline kernel doesn’t have devicetree support for this
 | ||
| device or the SoC it’s based on, and the OpenWrt dts for it doesn’t
 | ||
| have the same "compatible"s, which makes me think that an indefinite
 | ||
| amount of patching will be necessary to make dts/modules for one of
 | ||
| them work with a kernel for the other
 | ||
| 
 | ||
| As a result: now we have eth0 appearing, but not eth1?  Guessing we
 | ||
| need to add some kconfig for the switch
 | ||
| 
 | ||
| Mon Oct 17 21:23:37 BST 2022
 | ||
| 
 | ||
| we are spending ridiculous amounts of cpu/io time copying kernel source
 | ||
| trees from place to place, because we have kernel tree preparation
 | ||
| and actual building as two separate derivations.
 | ||
| 
 | ||
| I think the answer is to have a generic kernel build derivation
 | ||
| in the overlay, and then have the device overlays override it with
 | ||
| an additional phase to do openwrt patching or whatever else they
 | ||
| need to do.
 | ||
| 
 | ||
| Tue Oct 18 23:02:43 BST 2022
 | ||
| 
 | ||
| * previous TODO list is Aug 02, need to review
 | ||
| * dts is hardcoded to gl-ar750, that needs cleaning up
 | ||
| * figure out persistent addresses for ethernet
 | ||
| * fix halt/reboot
 | ||
| * "link" services have a "device" attribute, would much rather
 | ||
|   have everything referenced using outputs than having two
 | ||
|   different mechanisms for reading similar things
 | ||
| * Kconfig.local do we still need it?
 | ||
| * check all config instead of differentiating config/checkedConfig
 | ||
| 
 | ||
| Sun Feb  5 18:14:02 GMT 2023
 | ||
| 
 | ||
| We have resumed.
 | ||
| commit eb4efab6a215bf03cf5aab10d4ac909e83e9c148
 | ||
| Author: Daniel Barlow <dan@telent.net>
 | ||
| Date:   Sat Jan 28 23:18:28 2023 +0000
 | ||
| 
 | ||
| 
 | ||
| * find out what works
 | ||
| * add that stuff to hydra
 | ||
| * fix the rest
 | ||
| * add that stuff to hydra
 | ||
| * convert to flake
 | ||
| * check if routeros can be run interactively
 | ||
| * some per-device docs in a form that can be transcluded for website
 | ||
| 
 | ||
| 
 | ||
| ci builds
 | ||
| 
 | ||
| * each of the tests has hardcoded device/config/etc
 | ||
| * build an "empty" configuration for each target device
 | ||
| * build an unstable configuration for qemu
 | ||
| 
 | ||
| 
 | ||
| Wed Feb  8 16:52:22 GMT 2023
 | ||
| 
 | ||
| We have hydra builds for all the previously-working devices, though we
 | ||
| don't yet know if any of those builds actually boots or does anything
 | ||
| useful.
 | ||
| 
 | ||
| [DONE] Would be nice to clean up the run-qemu and connect-qemu scripts
 | ||
| and put them in the buildEnv
 | ||
| 
 | ||
| Some thought needed about how to hook up the gl-ar750 to the internets,
 | ||
| ideally in a way that mirrors typical real uses. AAISP have an L2TP
 | ||
| service, but I would prefer to use pppoe on the device, so how to
 | ||
| translate one to t'other on an intermediary/gateway machine?
 | ||
| https://www.rfc-archive.org/getrfc.php?rfc=3817#gsc.tab=0 exists
 | ||
| as an RFC but I can't find anything that actually implements it
 | ||
| 
 | ||
| Actual Documentation (e.g.  user and developer manuals) should live in
 | ||
| the liminix repo so it corresponds with the code, and can be rsynced
 | ||
| from there to the web site, maybe with a deploy hook or something.
 | ||
| Haven't decided what a good doc format is yet
 | ||
| 
 | ||
| If we create a flake for Hydra to run on, that _more or less_ means we
 | ||
| don't have any manual hydra jobset configuration to document.
 | ||
| 
 | ||
| There are still some tests that need adding to CI
 | ||
| 
 | ||
| [DONE] Should the per-device config be a module not an overlay? Given that
 | ||
| half of what's in it is kernel config (a module could set this)
 | ||
| and the rest is source tarball download specs (needs nixpkgs,
 | ||
| a module has this and could set it too) I wonder why it isn't already
 | ||
| 
 | ||
| [ALREADY DOES] Can we make Hydra report output sizes so we can plot closure size
 | ||
| trends and see if it all goes awful?
 | ||
| 
 | ||
| Thu Feb  9 08:14:39 GMT 2023
 | ||
| 
 | ||
| For better developer experience, I am thinking that we either (1)
 | ||
| swap tasks 2 and 3 (writable filesystem before module system)
 | ||
| or (2) add NBD support so I can iterate on a real device without
 | ||
| full rebuilds every time
 | ||
| 
 | ||
| 
 | ||
| Fri Feb 10 06:18:25 PM GMT 2023
 | ||
| 
 | ||
| did the overlay->module thing
 | ||
| 
 | ||
| [DONE] Need to fix all the configuration around PHRAM, I can't see how it
 | ||
| would ever work
 | ||
| 
 | ||
| Sat Feb 11 14:37:45 GMT 2023
 | ||
| 
 | ||
| Consolidated TODO
 | ||
| 
 | ||
| * figure out persistent addresses for ethernet (?)
 | ||
| [SEEMS DONE] * fix halt/reboot
 | ||
| [DONE, NO] * Kconfig.local do we still need it?
 | ||
| [DONE] * check all config instead of differentiating config/checkedConfig
 | ||
| 
 | ||
| Things we can do in qemu
 | ||
| 
 | ||
| * "link" services have a "device" attribute, would much rather
 | ||
|   have everything referenced using outputs than having two
 | ||
|   different mechanisms for reading similar things
 | ||
| 1) make interface address service that depends on dhcp, instead of
 | ||
|   being set by it directly
 | ||
| 2) check out restart behaviour of dependent services when depended-on
 | ||
|   service dies
 | ||
| 3) pppd _creates_ an interface, work out how to fit it into this model
 | ||
| 5) add bridge support for lan
 | ||
| 8) upgrade ppp to something with an ipv6-up-script option, move ppp and pppoe derivations into their own files
 | ||
| 9) get ipv6 address from pppoe
 | ||
| 10) get ipv6 delegation from pppoe and add prefix to lan
 | ||
| 11) support dhcp6 in dnsmasq, and advertise prefix on lan
 | ||
| 12) firewalling and nat
 | ||
|  - default deny or zero trust?
 | ||
| 14) write secrets holder as a service with outputs
 | ||
| 20) should we check that references to outputs actually correspond with
 | ||
|   those provided by a service
 | ||
| * Actual Documentation (e.g.  user and developer manuals)
 | ||
| * make a flake
 | ||
| * There are still some tests that need adding to CI
 | ||
| 
 | ||
| Things we probably do on hardware
 | ||
| 
 | ||
| [DONE] * dts is hardcoded to gl-ar750, that needs cleaning up
 | ||
| 6) writable filesystem (ubifs?)
 | ||
| 7) overlay with squashfs/ubifs - useful? think about workflows for
 | ||
| how this thing is installed
 | ||
| 16) gl-ar750
 | ||
| [DONE] * decide how to hook up the gl-ar750 to the internets
 | ||
| 17) mediatek device - gl-mt300 or whatever I have lying around
 | ||
| 18) some kind of arm (banana pi router?)
 | ||
| [DONE DIFFERENTLY] 19) should we give routeros a hardware ethernet and maybe an l2tp upstream,
 | ||
|  then we could dogfood the hardware devices.  we could run an l2tp service
 | ||
|  at mythic-beasts, got a /48 there
 | ||
| 
 | ||
| 
 | ||
| Sat Feb 11 15:57:31 GMT 2023
 | ||
| 
 | ||
| The reason we would like to run PPPoE instead of L2TP on the "rotuer" device is
 | ||
| 
 | ||
| - closer to real world scenario
 | ||
| - means no need to run dhcp client on the wan interface before we
 | ||
|    even get to start the l2tpd
 | ||
| 
 | ||
| 
 | ||
| Sun Feb 12 14:57:28 GMT 2023
 | ||
| 
 | ||
| https://github.com/katalix/go-l2tp#kpppoed
 | ||
| 
 | ||
| 
 | ||
| Mon Feb 13 04:44:09 PM GMT 2023
 | ||
| 
 | ||
| if the gl-ar750 is connected to an ethernet card that linux is ignoring,
 | ||
| we're going to have to set up _some_ qemu thing just to run tftp from.
 | ||
| 
 | ||
| Tue Feb 14 17:59:34 GMT 2023
 | ||
| 
 | ||
| We should do a derivation that creates an ISO image and a qemu shell
 | ||
| script based on a configuration.nix, and put it in buildEnv. We'll
 | ||
| call it "borderNetVm" :
 | ||
| 
 | ||
| > A broadband remote access server (BRAS, B-RAS or BBRAS) routes
 | ||
|   traffic to and from broadband remote access devices such as digital
 | ||
|   subscriber line access multiplexers (DSLAM) on an Internet service
 | ||
|   provider's (ISP) network.[1][2] BRAS can also be referred to as a
 | ||
|   broadband network gateway or border network gateway (BNG).[3]
 | ||
| 
 | ||
| (for consistency we should rename the "access" qemu socket network to
 | ||
| match whatever we call this)
 | ||
| 
 | ||
|  rm border.qcow2 ; nix-shell --argstr liminix `pwd`  --argstr nixpkgs `pwd`/../nixpkgs  --argstr unstable `pwd`/../unstable-nixpkgs/ ci.nix -A buildEnv --run "run-border-vm"
 | ||
| 
 | ||
| Wed Feb 15 22:56:59 GMT 2023
 | ||
| 
 | ||
| configuration for border vm needs to come from somewhere so it's good
 | ||
| for more people than just me
 | ||
| 
 | ||
| - pci device for setting up the ethernet
 | ||
| - lns address
 | ||
| - uid so it can do 9p shares? do we need to map things here?
 | ||
| 
 | ||
| also need to document the host-side bits so that people can set up
 | ||
| their spare ethernet as vfio
 | ||
| 
 | ||
| next step for hacking is to figure out what I was doing with pppoe
 | ||
| 
 | ||
| Wed Feb 15 22:59:56 GMT 2023
 | ||
| 
 | ||
| docs ...
 | ||
| 
 | ||
| * introduction
 | ||
| 
 | ||
| * user guide
 | ||
| ** how to build it
 | ||
| ** how to flash it on your device
 | ||
| ** what to put in configuration.nix
 | ||
| ** modules
 | ||
| 
 | ||
| * developer guide
 | ||
| ** building/running with qemu
 | ||
| *** emulated upstream
 | ||
| ** building/running on hardware
 | ||
| *** run in place with TFTP
 | ||
| *** emulated upstream
 | ||
| ** CI
 | ||
| ** Roadmap
 | ||
| ** Contributing
 | ||
| 
 | ||
| 
 | ||
| 
 | ||
|  nix-shell -p sphinx --run "make -C doc html"
 | ||
| 
 | ||
| https://francis.begyn.be/blog/nixos-home-router contains information about avahi reflector
 | ||
| 
 | ||
| 
 | ||
| Fri Feb 17 00:09:34 GMT 2023
 | ||
| 
 | ||
|    29 11.282085831 81.187.76.242 → 8.8.8.8      ICMP 106 Echo (ping) request  id=0x0187, seq=2/512, 4
 | ||
|    30 11.286314642 90.155.53.19 → 81.187.76.242 ICMP 78 Destination unreachable (Communication admin)
 | ||
| 
 | ||
| We're getting packets over the pppoe-l2tp relay thing. Just have to
 | ||
| work out now why we're not routing
 | ||
| 
 | ||
| Fri Feb 17 16:54:41 GMT 2023
 | ||
| 
 | ||
| Haha.  We weren't routing because we'd used the wrong CHAP password
 | ||
| 
 | ||
| 
 | ||
| 
 | ||
| Fri Feb 17 16:58:27 GMT 2023
 | ||
| 
 | ||
| This TODO is for nlnet task 1 and for bits of subsequent tasks that
 | ||
| are annoying enough that I might poke at them anyway:
 | ||
| 
 | ||
| 
 | ||
| 1) gl-ar750, why do we get "ag71xx 19000000.eth: invalid MAC address, using random address"
 | ||
| 2) gl-ar750, wifi
 | ||
| 3) document services so I can remember how they work. Refer back to Oct 18 for notes that no longer make sense
 | ||
| 4) check out restart behaviour of dependent services when depended-on service dies
 | ||
| 5) pppd _creates_ an interface, work out how to fit it into this model
 | ||
| 6) add bridge support for lan
 | ||
| 7) upgrade ppp to something with an ipv6-up-script option, move ppp and pppoe derivations into their own files
 | ||
| 8) get ipv6 address from pppoe
 | ||
| 9) get ipv6 delegation from pppoe and add prefix to lan
 | ||
| 10) support dhcp6 in dnsmasq, and advertise prefix on lan
 | ||
| 11) firewalling and nat - default deny or zero trust?
 | ||
| 13) should we check that references to outputs actually correspond with those provided by a service
 | ||
| 14) make a flake?
 | ||
| 15) see if there are other tests that need adding to CI
 | ||
| 15a) is bordervm derivation tested?
 | ||
| 18) gl-mt300a
 | ||
| 19) gl-mt300n-v2
 | ||
| 20) publish the manual using CI
 | ||
| 
 | ||
| 12) write secrets holder as a service with outputs
 | ||
| 16) writable filesystem (ubifs?)
 | ||
| 17) overlay with squashfs/ubifs - useful? think about workflows for how this thing is installed
 | ||
| 
 | ||
| 
 | ||
| I could plug thinkpad into the gl-ar750 LAN port to dogfood the wired
 | ||
| networking
 | ||
| 
 | ||
| Sat Feb 18 14:26:45 GMT 2023
 | ||
| 
 | ||
| Apparently we're not currently doing anything special with busybox,
 | ||
| just using the default nixos build with the default applets.
 | ||
| 
 | ||
| We'd like to be able to say in modules which applets they need,
 | ||
| so that we build all necessary applets but don't waste any space.
 | ||
| But we don't want to build a busybox for each module because that
 | ||
| would be a big waste of space.
 | ||
| 
 | ||
| One option:
 | ||
| - add busybox configuration to `config` so that modules can maul it
 | ||
| - add a busybox module that builds it with union of all config and
 | ||
|  adds link in /bin
 | ||
| - make everything else look in /bin instead of referencing pkgs.busybox
 | ||
| 
 | ||
| It would be good if services could assert somehow that their required
 | ||
| config is present
 | ||
| 
 | ||
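The "union of all config" option could be sketched like this: each module lists the applets it needs, one busybox is configured with the union, and a service can check that what it requires was actually built. The real thing would live in the Nix module system; this only shows the set logic. The module names are invented, and the `CONFIG_<APPLET>=y` naming is an assumption that holds for many busybox applets but not all.

```python
def busybox_config(modules):
    """Return (applets, kconfig_lines) for the union of all module requests.

    modules maps a module name to the set of applets it wants.
    """
    applets = sorted(set().union(*modules.values()))
    # Assumption: the Kconfig symbol is the upper-cased applet name.
    return applets, [f"CONFIG_{name.upper()}=y" for name in applets]


def missing_applets(required, built):
    """Applets a service needs that the single shared busybox lacks."""
    return sorted(set(required) - set(built))


# Hypothetical module requests, for illustration only.
EXAMPLE_MODULES = {
    "dhcp-client": {"udhcpc", "ip"},
    "logging": {"syslogd"},
    "network": {"ip"},
}
```

A service could fail its build when `missing_applets(...)` is non-empty, which is one way to get the assertion wished for above.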
| Sat Feb 18 23:45:13 GMT 2023
 | ||
| 
 | ||
| # lsmod
 | ||
| 
 | ||
| cd /lib/modules/mac80211
 | ||
| insmod ./compat/compat.ko
 | ||
| insmod ./net/wireless/cfg80211.ko
 | ||
| insmod ./net/mac80211/mac80211.ko
 | ||
| insmod ./drivers/net/wireless/ath/ath.ko
 | ||
| insmod ./drivers/net/wireless/ath/ath9k/ath9k_hw.ko
 | ||
| insmod ./drivers/net/wireless/ath/ath9k/ath9k_common.ko
 | ||
| insmod ./drivers/net/wireless/ath/ath9k/ath9k.ko
 | ||
| insmod ./drivers/net/wireless/ath/ath10k/ath10k_core.ko
 | ||
| insmod ./drivers/net/wireless/ath/ath10k/ath10k_pci.ko
 | ||
| 
 | ||
| [21.344930] ath9k 18100000.wmac: failed to load calibration data from mtd device
 | ||
| [21.352728] ath: phy0: parsing configuration from OF node
 | ||
| [21.362576] ath: phy0: serialize_regmode is 0
 | ||
| [21.367092] ath: phy0: UNDEFINED -> AWAKE
 | ||
| [21.372051] ath: phy0: Trying EEPROM access at Address 0x03ff
 | ||
| [21.377999] ath: phy0: Trying EEPROM access at Address 0x0fff
 | ||
| [21.383940] ath: phy0: Trying EEPROM access at Address 0x01ff
 | ||
| [21.389879] ath: phy0: Trying OTP access at Address 0x03ff
 | ||
| [21.400396] Data bus error, epc == 8027964c, ra == 83125880
 | ||
| [21.406156] Oops[#1]:
 | ||
| 
 | ||
| 
 | ||
| Sun Feb 19 18:15:27 GMT 2023
 | ||
| 
 | ||
| We have ath9k listening for packets. To make this ready to use:
 | ||
| 
 | ||
| - need to load the modules
 | ||
| - enable bridging lan with wlan
 | ||
| - packet forwarding
 | ||
| - firewall
 | ||
| 
 | ||
| 
 | ||
| Mon Feb 20 20:41:17 GMT 2023
 | ||
| 
 | ||
| need to fix all the other broken ci jobs :-(
 | ||
| 
 | ||
| The wlan test is failing because we moved mac80211 to a module and
 | ||
| there's nothing running to insmod it
 | ||
| 
 | ||
| Wed Feb 22 18:17:17 GMT 2023
 | ||
| 
 | ||
| bridge is e2b3738d0f8c3f2fd76ebcef65612de502a7b121 but it's the wrong
 | ||
| way around: the master interface needs to be up whether or not all
 | ||
| of its children are, so members depend on master not vice versa
 | ||
| 
 | ||
| Next steps:
 | ||
| - re-implement bridge, enable bridging lan with wlan
 | ||
| - packet forwarding
 | ||
| - firewall
 | ||
| - ath10k
 | ||
| - ipv6
 | ||
| 
 | ||
| Fri Feb 24 23:37:56 GMT 2023
 | ||
| 
 | ||
| bridging wlan was made complex because can't add a device to a bridge
 | ||
| until it's operational, and wlan0 is not operational until hostapd
 | ||
| has churned awhile. Therefore, "waitup" listens for netlink messages
 | ||
| and notifies s6 readiness stuff
 | ||
| 
 | ||
| we have a firewall nft script but we're not running it on boot
 | ||
| 
 | ||
| we have forwarding but no dns, maybe because we haven't told
 | ||
| dnsmasq about any upstream servers
 | ||
| 
 | ||
| Sun Feb 26 21:08:47 GMT 2023
 | ||
| 
 | ||
| to add firmware we need to put files in /lib/firmware, which means
 | ||
| a module
 | ||
| 
 | ||
| i guess we should do that in the device module
 | ||
| 
 | ||
| we can create the firmware files as packages
 | ||
| 
 | ||
| 
 | ||
| for the cal data we would like to get it from the device MTD "art"
 | ||
| partition at boot time.
 | ||
| 
 | ||
| 
 | ||
| ====from openwrt
 | ||
| 
 | ||
| 
 | ||
| 
 | ||
| case "$FIRMWARE" in
 | ||
| "ath10k/cal-pci-0000:00:00.0.bin")
 | ||
|         case $board in
 | ||
|         allnet,all-wap02860ac|\
 | ||
|         araknis,an-500-ap-i-ac|\
 | ||
|         araknis,an-700-ap-i-ac|\
 | ||
|         engenius,eap1200h|\
 | ||
|         engenius,enstationac-v1|\
 | ||
|         glinet,gl-x750|\
 | ||
|         watchguard,ap300)
 | ||
|                 caldata_extract "art" 0x5000 0x844
 | ||
|                 ath10k_patch_mac $(macaddr_add $(mtd_get_mac_binary art 0x0) 2)
 | ||
| 
 | ||
| 
 | ||
| 
 | ||
| 
 | ||
| caldata_extract part offset count
 | ||
|       caldata_dd $mtd /lib/firmware/$FIRMWARE $count $offset || \
 | ||
|                 caldata_die "failed to extract calibration data from $mtd"
 | ||
|         dd if=$source of=$target iflag=skip_bytes,fullblock bs=$count skip=$offset count=1 2>/dev/null
 | ||
| 
 | ||
| =======
 | ||
| 
 | ||
| part=$(basename $(dirname $(grep -l art /sys/class/mtd/*/name)))
 | ||
| dd if=/dev/$part \
 | ||
|   of=/run/cal-pci-0000:00:00.0.bin iflag=skip_bytes,fullblock \
 | ||
|   bs=0x844 skip=0x5000 count=1
 | ||
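A hedged sketch of the same extraction with two changes. It matches the partition name exactly instead of grepping for the substring "art", and it converts the hex offsets to decimal before calling dd, because some dd implementations do not parse `0x...` as hex (GNU dd reads `0x844` as zero times 844). The function names and the MTD_SYSFS override are made up for illustration. It uses `bs=1` rather than `iflag=skip_bytes`, trading speed for portability.

```shell
#!/bin/sh
# Hypothetical helpers, not part of liminix or openwrt.

# find_mtd_by_name NAME
# Print the mtd device (e.g. "mtd5") whose sysfs "name" file is exactly NAME.
# The glob sorts "mtd5" before "mtd5ro", so the read-write node wins.
# MTD_SYSFS is an assumed override so the function can run against a fake tree.
find_mtd_by_name() {
    for f in "${MTD_SYSFS:-/sys/class/mtd}"/*/name; do
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "$1" ]; then
            basename "$(dirname "$f")"
            return 0
        fi
    done
    return 1
}

# extract_bytes SOURCE TARGET OFFSET COUNT
# Copy COUNT bytes starting at byte OFFSET. OFFSET and COUNT may be hex:
# the arithmetic expansion turns them into decimal for dd.
extract_bytes() {
    dd if="$1" of="$2" bs=1 skip="$(($3))" count="$(($4))" 2>/dev/null
}

# Intended use on the device (not run here):
#   part=$(find_mtd_by_name art) &&
#     extract_bytes "/dev/$part" /run/cal-pci-0000:00:00.0.bin 0x5000 0x844
```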
| 
 | ||
| Mon Feb 27 22:46:37 GMT 2023
 | ||
| 
 | ||
| Found and fixed a bunch of things that were stopping ath10k from
 | ||
| working. The remaining problem is (I think) that insmod is not
 | ||
| synchronous, so "ip link set up dev wlan1" doesn't work immediately
 | ||
| after the module is inserted. Maybe we need another netlink thing
 | ||
| to wait until the interface is present.
 | ||
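As a stopgap until there is a netlink listener, the wait could be approximated by polling sysfs. This is only a sketch: it polls once a second rather than reacting to netlink events, and the function name and the NET_SYSFS override are assumptions made for illustration.

```shell
#!/bin/sh
# wait_for_link NAME [TIMEOUT_SECONDS]
# Return 0 once the interface directory exists, 1 if the timeout (default 30s)
# runs out first. NET_SYSFS is an assumed override for testing.
wait_for_link() {
    dir="${NET_SYSFS:-/sys/class/net}/$1"
    remaining="${2:-30}"
    while [ ! -e "$dir" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Intended use in a run script (not run here):
#   insmod ./drivers/net/wireless/ath/ath10k/ath10k_pci.ko
#   wait_for_link wlan1 && ip link set up dev wlan1
```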
| 
 | ||
| 
 | ||
| Wed Mar  1 18:26:44 GMT 2023
 | ||
| 
 | ||
| ath10k works, but the wlan module loading stuff is quite kludgey
 | ||
| 
 | ||
| I wonder if wlan0, wlan1, eth0, eth1 etc should be defined per-device
 | ||
| - how does the application config know which devices exist? If we
 | ||
| decide to switch to some form of persistent device naming, the names
 | ||
| will differ from one device to the next. Perhaps the device should
 | ||
| also provide standard names where possible?
 | ||
| 
 | ||
| services.network.links = {
 | ||
|   lan = interface { ... };
 | ||
|   wan = interface { ... };
 | ||
|   wlan_24 = interface { ... };
 | ||
|   wlan_5 = interface { ... };
 | ||
| }
 | ||
| 
 | ||
| Thu Mar  2 22:45:11 GMT 2023
 | ||
| 
 | ||
| We have a flashable image!
 | ||
| 
 | ||
| Now we can use the gl-ar750 for internet access in the shed, we can
 | ||
| appropriate the other device that's in there and try Liminix on it
 | ||
| 
 | ||
| Fri Mar  3 23:08:58 GMT 2023
 | ||
| 
 | ||
| If we're going to unplug serial console from the gl-ar750 maybe we
 | ||
| should install an ssh server first.
 | ||
| 
 | ||
| 0) set a root password
 | ||
| 1) allow setting a root password from configuration.nix
 | ||
| (means defining config.users properly)
 | ||
| 2) allow authorizedKeys per user
 | ||
| 3) dropbear service
 | ||
| 4) see if the wired lan works! :-)
 | ||
| 
 | ||
| 
 | ||
| Sat Mar  4 12:31:07 GMT 2023
 | ||
| 
 | ||
| To improve logging, each service should have its own s6-log service
 | ||
| which prefixes the service name onto the log line and then sends to
 | ||
| stdout
 | ||
| 
 | ||
|   https://skarnet.org/software/s6/servicedir.html
 | ||
|   https://skarnet.org/software/s6/s6-log.html
 | ||
| 
 | ||
| As far as I can tell, the `log` directory inside the service
 | ||
| directory should itself be a service directory for the s6-log
 | ||
| process that does this
 | ||
| 
 | ||
| .... hahaha no that doesn't work
 | ||
| 
 | ||
| s6-rc, for some reason, ignores the `log` directory and requires
 | ||
| that loggers be done with consumer-for and producer-for instead
 | ||
| 
 | ||
| 
 | ||
| Sat Mar  4 23:27:00 GMT 2023
 | ||
| 
 | ||
| notes for this week's news update
 | ||
| 
 | ||
| * ath10k kernel support and firmware
 | ||
| 
 | ||
| - 5GHz wifi works
 | ||
| 
 | ||
| - need to retrieve the firmware from a special partition on the
 | ||
|   device itself, so we do that using a service that the wlan
 | ||
|   interface depends on
 | ||
| 
 | ||
| * replace waitup with more generally useful ifwait
 | ||
| 
 | ||
| to make the ath10k load at boot, we need to insert the module and then
 | ||
| wait for it to do something or other in the background before we can
 | ||
| configure the interface. so we need something like waitup but
 | ||
| for presence not operational state
 | ||
| 
 | ||
| it turns out that a program that just waits for a particular interface
 | ||
| state and then exits is quite simple to add into run scripts and
 | ||
| we don't need all that notification-fd stuff anyway
 | ||
| 
 | ||
| * move FW_LOADER* config to modules/base
 | ||
| 
 | ||
| * rejig config a bit.
 | ||
| - device hardware characteristics are now under
 | ||
|   the `hardware` key and include the available network interfaces.
 | ||
| - options for users and groups are now defined a bit more
 | ||
|   specifically than "attrset", making it possible to e.g. set a
 | ||
|   root password
 | ||
| - dts is moved from `boot` to `hardware`
 | ||
| 
 | ||
| 
 | ||
| * now producing flashable images, so you can generate a liminix config
 | ||
| and write it to the device instead of having to boot using TFTP and
 | ||
| a serial console every time
 | ||
| 
 | ||
| * ssh support
 | ||
| 
 | ||
| * prefix logs with the service name
 | ||
| 
 | ||
| Sun Mar  5 22:51:21 GMT 2023
 | ||
| 
 | ||
| Added swconfig: it was a straight copy from nixwrt and hasn't changed
 | ||
| upstream since. But don't need it, because the lan port works fine
 | ||
| without it (I assume both lan ports and the cpu are all connected
 | ||
| untagged)
 | ||
| 
 | ||
| Mon Mar  6 09:42:33 GMT 2023
 | ||
| 
 | ||
| Today I plugged in the mt300a.
 | ||
| 
 | ||
| echo 17 >/sys/class/gpio/export
 | ||
| echo out >/sys/class/gpio/gpio17/direction
 | ||
| 
 | ||
| 
 | ||
| why are our images getting big
 | ||
| 
 | ||
| - lua links ncurses
 | ||
| - hostapd links openssl and sqlite
 | ||
| - nftables needs
 | ||
|   - iptables?
 | ||
|   - jansson? what is that?
 | ||
|   - libedit/readline
 | ||
| - ifwait needs bash
 | ||
| 
 | ||
| 
 | ||
|   File: result/squashfs
 | ||
|   Size: 10371072        Blocks: 20256      IO Block: 4096   regular file
 | ||
| 
 | ||
| with smaller nftables:    9617408        Blocks: 18784
 | ||
| 
 | ||
| hostapd without sqlite   9003008        Blocks: 17584
 | ||
| 
 | ||
| without bash:             8622080         Blocks: 16840      IO Block: 4096   regular file
 | ||
| 
 | ||
| without lua readline: bigger?!  8769536         Blocks: 17128      IO Block: 4096   regular file
 | ||
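For reference, the per-step savings implied by the squashfs sizes above, worked out with shell arithmetic (the helper name is made up):

```shell
#!/bin/sh
# delta_kib BEFORE AFTER
# KiB saved going from BEFORE bytes to AFTER bytes (negative: the image grew).
delta_kib() {
    echo $(( ($1 - $2) / 1024 ))
}

delta_kib 10371072 9617408   # smaller nftables        -> 736
delta_kib 9617408 9003008    # hostapd without sqlite  -> 600
delta_kib 9003008 8622080    # without bash            -> 372
delta_kib 8622080 8769536    # without lua readline    -> -144
```

So the first three changes save about 1.7 MiB in total, and dropping readline from lua costs 144 KiB rather than saving anything.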
| 
 | ||
| 
 | ||
| Mon Mar  6 20:57:49 GMT 2023
 | ||
| 
 | ||
| [    0.539992] mtk_soc_eth 10100000.ethernet: mdio-bus disabled
 | ||
| [   10.493918] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
 | ||
| [   10.502828] cfg80211: failed to load regulatory.db
 | ||
| 
 | ||
| Check in morning, but whichever port the ethernet cable is plugged into,
 | ||
| is considered by the kernel as port 0 - which I think we should treat as
 | ||
| WAN
 | ||
| 
 | ||
| VLAN 1:
 | ||
|         vid: 1
 | ||
|         ports: 1 2 3 4 5 6t
 | ||
| VLAN 2:
 | ||
|         vid: 2
 | ||
|         ports: 0 6t
 | ||
| 
 | ||
| ip link add link eth0 name lan type vlan id 1
 | ||
| ip link add link eth0 name wan type vlan id 2
 | ||
| 
 | ||
| figure out how to add these to gl-mt300a device config
 | ||
| then extneder.nix can add a bridge
 | ||
| 
 | ||
| Tue Mar  7 20:13:15 GMT 2023
 | ||
| 
 | ||
| We need NTP or some other way to get accurate time
 | ||
| 
 | ||
| [done] Need to add regulatory.db somewhere standard, maybe modules/wlan?
 | ||
| 
 | ||
| Tue Mar  7 21:43:56 GMT 2023
 | ||
| 
 | ||
| When we get to phase 2, need to review how network interfaces and
 | ||
| their addresses interplay. It should be possible to have a network
 | ||
| interface and interrogate the addresses associated with it - esp
 | ||
| with ipv6 where there are multiple addresses for the device
 | ||
| 
 | ||
| This thought prompted by looking at the loopback interface, which is
 | ||
| a bundle of addresses and therefore we can't see what any of them are
 | ||
| 
 | ||
| 
 | ||
| Tue Mar  7 22:05:44 GMT 2023
 | ||
| 
 | ||
| [phase 1]
 | ||
| 20) publish the manual using CI
 | ||
| 30) document flashing process
 | ||
| 31) go through all the unexpected dmesg and triage it
 | ||
| 25) ntp or some other accurate time source
 | ||
| 
 | ||
| [phase 1.5]
 | ||
| 26) ssh keys
 | ||
| 8) get ipv6 address from pppoe
 | ||
| 9) get ipv6 delegation from pppoe and add prefix to lan
 | ||
| 10) support dhcp6 in dnsmasq, and advertise prefix on lan
 | ||
| 11) firewalling and nat - default deny or zero trust?
 | ||
| 7) upgrade ppp to something with an ipv6-up-script option, move ppp and pppoe derivations into their own files
 | ||
| 32) set up iperf and do some performance measurement
 | ||
| 35) also we need to check our wireless country code
 | ||
| 
 | ||
| 
 | ||
| [phase 2]
 | ||
| 3) document services so I can remember how they work. Refer back to Oct 18 for notes that no longer make sense
 | ||
| 4) check out restart behaviour of dependent services when depended-on service dies
 | ||
| 13) check that references to outputs correspond with declared outputs
 | ||
| 33) network interfaces vs the services that manage their addresses
 | ||
| 34) write a short guide explaining how to use s6-svc
 | ||
| 
 | ||
| [phase n]
 | ||
| 12) write secrets holder as a service with outputs
 | ||
| 16) writable filesystem (ubifs?)
 | ||
| 17) overlay with squashfs/ubifs - useful? think about workflows for how this thing is installed
 | ||
| 
 | ||
| 
 | ||
| dmesg lines to investigate for gl-mt300a:
 | ||
| [    0.467314] OF: Bad cell count for /palmbus@10000000/spi@b00/flash@0/partition
 | ||
| [    0.539709] mtk_soc_eth 10100000.ethernet: mdio-bus disabled  ?
 | ||
| [    8.778513] compat: loading out-of-tree module taints kernel.
 | ||
| [   17.686561] ieee80211 phy0: rt2800_wait_bbp_rf_ready: Error - BBP/RF register access failed, aborting
 | ||
| [   17.696025] ieee80211 phy0: rt2800_loft_iq_calibration: Warning - RF RX busy in LOFT IQ calibration
 | ||
| [   17.875147] ieee80211 phy0: rt2800_rxiq_calibration: Warning - Timeout waiting for MAC status in RXIQ calibration
 | ||
| 
 | ||
| for gl-ar750:
 | ||
| [    0.000000] Unknown kernel command line parameters "earlyprintk=serial,ttyS0", will be passed to user space.
 | ||
| [    0.416679] OF: Bad cell count for /ahb/spi@1f000000/flash@0/partitions
 | ||
| [    0.825495] ag71xx 19000000.eth: Could not connect to PHY device. Deferring probe.
 | ||
| [    1.632700] pci_bus 0000:00: root bus resource [mem 0x10000000-0x13ffffff]
 | ||
| [    1.639824] pci_bus 0000:00: root bus resource [io  0x0000]
 | ||
| [    1.645601] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
 | ||
| [   32.032326] ath10k_pci 0000:00:00.0: pdev param 0 not supported by firmware
 | ||
| [   36.627844] ath10k_pci 0000:00:00.0: failed to receive initialized event from target: 00000000
 | ||
| 
 | ||
| 
 | ||
| Fri Mar 10 13:17:56 GMT 2023
 | ||
| 
 | ||
| Lunchtime notes on images for real devices, vs ci.nix
 | ||
| 
 | ||
| * successfully building an image doesn't mean that the image boots
 | ||
| or does anything useful
 | ||
| 
 | ||
| * don't want to faff with serial wires on every device every time
 | ||
| to test it. so
 | ||
| 
 | ||
| * ideally, build ram-based images of rotuer, extneder, arhcive in CI
 | ||
| with a watchdog timer that will reboot if it can't see the network
 | ||
| 
 | ||
| * figure out how to boot into the new image from an ssh connection.  I
 | ||
| assume the challenging bit here is grabbing x MB of contiguous phys
 | ||
| mem after boot: I think we'd have to reserve it at _first_ boot and
 | ||
| then somehow copy into it before rebooting
 | ||
| 
 | ||
| An easier first goal might be a tool to flash from the shell command line,
 | ||
| but that runs a greater risk of bricking
 | ||
| 
 | ||
| 
 | ||
| Fri Mar 10 14:35:40 GMT 2023
 | ||
| 
 | ||
| programs.busybox = {
 | ||
|   enable = true;
 | ||
|   applets = [... ];
 | ||
|   config = {
 | ||
|   };
 | ||
| }
 | ||
| 
 | ||
| Fri Mar 10 23:49:04 GMT 2023
 | ||
| 
 | ||
| Well, we have the backup host config up and running - though haven't
 | ||
| plugged it back into its disk yet.
 | ||
| 
 | ||
| For task 1 what remains is
 | ||
| 
 | ||
| 1) ntp sync
 | ||
| 2) write up the flashing procedure
 | ||
| 3) a video?
 | ||
| 
 | ||
| 
 | ||
| Sat Mar 11 13:58:20 GMT 2023
 | ||
| 
 | ||
| ================== for video
 | ||
| 
 | ||
| what is liminix
 | ||
| - nix-based system for creating OS images for routers
 | ||
| - not "nixos on your router"
 | ||
|   - nixos-like module system,
 | ||
|   - musl for libc
 | ||
|   - s6/s6-rc for services
 | ||
|   - entirely cross-compiled
 | ||
| 
 | ||
| why am i making a video?
 | ||
| - unless you have a suitable spare device to install on,
 | ||
|   and you want to take it apart, it's currently hard to
 | ||
|   take liminix for a spin
 | ||
| - I have these things, so I can give you a tour
 | ||
| 
 | ||
| let's have a look at how the hardware's hooked up
 | ||
| 
 | ||
| web site & manual
 | ||
| 
 | ||
| a config file:
 | ||
| - observe the comments:
 | ||
|   - not going to spend ages on this because it's not in its final form.
 | ||
|   - as we get more configs for more use cases, we will
 | ||
|     get a better feel for what can be abstracted
 | ||
|   - that will come later: work so far has been on the
 | ||
|     hardware support side
 | ||
| 
 | ||
| to show that it builds, we're going to add a package. otherwise,
 | ||
| everything from this build is probably already cached
 | ||
| 
 | ||
| build the config
 | ||
| 
 | ||
| tftpboot on hardware
 | ||
| 
 | ||
| flash on hardware
 | ||
| 
 | ||
| show ci
 | ||
| 
 | ||
| show a qemu target
 | ||
| 
 | ||
| Mon Mar 13 22:46:46 GMT 2023
 | ||
| 
 | ||
| 1) rsync on arhcive is failing because no nogroup group
 | ||
| 
 | ||
|   "/nix/store/gfzzl157r8xyp38lpcfxydkiiy6zrs3c-rsync-3.2.6/bin/rsync" "--verbose" "--stats" "--password-file" "/etc/nixos/secrets/arhcive-rsync" "-rltgoDz" "/var/spool/backup" "backup@arhcive.lan::srv/"
 | ||
| @ERROR: invalid gid nogroup
 | ||
| rsync error: error starting client-server protocol (code 5) at main.c(1837) [sender=3.2.6]
 | ||
| 
 | ||
| 2) we can run findfs in a loop until the disk appears
 | ||
| 
 | ||
| 3) still haven't decided how to do ntp but maybe we should just use
 | ||
| the busybox one
 | ||
| 
 | ||
| 4) some way to do upgrades over the wire
 | ||
| 
 | ||
| - boot with reserved mem and a phram device at 110-128MB even in the
 | ||
|   flashable version
 | ||
| 
 | ||
| - watchdog timer in kernel
 | ||
| 
 | ||
| - kexec in kernel
 | ||
| 
 | ||
| - userland service to feed the dog as long as local network is up
 | ||
|  (may need to start it a couple of minutes after boot, how do we
 | ||
|   do that?)
 | ||
| 
 | ||
| - can we use flashcp on a phram mtd?
 | ||
| 
 | ||
| 5) maybe setup a vhost for hydra or something
 | ||
| 
 | ||
| [nix-shell:~/t]$ wget --reject-regex '\?' -D localhost -N -r --exclude-directories=/api --level=2 --convert-links -e robots=off http://localhost:3003/jobset/liminix/build/
 | ||
| 
 | ||
| is almost a mirror
 | ||
| 
 | ||
| Tue Mar 14 20:17:35 GMT 2023
 | ||
| 
 | ||
| - do we have a phram mtd? need config for size and location
 | ||
| 
 | ||
| - how do we set the boot device
 | ||
|  - for first boot need to boot real flash, use  dtb, ignore bootloader args
 | ||
|  - for kexec, boot phram, specify args somehow (could rewrite dtb)
 | ||
| 
 | ||
|  => can use same kernel for both if we can give kexec a dtb with
 | ||
|     different params, which seems to be possible
 | ||
| 
 | ||
| so we need a module for the initial kernel to say
 | ||
| - create phram mtd
 | ||
| - boot from real mtd (will be index + 1)
 | ||
| - enable KEXEC in kernel
 | ||
| - add kexec-tools
 | ||
| 
 | ||
| and for the kernel we boot into
 | ||
| - most of the above
 | ||
| - except for the boot device
 | ||
| - create an output with objects that kexec(8) can parse
 | ||
| 
 | ||
| Could be same module for both with different outputs
 | ||
| 
 | ||
| what do we call this thing? "revertable"
 | ||
| 
 | ||
| Wed Mar 15 19:11:09 GMT 2023
 | ||
| 
 | ||
| "revertable" implies mtd support for the rootfs and a ramdisk at
 | ||
| defined location
 | ||
| 
 | ||
| "tftpboot" implies "revertable", because it will use the same ramdisk
 | ||
| 
 | ||
| 
 | ||
| Fri Mar 17 11:44:40 GMT 2023
 | ||
| 
 | ||
| - patch the kernel kexec code to pass DTB to new kernel unconditionally
 | ||
| - unpatch kernel to pass command line to kexec (breaks DTB passing)
 | ||
| - decide how we specify rootfs. doing it by number is awkward
 | ||
|   - may be phram
 | ||
|   - may be real mtd root
 | ||
|   - may be real mtd root but renumbered because phram exists
 | ||
| 
 | ||
| Wild idea: we could probably get rid of the need for declaring a phram
 | ||
| device in the first kernel, if we can use kexec to copy the squashfs into
 | ||
| physical ram. As far as I can see this is a simple(sic) matter of
 | ||
| specifying it as a segment, but we would have to extend kexec-tools
 | ||
| to do this and it's quite a niche option if we make it do all the
 | ||
| mtd setup.
 | ||
| 
 | ||
| kexec --dtb=foo.dtb --map-file=squashfs@0x120000
 | ||
| 
 | ||
| 
 | ||
| Sat Mar 18 18:02:26 GMT 2023
 | ||
| What if: we added derivations for "apply openwrt changes" as packages,
 | ||
| which could then be called from the kernel derivation's extraPatchPhase?
 | ||
| There could be one for generic and one for each openwrt target
 | ||
| 
 | ||
| Mon Mar 20 18:40:53 GMT 2023
 | ||
| 
 | ||
| - kexec patch is sent to mailing list, keep an eye for replies
 | ||
| - watchdog
 | ||
| - ntp
 | ||
| - rebuild images for live devices
 | ||
| - can we build a static busybox with flashcp applet and scp it
 | ||
|  to arhcive etc?
 | ||
| - [DONE] install mailman and hyperkitty on myhtic, create mailing list
 | ||
| 
 | ||
| Tue Mar 21 22:59:54 GMT 2023
 | ||
| 
 | ||
| I haven't found a way to arm the watchdog before userland runs, which
 | ||
| would be really nice: although there's WATCHDOG_HANDLE_BOOT_ENABLED
 | ||
| and WATCHDOG_OPEN_TIMEOUT it doesn't seem to be sufficient. Maybe
 | ||
| those options work only when the hardware watchdog is already
 | ||
| armed. It might not be completely awful insofar as any failure to
 | ||
| mount root usually results in panic anyway, so provided we start
 | ||
| watching early in boot then there's not a big window for anything
 | ||
| to go wrong
 | ||
| 
 | ||
| What should the watchdog service do?  Ideally we want something that
 | ||
| "ratchets" : can be started in early boot and signals health as long
 | ||
| as the system is starting up, then once the system is in "steady
 | ||
| state" it stops pinging as soon as any part of that steady state
 | ||
| becomes unhealthy. This feels like a refinement for a much later
 | ||
| phase though.
 | ||
| 
 | ||
| Maybe the health criteria might be
 | ||
| (sshd and lan services are running) or (time since boot < 120s)
 | ||
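That criterion could be sketched as a pure predicate, so it can be exercised without a real watchdog. The function names are invented for illustration, and how the sshd and lan states are obtained is deliberately left out: the caller is assumed to pass them in as 1 or 0.

```shell
#!/bin/sh
# healthy UPTIME_SECONDS SSHD_UP LAN_UP
# Succeeds while we are still inside the 120s boot grace period, or when both
# services are reported up (1). Anything else counts as unhealthy.
healthy() {
    grace=120
    if [ "$1" -lt "$grace" ]; then
        return 0
    fi
    [ "$2" = 1 ] && [ "$3" = 1 ]
}

# uptime_seconds
# Whole seconds since boot, taken from the first field of /proc/uptime.
uptime_seconds() {
    read -r up rest < /proc/uptime
    echo "${up%%.*}"
}
```

A feeder loop would call `healthy "$(uptime_seconds)" ...` every few seconds and only write to the watchdog device while it succeeds.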
| 
 | ||
| Thu Mar 23 00:11:23 GMT 2023
 | ||
| 
 | ||
| tftpboot and (kexecboot || flashable) have incompatible DTB-finding
 | ||
| strategies which is painful if you add both modules and then
 | ||
| expect tftp booting to still work
 | ||
| 
 | ||
| Maybe we could patch the kernel to use some better strategy for when
 | ||
| to use/ignore the bootloader command line: e.g "only if it
 | ||
| contains the string 'liminix'". Could do this by patching
 | ||
| arch/mips/kernel/setup.c bootcmdline_init to
 | ||
| 
 | ||
| if(strstr(arcs_cmdline, "liminix") == NULL) arcs_cmdline[0] = '\0';
 | ||
| and then defining CONFIG_MIPS_CMDLINE_DTB_EXTEND. The
 | ||
| bootloader command line then needs to specify only the
 | ||
| _additional_ parameters that weren't in the DTB
 | ||
| 
 | ||
| (later: that turned out to be quite straightforward)
 | ||
| 
 | ||
| Fri Mar 24 23:45:12 GMT 2023
 | ||
| 
 | ||
| - add ntp support
 | ||
| - [DONE] expose hydra to internet
 | ||
| - check MAC address weirdness?
 | ||
| - call Task 1 "done"
 | ||
| 
 | ||
| Sun Mar 26 00:19:15 GMT 2023
 | ||
| 
 | ||
| Would be nice to have a flash.sh built in outputs.flashable
 | ||
| 
 | ||
| Sun Mar 26 15:27:14 BST 2023
 | ||
| 
 | ||
| Let's think about services and modules.
 | ||
| 
 | ||
| Module
 | ||
| + can change global config
 | ||
|   * add users, groups etc
 | ||
|   * change kernel config
 | ||
|   * change busybox config
 | ||
| + well-typed parameters
 | ||
| - is a "singleton": can't have the same module included twice
 | ||
|   with different config. e.g. can't have two hostap modules running on
 | ||
|   different wlan radios.
 | ||
| - can't express dependencies: a depends on b
 | ||
| 
 | ||
| suppose:
 | ||
| 
 | ||
| * modules add service functions to the config? then there's no
 | ||
| way to define a service while forgetting to import the module
 | ||
| 
 | ||
| * we use the lib.types stuff for service function arguments
 | ||
| 
 | ||
| * maybe we stop naming services.foo for every damn thing
 | ||
| 
 | ||
| * but remember, s6 services do need unique names
 | ||
| 
 | ||
| 
 | ||
| imports = [ ../modules/dhcp4 ];
 | ||
| 
 | ||
| services.dhcp4 = config.services.udhcp {
 | ||
|   interface = lan.device;
 | ||
|   options = {
 | ||
|     foo = true;
 | ||
|     bar = 42;
 | ||
|   };
 | ||
|   depends = [ services.some_other_thing ];
 | ||
| }
 | ||
| 
 | ||
| modules/dhcp4 udhcp fn needs to define a type for its argument, then
 | ||
| use something like
 | ||
| 
 | ||
|   if arg_type.check def.value then res
 | ||
|             else throw "The option value `${showOption loc}' in `${def.file}' is not a ${arg_type.name}."
 | ||
| 
 | ||
| (where def comes from I don't know yet)
 | ||
| 
 | ||
| Tue Mar 28 10:44:40 BST 2023
 | ||
| 
 | ||
| we should reserve the name "service" for actual instantiated
 | ||
| services. This means we need a name for the functions that
 | ||
| make services. "class", "template", "fn", "maker", "factory"?
 | ||
| And a namespace name so they're not interleaved with real
 | ||
| services, which sort of suggests they are packages
 | ||
| 
 | ||
| if we want to do services = {
 | ||
|  foo = longrun { ... };
 | ||
|  bar = longrun { ... };
 | ||
| }
 | ||
| 
 | ||
| without repeating the `name` as an attribute of the longrun,
 | ||
| then longrun can't return a derivation: it has to return some
 | ||
| function that accepts `name` as a parameter.
 | ||
| 
 | ||
| where services.a depends on services.b, at the time its builder is run
 | ||
| it needs to know what name s6-rc will use for service b
 | ||
| 
 | ||
| maybe an s6 service definition should be an attrset not a derivation.
 | ||
| 
 | ||
| maybe this is outside scope for phase 2
 | ||
| 
 | ||
| Tue Mar 28 13:22:06 BST 2023
 | ||
| 
 | ||
| Reading nixos/doc/manual/development/building-parts.chapter.md it
 | ||
| suggests to me that we should rename config.outputs to
 | ||
| config.system.outputs. The more general question here is whether it's
 | ||
| good to be augmenting a variable called "config" with all this
 | ||
| generated stuff that is patently not configuration - perhaps putting
 | ||
| it under a "system" key will keep it all in one place
 | ||
| 
 | ||
| Tue Mar 28 13:32:30 BST 2023
 | ||
| 
 | ||
| how should we handle filesystem state? e.g. resolvconf service
 | ||
| 
 | ||
| if a service provides a file at a known global pathname, it can't be
 | ||
| parametrised - it must be a singleton.
 | ||
| 
 | ||
| Tue Mar 28 20:25:20 BST 2023
 | ||
| 
 | ||
| wondering if we should swap phases 2 and 3. We can't really address
 | ||
| modules without addressing services, which is phase +n, whereas we
 | ||
| can tackle overlay/ubi whenever
 | ||
| 
 | ||
| nand flash may have bad blocks
 | ||
| nor flash (supposedly) doesn't
 | ||
| 
 | ||
| ubi provides erase counts and bad block remapping on top of the mtd
 | ||
| interface. this means we should avoid flashcp of a ubi image straight
 | ||
| onto (nand) mtd as we will lose the erase counts and bad block information
 | ||
| that UBI tracks.
 | ||
| 
 | ||
| overlayfs works on a filename basis, so might not be very effective :
 | ||
| any change that results in a new store path will mean the entire package
 | ||
| appears in two places. I think it's reasonable to offer squashfs or
 | ||
| ubifs without overlay.
 | ||
| 
 | ||
| open questions:
 | ||
| 
 | ||
| 1) if uboot doesn't support UBI, we can't boot a kernel on a ubifs
 | ||
| so we need reserved space for the kernel.
 | ||
| 
 | ||
| - unless we add some padding after the kernel, every new kernel
 | ||
| that's bigger than its predecessor will trash the start of the
 | ||
| ubi space (and wipe out its erase count)
 | ||
| - This suggests we should build more stuff as modules and less as
 | ||
| compiled-in
 | ||
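The padding idea amounts to reserving a fixed kernel region that ends on an erase-block boundary, so that the UBI area always starts at the same offset. A rough sketch of the arithmetic follows; the helper names, the 64 KiB erase block and the 256 KiB headroom are illustrative assumptions, not measured values for any device.

```shell
#!/bin/sh
# round_up SIZE BLOCK
# Smallest multiple of BLOCK that is >= SIZE.
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# padded_kernel_size SIZE BLOCK HEADROOM
# Size of the kernel region: the current kernel plus HEADROOM bytes of room
# to grow, rounded up to a whole number of erase blocks.
padded_kernel_size() {
    round_up $(( $1 + $3 )) "$2"
}

# Example with assumed numbers: a 2000000 byte kernel, 64 KiB erase blocks
# and 256 KiB of headroom.
padded_kernel_size 2000000 65536 262144   # -> 2293760
```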
| 
 | ||
| 2) once a device has had a ubi volume created on it, probably we want
 | ||
| to use ubi-aware tools to update that volume in future instead of a
 | ||
| whole new flash, because we wish to preserve erase counts. This means
 | ||
| running ubiformat --image-file=foo.ubi on the device instead of flashcp
 | ||
| 
 | ||
| we can add a "ubi-flashable" output that creates a .ubi image and
 | ||
| a flashcp image that wraps it, with instructions on which to use.
 | ||
| 
 | ||
| Fri Mar 31 22:13:54 BST 2023
 | ||
| 
 | ||
| Error: too small LEB size 3968, minimum is 15360
 | ||
| 
 | ||
| > This error means that you are trying to mount too small UBI
 | ||
| volume. Probably because your flash is too small? Try to use JFFS2,
 | ||
| then, because it suits small flashes better since it has much lower
 | ||
| space overhead. Indeed, UBIFS stores much more indexing information on
 | ||
| the flash media than JFFS2, so it has much higher overhead. Also, UBI
 | ||
| has some overhead (see here). Thus, if you have a small flash device
 | ||
| (e.g., about 64MiB), it makes sense to consider using JFFS2.
 | ||
| 
 | ||
| 
 | ||
| Argh. Oh well,
 | ||
| 
 | ||
| Sat Apr  1 15:27:39 BST 2023
 | ||
| 
 | ||
| There's limited value in recreating pseudofiles for jffs2 because
 | ||
| the system is writable - changes made to /bin, /dev etc in config.filesystem
 | ||
| should take effect on a running system.
 | ||
| 
 | ||
| Can we take inspiration from https://grahamc.com/blog/erase-your-darlings/ ?
 | ||
| 
 | ||
| in early boot:
 | ||
|  mount ramfs on /
 | ||
|  mount the writeable filesystem on /persist/
 | ||
|  bind mount /persist/nix on /nix
 | ||
|  run script to populate rootfs from pseudofiles
 | ||
| 
 | ||
| on a router, do we need _anything_ persistent that's outside the store?
 | ||
| 
 | ||
|  - state for dhcp leases and stuff
 | ||
|  - secrets
 | ||
|  - maybe, files that the user has downloaded
 | ||
| 
 | ||
| this will probably require initramfs. if we just use jffs2 as the rootfs and
 | ||
| don't worry about /persist, we can skip that step.
 | ||
| 
 | ||
| 
 | ||
| 
 | ||
| 
 | ||
| [ aside: I think we may be putting two busyboxes in the image:
 | ||
| see modules/s6/default.nix s6-init-scripts has buildInputs = [busybox];  ]
 | ||
| 
 | ||
| Mon Apr  3 18:34:26 BST 2023
 | ||
| 
 | ||
| suppose
 | ||
| - we boot the system with systemConfig=/nix/store/eeeeee-system
 | ||
| - the early-init script runs /target/$systemConfig/create-root /target
 | ||
|   after mounting /target
 | ||
| - then it runs chroot /target $systemConfig/bin/init "$@"
 | ||
| 
 | ||
| or maybe we could combine those steps?
 | ||
| or maybe it doesn't matter too much ...
 | ||
| 
 | ||
| Thu Apr  6 21:25:41 BST 2023
 | ||
| 
 | ||
| what now?
 | ||
| 
 | ||
| - put a jffs2 onto some hardware device
 | ||
|   - what do we do with uboot?
 | ||
|   - should we pad the kernel?
 | ||
|   - maybe kernel module support would be good if we're making it
 | ||
|     hard to do kernel updates
 | ||
| - try the nix-copy-closure thing and work out what else we don't know
 | ||
| - [done] detect endian correctly
 | ||
| 
 | ||
| 
 | ||
| to ask a different question, what else do we need to dogfood a router?
 | ||
| 
 | ||
| Sun Apr  9 10:06:08 BST 2023
 | ||
| 
 | ||
| - rename outputs.flashable to outputs.flashimage
 | ||
| - rename modules/flashable to modules/flashable_ro
 | ||
| - create outputs.flashable in modules/jffs2
 | ||
| - rename modules/jffs2 to modules/flashable_rw
 | ||
| - add enable config to both?
 | ||
| - enable kernel module compilation
 | 
