lan-power-ctrl — LAN power-cycle for wedged hosts
When a host's userspace freezes (D-state, OOM, disk hang), the kernel still answers ICMP and accepts TCP, but no daemon completes a request. SSH banner never arrives → unreachable. The only recovery is a hardware power-cycle.
This doc covers the hardware to buy, the network topology, and the two
scripts (host-probe, power-cycle) that turn "apricot froze" into a
one-liner from any machine on the LAN/VPN.
Diagnosis vs recovery
| Layer | Healthy host | Frozen userspace | Unreachable host |
|---|---|---|---|
| ICMP | reply <50ms | reply <50ms | no reply |
| TCP accept :22 | succeeds | succeeds | connection refused / timeout |
| SSH banner | arrives <100ms | never arrives | n/a |
| host-probe | up | wedged | down |
| power-cycle | n/a | needed | power's the cure regardless |
A wedged classification is the signature that a remote shell can't help —
every login path needs the same userspace that's stuck. Cut power.
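The table reduces to a small decision rule. A minimal sketch, assuming the three probe results are already available as booleans; the function name `classify` and its parameters are hypothetical and not taken from host-probe itself. Combinations the table does not list (for example ICMP filtered by a firewall) fall through to `down` here, which is an assumption.

```python
def classify(icmp_ok: bool, tcp_ok: bool, banner_ok: bool) -> str:
    """Map three probe results to up | wedged | down, following the table."""
    if banner_ok:
        # sshd sent its banner, so userspace is completing requests.
        return "up"
    if icmp_ok and tcp_ok:
        # The kernel replies and accepts connections, but no banner arrives:
        # the frozen-userspace signature.
        return "wedged"
    # Assumption: any other combination is treated as unreachable.
    return "down"


if __name__ == "__main__":
    print(classify(icmp_ok=True, tcp_ok=True, banner_ok=False))
```

Checking the banner first keeps the healthy path independent of ICMP, which some networks drop.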
Shopping list
Per host to be remotely recoverable
| Device | Where | Purpose | Link |
|---|---|---|---|
| Shelly Plus Plug US (Gen2) or Shelly Plug US Gen4 | Between the host's PSU and the wall | Local HTTP RPC, no cloud | Plus (Gen2) · Gen4 |
Either generation works: Gen4 adds power monitoring and Matter, and both share
the same Gen2-compatible /rpc/Switch.Set API that the power-cycle script uses.
Buy whichever is in stock and cheapest at order time (about $25).
For the modem (separate problem — can't rescue itself over its own WiFi)
| Device | Where | Purpose | Link |
|---|---|---|---|
| SwitchBot Plug Mini | Modem | BLE-controllable plug | product |
| USB Bluetooth dongle | Plugged into black (X399 boards have no integrated BT — see reference-host-hardware) | Lets black drive BLE to the modem plug | Search "USB Bluetooth 5 adapter CSR8510 or RTL8761" (~$10, plug-and-play under bluez) |
Why black and not apricot: using apricot to rescue the modem couples failure modes — apricot is also a recoverable host. Black is the independent watchdog.
Why not control the modem's SwitchBot Plug Mini over WiFi, like a plug on a host? On a host it can be, but for the modem it can't, because the modem dying takes WiFi down with it. BLE from a LAN-attached watchdog works regardless of WAN state, assuming separate modem and router (see Caveat).
Caveat: combined modem-router vs separate
If your ISP gave you a combined modem-router (one box does both), modem-dead also means LAN-dead. Nothing on the LAN can rescue it. Options narrow to: phone/laptop in BLE range, or an auto-rebooter plug (watchdog built into the plug itself — pings a target, power-cycles on failure).
If modem and router are separate (typical home-server setup), LAN stays alive when WAN dies, and the black-as-watchdog plan works as designed.
One-time setup after plugs arrive
- Set the Shelly's own restore behavior, so it remembers its on/off state across a wall outage (-g keeps curl from interpreting the braces as URL globbing):

      curl -g 'http://<plug>/rpc/Switch.SetConfig?id=0&config={"initial_state":"restore_last"}'

- Set the host's BIOS to "Restore on AC power loss", so that when the Shelly restores power the host actually boots rather than sitting dark.
- Add the plug to the config:

      # ~/.config/power-cycle/plugs.conf
      apricot http://<plug-ip>

  Get <plug-ip> from the router DHCP table or arp -a | grep -i shelly.
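The Switch.SetConfig request can also be built programmatically, which sidesteps shell quoting of the JSON argument. A minimal sketch, assuming the plug accepts a percent-encoded `config` query parameter; the helper name `set_config_url` and the example address are hypothetical.

```python
import json
from urllib.parse import urlencode


def set_config_url(plug_base_url: str, switch_id: int = 0) -> str:
    """Build the Switch.SetConfig URL that sets initial_state to restore_last."""
    # Compact JSON, matching the form used in the curl command.
    config = json.dumps({"initial_state": "restore_last"}, separators=(",", ":"))
    # urlencode percent-encodes the braces, quotes, and colon.
    query = urlencode({"id": switch_id, "config": config})
    return f"{plug_base_url.rstrip('/')}/rpc/Switch.SetConfig?{query}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address standing in for <plug-ip>.
    print(set_config_url("http://192.0.2.10"))
```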
Scripts
host-probe — classify reachability
host-probe apricot.lan # → up | wedged | down (one-shot)
host-probe --watch apricot.lan # loop; emits one line per state change
HOST_PROBE_INTERVAL=10 HOST_PROBE_TIMEOUT=2 host-probe --watch apricot.lan 22
Three independent probes: ICMP, TCP accept, SSH banner. The banner timing is
what distinguishes a wedged host from a healthy one — a real sshd flushes
its SSH-2.0-… banner within milliseconds of connect.
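The banner check can be sketched with a plain socket. This is an illustration under stated assumptions, not the host-probe implementation: the function name `ssh_banner_arrives` is hypothetical, and it assumes the server sends its version string as the first bytes on the connection, as OpenSSH does.

```python
import socket


def ssh_banner_arrives(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if the peer sends data starting with b"SSH-" before the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Apply the same deadline to the read as to the connect.
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:
        # Covers refused connections, connect timeouts, and read timeouts.
        return False
    return data.startswith(b"SSH-")


if __name__ == "__main__":
    print(ssh_banner_arrives("apricot.lan", 22, timeout=2.0))
```

A wedged host shows up as a successful connect followed by a read timeout, because the kernel completes the TCP handshake from the listen backlog without sshd ever running.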
Composes with the harness Monitor tool: each state-change line becomes one
notification.
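The one-line-per-state-change behavior of --watch amounts to suppressing repeats. A minimal sketch, assuming the first observed state is also emitted; the generator name `state_changes` is hypothetical.

```python
from typing import Iterable, Iterator


def state_changes(states: Iterable[str]) -> Iterator[str]:
    """Yield a state only when it differs from the previously seen one."""
    previous = None
    for state in states:
        if state != previous:
            yield state
            previous = state


if __name__ == "__main__":
    polled = ["up", "up", "wedged", "wedged", "down", "up"]
    for line in state_changes(polled):
        print(line)
```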
power-cycle — Shelly Gen2 RPC wrapper
power-cycle apricot # off → POWER_CYCLE_OFF_SECS (default 5) → on
power-cycle off|on|status apricot
power-cycle list # show configured host → plug mappings
Config: ~/.config/power-cycle/plugs.conf, one line per host:
<host> <plug-base-url>
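Reading that format takes only a few lines. A minimal sketch, assuming blank lines and lines starting with `#` are ignored; the function name `parse_plugs` is hypothetical and the real script may parse differently.

```python
def parse_plugs(text: str) -> dict[str, str]:
    """Parse '<host> <plug-base-url>' lines into a host -> URL mapping."""
    plugs: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected '<host> <plug-base-url>'")
        host, url = fields
        plugs[host] = url.rstrip("/")
    return plugs


if __name__ == "__main__":
    sample = "# ~/.config/power-cycle/plugs.conf\napricot http://192.0.2.10\n"
    print(parse_plugs(sample))
```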
Errors covered: missing config, unknown host, plug unreachable (with hint to fall back to BLE for modem outages), unrecognized Shelly response.
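The off, wait, on sequence can be sketched as two Switch.Set calls with a pause between them. This is an illustration, not the power-cycle script: the names `rpc_get` and `power_cycle` are hypothetical, switch id 0 is assumed, and the check for a `was_on` key assumes the usual Gen2 Switch.Set reply shape. The transport and the sleep are passed in so the sequence can be exercised without a plug.

```python
import json
import time
import urllib.request


def rpc_get(url: str, timeout: float = 5.0) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def power_cycle(base_url: str, off_secs: float = 5.0,
                fetch=rpc_get, sleep=time.sleep) -> bool:
    """Switch the plug off, wait off_secs, switch it on. Returns the prior state."""
    base = base_url.rstrip("/")
    replies = []
    for on in ("false", "true"):
        reply = fetch(f"{base}/rpc/Switch.Set?id=0&on={on}")
        if not isinstance(reply, dict) or "was_on" not in reply:
            raise RuntimeError(f"unrecognized Shelly response: {reply!r}")
        replies.append(reply)
        if on == "false":
            # Hold power off long enough for the PSU to drain.
            sleep(off_secs)
    return bool(replies[0]["was_on"])
```

A call such as `power_cycle("http://<plug-ip>")` raises `urllib.error.URLError` if the plug is unreachable, which is where a wrapper could print the BLE fallback hint.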
Future: modem watchdog daemon on black
Once the USB BT dongle and SwitchBot plug arrive:
- bleak-based Python daemon, runs as systemd --user unit on black
- Pings WAN (e.g. 1.1.1.1) every 30s
- After N consecutive failures, BLE-toggles the SwitchBot plug off → 10s → on
- Logs to journal; emits a notification via the existing apricot speech-synthesis path (see reference-rvoice) when it acts
Not implemented yet — waiting on hardware.
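The consecutive-failure rule at the core of that plan is independent of the hardware and can be sketched now. The function name `watchdog_step` is hypothetical, and resetting the counter after acting is an assumption chosen so the modem gets a full N intervals to come back. The BLE toggle itself is left out, since the SwitchBot command format is not covered here.

```python
def watchdog_step(ping_ok: bool, failures: int, threshold: int) -> tuple[int, bool]:
    """Advance the failure counter by one ping result.

    Returns (new_failure_count, should_power_cycle).
    """
    if ping_ok:
        # Any success breaks the run of consecutive failures.
        return 0, False
    failures += 1
    if failures >= threshold:
        # Assumption: reset after acting so the next cycle needs N new failures.
        return 0, True
    return failures, False


if __name__ == "__main__":
    count = 0
    for ok in [False, False, True, False, False, False]:
        count, act = watchdog_step(ok, count, threshold=3)
        print(ok, count, act)
```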
Files
| Path | Role |
|---|---|
| bin/host-probe | Three-layer reachability probe |
| bin/power-cycle | Shelly Gen2 RPC wrapper |
| ~/.config/power-cycle/plugs.conf | Per-host plug URL mapping (user-created) |