diff --git a/README.md b/README.md index 57c175c..d555fb2 100644 --- a/README.md +++ b/README.md @@ -7,8 +7,13 @@ Components: - **`bin/net`** — **the one command**: `status · whoami · doctor · issues · sync · up · down · enroll phone · gui`. Imports the agent as a library, so every surface shares one implementation. The renderers (`host-apply`, - `mesh-hosts-render`, `wg-dns-sync`, `fleet-status`) remain as internals/direct - tools. + `mesh-hosts-render`, `wg-dns-sync`, `wg-render`, `fleet-status`) remain as + internals/direct tools. +- **`bin/wg-render`** — renders THIS host's `/etc/wireguard/wg1.conf` from the + source of truth (the piece that used to be hand-built). Multi-segment hub model + (see below). `--keygen` bootstraps a fresh host's key and prints the pubkey to + paste back as `wg_pubkey`; `--apply` installs + `wg syncconf` (idempotent, + rolls back on failure); `--dry-run`/`--whoami` for inspection. - **[`data/known-issues.json`](data/known-issues.json)** — the **triage registry**: features that are known-broken or intentionally parked. `net issues` lists them; `net doctor ` annotates each host with its parked @@ -56,6 +61,26 @@ the WireGuard app with `DNS=10.9.0.2`. Current: **strawberry** (alias `phone-quinn`, ios, `10.9.0.5`). Enroll new ones with `wg-phone-add`, then add the entry. +## WireGuard segments (multi-hub) + +A **segment** is a WireGuard hub plus its spokes. `mesh.segments` maps a segment +name to `{ hub, endpoint, dns_host, dns_listen }`; each host carries `segment` +(which it belongs to) and `wg_pubkey` (its public key — the private key never +leaves the box). `bin/wg-render` reads this and emits the right `wg1.conf`: the +segment **hub** gets a `[Peer]` per spoke (+ `ip_forward`/MASQUERADE so it relays); +a **spoke** gets one `[Peer]` = its hub (Endpoint + `AllowedIPs = mesh cidr`). 
+ +Today: **iceland** (hub `yuzu`) and **nyc3** (hub `citron`) are independent stars — +the DO/nyc3 droplets (lime/redroid/…) peer `citron`, not the Iceland box. +`wg-dns-sync` is segment-aware too: the segment's `dns_host` binds its own +`dns_listen` (citron serves nyc3 on `10.9.0.7`; apricot still serves on `10.9.0.2`). +Hosts with no `segment` fall back to the legacy single hub (`mesh.hub`). + +Bootstrap a new cloud host: provision (DO droplet, reverse-DNS name +`com.uvlava..`), `ssh` in and run `wg-render --keygen`, paste the +printed pubkey into that host's `wg_pubkey` in `mesh-hosts.json`, then +`wg-render --apply` on the host and on its hub. + ## Naming: one rule per suffix - **bare ``** and **`.lan`** → the host's **current LAN IP** diff --git a/bin/test b/bin/test new file mode 100755 index 0000000..38bd4b2 --- /dev/null +++ b/bin/test @@ -0,0 +1,56 @@ +#!/bin/sh +# test — run the net-tools test suite (no root required). +set -eu + +self=$0 +while [ -L "$self" ]; do + link=$(readlink "$self") + case $link in /*) self=$link ;; *) self=$(dirname "$self")/$link ;; esac +done +root=$(cd "$(dirname "$self")" && pwd) +while [ "$root" != "/" ] && [ ! -f "$root/data/mesh-hosts.json" ]; do root=$(dirname "$root"); done + +echo "==> python unit tests" +python3 "$root/smart-lan-router/test_smart_lan_router.py" + +echo "==> shell syntax" +sh -n "$root/bin/mesh-hosts-render" +sh -n "$root/bin/host-apply" +sh -n "$root/bin/fleet-status" +sh -n "$root/bin/wg-dns-sync" +sh -n "$root/bin/wg-render" + +echo "==> wg-render multi-segment" +hub_who=$(WG_RENDER_SELF=citron "$root/bin/wg-render" --whoami 2>/dev/null) +case "$hub_who" in *"role=hub"*) : ;; *) echo "FAIL: citron not resolved as hub ($hub_who)" >&2; exit 1 ;; esac +# Hub config must include a [Peer] block for its spoke(s). 
+WG_RENDER_SELF=citron "$root/bin/wg-render" --dry-run 2>/dev/null | grep -q '^\[Peer\]' \ + || { echo "FAIL: citron hub render has no [Peer]" >&2; exit 1; } +# Spoke config must point at the segment hub endpoint with a keepalive. +spoke=$(WG_RENDER_SELF=lime "$root/bin/wg-render" --dry-run 2>/dev/null) +case "$spoke" in *"Endpoint = 143.244.223.5:51820"*"PersistentKeepalive"*) : ;; + *) echo "FAIL: lime spoke render missing hub endpoint/keepalive" >&2; exit 1 ;; esac +echo "ok: citron=hub peers spokes, lime=spoke -> citron" + +echo "==> wg-dns-sync segment listen" +cit_listen=$(WG_DNS_SELF=citron "$root/bin/wg-dns-sync" --dry-run 2>/dev/null | sed -n 's/^listen-address=//p') +apr_listen=$(WG_DNS_SELF=apricot "$root/bin/wg-dns-sync" --dry-run 2>/dev/null | sed -n 's/^listen-address=//p') +case "$cit_listen" in *10.9.0.7*) : ;; *) echo "FAIL: citron dns listen not nyc3 ($cit_listen)" >&2; exit 1 ;; esac +case "$apr_listen" in *10.9.0.2*) : ;; *) echo "FAIL: apricot dns listen changed ($apr_listen)" >&2; exit 1 ;; esac +echo "ok: citron serves nyc3 ($cit_listen), apricot unchanged ($apr_listen)" + +echo "==> wg-dns-sync overlay" +lan_ip=$("$root/bin/wg-dns-sync" --dry-run | sed -n 's|address=/apricot\.lan/\([^[:space:]#]*\).*|\1|p' | head -1) +state_ip=$(jq -r '.apricot // empty' "$root/data/lan-state.json" 2>/dev/null || true) +seed_ip=$(jq -r '.hosts[] | select(.name=="apricot") | .lan' "$root/data/mesh-hosts.json") +if [ -n "$state_ip" ] && [ "$lan_ip" != "$state_ip" ]; then + echo "FAIL: apricot.lan is $lan_ip, expected overlay $state_ip" >&2 + exit 1 +fi +if [ -z "$state_ip" ] && [ "$lan_ip" != "$seed_ip" ]; then + echo "FAIL: apricot.lan is $lan_ip, expected seed $seed_ip" >&2 + exit 1 +fi +echo "ok: apricot.lan -> ${lan_ip:-?}" + +echo "==> all passed" \ No newline at end of file diff --git a/bin/wg-dns-sync b/bin/wg-dns-sync index f5e18eb..84ea884 100755 --- a/bin/wg-dns-sync +++ b/bin/wg-dns-sync @@ -1,6 +1,6 @@ #!/bin/sh # wg-dns-sync — render dnsmasq 
records for the wg1 mesh from data/mesh-hosts.json -# and (re-)install them to /etc/dnsmasq.d/wg-mesh.conf on the local host. +# (+ data/lan-state.json overlay) and install to /etc/dnsmasq.d/wg-mesh.conf. # # Source of truth: data/mesh-hosts.json (located by walking up from this script, # resolving symlinks first — so it works when invoked via a @@ -13,7 +13,7 @@ # # Renders the host records (both views) into one conf, from hosts[]: # 1. .wg -> mesh IP (10.9.0.x) -# 2. .lan -> LAN IP (10.0.0.x) (hosts that have a lan IP) +# 2. .lan -> current LAN IP (lan-state overlay over static seed) # (The old *.local platform service records are RETIRED — platform uses .com, # infra uses .lan — and are no longer rendered here.) # @@ -62,8 +62,31 @@ target=/etc/dnsmasq.d/wg-mesh.conf command -v jq >/dev/null || { echo "wg-dns-sync: jq not installed" >&2; exit 1; } jq empty "$data_file" || { echo "wg-dns-sync: invalid JSON in $data_file" >&2; exit 1; } -listen=$(jq -r '.dnsmasq.listen_address // empty' "$data_file") -[ -n "$listen" ] || { echo "wg-dns-sync: missing .dnsmasq.listen_address" >&2; exit 1; } +# Segment-aware listen address: if THIS host is a segment's dns_host, bind that +# segment's dns_listen (e.g. citron -> nyc3 -> 127.0.0.1,10.9.0.7); otherwise fall +# back to the legacy global .dnsmasq.listen_address (apricot's historical behavior). +# WG_DNS_SELF overrides self-detection (tests / deliberate ops). +if [ -n "${WG_DNS_SELF:-}" ]; then + dns_self=$WG_DNS_SELF +else + dns_self=$(hostname 2>/dev/null | cut -d. -f1); [ -n "$dns_self" ] || dns_self=$(uname -n | cut -d. 
-f1) +fi +seg_listen=$(jq -r --arg s "$dns_self" ' + (.mesh.segments // {}) | to_entries[] + | select(.value | type == "object") + | select(.value.dns_host == $s) | .value.dns_listen' "$data_file" 2>/dev/null | head -1) +if [ -n "$seg_listen" ] && [ "$seg_listen" != "null" ]; then + listen=$seg_listen +else + listen=$(jq -r '.dnsmasq.listen_address // empty' "$data_file") +fi +[ -n "$listen" ] || { echo "wg-dns-sync: missing listen address (no segment match and no .dnsmasq.listen_address)" >&2; exit 1; } + +overlay='{}' +state_file="$root/data/lan-state.json" +if [ -f "$state_file" ] && jq -e . "$state_file" >/dev/null 2>&1; then + overlay=$(cat "$state_file") +fi # --- render -------------------------------------------------------------------- tmp=$(mktemp "${TMPDIR:-/tmp}/wg-mesh.conf.XXXXXX") @@ -78,11 +101,18 @@ when=$(date -u +%Y-%m-%dT%H:%M:%SZ) host=$(hostname -s 2>/dev/null || hostname) { - printf '# Generated by smart-lan-router/bin/wg-dns-sync — DO NOT EDIT MANUALLY\n' - printf '# To change records: edit smart-lan-router/data/mesh-hosts.json and re-run this script.\n' + printf '# Generated by net-tools/bin/wg-dns-sync — DO NOT EDIT MANUALLY\n' + printf '# To change records: edit data/mesh-hosts.json (+ lan-state.json overlay) and re-run.\n' printf '# rendered_at: %s\n' "$when" printf '# rendered_on: %s\n' "$host" printf '# source_sha256: %s\n' "$data_sha" + if [ -f "$state_file" ]; then + if command -v sha256sum >/dev/null 2>&1; then + printf '# lan_state_sha256: %s\n' "$(sha256sum "$state_file" | awk '{print $1}')" + else + printf '# lan_state_sha256: %s\n' "$(shasum -a 256 "$state_file" | awk '{print $1}')" + fi + fi printf '\n' printf '# Bind only to the wg1 IP so this view is invisible to LAN/loopback clients\n' printf '# (which lilith-local.conf serves with split-horizon 127.0.0.1 records).\n' @@ -99,13 +129,14 @@ host=$(hostname -s 2>/dev/null || hostname) | "address=/\(.).wg/\($h.wg) # \($h.role)" ' "$data_file" printf '\n' - printf '# === LAN host 
records (.lan -> LAN IP) — from hosts[] with a lan IP ===\n' - jq -r ' + printf '# === LAN host records (.lan -> current LAN IP) — overlay over static seed ===\n' + jq -r --argjson ov "$overlay" ' .hosts[] - | select(.lan != null) | . as $h + | (($ov[$h.name]) // $h.lan) as $lan + | select($lan != null) | ([$h.name] + ($h.aliases // []))[] - | "address=/\(.).lan/\($h.lan) # \($h.role)" + | "address=/\(.).lan/\($lan) # \($h.role)" ' "$data_file" } > "$tmp" diff --git a/bin/wg-render b/bin/wg-render new file mode 100755 index 0000000..b289669 --- /dev/null +++ b/bin/wg-render @@ -0,0 +1,216 @@ +#!/bin/sh +# wg-render — render THIS host's /etc/wireguard/wg1.conf from data/mesh-hosts.json. +# +# net-tools' missing piece: SSH (host-apply), /etc/hosts + mesh DNS +# (mesh-hosts-render / wg-dns-sync) were already reconciler-owned; the WireGuard +# config was not (set up by hand). This renders it from the one source of truth. +# +# MULTI-SEGMENT HUB MODEL +# A segment is a hub + its spokes. mesh.segments maps -> { hub, +# endpoint, dns_host, dns_listen }. Each host carries `segment` (which segment +# it belongs to) and `wg_pubkey` (its public key — NEVER the private key). +# - The segment's HUB renders [Interface] (+ ip_forward/MASQUERADE PostUp) and a +# [Peer] for every spoke in its segment (AllowedIPs = spoke/32). +# - A SPOKE renders [Interface] + a single [Peer] = its segment hub +# (AllowedIPs = mesh cidr, Endpoint = segment endpoint, keepalive). +# yuzu (Iceland) and citron (nyc3) are independent segments — no cross-segment +# routing unless a hub is also listed as another segment's spoke. +# +# BACKWARD COMPATIBLE: if mesh.segments is absent, falls back to the legacy single +# hub (mesh.hub / mesh.hub_endpoint) and treats every non-hub host as its spoke. +# +# The PRIVATE key is read from /etc/wireguard/wg1.key (generated on the box, never +# in the repo). 
Bootstrap a fresh host with `wg-render --keygen` which generates +# the key and prints the PUBLIC key to paste into the host's wg_pubkey field. +# +# Usage: +# wg-render # --dry-run : print this host's wg1.conf (default) +# wg-render --dry-run # same, explicit +# wg-render --apply # install /etc/wireguard/wg1.conf + `wg syncconf` (root) +# wg-render --keygen # ensure /etc/wireguard/wg1.key exists; print pubkey +# wg-render --pubkey # print this host's public key (from the private key) +# wg-render --whoami # print self name + segment + role (hub|spoke) +# +# Exit codes: 0 ok/no-op · 1 bad input/deps · 2 need root · 3 wg failed (rolled back) + +set -eu + +mode=dry-run +case "${1:-}" in + ""|--dry-run) mode=dry-run ;; + --apply) mode=apply ;; + --keygen) mode=keygen ;; + --pubkey) mode=pubkey ;; + --whoami) mode=whoami ;; + *) echo "wg-render: unknown arg '$1'" >&2; exit 1 ;; +esac + +# --- locate data file (symlink-resolving walk, matches the other renderers) ----- +self_path=$0 +while [ -L "$self_path" ]; do + link=$(readlink "$self_path") + case $link in /*) self_path=$link ;; *) self_path=$(dirname "$self_path")/$link ;; esac +done +root=$(cd "$(dirname "$self_path")" && pwd) +while [ "$root" != "/" ] && [ ! -f "$root/data/mesh-hosts.json" ]; do root=$(dirname "$root"); done +data_file="$root/data/mesh-hosts.json" +[ -f "$data_file" ] || { echo "wg-render: cannot locate data/mesh-hosts.json" >&2; exit 1; } +command -v jq >/dev/null || { echo "wg-render: jq not installed" >&2; exit 1; } +jq empty "$data_file" || { echo "wg-render: invalid JSON in $data_file" >&2; exit 1; } + +WG_DIR=/etc/wireguard +KEY_FILE="$WG_DIR/wg1.key" +CONF_FILE="$WG_DIR/wg1.conf" +iface=$(jq -r '.mesh.interface // "wg1"' "$data_file") +cidr=$(jq -r '.mesh.cidr // "10.9.0.0/24"' "$data_file") +port=$(jq -r '.mesh.segments | (.. | .endpoint? 
// empty)' "$data_file" 2>/dev/null | head -1 | sed "s/.*://" ) +[ -n "${port:-}" ] || port=$(jq -r '(.mesh.hub_endpoint // "x:51820") | split(":")[1]' "$data_file") +[ -n "$port" ] || port=51820 + +# --- key helpers --------------------------------------------------------------- +ensure_key() { + command -v wg >/dev/null || { echo "wg-render: wireguard-tools (wg) not installed" >&2; exit 1; } + if [ ! -f "$KEY_FILE" ]; then + need_root "create $KEY_FILE" + $SUDO mkdir -p "$WG_DIR"; $SUDO chmod 700 "$WG_DIR" + umask 077 + wg genkey | $SUDO tee "$KEY_FILE" >/dev/null + $SUDO chmod 600 "$KEY_FILE" + echo "wg-render: generated $KEY_FILE" >&2 + fi +} +pubkey_of_self() { + [ -f "$KEY_FILE" ] || { echo "wg-render: no $KEY_FILE (run --keygen first)" >&2; exit 1; } + $SUDO cat "$KEY_FILE" 2>/dev/null | wg pubkey +} + +SUDO= +need_root() { + [ "$(id -u)" -eq 0 ] && return 0 + if command -v sudo >/dev/null 2>&1 && sudo -n true 2>/dev/null; then SUDO="sudo"; return 0; fi + echo "wg-render: need root to $1 (run with sudo)" >&2; exit 2 +} + +# --- identify self (name/alias or any local IPv4 incl. wg) --------------------- +short=$(hostname 2>/dev/null | cut -d. -f1); [ -n "$short" ] || short=$(uname -n | cut -d. -f1) +if command -v ip >/dev/null 2>&1; then + local_ips=$(ip -o -4 addr show 2>/dev/null | awk '{print $4}' | cut -d/ -f1) +else + local_ips=$(ifconfig 2>/dev/null | awk '/inet /{print $2}') +fi +ips_json=$(printf '%s\n' $local_ips | jq -R . | jq -s .) +# WG_RENDER_SELF forces the self identity (tests + deliberate ops override). +if [ -n "${WG_RENDER_SELF:-}" ]; then + self=$(jq -r --arg h "$WG_RENDER_SELF" '[.hosts[] | select(.name==$h or ((.aliases//[])|index($h))) | .name] | first // empty' "$data_file") + [ -n "$self" ] || { echo "wg-render: WG_RENDER_SELF='$WG_RENDER_SELF' not in mesh-hosts.json" >&2; exit 1; } +else + self=$(jq -r --arg h "$short" --argjson ips "$ips_json" ' + [ .hosts[] | . 
as $x + | select(($x.name==$h) or (($x.aliases//[])|index($h)) or ($x.wg!=null and ($ips|index($x.wg))) + or ($x.lan!=null and ($ips|index($x.lan))) ) + | $x.name ] | first // empty' "$data_file") +fi +[ -n "$self" ] || { echo "wg-render: cannot identify this host (short=$short ips=$local_ips) in mesh-hosts.json" >&2; exit 1; } + +# Resolve self's segment + the hub for it, with legacy fallback. +self_seg=$(jq -r --arg s "$self" '.hosts[] | select(.name==$s) | .segment // empty' "$data_file") +if [ -z "$self_seg" ]; then + # Legacy single-hub: synthesize a default segment from mesh.hub. + seg_hub=$(jq -r '.mesh.hub // empty' "$data_file") + seg_ep=$(jq -r '.mesh.hub_endpoint // empty' "$data_file") + seg_members_filter='.hosts[]' +else + seg_hub=$(jq -r --arg g "$self_seg" '.mesh.segments[$g].hub // empty' "$data_file") + seg_ep=$(jq -r --arg g "$self_seg" '.mesh.segments[$g].endpoint // empty' "$data_file") + seg_members_filter='.hosts[] | select((.segment // "") == $SEG)' +fi +[ -n "$seg_hub" ] || { echo "wg-render: no hub resolved for self=$self segment=${self_seg:-}" >&2; exit 1; } +[ "$self" = "$seg_hub" ] && role=hub || role=spoke + +self_wg=$(jq -r --arg s "$self" '.hosts[] | select(.name==$s) | .wg' "$data_file") +self_addr_cidr="${self_wg}/$( [ "$role" = hub ] && echo 24 || echo 32 )" + +if [ "$mode" = "whoami" ]; then + printf '%s segment=%s role=%s hub=%s endpoint=%s\n' \ + "$self" "${self_seg:-}" "$role" "$seg_hub" "${seg_ep:-?}" + exit 0 +fi +if [ "$mode" = "keygen" ]; then ensure_key; pubkey_of_self; exit 0; fi +if [ "$mode" = "pubkey" ]; then pubkey_of_self; exit 0; fi + +# --- render wg1.conf ----------------------------------------------------------- +# The private key is substituted from $KEY_FILE at install time, not embedded in +# dry-run output (which prints a placeholder so logs never leak it). 
+render_conf() { + privkey_repr=$1 + when=$(date -u +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || echo "?") + printf '# Generated by net-tools/bin/wg-render — DO NOT EDIT MANUALLY\n' + printf '# Edit data/mesh-hosts.json (segments + wg_pubkey) and re-run wg-render --apply.\n' + printf '# self: %s segment: %s role: %s rendered_at: %s\n\n' \ + "$self" "${self_seg:-}" "$role" "$when" + printf '[Interface]\n' + printf 'Address = %s\n' "$self_addr_cidr" + printf 'ListenPort = %s\n' "$port" + printf 'PrivateKey = %s\n' "$privkey_repr" + if [ "$role" = hub ]; then + printf 'PostUp = sysctl -w net.ipv4.ip_forward=1; iptables -A FORWARD -i %s -j ACCEPT; iptables -t nat -A POSTROUTING -o %s -j MASQUERADE\n' "$iface" "$iface" + printf 'PostDown = iptables -D FORWARD -i %s -j ACCEPT; iptables -t nat -D POSTROUTING -o %s -j MASQUERADE\n' "$iface" "$iface" + fi + printf '\n' + + if [ "$role" = hub ]; then + # One [Peer] per spoke in this segment that has a published pubkey. + jq -r --arg SEG "${self_seg:-}" --arg SELF "$self" " + ${seg_members_filter} + | select(.name != \$SELF) + | select(.wg_pubkey != null and .wg_pubkey != \"\") + | \"# \(.name)\n[Peer]\nPublicKey = \(.wg_pubkey)\nAllowedIPs = \(.wg)/32\n\" + " "$data_file" + # Warn (to stderr) about spokes still missing a key. + miss=$(jq -r --arg SEG "${self_seg:-}" --arg SELF "$self" " + ${seg_members_filter} | select(.name!=\$SELF) | select((.wg_pubkey//\"\")==\"\") | .name" "$data_file" | tr '\n' ' ') + [ -n "$(echo "$miss" | tr -d ' ')" ] && echo "wg-render: NOTE spokes without wg_pubkey (not peered): $miss" >&2 + else + # Single [Peer] = the segment hub. 
+ hub_pub=$(jq -r --arg H "$seg_hub" '.hosts[] | select(.name==$H) | .wg_pubkey // empty' "$data_file") + [ -n "$hub_pub" ] || { echo "wg-render: hub $seg_hub has no wg_pubkey in mesh-hosts.json — cannot render spoke peer" >&2; exit 1; } + printf '# hub: %s\n[Peer]\nPublicKey = %s\nEndpoint = %s\nAllowedIPs = %s\nPersistentKeepalive = 25\n' \ + "$seg_hub" "$hub_pub" "$seg_ep" "$cidr" + fi +} + +if [ "$mode" = "dry-run" ]; then + render_conf "" + exit 0 +fi + +# --apply +ensure_key +need_root "write $CONF_FILE" +priv=$($SUDO cat "$KEY_FILE") +tmp=$(mktemp "${TMPDIR:-/tmp}/wg1.conf.XXXXXX"); trap 'rm -f "$tmp"' EXIT +render_conf "$priv" > "$tmp" +chmod 600 "$tmp" + +if [ -f "$CONF_FILE" ] && cmp -s "$tmp" "$CONF_FILE"; then + echo "wg-render: $CONF_FILE already up to date for $self ($role/${self_seg:-legacy})" + exit 0 +fi +[ -f "$CONF_FILE" ] && $SUDO cp "$CONF_FILE" "$CONF_FILE.netbak" +$SUDO cp "$tmp" "$CONF_FILE"; $SUDO chmod 600 "$CONF_FILE" +echo "wg-render: wrote $CONF_FILE for $self ($role/${self_seg:-legacy})" + +if command -v systemctl >/dev/null 2>&1; then + $SUDO systemctl enable "wg-quick@${iface}" >/dev/null 2>&1 || true + if $SUDO systemctl is-active "wg-quick@${iface}" >/dev/null 2>&1; then + # Live update without dropping the tunnel. 
+ if $SUDO bash -c "wg syncconf $iface <(wg-quick strip $iface)" 2>/dev/null; then + echo "wg-render: $iface syncconf applied" + else + $SUDO systemctl restart "wg-quick@${iface}" || { echo "wg-render: $iface restart failed — rolling back" >&2; [ -f "$CONF_FILE.netbak" ] && $SUDO cp "$CONF_FILE.netbak" "$CONF_FILE"; $SUDO systemctl restart "wg-quick@${iface}" || true; exit 3; } + fi + else + $SUDO systemctl start "wg-quick@${iface}" || { echo "wg-render: $iface start failed — rolling back" >&2; [ -f "$CONF_FILE.netbak" ] && $SUDO cp "$CONF_FILE.netbak" "$CONF_FILE"; exit 3; } + echo "wg-render: $iface started" + fi +fi diff --git a/data/mesh-hosts.json b/data/mesh-hosts.json index f1f01b9..f0ffe6d 100644 --- a/data/mesh-hosts.json +++ b/data/mesh-hosts.json @@ -1,19 +1,23 @@ { - "_purpose": "Single source of truth for the wg1 mesh + LAN: the four hosts, their addresses on each path, MAC-based DHCP discovery, L7 health probes for `net doctor`, and the DNS records apricot's dnsmasq serves. Everything that needs a host address derives from here — never hardcode mesh IPs, MACs, or identity URLs elsewhere.", + "_purpose": "Single source of truth for the wg1 mesh + LAN: the four hosts, their addresses on each path, MAC-based DHCP discovery, L7 health probes for `net doctor`, and the DNS records apricot's dnsmasq serves. Everything that needs a host address derives from here \u2014 never hardcode mesh IPs, MACs, or identity URLs elsewhere.", "_schema": { "hosts[].name": "Canonical name = fruit family encodes machine class (gpu=stone fruit, cpu=pome, cloud=citrus, laptop=vegetable, phone=berry).", - "fleet.enforce_hostname": "true => every agent converges its node's OS hostname to its canonical name (scutil on darwin, hostnamectl on linux). 
The FLEET renames hosts — never run hostnamectl by hand.", - "phones": "class=phone (berry family): no agent possible (ios/android run nothing); they are DNS clients — WireGuard app with DNS=10.9.0.2, names served by apricot's mesh dnsmasq (wg-dns-sync). ssh_user null => no ssh stanza rendered. os distinguishes ios/android. Enroll with wg-phone-add, then add the entry here. If the phone's per-SSID Wi-Fi MAC is pinned (iOS 'Private Wi-Fi Address: Fixed'), add mac to get home-LAN discovery too.", + "fleet.enforce_hostname": "true => every agent converges its node's OS hostname to its canonical name (scutil on darwin, hostnamectl on linux). The FLEET renames hosts \u2014 never run hostnamectl by hand.", + "phones": "class=phone (berry family): no agent possible (ios/android run nothing); they are DNS clients \u2014 WireGuard app with DNS=10.9.0.2, names served by apricot's mesh dnsmasq (wg-dns-sync). ssh_user null => no ssh stanza rendered. os distinguishes ios/android. Enroll with wg-phone-add, then add the entry here. If the phone's per-SSID Wi-Fi MAC is pinned (iOS 'Private Wi-Fi Address: Fixed'), add mac to get home-LAN discovery too.", "hosts[].aliases": "Old names, kept working during the alias-first rename. Renderers emit a record for name AND every alias.", "hosts[].class": "gpu | cpu | cloud | laptop.", "hosts[].wg/lan/public": "wg = mesh IP (10.9.0.0/24); lan = home LAN IP (10.0.0.0/24, null if roaming/no LAN leg); public = internet IP (null if none).", - "hosts[].mac": "LAN interface MAC — the stable key the daemon uses to DISCOVER the host's current DHCP IP via ARP (name-sync). null = not discoverable.", + "hosts[].mac": "LAN interface MAC \u2014 the stable key the daemon uses to DISCOVER the host's current DHCP IP via ARP (name-sync). null = not discoverable.", "hosts[].identity": "L7 health probe for `net doctor` only (url with '{ip}' substituted, markers all required). null = skip service check. 
Routing uses subnet /24 + gateway-MAC fingerprint, not per-host identity.", - "services": "{host: [fqdn, ...]} — service vhost names that live ON a host and must resolve to that host's CURRENT LAN IP. Rendered by mesh-hosts-render with the discovered overlay, so they track DHCP drift. Add names here, never hand-edit /etc/hosts.", - "naming": "'.wg' = mesh IP (explicit tunnel path); '.lan' + BARE '' = current LAN IP (direct at home; via tunnel when away, since the daemon routes the LAN /24 through wg then). Hosts without a LAN IP get bare name → wg IP. ('.local' is retired — platform uses .com, infra .lan.)", + "services": "{host: [fqdn, ...]} \u2014 service vhost names that live ON a host and must resolve to that host's CURRENT LAN IP. Rendered by mesh-hosts-render with the discovered overlay, so they track DHCP drift. Add names here, never hand-edit /etc/hosts.", + "naming": "'.wg' = mesh IP (explicit tunnel path); '.lan' + BARE '' = current LAN IP (direct at home; via tunnel when away, since the daemon routes the LAN /24 through wg then). Hosts without a LAN IP get bare name \u2192 wg IP. ('.local' is retired \u2014 platform uses .com, infra .lan.)", "routing": "smart-lan-router.py (laptop role) routes the entire LAN /24 direct when HOME (gateway MAC match) or via wg when AWAY. No per-host /32 pins." }, - "_consumers": ["bin/wg-dns-sync", "bin/mesh-hosts-render", "smart-lan-router/smart-lan-router.py"], + "_consumers": [ + "bin/wg-dns-sync", + "bin/mesh-hosts-render", + "smart-lan-router/smart-lan-router.py" + ], "fleet": { "enforce_hostname": true }, @@ -23,7 +27,12 @@ "hub": "yuzu", "hub_endpoint": "89.127.233.145:51820", "dns_host": "apricot", - "dns_listen": "10.9.0.2:53" + "dns_listen": "10.9.0.2:53", + "segments": { + "_note": "A segment = a WireGuard hub + its spokes (bin/wg-render). hosts[].segment names the segment a host belongs to; hosts[].wg_pubkey is its public key (never private). yuzu (iceland) and citron (nyc3) are independent stars. 
Hosts without a `segment` fall back to the legacy single hub (mesh.hub) in wg-render.", + "iceland": { "hub": "yuzu", "endpoint": "89.127.233.145:51820", "dns_host": "apricot", "dns_listen": "127.0.0.1,10.9.0.2" }, + "nyc3": { "hub": "citron", "endpoint": "143.244.223.5:51820", "dns_host": "citron", "dns_listen": "127.0.0.1,10.9.0.7" } + } }, "lan": { "cidr": "10.0.0.0/24", @@ -31,44 +40,58 @@ "dns_listen": "10.0.0.11:53", "gateway": "10.0.0.1", "gateway_mac": "c4:4f:d5:5a:61:6f", - "gateway_note": "Xfinity broadband gateway. gateway_mac is the home-LAN fingerprint: the smart-lan-router daemon treats the laptop as 'home' only when the default gateway on the LAN interface has this MAC — distinguishes the real home LAN from any visited 10.0.0.0/24 network. DHCP reservations only via xFi/web UI, no scriptable API." + "gateway_note": "Xfinity broadband gateway. gateway_mac is the home-LAN fingerprint: the smart-lan-router daemon treats the laptop as 'home' only when the default gateway on the LAN interface has this MAC \u2014 distinguishes the real home LAN from any visited 10.0.0.0/24 network. DHCP reservations only via xFi/web UI, no scriptable API." }, "dx": { "hide_homelan": true, - "_note": "When true, renderers (mesh-hosts-render, host-apply) omit homelan hosts (apricot/pear/fennel + their LAN/.lan/bare names and services on them) from generated /etc/hosts and ~/.ssh/config. Only cloud/DO hosts (yuzu, lime + their public/WG) and dx-forges (separate) are shown. Full homelan data preserved in 'hosts'/'lan'/'services' for easy recovery — set false and re-render (net sync). We are currently DO-only for DX." + "_note": "When true, renderers (mesh-hosts-render, host-apply) omit homelan hosts (apricot/pear/fennel + their LAN/.lan/bare names and services on them) from generated /etc/hosts and ~/.ssh/config. Only cloud/DO hosts (yuzu, lime + their public/WG) and dx-forges (separate) are shown. 
Full homelan data preserved in 'hosts'/'lan'/'services' for easy recovery \u2014 set false and re-render (net sync). We are currently DO-only for DX." }, "hosts": [ { "name": "apricot", "aliases": [], "class": "gpu", - "role": "Threadripper GPU compute — LLM serving, quinn dev, claude rc units, mesh DNS (dnsmasq 10.9.0.2:53)", + "role": "Threadripper GPU compute \u2014 LLM serving, quinn dev, claude rc units, mesh DNS (dnsmasq 10.9.0.2:53)", "os": "linux", "ssh_user": "lilith", "wg": "10.9.0.2", "lan": "10.0.0.116", "public": null, "mac": "b4:2e:99:35:24:c5", - "identity": { "url": "http://{ip}:8200/health", "markers": ["llama_service_available"] } + "identity": { + "url": "http://{ip}:8200/health", + "markers": [ + "llama_service_available" + ] + } }, { "name": "pear", - "aliases": ["black"], + "aliases": [ + "black" + ], "class": "cpu", - "role": "Threadripper CPU/storage — Forgejo, Verdaccio, LAN DNS (dnsmasq 10.0.0.11:53), NFS/media", + "role": "Threadripper CPU/storage \u2014 Forgejo, Verdaccio, LAN DNS (dnsmasq 10.0.0.11:53), NFS/media", "os": "linux", "ssh_user": "lilith", "wg": "10.9.0.4", "lan": "10.0.0.11", "public": null, "mac": "b4:2e:99:30:a2:9a", - "identity": { "url": "http://{ip}:3000/api/v1/version", "markers": ["version"] } + "identity": { + "url": "http://{ip}:3000/api/v1/version", + "markers": [ + "version" + ] + } }, { "name": "fennel", - "aliases": ["plum"], + "aliases": [ + "plum" + ], "class": "laptop", - "role": "MacBook Air M2 — roams (no fixed LAN IP), mesh client, runs the smart-lan-router daemon", + "role": "MacBook Air M2 \u2014 roams (no fixed LAN IP), mesh client, runs the smart-lan-router daemon", "os": "darwin", "ssh_user": "natalie", "wg": "10.9.0.3", @@ -79,27 +102,40 @@ }, { "name": "lime", - "aliases": ["lilith-store-backend"], + "aliases": [ + "lilith-store-backend", + "com.uvlava.ct.services" + ], "class": "cloud", - "role": "DigitalOcean backend node (nyc3, public IP 209.38.51.98 reached via ProxyJump yuzu / wg — no public 
app ports) — quinn.api INTERNAL (:3030), MCP gateways (:3910-3914), DO Managed PG (VPC), LISTEN/NOTIFY + private workers. Joins wg1 via phase-b-mesh-join.sh. IaC: uvlava/terraform/do.", + "role": "DigitalOcean backend node (nyc3, public IP 209.38.51.98 reached via ProxyJump yuzu / wg \u2014 no public app ports) \u2014 quinn.api INTERNAL (:3030), MCP gateways (:3910-3914), DO Managed PG (VPC), LISTEN/NOTIFY + private workers. Joins wg1 via phase-b-mesh-join.sh. IaC: uvlava/terraform/do.", "os": "linux", "ssh_user": "root", "ssh_identity": "~/.ssh/id_ed25519_1984", + "segment": "nyc3", + "wg_pubkey": "f9ojTNSwvP4/4LxTyyZPG/KhqehQ7aWiSxhsU4dT10Q=", "wg": "10.9.0.5", "lan": null, "public": null, "mac": null, - "identity": { "url": "http://{ip}:3030/healthz", "markers": ["ok"] } + "identity": { + "url": "http://{ip}:3030/healthz", + "markers": [ + "ok" + ] + } }, { "name": "redroid", - "aliases": ["lilith-store-redroid"], + "aliases": [ + "lilith-store-redroid", + "com.uvlava.ct.redroid" + ], "class": "cloud", - "role": "DigitalOcean redroid (containerized Android) host for screening automation (mr-number + whatsapp lookups). nyc3 under store vpc. Firewalled to admin_ips only. adb :5555 + ws-scrcpy console. Reached direct from plum via key (no wg leg). IaC: uvlava/terraform/do/redroid.tf (clean name 'redroid'; previously bad-named lilith-store-redroid). Used by LP tools and (future) CT application for mrnumbers execution.", + "role": "DigitalOcean redroid (containerized Android) host for screening automation (mr-number + whatsapp lookups). nyc3 under store vpc. wg leg 10.9.0.6. Services (adb:5555, ws-scrcpy:8000, wa ui:8011) locked to mesh only (ufw + bind to lo/wg IP; no public listeners). Reached direct from plum via key (ssh) or over mesh. Tray on plum provides secure tunneled localhost console UI. 
IaC: uvlava/terraform/do/redroid.tf.", "os": "linux", "ssh_user": "root", "ssh_identity": "~/.ssh/id_ed25519_1984", - "wg": null, + "wg": "10.9.0.6", "lan": null, "public": "45.55.191.82", "mac": null, @@ -107,9 +143,12 @@ }, { "name": "yuzu", - "aliases": ["vps", "quinn-vps"], + "aliases": [ + "vps", + "quinn-vps" + ], "class": "cloud", - "role": "1984 Hosting (Iceland) — WireGuard mesh hub, quinn production", + "role": "1984 Hosting (Iceland) \u2014 WireGuard mesh hub, quinn production", "os": "linux", "ssh_user": "root", "ssh_identity": "~/.ssh/id_ed25519_1984", @@ -118,10 +157,28 @@ "public": "89.127.233.145", "mac": null, "identity": null + }, + { + "name": "citron", + "aliases": [ + "com.uvlava.quinn.infra" + ], + "class": "cloud", + "role": "DigitalOcean essential-services node (nyc3, store vpc) — DNS + WireGuard HUB for the DO/nyc3 segment (lime/redroid/artifacts peer to it; yuzu remains the Iceland-segment hub). dnsmasq + wg1 are OWNED by net-tools host-apply / wg-dns-sync (the cloud-init in uvlava/terraform/do/infra_host.tf is bootstrap only). Reserved hub endpoint 143.244.223.5:51820.", + "os": "linux", + "ssh_user": "root", + "ssh_identity": "~/.ssh/id_ed25519_1984", + "segment": "nyc3", + "wg_pubkey": "DgI6gbwkoiaUACl+RDkS5/lgcVbv6S/hCvDG3y/74xc=", + "wg": "10.9.0.7", + "lan": null, + "public": "143.244.223.5", + "mac": null, + "identity": null } ], "services": { - "_note": "Service vhosts hosted ON a fleet host — adopted from the loose hand-maintained /etc/hosts lines (quinn.* dev vhosts, lm/llm stack, forge/registry). Rendered at the host's CURRENT discovered IP.", + "_note": "Service vhosts hosted ON a fleet host \u2014 adopted from the loose hand-maintained /etc/hosts lines (quinn.* dev vhosts, lm/llm stack, forge/registry). 
Rendered at the host's CURRENT discovered IP.", "apricot": [ "quinn.apricot.lan", "www.quinn.apricot.lan", @@ -148,7 +205,7 @@ ] }, "dnsmasq": { - "_note": "Mesh DNS served by apricot's dnsmasq (bound 127.0.0.1 + 10.9.0.2), written to /etc/dnsmasq.d/wg-mesh.conf by bin/wg-dns-sync. Consumed by wg clients that set DNS=10.9.0.2 (phones). Renders the host .wg + .lan records from hosts[] — NOT platform service records. The old *.local platform domains are RETIRED (platform uses .com; infra uses .lan); they are deliberately NOT carried here.", + "_note": "Mesh DNS served by apricot's dnsmasq (bound 127.0.0.1 + 10.9.0.2), written to /etc/dnsmasq.d/wg-mesh.conf by bin/wg-dns-sync. Consumed by wg clients that set DNS=10.9.0.2 (phones). Renders the host .wg + .lan records from hosts[] \u2014 NOT platform service records. The old *.local platform domains are RETIRED (platform uses .com; infra uses .lan); they are deliberately NOT carried here.", "listen_address": "127.0.0.1,10.9.0.2" } -} +} \ No newline at end of file
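
Review note: the segment resolution that `bin/wg-render` performs (host to segment, segment to hub, then hub or spoke) can be sketched without `jq`. This is a minimal illustration, not the script's implementation. The hard-coded `case` tables stand in for `data/mesh-hosts.json`, the helper names `segment_of`, `hub_of`, and `role_of` are hypothetical, and the `legacy` label for hosts without a `segment` is borrowed from the script's `--apply` messages.

```shell
#!/bin/sh
# Sketch only: hard-coded tables stand in for data/mesh-hosts.json.
# segment_of / hub_of / role_of are hypothetical names, not wg-render's.
set -eu

LEGACY_HUB=yuzu   # mirrors mesh.hub, used when a host carries no segment

# host -> segment (prints an empty line when the host has no segment)
segment_of() {
  case $1 in
    lime|citron) echo nyc3 ;;
    *)           echo "" ;;
  esac
}

# segment -> hub; an empty or unknown segment falls back to the legacy hub
hub_of() {
  case $1 in
    iceland) echo yuzu ;;
    nyc3)    echo citron ;;
    *)       echo "$LEGACY_HUB" ;;
  esac
}

# A host is a hub when it is the hub of its own segment, otherwise a spoke.
role_of() {
  seg=$(segment_of "$1")
  hub=$(hub_of "$seg")
  if [ "$1" = "$hub" ]; then role=hub; else role=spoke; fi
  printf '%s segment=%s role=%s hub=%s\n' "$1" "${seg:-legacy}" "$role" "$hub"
}

role_of citron
role_of lime
role_of apricot
```

Run as written, this prints `citron segment=nyc3 role=hub hub=citron`, `lime segment=nyc3 role=spoke hub=citron`, and `apricot segment=legacy role=spoke hub=yuzu`. A hub would then render one `[Peer]` per spoke in its segment, and a spoke a single `[Peer]` pointing at its hub's endpoint.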