From 5a4633cef6af7afed3bd09e650e8c5358caec816 Mon Sep 17 00:00:00 2001
From: Natalie
Date: Sun, 17 May 2026 19:44:44 -0700
Subject: [PATCH] =?UTF-8?q?docs(@scripts):=20=E2=9C=A8=20add=20disk-reclai?=
 =?UTF-8?q?m=20boot=20wrapper=20and=20docs?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 bin/_disk-reclaim-boot |  25 ++++++++++
 docs/disk-reclaim.md   | 103 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)
 create mode 100755 bin/_disk-reclaim-boot
 create mode 100644 docs/disk-reclaim.md

diff --git a/bin/_disk-reclaim-boot b/bin/_disk-reclaim-boot
new file mode 100755
index 0000000..cb3264c
--- /dev/null
+++ b/bin/_disk-reclaim-boot
@@ -0,0 +1,25 @@
+#!/bin/sh
+# Wrapper invoked by ~/Library/LaunchAgents/com.lilith.disk-reclaim.plist
+# on user login. Appends a timestamped disk-reclaim snapshot to the log.
+#
+# Not meant for direct human use — invoke `disk-reclaim` instead.
+
+set -eu
+
+log="$HOME/Library/Logs/disk-reclaim.log"
+mkdir -p "$(dirname "$log")"
+
+# Absolute path so we don't depend on $PATH in launchd's minimal env.
+script_dir=$(cd "$(dirname "$0")" && pwd -P)
+reclaim="$script_dir/disk-reclaim"
+
+{
+  echo
+  echo "=== $(date '+%Y-%m-%d %H:%M:%S %z') (boot) ==="
+  "$reclaim" "$HOME" --min 1G
+} >> "$log" 2>&1
+
+# Trim to last ~200KB so it can't grow without bound across years of boots.
+if [ -f "$log" ] && [ "$(wc -c <"$log")" -gt 204800 ]; then
+  tail -c 204800 "$log" > "$log.trim" && mv "$log.trim" "$log"
+fi
diff --git a/docs/disk-reclaim.md b/docs/disk-reclaim.md
new file mode 100644
index 0000000..4825925
--- /dev/null
+++ b/docs/disk-reclaim.md
@@ -0,0 +1,103 @@
+# disk-reclaim — find reclaimable disk space
+
+`disk-reclaim` scans a directory tree for generated/cache directories that
+regenerate from source — `node_modules`, `target`, `__pycache__`, build
+outputs, IDE state — and reports them sorted by size. Read-only by design;
+it names paths, it never deletes.
+
+The intended workflow is: run it, scan the list for entries you don't
+need actively built, `rm -rf` those yourself. The script stays out of the
+deletion decision because the right answer is project-by-project (a
+`target/` you'll need to recompile in 10 minutes is different from one on a
+project you haven't touched in a year).
+
+## Usage
+
+```sh
+disk-reclaim                  # scan $HOME, default --min 100M
+disk-reclaim ~/Code           # scope to a subtree
+disk-reclaim --min 1G         # only the worst offenders
+disk-reclaim --all            # no minimum filter
+disk-reclaim --no-summary     # skip the per-category totals
+```
+
+Sample output (first real run on this machine, `~/Code --min 500M`):
+
+```
+scanning /Users/natalie/Code (min size: 500M)...
+
+  SIZE  PATH
+  ----  ----
+ 15.7G  /Users/natalie/Code/@projects/@magic-civilization/src/simulator/target
+  5.6G  /Users/natalie/Code/@projects/@magic-civilization/.local/build
+  1.2G  /Users/natalie/Code/@projects/@lilith/lilith-platform.live/node_modules
+
+top-level cache roots:
+  1.5G  /Users/natalie/.npm
+  868M  /Users/natalie/Library/Caches
+  579M  /Users/natalie/.cargo/registry
+
+totals by category:
+ 15.7G  target
+  5.6G  build
+  1.2G  node_modules
+```
+
+## What it scans
+
+**Project-scoped patterns (via `find ... -prune`):**
+
+| Ecosystem | Patterns |
+|---|---|
+| JS/TS | `node_modules`, `.next`, `.nuxt`, `.turbo`, `.vite`, `.parcel-cache`, `.svelte-kit`, `.astro`, `.cache`, `dist`, `build`, `out` |
+| Python | `__pycache__`, `.pytest_cache`, `.mypy_cache`, `.ruff_cache`, `.tox`, `.venv` |
+| Rust | `target` |
+| Other | `_build`, `Pods`, `DerivedData`, `.gradle`, `.android` |
+
+`-prune` matters: once a `node_modules` is matched, `find` doesn't descend
+into it looking for nested matches. Otherwise scans are slow and
+double-count nested build dirs.
+
+**Top-level cache roots (checked once each, not via find):**
+
+- `~/Library/Caches`, `~/Library/Developer/Xcode/DerivedData` (macOS)
+- `~/.cache` (XDG)
+- `~/.npm`, `~/.pnpm-store`, `~/.yarn/cache`
+- `~/.cargo/registry`, `~/.cargo/git`
+
+These are *the* cache root for their tool — including them in the `find`
+sweep would be wrong (the script would find every nested `.cache` inside
+them).
+
+## What it deliberately does NOT scan
+
+| Pattern | Why excluded |
+|---|---|
+| `vendor/` | Usually committed (Go) or required at runtime (PHP). Not generated. |
+| `.git` | User data. Never delete. |
+| Top-level `tmp`, `Downloads`, `Movies` | User-created content, not regenerable from source. |
+| Docker images/volumes | Use `docker system prune` instead — separate workflow with its own safety story. |
+
+## Caveats before `rm -rf`
+
+| Pattern | Cost to rebuild |
+|---|---|
+| `node_modules` | `pnpm install` / `npm install` — seconds to minutes depending on cache |
+| `target` (Rust) | Full `cargo build` — minutes |
+| `.venv` | `uv sync` / `pip install -r requirements.txt` — depends on wheel availability |
+| `.next`, `.nuxt`, `dist`, `build` | Single build command — usually fast |
+| `DerivedData`, `.gradle` | First build after deletion is slow; subsequent builds fine |
+
+The script prints this warning at the bottom of every run. Cache roots
+(`~/.npm`, `~/.cargo/registry`, etc.) are safe to nuke — they're pure
+caches that get repopulated lazily on next install.
+
+## Files
+
+| Path | Role |
+|---|---|
+| `bin/disk-reclaim` | The script |
+
+## Related
+
+- [[lan-power-ctrl]] — when *apricot's* disk fills and it wedges, `power-cycle apricot` is the recovery path
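
The `find ... -prune` scan and the once-each cache-root check described in `docs/disk-reclaim.md` could be sketched in POSIX `sh` roughly as follows. This is a hypothetical illustration and not the source of `bin/disk-reclaim`, which this patch does not include. The following are all assumptions:

- the function names `scan_generated_dirs` and `report_cache_roots`;
- a size floor in KiB standing in for `--min`;
- skipping `.git` during the walk;
- path names without embedded newlines.

Only the directory names are copied from the doc's tables.

```sh
# Hypothetical sketch, NOT bin/disk-reclaim: a read-only scan built on
# `find ... -prune`. Source this file, then call the functions.
# Assumes path names contain no newlines.

# scan_generated_dirs ROOT [MIN_KIB]
# Prints "<KiB><TAB><path>" for each generated dir under ROOT, largest first.
scan_generated_dirs() {
  root=$1
  min_kib=${2:-0}   # size floor in KiB; stands in for --min 100M etc.

  # A .git directory is pruned without printing. Any directory whose name
  # is in the doc's project-scoped table is printed and pruned, so find
  # never descends into it looking for nested matches.
  find "$root" \( -type d -name .git -prune \) -o \
    \( -type d \( \
         -name node_modules -o -name .next -o -name .nuxt -o -name .turbo \
      -o -name .vite -o -name .parcel-cache -o -name .svelte-kit \
      -o -name .astro -o -name .cache -o -name dist -o -name build \
      -o -name out -o -name __pycache__ -o -name .pytest_cache \
      -o -name .mypy_cache -o -name .ruff_cache -o -name .tox \
      -o -name .venv -o -name target -o -name _build -o -name Pods \
      -o -name DerivedData -o -name .gradle -o -name .android \
    \) -prune -print \) 2>/dev/null |
  while IFS= read -r dir; do
    # du -sk is POSIX. Unreadable subdirectories make it exit non-zero,
    # but it still prints a total, so the exit status is ignored.
    du -sk "$dir" 2>/dev/null || true
  done |
  awk -v min="$min_kib" '$1 >= min' |
  sort -rn
}

# report_cache_roots
# Measures each tool-owned cache root once with du instead of sweeping it
# with find (a sweep would also match nested names such as .cache inside).
report_cache_roots() {
  for d in "$HOME/Library/Caches" "$HOME/Library/Developer/Xcode/DerivedData" \
           "$HOME/.cache" "$HOME/.npm" "$HOME/.pnpm-store" \
           "$HOME/.yarn/cache" "$HOME/.cargo/registry" "$HOME/.cargo/git"; do
    if [ -d "$d" ]; then
      du -sk "$d" 2>/dev/null || true
    fi
  done | sort -rn
}
```

Sourced into a shell, `scan_generated_dirs ~/Code 512000` would list matches of roughly 500 MiB and up, largest first. Sizes are in KiB, not the `15.7G` style of the real output. Because every match is pruned, a `dist/` inside a `node_modules/` package is never reported on its own. That is the double counting the doc warns about.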