Server startup recovery

Recover local launchd-managed ChannelWeave startup when PostgreSQL leaves a stale postmaster.pid lock after an unclean shutdown.

Last updated:

ChannelWeave starts under launchd on the local Mac. Startup depends on three services becoming healthy in this order: PostgreSQL, then Eden, then the app server.

The launchd startup wrapper now runs a PostgreSQL disaster-recovery preflight before waiting for Eden:

  1. Check whether PostgreSQL accepts connections for the ChannelWeave database.
  2. If PostgreSQL is down and postmaster.pid exists, read the PID from the lock file.
  3. Confirm that the PID does not belong to a live PostgreSQL process. If it does, leave the lock file in place and fail startup.
  4. Move the stale lock file aside with a timestamped .stale-* suffix.
  5. Kickstart the Homebrew launchd service for postgresql@18.
  6. Wait for PostgreSQL, validate it with the query "select 1;", then wait for Eden.
  7. Start server.ts only after PostgreSQL and Eden are healthy.
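
The waiting in steps 6 and 7 can be sketched as a small retry helper. This is a minimal illustration, not the wrapper's actual code: wait_for, the timeout values, and EDEN_HEALTH_URL are hypothetical names chosen for the example.

```shell
#!/bin/sh
# wait_for LABEL TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND about once per second until it
# succeeds, or gives up after roughly TIMEOUT_SECONDS of waiting.
wait_for() {
  label=$1
  timeout=$2
  shift 2
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $label" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$label is healthy"
}

# Possible use, in dependency order (timeouts and EDEN_HEALTH_URL are assumptions):
#   wait_for "PostgreSQL" 60 pg_isready -h localhost -p 5432 -d ChannelWeave &&
#   wait_for "PostgreSQL query" 10 psql -h localhost -d ChannelWeave -Atqc "select 1;" &&
#   wait_for "Eden" 120 curl -fsS "$EDEN_HEALTH_URL"
```

Chaining the calls with && keeps the ordering explicit: a later service is never probed until the earlier one has passed its check.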

Files involved

  • tools/launchd/recover-postgres-before-start.sh
  • tools/launchd/start-channelweave-after-eden.sh
  • tools/launchd/com.graham.channelweave.plist

The recovery wrapper prepends /opt/homebrew/opt/postgresql@18/bin to PATH so launchd can find pg_isready and psql even when Homebrew has not symlinked those PostgreSQL tools into /opt/homebrew/bin.
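
A minimal sketch of that PATH adjustment is shown here. The variable name PG_BIN and the duplicate guard are assumptions for illustration; the real wrapper may simply prepend the directory.

```shell
# Directory named in the text above; PG_BIN is a hypothetical variable name.
PG_BIN=/opt/homebrew/opt/postgresql@18/bin

# Prepend only if the directory is not already on PATH, so sourcing the
# script more than once does not keep growing PATH.
case ":$PATH:" in
  *":$PG_BIN:"*) ;;
  *) PATH="$PG_BIN:$PATH" ;;
esac
export PATH
```

Running "command -v pg_isready" afterwards is a quick way to confirm which binary launchd will resolve.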

The app logger creates logs/server.log if needed without truncating existing entries, so restart and crash-recovery evidence survives normal app restarts.

Safety rules

  • Never delete postmaster.pid blindly.
  • If the PID in postmaster.pid belongs to a live PostgreSQL process, leave the lock file in place and fail startup.
  • If the PID belongs to a non-PostgreSQL process, or no live process exists, move the lock file aside rather than deleting it.
  • Keep the moved file as evidence for later diagnosis.
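
The rules above could be implemented along these lines. This is a sketch under stated assumptions, not the contents of recover-postgres-before-start.sh: pid_is_postgres and move_stale_lock are hypothetical names, the timestamp format is invented, and refusing to touch a lock whose first line is not a number is an extra precaution added for the example. It assumes the caller has already established that PostgreSQL is not accepting connections.

```shell
#!/bin/sh
# True only when PID is alive and its executable is named postgres or
# postmaster. ps prints nothing and exits non-zero for a dead PID.
pid_is_postgres() {
  comm=$(ps -p "$1" -o comm= 2>/dev/null) || return 1
  case "${comm##*/}" in
    postgres|postmaster) return 0 ;;
    *) return 1 ;;
  esac
}

# move_stale_lock PATH_TO_POSTMASTER_PID
# Moves the lock aside (never deletes it) unless a live PostgreSQL owns it.
move_stale_lock() {
  lock=$1
  [ -f "$lock" ] || return 0   # no lock file, nothing to recover

  # The first line of postmaster.pid holds the postmaster's PID.
  pid=$(head -n 1 "$lock" | tr -d '[:space:]')
  case "$pid" in
    ''|*[!0-9]*)
      # Assumption of this sketch: an unreadable PID is not safe to act on.
      echo "unreadable PID in $lock; leaving it in place" >&2
      return 1 ;;
  esac

  if pid_is_postgres "$pid"; then
    echo "PID $pid is a live PostgreSQL process; leaving $lock in place" >&2
    return 1
  fi

  # Keep the file as evidence under a timestamped .stale-* name.
  stale="$lock.stale-$(date +%Y%m%d-%H%M%S)"
  mv "$lock" "$stale" && echo "moved $lock to $stale"
}
```

A non-zero return is meant to abort startup, so the wrapper would call this before kickstarting PostgreSQL and stop if it fails.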

Manual verification

Use these checks after recovery:

pg_isready -h localhost -p 5432 -d ChannelWeave
psql -h localhost -d ChannelWeave -Atqc "select 1;"
curl -fsS http://127.0.0.1:8787/health
curl -fsS http://127.0.0.1:8000/health

Treat the local environment as recovered only after all four checks succeed.
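
For convenience, the four commands can be wrapped so that every check runs and failures are counted instead of stopping at the first one. This is an illustrative sketch; check and verify_recovery are hypothetical names, and the labels only repeat what each command targets.

```shell
#!/bin/sh
failures=0

# check LABEL COMMAND [ARGS...]
# Runs COMMAND quietly, prints one result line, and counts failures.
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok    $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Runs the four manual checks from this section; succeeds only if all pass.
verify_recovery() {
  failures=0
  check "pg_isready ChannelWeave" pg_isready -h localhost -p 5432 -d ChannelWeave
  check "psql select 1"           psql -h localhost -d ChannelWeave -Atqc "select 1;"
  check "health on port 8787"     curl -fsS http://127.0.0.1:8787/health
  check "health on port 8000"     curl -fsS http://127.0.0.1:8000/health
  [ "$failures" -eq 0 ]
}
```

Calling verify_recovery prints one line per check and returns non-zero if any of them failed, which makes it easy to use from another script.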