Production deployment guide
Last Updated: 2026-06-22 · Applies to: OpenWatch 0.2.0-rc series (Go single-binary)
This guide covers running OpenWatch in production: a single Go binary that serves
the REST API and the embedded React UI over HTTPS, backed by PostgreSQL, managed
by systemd. There is no container runtime, no separate web tier, no Redis, and
no Celery — those belonged to the archived Python stack and are gone.
For first-time install and database provisioning, follow the canonical install guide. This document does not repeat those steps; it focuses on production concerns the install guide only touches lightly: process layout, TLS, the background worker, backups, upgrades, and incident runbooks.
Verify the version you deploy. The current line is a pre-release (0.2.0-rc.13 per packaging/version.env), not a GA build. Treat it accordingly until a GA tag ships.
Architecture in production
One binary, /usr/bin/openwatch, provides every runtime role through
subcommands (cmd/openwatch/main.go):
| Subcommand | Role | Long-running |
|---|---|---|
| serve | HTTPS API + embedded UI, schedulers (liveness, intelligence, discovery), event bus, alert router | Yes (this is the service unit) |
| worker | Scan-job claimer/dispatcher loop; drains the PostgreSQL job queue and runs Kensa scans | Yes (optional separate unit) |
| migrate | Apply pending database migrations, then exit | No |
| create-admin | Create the first admin user, then exit | No |
| check-config | Validate and print the resolved config (secrets redacted), then exit | No |
The packaged openwatch.service runs openwatch serve
(packaging/common/openwatch.service). The serve process also runs the
in-process schedulers, so a minimal single-node deployment needs only that one
unit. The worker subcommand exists for separating scan execution onto its own
process or host; it is HTTP-free and shares the same boot prerequisites
(config, DB pool, audit, license, JWT key, credential DEK) as serve. There is
no packaged worker unit yet — running worker as its own service is a manual
step today (write a unit that runs ExecStart=/usr/bin/openwatch worker).
| Component | Where | Notes |
|---|---|---|
| API + UI | https://<host>:8443/ | UI embedded via go:embed; API under /api/v1/ |
| Database | PostgreSQL 14+ | The only datastore. Not provisioned by the package. |
| Job queue | PostgreSQL table, SKIP LOCKED | No external broker. Drained by serve/worker. |
| Compliance engine | Kensa (Go), in-process | SSH-based, native YAML rules. See the boundary doc. |
Prerequisites
See the install guide requirements for the authoritative list. In short:
- A supported RHEL-family or Debian-family host with systemd.
- PostgreSQL 14 or newer, reachable from the OpenWatch host.
- TCP/8443 inbound (API + UI); TCP/22 outbound to every managed host (Kensa scans over SSH).
- A CA-signed TLS certificate for any non-loopback use.
Install and first run
Follow the install guide end to end:
- Install and provision PostgreSQL.
- Install the signed .rpm/.deb (verify SHA256SUMS.asc against KEYS first).
- Set OPENWATCH_DATABASE_DSN in /etc/openwatch/secrets.env.
- Run openwatch migrate.
- Run openwatch create-admin --username admin --email you@example.com.
- Run systemctl enable --now openwatch.
- Sign in at https://<host>:8443/.
Before starting the service, validate the resolved configuration:
sudo -u openwatch env $(cat /etc/openwatch/secrets.env | xargs) \
  openwatch check-config
It prints the effective config with secrets redacted and exits non-zero if the config is invalid.
Configuration
Config layers, highest precedence first (cmd/openwatch/main.go):
- CLI flags (--listen, --log-level, --config)
- Environment variables (OPENWATCH_<SECTION>_<KEY>)
- The TOML file (/etc/openwatch/openwatch.toml)
- Built-in defaults
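
The precedence order above can be illustrated with a small shell sketch. This is not OpenWatch's loader (that lives in cmd/openwatch/main.go); `resolve_setting` is a hypothetical helper that returns the first non-empty value among flag, environment, TOML, and default, and the sample addresses are made up.

```shell
# resolve_setting FLAG ENV FILE DEFAULT
# Prints the first non-empty argument, mirroring the documented order:
# CLI flag, then environment variable, then TOML file, then built-in default.
# Hypothetical helper for illustration only; not part of OpenWatch.
resolve_setting() {
    for candidate in "$1" "$2" "$3" "$4"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: no flag given, environment and TOML both set.
# The environment value is printed because it outranks the TOML file.
#   resolve_setting "" "127.0.0.1:9443" "10.0.0.5:8443" "0.0.0.0:8443"
```

In practice this means a value exported in secrets.env overrides the same key in openwatch.toml, and a flag on the command line overrides both.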
The TOML file has four sections (internal/config/config.go,
packaging/common/openwatch.toml):
| Section | Key | Default | Purpose |
|---|---|---|---|
| [server] | listen | 0.0.0.0:8443 | Bind address and port for API + UI |
| [server] | tls_cert | /etc/openwatch/tls/cert.pem | TLS certificate |
| [server] | tls_key | /etc/openwatch/tls/key.pem | TLS private key |
| [database] | dsn | (none) | PostgreSQL DSN (set via secrets.env in production) |
| [database] | max_connections | 25 | Connection-pool ceiling |
| [logging] | level | info | debug / info / warn / error |
| [logging] | format | json | json / text |
Two more values come from [identity] and must be set for serve/worker to
boot — the JWT signing key (jwt_private_key) and the credential DEK file
(credential_key_file). Without them the process exits at startup rather than
running with a silent fallback (see cmdServe in cmd/openwatch/main.go).
Keep the database password out of the world-readable TOML by putting the DSN in
/etc/openwatch/secrets.env, which the systemd unit loads via
EnvironmentFile:
OPENWATCH_DATABASE_DSN=postgres://openwatch:STRONG_PW@db.internal:5432/openwatch?sslmode=require
Set the file to 0640, owner root:openwatch. Use sslmode=require or stronger
for any PostgreSQL not on the loopback interface.
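
A minimal sketch of verifying the permissions described above. `check_secret_file` is a hypothetical helper, not an OpenWatch command, and it assumes GNU `stat` (the `-c` format option) as found on RHEL-family and Debian-family hosts.

```shell
# check_secret_file PATH [MODE] [OWNER:GROUP]
# Succeeds only if PATH has the expected octal mode and owner:group.
# Defaults follow this guide: 640 and root:openwatch.
# Assumes GNU coreutils stat (-c format strings).
check_secret_file() {
    path=$1
    want_mode=${2:-640}
    want_owner=${3:-root:openwatch}

    actual=$(stat -c '%a %U:%G' "$path") || return 2
    if [ "$actual" = "$want_mode $want_owner" ]; then
        echo "ok: $path is $actual"
    else
        echo "mismatch: $path is $actual, expected $want_mode $want_owner" >&2
        return 1
    fi
}

# Usage (as root):
#   check_secret_file /etc/openwatch/secrets.env
```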
TLS
The package ships a self-signed certificate so the service starts out of the box. Replace it before any non-loopback use:
sudo cp your-cert.pem /etc/openwatch/tls/cert.pem
sudo cp your-key.pem /etc/openwatch/tls/key.pem
sudo chown root:openwatch /etc/openwatch/tls/cert.pem
sudo chown openwatch:openwatch /etc/openwatch/tls/key.pem
sudo chmod 0644 /etc/openwatch/tls/cert.pem
sudo chmod 0600 /etc/openwatch/tls/key.pem
sudo systemctl restart openwatch
The server reads the certificate on each TLS handshake, so new connections pick up a swapped cert without a restart; restart anyway to cover existing keep-alive connections.
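
Before swapping certificates, it can help to confirm that the new certificate and key belong together and that the certificate is not about to expire. The sketch below uses standard `openssl` subcommands; the function names are hypothetical helpers, and the 30-day window in the usage note is an arbitrary example.

```shell
# cert_matches_key CERT KEY
# Succeeds when the certificate and private key carry the same public key.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# cert_valid_for CERT DAYS
# Succeeds when the certificate does not expire within DAYS days.
cert_valid_for() {
    openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# Usage:
#   cert_matches_key your-cert.pem your-key.pem && cert_valid_for your-cert.pem 30
```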
Service hardening
The packaged unit (packaging/common/openwatch.service) already applies
systemd sandboxing. Keep these in place:
| Directive | Value | Effect |
|---|---|---|
| User / Group | openwatch | Runs unprivileged |
| NoNewPrivileges | true | No setuid escalation |
| ProtectSystem | strict | Read-only filesystem except ReadWritePaths |
| ProtectHome | true | No access to /home, /root |
| PrivateTmp | true | Private /tmp |
| ReadWritePaths | /var/lib/openwatch /var/log/openwatch | Only writable paths |
| RestrictAddressFamilies | AF_INET AF_INET6 AF_UNIX | No raw sockets unless granted |
| Restart | on-failure (RestartSec=5s) | Auto-restart on crash |
If you front OpenWatch with a reverse proxy or load balancer, terminate or pass through TLS to 8443 — there is no separate HTTP listener to target.
Health, version, and logs
OpenWatch exposes two anonymous endpoints for probes (api/openapi.yaml):
curl -k https://localhost:8443/api/v1/health
# 200 {"status":"healthy","db_connected":true,"version":"..."}
# 503 when the database ping fails (status "degraded"/unavailable)
curl -k https://localhost:8443/api/v1/version
# {"openwatch":"...","kensa":"...","go":"...","commit":"...","build_time":"..."}
/api/v1/health returns 200 with db_connected:true when healthy and 503
when the database ping inside the handler fails. Use it as your liveness and
readiness probe.
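
For scripts that need to wait until the service is ready (after a restart, for example), the probe can be wrapped in a retry loop. This is a sketch; `wait_healthy` is a hypothetical helper and the attempt count and delay are arbitrary defaults.

```shell
# wait_healthy URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it answers HTTP 200, or gives up after ATTEMPTS tries.
# -k skips certificate verification, as in the examples above; drop it once a
# CA-signed certificate is installed.
wait_healthy() {
    url=$1
    attempts=${2:-10}
    delay=${3:-3}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # %{http_code} is 000 when curl received no HTTP response at all.
        code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || true
        if [ "$code" = "200" ]; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        echo "attempt $i: HTTP $code" >&2
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage:
#   wait_healthy https://localhost:8443/api/v1/health 10 3
```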
The service logs structured JSON to journald:
sudo journalctl -u openwatch -f # tail live
sudo journalctl -u openwatch --since '5 min ago' # recent
sudo journalctl -u openwatch -o cat | jq .         # pretty-print
Prometheus-style /metrics scraping is not yet implemented; there is no HTTP metrics endpoint. The in-process connectivity-monitor metrics are exposed only through the authenticated GET /api/v1/system/connectivity/status endpoint, not via an open-text scrape target. For production observability today, rely on /api/v1/health and the journald logs.
Background scan worker
The serve process drains the job queue on its own, so a single-node install
needs nothing extra. To run scan execution as a dedicated process (separate
resource limits, or a separate host), run the worker subcommand:
sudo -u openwatch env $(cat /etc/openwatch/secrets.env | xargs) \
  openwatch worker --poll-interval 1s
--poll-interval controls the empty-queue sleep between dequeue attempts
(worker.DefaultPollInterval, 1s default, 5s max; see cmd/openwatch/worker.go).
The worker needs the same secrets.env, JWT key, and credential DEK as serve,
because it decrypts host credentials and derives the queue HMAC key from the DEK.
There is no packaged worker unit; if you split it out, model a unit on
openwatch.service with ExecStart=/usr/bin/openwatch worker.
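
A sketch of what such a manually written unit could look like. Everything here is an assumption to verify: the unit name openwatch-worker.service is hypothetical, and the directives are copied from the hardening table in this guide rather than from a packaged file. Compare against packaging/common/openwatch.service before using it.

```shell
# render_worker_unit
# Prints a candidate unit file for a dedicated scan worker to stdout.
# Hypothetical: no worker unit is packaged, so this mirrors the documented
# directives of openwatch.service with a different ExecStart.
render_worker_unit() {
    cat <<'EOF'
[Unit]
Description=OpenWatch scan worker
After=network-online.target
Wants=network-online.target

[Service]
User=openwatch
Group=openwatch
EnvironmentFile=/etc/openwatch/secrets.env
ExecStart=/usr/bin/openwatch worker
Restart=on-failure
RestartSec=5s
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadWritePaths=/var/lib/openwatch /var/log/openwatch
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   render_worker_unit > /etc/systemd/system/openwatch-worker.service
#   systemctl daemon-reload && systemctl enable --now openwatch-worker
```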
Upgrades
# RHEL family
sudo dnf upgrade ./openwatch-<new-version>.rpm
# Debian family
sudo apt install ./openwatch_<new-version>_amd64.deb
After the package upgrade:
sudo -u openwatch env $(cat /etc/openwatch/secrets.env | xargs) openwatch migrate
sudo systemctl restart openwatch
openwatch migrate applies any new migrations from
internal/db/migrations/ and is a safe no-op when the schema is already current.
Restart the service (and the worker, if you run one separately) to load the new
binary. Take a database backup before upgrading (see below). Config under
/etc/openwatch/ is preserved across package upgrades.
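
The ordering above (backup, package upgrade, migrate, restart) can be sketched as one function that stops at the first failure. `upgrade_openwatch` is a hypothetical wrapper around the commands already shown; it assumes a RHEL-family host and a PostgreSQL instance on 127.0.0.1, so adjust for apt or a remote database.

```shell
# upgrade_openwatch PACKAGE_FILE [SECRETS_FILE]
# Runs the documented order: backup, package upgrade, migrate, restart.
# Each step must succeed before the next runs, so a failed backup never
# leads to a migration. The return code identifies the failing step.
upgrade_openwatch() {
    pkg=$1
    secrets=${2:-/etc/openwatch/secrets.env}
    stamp=$(date -u +%Y-%m-%dT%H-%M-%SZ)

    pg_dump -h 127.0.0.1 -U openwatch -d openwatch -Fc \
        -f "openwatch-pre-upgrade-$stamp.dump" || return 1
    sudo dnf upgrade "$pkg" || return 2
    # Same environment-loading pattern as the manual commands above.
    sudo -u openwatch env $(cat "$secrets" | xargs) openwatch migrate || return 3
    sudo systemctl restart openwatch || return 4
    echo "upgrade steps completed; verify with /api/v1/health"
}

# Usage:
#   upgrade_openwatch ./openwatch-<new-version>.rpm
```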
Backup and restore
OpenWatch keeps all durable state in PostgreSQL. Back up the database with the standard PostgreSQL tooling — there is no OpenWatch-specific backup command.
# Backup
pg_dump -h 127.0.0.1 -U openwatch -d openwatch -Fc -f openwatch-$(date -u +%Y-%m-%dT%H-%M-%SZ).dump
# Restore into a freshly created, empty database
pg_restore -h 127.0.0.1 -U openwatch -d openwatch --clean --if-exists openwatch-<timestamp>.dump
Also back up /etc/openwatch/: it holds the TLS material, the JWT signing key,
the credential DEK, and secrets.env. Losing the credential DEK makes stored SSH
credentials and MFA secrets unrecoverable. Test restores periodically; a backup
you have never restored is a hypothesis, not a backup.
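
A sketch that combines the database dump with a copy of /etc/openwatch/ and a basic archive check. `backup_openwatch` is a hypothetical helper; the file names are illustrative. Reading the table of contents with `pg_restore --list` catches truncated or corrupt archives, but it is not a substitute for a real test restore.

```shell
# backup_openwatch DEST_DIR
# Writes a custom-format dump plus a tarball of /etc/openwatch into DEST_DIR,
# then confirms the dump's table of contents is readable.
# Prints the dump path on success.
backup_openwatch() {
    dest=$1
    stamp=$(date -u +%Y-%m-%dT%H-%M-%SZ)
    dump="$dest/openwatch-$stamp.dump"
    conf="$dest/openwatch-etc-$stamp.tar.gz"

    pg_dump -h 127.0.0.1 -U openwatch -d openwatch -Fc -f "$dump" || return 1
    # Listing the archive contents fails on a truncated or corrupt dump.
    pg_restore --list "$dump" >/dev/null || return 2
    # /etc/openwatch holds TLS material, the JWT key, the credential DEK,
    # and secrets.env, so keep the tarball readable by root only.
    sudo tar -czf "$conf" -C /etc openwatch || return 3
    sudo chmod 0600 "$conf" || return 3
    echo "$dump"
}

# Usage:
#   backup_openwatch /var/backups/openwatch
```

Store the configuration tarball with the same care as the credential DEK itself, since it contains it.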
Operational runbooks
Concise, single-binary runbooks follow. Diagnose with systemctl, journalctl,
psql, df, and top — not docker. The standalone files under
docs/runbooks/ still describe the archived Python/Docker stack
and are pending a Go-era rewrite; prefer the steps below until they are updated.
SERVICE_DOWN — service unavailable
Symptoms: https://<host>:8443/ refuses connections, or /api/v1/health times
out or returns 503.
sudo systemctl status openwatch
sudo journalctl -u openwatch --since '10 min ago' -p err
curl -k https://localhost:8443/api/v1/health
- If the unit is failed/inactive, read the journal for the boot error. Common causes: malformed OPENWATCH_DATABASE_DSN, unreadable TLS cert, or a missing jwt_private_key/credential_key_file.
- If /api/v1/health returns 503 with db_connected:false, treat it as a database problem (see DATABASE_ISSUES below).
- Confirm the config is valid, then restart:
sudo -u openwatch env $(cat /etc/openwatch/secrets.env | xargs) openwatch check-config
sudo systemctl restart openwatch
- Verify recovery: curl -k https://localhost:8443/api/v1/health returns 200.
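
The branching in the steps above can be sketched as a small lookup from the probe's HTTP status to the next action. `classify_health` is a hypothetical helper; the only status codes this guide documents for /api/v1/health are 200 and 503.

```shell
# classify_health HTTP_CODE
# Maps the status code from /api/v1/health to the next runbook step.
classify_health() {
    case "$1" in
        200) echo "healthy" ;;
        503) echo "database: follow DATABASE_ISSUES" ;;
        000) echo "unreachable: check 'systemctl status openwatch' and the journal" ;;
        *)   echo "unexpected HTTP $1: read the journal" ;;
    esac
}

# Usage: curl reports 000 when it receives no HTTP response at all.
#   code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 \
#       https://localhost:8443/api/v1/health)
#   classify_health "$code"
```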
DATABASE_ISSUES — database connectivity
Symptoms: /api/v1/health reports db_connected:false; journal shows
db: ping: errors.
sudo systemctl status postgresql
PGPASSWORD=... psql -h 127.0.0.1 -U openwatch -d openwatch -c 'SELECT 1;'
- connection refused → PostgreSQL is down: sudo systemctl restart postgresql.
- password authentication failed → the DSN in secrets.env or pg_hba.conf is wrong. Recheck both, then sudo systemctl reload postgresql.
- Pool exhaustion under load → raise [database].max_connections (and the server's max_connections), then restart OpenWatch.
- Re-test with curl -k https://localhost:8443/api/v1/health.
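
The error-to-action mapping above can be sketched as a function over the text psql prints. `classify_db_error` and its action labels are hypothetical. The third pattern is an assumption about how exhaustion shows up: PostgreSQL reports a full server-side connection limit as "too many clients already", while exhaustion of OpenWatch's own pool may look different in the journal.

```shell
# classify_db_error MESSAGE
# Maps a psql error message to a runbook action. Matching is case-insensitive.
classify_db_error() {
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$msg" in
        *"connection refused"*)
            echo "restart-postgresql" ;;
        *"password authentication failed"*)
            echo "check-dsn-and-pg_hba" ;;
        *"too many clients"*)
            echo "raise-max-connections" ;;
        *)
            echo "unknown" ;;
    esac
}

# Usage: capture only stderr from the test query, then classify it.
#   err=$(psql -h 127.0.0.1 -U openwatch -d openwatch -c 'SELECT 1;' 2>&1 >/dev/null)
#   classify_db_error "$err"
```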
DISK_FULL — disk space exhausted
Symptoms: writes fail; the journal shows no space left on device; the service
may crash-loop.
df -h
sudo du -xh /var/log/openwatch /var/lib/openwatch 2>/dev/null | sort -rh | head
sudo du -xh /var/lib/pgsql /var/lib/postgresql 2>/dev/null | sort -rh | head
- Reclaim journald space if logs dominate:
sudo journalctl --vacuum-size=500M
- If PostgreSQL data is large, check table bloat and run maintenance:
psql -h 127.0.0.1 -U openwatch -d openwatch -c "\
SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS size \
FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"
psql -h 127.0.0.1 -U openwatch -d openwatch -c "VACUUM (ANALYZE);"
- The transaction-log/write-on-change model bounds growth (transactions plus a fixed-size host_rule_state), so unbounded growth usually means audit/event retention or PostgreSQL WAL, not scan results. Investigate before deleting.
- After freeing space: sudo systemctl restart openwatch and re-check df -h.
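
To confirm recovery, or to catch the condition earlier from a scheduled job, the `df` check can be reduced to a threshold test. This is a sketch with hypothetical helper names; the 90 percent limit in the usage note is an arbitrary example, not an OpenWatch default.

```shell
# disk_usage_pct PATH
# Prints the used-space percentage (without the % sign) of the filesystem
# holding PATH. -P forces the POSIX one-line-per-filesystem layout.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_over_threshold PATH LIMIT
# Succeeds (exit 0) when usage is at or above LIMIT percent.
disk_over_threshold() {
    used=$(disk_usage_pct "$1") || return 2
    [ "$used" -ge "$2" ]
}

# Usage:
#   if disk_over_threshold /var/lib/openwatch 90; then
#       echo "disk nearly full: follow DISK_FULL" >&2
#   fi
```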
HIGH_CPU — sustained high CPU
Symptoms: load high; UI/API latency up.
top -b -n1 | head -20
sudo journalctl -u openwatch --since '15 min ago' -o cat | jq -r 'select(.level=="WARN" or .level=="ERROR")'
psql -h 127.0.0.1 -U openwatch -d openwatch -c "\
SELECT pid, state, wait_event_type, left(query,80) FROM pg_stat_activity \
WHERE datname='openwatch' ORDER BY state;"
- If postgres backends dominate, look for long-running or stuck queries in pg_stat_activity and for a missing VACUUM/ANALYZE.
- If the openwatch process dominates, a scan burst or a tight scheduler loop is the usual cause. The intelligence and discovery schedulers can be paused without a restart by setting maintenance_global=true:
# via the API, with an admin token:
# PUT /api/v1/system/intelligence/config {"maintenance_global": true}
# PUT /api/v1/system/discovery/config {"maintenance_global": true}
- If a separate worker is saturating the host, raise --poll-interval toward its 5s ceiling, or move the worker to its own host.
- Re-check top and API latency after each change; set maintenance_global back to false when load normalizes.
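
The two PUT requests above could be issued with curl roughly as follows. The endpoint paths and request body come from this runbook; the authentication details are assumptions. The sketch sends the admin token as a Bearer header read from `OPENWATCH_TOKEN`, a hypothetical variable name, so confirm the actual scheme in api/openapi.yaml first.

```shell
# set_maintenance SCHEDULER true|false
# Sends the PUT from the runbook to pause (true) or resume (false) a scheduler.
# Assumption: Bearer-token auth with the token in OPENWATCH_TOKEN.
set_maintenance() {
    case "$1" in
        intelligence|discovery) ;;
        *) echo "unknown scheduler: $1" >&2; return 2 ;;
    esac
    case "$2" in
        true|false) ;;
        *) echo "value must be true or false" >&2; return 2 ;;
    esac
    # -f makes curl exit non-zero on HTTP errors such as 401 or 403.
    curl -k -s -f -X PUT \
        -H "Authorization: Bearer $OPENWATCH_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"maintenance_global\": $2}" \
        "https://localhost:8443/api/v1/system/$1/config"
}

# Usage:
#   set_maintenance intelligence true    # pause
#   set_maintenance intelligence false   # resume once load normalizes
```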
SECURITY_INCIDENT — suspected compromise
- Preserve evidence first. Do not wipe the host. Capture the journal and the audit trail:
sudo journalctl -u openwatch --since '24 hours ago' > /tmp/openwatch-journal.log
OpenWatch writes an immutable audit event stream to PostgreSQL (see docs/engineering/audit_event_taxonomy.md); export the relevant range via the audit query API or psql before any remediation.
- Contain. Stop accepting traffic without destroying state:
sudo systemctl stop openwatch
Or block 8443 at the firewall if you need the process alive for forensics.
- Rotate secrets in /etc/openwatch/: the JWT signing key, credential DEK, and database password. Rotating the JWT key invalidates all sessions (everyone re-authenticates). Rotating the DEK requires re-encrypting stored credentials; plan that change deliberately.
- Review access. Audit admin accounts and roles (docs/engineering/rbac_registry.md) and revoke anything unexpected.
- Restore from a known-good backup only after the root cause is understood (see Backup and restore above).
- Document the timeline and follow your organization's incident process.
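
For the evidence-preservation step, a sketch that saves the journal with a timestamped name and records a checksum, so later changes to the copy are detectable. `capture_journal` is a hypothetical helper; it assumes `sha256sum` from GNU coreutils, and the output directory is your choice.

```shell
# capture_journal OUT_DIR [SINCE]
# Saves the unit's journal to OUT_DIR and writes a SHA-256 checksum next to it.
# Prints the path of the saved log on success.
capture_journal() {
    out_dir=$1
    since=${2:-24 hours ago}
    stamp=$(date -u +%Y-%m-%dT%H-%M-%SZ)
    name="openwatch-journal-$stamp.log"

    sudo journalctl -u openwatch --since "$since" --no-pager \
        > "$out_dir/$name" || return 1
    # Run inside OUT_DIR so the checksum file records a relative name
    # and still verifies after the directory is copied elsewhere.
    ( cd "$out_dir" && sha256sum "$name" > "$name.sha256" ) || return 1
    echo "$out_dir/$name"
}

# Usage:
#   capture_journal /root/incident-evidence '48 hours ago'
#   (cd /root/incident-evidence && sha256sum -c openwatch-journal-*.sha256)
```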
Quick reference
| Item | Value |
|---|---|
| UI + API | https://<host>:8443/ (API under /api/v1/) |
| Binary | /usr/bin/openwatch |
| Service unit | openwatch.service (runs openwatch serve) |
| Config | /etc/openwatch/openwatch.toml |
| DB secret | /etc/openwatch/secrets.env (OPENWATCH_DATABASE_DSN) |
| TLS | /etc/openwatch/tls/{cert,key}.pem |
| Data / logs | /var/lib/openwatch, /var/log/openwatch, journald |
| User / group | openwatch:openwatch |
| Health probe | GET /api/v1/health |
| Migrate | openwatch migrate |
| Restart | sudo systemctl restart openwatch |
| Logs | journalctl -u openwatch -f |
See also
- Install guide — canonical install and provisioning.
- Kensa ↔ OpenWatch boundary — compliance engine integration.
- RBAC registry — roles and permissions.
- API contract — every endpoint, its permission, and audit events.
- Releasing runbook — building and signing releases.