Scaling guide
Last Updated: 2026-06-22 · Applies to: OpenWatch 0.2.0-rc series (Go single-binary)
This guide covers how OpenWatch behaves as you add hosts, run more scans, and
push more concurrent API traffic, and what you can tune today. It describes the
current Go-era stack: a single openwatch binary that serves the REST API and
the embedded React UI over HTTPS on port 8443, backed by PostgreSQL, with the
Kensa compliance engine built in. There is no separate web tier, no container
runtime, no Redis, and no Celery.
For first-time install and configuration, follow
docs/guides/INSTALLATION.md — this guide assumes a working install and
focuses only on capacity and tuning.
What scales, and how
OpenWatch has two long-lived processes and one database:
| Component | What it does | How you scale it today |
|---|---|---|
| openwatch serve | HTTPS API + embedded UI + in-process schedulers (liveness, intelligence, discovery) and an in-process worker that drains the scan-job queue | Raise [server].scan_concurrency (how many scans run at once in this process); then vertical CPU/RAM. Stateless apart from PostgreSQL. |
| openwatch worker | An optional, additional process that also drains the scan-job queue and runs Kensa scans over SSH | Run one or more for extra/off-box capacity. The queue uses SELECT ... FOR UPDATE SKIP LOCKED, so the serve worker and any openwatch worker processes cooperate without double-claiming a job. |
| PostgreSQL | All state: hosts, scans, transactions, audit events, queue | Vertical first (CPU, RAM, faster disk), then tune max_connections and the OpenWatch pool size. |
openwatch serve runs an in-process worker that drains the scan-job queue, so
the single-binary deployment runs scans with no extra process. By default it
runs scan_concurrency (4) scans concurrently (internal/worker/worker.go,
wired in internal/server/server.go). A separate openwatch worker is
optional, for additional or off-box capacity.
Scaling the scan workers
Scans are the most resource-intensive work OpenWatch does: each one opens an SSH session to a target host and runs Kensa's native YAML checks. Worker throughput is the usual first bottleneck.
Scan concurrency (the first knob to turn)
The in-process worker runs [server].scan_concurrency scan loops at once
(default 4). Each loop independently claims a job with SKIP LOCKED, so up to
that many different hosts scan in parallel; a per-host advisory lock still
prevents two scans of the same host from overlapping. This is the simplest
way to clear a large queue — one config value, no extra processes:

    # /etc/openwatch/openwatch.toml
    [server]
    scan_concurrency = 8

Restart openwatch to apply. Sizing: scans are SSH/IO-bound (they spend most of
their time waiting on the remote host), so concurrency can comfortably exceed
CPU core count. Mind two ceilings — the PostgreSQL pool ([database].max_connections
/ pool size: each in-flight scan uses a connection plus the advisory-lock
transaction) and how many simultaneous SSH sessions your targets and network
tolerate. 8–16 is a reasonable range for a few dozen to a few hundred hosts;
set it to 1 to restore strictly one-at-a-time draining.
Run more worker processes
The scan queue is PostgreSQL-native and claims one job at a time per worker with
SKIP LOCKED. To increase scan throughput, run additional openwatch worker
processes pointed at the same database and config:

    openwatch worker --config /etc/openwatch/openwatch.toml

Each worker claims one scan job at a time (internal/worker/scan_worker.go).
Within a single worker, a per-host pg_advisory_xact_lock serializes work so
two jobs for the same host never run concurrently. Across workers, the queue's
SKIP LOCKED semantics prevent any two workers from claiming the same job.
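
The two guarantees described above (no job claimed twice, no overlapping scans on one host) can be illustrated with an in-memory analogy. This is not OpenWatch code and does not talk to PostgreSQL: a mutex-guarded slice stands in for the SKIP LOCKED queue, a per-host mutex stands in for the advisory lock, and every name in the sketch is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a queued scan for one host.
type job struct {
	ID   int
	Host string
}

// queue is an in-memory stand-in for the job queue. claim hands each job to
// exactly one caller, which is the guarantee SKIP LOCKED provides.
type queue struct {
	mu   sync.Mutex
	jobs []job
}

func (q *queue) claim() (job, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.jobs) == 0 {
		return job{}, false
	}
	j := q.jobs[0]
	q.jobs = q.jobs[1:]
	return j, true
}

// hostLocks is a stand-in for the per-host advisory lock.
type hostLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (h *hostLocks) forHost(host string) *sync.Mutex {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.locks[host] == nil {
		h.locks[host] = &sync.Mutex{}
	}
	return h.locks[host]
}

// runWorkers drains jobs with n workers. It returns how often each job ran
// and the highest number of scans that ever overlapped on a single host.
func runWorkers(n int, jobs []job) (runs map[int]int, maxPerHost int) {
	q := &queue{jobs: append([]job(nil), jobs...)}
	hl := &hostLocks{locks: map[string]*sync.Mutex{}}
	runs = map[int]int{}
	active := map[string]int{}
	var stats sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < n; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				j, ok := q.claim() // one job at a time per worker
				if !ok {
					return
				}
				l := hl.forHost(j.Host)
				l.Lock() // serialize scans of the same host
				stats.Lock()
				active[j.Host]++
				if active[j.Host] > maxPerHost {
					maxPerHost = active[j.Host]
				}
				runs[j.ID]++
				stats.Unlock()
				// The scan itself would run here.
				stats.Lock()
				active[j.Host]--
				stats.Unlock()
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return runs, maxPerHost
}

func main() {
	var jobs []job
	for i := 0; i < 12; i++ {
		jobs = append(jobs, job{ID: i, Host: fmt.Sprintf("host-%d", i%3)})
	}
	runs, maxPerHost := runWorkers(4, jobs)
	fmt.Println("jobs processed:", len(runs), "max overlapping scans per host:", maxPerHost)
}
```

In the real system both guarantees come from PostgreSQL, which is why they hold across separate processes and machines rather than only between goroutines in one program.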
The package ships only the openwatch.service unit, which runs serve
(packaging/common/openwatch.service). There is no packaged worker unit yet, so
run the worker under your own systemd unit or process supervisor. A minimal
unit mirrors the shipped one but changes the ExecStart command:

    [Unit]
    Description=OpenWatch scan worker
    After=network.target postgresql.service
    Wants=postgresql.service

    [Service]
    Type=simple
    User=openwatch
    Group=openwatch
    EnvironmentFile=-/etc/openwatch/secrets.env
    ExecStart=/usr/bin/openwatch worker --config /etc/openwatch/openwatch.toml
    Restart=on-failure
    RestartSec=5s

    [Install]
    WantedBy=multi-user.target

Use a systemd template (for example openwatch-worker@.service) if you want to
run several workers on one host. The worker shares the same configuration,
database DSN, JWT key, and credential key as serve, so no extra config is
required.
Poll interval
Each worker sleeps between dequeue attempts when the queue is empty. The
--poll-interval flag controls this; it defaults to 1s and is capped at 5s
(internal/worker/scan_worker.go, DefaultPollInterval / MaxPollInterval):

    openwatch worker --poll-interval 2s

A shorter interval lowers scan-pickup latency on an idle queue at the cost of
more empty database round-trips. A longer interval does the reverse. The cap
exists because raising it further only adds latency without a corresponding
benefit. Worker concurrency comes from running more processes, not from a
per-worker concurrency knob — there is no --concurrency flag.
Scaling PostgreSQL
PostgreSQL holds all OpenWatch state and is the shared coordination point for the queue. Tune it before reaching for anything else.
Connection pool
OpenWatch opens one pgx pool per process (internal/db/db.go,
db.NewPool). The pool size is the max_connections value under [database]
in /etc/openwatch/openwatch.toml; it defaults to 25
(packaging/common/openwatch.toml, internal/config/config.go):

    [database]
    dsn = "postgres://openwatch@localhost/openwatch?sslmode=require"
    max_connections = 25

You can also override it with the environment variable
OPENWATCH_DATABASE_MAX_CONNECTIONS (set it in /etc/openwatch/secrets.env or
the unit's EnvironmentFile). Each running process — serve and every
worker — opens its own pool of up to max_connections. Size PostgreSQL's
server-side max_connections to cover the sum across all OpenWatch processes
plus headroom for psql, backups, and monitoring. As a rule of thumb:

    postgres max_connections >= (1 serve + N workers) * openwatch max_connections + slack

Server tuning
Standard PostgreSQL tuning applies; OpenWatch does nothing unusual here. Start
from your host's RAM and adjust shared_buffers, effective_cache_size,
work_mem, and max_wal_size to match. Keep the database on fast local or
network-attached SSD storage — the transaction log and audit-event tables are
the highest-write paths.
Migrations
Schema changes ship as ordered migrations in internal/db/migrations/, applied
with openwatch migrate. Run migrations once per upgrade against a single
database before starting the new binary; the command is safe to re-run and
reports the resulting version. Multiple processes can then connect to the
already-migrated schema.
Capacity planning
OpenWatch has no fixed sizing matrix, and the scan cadence — not raw host count — drives load. The intelligence and liveness schedulers run on operator-tunable intervals, and scans are enqueued on a per-host schedule, so a large fleet scanned infrequently can be lighter than a small fleet scanned aggressively.
Plan capacity from these levers rather than a host-count table:
- Scan throughput: add openwatch worker processes until the scan queue drains as fast as you enqueue. Watch for jobs sitting in the queue.
- API/UI responsiveness: give the serve host enough CPU and RAM; it is a single process today, so vertical sizing is the lever.
- PostgreSQL: size RAM and connections to the combined pool demand above; this is usually the first thing to upgrade for a large fleet.
Measure on your own workload before committing hardware. The numbers that matter are queue depth, scan duration, API latency, and PostgreSQL connection count and query latency — all observable with the tools below.
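
As a worked instance of the connection rule of thumb from the Connection pool section, the sketch below computes a floor for PostgreSQL's max_connections. The deployment (three workers, the default pool of 25, 20 connections of slack) is hypothetical, and requiredPostgresConnections is an illustrative helper rather than an OpenWatch function.

```go
package main

import "fmt"

// requiredPostgresConnections applies the rule of thumb
// (1 serve + workers) * poolSize + slack.
func requiredPostgresConnections(workers, poolSize, slack int) int {
	return (1+workers)*poolSize + slack
}

func main() {
	// Hypothetical deployment: serve plus 3 workers, pool size 25 each,
	// and 20 spare connections for psql, backups, and monitoring.
	fmt.Println(requiredPostgresConnections(3, 25, 20))
}
```

Each process opens up to its pool size on demand, so this is a ceiling on what OpenWatch can use and not a prediction of steady-state usage.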
Observing load
There is no Prometheus endpoint and no Grafana stack in the current build (see "Not yet implemented"). What you have today:
- Health: GET /api/v1/health returns 200 when the service and its database connection are healthy, 503 when degraded (api/openapi.yaml, internal/server/handlers.go). Use it for load-balancer and uptime probes.

- Version: GET /api/v1/version returns build metadata (api/openapi.yaml). openwatch --version prints the same locally.

- Logs: both processes emit structured JSON logs to journald. Follow them with journalctl:

      journalctl -u openwatch -f

  The worker logs a periodic worker.loop.tick line (roughly every 60s) with idle/claimed/in-flight/completed counters, a lightweight way to confirm a worker is alive and draining (internal/worker/scan_worker.go).

- Audit and queue state: query PostgreSQL directly:

      psql "$OPENWATCH_DATABASE_DSN" -c \
        "SELECT status, count(*) FROM job_queue GROUP BY status;"

  A growing count of non-terminal jobs means workers are not keeping up; add worker processes.
Not yet implemented
Be explicit about what this stack does not offer today, so you do not plan around features that are absent:
- Horizontal API scaling is not packaged. The serve process is stateless apart from PostgreSQL (it uses stateless JWT auth), so running replicas behind a load balancer is architecturally possible, but there is no shipped unit, load-balancer config, or supported procedure for it. Treat serve as a single vertically-scaled process for now.
- No packaged worker unit. Only openwatch.service (running serve) ships in the RPM/DEB. Running additional scan workers requires the operator-authored unit shown above.
- No Prometheus/Grafana/metrics endpoint. There is no /metrics route and no bundled monitoring stack. Observability is GET /api/v1/health, the JSON logs in journald, and direct PostgreSQL queries.
- No PgBouncer integration, read replicas, or built-in connection proxy. You can place standard PostgreSQL tooling in front of the database yourself; OpenWatch only knows the single DSN it is configured with.
- No Redis, Celery, or message broker. Background work is the PostgreSQL SKIP LOCKED queue only. Anything that referenced these in older docs is from the archived Python stack and does not apply.
Related documentation
| Topic | Document |
|---|---|
| Install and configuration | docs/guides/INSTALLATION.md |
| Roles and permissions | docs/engineering/rbac_registry.md |
| Kensa scanning boundary | docs/KENSA_OPENWATCH_BOUNDARY.md |
| API contract | api/openapi.yaml (paths under /api/v1) |
| Worker behavior spec | specs/system/worker-subcommand.spec.yaml |