Cold Start and Idle Behavior

Stable

How database auto-start works, why `stopped` is a normal state, and how soft basebackup accelerates cold start.

Updated: March 21, 2026

SPG99 is optimized for a serverless compute model: you do not need to keep an active PostgreSQL executor running all the time, because the durable state of the database lives separately from a specific Compute instance.

What cold start means

Cold start happens when the database has no active writer (`state=stopped`, `target_scale=0`) and a client initiates a new connection through Gateway.

At that point, the platform runs the full startup sequence:

Gateway initiates autostart through the Control Plane;
Provisioner brings up Compute;
Compute prepares TLS and the managed configuration;
Pageserver serves a soft basebackup (a thin startup image) instead of a heavy full local restore;
PostgreSQL starts on a minimal local set of files;
user relation pages are fetched lazily as they are accessed;
storage dependencies are checked, including Safekeeper quorum availability;
the database enters the `ready` state.

Why cold start is now faster and smoother

The key change in the new model is that the pod no longer needs to keep a nearly complete working set of user relation files locally.

The pod now keeps only the following locally:

the startup minimum for PostgreSQL;
managed configs and service state;
a fast write-back cache.

Practical effect:

the pod starts faster;
there is less local disk activity;
the restart path is more predictable;
the storage chain remains the single source of truth.

Why `stopped` is normal

In SPG99, the local Compute PGDATA is a working cache, not the only place where data is stored. That is why `stopped` is a normal state in the serverless model:

Compute can be stopped safely;
the pod can be recreated;
on the next start, the database restores its working state from the storage chain;
data is not lost as long as the durable layer of the platform is healthy.
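The "working cache" idea can be illustrated with a generic read-through cache. This is only a sketch under assumptions: `LazyPageCache` and `fetch_page` are hypothetical names, not SPG99 components, and the sketch covers reads only (it does not model the write path). It shows why dropping local state loses nothing when the durable layer remains the source of truth.

```python
from typing import Callable, Dict


class LazyPageCache:
    """Read-through cache: pages are fetched on first access, then served locally."""

    def __init__(self, fetch_page: Callable[[int], bytes]) -> None:
        # fetch_page stands in for a read from the durable storage layer (assumption).
        self._fetch_page = fetch_page
        self._pages: Dict[int, bytes] = {}
        self.misses = 0

    def read(self, page_id: int) -> bytes:
        if page_id not in self._pages:
            # Cache miss: fetch lazily from the durable layer.
            self.misses += 1
            self._pages[page_id] = self._fetch_page(page_id)
        return self._pages[page_id]

    def drop(self) -> None:
        """Simulate a stopped or recreated pod: all local state is discarded."""
        self._pages.clear()


durable = {1: b"page-one", 2: b"page-two"}  # stand-in for durable storage
cache = LazyPageCache(durable.__getitem__)
first = cache.read(1)      # fetched from durable storage
again = cache.read(1)      # served from the local cache
cache.drop()               # pod stopped, local cache gone
after_restart = cache.read(1)  # fetched again, same content
```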

Idle stop

To avoid keeping Compute running without load, the platform automatically stops a database when it is idle.

In practice, it usually looks like this:

Gateway stops holding the lease;
the Control Plane sees that there are no more active sessions;
after `idle_timeout` elapses, the database is moved to `stopped`;
the next connection attempt wakes it up again automatically.
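The idle-stop decision described above can be sketched as a small predicate. The `DatabaseActivity` fields and the `should_stop` function are hypothetical and chosen for illustration; the actual Control Plane logic is not documented here and may differ.

```python
from dataclasses import dataclass


@dataclass
class DatabaseActivity:
    lease_held: bool          # Gateway still holds the lease (assumed field)
    active_sessions: int      # sessions the Control Plane still sees (assumed field)
    last_activity_at: float   # timestamp of the last activity, in seconds


def should_stop(activity: DatabaseActivity, now: float, idle_timeout: float) -> bool:
    """Return True when the database is eligible to move to stopped."""
    # Any held lease or open session keeps Compute running.
    if activity.lease_held or activity.active_sessions > 0:
        return False
    # Otherwise stop only once the idle window has fully elapsed.
    return now - activity.last_activity_at >= idle_timeout
```

Under this model a single lingering idle session is enough to keep the database running, which is why long-lived idle connections interfere with auto-stop.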

What your application should do

Implement retry on connect. The first attempt after `stopped` triggers a cold start, so retrying is required practice.
Set `connect_timeout` to at least 5–10 s.
Set an overall connection deadline, including retries, of around 20–30 s.
If you need to avoid a latency spike, open an ordinary warm-up connection shortly before the load arrives.
Do not keep many idle connections open: they interfere with auto-stop and reduce pooling efficiency.
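A minimal sketch of the retry guidance above, assuming a hypothetical `connect_with_retry` helper and a caller-supplied `connect(timeout)` callable; neither is part of SPG99 or of any driver. The defaults of 10 s per attempt and 30 s overall come from the ranges listed above, while the backoff values are illustrative.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[float], T],
    connect_timeout: float = 10.0,    # per-attempt timeout, seconds
    overall_deadline: float = 30.0,   # total budget including retries, seconds
    initial_backoff: float = 0.5,     # illustrative value
    max_backoff: float = 4.0,         # illustrative value
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(timeout) until it succeeds or the overall deadline passes."""
    deadline = clock() + overall_deadline
    backoff = initial_backoff
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("database did not become ready before the deadline")
        try:
            # Never let a single attempt run past the overall deadline.
            return connect(min(connect_timeout, remaining))
        except retry_on as exc:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError(
                    "database did not become ready before the deadline"
                ) from exc
            # Exponential backoff, capped and bounded by the remaining budget.
            sleep(min(backoff, remaining))
            backoff = min(backoff * 2, max_backoff)
```

With a PostgreSQL driver, `connect` would typically wrap the driver's connect call and pass the timeout as the libpq `connect_timeout` parameter, with `retry_on` set to the driver's connection error type. The same helper can also serve as the warm-up call shortly before expected load.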

When cold start may be more noticeable

Startup can take longer than usual if:

the storage chain is under load;
the database has not been started for a long time;
several databases are starting at once;
the platform is in a transitional autoscale state;
pinned or session-heavy traffic is interfering with the required lifecycle step.

In such cases, inspect metrics, logs, and `scale_state` rather than trying to “fix” the issue with a manual start.