Cold Start and Idle Behavior

Stable

How database auto-start works, why `stopped` is a normal state, and how soft basebackup accelerates cold start.

Updated: March 21, 2026

SPG99 is optimized for a serverless compute model: you do not need to keep an active PostgreSQL executor running all the time, because the durable state of the database lives separately from any specific Compute instance.

What cold start means

Cold start happens when the database has no active writer (state=stopped, target_scale=0) and a client initiates a new connection through Gateway.

At that point, the platform runs the full startup sequence:

  • Gateway initiates auto-start through the Control Plane;
  • Provisioner brings up Compute;
  • Compute prepares TLS and the managed configuration;
  • Pageserver serves a soft basebackup — a thin startup image instead of a heavy full local restore;
  • PostgreSQL starts on a minimal local set of files;
  • user relation pages are fetched lazily as they are accessed;
  • storage dependencies are checked, including Safekeeper quorum availability;
  • the database enters the ready state.

Why cold start is now faster and smoother

The key change in the new model is that the pod no longer needs to keep a nearly complete working set of user relation files locally.

The pod now keeps only the following locally:

  • the startup minimum for PostgreSQL;
  • managed configs and service state;
  • a fast write-back cache.

Practical effect:

  • the pod starts faster;
  • there is less local disk activity;
  • the restart path is more predictable;
  • the storage chain remains the single source of truth.
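
The thin local footprint relies on pages being fetched only when they are first accessed. The following is a minimal conceptual sketch of that idea as a read-through cache. It is not SPG99 code: the class name, the `(relation, block)` key shape, and the `fetch_remote` callback are hypothetical, and it models only lazy reads, not the write-back cache mentioned above.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key shape for this sketch: (relation name, block number).
PageKey = Tuple[str, int]


class LazyPageCache:
    """Read-through cache: a page is fetched from remote storage on first access."""

    def __init__(self, fetch_remote: Callable[[PageKey], bytes]) -> None:
        self._fetch_remote = fetch_remote      # stands in for the storage layer
        self._pages: Dict[PageKey, bytes] = {}  # local working cache, empty at start
        self.remote_fetches = 0

    def read(self, relation: str, block: int) -> bytes:
        key = (relation, block)
        if key not in self._pages:
            # Cache miss: only now is the page pulled from the storage layer.
            self._pages[key] = self._fetch_remote(key)
            self.remote_fetches += 1
        return self._pages[key]
```

In this model the cache starts empty, so startup cost does not depend on how much user data exists. The cost is instead paid per page on first access.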

Why stopped is normal

In SPG99, the local Compute PGDATA is a working cache, not the only place where data is stored. That is why stopped is a normal state in the serverless model:

  • Compute can be stopped safely;
  • the pod can be recreated;
  • on the next start, the database restores its working state from the storage chain;
  • data is not lost as long as the durable layer of the platform is healthy.

Idle stop

To avoid keeping Compute running without load, the platform automatically stops a database when it is idle.

In practice, it usually looks like this:

  • Gateway stops holding the lease;
  • the Control Plane sees that there are no more active sessions;
  • once idle_timeout elapses, the database is moved to stopped;
  • the next connection attempt wakes it up again automatically.
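
The idle-stop conditions above can be sketched as a single predicate. This is an illustrative model only: the `DatabaseStatus` fields, the `should_stop` function, and the idea of tracking an `idle_since` timestamp are assumptions made for the example, not the Control Plane's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseStatus:
    """Hypothetical snapshot of what the Control Plane might observe."""
    state: str                    # "ready" or "stopped", as named in the text
    active_sessions: int          # sessions currently open through Gateway
    lease_held: bool              # whether Gateway still holds the lease
    idle_since: Optional[float]   # time the database became idle, in seconds


def should_stop(status: DatabaseStatus, now: float, idle_timeout: float) -> bool:
    """Return True when all idle-stop conditions in this sketch are met."""
    if status.state != "ready":
        return False  # nothing to stop
    if status.lease_held or status.active_sessions > 0:
        return False  # still in use
    if status.idle_since is None:
        return False  # idle period has not started yet
    return now - status.idle_since >= idle_timeout
```

Under this model, a single lingering session is enough to keep the predicate false, which is why idle connections left open by an application delay auto-stop.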

What your application should do

  1. Retry on connect. The first attempt after the database has been in stopped triggers a cold start, so treat retries as required practice.
  2. Set connect_timeout to at least 5–10 s.
  3. Set an overall connection deadline, including retries, of around 20–30 s.
  4. If you need to avoid a latency spike, open an ordinary warm-up connection shortly before the load arrives.
  5. Do not keep many idle connections open: they interfere with auto-stop and reduce pooling efficiency.
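
The first three recommendations can be combined into one helper: a per-attempt timeout, retries with backoff, and an overall deadline. The following is a minimal sketch, not an official SPG99 client. `connect_with_retry`, the backoff values, and the `connect` callback are assumptions for illustration. The callback receives the per-attempt timeout in seconds and should wrap whatever driver call your application uses.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[float], T],
    connect_timeout: float = 10.0,    # per-attempt timeout, seconds
    overall_deadline: float = 30.0,   # total budget including retries, seconds
    initial_backoff: float = 0.5,     # assumed value, not from the SPG99 docs
    max_backoff: float = 4.0,         # assumed value, not from the SPG99 docs
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(timeout) until it succeeds or the overall deadline passes."""
    start = clock()
    backoff = initial_backoff
    last_error: Optional[BaseException] = None
    while True:
        remaining = overall_deadline - (clock() - start)
        if remaining <= 0:
            raise TimeoutError("database did not accept a connection in time") from last_error
        try:
            # Never let a single attempt outlive the overall budget.
            return connect(min(connect_timeout, remaining))
        except retry_on as exc:
            last_error = exc
        remaining = overall_deadline - (clock() - start)
        if remaining > 0:
            # Exponential backoff between attempts, capped at max_backoff.
            sleep(min(backoff, remaining))
            backoff = min(backoff * 2, max_backoff)
```

When using a PostgreSQL driver, pass its connection-error class in `retry_on`, since driver errors are usually not subclasses of `OSError`. Note that libpq's own `connect_timeout` parameter takes whole seconds, so round the value before passing it on. The same helper can serve as the warm-up connection from item 4: call it shortly before the expected load and close the connection once it succeeds.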

When cold start may be more noticeable

Startup can take longer than usual if:

  • the storage chain is under load;
  • the database has not been started for a long time;
  • several databases are starting at once;
  • the platform is in a transitional autoscale state;
  • pinned or session-heavy traffic is interfering with the required lifecycle step.

In such cases, the correct response is to inspect metrics, logs, and scale_state, rather than trying to “fix” the issue with a manual start.