Cold Start Is Too Slow

Stable

How to investigate a case where a stopped database wakes up more slowly than expected: soft basebackup, remote read-through, pooling, and autoscale handoff.

Updated: March 21, 2026

If cold start feels too slow, the first thing to understand is what exactly is slow:

the database is genuinely waking up from the stopped state;
the platform does not have enough resources to start the writer quickly;
Compute is failing soft bootstrap or readiness;
Pageserver or Safekeeper is under load;
the application cuts the attempt off with a timeout that is too short;
at that moment, an autoscale handoff or cooldown is also in progress.

What is important to remember

In SPG99, cold start is a normal part of the serverless model. After the platform update, it goes through:

soft basebackup;
a thin startup image;
a minimal local data set on the pod;
lazy fetching of user relation pages.

This is noticeably faster than a heavy local restore, but it still does not make the first connection instant in every case.

What to check

Was the database actually in the stopped state?
Does it move from booting to ready, or is it getting stuck?
What is the current scale_state?
Is the client connect_timeout too small?
Are there errors in Logs and signs of problems in Metrics?
Is there pinned or session-heavy traffic that currently prevents the platform from finishing handoff quickly?
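
The second check (does the database move from booting to ready, or does it get stuck?) can be sketched as a small polling loop. This is a minimal sketch, not an SPG99 client: get_state is a hypothetical callable that you would implement around whatever Console / API call returns the database state, and the function name, timeout, and interval are assumptions. Only the state names booting and ready come from the text above.

```python
import time
from typing import Callable, List, Tuple


def watch_state(
    get_state: Callable[[], str],          # hypothetical: wraps your own API call
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, str]]:
    """Poll get_state() until it returns "ready" or timeout_s elapses.

    Returns the observed transitions as (seconds_since_start, state) pairs,
    so it is visible how long each state lasted and where the start stalled.
    """
    start = clock()
    transitions: List[Tuple[float, str]] = []
    last = None
    while True:
        state = get_state()
        elapsed = clock() - start
        if state != last:
            # Record only changes, not every poll.
            transitions.append((elapsed, state))
            last = state
        if state == "ready" or elapsed >= timeout_s:
            return transitions
        sleep(interval_s)
```

If the last recorded state is not ready, the database was still in that state when the timeout expired, which is the point to look at in Logs and Metrics.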

Typical causes

the storage chain is under load;
the database has not been started for a long time;
several databases are starting at once;
Pageserver is catching up to the required WAL range;
the first remote read-through of hot pages takes noticeable time;
the platform is currently in PREPARING, DRAINING, or COOLDOWN.
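
One rough way to separate a slow wake-up from a slow first remote read-through is to time the connection and the first query separately. The sketch below assumes nothing about the driver: connect and first_query are hypothetical callables you supply, and the function name is illustrative. A long connect time with a fast first query points more toward startup or handoff; a fast connect with a slow first query points more toward lazy page fetching.

```python
import time
from typing import Callable, Dict, TypeVar

Conn = TypeVar("Conn")


def time_cold_start(
    connect: Callable[[], Conn],              # hypothetical: opens a connection
    first_query: Callable[[Conn], object],    # hypothetical: runs one real query
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, float]:
    """Return how long the connect and the first query each took, in seconds."""
    t0 = clock()
    conn = connect()
    t1 = clock()
    first_query(conn)
    t2 = clock()
    return {"connect_s": t1 - t0, "first_query_s": t2 - t1}
```

A single measurement is noisy, so compare several cold starts before drawing conclusions.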

What helps

sensible retry on connect;
a connect_timeout of at least 5–10 seconds;
checking state and scale_state through Console / API;
analyzing logs and metrics instead of guessing;
a warming connection before a critical traffic window.
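
The first item can be sketched as a retry wrapper with exponential backoff. This is a minimal, driver-agnostic sketch: connect is a callable you supply, and the function name, attempt count, and delays are assumptions rather than platform recommendations.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or attempts are exhausted.

    The delay doubles after each failure and is capped at max_delay_s.
    The last failure is re-raised unchanged.
    """
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
    raise ValueError("attempts must be at least 1")
```

With psycopg 3, for example, connect could be lambda: psycopg.connect(dsn, connect_timeout=10) with retry_on=(psycopg.OperationalError,). Calling the same function shortly before a critical traffic window also serves as a warming connection.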

Practical conclusion

If cold start consistently exceeds expected bounds, the problem usually lies not in the stopped state itself, but in how the platform passes through one of these paths:

storage -> soft basebackup -> ready,
or profile handoff -> freeze/drain -> ready.