Gateway Metrics

Stable

Which Gateway metrics are most useful for diagnosing client entry-point problems, pool behavior, freeze/drain, and backend routing.

Updated: March 21, 2026

Gateway is the user-facing PostgreSQL entry point of SPG99, so its metrics best answer questions like “why can’t the application connect,” “how is the pool behaving,” and “is client traffic getting in the way of autoscale handoff.”

Basic liveness signals

  • up on the metrics port — whether Gateway itself is available;
  • /health — a quick liveness/health probe.
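These two signals are cheap to combine into a single availability verdict. A minimal sketch, assuming only the two signals named above (the classification labels and the idea of treating a missing `/health` response as `None` are illustrative, not part of the platform):

```python
# Combine the two liveness signals into one verdict.
# up_value: the Prometheus `up` gauge for the Gateway scrape target.
# health_status: HTTP status of the /health probe, or None if it did not answer.
def gateway_liveness(up_value, health_status):
    if up_value != 1.0:
        return "down"       # scrape failed: Gateway unreachable on the metrics port
    if health_status is None:
        return "degraded"   # metrics port answers, but the health probe does not
    return "ok" if health_status == 200 else "degraded"

print(gateway_liveness(1.0, 200))   # "ok"
print(gateway_liveness(0.0, None))  # "down"
```

The asymmetry is deliberate: a failed scrape dominates, because without the metrics port you cannot trust any other Gateway signal.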

TLS and backend connection

gw_backend_tls_ok_total

Counts successful backend TLS handshakes between Gateway and Compute.

gw_backend_tls_handshake_errors_total

Shows backend TLS handshake errors. Useful when investigating problems with the CA chain, SNI, and backend certificate re-creation.
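Since both are counters, the useful quantity is the error share over a scrape window, not the raw totals. A sketch of that calculation, assuming you already have the per-window increases of the two counters (how you sample them is up to your monitoring stack):

```python
# Error share of backend TLS handshakes over one window.
# ok_delta / err_delta are the increases of gw_backend_tls_ok_total and
# gw_backend_tls_handshake_errors_total between two scrapes.
def tls_error_ratio(ok_delta, err_delta):
    total = ok_delta + err_delta
    return err_delta / total if total else 0.0

# e.g. 3 failed handshakes out of 100 attempts in the window
print(tls_error_ratio(97, 3))  # 0.04 would be alarming; 0.03 here
```

A sustained non-zero ratio right after a backend certificate re-creation usually points at the CA chain or SNI, per the note above.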

Lease and database lifecycle

spg99_lease_active

Shows whether a lease is active for a specific database.

spg99_lease_acquire_total

Counts lease-acquire attempts, including errors.

Practical meaning: these metrics make it easy to see whether Gateway is waking a sleeping database, whether the lease path is getting stuck, and whether the database is being held longer than needed because of client activity.
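The three questions above can be reduced to a small state classifier over the two lease metrics. A sketch under assumptions: the split of `spg99_lease_acquire_total` into ok/error rates mirrors a result label, which you should verify against the label set your deployment actually exposes:

```python
# Classify the lease path from spg99_lease_active and the rate of
# spg99_lease_acquire_total, split by outcome (assumed labels).
def lease_state(active, acquire_ok_rate, acquire_err_rate):
    if active == 1:
        return "held"      # database awake, lease in place
    if acquire_ok_rate > 0:
        return "waking"    # acquires succeeding: normal cold-start in progress
    if acquire_err_rate > 0:
        return "stuck"     # acquires failing: the lease path needs attention
    return "sleeping"      # no lease and no acquire traffic

print(lease_state(0, 0.0, 0.5))  # "stuck"
```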

Pooling and pressure on the backend

spg99_pool_total_conns

How many backend connections are open in the pool.

spg99_pool_idle_conns

How many backend connections are idle and ready for reuse.

spg99_pool_checkout_wait_seconds

How long clients wait to obtain a backend connection.

spg99_pool_checkout_timeouts_total

How many times checkout hit a timeout or the pool was exhausted.

spg99_pool_backend_connect_total

How many attempts were made to open a backend connection, broken down by result.

spg99_pool_resets_total

How many backend connections were reset before reuse.
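The gauges above combine into a quick pool-pressure snapshot. A minimal sketch, assuming you know the pool's configured maximum (`max_conns` is an input here, not one of the metrics listed):

```python
# Pool-pressure snapshot from spg99_pool_total_conns and spg99_pool_idle_conns.
# max_conns is the pool's configured ceiling, taken from your profile.
def pool_pressure(total_conns, idle_conns, max_conns):
    in_use = total_conns - idle_conns
    return {
        "in_use": in_use,
        "utilization": in_use / max_conns if max_conns else 0.0,
        "exhausted": idle_conns == 0 and total_conns >= max_conns,
    }

snap = pool_pressure(total_conns=20, idle_conns=0, max_conns=20)
print(snap["exhausted"])  # True: every backend slot is checked out
```

When `exhausted` is true, expect `spg99_pool_checkout_wait_seconds` to climb and `spg99_pool_checkout_timeouts_total` to start incrementing.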

What is especially important for the new autoscaler

spg99_session_pinned_total

Shows how often sessions enter the pinned state and for what reason.

This is especially useful when:

  • the application unexpectedly started using SET, temp tables, cursors, LISTEN, or named prepared statements;
  • transaction pooling stopped delivering the expected savings;
  • autoscale handoff cannot reach a safe drain point.
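A practical way to read this metric is as a share of checkouts that end up pinned, broken down by reason. A sketch under assumptions: the reason labels below (`set_statement`, `temp_table`) are illustrative, not the actual label values `spg99_session_pinned_total` emits:

```python
# Share of checkouts that pin, per reason, over one window.
# pinned_by_reason: per-reason rates of spg99_session_pinned_total.
# checkout_rate: overall checkout rate in the same window.
def pin_ratio(pinned_by_reason, checkout_rate):
    if checkout_rate <= 0:
        return {}
    return {reason: rate / checkout_rate
            for reason, rate in pinned_by_reason.items()}

ratios = pin_ratio({"set_statement": 2.0, "temp_table": 0.5}, checkout_rate=50.0)
print(ratios["set_statement"])  # 0.04: 4% of checkouts pin via SET
```

A ratio that jumps after a deploy is the typical signature of the first bullet above: an application change quietly introducing session state.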

Freeze and cutover

When freeze_new_checkouts=true, Gateway must stop issuing new checkouts to the old writer. At that moment, it is especially useful to watch:

  • checkout waits;
  • checkout timeouts;
  • pinned sessions;
  • overall lease duration and the number of active connections.
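Those four watch items collapse into one drain-safe condition. A sketch under assumptions: the `freeze_new_checkouts` flag comes from the doc, but the field names and the exact predicate below are illustrative, not the platform's internal cutover check:

```python
# Drain-safe condition during freeze: the old writer can be released
# only once freeze is on, no checkout is outstanding, and nothing is pinned.
def drain_safe(freeze_new_checkouts, active_checkouts, pinned_sessions):
    return (freeze_new_checkouts
            and active_checkouts == 0
            and pinned_sessions == 0)

print(drain_safe(True, 0, 0))  # True: cutover can proceed
print(drain_safe(True, 0, 3))  # False: pinned sessions block the handoff
```

The second case is the one to watch for: checkouts drain naturally once frozen, but pinned sessions only go away when the client ends them.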

How to interpret the metrics in practice

Scenario 1: connections are slow only after idle

Look at spg99_lease_acquire_total, spg99_lease_active, spg99_pool_backend_connect_total, and checkout timings. If startup is slow only on a sleeping database, this is a normal cold-start path.

Scenario 2: handoff is stuck on freeze/drain

Look at spg99_session_pinned_total, checkout wait, and the total number of active/idle client connections. Often the problem is not the platform itself, but the workload keeping too much session state alive.

Scenario 3: the application is bottlenecked by the pool

Look at spg99_pool_checkout_timeouts_total, spg99_pool_total_conns, and spg99_pool_idle_conns. If timeouts are increasing while the idle pool is zero, you need either a larger profile, lower client fan-out, or to find out why the workload is pinning sessions.
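The scenario-3 condition can be expressed as a small check over a short series of counter samples. A sketch, assuming a few consecutive samples of `spg99_pool_checkout_timeouts_total` (the window length is your choice, not a platform default):

```python
# Scenario-3 check: timeouts rising across the window while the idle pool
# is empty means the application is pool-bound, not database-bound.
# timeouts: consecutive samples of spg99_pool_checkout_timeouts_total.
def pool_bottleneck(timeouts, idle_conns):
    rising = len(timeouts) >= 2 and timeouts[-1] > timeouts[0]
    return rising and idle_conns == 0

print(pool_bottleneck([10, 14, 21], idle_conns=0))  # True: pool-bound
```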

Scenario 4: the route to the backend is unstable

Look at gw_backend_tls_handshake_errors_total and spg99_pool_backend_connect_total{result=...}. This helps quickly separate a routing/TLS issue from client-login errors.
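The separation described above can be sketched as a classifier over the result label. The label values below (`dial_error`, `auth_failed`) are assumptions for illustration; check the actual `result` values your Gateway emits on `spg99_pool_backend_connect_total`:

```python
# Separate a routing/TLS problem from client-login errors.
# connect_by_result: per-result window deltas of spg99_pool_backend_connect_total
# (result label values here are illustrative).
# tls_handshake_errors: delta of gw_backend_tls_handshake_errors_total.
def classify_failures(connect_by_result, tls_handshake_errors):
    route_errors = tls_handshake_errors + connect_by_result.get("dial_error", 0)
    auth_errors = connect_by_result.get("auth_failed", 0)
    if route_errors > auth_errors:
        return "routing/TLS"
    if auth_errors > 0:
        return "client-login"
    return "healthy"

print(classify_failures({"ok": 500, "dial_error": 12}, tls_handshake_errors=8))
# "routing/TLS"
```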

Practical conclusion

For Gateway, metrics are the main way to understand:

  • whether autostart works;
  • whether pooling is delivering real savings;
  • whether the workload has shifted from stateless to session-heavy;
  • whether pinned traffic is preventing a safe autoscale handoff.