Solution · Performance

P99 you can trust enough to alert on.

Latency is lognormal. A spike on top of a stable baseline matters. A spike during a known cold boot does not. Engager understands the difference, so the people on call do not have to.

24h

Rolling baseline

Per host, per region

1.5×

Anomaly threshold

Configurable per host

3

Sustained samples

Single spikes never page

200ms

Floor

Below floor, never anomalous

The math

Percentile, not mean.

An average collapses the whole lognormal pile into one number, and that number is misleading the moment one outlier lands. Engager keeps the full distribution and reads off P95 and P99 directly.

Reports show both. Alerts evaluate against P99. The mean is there for context, not for paging.
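
A minimal sketch of why the mean and the percentiles disagree. The sample values are made up, and the nearest-rank method here is an assumption for illustration, not necessarily how Engager computes percentiles.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    # Integer-friendly rank calculation avoids float surprises such as 0.95 * n.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in ms: 97 fast requests and 3 slow ones.
latencies_ms = [100] * 97 + [3000] * 3

mean_ms = statistics.mean(latencies_ms)   # 187: a latency no request actually had
p95_ms = percentile(latencies_ms, 95)     # 100: what a typical request sees
p99_ms = percentile(latencies_ms, 99)     # 3000: what the slow tail sees
```

The mean lands between the two groups and describes neither, while P95 and P99 each report a latency that real requests experienced.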

api.rookhq.com

Response time · last 24h

Anomaly detected

P95

412 ms

P99

520 ms

Baseline P95

220 ms

Solid line · live P95. Dashed baseline · 24h rolling P95. Filled red · anomalous samples.

Anomaly detection

A spike on top of a stable baseline beats a spike on top of noise.

Engager remembers the rolling P99 baseline for the last twenty-four hours, per host, per region. A current ping qualifies as anomalous when it exceeds both 1.5× the baseline and the absolute floor.

And it has to happen three times in a row. Single bumps are noise; sustained drift is signal.

Three samples, one anomaly
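
The rule above can be sketched in a few lines. The class and field names are hypothetical, and the baseline is passed in as a plain number rather than computed from a 24h window. The defaults mirror the figures on this page (1.5×, 200 ms floor, three samples).

```python
from dataclasses import dataclass, field


@dataclass
class AnomalyDetector:
    """Illustrative only: flags an anomaly after N consecutive samples over threshold and floor."""

    baseline_ms: float          # assumed to come from the rolling 24h baseline
    threshold: float = 1.5      # multiple of the baseline
    floor_ms: float = 200.0     # below this, a sample is never anomalous
    sustained: int = 3          # consecutive samples required
    _streak: int = field(default=0, init=False, repr=False)

    def observe(self, latency_ms: float) -> bool:
        over_baseline = latency_ms > self.threshold * self.baseline_ms
        over_floor = latency_ms > self.floor_ms
        # Any sample that fails either test resets the streak.
        self._streak = self._streak + 1 if (over_baseline and over_floor) else 0
        return self._streak >= self.sustained


detector = AnomalyDetector(baseline_ms=220)            # anomalous above 330 ms
samples_ms = [400, 400, 250, 400, 400, 400]
fired = [detector.observe(s) for s in samples_ms]
# fired == [False, False, False, False, False, True]
```

The 250 ms sample in the middle resets the count, so the first two spikes never page. Only the final run of three does.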

Cold start aware

A free-tier wake-up is not an outage.

Render free tier dynos take fifteen to thirty seconds to wake. Engager stages re-verification at ten, twenty-five, and fifty-five seconds before declaring a failure. That includes the latency budget.

The result: a dyno that has just booted is never paged as slow.

Stage 1 · 10s · Stage 2 · 25s · Stage 3 · 55s
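
A rough sketch of staged re-verification. It assumes the three stage times are offsets measured from the first failed check, which this page does not state explicitly. The function name and the `check` callback are hypothetical.

```python
import time
from typing import Callable, Sequence

# Stage offsets in seconds, as listed above (assumed to be measured from the first failure).
STAGES_S = (10, 25, 55)


def verify_after_cold_start(
    check: Callable[[], bool],
    stages_s: Sequence[int] = STAGES_S,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-check at each stage; return True on the first success, False if every stage fails."""
    elapsed = 0
    for offset in stages_s:
        sleep(offset - elapsed)   # wait only the remaining time until this stage
        elapsed = offset
        if check():
            return True           # host woke up: no failure, no page
    return False                  # still failing after the last stage


# Dry run with a recording sleep, so nothing actually waits.
waits = []
answers = iter([False, False, True])   # hypothetical host that is awake by the third stage
recovered = verify_after_cold_start(lambda: next(answers), sleep=waits.append)
# waits == [10, 15, 30], recovered is True
```

Injecting `sleep` keeps the sketch testable. A real checker would more likely schedule the stages than block on them.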

What lands in the report

Every section is a number you can defend.

Average

For context only. Reports show it next to the percentile pair so the difference is obvious.

P95 and P99

Computed from the live distribution, not bucketed approximations.

Anomaly count

How many sustained anomalous windows fired in the last twenty-four hours.

Min and max

The bookends. Useful when a sample run is an obvious outlier.

Protocol

HTTP/2 by default, HTTP/3 when offered. Downgrades surface as a security event.

Deploy correlation

When latency drifts on a host within fifteen minutes of a tracked GitHub push, the commit is tagged.
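
One way the correlation window could work, as a sketch. It assumes the drift has to follow the push, and the function name, commit hashes, and timestamps are invented for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)


def commits_to_tag(drift_at, pushes):
    """Return the SHAs pushed at most WINDOW before the drift started.

    pushes is a list of (sha, pushed_at) pairs. Pushes after the drift are ignored
    (an assumption: a later push cannot have caused an earlier drift).
    """
    return [
        sha
        for sha, pushed_at in pushes
        if timedelta(0) <= drift_at - pushed_at <= WINDOW
    ]


drift_at = datetime(2024, 1, 1, 12, 0)
pushes = [
    ("a1b2c3", datetime(2024, 1, 1, 11, 50)),   # 10 minutes before: tagged
    ("d4e5f6", datetime(2024, 1, 1, 11, 30)),   # 30 minutes before: too old
    ("0a9b8c", datetime(2024, 1, 1, 12, 5)),    # after the drift: ignored
]
tagged = commits_to_tag(drift_at, pushes)
# tagged == ["a1b2c3"]
```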

Make P99 the number you alert on.