The track record
Every call is logged when it's made, hashed into a chain so it can't be rewritten, and graded against ERCOT's official settlements. The misses stay on the page. That's the product.
Live · summer 2026
graded as each month settles, provisional until ERCOT reconciles
How we score
backtest over the 2022 to 2024 summers, frozen at model release
We flagged 36 days as WARNING or higher. 6 of the 12 monthly peaks landed on a flagged day. Alerting on every hot day would have meant 81 flagged days and 138 hours of false alerts.
Whiskers are 95% intervals. The sample is small and we say so.
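To make "the sample is small" concrete, here is a minimal sketch of a 95% interval on a capture rate such as 6 of 12 peaks. The Wilson score interval is an assumption chosen for illustration; this page does not say which interval method the whiskers use, and `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, not confirmed by the page)."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# 6 of the 12 backtest peaks landed on a flagged day.
low, high = wilson_interval(6, 12)
print(f"{low:.0%} to {high:.0%}")
```

Under that assumption, 6 of 12 gives an interval of roughly 25% to 75%, which is why a dozen graded months cannot pin the capture rate down tightly.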
2025 was not used to calibrate the model. Graded the same way, after the fact: we flagged 13 days and caught 3 of the 4 peaks. The hot-day rule flagged 24 days to catch all 4.
Held out means graded retrospectively on data the model never trained on. It is not a live season. The live test is happening right now, below.
Live calls are graded as each month settles. Nothing here is mixed into the backtest numbers above.
Why you can trust this ledger
Each call is hashed into an append-only chain the moment it publishes, and the chain's terminal hash posts daily. If we ever rewrote a call, the chain would break in public. Verify it yourself: the attestations feed and the full method.
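As a rough illustration of why a rewritten call breaks the chain, here is a minimal sketch of an append-only hash chain. It is not the ledger's actual format: SHA-256, the JSON encoding, the all-zero genesis value, and the field names are all assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain

def link_hash(prev_hash: str, call: dict) -> str:
    """Hash one call together with the hash of the link before it."""
    payload = json.dumps(call, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def terminal_hash(calls: list[dict]) -> str:
    """Fold the calls, in publication order, into a single terminal hash."""
    h = GENESIS
    for call in calls:
        h = link_hash(h, call)
    return h

calls = [
    {"date": "2025-07-30", "call": "WARNING", "p": 0.67},
    {"date": "2025-08-18", "call": "WARNING", "p": 0.66},
]
published = terminal_hash(calls)

# Rewriting an earlier call changes every later link, so the terminal hash no longer matches.
tampered = [dict(calls[0], p=0.97), calls[1]]
print(terminal_hash(tampered) == published)  # False
```

Because each link depends on the one before it, anyone holding a previously posted terminal hash can recompute the chain from the published calls and detect an edit anywhere upstream.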
The honest tradeoff: we flag fewer days than a flag-every-hot-day rule and accept we might miss a peak. The Lab lets you pick your own line between interruptions and misses.
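The tradeoff can be sketched with the 2025 held-out rows listed on this page. This assumes a day is flagged whenever p meets a single cutoff, which the tables suggest but the page does not state, and it assumes each Miss row is dated on the peak day itself. Only days listed on this page are included, so the sketch can raise the cutoff but cannot say what a lower one would flag. `sweep` is a hypothetical helper, not the Lab's implementation.

```python
# (date, p in percent, held the month's peak) for the 2025 held-out rows on this page
ROWS_2025 = [
    ("2025-06-01", 59, False),
    ("2025-06-19", 35, True),   # graded Miss: assumed to be the June peak day
    ("2025-07-01", 73, False),
    ("2025-07-23", 63, False),
    ("2025-07-24", 65, False),
    ("2025-07-28", 58, False),
    ("2025-07-29", 65, False),
    ("2025-07-30", 67, True),
    ("2025-07-31", 63, False),
    ("2025-08-01", 93, False),
    ("2025-08-18", 66, True),
    ("2025-08-28", 59, False),
    ("2025-09-01", 52, False),
    ("2025-09-04", 58, True),
]

def sweep(rows, cutoff_pct):
    """Return (days flagged, peaks caught) if every day with p >= cutoff is flagged."""
    flagged = [row for row in rows if row[1] >= cutoff_pct]
    caught = sum(1 for row in flagged if row[2])
    return len(flagged), caught

for cutoff in (50, 60, 70):
    n_flagged, n_caught = sweep(ROWS_2025, cutoff)
    print(f"cutoff {cutoff}%: {n_flagged} days flagged, {n_caught} of 4 peaks caught")
```

On these rows a 50% cutoff reproduces the 13 flagged days and 3 caught peaks reported above, while a 60% cutoff flags 8 days and catches 2: fewer interruptions, more misses.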
2025, the held-out summer
graded on data the model never trained on, official settled intervals

| Date | Call | p | Outcome | Kind |
|---|---|---|---|---|
| June 2025: 2 calls, 2 flagged | | | | |
| 2025-06-01 | ▲ WARNING | 59% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| 2025-06-19 | △ WATCH | 35% | Miss: the month peak came on a day we did not flag | held out |
| July 2025: 7 calls, 7 flagged | | | | |
| 2025-07-01 | ▲ WARNING | 73% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| 2025-07-23 | ▲ WARNING | 63% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| 2025-07-24 | ▲ WARNING | 65% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| 2025-07-28 | ▲ WARNING | 58% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| 2025-07-29 | ▲ WARNING | 65% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| 2025-07-30 | ▲ WARNING | 67% | Hit | held out |
| 2025-07-31 | ▲ WARNING | 63% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| August 2025: 3 calls, 3 flagged | | | | |
| 2025-08-01 | ▲ WARNING | 93% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| 2025-08-18 | ▲ WARNING | 66% | Hit | held out |
| 2025-08-28 | ▲ WARNING | 59% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| September 2025: 2 calls, 2 flagged | | | | |
| 2025-09-01 | ▲ WARNING | 52% | False alarm: flagged WARNING+, but the month peaked elsewhere | held out |
| 2025-09-04 | ▲ WARNING | 58% | Hit | held out |
A hit is a day we flagged WARNING or higher that turned out to hold the month's true peak, graded against official settled intervals. Backtests never masquerade as live performance.
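The grading rule above can be sketched as a small function. The outcome labels come from the tables on this page. The `grade` name, the `FLAGGED_LEVELS` set, and the "Not graded" label for unflagged non-peak days are hypothetical, and the sketch assumes a Miss row is dated on the peak day itself.

```python
# Tiers that count as flagged. Assumption: any tier above WARNING would be added here.
FLAGGED_LEVELS = {"WARNING"}

def grade(level: str, call_date: str, peak_date: str) -> str:
    """Grade one call against the date of the month's settled peak."""
    flagged = level in FLAGGED_LEVELS
    on_peak = call_date == peak_date
    if flagged and on_peak:
        return "Hit"
    if flagged:
        return "False alarm"   # flagged WARNING+, but the month peaked elsewhere
    if on_peak:
        return "Miss"          # the month peak came on a day we did not flag
    return "Not graded"        # hypothetical label: unflagged day that was not the peak

# July 2025 peaked on 2025-07-30 (the row graded Hit in the held-out table).
print(grade("WARNING", "2025-07-30", "2025-07-30"))  # Hit
print(grade("WARNING", "2025-07-29", "2025-07-30"))  # False alarm
print(grade("WATCH", "2025-06-19", "2025-06-19"))    # Miss
```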
Every call, graded
2024 backtest, graded against official settled intervals

| Date | Call | p | Outcome | Kind |
|---|---|---|---|---|
| June 2024: 6 calls, 6 flagged | | | | |
| 2024-06-24 | ▲ WARNING | 51% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-06-25 | ▲ WARNING | 56% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-06-27 | ▲ WARNING | 61% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-06-28 | ▲ WARNING | 55% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-06-29 | ▲ WARNING | 54% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-06-30 | ▲ WARNING | 63% | Hit | backtest |
| July 2024: 1 call, 1 flagged | | | | |
| 2024-07-01 | ▲ WARNING | 93% | Hit | backtest |
| August 2024: 8 calls, 8 flagged | | | | |
| 2024-08-01 | ▲ WARNING | 90% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-08-07 | ▲ WARNING | 67% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-08-18 | ▲ WARNING | 50% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-08-19 | ▲ WARNING | 75% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-08-20 | ▲ WARNING | 71% | Hit | backtest |
| 2024-08-21 | ▲ WARNING | 61% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-08-22 | ▲ WARNING | 65% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-08-23 | ▲ WARNING | 58% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| September 2024: 2 calls, 2 flagged | | | | |
| 2024-09-13 | ▲ WARNING | 63% | False alarm: flagged WARNING+, but the month peaked elsewhere | backtest |
| 2024-09-19 | △ WATCH | 49% | Miss: the month peak came on a day we did not flag | backtest |