Roadmap

This roadmap outlines the long-term architectural vision and planned research phases for the NGED Substation Forecast project.

Phase 1: Infrastructure Plumbing & The "Universal" XGBoost Baseline

Before attempting to detect complex switching events, we must prove the data engineering, MLflow tracking, and Dagster orchestration work end-to-end in a production environment.

Ingestion: Download NGED data and ECMWF ensemble NWP. Convert NGED data to Delta Lake and index via H3. Convert ECMWF to Parquet and index via H3.
The "Universal" Model: Build a single XGBoost forecast for the ~50 trial sites. This model is "universal" in two ways: it is trained globally across all substations and across all forecast horizons (0-14 days) using lead_time_hours as a feature. Output an ensemble of power forecasts by passing each NWP ensemble member through the XGBoost model one-by-one.
Purpose: This model will intentionally ignore switching events. The forecast will be flawed, but it serves as an integration test for our infrastructure.

Target completion date: End of May 2026.

Phase 2: Topology Switching Detection

Isolate and label historical switching events to significantly improve the demand forecasts.

Exogenous Baseline: Train K-Fold Out-of-Fold (OOF) XGBoost models using only weather and time features.
Residual Detection: Apply CUSUM or Rolling Difference filters to the baseline residuals to flag topological step-changes.
Spatial Verification (Super-Node Test): Validate switches by applying Kirchhoff's laws to neighbours.

Target completion date: End of June 2026.

Phase 3: Upgrade forecast to handle switching events, and output NRA forecasts

With clean switching labels, build the robust statistical baseline.

Substation NRA (normal running arrangement) Forecast (Universal Model): Train one universal demand forecast XGBoost model across all substations and horizons. Use the switching labels to estimate the transfer magnitude and mathematically "correct" the historical SCADA data.
Customer Meter Forecasts (Clustered/Local Models): Train models clustered by asset type.

Target completion date: End of July 2026.

Phase 4: Grey-Box Physics-Informed Neural Network (PyTorch)

Transitioning from black-box trees to explainable physics to explicitly model Behind-The-Meter (BTM) assets.

Explicit Disaggregation: The model explicitly outputs unmetered solar and unmetered wind alongside gross demand.
Asset Discovery: By feeding CM-SAF irradiance to a differentiable solar-torch module, physical parameters (installed capacity, tilt, azimuth, DC:AC capacity, shading) become learnable via gradient descent.
Natively Handling MVA Telemetry: Older substations report absolute apparent power (MVA), hiding reverse power flows.

Target completion date: End of November 2026.

Phase 5: Spatial Graph Neural Network

Capturing highly non-linear, cross-network interactions during extreme events.

Dynamic Adjacency Matrix: The live Switching Detector acts as a permanent interceptor, continuously updating the adjacency matrix before inference.
This ensures the GNN's message-passing layers route data along the actual physical paths of the switched grid.

Target completion date: End of Feb 2027.

Phase 6: Further research

Continually improve the current best model.
Implement forecasts for BSPs and GSPs.
Experiment with pre-trained encoders.
Multi-sequence alignment with axial attention.
Disaggregate other DERs (e.g. EV chargers, or batteries).