Geo API

Geo Package

This package contains generic geospatial logic and data for the NGED substation forecast project.

Contents

  • geo.h3: H3-related utilities, including grid weight computation.
  • geo.assets: Dagster assets for the UK boundary and H3 grid weights.
  • assets/: Generic geospatial assets like GeoJSON files.

Map of Great Britain using H3 resolution 5 hexagons

Purpose

The geo package is designed to decouple generic geospatial operations from dataset-specific ingestion logic (e.g., ECMWF data processing in dynamical_data). This ensures that any package in the workspace can perform spatial transformations, such as mapping latitude/longitude grids to H3 hexagons, without depending on heavy or unrelated packages.

Key Features

  • H3 Grid Mapping: Utilities to map regular latitude/longitude grids to H3 hexagons.
  • Dagster Assets: Provides uk_boundary and gb_h3_grid_weights assets for use in the main forecasting pipeline.
  • Parameterization: Functions are parameterized to support different grid sizes (e.g., 0.25-degree vs. 1km) and H3 resolutions.
  • Data Contracts: Uses Patito contracts (defined in packages/contracts) like H3GridWeights to ensure strict validation of spatial mapping data.

geo.assets

Classes

H3GridConfig

Bases: Config

Configuration for the H3 grid weights computation.

Attributes:

  • h3_res (int): The H3 resolution to use for the grid (default 5).
  • grid_size (float): The size of the regular lat/lng grid in degrees (default 0.25).
  • child_res (int | None): The H3 resolution to use for the underlying points. If None, it defaults to h3_res + 2.

Source code in packages/geo/src/geo/assets.py
class H3GridConfig(Config):
    """Configuration for the H3 grid weights computation.

    Attributes:
        h3_res: The H3 resolution to use for the grid (default 5).
        grid_size: The size of the regular lat/lng grid in degrees (default 0.25).
        child_res: The H3 resolution to use for the underlying points. If None,
            it defaults to h3_res + 2.
    """

    h3_res: int = 5
    grid_size: float = 0.25
    child_res: int | None = None
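
As a rough illustration of how these settings interact, the sketch below mirrors the child_res fallback rule in plain Python. The helper names resolve_child_res and children_per_cell are hypothetical and not part of the geo package, and the factor of 7 per resolution step assumes hexagonal (non-pentagon) cells.

```python
from typing import Optional


def resolve_child_res(h3_res: int = 5, child_res: Optional[int] = None) -> int:
    """Hypothetical helper: apply the `h3_res + 2` fallback and validate it."""
    resolved = child_res if child_res is not None else h3_res + 2
    if resolved <= h3_res:
        raise ValueError(
            f"child_res ({resolved}) must be strictly greater than h3_res ({h3_res})"
        )
    return resolved


def children_per_cell(h3_res: int, child_res: int) -> int:
    """Child cells per hexagonal parent: a factor of 7 per resolution step."""
    return 7 ** (child_res - h3_res)


default_child_res = resolve_child_res()  # falls back to 5 + 2
samples_default = children_per_cell(5, default_child_res)
samples_plus_three = children_per_cell(5, 8)  # one step finer than the default
```

Under these assumptions, going one step finer than the default multiplies the number of sample points per cell by 7, which is why the default stops at h3_res + 2.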

Functions

gb_h3_grid_weights(context, config, uk_boundary)

Computes the H3 grid weights for Great Britain based on the UK boundary.

This asset dynamically generates the spatial mapping between the hexagonal H3 grid and the regular lat/lng NWP grid. It acts as the foundational reference data for downstream weather data ingestion (e.g., ECMWF ENS forecasts), ensuring weather variables are correctly area-weighted to the H3 cells.

The mapping is calculated by sampling each H3 cell with finer-resolution child cells and determining which regular grid cell each child falls into. The grid_size parameter is used to snap high-resolution H3 cells to the nearest regular NWP grid points.
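
To illustrate what area-weighting means here, the following sketch applies a set of proportions for one H3 cell to values on the regular grid. The weight rows, the NWP values, and the helper name area_weighted_value are made up for illustration; how the downstream pipeline actually joins the weights to NWP data is not shown on this page.

```python
# Illustrative weight rows for one H3 cell: (nwp_lat, nwp_lng, proportion).
weights = [
    (51.50, -0.25, 0.75),
    (51.50, 0.00, 0.25),
]

# Made-up NWP values keyed by grid point (e.g. temperature in degrees C).
nwp_values = {
    (51.50, -0.25): 10.0,
    (51.50, 0.00): 14.0,
}


def area_weighted_value(weights, nwp_values):
    """Weighted sum of grid values, using each grid point's proportion."""
    return sum(prop * nwp_values[(lat, lng)] for lat, lng, prop in weights)


cell_value = area_weighted_value(weights, nwp_values)
```

Because the proportions for a single H3 cell sum to 1, the result is a weighted average of the grid values that overlap the cell.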

Source code in packages/geo/src/geo/assets.py
@asset(group_name="reference_data")
def gb_h3_grid_weights(
    context: AssetExecutionContext, config: H3GridConfig, uk_boundary: BaseGeometry
) -> pl.DataFrame:
    """Computes the H3 grid weights for Great Britain based on the UK boundary.

    This asset dynamically generates the spatial mapping between the hexagonal H3 grid
    and the regular lat/lng NWP grid. It acts as the foundational reference data
    for downstream weather data ingestion (e.g., ECMWF ENS forecasts), ensuring
    weather variables are correctly area-weighted to the H3 cells.

    The mapping is calculated by sampling each H3 cell with finer-resolution child
    cells and determining which regular grid cell each child falls into. The
    `grid_size` parameter is used to snap high-resolution H3 cells to the nearest
    regular NWP grid points.
    """
    h3_res = config.h3_res
    grid_size = config.grid_size
    # The `+2` heuristic provides ~49 sample points per H3 cell (7^2), which is a
    # sufficient balance between spatial precision (for area-weighting against a
    # 0.25-degree grid) and computation time/memory overhead. Increasing it
    # further could cause an exponential explosion in the number of child cells
    # and potentially trigger OOM errors.
    child_res = config.child_res if config.child_res is not None else h3_res + 2

    if child_res <= h3_res:
        raise ValueError(f"child_res ({child_res}) must be strictly greater than h3_res ({h3_res})")

    context.log.info(f"Generating H3 cells at resolution {h3_res}...")

    # Note: In h3-py v4+, `h3.geo_to_cells` is the generic entry point for any object
    # implementing `__geo_interface__` (like shapely.Polygon). `h3.polygon_to_cells`
    # is an alias for `h3shape_to_cells` and expects an internal H3Shape object,
    # which would fail here.
    cells = h3.geo_to_cells(uk_boundary, res=h3_res)
    if not cells:
        raise ValueError(
            f"No H3 cells found for the given boundary at resolution {h3_res}. "
            "Check if the boundary geometry is valid and covers the expected area."
        )

    df = pl.DataFrame({"h3_index": list(cells)}, schema={"h3_index": pl.UInt64}).sort("h3_index")

    context.log.info(
        f"Computing H3 grid weights for grid size {grid_size} with child_res {child_res}..."
    )
    df_with_counts = compute_h3_grid_weights(df, grid_size=grid_size, child_res=child_res)

    return pl.DataFrame(H3GridWeights.validate(df_with_counts))

uk_boundary(context)

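Loads the UK boundary geometry from a local GeoJSON file. The packaged file is england_scotland_wales.geojson, so the geometry covers Great Britain.

The boundary is projected to EPSG:27700 (British National Grid), buffered by 25,000 meters, and projected back to EPSG:4326. Buffering in a metric coordinate system keeps the buffer width uniform, so coastal substations and nearby islands are included in the resulting H3 grid without spatial distortion.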
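
As a back-of-the-envelope check of how the 25 km buffer compares with a 0.25-degree grid, the sketch below estimates the grid spacing in kilometers. It assumes a spherical approximation of about 111.32 km per degree of latitude; the helper name grid_spacing_km and the sample latitude of 55 degrees are illustrative choices, not values taken from the package.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def grid_spacing_km(grid_size_deg: float, latitude_deg: float) -> tuple[float, float]:
    """Approximate (north-south, east-west) spacing of a lat/lng grid in km."""
    north_south = grid_size_deg * KM_PER_DEGREE_LAT
    # Meridians converge towards the poles, so east-west spacing shrinks.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_km, ew_km = grid_spacing_km(0.25, 55.0)
buffer_km = 25_000 / 1000  # the buffer distance used by uk_boundary, in km
```

Under this approximation the north-south spacing is a little under 28 km and the east-west spacing at 55 degrees north is about 16 km, so the buffer is of the same order as one grid step.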

Source code in packages/geo/src/geo/assets.py
@asset(group_name="reference_data")
def uk_boundary(context: AssetExecutionContext) -> BaseGeometry:
    """Loads the UK boundary geometry from a local GeoJSON file.

    The boundary is projected to EPSG:27700 (British National Grid) and buffered
    by 25,000 meters to ensure that coastal substations and nearby islands are
    included in the resulting H3 grid without spatial distortion.
    """
    geojson_path = importlib.resources.files("geo").joinpath(
        "assets/england_scotland_wales.geojson"
    )

    context.log.info(f"Loading UK boundary from {geojson_path}")
    file_contents = geojson_path.read_text()

    shape: BaseGeometry = shapely.from_geojson(file_contents)

    # Project to OSGB36 (EPSG:27700) for metric buffering
    project_to_osgb = pyproj.Transformer.from_crs(
        "EPSG:4326", "EPSG:27700", always_xy=True
    ).transform
    project_to_wgs84 = pyproj.Transformer.from_crs(
        "EPSG:27700", "EPSG:4326", always_xy=True
    ).transform

    shape_osgb = transform(project_to_osgb, shape)
    # Buffer by 25,000 meters (25km) to ensure that even the most coastal H3 cells
    # will have at least one overlapping NWP grid cell. A 0.25-degree grid step is
    # ~28km north-south and roughly 14-18km east-west at UK latitudes. This prevents
    # coastal substations from losing coverage from the nearest NWP grid points.
    shape_osgb_buffered = shape_osgb.buffer(25000)
    shape_buffered = transform(project_to_wgs84, shape_osgb_buffered)

    return shape_buffered

geo.h3

H3-related utilities for geospatial operations.

Functions

compute_h3_grid_weights(df, grid_size, child_res=7)

Computes the proportion mapping for H3 grid cells to a regular lat/lng grid.

This function takes a DataFrame containing H3 indices and calculates how many child H3 cells at a finer resolution (child_res) fall into each cell of a regular lat/lng grid of size grid_size.

The regular grid is assumed to be perfectly aligned to 0.0 (e.g., 0.0, grid_size, grid_size * 2).
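
The snapping to a grid aligned to 0.0 can be sketched in plain Python. This is a scalar illustration of the half-grid offset formula used in the source listing, not the vectorized Polars implementation; the function name snap_to_grid is hypothetical.

```python
import math


def snap_to_grid(value: float, grid_size: float) -> float:
    """Snap a coordinate to the nearest point of a grid aligned to 0.0."""
    if grid_size <= 0:
        raise ValueError("grid_size must be strictly positive")
    # Adding half a grid step before flooring puts grid points at bin centers.
    return math.floor((value + grid_size / 2) / grid_size) * grid_size


snapped_lat = snap_to_grid(51.37, 0.25)  # nearest grid latitude is 51.25
snapped_lng = snap_to_grid(-3.2, 0.25)   # nearest grid longitude is -3.25
```

Without the half-step offset, flooring alone would assign each point to the lower-left corner of its grid cell rather than to the closest grid point.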

Parameters:

  • df (DataFrame, required): A Polars DataFrame containing an 'h3_index' column (UInt64).
  • grid_size (float, required): The size of the regular lat/lng grid in degrees (e.g., 0.25).
  • child_res (int, default 7): The H3 resolution to use for the underlying points. Must be strictly greater than the resolution of the input 'h3_index' column. The +2 heuristic (e.g., res 5 -> res 7) provides ~49 sample points per H3 cell (7^2), which is a sufficient balance between spatial precision (for area-weighting against a 0.25-degree grid) and computation time/memory overhead. Increasing it further could cause an exponential explosion in the number of child cells and potentially trigger OOM errors.

Returns:

DataFrame: A Polars DataFrame with columns:

  • h3_index: The original H3 index.
  • nwp_lat: The latitude of the regular grid cell.
  • nwp_lng: The longitude of the regular grid cell.
  • len: The number of child H3 cells in this grid cell.
  • total: The total number of child H3 cells for this h3_index.
  • proportion: The proportion of the H3 cell that falls into this grid cell.
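
The relationship between len, total, and proportion can be sketched with the standard library alone. The rows below are made up, and the string identifiers stand in for real UInt64 H3 indices; the actual function performs the equivalent grouping in Polars.

```python
from collections import Counter

# Made-up (h3_index, nwp_lat, nwp_lng) triples, one per child cell after snapping.
child_rows = [
    ("cell_a", 51.50, -0.25),
    ("cell_a", 51.50, -0.25),
    ("cell_a", 51.50, -0.25),
    ("cell_a", 51.50, 0.00),
    ("cell_b", 51.75, 0.00),
    ("cell_b", 51.75, 0.00),
]

# "len": number of children per (h3_index, grid point) combination.
counts = Counter(child_rows)
# "total": number of children per h3_index.
totals = Counter(h3_index for h3_index, _, _ in child_rows)

weights = [
    {
        "h3_index": h3_index,
        "nwp_lat": lat,
        "nwp_lng": lng,
        "len": n,
        "total": totals[h3_index],
        "proportion": n / totals[h3_index],
    }
    for (h3_index, lat, lng), n in counts.items()
]
```

In this toy input, cell_a is split 3:1 between two grid points, while cell_b falls entirely within one grid point and so gets a single row with proportion 1.0.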

Source code in packages/geo/src/geo/h3.py
def compute_h3_grid_weights(df: pl.DataFrame, grid_size: float, child_res: int = 7) -> pl.DataFrame:
    """Computes the proportion mapping for H3 grid cells to a regular lat/lng grid.

    This function takes a DataFrame containing H3 indices and calculates how many
    child H3 cells at a finer resolution (`child_res`) fall into each cell of a
    regular lat/lng grid of size `grid_size`.

    The regular grid is assumed to be perfectly aligned to 0.0 (e.g., 0.0, `grid_size`,
    `grid_size * 2`).

    Args:
        df: A Polars DataFrame containing an 'h3_index' column (UInt64).
        grid_size: The size of the regular lat/lng grid in degrees (e.g., 0.25).
        child_res: The H3 resolution to use for the underlying points. Must be
            strictly greater than the resolution of the input 'h3_index' column.
            Defaults to 7. The `+2` heuristic (e.g., res 5 -> res 7) provides ~49
            sample points per H3 cell (7^2), which is a sufficient balance between
            spatial precision (for area-weighting against a 0.25-degree grid) and
            computation time/memory overhead. Increasing it further could cause
            an exponential explosion in the number of child cells and potentially
            trigger OOM errors.

    Returns:
        A Polars DataFrame with columns:
            - h3_index: The original H3 index.
            - nwp_lat: The latitude of the regular grid cell.
            - nwp_lng: The longitude of the regular grid cell.
            - len: The number of child H3 cells in this grid cell.
            - total: The total number of child H3 cells for this h3_index.
            - proportion: The proportion of the H3 cell that falls into this grid cell.
    """
    if df.is_empty():
        raise ValueError("Input DataFrame is empty.")

    # Ensure grid_size is strictly positive to avoid division by zero
    # or nonsensical snapping.
    if grid_size <= 0:
        raise ValueError("grid_size must be strictly positive")

    # Check resolution of all H3 indices to ensure consistency.
    # Using polars_h3.get_resolution for vectorized check.
    h3_res_unique = df.select(plh3.get_resolution("h3_index")).unique()
    if h3_res_unique.height > 1:
        raise ValueError("All H3 indices must have the same resolution.")
    h3_res = h3_res_unique.item()

    if child_res <= h3_res:
        raise ValueError(f"child_res ({child_res}) must be strictly greater than h3_res ({h3_res})")

    weights_df = (
        df.with_columns(child_h3=plh3.cell_to_children("h3_index", child_res))
        .explode("child_h3")
        .with_columns(
            # GRID SNAPPING FORMULA:
            # The half-grid offset binning `((lat + grid_size/2) / grid_size).floor() * grid_size`
            # ensures that points are snapped to the *closest* grid center rather than
            # the bottom-left corner of the grid cell. Adding `grid_size/2` before
            # flooring shifts the bin boundaries so that the grid points (0, 0.25, 0.5, etc.)
            # are at the center of each bin.
            nwp_lat=((plh3.cell_to_lat("child_h3") + (grid_size / 2)) / grid_size).floor()
            * grid_size,
            nwp_lng=((plh3.cell_to_lng("child_h3") + (grid_size / 2)) / grid_size).floor()
            * grid_size,
        )
        .group_by(["h3_index", "nwp_lat", "nwp_lng"])
        .len()
        .with_columns(
            total=pl.col("len").sum().over("h3_index"),
        )
        .with_columns(
            # Ensure len and total are UInt32 as per contract
            len=pl.col("len").cast(pl.UInt32),
            total=pl.col("total").cast(pl.UInt32),
            proportion=pl.col("len") / pl.col("total"),
        )
    )

    return H3GridWeights.validate(weights_df, drop_superfluous_columns=True)

geo.io_managers