geofileops.dissolve#

geofileops.dissolve(input_path: Union[str, os.PathLike[Any]], output_path: Union[str, os.PathLike[Any]], explodecollections: bool, groupby_columns: Optional[Union[List[str], str]] = None, agg_columns: Optional[dict] = None, tiles_path: Optional[Union[str, os.PathLike[Any]]] = None, nb_squarish_tiles: int = 1, input_layer: Optional[str] = None, output_layer: Optional[str] = None, gridsize: float = 0.0, where_post: Optional[str] = None, nb_parallel: int = -1, batchsize: int = -1, force: bool = False)#

Applies a dissolve operation on the input file.

If columns are specified with groupby_columns, the data is first grouped on those columns before the geometries are merged.

Data in other columns can be retained in the output by specifying the agg_columns parameter.

Because the input layer is tiled using a grid to speed up processing, extra collinear points will typically be present in the output geometries. Rows with null or empty geometries are ignored.

The following example shows how data in columns that are not grouped on can be aggregated and added to the output file:

import geofileops as gfo

gfo.dissolve(
    input_path=...,
    output_path=...,
    groupby_columns=["cropgroup"],
    agg_columns={
        "columns": [
            {"column": "crop", "agg": "max", "as": "crop_max"},
            {"column": "crop", "agg": "count", "as": "crop_count"},
            {
                "column": "crop",
                "agg": "concat",
                "as": "crop_concat",
                "distinct": True,
                "sep": ";",
            },
            {"column": "area", "agg": "mean", "as": "area_mean"},
        ]
    },
    explodecollections=False,
)
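The aggregations listed under “agg” correspond to standard SQL aggregate functions. As a rough illustration of what max, count, a distinct concat and mean produce per group, here is a self-contained sketch using Python’s built-in sqlite3 module on made-up crop data (this only mimics the semantics, it is not how geofileops runs internally):

```python
# Illustrative only: mimics the per-group aggregations of the example
# above with plain SQL on made-up data (sqlite3 is Python stdlib).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (cropgroup TEXT, crop TEXT, area REAL)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [
        ("Grasses", "Meadow", 1290.0),
        ("Grasses", "Pasture", 800.0),
        ("Grasses", "Meadow", 455.5),
    ],
)
row = conn.execute(
    """SELECT MAX(crop), COUNT(crop),
              GROUP_CONCAT(DISTINCT crop), AVG(area)
       FROM parcels GROUP BY cropgroup"""
).fetchone()
print(row)  # e.g. ('Pasture', 3, 'Meadow,Pasture', 848.5)
```

Note that the concatenation order is arbitrary, as documented for the “concat” aggregation above.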

The following example saves all detailed data for the columns “crop” and “area” in the output file. The detailed data is encoded per group/row in a “json” text field. Because Shapefiles only support up to 254 characters in a text field, Shapefile is not a well-suited output format for this option.

import geofileops as gfo

gfo.dissolve(
    input_path=...,
    output_path=...,
    groupby_columns=["cropgroup"],
    agg_columns={"json": ["crop", "area"]},
    explodecollections=False,
)

This results in this type of output:

cropgroup  json
Grasses    ["{"crop":"Meadow","area":1290,"fid_orig":5}","{"crop":"Pasture",...
Maize      ["{"crop":"Silo","area":3889.29,"fid_orig":2}","{"crop":"Fodder",...
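Once written, the “json” field can be parsed back with Python’s standard json module. This sketch assumes the field holds a JSON array with one object per original row; the field value below is made up to resemble the sample output, it is not real geofileops output:

```python
# Hypothetical "json" field value for one dissolved group.
import json

json_field = (
    '[{"crop": "Meadow", "area": 1290, "fid_orig": 5},'
    ' {"crop": "Pasture", "area": 800, "fid_orig": 7}]'
)
rows = json.loads(json_field)
total_area = sum(row["area"] for row in rows)
print(total_area)  # 2090
```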

If the output is tiled (by specifying tiles_path or nb_squarish_tiles > 1), the result is clipped to the output tiles, and tile borders are never crossed.

Parameters:
  • input_path (PathLike) – the input file

  • output_path (PathLike) – the file to write the result to

  • explodecollections (bool) – True to output only simple geometries. If False, this can result in huge geometries for large files, especially if no groupby_columns are specified.

  • groupby_columns (Union[List[str], str], optional) – columns (case insensitive) to group on while aggregating. Defaults to None, resulting in a spatial union of all geometries that touch.

  • agg_columns (dict, optional) –

    columns to aggregate, based on the grouping by groupby_columns. Depending on the top-level key of the dict, the output of the aggregation differs:

    • ”json”: dump all data per group to one “json” column. The value can be None (= all columns) or a list of columns to include.

    • ”columns”: aggregate to separate columns. The value should be a list of dicts with the following keys:

      • ”column”: column name (case insensitive) in the input file. In addition to standard columns, it is also possible to specify “fid”, a unique index available in all input files.

      • ”agg”: aggregation to use:

        • count: the number of values in the group

        • sum: the sum of the values in the group

        • mean: the mean/average of the values in the group

        • min: the minimum value in the group

        • max: the maximum value in the group

        • median: the median value in the group

        • concat: all non-null values in the group concatenated (in arbitrary order)

      • ”as”: column name in the output file. Note: using “fid” as alias is not recommended: it can cause errors or odd behaviour.

      • ”distinct” (optional): True to deduplicate the values before aggregating them.

      • ”sep” (optional): the separator to use for concat. Default: “,”.

  • tiles_path (PathLike, optional) – a path to a geofile containing tiles. If specified, the output will be dissolved/unioned only within the tiles provided. Can be used to avoid huge geometries being created if the input geometries are very interconnected. Defaults to None (= the output is not tiled).

  • nb_squarish_tiles (int, optional) – the approximate number of tiles the output should be dissolved/unioned to. If > 1, a tiling grid is automatically created based on the total bounds of the input file. The input geometries will be dissolved/unioned only within the tiles generated. Can be used to avoid huge geometries being created if the input geometries are very interconnected. Defaults to 1 (= the output is not tiled).

  • input_layer (str, optional) – input layer name. Optional if the file only contains one layer.

  • output_layer (str, optional) – output layer name. If not specified, the stem of the output file name is used.

  • gridsize (float, optional) – the size of the grid the coordinates of the output will be rounded to. E.g. 0.001 to keep 3 decimals. Value 0.0 doesn’t change the precision. Defaults to 0.0.

  • where_post (str, optional) – SQL filter to apply after all other processing, including e.g. explodecollections. It should be in SQLite syntax, and Spatialite reference functions can be used. Defaults to None.

  • nb_parallel (int, optional) – the number of parallel processes to use. Defaults to -1: use all available CPUs.

  • batchsize (int, optional) – indicative number of rows to process per batch. A smaller batch size, possibly in combination with a smaller nb_parallel, will reduce the memory usage. Defaults to -1: (try to) determine optimal size automatically.

  • force (bool, optional) – overwrite existing output file(s). Defaults to False.
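To make the effect of gridsize concrete, the rounding it implies can be mimicked on a single coordinate in pure Python. This is a sketch only: the library applies the precision reduction to the geometries themselves, not coordinate by coordinate like this.

```python
# Sketch: snapping one coordinate to a grid of size 0.001,
# i.e. keeping 3 decimals, as gridsize=0.001 would imply.
def snap_to_grid(value: float, gridsize: float) -> float:
    return round(value / gridsize) * gridsize

x = snap_to_grid(1.23456, 0.001)
print(round(x, 6))  # 1.235
```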