geofileops.dissolve
geofileops.dissolve(input_path: Union[str, os.PathLike[Any]], output_path: Union[str, os.PathLike[Any]], explodecollections: bool, groupby_columns: Optional[Union[List[str], str]] = None, agg_columns: Optional[dict] = None, tiles_path: Optional[Union[str, os.PathLike[Any]]] = None, nb_squarish_tiles: int = 1, input_layer: Optional[str] = None, output_layer: Optional[str] = None, gridsize: float = 0.0, where_post: Optional[str] = None, nb_parallel: int = -1, batchsize: int = -1, force: bool = False)
Applies a dissolve operation on the input file.
If columns are specified with groupby_columns, the data is first grouped on those columns before the geometries are merged. Data in other columns can be retained in the output by specifying the agg_columns parameter.
Because the input layer is tiled using a grid to speed up processing, extra collinear points will typically be present in the output geometries. Rows with null or empty geometries are ignored.
The following example shows how data in columns that are not grouped on can be aggregated and added to the output file:
import geofileops as gfo

gfo.dissolve(
    input_path=...,
    output_path=...,
    groupby_columns=["cropgroup"],
    agg_columns={
        "columns": [
            {"column": "crop", "agg": "max", "as": "crop_max"},
            {"column": "crop", "agg": "count", "as": "crop_count"},
            {
                "column": "crop",
                "agg": "concat",
                "as": "crop_concat",
                "distinct": True,
                "sep": ";",
            },
            {"column": "area", "agg": "mean", "as": "area_mean"},
        ]
    },
    explodecollections=False,
)
The following example saves all detailed data for the columns “crop” and “area” in the output file. The detailed data is encoded per group/row in a “json” text field. Shapefiles only support up to 254 characters in a text field, so shapefile is not well suited as output format for this option.
import geofileops as gfo

gfo.dissolve(
    input_path=...,
    output_path=...,
    groupby_columns=["cropgroup"],
    agg_columns={"json": ["crop", "area"]},
    explodecollections=False,
)
This results in this type of output:
cropgroup  json
Grasses    ["{"crop":"Meadow","area":1290,"fid_orig":5}","{"crop":"Pasture",...
Maize      ["{"crop":"Silo","area":3889.29,"fid_orig":2}","{"crop":"Fodder",...
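To work with the detailed data afterwards, the “json” field has to be decoded again. The snippet below is a minimal sketch, not part of geofileops: decode_json_field is a hypothetical helper, and it assumes the field holds a JSON array whose items are either objects or JSON-encoded strings. The sample value is made up for illustration and only modelled on the “Grasses” row.

```python
import json
from typing import Any


def decode_json_field(value: str) -> list[dict[str, Any]]:
    """Decode one value of the "json" column into a list of row dicts."""
    rows = []
    for item in json.loads(value):
        # Assumption: an item may itself be a JSON-encoded string.
        # If so, decode it a second time; otherwise use it as-is.
        rows.append(json.loads(item) if isinstance(item, str) else item)
    return rows


# Hypothetical field value (the "Pasture" numbers are invented for the example).
sample = json.dumps(
    [
        json.dumps({"crop": "Meadow", "area": 1290, "fid_orig": 5}),
        json.dumps({"crop": "Pasture", "area": 850.5, "fid_orig": 7}),
    ]
)

rows = decode_json_field(sample)
total_area = sum(row["area"] for row in rows)
crops = [row["crop"] for row in rows]
```

Checking the real encoding on a small output file first is advisable, because the exact quoting may differ from what is assumed here.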
If the output is tiled (by specifying tiles_path or nb_squarish_tiles > 1), the result will be clipped on the output tiles and the tile borders are never crossed.
Parameters:
input_path (PathLike) – the input file
output_path (PathLike) – the file to write the result to
explodecollections (bool) – True to output only simple geometries. If False, this can result in huge geometries for large files, especially if no groupby_columns are specified.
groupby_columns (Union[List[str], str], optional) – columns (case insensitive) to group on while aggregating. Defaults to None, resulting in a spatial union of all geometries that touch.
agg_columns (dict, optional) – columns to aggregate based on the groupings by groupby_columns. Depending on the top-level key of the dict, the output of the aggregation is different:
    “json”: dump all data per group to one “json” column. The value can be None (= all columns) or a list of columns to include.
    “columns”: aggregate to separate columns. The value should be a list of dicts with the following keys:
        “column”: column name (case insensitive) in the input file. In addition to standard columns, it is also possible to specify “fid”, a unique index available in all input files.
        “agg”: aggregation to use:
            count: the number of values in the group
            sum: the sum of the values in the group
            mean: the mean/average of the values in the group
            min: the minimum value in the group
            max: the maximum value in the group
            median: the median value in the group
            concat: all non-null values in the group concatenated (in arbitrary order)
        “as”: column name in the output file. Note: using “fid” as alias is not recommended: it can cause errors or odd behaviour.
        “distinct” (optional): True to keep only distinct values before aggregation.
        “sep” (optional): the separator to use for concat. Default: “,”.
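Because the “columns” value is a plain list of dicts, a typo in a key or aggregation name is easy to miss. The sketch below builds the entries with a small helper that checks them against the keys and aggregations listed above. agg_spec and SUPPORTED_AGGS are hypothetical names, not part of geofileops, and the checks are stricter than anything the documentation promises (for example, rejecting “sep” for aggregations other than concat is an assumption).

```python
from typing import Any, Optional

# Aggregations listed in the parameter description above.
SUPPORTED_AGGS = {"count", "sum", "mean", "min", "max", "median", "concat"}


def agg_spec(
    column: str,
    agg: str,
    alias: str,
    distinct: bool = False,
    sep: Optional[str] = None,
) -> dict[str, Any]:
    """Build one entry for the "columns" list (hypothetical helper)."""
    if agg not in SUPPORTED_AGGS:
        raise ValueError(f"unsupported aggregation: {agg!r}")
    if alias.lower() == "fid":
        # The documentation advises against using "fid" as alias.
        raise ValueError('using "fid" as alias is not recommended')
    if sep is not None and agg != "concat":
        # Assumption of this sketch: "sep" is only meaningful for concat.
        raise ValueError('"sep" only applies to the concat aggregation')

    spec: dict[str, Any] = {"column": column, "agg": agg, "as": alias}
    if distinct:
        spec["distinct"] = True
    if sep is not None:
        spec["sep"] = sep
    return spec


# The resulting dict can be passed as the agg_columns argument.
agg_columns = {
    "columns": [
        agg_spec("crop", "max", "crop_max"),
        agg_spec("crop", "concat", "crop_concat", distinct=True, sep=";"),
        agg_spec("area", "mean", "area_mean"),
    ]
}
```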
tiles_path (PathLike, optional) – a path to a geofile containing tiles. If specified, the output will be dissolved/unioned only within the tiles provided. Can be used to avoid huge geometries being created if the input geometries are very interconnected. Defaults to None (= the output is not tiled).
nb_squarish_tiles (int, optional) – the approximate number of tiles the output should be dissolved/unioned to. If > 1, a tiling grid is automatically created based on the total bounds of the input file. The input geometries will be dissolved/unioned only within the tiles generated. Can be used to avoid huge geometries being created if the input geometries are very interconnected. Defaults to 1 (= the output is not tiled).
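To give an idea of what a grid of roughly square tiles over the total bounds could look like, here is a minimal sketch in plain Python. It is not the algorithm geofileops uses: squarish_grid is a hypothetical function, and the way the number of columns and rows is chosen is an assumption made for illustration only.

```python
import math

Bounds = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def squarish_grid(bounds: Bounds, nb_tiles: int) -> list[Bounds]:
    """Split bounds into approximately nb_tiles tiles that are close to square."""
    xmin, ymin, xmax, ymax = bounds
    width, height = xmax - xmin, ymax - ymin
    if width <= 0 or height <= 0 or nb_tiles < 1:
        raise ValueError("bounds must have a positive area and nb_tiles must be >= 1")

    # Pick the column count so that tile width and tile height end up similar.
    cols = max(1, round(math.sqrt(nb_tiles * width / height)))
    rows = max(1, round(nb_tiles / cols))
    tile_w, tile_h = width / cols, height / rows

    # Tiles are listed row by row, starting at the lower-left corner.
    return [
        (
            xmin + c * tile_w,
            ymin + r * tile_h,
            xmin + (c + 1) * tile_w,
            ymin + (r + 1) * tile_h,
        )
        for r in range(rows)
        for c in range(cols)
    ]


tiles = squarish_grid((0.0, 0.0, 200.0, 100.0), nb_tiles=8)
```

The number of tiles produced is only approximately the number requested, which matches the “approximate” wording of the parameter description.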
input_layer (str, optional) – input layer name. Optional if the file only contains one layer.
output_layer (str, optional) – the output layer name. Defaults to None.
gridsize (float, optional) – the size of the grid the coordinates of the output will be rounded to. E.g. 0.001 to keep 3 decimals. Value 0.0 doesn’t change the precision. Defaults to 0.0.
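The effect of gridsize on a single coordinate can be pictured as snapping it to the nearest multiple of the grid size. The sketch below only illustrates that arithmetic. snap_to_grid is a hypothetical function and the coordinates are made up. The library applies the precision to whole geometries, which involves more than rounding individual values.

```python
def snap_to_grid(value: float, gridsize: float) -> float:
    """Round a coordinate to the nearest multiple of gridsize."""
    if gridsize == 0.0:
        # A gridsize of 0.0 leaves the precision unchanged.
        return value
    return round(value / gridsize) * gridsize


# Hypothetical coordinate pair with more decimals than needed.
point = (155123.45678, 204987.65432)

# With gridsize=0.001, roughly 3 decimals are kept.
snapped = tuple(snap_to_grid(coord, 0.001) for coord in point)
unchanged = tuple(snap_to_grid(coord, 0.0) for coord in point)
```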
where_post (str, optional) – SQL filter to apply after all other processing, including e.g. explodecollections. It should be in SQLite syntax and SpatiaLite reference functions can be used. Defaults to None.
nb_parallel (int, optional) – the number of parallel processes to use. Defaults to -1: use all available CPUs.
batchsize (int, optional) – indicative number of rows to process per batch. A smaller batch size, possibly in combination with a smaller nb_parallel, will reduce the memory usage. Defaults to -1: (try to) determine optimal size automatically.
force (bool, optional) – overwrite existing output file(s). Defaults to False.