geofileops.delete_duplicate_geometries

geofileops.delete_duplicate_geometries(input_path: str | os.PathLike[Any], output_path: str | os.PathLike[Any], input_layer: str | None = None, output_layer: str | None = None, columns: list[str] | None = None, priority_column: str | None = None, priority_ascending: bool = True, explodecollections: bool = False, keep_empty_geoms: bool = False, where_post: str | None = None, nb_parallel: int | None = None, batchsize: int = -1, force: bool = False) -> None

Copy all rows to the output file, but keep only one row for each set of rows with duplicate geometries.

The check for duplicates is done using ST_Equals. ST_Equals is True if the given geometries are "topologically equal": the geometries have the same dimension and their point-sets occupy the same space. As a result, the order of vertices may differ, the starting points of rings may differ, and polygons may contain extra points as long as these don't change the surface occupied.
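To get a feel for what "topologically equal" means, the snippet below compares a few polygons with shapely's `equals`, which, like SpatiaLite's ST_Equals, is backed by the GEOS equality predicate. This is only an illustration outside geofileops and assumes shapely is installed; `equals_exact` is included for contrast because it compares coordinates vertex by vertex instead.

```python
from shapely.geometry import Polygon

# Reference unit square, counter-clockwise, ring starting at (0, 0).
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

# Same square, but the ring starts at a different vertex.
shifted_start = Polygon([(1, 1), (0, 1), (0, 0), (1, 0)])

# Same square, vertices listed in clockwise order.
reversed_order = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])

# Same square with an extra vertex on the bottom edge: the surface is unchanged.
extra_vertex = Polygon([(0, 0), (0.5, 0), (1, 0), (1, 1), (0, 1)])

# A genuinely different polygon.
larger = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])

# The square's boundary: same coordinates, but a lower dimension (line vs. area).
boundary = square.exterior

results = {
    "shifted_start": square.equals(shifted_start),
    "reversed_order": square.equals(reversed_order),
    "extra_vertex": square.equals(extra_vertex),
    "larger": square.equals(larger),
    "boundary": square.equals(boundary),
    # Vertex-by-vertex comparison does treat a shifted start as different.
    "shifted_start_exact": square.equals_exact(shifted_start, 0.0),
}
print(results)
```

Only the first three comparisons count as duplicates under topological equality; the last three do not.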

If a priority_column is specified, then for each set of duplicate geometries the row with the lowest value in this column is retained. If priority_ascending is False, the row with the highest value is retained instead.
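The selection rule can be sketched in plain Python as below. This is not how geofileops implements it (the actual work happens in SQL); the sketch assumes each row already carries a hypothetical hashable `geom_key` that is identical for topologically equal geometries, and it keeps the first row encountered when priorities tie, which is also an assumption.

```python
from typing import Hashable, Iterable, List, NamedTuple


class Row(NamedTuple):
    fid: int
    geom_key: Hashable  # hypothetical: same value for topologically equal geometries
    priority: float


def keep_one_per_geometry(rows: Iterable[Row], priority_ascending: bool = True) -> List[Row]:
    """Keep one row per geometry group, chosen by its priority value."""
    kept: dict = {}
    for row in rows:
        current = kept.get(row.geom_key)
        if current is None:
            kept[row.geom_key] = row
            continue
        # Ascending: a lower priority value wins. Descending: a higher one wins.
        # On a tie the row seen first is kept (assumption for this sketch).
        if priority_ascending:
            better = row.priority < current.priority
        else:
            better = row.priority > current.priority
        if better:
            kept[row.geom_key] = row
    return sorted(kept.values(), key=lambda r: r.fid)


rows = [Row(1, "square", 5), Row(2, "square", 3), Row(3, "triangle", 7), Row(4, "square", 9)]
lowest = [r.fid for r in keep_one_per_geometry(rows)]
highest = [r.fid for r in keep_one_per_geometry(rows, priority_ascending=False)]
print(lowest, highest)
```

With the default ascending order the three "square" rows collapse to fid 2 (priority 3); with `priority_ascending=False` fid 4 (priority 9) survives instead. The "triangle" row has no duplicate, so it is kept either way.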

If explodecollections is False and both the input and output files are GeoPackages, the fid is preserved. In other cases it typically is not.

Parameters:
  • input_path (PathLike) – the input file

  • output_path (PathLike) – the file to write the result to

  • input_layer (str, optional) – input layer name. If None, input_path should contain only one layer. Defaults to None.

  • output_layer (str, optional) – output layer name. If None, the output_path stem is used. Defaults to None.

  • columns (List[str], optional) – list of columns to retain. If None, all standard columns are retained. In addition to standard columns, it is also possible to specify "fid", a unique index available in all input files. Note that the "fid" will be aliased, e.g. to "fid_1". Defaults to None.

  • priority_column (str, optional) – column to use as priority for keeping rows. Defaults to None.

  • priority_ascending (bool, optional) – True to keep the row with the lowest priority value. Defaults to True.

  • explodecollections (bool, optional) – True to output only simple geometries. Defaults to False.

  • keep_empty_geoms (bool, optional) – True to keep rows with empty/null geometries in the output. Defaults to False.

  • where_post (str, optional) – SQL filter to apply after all other processing, including e.g. explodecollections. It should be in SQLite syntax, and SpatiaLite functions can be used. Defaults to None.

  • nb_parallel (int | None, optional) – the number of parallel workers to use. If None, the preference set in the nb_parallel configuration option is used, which defaults to the number of CPU cores available. For more information, see options.set_nb_parallel(). Defaults to None.

  • batchsize (int, optional) – indicative number of rows to process per batch. A smaller batch size, possibly combined with a smaller nb_parallel, reduces memory usage. Defaults to -1, which means an optimal batch size is determined automatically where possible.

  • force (bool, optional) – overwrite existing output file(s). Defaults to False.