geofileops.delete_duplicate_geometries
geofileops.delete_duplicate_geometries(input_path: str | os.PathLike[Any], output_path: str | os.PathLike[Any], input_layer: str | None = None, output_layer: str | None = None, columns: list[str] | None = None, priority_column: str | None = None, priority_ascending: bool = True, explodecollections: bool = False, keep_empty_geoms: bool = False, where_post: str | None = None, nb_parallel: int | None = None, batchsize: int = -1, force: bool = False) -> None
Copy all rows to the output file, except for duplicate geometries.
The check for duplicates is done using ST_Equals. ST_Equals is True if the given geometries are “topologically equal”: the geometries have the same dimension and their point-sets occupy the same space. For example, the order of vertices may differ, rings may start at different points, and polygons may contain extra points as long as these don’t change the surface occupied.

If a priority_column is specified, the row with the lowest value in this column is retained. If priority_ascending is False, the row with the highest value is retained.

If explodecollections is False and both the input and output file types are GeoPackage, the fid will be preserved. In other cases, the fid will typically not be preserved.

Parameters:
input_path (PathLike) – the input file
output_path (PathLike) – the file to write the result to
input_layer (str, optional) – input layer name. If None, input_path should contain only one layer. Defaults to None.
output_layer (str, optional) – output layer name. If None, the output_path stem is used. Defaults to None.
columns (List[str], optional) – list of columns to retain. If None, all standard columns are retained. In addition to standard columns, it is also possible to specify “fid”, a unique index available in all input files. Note that the “fid” will be aliased, e.g. to “fid_1”. Defaults to None.
priority_column (str, optional) – column to use as priority for keeping rows. Defaults to None.
priority_ascending (bool, optional) – True to keep the row with the lowest priority value, False to keep the row with the highest. Defaults to True.
explodecollections (bool, optional) – True to output only simple geometries. Defaults to False.
keep_empty_geoms (bool, optional) – True to keep rows with empty/null geometries in the output. Defaults to False.
where_post (str, optional) – SQL filter to apply after all other processing, including e.g. explodecollections. It should be in sqlite syntax and spatialite reference functions can be used. Defaults to None.
nb_parallel (int | None, optional) – the number of parallel workers to use. If None, the preference set in the nb_parallel configuration option is used, which defaults to the number of CPU cores available. For more information, see options.set_nb_parallel(). Defaults to None.
batchsize (int, optional) – indicative number of rows to process per batch. A smaller batch size, possibly in combination with a smaller nb_parallel, will reduce the memory usage. Defaults to -1: (try to) determine optimal size automatically.
force (bool, optional) – overwrite existing output file(s). Defaults to False.
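
The retention rule described above (one row per group of equal geometries, chosen by priority_column and priority_ascending) can be illustrated with a small pure-Python sketch. This is not how geofileops implements the operation. The helper names ring_key and drop_duplicates, the "ring" and "prio" keys, and the sample rows are hypothetical. The equality test is a simplification of ST_Equals: it only treats simple rings as equal when they differ in starting vertex or direction, and it does not ignore redundant extra points. Keeping the first row encountered when no priority column is given is also an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional, Tuple

Point = Tuple[float, float]


def ring_key(ring: List[Point]) -> Tuple[Point, ...]:
    """Canonical form of a simple closed ring (hypothetical helper).

    Two rings get the same key if they differ only in starting vertex
    or in direction. Unlike ST_Equals, redundant vertices are NOT ignored.
    """
    # Drop the closing vertex if the ring repeats its first point at the end.
    pts = list(ring[:-1]) if len(ring) > 1 and ring[0] == ring[-1] else list(ring)
    if not pts:
        return ()
    candidates = []
    for seq in (pts, pts[::-1]):
        # Rotate so the lexicographically smallest vertex comes first.
        start = seq.index(min(seq))
        candidates.append(tuple(seq[start:] + seq[:start]))
    # Pick one of the two directions deterministically.
    return min(candidates)


def drop_duplicates(
    rows: List[Dict[str, Any]],
    priority_column: Optional[str] = None,
    priority_ascending: bool = True,
) -> List[Dict[str, Any]]:
    """Keep one row per distinct ring, chosen by priority (illustration only)."""
    kept: Dict[Tuple[Point, ...], Dict[str, Any]] = {}
    for row in rows:
        key = ring_key(row["ring"])
        current = kept.get(key)
        if current is None:
            # First row seen for this geometry.
            kept[key] = row
        elif priority_column is not None:
            if priority_ascending:
                better = row[priority_column] < current[priority_column]
            else:
                better = row[priority_column] > current[priority_column]
            if better:
                kept[key] = row
    return list(kept.values())


rows = [
    {"id": 1, "prio": 5, "ring": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]},
    # Same square, different starting vertex.
    {"id": 2, "prio": 2, "ring": [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]},
    # Same square, opposite direction.
    {"id": 3, "prio": 9, "ring": [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]},
    # A different geometry.
    {"id": 4, "prio": 1, "ring": [(0, 0), (2, 0), (2, 2), (0, 0)]},
]

lowest = drop_duplicates(rows, priority_column="prio")
highest = drop_duplicates(rows, priority_column="prio", priority_ascending=False)
print([r["id"] for r in lowest])   # [2, 4]
print([r["id"] for r in highest])  # [3, 4]
```

With the sample rows, the three squares form one duplicate group: the default settings retain the row with the lowest "prio" (id 2), and priority_ascending=False retains the row with the highest (id 3). The triangle is never a duplicate, so it is always kept.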