User guide#
The main objective of geofileops is to provide a simple-to-use but powerful API for fast spatial operations on large vector GIS files.
General#
To speed up processing, geofileops uses multiprocessing under the hood. Because of that, you should always use the if __name__ == "__main__": block in standalone Python scripts. More information: FAQ - Standalone scripts.
Because processing large files can take some time, geofileops logs progress info using the standard logging module.
Combining both, a basic script using geofileops can look like this:
import logging

import geofileops as gfo

if __name__ == "__main__":
    # Show the progress messages that geofileops logs.
    logging.basicConfig(level=logging.INFO)
    gfo.buffer(input_path="input.gpkg", output_path="output.gpkg", distance=2)
Finally, most general file/layer operations can be used on any file format supported by GDAL. The spatial tools only support GeoPackage and shapefile input, and GeoPackage is strongly recommended for many reasons.
Geometry tools#
The typical geometry tools are directly supported, e.g. buffer(), simplify(), convexhull(), dissolve(), …
gfo.simplify(input_path="...", output_path="...", algorithm="vw", tolerance=1)
Some more exotic ones are e.g. dissolve_within_distance() and warp().
For more advanced uses, you can execute any SQLite SQL statement on an input file using select(). Because spatialite functions are also supported, this is quite powerful. To simplify the SQL statements, there are some placeholders you can use that will be filled out by geofileops:
city = "Brussels"
# Doubled braces survive the f-string as placeholders for geofileops to fill in.
sql_stmt = f"""
    SELECT ST_OrientedEnvelope({{geometrycolumn}}) AS geom
          {{columns_to_select_str}}
      FROM "{{input_layer}}" layer
     WHERE city_name = '{city}'
"""
gfo.select(
    input_path="...",
    output_path="...",
    columns=["city_name", "city_code"],
    sql_stmt=sql_stmt,
)
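The doubled braces in the statement above are easy to misread, so here is a minimal sketch of the two substitution stages. The first stage is plain Python f-string behaviour. The second stage is only simulated here with str.format(), which is an assumption about how the placeholders get filled; the substituted values ("layer.geom", the column list, "cities") are hypothetical.

```python
city = "Brussels"

# Stage 1: the f-string fills in {city} immediately. Doubled braces are
# escapes, so {{geometrycolumn}} is kept as the literal text {geometrycolumn}.
sql_stmt = f"""
    SELECT ST_OrientedEnvelope({{geometrycolumn}}) AS geom
          {{columns_to_select_str}}
      FROM "{{input_layer}}" layer
     WHERE city_name = '{city}'
"""

# Stage 2 (simulated, hypothetical values): the remaining placeholders are
# replaced later, here with str.format() purely for illustration.
filled = sql_stmt.format(
    geometrycolumn="layer.geom",
    columns_to_select_str=', layer."city_name", layer."city_code"',
    input_layer="cities",
)
print(filled)
```

If the braces were not doubled, Python would try to resolve geometrycolumn as a local variable while building the f-string and raise a NameError.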
Finally, you can apply any Python function on the geometry column using apply().
import pygeoops

def cleanup(geom, min_area_to_keep):
    # Remove the inner rings (holes) smaller than min_area_to_keep.
    new_geom = pygeoops.remove_inner_rings(geom, min_area_to_keep=min_area_to_keep)
    return new_geom

gfo.apply(
    input_path="...",
    output_path="...",
    func=lambda geom: cleanup(geom, min_area_to_keep=1),
)
Most functions in geofileops have some similar optional parameters. These are the most interesting ones:
columns: if not specified, all standard attribute columns will be retained in the output file. If you don’t need all columns, specify the ones you want to keep. You can retain a copy of the special “fid” column in the output file by specifying “fid” in addition to the standard attribute columns you want to retain.
explodecollections: the output features will be “exploded”, so multipart features will be converted to single parts.
gridsize: the size of the grid the coordinates of the output will be rounded to, e.g. 0.001 to keep 3 decimals. If e.g. a polygon is narrower than the gridsize, it will be removed. The default value, 0.0, doesn’t change the precision.
force: by default, if the output_path already exists, geofileops will just log this and return. To overwrite the existing output_path, specify force=True.
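To make the effect of gridsize more concrete, the following is a simplified pure-Python sketch of snapping coordinates to a grid. It is an illustration only, not how geofileops implements precision handling, and the helper names (snap, snap_ring, ring_area) are hypothetical. It shows why a polygon narrower than the gridsize can disappear: its corners collapse onto the same grid line and its area becomes zero.

```python
def snap(value, gridsize):
    """Round a coordinate to the nearest multiple of gridsize (0.0 = unchanged)."""
    if gridsize == 0.0:
        return value
    return round(value / gridsize) * gridsize


def snap_ring(ring, gridsize):
    """Snap every (x, y) vertex of a closed ring to the grid."""
    return [(snap(x, gridsize), snap(y, gridsize)) for x, y in ring]


def ring_area(ring):
    """Area of a closed ring (first vertex == last vertex), shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A sliver polygon that is 0.0004 wide and 1.0 high.
sliver = [(0.0, 0.0), (0.0004, 0.0), (0.0004, 1.0), (0.0, 1.0), (0.0, 0.0)]

print(ring_area(sliver))                    # area before snapping
print(ring_area(snap_ring(sliver, 0.001)))  # area after snapping to a 0.001 grid
```

With a gridsize of 0.001, both x values of the sliver snap to 0.0, so nothing with a surface is left to write to the output.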
Spatial overlays#
The standard spatial overlays are available: intersection(), erase(), clip(), identity(), union(), …
An example:
gfo.identity(input1_path="...", input2_path="...", output_path="...")
In addition, if you specify input2_path=None, the result will be the self-overlay of the 1st input layer. E.g. for intersection this will result in an output with all pairwise intersections between the features in this layer. The intersection of a feature with itself is omitted.
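As a rough illustration of what a self-intersection computes, here is a pure-Python sketch that uses axis-aligned boxes instead of real geometries. The helper names are hypothetical. Whether each pair is reported once or in both orders is not specified above, so the sketch assumes each unordered pair is listed once.

```python
from itertools import combinations


def box_intersection(a, b):
    """Return the overlap of two (xmin, ymin, xmax, ymax) boxes, or None."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin < xmax and ymin < ymax:
        return (xmin, ymin, xmax, ymax)
    return None


def self_intersection(boxes):
    """Intersect every box with every other box in the same list.

    combinations() never pairs a box with itself, which mirrors the rule
    that the intersection of a feature with itself is omitted.
    """
    results = []
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        overlap = box_intersection(a, b)
        if overlap is not None:
            results.append((i, j, overlap))
    return results


layer = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
print(self_intersection(layer))
```

Only the first two boxes overlap, so the result contains a single entry for that pair; the third box produces no output at all.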
Spatial joins#
There are several options available to do spatial joins.
The most typical one is join_by_location(). This allows you to join the features in two layers with either “named spatial predicates” (e.g. equals, touches, intersects, …) or with a “spatial mask” as defined by the DE-9IM model.
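For readers unfamiliar with DE-9IM masks, the following is a minimal sketch of how a 9-character intersection matrix is tested against a mask pattern. It follows the general DE-9IM convention (T = non-empty, F = empty, * = anything, 0/1/2 = exact dimension) and is not taken from geofileops; the function name matches_mask is hypothetical.

```python
def matches_mask(matrix, mask):
    """Check a DE-9IM matrix such as "212101212" against a mask such as "T*T***T**"."""
    if len(matrix) != 9 or len(mask) != 9:
        raise ValueError("matrix and mask must both be 9 characters long")
    for cell, want in zip(matrix.upper(), mask.upper()):
        if want == "*":
            continue  # any value is accepted
        if want == "T":
            if cell == "F":  # T requires a non-empty intersection (0, 1 or 2)
                return False
        elif want == "F":
            if cell != "F":  # F requires an empty intersection
                return False
        elif cell != want:  # 0, 1 or 2 require that exact dimension
            return False
    return True


overlapping = "212101212"  # two partially overlapping polygons
contained = "2FF1FF212"    # a polygon lying within another polygon

print(matches_mask(overlapping, "T*T***T**"))  # mask for "overlaps" (area/area)
print(matches_mask(contained, "T*F**F***"))    # mask for "within"
print(matches_mask(overlapping, "T*F**F***"))  # overlapping is not "within"
```

A named predicate is essentially shorthand for one or more such masks, which is why a custom mask gives finer control than the predicate names.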
Another option is to look for the n nearest features for all features from one layer compared to all features from the second layer using join_nearest().
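Conceptually, an n-nearest join can be sketched as below for plain 2D points, using a brute-force search. This is only an illustration of the idea, under the assumption of Euclidean distance between points; it is not how join_nearest() is implemented, and the names n_nearest, shops and stations are hypothetical.

```python
import heapq
import math


def n_nearest(points1, points2, n):
    """For each point in points1, list the n closest points in points2.

    Returns {index_in_points1: [(index_in_points2, distance), ...]},
    with the closest match first.
    """
    result = {}
    for i, p in enumerate(points1):
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points2))
        result[i] = [(j, d) for d, j in heapq.nsmallest(n, candidates)]
    return result


shops = [(0.0, 0.0), (10.0, 0.0)]
stations = [(1.0, 0.0), (4.0, 0.0), (9.0, 0.0)]

nearest = n_nearest(shops, stations, n=2)
print(nearest)
```

Every feature of the first layer gets up to n matches, so the output can contain up to n rows per input feature.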
If you only want to export rows from a layer that have some spatial relationship with features in another layer, you can use export_by_location() or export_by_distance().
Finally, if you want full control, you can use SQL statements to build your own overlay and/or join logic. Check out the examples for select_two_layers() to get some inspiration.
General file/layer operations#
Finally, there are also some general functions available to manipulate geo files or layers, e.g. copy(), move(), get_layerinfo(), add_column(), …
This is an example to get information like the number of features, the columns, … of a layer. If there is only one layer in the file, the layer doesn’t need to be specified:
layerinfo = gfo.get_layerinfo(path="...")
print(f"Layer {layerinfo.name} contains {layerinfo.featurecount} features")