User guide#
The main objective of geofileops is to provide a simple to use but powerful API to do fast spatial operations on large vector GIS files.
General#
To speed up processing, geofileops uses multiprocessing under the hood. Because of that,
you should always use the if __name__ == "__main__": block in standalone Python
scripts. More information: FAQ - Standalone scripts.
Also interesting to know: because processing large files can take some time, geofileops logs progress info using the standard logging module.
Combining both, a basic script using geofileops can look like this:
import logging
import geofileops as gfo
if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
gfo.buffer(input_path="input.gpkg", output_path="output.gpkg", distance=2)
Finally, for the spatial tools, only geopackage (“.gpkg”, “.gpkg.zip”) and shapefile (“.shp”, “.shp.zip”) files are supported as input and output formats. However, (unzipped) geopackage files (“.gpkg”) are very recommended because they will lead to significantly better performance because no conversions will be needed under the hood. There are also many other reasons to use Geopackage files.
Most general file/layer operations like get_layerinfo(), read_file(),…
can be used on any file format supported by GDAL as well as on paths using
GDAL VSI handlers.
Geometry tools#
The typical geometry tools are directly
supported, eg. buffer(), simplify(), convexhull(),
dissolve(), …
gfo.simplify(
input_path="input.gpkg", output_path="output.gpkg", algorythm="vw", tolerance=1
)
Some more exotic ones are e.g. dissolve_within_distance() and warp().
For more advanced uses, you can execute any sqlite SQL statement on an input file using
select(). Because
spatialite functions
are also supported, this is quite powerful. To simplify the SQL statements, there are
some placeholders you can use that will be filled out by geofileops:
# When evaluating an f-string, Python replaces double curly braces with a single
# curly brace. Hence, e.g. "{{input_layer}}" becomes "{input_layer}" so it is ready
# to be filled out by geofileops.
city = "Brussels"
sql_stmt = f"""
SELECT ST_OrientedEnvelope({{geometrycolumn}}) AS geom
{{columns_to_select_str}}
FROM "{{input_layer}}" layer
WHERE city_name = '{city}'
"""
gfo.select(
input_path="input.gpkg",
output_path="output.gpkg",
columns=["city_name", "city_code"],
sql_stmt=sql_stmt,
)
Finally, you can apply any python function on the geometry column using apply().
import pygeoops
def cleanup(geom, min_area_to_keep):
new_geom = pygeoops.remove_inner_rings(geom, min_area_to_keep=min_area_to_keep)
return new_geom
gfo.apply(
input_path="input.gpkg",
output_path="output.gpkg",
func=lambda geom: cleanup(geom, min_area_to_keep=1),
)
Most functions in geofileops have some similar optional parameters. These are the most interesting ones:
columns: if not specified, all standard attribute columns will be retained in the output file. If you don’t need all columns, specify the ones you want to keep. You can retain a copy of the special “fid” column in the output file by specifing “fid” in addition to the standard attribute columns you want to retain.
explodecollections: the output features will be “exploded”, so multipart features will be converted to single parts.
gridsize: the size of the grid the coordinates of the ouput will be rounded to. Eg. 0.001 to keep 3 decimals. If eg. a polygon is narrower than the
gridsize, it will be removed. Value 0.0, the default, doesn’t change the precision.force: by default, if the
output_pathalready exists, geofileops will just log this and return. To overwrite the existingoutput_path, specifyforce=True.
Spatial overlays#
The standard spatial overlays are
available: intersection(), difference(), clip(), identity(),
union(), …
An example:
gfo.identity(
input1_path="input1.gpkg", input2_path="input2.gpkg", output_path="output.gpkg"
)
In addition, if you specify input2_path=None, the result will be the self-overlay of
the 1st input layer. E.g. for intersection this will result in an output with all
pairwise intersections between the features in this layer. The intersection of features
with itself is omitted.
For all spatial overlays, sliver polygons are removed from the output by default
starting from geofileops 0.11.0.
Polygons are considered slivers if they are narrower than a certain tolerance. By
default tolerance is 0.001 CRS units if the CRS of the input layers is a projected CRS,
1e-7 if it is a geographic CRS. More information + information how to change this
default tolerance can be found here:
options.set_sliver_tolerance.
Spatial joins#
There are several options available to do spatial joins.
The most typical one is join_by_location(). This allows you to join the features
in two layers with either “named spatial predicates” (e.g. equals, touches,
intersects, …) or with a “spatial mask” as defined by the
DE-9IM model.
Another option is to look for the n nearest features for all features from one layer
compared to all features from the second layer using join_nearest().
If you only want to export rows from a layer that have some spatial relationship with
features in another layer you can use export_by_location() or
export_by_distance().
Finally, if you want full control, you can use SQL statements to build your own overlay
and/or join logic. Check out the examples for select_two_layers() to get some
inspiration.
General file/layer operations#
Finally there are also some general functions
available to manipulate geo files or layers. Eg. copy(), move(),
get_layerinfo(), add_column(), …
A common need is to get general information about a layer, like the number of features, the columns,… , without reading the entire file. If there is only one layer in the file, the layer doesn’t need to be specified:
layerinfo = gfo.get_layerinfo("file.gpkg")
print(f"Layer {layerinfo.name} contains {layerinfo.featurecount} features")
You can also add, update, … columns directly to a file layer using
add_column(), update_column(), … The values can be filled out using a
SQL expression. The SQLexpression should use the SQLite dialect and you can also use
spatialite function
(spatialite functions).
A typical example is to add a column with the area of the geometry to the layer.
This uses the ST_Area spatialite function. Note that such an “area” column won’t be
updated automatically if you change the geometry of the features in the layer, e.g. by
applying a buffer. You can use update_column() to update the area if needed after
such operations:
gfo.add_column("file.gpkg", name="area", type="REAL", expression="ST_Area(geom)")
gfo.update_column("file.gpkg", name="area", expression="ST_Area(geom)")
For string/text type columns, note that in SQL it is mandatory to use single quotes around the string value. For example:
gfo.add_column("file.gpkg", type="TEXT", name="text_column", expression="'Hello!'")
Geofileops also has some functions to read and write GeoDataFrames. They are very similar to the geopandas equivalents with some small differences. For example, you can use some placeholders in the SQL statements that will be filled out by geofileops to make it easier to reuse statements:
sql_stmt = """
SELECT {geometrycolumn}
{columns_to_select_str}
,ST_Area({geometrycolumn}) AS area
FROM "{input_layer}" layer
WHERE ST_Area({geometrycolumn}) > 100
"""
gdf = gfo.read_file(
path="file.gpkg", sql_stmt=sql_stmt, columns=["city_name", "city_code"]
)
gfo.write_file(gdf, path="file2.gpkg")
Because you can use any SQL statement, you can also easily run some statistics on a dataset like this:
sql_stmt = """
SELECT city_code
,city_name
,COUNT(*) AS count
,SUM(ST_Area({geometrycolumn})) AS total_area
FROM "{input_layer}" layer
GROUP BY city_code, city_name
"""
stats_df = gfo.read_file(path="file.gpkg", sql_stmt=sql_stmt)
stats_df.to_excel("stats.xlsx", index=False)
Runtime Options#
Geofileops has several runtime options that can be used to tune its behavior. These can
be set using the geofileops.options module.
Some options can help with debugging, like disabling the removal of temporary files, others can help improve performance depending on the use case, like setting the type of workers to use.
Information about all runtime options and how to use them can be found here: Runtime Options.