geofileops.join_by_location

geofileops.join_by_location(input1_path: str | os.PathLike[Any], input2_path: str | os.PathLike[Any], output_path: str | os.PathLike[Any], spatial_relations_query: str = 'intersects is True', discard_nonmatching: bool = True, min_area_intersect: float | None = None, area_inters_column_name: str | None = None, input1_layer: str | None = None, input1_columns: list[str] | None = None, input1_columns_prefix: str = 'l1_', input2_layer: str | None = None, input2_columns: list[str] | None = None, input2_columns_prefix: str = 'l2_', output_layer: str | None = None, gridsize: float = 0.0, where_post: str | None = None, nb_parallel: int | None = None, batchsize: int = -1, force: bool = False) → None

Join two layers based on the spatial relationship between the geometries.

The output will contain the geometries of input1. The spatial_relations_query and min_area_intersect parameters will determine which geometries of input1 will be matched with input2.

The spatial_relations_query has a specific format. Most cases can be covered using the following “named spatial predicates”: contains, coveredby, covers, crosses, equals, intersects, overlaps, touches and within. If you want even more control, you can also use “spatial masks” as defined by the DE-9IM model. It is important to note that the query is used as the matching criterion for the join. Hence, it should not evaluate to True for disjoint features, as this would lead to a Cartesian product of both layers. If it does, a warning is triggered and “intersects is True” is added to the query.

Some examples of valid spatial_relations_query values:

“intersects is True and touches is False”

“within is True or contains is True”

“(T*T***T** is True or 1*T***T** is True) and T*****FF* is False”
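To make the spatial masks in the last example more concrete, here is a minimal sketch of standard DE-9IM pattern matching in plain Python. It is an illustration of the DE-9IM model only, not of how geofileops evaluates the query internally; the helper name matches_mask and the sample matrices are assumptions made for this example.

```python
def matches_mask(matrix: str, mask: str) -> bool:
    """Check a 9-character DE-9IM matrix against a 9-character mask.

    Hypothetical helper for illustration; not part of geofileops.
    Mask symbols: 'T' = any non-empty intersection (0, 1 or 2),
    'F' = empty intersection, '*' = don't care, '0'/'1'/'2' = exact dimension.
    """
    if len(matrix) != 9 or len(mask) != 9:
        raise ValueError("matrix and mask must both be 9 characters long")
    for cell, pattern in zip(matrix.upper(), mask.upper()):
        if pattern == "*":
            continue
        if pattern == "T":
            if cell not in {"0", "1", "2"}:
                return False
        elif cell != pattern:
            return False
    return True


# Assumed sample matrices: two partially overlapping polygons, and a
# polygon that strictly contains another one.
overlapping = "212101212"
containing = "212FF1FF2"


def example_query(matrix: str) -> bool:
    """Evaluate the third example query above for one DE-9IM matrix."""
    return (
        matches_mask(matrix, "T*T***T**") or matches_mask(matrix, "1*T***T**")
    ) and not matches_mask(matrix, "T*****FF*")


print(example_query(overlapping))
print(example_query(containing))
```

In this sketch the overlapping pair satisfies the query, while the containing pair is rejected because it does not match either of the first two masks.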

Alternative names:

GeoPandas: sjoin
ArcGIS: spatial join
QGIS: join attributes by location

Parameters:

input1_path (PathLike) – the 1st input file
input2_path (PathLike) – the 2nd input file
output_path (PathLike) – the file to write the result to
spatial_relations_query (str, optional) – a query that specifies the spatial relations to match between the 2 layers. Defaults to “intersects is True”.
discard_nonmatching (bool, optional) – True to only keep rows that match with the spatial_relations_query. False to keep all rows in the input1_layer (=left outer join). Defaults to True (=inner join).
min_area_intersect (float, optional) – minimum area of the intersection to match. Defaults to None.
area_inters_column_name (str, optional) – column name of the intersect area. If None no area column is added. Defaults to None.
input1_layer (str, optional) – 1st input layer name. If None, input1_path should contain only one layer. Defaults to None.
input1_columns (List[str], optional) – list of columns to retain. If None, all standard columns are retained. In addition to standard columns, it is also possible to specify “fid”, a unique index available in all input files. Note that the “fid” will be aliased even if input1_columns_prefix is “”, e.g. to “fid_1”. Defaults to None.
input1_columns_prefix (str, optional) – prefix to use in the column aliases. Defaults to “l1_”.
input2_layer (str, optional) – 2nd input layer name. If None, input2_path should contain only one layer. Defaults to None.
input2_columns (List[str], optional) – columns to select. If None is specified, all columns are selected. As explained for input1_columns, it is also possible to specify “fid”. Defaults to None.
input2_columns_prefix (str, optional) – prefix to use in the column aliases. Defaults to “l2_”.
output_layer (str, optional) – output layer name. If None, the output_path stem is used. Defaults to None.
gridsize (float, optional) – the size of the grid the coordinates of the output will be rounded to. E.g. 0.001 to keep 3 decimals. Value 0.0 doesn’t change the precision. Defaults to 0.0.
where_post (str, optional) – SQL filter to apply after all other processing, including e.g. explodecollections. It should be in sqlite syntax and spatialite reference functions can be used. Defaults to None.
nb_parallel (int | None, optional) – the number of parallel workers to use. If None, the preference set in the nb_parallel configuration option is used, which defaults to the number of CPU cores available. For more information, see options.set_nb_parallel(). Defaults to None.
batchsize (int, optional) – indicative number of rows to process per batch. A smaller batch size, possibly in combination with a smaller nb_parallel, will reduce the memory usage. Defaults to -1: (try to) determine optimal size automatically.
force (bool, optional) – overwrite existing output file(s). Defaults to False.
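The parameters above can be combined as in the following sketch of a left outer join. The file names (parcels.gpkg, zones.gpkg, parcels_zones.gpkg), the column names and the helper functions are hypothetical and only serve the example; the keyword arguments are the ones documented above. geofileops is imported inside run_join so the argument-building part can be inspected without the library installed.

```python
from pathlib import Path
from typing import Any


def build_join_kwargs(data_dir: Path) -> dict[str, Any]:
    """Collect keyword arguments for join_by_location (hypothetical helper)."""
    return {
        # Hypothetical input and output files.
        "input1_path": data_dir / "parcels.gpkg",
        "input2_path": data_dir / "zones.gpkg",
        "output_path": data_dir / "parcels_zones.gpkg",
        # Match features that intersect but do not merely touch.
        "spatial_relations_query": "intersects is True and touches is False",
        # Keep all rows of input1, also those without a match (left outer join).
        "discard_nonmatching": False,
        # Add a column with the area of the intersection (assumed column name).
        "area_inters_column_name": "area_inters",
        # Hypothetical column names; "fid" is available in all input files.
        "input1_columns": ["fid", "parcel_id"],
        "input2_columns": ["zone_name"],
        # Overwrite an existing output file.
        "force": True,
    }


def run_join(data_dir: Path) -> None:
    """Run the join; requires geofileops and the input files to exist."""
    import geofileops as gfo

    gfo.join_by_location(**build_join_kwargs(data_dir))


if __name__ == "__main__":
    run_join(Path("data"))
```

With the default prefixes, the retained columns in the output would be aliased with “l1_” and “l2_” respectively.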