geofileops.read_file

geofileops.read_file(path: str | PathLike[Any], layer: str | None = None, columns: Iterable[str] | None = None, bbox: tuple[float, float, float, float] | None = None, rows: slice | None = None, where: str | None = None, sql_stmt: str | None = None, sql_dialect: Literal['SQLITE', 'OGRSQL'] | None = None, ignore_geometry: bool = False, fid_as_index: bool = False, **kwargs: object) → GeoDataFrame

Reads a file into a geopandas GeoDataFrame.

The file format is detected based on the filepath extension.

If sql_stmt is specified, the SQL statement can contain the following placeholders, which are replaced automatically:

  • {geometrycolumn}: the column where the primary geometry is stored.

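  • {columns_to_select_str}: if columns is not None, those columns, otherwise all columns of the layer. When not empty, the string starts with a comma, so it can directly follow {geometrycolumn}, as in the example below.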

  • {input_layer}: the layer name of the input layer.

Example SQL statement with placeholders:

SELECT {geometrycolumn}
      {columns_to_select_str}
  FROM "{input_layer}" layer

The underlying library used to read the file can be chosen using the “GFO_IO_ENGINE” environment variable. Possible values are “pyogrio”, “pyogrio-arrow” and “fiona”. Support for the “fiona” engine will most likely be removed in the future. The default engine is “pyogrio-arrow”. Whether Arrow is used can be overridden per call by passing e.g. use_arrow=False as a parameter.

When a file with CURVE geometries is read, they are converted on the fly to LINEAR geometries, because shapely/geopandas don’t support CURVE geometries.

geofileops.read_file is very similar to geopandas.read_file/pyogrio.read_dataframe, but adds or changes some behaviour. Besides the differences mentioned above, notable differences are:

  • The default value of mixed_offsets_as_utc is False instead of True, to avoid losing time zone information when reading datetime columns with mixed offsets.

  • The columns parameter is case-insensitive, and columns are returned in the order and casing used in the columns parameter.

Parameters:
  • path (file path) – path to the file to read from. GDAL virtual file system paths (e.g. starting with /vsizip/ or /vsicurl/) are also supported.

  • layer (str, optional) – The layer to read. If None and the file contains only one layer, that layer is read; otherwise an error is raised. Defaults to None.

  • columns (Iterable[str], optional) – The (non-geometry) columns to read. They are returned in the order specified. If None, all standard columns are read. In addition to the standard columns, “fid” can also be specified: a unique index available in all input files. Note that the “fid” column will be aliased, e.g. to “fid_1”. Defaults to None.

  • bbox (Tuple, optional) – Return only features whose geometry intersects this bounding box, specified as (xmin, ymin, xmax, ymax). Defaults to None, in which case all rows are read.

  • rows (slice, optional) – Return only the rows specified. For many file formats (e.g. GeoPackage) this is slow, so using e.g. a where filter instead is recommended. Defaults to None, in which case all rows are returned.

  • where (str, optional) – WHERE clause to filter features in the layer by attribute values. If the data source natively supports SQL, its own SQL dialect should be used (e.g. “SQLITE” for SQLite and GeoPackage, or the PostgreSQL dialect for PostgreSQL). If it doesn’t, the OGRSQL WHERE syntax should be used. Note that the SQL dialect cannot be overridden here; this is only possible when using the sql_stmt parameter. Examples: "ISO_A3 = 'CAN'", "POP_EST > 10000000 AND POP_EST < 100000000". Defaults to None.

  • sql_stmt (str, optional) – SQL statement to use. Only supported with the “pyogrio” engine. Defaults to None.

  • sql_dialect (str, optional) – SQL dialect to use. Options are None, “SQLITE” or “OGRSQL”. If None, for data sources with explicit SQL support the statement is processed by the default SQL engine (e.g. “SQLITE” for GeoPackage and SpatiaLite). For data sources without native SQL support (e.g. .shp), the “OGRSQL” dialect is the default. If the “SQLITE” dialect is specified, SpatiaLite SQL functions can also be used. Defaults to None.

  • ignore_geometry (bool, optional) – If True, the geometry is not read or returned. Defaults to False.

  • fid_as_index (bool, optional) – If True, will use the FIDs of the features that were read as the index of the GeoDataFrame. May start at 0 or 1 depending on the driver. Defaults to False.

  • **kwargs – All additional parameters will be passed on to the io-engine used (“pyogrio” or “fiona”).

Raises:

ValueError – an invalid parameter value was passed.

Returns:

the data read.

Return type:

gpd.GeoDataFrame