sertit.files.get_archived_rio_path

get_archived_rio_path(archive_path: Union[str, cloudpathlib.cloudpath.CloudPath, pathlib.Path], file_regex: str, as_list: bool = False) → Union[list, cloudpathlib.cloudpath.CloudPath, pathlib.Path]

Get the path of a file stored inside an archive, in a form that rasterio can read:

  • zip+file://{zip_path}!{file_name}

  • tar+file://{tar_path}!{file_name}

See [here](https://rasterio.readthedocs.io/en/latest/topics/datasets.html?highlight=zip#dataset-identifiers) for more information.
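
As a minimal sketch of how these two identifier forms are put together, the snippet below formats a URI from an archive path and a member name. The helper name `build_archive_uri` and the choice of picking the scheme from the file extension are assumptions made for illustration; this is not sertit's implementation.

```python
from pathlib import PurePath


def build_archive_uri(archive_path: str, member_name: str) -> str:
    """Hypothetical helper: format a rasterio-style URI for a file inside an archive."""
    # Assumption: the scheme is chosen from the archive's extension
    suffix = PurePath(archive_path).suffix.lower()
    if suffix == ".zip":
        scheme = "zip"
    elif suffix == ".tar":
        scheme = "tar"
    else:
        raise ValueError(f"Unsupported archive type: {suffix!r}")

    # Same layout as the templates above: {scheme}+file://{archive_path}!{file_name}
    return f"{scheme}+file://{archive_path}!{member_name}"


print(build_archive_uri("/data/archive.zip", "dir/file_name.tif"))
# zip+file:///data/archive.zip!dir/file_name.tif
```

The resulting string is only an identifier: nothing is extracted from the archive, and it is rasterio (through GDAL) that resolves it when the dataset is opened.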

Warning

The returned path won't be readable by pandas, geopandas or xmltree!

Warning

If as_list is False, only the first matching file is returned!

You can use this [site](https://regexr.com/) to build your regex.

>>> arch_path = r'D:\path\to\zip.zip'  # Raw string, so backslashes are not read as escapes
>>> file_regex = '.*dir.*file_name'  # Use .* to match any run of characters
>>> path = get_archived_rio_path(arch_path, file_regex)
>>> path
'zip+file://D:\path\to\zip.zip!dir/file_name.tif'
>>> rasterio.open(path)
<open DatasetReader name='zip+file://D:\path\to\zip.zip!dir/file_name.tif' mode='r'>
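
To make the role of file_regex and as_list more concrete, here is a rough sketch of the same idea for zip archives only, written with the standard library. The function name `match_zip_members`, the use of `re.match` (anchored at the start of the member name), and raising `FileNotFoundError` when nothing matches are all assumptions for illustration, not the library's actual behaviour.

```python
import re
import zipfile
from typing import List, Union


def match_zip_members(
    zip_path: str, file_regex: str, as_list: bool = False
) -> Union[List[str], str]:
    """Hypothetical sketch: rasterio-style URIs for the zip members matching a regex."""
    pattern = re.compile(file_regex)

    with zipfile.ZipFile(zip_path) as archive:
        # namelist() is the zipfile counterpart of tarfile's getmembers()
        matches = [name for name in archive.namelist() if pattern.match(name)]

    if not matches:
        # Assumption: fail loudly rather than return an empty result
        raise FileNotFoundError(f"No member of {zip_path} matches {file_regex!r}")

    uris = [f"zip+file://{zip_path}!{name}" for name in matches]

    # as_list=False keeps only the first match, in archive order
    return uris if as_list else uris[0]
```

Because `re.match` only matches from the beginning of a name, a pattern such as `'.*dir.*file_name'` needs the leading `.*` to reach files nested in sub-directories. With `as_list=True`, every matching member is returned, which is useful when the regex is expected to hit several bands.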
Parameters
  • archive_path (Union[str, CloudPath, Path]) – Archive path

  • file_regex (str) – Regex (used by re) matched against the file names as they appear in the archive's getmembers() list

  • as_list (bool) – If True, returns a list of all matching files. If False, returns only the first match

Returns

Band path that can be read by rasterio

Return type

Union[list, str]