sertit.files.read_archived_html

read_archived_html(archive_path: Union[str, cloudpathlib.cloudpath.CloudPath, pathlib.Path], regex: str) → lxml.html.HtmlElement

Read archived HTML from zip or tar archives.

You can use this [site](https://regexr.com/) to build your regex.

>>> arch_path = r'D:\path\to\zip.zip'  # Raw string, so that \t is not read as a tab
>>> file_regex = '.*dir.*file_name'  # .* matches any sequence of characters
>>> read_archived_html(arch_path, file_regex)
<Element html at 0x1c90007f8c8>

Parameters

archive_path (Union[str, CloudPath, Path]) – Archive path
regex (str) – Regular expression (as used by the re module) matching the path of the HTML file as it appears in the archive's getmembers() list
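
To illustrate how a regex selects a file inside an archive, here is a minimal sketch using only the standard library on a zip built in memory. It is not the sertit implementation: the helpers `find_member_names` and `read_member_text` are hypothetical names, the use of `re.match` is an assumption, and the result is returned as plain text rather than parsed with lxml.

```python
import io
import re
import zipfile


def find_member_names(archive, regex):
    """Return the names of the zip members whose path matches ``regex``.

    Hypothetical helper: assumes the pattern is applied with ``re.match``,
    i.e. anchored at the start of the member path.
    """
    pattern = re.compile(regex)
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist() if pattern.match(name)]


def read_member_text(archive, regex, encoding="utf-8"):
    """Read and decode the first zip member whose path matches ``regex``."""
    pattern = re.compile(regex)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if pattern.match(name):
                return zf.read(name).decode(encoding)
    raise FileNotFoundError(f"No archive member matches {regex!r}")


# Build a small zip in memory so the example does not depend on any file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("product/dir_a/file_name.html", "<html><body><p>hello</p></body></html>")
    zf.writestr("product/dir_a/other.txt", "not html")

# The leading .* lets the pattern skip any parent folders stored in the archive.
file_regex = r".*dir.*file_name\.html"
matches = find_member_names(buffer, file_regex)
html_text = read_member_text(buffer, file_regex)
```

Listing the matching names first is a convenient way to check a regex before reading anything: an empty list means the pattern does not match the paths as they are stored in the archive.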

Returns

HTML file, parsed as an lxml element

Return type

lxml.html.HtmlElement