read_archived_html

read_archived_html(archive_path: str | CloudPath | Path, regex: str, file_list: list | None = None) → HtmlElement

Read archived HTML from zip or tar archives.

You can use this site to build your regex.

Parameters:

archive_path (AnyPathStrType) – Path to the zip or tar archive
regex (str) – Regular expression (used by re) identifying the HTML file, written against the member names as they appear in the getmembers() list
file_list (list) – List of files contained in the archive. Optional; if not given, it is recomputed.

Returns:

Root element of the parsed HTML file

Return type:

HtmlElement

Example

>>> arch_path = 'D:/path/to/zip.zip'
>>> file_regex = '.*dir.*file_name'  # .* matches any sequence of characters
>>> read_archived_html(arch_path, file_regex)
<Element html at 0x1c90007f8c8>
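
To illustrate how the regex relates to the archive's member names, here is a minimal standard-library sketch. It is not the library's implementation: find_archived_member and read_archived_text are hypothetical helpers, the sketch handles zip archives only, and it assumes the pattern is applied with re.match (anchored at the start of the member name), which would explain the leading .* in the example above. It returns the raw text rather than a parsed element.

```python
import re
import tempfile
import zipfile
from pathlib import Path


def find_archived_member(archive_path, regex, file_list=None):
    """Hypothetical helper: return the first member name matching `regex`."""
    if file_list is None:
        # Listing the archive is the "re-computed" step a caller can skip
        # by passing file_list explicitly.
        with zipfile.ZipFile(archive_path) as archive:
            file_list = archive.namelist()
    pattern = re.compile(regex)
    # Assumption: matching is anchored at the start of the name (re.match).
    matches = [name for name in file_list if pattern.match(name)]
    if not matches:
        raise FileNotFoundError(f"No member matching {regex!r} in {archive_path}")
    return matches[0]


def read_archived_text(archive_path, regex, file_list=None):
    """Hypothetical helper: return the matched member decoded as UTF-8 text."""
    name = find_archived_member(archive_path, regex, file_list)
    with zipfile.ZipFile(archive_path) as archive:
        return archive.read(name).decode("utf-8")


# Build a small throwaway archive so the sketch runs anywhere.
demo_zip = Path(tempfile.mkdtemp()) / "demo.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("root/dir_a/file_name.html", "<html><body>hello</body></html>")
    archive.writestr("root/other/readme.txt", "not html")

member = find_archived_member(demo_zip, r".*dir.*file_name")
html_text = read_archived_text(demo_zip, r".*dir.*file_name")
```

Under the re.match assumption, a pattern such as 'dir.*file_name' without the leading .* would not match 'root/dir_a/file_name.html', because the member name does not start with 'dir'.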
