sertit.files.read_archived_html
- read_archived_html(archive_path: Union[str, cloudpathlib.cloudpath.CloudPath, pathlib.Path], regex: str) -> lxml.html.HtmlElement
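Read an HTML file from inside a zip or tar archive. `regex` selects the file within the archive, and the parsed document is returned as an `lxml.html.HtmlElement`.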
You can use this [site](https://regexr.com/) to build your regex.
>>> arch_path = r'D:\path\to\zip.zip'  # Raw string, so '\t' is not read as a tab
>>> file_regex = '.*dir.*file_name'  # .* matches any sequence of characters
>>> read_archived_html(arch_path, file_regex)
<Element html at 0x1c90007f8c8>
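
The regex is used to pick one file among the archive's members. The sketch below illustrates that selection step with the standard library only. It is not sertit's implementation: the helper `find_archived_member` is hypothetical, and matching with `re.match` (anchored at the start of the member name, which is why the pattern begins with `.*`) is an assumption.

```python
import io
import re
import zipfile


def find_archived_member(archive: zipfile.ZipFile, regex: str) -> str:
    """Return the first member name matching ``regex`` (hypothetical helper)."""
    pattern = re.compile(regex)
    # Assumption: the pattern is matched from the start of each member name
    matches = [name for name in archive.namelist() if pattern.match(name)]
    if not matches:
        raise FileNotFoundError(f"No member matching {regex!r} in the archive")
    return matches[0]


# Build a small in-memory zip so the sketch runs without any file on disk
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("some_dir/file_name.html", "<html><body><p>hello</p></body></html>")
    zf.writestr("other/readme.txt", "not html")

# Select the HTML member with the same kind of regex as in the example above
with zipfile.ZipFile(buffer) as zf:
    member = find_archived_member(zf, ".*dir.*file_name")
    html_text = zf.read(member).decode("utf-8")
```

Passing `html_text` to `lxml.html.fromstring` would then give an `HtmlElement` like the one `read_archived_html` returns.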