Create WARC files with wget

This #playbook will use wget to generate a Web ARChive (.warc) of a website.

When I’m done with a project, I like to create a full mirror of the site as a WARC file. Replace =URL= with the site address and =NAME= with the base name for the archive:

wget "=URL=" --mirror --page-requisites --convert-links --warc-file="=NAME="
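A minimal sketch of wrapping the command above in a reusable shell function. The function name ``archive_site`` and the ``DRY_RUN`` variable are hypothetical, made up for illustration; only the wget flags come from this playbook.

```shell
#!/bin/sh
# Hypothetical helper: archive_site URL NAME
# Runs the wget invocation from this playbook. With DRY_RUN=1 it only
# prints the command it would run, without touching the network.
archive_site() {
    url="$1"
    name="$2"
    if [ -z "$url" ] || [ -z "$name" ]; then
        echo "usage: archive_site URL NAME" >&2
        return 2
    fi
    # Rebuild the positional parameters as the full wget command line.
    set -- wget "$url" --mirror --page-requisites --convert-links --warc-file="$name"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, ``DRY_RUN=1 archive_site "https://example.com/" example`` prints the command without fetching anything. On a real run, wget should write the archive to ``example.warc.gz``, since its WARC output is gzip-compressed by default.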

Reference

--mirror alone is not quite robust enough to produce a complete archive, so I added a couple more parameters:

‘--page-requisites’

This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.

‘--convert-links’

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

Ref: GNU Wget 1.21.1-dirty Manual