Create WARC files with wget

maiki · July 23, 2022, 10:33pm

This playbook will use wget to generate a Web ARChive (.warc) of a website.

When I’m done with a project I like to create a full mirror of the site, as a WARC file.

wget "=URL=" --mirror --page-requisites --convert-links --warc-file="=NAME="
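As a minimal sketch, the command above can be wrapped in a small shell function so the two placeholders become arguments. The function name `warc_mirror` and its argument order are assumptions made for illustration; only the wget flags come from this playbook.

```shell
# warc_mirror URL NAME
# Mirror URL with wget and record the crawl as a WARC named after NAME.
# The function name and argument order are illustrative, not part of wget.
warc_mirror() {
    if [ "$#" -ne 2 ]; then
        echo "usage: warc_mirror URL NAME" >&2
        return 2
    fi
    # Same flags as the one-liner above; "$2" is the WARC filename prefix.
    wget "$1" --mirror --page-requisites --convert-links --warc-file="$2"
}
```

Calling `warc_mirror "https://example.com/" "example-com"` (both values are placeholders) should leave a local mirror directory plus `example-com.warc.gz`, since wget treats the `--warc-file` value as a prefix and gzips the WARC unless `--no-warc-compression` is given.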

Reference

Wget with WARC output - Archiveteam

maiki · July 28, 2022, 3:16am

--mirror alone is not quite robust enough to produce a complete archive. I’m going to need to add some more parameters.
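One rough way to judge how complete an archive is, sketched here under the assumption that the WARC was written gzip-compressed (wget's default), is to look at the record headers. The helper names below are hypothetical, and `grep -a` is a GNU/BSD option rather than strict POSIX.

```shell
# warc_response_count FILE
# Print how many HTTP response records the compressed WARC contains.
warc_response_count() {
    gzip -dc "$1" | grep -a -c '^WARC-Type: response'
}

# warc_target_uris FILE
# Print the distinct WARC-Target-URI values recorded in the file, one per line.
warc_target_uris() {
    gzip -dc "$1" | grep -a '^WARC-Target-URI: ' | tr -d '\r' | cut -d' ' -f2- | sort -u
}
```

If a page or asset you expect is missing from the `warc_target_uris` output, that is a hint the crawl needs more parameters.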

maiki · July 31, 2022, 1:06am

Added a couple more parameters:

‘--page-requisites’

This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.

‘--convert-links’

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

Ref: GNU Wget 1.21.1-dirty Manual

maiki · March 30, 2025, 9:02pm

I have not used this lately; instead I just send everything to a personal archive and also forward it to the Wayback Machine at Internet Archive.

To expand on this note: is this a recommendation for folks seeking to archive a website? How can someone back up information found online?
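For the Wayback Machine part, one possible approach is the public "Save Page Now" endpoint at `https://web.archive.org/save/`. This is a sketch under that assumption, not necessarily how the forwarding described above is done; the function name `wayback_save` is hypothetical, and anonymous requests may be rate-limited.

```shell
# wayback_save URL
# Ask the Internet Archive to capture URL via the Save Page Now endpoint.
# Exits non-zero if the request fails (curl --fail).
wayback_save() {
    curl --fail --silent --show-error --location --output /dev/null \
        "https://web.archive.org/save/$1"
}
```

For example, `wayback_save "https://example.com/"` requests a capture of that single page. It does not crawl the whole site the way `--mirror` does.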
