Subject: Website archiving with wget on Windows
Author:
Posted on: 2019-02-01 20:47:00 UTC

This is a copy-paste of a message I sent to Nesh that should be useful for anyone trying to take copies of websites:

The magic incantation for archiving stuff

wget -x -p -m -np -k -E http://example.com/optional-subdirectory/

To use this:

1) Download the latest wget.exe from https://eternallybored.org/misc/wget/ and put it in a new folder where you want to keep the archived copies
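2) Then, Shift+right click on the folder you put wget.exe into and select "Open command window here" (newer Windows 10 versions show "Open PowerShell window here" instead; there, type .\wget.exe in place of wget, because PowerShell treats plain wget as one of its own built-in commands)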
3) Type in the magic incantation, with the example URL replaced by the site you want, and hit Enter. The bits with dashes can go in any order, so long as the URL is last and the word "wget" is first.
4) When it finishes, there'll be a folder named example.com or what have you in the same place as wget.exe, and the pages in it will have been rewritten so that links to other downloaded parts of the site point at the local copies instead of back at the live site (if you don't want that behavior, remove the -k; if you also want the untouched originals, add -K, which keeps them with a .orig suffix)
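
For anyone who would rather script steps 3 and 4 than type them each time, here is a minimal Python sketch. It assumes Python 3 is installed and that wget.exe sits in the current folder or on PATH. The names build_wget_command and archive are made up for this illustration and are not part of wget.

```python
import subprocess


def build_wget_command(url, convert_links=True, keep_originals=False):
    """Return the incantation as an argument list (hypothetical helper, not part of wget)."""
    args = ["wget", "-x", "-p", "-m", "-np", "-E"]
    if convert_links:
        args.append("-k")      # rewrite links so the copy can be browsed locally
        if keep_originals:
            args.append("-K")  # keep each unconverted file too, with a .orig suffix
    args.append(url)           # the URL goes last, as in step 3
    return args


def archive(url, **options):
    """Run wget in the current folder and return its exit code (0 means no errors)."""
    # Assumes wget.exe is in the current folder or on PATH.
    return subprocess.run(build_wget_command(url, **options)).returncode
```

Saved as, say, archive.py next to wget.exe, calling archive("http://example.com/optional-subdirectory/", keep_originals=True) would run the step 3 command with -K added.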

(For a short explanation:
- wget is a program that's designed to download things
- Most of this is various options to control how wget will download things
  + -x says to always create a folder hierarchy matching the URL of the thing you downloaded
  + -p means to also download all the CSS, images, and so on that a downloaded page needs
  + -m is shorthand for all the things you need to ask for to get mirroring (recursive download, unlimited depth, and timestamp checks so a re-run only fetches what changed)
  + -np prevents the download from following links that lead above the directory you started from, so starting at a subdirectory archives only that subdirectory (links to other sites aren't followed by default anyway)
  + -k enables the "make this download self-sufficient" rewrites mentioned above
  + -E adds a .html to downloaded web pages whose names were missing one)
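
Each of these options also has a long name in wget, which can be easier to read back later. As a rough sketch, the mapping looks like this in Python. The long names are wget's own; LONG_NAMES and to_long_options are hypothetical names used only here, and the helper assumes every option is written separately (so not combined like -xpm).

```python
# Short options from the incantation and wget's long spelling of each.
LONG_NAMES = {
    "-x": "--force-directories",
    "-p": "--page-requisites",
    "-m": "--mirror",
    "-np": "--no-parent",
    "-k": "--convert-links",
    "-K": "--backup-converted",
    "-E": "--adjust-extension",  # called --html-extension before wget 1.12
}


def to_long_options(command_line):
    """Swap each known short option for its long name; leave other words alone."""
    words = command_line.split()
    return " ".join(LONG_NAMES.get(word, word) for word in words)


print(to_long_options("wget -x -p -m -np -k -E http://example.com/optional-subdirectory/"))
```

The printed line can be pasted into the command window in place of the short form and should behave the same way.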

- Tomash
