How to Download an Entire Website for Offline Reading
Although Wi-Fi is available almost everywhere these days, you may find yourself without it from time to time. When you do, there may be certain websites you wish you had saved for offline access, whether for research, entertainment, or posterity.
1. WebCopy
WebCopy by Cyotek takes a website URL and scans it for links, pages, and media. As it finds pages, it recursively looks for more links, pages, and media until the whole website is discovered. Then you can use the configuration options to decide which parts to download offline.
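As a rough illustration of the link-discovery step described above (not WebCopy's actual implementation), the sketch below pulls href and src targets out of a single HTML page with grep and sed. The function name extract_links and the file name page.html are made up for this example, and a regular expression is only an approximation of real HTML parsing.

```shell
#!/bin/sh
# Illustrative only: list the link targets found in one HTML document.
# A real crawler uses a proper HTML parser; grep is a rough stand-in.

# extract_links: read HTML on stdin and print each unique href/src value
# on its own line, in sorted order.
extract_links() {
    grep -oE '(href|src)="[^"]*"' |
        sed -E 's/^(href|src)="//; s/"$//' |
        sort -u
}
```

Running extract_links < page.html prints one link per line. A crawler would fetch each of those pages and repeat the process until no new links turn up.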
The interesting thing about WebCopy is that you can set up multiple “projects” that each have their own settings and configurations. This makes it easy to re-download many different sites whenever you want, each one in exactly the same way every time.
One project can copy many websites, so plan how you group them (e.g. a “Tech” project for copying tech sites).
How to Download an Entire Website With WebCopy
- Install and launch the app.
- Navigate to File > New to create a new project.
- Type the URL into the Website field.
- Change the Save folder field to where you want the site saved.
- Adjust the options under Project > Rules to control which pages are included or excluded.
- Navigate to File > Save As… to save the project.
- Click Copy Website in the toolbar to start the process.
Once the copying is done, you can use the Results tab to see the status of each individual page and/or media file. The Errors tab shows any problems that may have occurred and the Skipped tab shows files that weren’t downloaded.
But most important is the Sitemap, which shows the full directory structure of the website as discovered by WebCopy.
To view the website offline, open File Explorer and navigate to the save folder you designated. Open the index.html (or sometimes index.htm) in your browser of choice to start browsing.
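If the save folder holds several copied sites, finding the entry page by hand can be tedious. The sketch below is one way to locate it from a POSIX shell (on Windows this assumes something like Git Bash or WSL, which the steps above do not require). The function name find_entry_page is hypothetical.

```shell
#!/bin/sh
# find_entry_page: print one index.html or index.htm found under a save folder.
# The argument is the folder chosen in the Save folder field.
# When several index files exist, the first one in sorted order is printed.
find_entry_page() {
    find "$1" -type f \( -name 'index.html' -o -name 'index.htm' \) |
        sort |
        head -n 1
}
```

Pass the printed path to your browser to start browsing the offline copy.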
2. HTTrack
HTTrack is better known than WebCopy and is arguably better because it’s open-source and available on platforms other than Windows. The interface is a bit clunky and leaves much to be desired, but the tool works well, so don’t let that turn you away.
Like WebCopy, it uses a project-based approach that lets you copy multiple websites and keep them all organized. You can pause and resume downloads, and you can update copied websites by re-downloading old and new files.
How to Download a Website With HTTrack
- Install and launch the app.
- Click Next to begin creating a new project.
- Give the project a name, category, and base path, then click Next.
- Select Download web site(s) for Action, then type each website’s URL in the Web Addresses box, one URL per line. You can also store URLs in a TXT file and import it, which is convenient when you want to re-download the same sites later. Click Next.
- Adjust parameters if you want, then click Finish.
Once everything is downloaded, you can browse the site like normal by going to where the files were downloaded and opening the index.html or index.htm in a browser.
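HTTrack also ships a command-line program, which suits the URL-list workflow mentioned in the steps above. The sketch below assumes the httrack binary is installed and on your PATH. It reads URLs from a text file, one per line, and passes them to httrack along with -O, which sets the output folder. The function name mirror_with_httrack, the DRY_RUN switch, and the file names are made up for this example.

```shell
#!/bin/sh
# mirror_with_httrack: mirror every URL listed in a text file (one per line).
# Usage: mirror_with_httrack urls.txt ./mirror
# Set DRY_RUN=1 to print the command instead of running it.
mirror_with_httrack() {
    list_file=$1
    out_dir=$2
    # Rebuild the positional parameters as the list of URLs.
    set --
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines in the list.
        if [ -n "$url" ]; then
            set -- "$@" "$url"
        fi
    done < "$list_file"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo httrack "$@" -O "$out_dir"
    else
        httrack "$@" -O "$out_dir"
    fi
}
```

Trying it with DRY_RUN=1 first shows exactly which command would run before any download starts.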
3. SiteSucker
If you’re on a Mac, your best option is SiteSucker. This simple tool rips entire websites and maintains the same overall structure, and includes all relevant media files too (e.g. images, PDFs, style sheets).
Its clean interface could not be easier to use: you literally paste in the website URL and press Enter.
One nifty feature is the ability to save the download to a file, then use that file to download the same exact files and structure again in the future (or on another machine). This feature is also what allows SiteSucker to pause and resume downloads.
4. Wget
Wget is a command-line utility that can retrieve all kinds of files over HTTP, HTTPS, and FTP. Since websites are served over HTTP(S) and most web media files are reachable the same way, Wget is an excellent tool for ripping websites.
While Wget is typically used to download single files, it can be used to recursively download all pages and files that are found through an initial page:
wget -r -p https://www.makeuseof.com
However, some sites may detect and prevent what you’re trying to do because ripping a website can cost them a lot of bandwidth. To get around this, you can disguise yourself as a web browser with a user agent string:
wget -r -p -U Mozilla https://www.makeuseof.com
If you want to be polite, you should also limit your download speed (so you don’t hog the web server’s bandwidth) and pause between downloads (so you don’t overwhelm the web server with too many requests):
wget -r -p -U Mozilla --wait=10 --limit-rate=35K https://www.makeuseof.com
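The same command can be written with Wget's long option names, which are easier to read back later. The sketch below wraps them in a function. It also adds three options that the commands above do not use and that you may or may not want: --convert-links rewrites links in the saved pages so they point to the local copies, --adjust-extension saves HTML pages with an .html suffix, and --no-parent keeps the crawl from climbing above the starting directory. The function name polite_mirror and the DRY_RUN switch are made up for this example.

```shell
#!/bin/sh
# polite_mirror: recursive download that is gentle on the server and
# browsable offline. Set DRY_RUN=1 to print the command instead of running it.
#
#   --recursive --page-requisites   long forms of -r -p
#   --user-agent=Mozilla            long form of -U Mozilla
#   --wait=10 --limit-rate=35K      pause between requests and cap bandwidth
#   --convert-links                 make saved links point to local files
#   --adjust-extension              add .html to saved HTML pages
#   --no-parent                     stay at or below the starting directory
polite_mirror() {
    url=$1
    set -- wget \
        --recursive \
        --page-requisites \
        --convert-links \
        --adjust-extension \
        --no-parent \
        --user-agent=Mozilla \
        --wait=10 \
        --limit-rate=35K \
        "$url"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Calling polite_mirror with a site's address starts the download; setting DRY_RUN=1 first prints the full command so you can check it.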
Wget comes bundled with most Unix-based systems. On Mac, you can install Wget using a single Homebrew command: brew install wget.
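Before running any of the commands above, it can help to confirm that Wget is actually on your PATH. The check below is a minimal sketch; the function name have_command is made up for this example.

```shell
#!/bin/sh
# have_command: succeed if the named program is on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether wget is available before attempting a download.
if have_command wget; then
    echo "wget is installed"
else
    echo "wget not found; on a Mac with Homebrew, try: brew install wget"
fi
```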