Make Offline Mirror of a Site using `wget`

Sometimes you want to create an offline copy of a site that you can take with you and view even without internet access. Using wget, you can make such a copy easily:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

Explanation of the various flags:

  • --mirror – Makes the download recursive. Among other things, it also turns on timestamping and sets unlimited recursion depth.
  • --convert-links – After the download finishes, converts the links (including links to assets like CSS stylesheets) so the copy is suitable for offline viewing. Links to files that were downloaded become relative; links to files that were not downloaded become absolute URLs.
  • --adjust-extension – Adds suitable extensions to filenames (.html or .css) depending on their Content-Type.
  • --page-requisites – Downloads things like CSS stylesheets and images required to display the page properly offline.
  • --no-parent – When recursing, does not ascend to the parent directory. This is useful for restricting the download to only a portion of the site.
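
If you mirror sites often, it can help to wrap the command in a small shell function. The sketch below is a hypothetical helper, not part of wget: the name `mirror_site` and the optional destination-directory argument are assumptions added for illustration. The destination is passed to wget's `--directory-prefix` option; everything else is the command from this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wget): mirror_site URL [DEST_DIR]
# DEST_DIR is an assumption added for convenience; it is passed to
# wget's --directory-prefix and defaults to the current directory.
mirror_site() {
  local url="${1:?usage: mirror_site URL [DEST_DIR]}"
  local dest="${2:-.}"

  # --mirror            recursive download with timestamping, unlimited depth
  # --convert-links     rewrite links after download for offline viewing
  # --adjust-extension  add .html/.css based on the Content-Type
  # --page-requisites   fetch the CSS, images, etc. each page needs
  # --no-parent         never ascend above the starting directory
  wget --mirror \
       --convert-links \
       --adjust-extension \
       --page-requisites \
       --no-parent \
       --directory-prefix="$dest" \
       "$url"
}
```

Usage: `mirror_site http://example.org ~/mirrors`. Because the download is recursive, wget by default saves the files under a directory named after the host, here `~/mirrors/example.org/`, where you can typically open `index.html` in a browser.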

Alternatively, the command above may be shortened:

wget -mkEpnp http://example.org

Note: the flags are bundled, so -mkEpnp is the same as -m -k -E -p -np. The last p is part of -np (--no-parent), which is why p appears twice in the flags.

14 thoughts on “Make Offline Mirror of a Site using `wget`”

  1. David Wolski

    wget usually doesn’t work very well for complete offline mirrors of websites. Because of its parser, something is always missing, e.g. stylesheets, scripts, or images. It simply isn’t the right tool for this task.
    HTTrack is much slower than wget but has a powerful parser. It’s GPL-licensed and available in most Linux distributions.
    Documentation and source code are available at http://www.httrack.com

  2. bhl

    I second David Wolski’s comment. HTTrack is an outstanding website mirroring tool. I like it because it performs incremental updates. Nothing like sucking down the Washington Post without adverts.
