Tuesday, December 28, 2010

Wget: A utility to download files from the Web

GNU Wget is a free utility for non-interactive download of files from the Web. It supports the HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies. Wget is non-interactive, meaning that it can work in the background while the user is not logged on. This allows you to start a retrieval and disconnect from the system, letting Wget finish the work.

1. Download Single File with wget

$ wget http://www.filehippo.com/download/xyz.tar.bz
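By default the file is saved under its original name in the current directory. If you want it stored somewhere else, the -P (directory prefix) option sets the download directory; the path below is just an example:

$ wget -P /tmp/downloads http://www.filehippo.com/download/xyz.tar.bz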
2. Store the Downloaded File With a Different File Name

$ wget 'http://www.filehippo.com/download/scripts/index.php?option=com_content&&Itemid=53'

will download the file as 'index.php?option=com_content&&Itemid=53' (note the quotes around the URL; without them the shell would treat the & characters specially),

but if we use the -O option (capital O) as follows

$ wget -O myscript.php 'http://www.filehippo.com/download/scripts/index.php?option=com_content&&Itemid=53'

the file will be downloaded as 'myscript.php'.
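As an aside, the lowercase -o option is also useful here, but for a different purpose: it redirects wget's log messages to a file instead of the terminal. A small sketch combining both (the log file name is just an example):

$ wget -O myscript.php -o download.log 'http://www.filehippo.com/download/scripts/index.php?option=com_content&&Itemid=53'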
3. Continue the Incomplete Download

If we initiated a very big file download that got interrupted in the middle, instead of starting the whole download again we can resume from where it stopped using the -c option.

$ wget -c 'www.chandomama.com'

If a download is stopped in the middle and you restart it without the -c option, wget will append .1 to the filename automatically, since a file with the previous name already exists. If a file ending in .1 already exists, it will download the file with .2 at the end, and so on.
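A small sketch of the difference (the URL is a placeholder, not a real file):

$ wget http://example.com/big-file.iso      # interrupted partway through
$ wget http://example.com/big-file.iso      # starts over, saves as big-file.iso.1
$ wget -c http://example.com/big-file.iso   # resumes the partial big-file.iso instead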

4. Download in background

For a huge download, put the download in the background using the wget option -b as shown below.
$ wget -b http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
Continuing in background, pid 1984.
Output will be written to `wget-log'.

It initiates the download and returns the shell prompt immediately. We can check the status of the download at any time using tail -f as shown below.

$ tail -f wget-log
Saving to: `strx25-0.9.2.1.tar.bz2.4'

0K .......... .......... .......... .......... ..... 1% 65.5K 57s
50K .......... .......... .......... .......... .... 2% 85.9K 49s
100K .......... .......... .......... .......... ... 3% 83.3K 47s
150K .......... .......... .......... .......... ... 5% 86.6K 45s
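The default log file name is wget-log; if you prefer something else, the -o option from example 2 works together with -b. The log name below is just an illustration, and the PID that wget printed when it went to the background can be used to stop the download if needed:

$ wget -b -o strx25.log http://www.openss7.org/repos/tarballs/strx25-0.9.2.1.tar.bz2
$ tail -f strx25.log
$ kill 1984      # use the pid that wget printed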

5. Mirror of website

To take a backup of a blog, or to create a local copy of an entire directory of a web site for archiving or reading later, use the -m option. The command:

wget -m http://ginatrapani.googlepages.com

will save the pages that exist on the ginatrapani.googlepages.com site (just two, in this case) in a folder named just that on your computer. The -m in the command stands for "mirror this site."
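The -m option turns on recursion and time-stamping; for a copy you actually want to browse offline it is common to also add -k (convert links to point at the local copies) and -p (download page requisites such as images and stylesheets). A sketch using the same site:

$ wget -m -k -p http://ginatrapani.googlepages.com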


6. Download recursively

To retrieve all the pages in a site, use the -r option. It downloads the entire website recursively:

wget -r 'www.chandomama.com'

If you want to retrieve all the pages in a site PLUS the pages that site links to, you'd go with:

wget -H -r --level=1 -k -p http://ginatrapani.googlepages.com

This command says, "Download all the pages (-r, recursive) on http://ginatrapani.googlepages.com plus one level (--level=1) into any other sites it links to (-H, span hosts), and convert the links in the downloaded version to point to the other sites' downloaded version (-k). Oh yeah, and get all the components like images that make up each page (-p)."

Warning: Beware, those with small hard drives! This type of command will download a LOT of data from sites that link out a lot (like blogs)! Don't try to backup the Internet, because you'll run out of disk space!
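If you do run a large recursive download, it is polite (and easier on your own bandwidth) to throttle it. wget's --wait and --limit-rate options do this; the values below are only illustrative:

$ wget -r --wait=2 --limit-rate=100k 'www.chandomama.com'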

7. Download Only Certain File Types Using wget -r -A

You can use this in the following situations:

  • Download all images from a website
  • Download all videos from a website
  • Download all PDF files from a website
$ wget -r -A.pdf http://url-to-webpage-with-pdfs/
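The -A option accepts a comma-separated list of suffixes, so the same pattern covers the image and video cases listed above; the extensions and URLs here are placeholders:

$ wget -r -A jpg,jpeg,png,gif http://url-to-webpage-with-images/
$ wget -r -A mp4,avi,mkv http://url-to-webpage-with-videos/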


Source: http://www.thegeekstuff.com/2009/09/the-ultimate-wget-download-guide-with-15-awesome-examples/#more-1885
http://lifehacker.com/161202/geek-to-live--mastering-wget
