2.11 Recursive Accept/Reject Options
- ‘-A acclist --accept acclist’
- ‘-R rejlist --reject rejlist’
Specify comma-separated lists of file name suffixes or patterns to accept or reject (see section Types of Files). Note that if any of the wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern in quotes to prevent your shell from expanding it, as in ‘-A "*.mp3"’ or ‘-A '*.mp3'’.
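For example, the following command (the URL is a placeholder) recursively retrieves a site while keeping only JPEG and PNG files:
wget -r -A '*.jpg,*.png' https://example.com/gallery/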
- ‘--accept-regex urlregex’
- ‘--reject-regex urlregex’
Specify a regular expression to accept or reject the complete URL.
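As an illustration (placeholder URL), the following rejects every URL containing ‘action=edit’ anywhere in it:
wget -r --reject-regex 'action=edit' https://example.com/wiki/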
- ‘--regex-type regextype’
Specify the regular expression type. Possible types are ‘posix’ or ‘pcre’. Note that to be able to use ‘pcre’ type, wget has to be compiled with libpcre support.
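For instance, assuming Wget was built with libpcre support, a PCRE pattern can be supplied like this (placeholder URL):
wget -r --regex-type pcre --accept-regex '.*\.(jpe?g|png)$' https://example.com/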
- ‘-D domain-list’
- ‘--domains=domain-list’
Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on ‘-H’.
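For example, the following command (placeholder hosts) spans hosts but follows links only into the listed domains; ‘-H’ is given explicitly because ‘-D’ alone does not enable it:
wget -r -H -D example.com,example.org https://www.example.com/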
- ‘--exclude-domains domain-list’
Specify the domains that are not to be followed (see section Spanning Hosts).
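For instance (placeholder hosts), this spans hosts freely except for one domain that is explicitly excluded:
wget -r -H --exclude-domains ads.example.com https://www.example.com/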
- ‘--follow-ftp’
Follow FTP links from HTML documents. Without this option, Wget will ignore all the FTP links.
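For example (placeholder URL), this recursion also descends into any FTP links found in the retrieved HTML:
wget -r --follow-ftp https://example.com/downloads/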
- ‘--follow-tags=list’
Wget has an internal table of HTML tag / attribute pairs that it considers when looking for linked documents during a recursive retrieval. If a user wants only a subset of those tags to be considered, however, he or she should specify such tags in a comma-separated list with this option.
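For instance (placeholder URL), this considers only ‘<a>’ and ‘<area>’ tags when looking for further links:
wget -r --follow-tags=a,area https://example.com/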
- ‘--ignore-tags=list’
This is the opposite of the ‘--follow-tags’ option. To skip certain HTML tags when recursively looking for documents to download, specify them in a comma-separated list.
In the past, this option was the best bet for downloading a single page and its requisites, using a command-line like:
wget --ignore-tags=a,area -H -k -K -r http://site/document
However, the author of this option came across a page with tags like
<LINK REL="home" HREF="/">
and came to the realization that specifying tags to ignore was not enough. One can’t just tell Wget to ignore ‘<LINK>’, because then stylesheets will not be downloaded. Now the best bet for downloading a single page and its requisites is the dedicated ‘--page-requisites’ option.
- ‘--ignore-case’
Ignore case when matching files and directories. This influences the behavior of the ‘-R’, ‘-A’, ‘-I’, and ‘-X’ options, as well as globbing implemented when downloading from FTP sites. For example, with this option, ‘-A "*.txt"’ will match ‘file1.txt’, but also ‘file2.TXT’, ‘file3.TxT’, and so on. The quotes in the example are to prevent the shell from expanding the pattern.
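For example (placeholder URL), this accepts text files regardless of how the suffix happens to be capitalized on the server:
wget -r --ignore-case -A '*.txt' https://example.com/docs/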
- ‘-H’
- ‘--span-hosts’
Enable spanning across hosts when doing recursive retrieving (see section Spanning Hosts).
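For example (placeholder hosts), this lets the recursion leave the starting host; in practice it is usually combined with ‘-D’ to keep it from wandering too far:
wget -r -H -D example.com https://blog.example.com/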
- ‘-L’
- ‘--relative’
Follow relative links only. Useful for retrieving a specific home page without any distractions, not even those from the same hosts (see section Relative Links).
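For instance (placeholder URL), this follows only relative links within the retrieved pages:
wget -r -L https://example.com/people/someone/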
- ‘-I list’
- ‘--include-directories=list’
Specify a comma-separated list of directories you wish to follow when downloading (see section Directory-Based Limits). Elements of list may contain wildcards.
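For example (placeholder URL and directories), this restricts the recursion to two directory trees:
wget -r -I /docs,/images https://example.com/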
- ‘-X list’
- ‘--exclude-directories=list’
Specify a comma-separated list of directories you wish to exclude from download (see section Directory-Based Limits). Elements of list may contain wildcards.
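For example (placeholder URL and directories), this mirrors a site while skipping directory trees that are not wanted:
wget -r -X /forum,/tmp https://example.com/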
- ‘-np’
- ‘--no-parent’
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See section Directory-Based Limits, for more details.
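For instance (placeholder URL), this downloads everything under ‘/manual/’ without ever climbing to the rest of the site:
wget -r -np https://example.com/manual/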