On 02/26/2018 06:50 AM, to...@tuxteam.de wrote:
On Mon, Feb 26, 2018 at 06:40:02AM -0600, Richard Owlett wrote:
I'm attempting to download a site which is an instruction manual.
Its URL is of the form
http://example.com/index.html
That page has several links whose target URLs are of the form
http://example.com/page1.html
http://example.com/page2.html
http://example.com/page3.html
etc.
I want a single HTML file consisting of all the pages of the site.
Where <http://example.com/index.html> points to
<http://example.com/pageN.html> I want my local file to have
the corresponding internal references.
There are also references of the form
http://some_where_else.com/pagex.html
which I do not wish to download.
I tried
wget -l 2 -O owl.html --no-parent http://example.com/index.html
It *almost* worked as intended.
I did get all the text of the site.
HOWEVER:
1. I also got the text of <http://some_where_else.com/pagex.html>
2. Where <http://example.com/index.html> referenced
<http://example.com/pageN.html>, there were still references to
the original site rather than relative links within owl.html.
Ad (1): this is strange. By default wget doesn't "span" hosts,
i.e. it doesn't follow links to other hosts unless you ask for
that with -H (--span-hosts).
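For what it's worth, a minimal sketch of a recursive fetch that stays on
the original host (untested; example.com stands in for the real site, and
since no -H is given, links to some_where_else.com should not be followed):

  # recurse 2 levels from index.html, never leaving example.com,
  # and never ascending above index.html's directory (-np)
  wget -r -l 2 -np http://example.com/index.html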
Ad (2): you want option -k. Quoth the man page:
-k
--convert-links
After the download is complete, convert the links in
the document to make them suitable for local viewing...
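In practice -k is usually paired with a recursive fetch rather than with
-O; a hedged sketch (untested, same stand-in URL as above):

  # fetch the manual and rewrite the links between the saved pages
  # so they point at the local copies instead of the original site;
  # -p additionally grabs images/CSS that the pages need
  wget -r -l 2 -np -k -p http://example.com/index.html
  # the pages land under ./example.com/ as index.html, page1.html, ...

Note that -k rewrites links across the downloaded files; it does not by
itself produce a single combined HTML file.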
However, adding -k while keeping -O makes wget complain:
Cannot specify both -k or --convert-file-only and -O if multiple URLs
are given, or in combination with -p or -r. See the manual for details.
<frown;>
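One hedged way around that restriction is to drop -O entirely: run the
recursive fetch with -k as sketched above, then stitch the converted
pages into one file in a second step. The merge tool below (pandoc) is
only an illustration, not something suggested in this thread, and the
cross-page anchors may still need manual cleanup afterwards:

  # step 1: recursive fetch with link conversion, no -O
  wget -r -l 2 -np -k http://example.com/index.html
  # step 2: concatenate the saved pages into a single owl.html
  pandoc example.com/index.html example.com/page*.html -o owl.html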