On Thursday, October 13, 2016 6:27:56 PM CEST Dale R. Worley wrote:
> If --page-requisites is specified along with --no-parent, then requisite
> files will be downloaded even if their URLs would normally be suppressed
> by --no-parent. This is implemented by a test in section 4 of
> download_child in recur.c, and a flag in struct urlpos, link_inline_p,
> which says that the *context* of that URL is as a page requisite.
>
> This suggests that the exceptional processing we want to implement for
> redirections might be more systematically implemented by using the above
> processing as a model, and not by testing the value returned by
> download_child. This involves adding a flag link_redirect_p to struct
> urlpos; this flag functions as an alternative to the additional argument
> to download_child that I previously suggested.
>
> In addition, this approach avoids the problem of ensuring that
> download_child returns the correct value if a URL fails more than one
> test, e.g., --accept-regex and robots, because any tests that are to be
> ignored in the context are not executed and do not affect the return
> value.
>
> It also suggests that we may want to define that --no-parent does not
> apply to redirections, in the same way that it does not apply to page
> requisites when --page-requisite is set.
>
> I've also updated the TEXI file to describe the functional changes, and
> also the previously-undocumented behavior of --page-requisites
> overriding --no-parent. The changes are in the attached diff.
>
> However, looking at the documentation for --no-parent:
>
>     -np
>     --no-parent
>         Do not ever ascend to the parent directory when retrieving
>         recursively. This is a useful option, since it guarantees that
>         only the files below a certain hierarchy will be downloaded.
>
>         Note that the effect of --no-parent is suppressed for fetching
>         redirected URLs and for fetching page requisite URLs if
>         --page-requisites is specified.
>
> Perhaps we do not want to have --no-parent suppressed by
> --page-requisites. It seems that --no-parent is intended as a security
> measure, and the existing code (as well as this proposal) violate its
> fundamental premise.
--no-parent seems to be intended as a bandwidth limiter together with -r.
When talking about security, what realistic scenario do you have in mind?

Anyway, we definitely don't want to change the default behavior. If
someone *really* needs a different precedence, has good arguments, and
finds someone to implement it (including tests), we'll add such a feature.

Regarding redirections, we have --max-redirect and could use
--max-redirect=0 to disallow redirections. *But* we have at least two
different qualities of redirections:

1. staying on the same host/domain,
2. host spanning.

If neither -H/--span-hosts is given nor -D/--domains matches, we should
not span hosts for redirections.

>
> Dale
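To make the two proposals in this thread concrete, here is a rough sketch in C. It is not wget's code: struct link_ctx, struct opts, host_in_domain, path_below and should_download are hypothetical names invented for illustration, and only the flag names link_inline_p and link_redirect_p come from the discussion above. It assumes a single -D suffix and a plain prefix test for the directory check, and it follows the wording of this mail (a redirect may leave the host only if -H is given or -D matches) rather than wget's real option handling.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for the context flags in
   wget's struct urlpos. */
struct link_ctx {
  bool link_inline_p;    /* URL is a page requisite */
  bool link_redirect_p;  /* URL was reached through a redirect (proposed) */
};

/* Hypothetical subset of the command-line options discussed here. */
struct opts {
  bool no_parent;        /* --no-parent */
  bool page_requisites;  /* --page-requisites */
  bool span_hosts;       /* -H / --span-hosts */
  const char *domain;    /* one -D suffix, or NULL (assumption) */
};

/* True if host equals domain or ends in "." followed by domain. */
static bool host_in_domain(const char *host, const char *domain)
{
  size_t hl = strlen(host), dl = strlen(domain);
  if (dl > hl || strcmp(host + (hl - dl), domain) != 0)
    return false;
  return hl == dl || host[hl - dl - 1] == '.';
}

/* Plain prefix test standing in for the "below the start directory" check. */
static bool path_below(const char *path, const char *start_dir)
{
  return strncmp(path, start_dir, strlen(start_dir)) == 0;
}

/* Decide whether a URL should be fetched.  Tests that are to be ignored
   in a given context are simply not executed, so they cannot affect the
   result. */
static bool should_download(const struct opts *o, const struct link_ctx *ctx,
                            const char *start_host, const char *start_dir,
                            const char *host, const char *path)
{
  bool same_host = strcmp(host, start_host) == 0;

  /* Host test: applies in every context, including redirects. */
  if (!same_host) {
    bool allowed = o->span_hosts
                   || (o->domain != NULL && host_in_domain(host, o->domain));
    if (!allowed)
      return false;
  }

  /* --no-parent test: skipped for page requisites (when --page-requisites
     is set) and, as proposed, for redirects. */
  bool skip_no_parent = (ctx->link_inline_p && o->page_requisites)
                        || ctx->link_redirect_p;
  if (o->no_parent && !skip_no_parent && same_host
      && !path_below(path, start_dir))
    return false;

  return true;
}
```

With --no-parent and --page-requisites set and neither -H nor -D given, an ordinary link from /docs/ to /other/x.html on the same host is rejected, a page requisite or a redirect to that path is accepted, and a redirect to a different host is rejected.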
