Just in case this particular problem comes up again…

It turns out that adding --adjust-extension and --convert-links forced
the desired behavior of downloading

    http://politwoops.sunlightfoundation.com/tweet/599258910776754176

as

    politwoops.sunlightfoundation.com/tweet/599258910776754176.html

which then allowed the subsequent fetch for
    
    http://politwoops.sunlightfoundation.com/tweet/599258910776754176/thumb/599258910776754176-0.jpg

to succeed since it was able to create a directory at:

    politwoops.sunlightfoundation.com/tweet/599258910776754176

for the jpg. The nice thing is that the WARC reflects the actual URLs 
requested, not the rewritten ones, which is exactly the behavior I wanted.
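
For the archives, here is a rough sketch of the name clash described above, reproduced with plain shell commands in a scratch directory rather than with wget itself. The variable names and the placeholder page content are my own assumptions for illustration; only the paths come from the example above.

```shell
#!/bin/sh
# Sketch only: mimics the on-disk layout wget produces, without any network access.
work=$(mktemp -d)
cd "$work" || exit 1

base=politwoops.sunlightfoundation.com/tweet
id=599258910776754176
mkdir -p "$base"

# Without --adjust-extension the page is saved as a regular file named after the id,
# so the same name cannot also be used as a directory for the thumbnail.
echo '<html></html>' > "$base/$id"
if mkdir -p "$base/$id/thumb" 2>/dev/null; then
    without=ok
else
    without=blocked
fi

# With --adjust-extension the page is saved as <id>.html, which leaves <id> free.
mv "$base/$id" "$base/$id.html"
if mkdir -p "$base/$id/thumb" 2>/dev/null; then
    with=ok
else
    with=blocked
fi

echo "page saved as $id:      thumb directory $without"
echo "page saved as $id.html: thumb directory $with"

# Clean up the scratch directory.
cd / && rm -rf "$work"
```

The first mkdir fails because a path component is already a regular file; once the page carries the .html suffix, the directory for the jpg can be created alongside it.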

Thanks for the help,

//Ed

> On Jun 9, 2015, at 12:12 AM, Ed Summers <[email protected]> wrote:
> 
> Hi Ander,
> 
>> On Jun 8, 2015, at 2:48 AM, Ander Juaristi <[email protected]> wrote:
>> 
>> You can work around the issue with -nd, which will download all the files in 
>> the same directory (will not recreate the directory structure). I've tested 
>> it and it correctly downloads the missing image.
> 
> Thanks so much for this. -nd did in fact result in the representation being 
> fetched and stored in the WARC. I really appreciate your help!
> 
> //Ed
