Hi all,

While using wget to mirror a website and archive it to a WARC file, I noticed a bunch of errors like this in my log:

    pathconf: Not a directory

I narrowed it down to requests for a particular set of resources, e.g.:

    http://politwoops.sunlightfoundation.com/tweet/599258910776754176

which have page requisites like:

    http://politwoops.sunlightfoundation.com/tweet/599258910776754176/thumb/599258910776754176-0.jpg

It seems that the fetch for the HTML creates a file at:

    politwoops.sunlightfoundation.com/tweet/599258910776754176

But then the fetch for the image fails because that path is already a file and not a directory, so there is nowhere to save the jpg.

Here’s a command you can use to replicate the error:

    wget --page-requisites http://politwoops.sunlightfoundation.com/tweet/599258910776754176

At first I didn’t mind, because I can actually make do with just the WARC file. But it seems that the representation is not written to the WARC after encountering the write error.

Is this a known bug, or have I perhaps overlooked a wget option that will help here?

Thanks for any assistance you can provide. And since this is my first time writing to bug-wget, thanks for an incredibly useful tool!

//Ed
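
To make the suspected file-versus-directory clash concrete, here is a minimal Python sketch that flags URL pairs where one URL's local path would have to be both a file and a directory. It only assumes the host/path layout described above. The helper names `local_path` and `find_collisions` are hypothetical and are not part of wget, and the sketch does not reproduce wget's actual file-naming logic.

```python
from urllib.parse import urlsplit


def local_path(url):
    """Approximate the on-disk path as host + URL path (an assumption,
    based only on the layout described in this message)."""
    parts = urlsplit(url)
    return parts.netloc + parts.path.rstrip("/")


def find_collisions(urls):
    """Return (file_path, blocked_path) pairs where file_path would be
    saved as a regular file but blocked_path needs it to be a directory."""
    paths = sorted({local_path(u) for u in urls})
    collisions = []
    for p in paths:
        for q in paths:
            # q lives "inside" p, so p must be a directory for q to be saved
            if q.startswith(p + "/"):
                collisions.append((p, q))
    return collisions


urls = [
    "http://politwoops.sunlightfoundation.com/tweet/599258910776754176",
    "http://politwoops.sunlightfoundation.com/tweet/599258910776754176"
    "/thumb/599258910776754176-0.jpg",
]

for file_path, blocked in find_collisions(urls):
    print(file_path, "blocks", blocked)
```

For the two URLs above, the sketch reports the tweet page as blocking the thumbnail, which matches the failure described in this message.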
