On Mon, Apr 7, 2014 at 4:21 PM, L Walsh <[email protected]> wrote:
>
> Darshit Shah wrote:
>>
>> Wget could, in theory, use fallocate() for Linux, posix_fallocate() for
>> other POSIX-compliant systems, and SetFileInformationByHandle (is this
>> available on older versions of Windows?) for Windows systems. It isn't
>> going out of the way by a large extent, but it ensures Wget plays well
>> on each system. However, this is going to lead to way too many code
>> paths and ifdef statements, and personally speaking, I'd rather we use
>> only posix_fallocate() everywhere and the Windows syscalls for Windows.
>
> ----
> Hey, that'd be fine with me -- OR, if the length is not known,
> allocating 1 MB chunks at a time and truncating at the final write.
> If performance were an issue, I'd fork off the truncation in the
> background -- I do something similar in a file utility that can delete
> duplicates; the deletions are done with async I/O in the background so
> they won't slow down the primary function.
>
> I don't usually have a problem with fragmentation on Linux, as I run
> XFS, which will do some pre-allocation for you (more in recent kernels
> with its "speculative preallocation"). And for those who have
> degenerate use cases or who are anal-retentive (*cough*), there is a
> file-system reorganizer that can be run when needed or as a nightly
> cron job... So this isn't really a problem for me -- I was answering
> the question because MS took preventative measures to try to slow down
> disk fragmentation, as NTFS (and FAT, for that matter) will suffer when
> it gets bad, like many file systems. Most don't protect themselves to
> the extremes that XFS does to prevent it.
>
> But a sane middle ground like using the POSIX pre-allocation calls
> seems reasonable -- or preallocating larger spaces when downloading
> large files...
>
> I.e., you probably don't want to allocate a megabyte for each little
> 1k file on a mirror, but if you see that the file size is large (size
> known), or you have already downloaded a megabyte or more, then
> preallocation with a truncate starts to make some sense...
>
I *think* we might be straying from the original issue. Wget as it is right now on origin/master seems to work perfectly. We could probably improve or optimize it, but that calls for a separate discussion. The issue at hand is how parallel-wget works. Now, one thing we must remember in this use case is that we *always* know the file size. If we don't, Wget should automatically fall back to a non-parallel download of the file.
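
Just so we are on the same page about what the preallocation side would
even look like, here is a rough, untested sketch of what I have in mind
(not actual Wget code; the HAVE_POSIX_FALLOCATE guard, the helper name,
and the 1 MiB threshold are only placeholders):

#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: preallocate only when the expected size is
 * known and large enough to be worth the syscall.  The threshold is
 * an arbitrary example, not a measured value. */
#define PREALLOC_THRESHOLD (1024 * 1024)    /* 1 MiB */

static int
maybe_preallocate (int fd, off_t expected_size)
{
#ifdef HAVE_POSIX_FALLOCATE         /* autoconf-style guard, assumed */
  if (expected_size >= PREALLOC_THRESHOLD)
    {
      int err = posix_fallocate (fd, 0, expected_size);
      if (err != 0 && err != EINVAL && err != EOPNOTSUPP)
        return -1;                  /* genuine failure, e.g. ENOSPC */
    }
#else
  (void) fd;
  (void) expected_size;             /* no-op where unsupported */
#endif
  return 0;
}

Note that posix_fallocate() returns the error number directly rather
than setting errno, which is why the result is checked that way; a
Windows-specific call could hide behind the same kind of wrapper.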

Armed with the knowledge that the file size is always known, I believe
the right way is to allocate the complete file under a .tmp/.swp or
similar extension and then rename(2) the completed download into place.
This is important because, when downloading a single file in multiple
parts, you want to be able to write to arbitrary offsets within the
file; continuing such downloads would otherwise be a problem (a rough
sketch of what I mean is appended below). The guys from Metalink, curl,
etc. have a better idea about such download scenarios than we do and
could probably suggest some easier alternatives.

> I was just speaking up to answer the question you posed, about
> why someone might copy to one place then another... it wasn't meant
> to create a problem, just to give some insight into why it might be
> done.
>
Never tried to insinuate that you were. :) All help and advice is
always welcome here as we try to learn and understand new things.

--
Thanking You,
Darshit Shah
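
P.S. To make the .tmp-and-rename(2) idea a bit more concrete, here is a
very rough, untested sketch (the names and error handling are purely
illustrative; this is not code from parallel-wget):

#include <stdio.h>
#include <unistd.h>

/* Each part of a multi-part download writes into its own region of the
 * shared, preallocated ".tmp" file.  pwrite() takes an explicit offset,
 * so concurrent writers do not fight over a shared file position. */
static int
write_part (int fd, const char *buf, size_t len, off_t offset)
{
  while (len > 0)
    {
      ssize_t n = pwrite (fd, buf, len, offset);
      if (n < 0)
        return -1;
      buf += n;
      len -= (size_t) n;
      offset += n;
    }
  return 0;
}

/* Once every part has been written, atomically move the temporary file
 * to its final name, so nothing ever sees a half-finished download
 * under the real file name. */
static int
finalize_download (const char *tmp_name, const char *final_name)
{
  return rename (tmp_name, final_name);
}

Resuming would then mostly be a matter of recording which regions of
the .tmp file have already been completed.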
