On Mon, Oct 18, 2004 at 11:05:16PM -0400, Glenn Maynard wrote:
> > find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par
> You're splitting into parts which are far too small.

Yes, it's too small for par. It's also clearly too small for usenet use.
However, my program is not intended to solve usenet problems; my
applications are all on the level of packet-switching networks. Besides,
who ever complains when something is faster than they need? Or, as my
first computer science prof once said: insertion sort is fine for most
tasks; sometimes you need quicksort.

> It's not designed for thousands of tiny parts

No, but mine is.

> Most PAR operations are IO-bound (judging by drive thrashing, not from
> actual benchmarks).

Not to be rude, but you're mistaken here. strace it when there are many
small files: it spends almost no time in syscalls (strace -c gives a
quick summary). Disk IO and/or thrashing is not the issue for small
files. Maybe disk thrashing is a problem during normal par operation,
but it is a minor problem compared to the computation (for my goals).

[ As an aside, my algorithm is also streaming: it reads the 'file' in
sequence three times, so disk thrashing should not be a problem. ]

> I don't really understand the use of allowing thousands of tiny parts.
> What's the intended end use?

Note that PAR cannot help you if the unit of failure is very small,
because even one missing piece of a 'part' makes the whole 'part'
useless. For example, with 1MB parts on a network that loses individual
1kB packets, a single dropped packet costs you the entire 1MB part.

Florian already mentioned multicast, and that is my first application.

Another situation is any one-way network link (some crazy firewalls
[my work; arg!!]). Future (?) wireless networks might have a base
station with a larger range than the clients; clients could still
download (without ACKs) in this case.

Perhaps your ISP has packet loss that sometimes sits at 20% (my home;
arg!). If you know how TCP works, you will also know that it will nearly
stop sending, because it thinks the network is congested, even though
the real problem is a faulty switch which drops 20% of packets seemingly
at random. Using my code over UDP completely removes this problem.
(However, this is dangerous, because my code is also 'unfair': it does
not care about packet loss due to congestion, so it will starve all TCP
traffic, which does back off.)
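To make the UDP point concrete, here is a toy sketch of the shape of
such a sender (this is NOT my code: one XOR parity packet per group can
only repair a single loss per group, where a real erasure code does far
better; the group size, packet format, address, and filename are all
invented for the example):

import socket
import struct
import time

K = 8                        # data packets per parity group (made up)
PAYLOAD = 1024               # file bytes per packet (made up)
DEST = ("127.0.0.1", 9999)   # placeholder receiver address

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def send_file(path, rate_pps=100):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    parity = bytes(PAYLOAD)  # running XOR of the current group
    seq = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(PAYLOAD)
            if not chunk:
                break
            chunk = chunk.ljust(PAYLOAD, b"\0")
            # header: 4-byte sequence number, 1-byte flag (0 = data)
            sock.sendto(struct.pack("!IB", seq, 0) + chunk, DEST)
            parity = xor(parity, chunk)
            seq += 1
            if seq % K == 0:
                # flag 1 = parity over the previous K data packets
                sock.sendto(struct.pack("!IB", seq, 1) + parity, DEST)
                parity = bytes(PAYLOAD)
            # fixed send rate: no ACKs, no retransmits, no backoff
            time.sleep(1.0 / rate_pps)
    if seq % K:
        # flush parity for a final partial group
        sock.sendto(struct.pack("!IB", seq, 1) + parity, DEST)

send_file("testing.dat")     # placeholder filename

The thing to notice is what is absent: there are no ACKs and no backoff,
so a switch randomly dropping 20% of packets merely costs you roughly
20% of throughput instead of stalling the connection; and, equally,
nothing here will ever yield to a competing TCP flow.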
You might also use it to make a version of bittorrent where each packet
is independent of the others. This would help prevent 'dead' torrents,
where there is no seed and all downloads stall because the known
information overlaps.

Another case might be mobile agents, where PDAs exchange parts of files
they are looking for whenever they run into other PDAs they can bargain
with (like bittorrent). However, PDAs move when their owners move, so
network sessions are interrupted at random times, and one PDA may never
see the other again. This scheme would let a PDA broadcast a file to all
nearby PDAs, which could make use of the information regardless of when
they leave (mid-'part'?) or whether they already have pieces of the
file.

Another situation I would like to apply my code to is sensor networks,
where there is a stream of measurements of some variable. My code cannot
presently handle this correctly, but that is future work for me.

I am not a very imaginative person; I am sure there are many other
situations where this could be applied. From another point of view,
research doesn't *need* to be practical. ;)

If other people have ideas, I'd like to hear them.

-- 
Wesley W. Terpstra