On Tue, Oct 19, 2004 at 02:49:09AM +0200, Wesley W. Terpstra wrote: > find -name testing.part.\* -print0 | xargs -0 parchive a -n 16384 testing.par
After taking a look in the source code for par, I found this in rs.c: |*| Calculations over a Galois Field, GF(8) What does that mean? It means there are only 2^8 possible values to evaluate the polynomial at. So, in fact, the above command will not even work, should it terminate. However, this is not a problem with RS codes, just par. I was also wrong about them using Berlekamp-Massey. They use Gaussian elimination to compute the inverse. This is complexity O(R^2N) which is even worse (see rs.c:214). R=number of input files/packets N=total number of files that were in the output (so N>=R)[B ie:N>=R=n -> O(n^3) vs. O(n^2) for Berlekamp-Massey or O(nlogn) for mine. OTOH, since they have at most 2^8 possible 'packets' (including the source), that keeps the complexity from killing them. --> 'only' 2^24 operations =) Also on the plus side, this is the complexity per number of packets, not the complexity of the data to be processed. Although I haven't checked, I would speculate from the simplicity of the code that they use the normal matrix product algorithm, which means (at best) O(LR) where L is the total length of the data, R is the number of files. As I mentioned, my code has an alphabet around 2^62. Actually, it's (2^31-1)^2-1 .. but that's almost the same. Handling potentially so many more packets means you need a new algorithm. Still, par is very cool and I will liberally lift usability from it. ... and, of course, use it for unfair comparisons in my paper. =) -- Wesley W. Terpstra