The question is, Joe,

Why are you storing it uncompressed?

Vincent

On Oct 3, 2008, at 5:45 PM, Joe Landman wrote:

Carsten Aulbert wrote:

If 7-zip can only compress data at a rate of less than, say, 5 MB/s (input data), I can copy the data over uncompressed much faster, regardless of how many unused cores I have in the system. Exactly for these cases I would like to use all available cores to compress the data quickly, in order
to increase the throughput.

This is fundamentally the issue. If the compression time plus the transmit time for the compressed data is greater than the transmit time for the uncompressed data, then the compression may not be worth it. Sure, if it is nothing but text files, you may get 60-80+% compression ratios. But for the case of (non-pathological) binary data, it might be only a few percent. So in this case, even if you could get a few percent delta from the compression, is that worth all the extra time you spend to get it?
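
Put as a back-of-the-envelope inequality (my shorthand, not a precise model), compression only pays off when

        t_compress + S_compressed/B_net  <  S_uncompressed/B_net

where B_net is the effective link bandwidth; if the compressor cannot keep ahead of the wire, the left-hand side loses.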

At the end of the day, the question is: how much lossless compression can you do in a short enough time for it to be meaningful in terms of transmitting the data?

Do I miss something vital?

Nope.  You got it nailed.

Several months ago, I tried moving about 600 GB of data from an old server to a JackRabbit. The old server and the JackRabbit had a gigabit link between them. We regularly saw 45 MB/s scp rates (one of the chips in the older server was a Broadcom).

I tried this with and without compression. With compression (simple gzip), the copy took something like 28 hours (a little more than a day). Without compression, it was well under 10 hours.

In this case, compression (gzip) was not worth it. The commands I used for the test were:

uncompressed:

        cd /directory
        tar -cpf - ./ | ssh jackrabbit "cd /directory ; tar -xpvf - "

compressed:

        cd /directory
        tar -czpf - ./ | ssh jackrabbit "cd /directory ; tar -xzpvf - "

If you want to spend even more time, use "j" (bzip2) rather than "z" (gzip) in the options.

YMMV, but I have been convinced that, apart from specific use cases with text-only documents or documents known to compress quickly and well, compression prior to transfer may waste more time than it saves.

This said, if someone has a parallel hack of gzip or similar we can pipe through, by all means, I would be happy to try it. But it would have to be pretty darned efficient.
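
For what it's worth, here is a sketch of what such a pipeline might look like with pigz (a parallel gzip work-alike) in place of the "z" option. This is untested in the setup above and assumes pigz is installed on both ends:

        cd /directory
        tar -cpf - ./ | pigz -p 8 | ssh jackrabbit "cd /directory ; pigz -d | tar -xpf - "

The "-p 8" is just a placeholder for however many cores you are willing to burn on the sending side.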

100 MB/s means 1 byte transmitted, on average, every 10 nanoseconds, which means that for compression to be meaningful, you would need to compute for less time than that per byte to increase the information density. Put another way, 1 MB takes about 10 ms to send over a gigabit link. For compression to be meaningful, you need to compress this 1 MB in far less than 10 ms and still transmit it within that window. Assuming that any compression algorithm has to walk through the data at least once, a 1 GB/s memory subsystem takes about 1 ms for that single pass, so you need as few passes as possible through the data set to construct the compressed representation, as you will still have on the order of 1E+5 bytes to send.
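
Spelling out that arithmetic with the same round numbers:

        1 gigabit link  ~ 1E+8 bytes/s effective  ->  1 byte every 10 ns
        1 MB / 1E+8 B/s = 10 ms  to transmit uncompressed
        1 MB / 1E+9 B/s =  1 ms  for a single pass over that 1 MB in memory

so a single pass through memory already eats about a tenth of the transmit window before any real compression work is done.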

I am not saying it is hopeless, just hard for complex compression schemes (bzip2, etc.). When we get enough firepower in the CPU (or maybe GPU ... hmmmm) the situation may improve.

GPU as a compression engine?  Interesting ...

Joe

Cheers
Carsten

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: [EMAIL PROTECTED]
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
