Hi Carsten,

Ah, you googled for two seconds and found some old homepage.

Try this homepage: www.maximumcompression.com

The testing over there is far better. Note, though, that it's the same test set that gets compressed every time. In real life, database-type data has all kinds of patterns which PPM-type compressors find.

My experience is that at the terabyte level the better compressors at maximumcompression.com are a bit too slow (PAQ), and you are better served by simple things like 7-zip.

Look especially at compressed sizes and decompression times.

The only thing you want to limit is the amount of bandwidth over your network, and really good compression is very helpful there. How long compression takes is nearly irrelevant, as long as it doesn't take an infinite amount of time (I remember a New Zealand compressor which took 24 hours to compress 100 MB of data). Note that we are already at the point where compression time hardly matters: you can buy a GPU to offload that work from your servers.

Query time (so decompression time) is important, though.
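
To make the trade-off concrete, here is a minimal sketch (my own illustration, nothing from the site; it assumes a test file called sample.bin) that times compression and decompression separately for the three codec families in this thread, using Python's standard-library bindings:

import bz2
import lzma
import time
import zlib

data = open("sample.bin", "rb").read()  # hypothetical test file

for name, comp, decomp in [
    ("zlib (gzip)", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("bzip2", lambda d: bz2.compress(d, 9), bz2.decompress),
    ("lzma (7-zip)", lzma.compress, lzma.decompress),
]:
    t0 = time.perf_counter()
    packed = comp(data)           # time the compression pass
    t1 = time.perf_counter()
    decomp(packed)                # time the decompression pass
    t2 = time.perf_counter()
    print(f"{name:13s} size={len(packed):10d} "
          f"compress={t1 - t0:6.2f}s decompress={t2 - t1:6.2f}s")

On most data you would expect LZMA to win on size and beat bzip2 on decompression speed while costing the most compression time, which is exactly the trade-off that is acceptable here.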

If we look at the numbers there:


Pos  Program       Switches             Size     %      bpb
026  7-Zip 4.60b   -m0=ppmd:o=4         764420   81.58  1.4738
..
094  BZIP2 1.0.5   -9                   890163   78.55  1.7162
..
158  PKZIP 2.50    -exx                 1250536  69.86  2.4110
159  HIT 2.10      -x                   1250601  69.86  2.4111
160  GZIP 1.3.5    -9                   1254351  69.77  2.4184
161  ZIP 2.2       -9                   1254444  69.77  2.4185
162  WINZIP 8.0    (Max Compression)    1254444  69.77  2.4185
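
(The bits-per-byte column is just the reduction restated: gzip's 2.4184 bits per original byte means 1 - 2.4184/8 ≈ 69.77% reduction, and 7-Zip's PPMd mode at 1.4738 bits/byte gives 1 - 1.4738/8 ≈ 81.58%.)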

Note that a real supercompressor gets it even smaller:
003  WinRK 3.0.3   PWCM 912MB           568919   86.29  1.0969

Again, all these tests are at the micro level: just a few megabytes of data get compressed. You don't build a big infrastructure for a few megabytes, so it's not that relevant.

The traffic over your network dominates there, and there are plenty of idle server cores. In fact, many companies now buy dual cores because they do not know how to keep the cores in quad cores busy; compression is an obvious way to put those cores to work (see the sketch below).
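
As an aside, the threaded bzip2 idea Carsten mentions below is easy to sketch, because bzip2 compresses independent blocks and concatenated .bz2 streams still decompress as one file: split the input, compress the chunks on separate cores, concatenate the output. A toy version (my own function names and chunk size, not code from bzip2smp or lbzip2):

import bz2
from multiprocessing import Pool

CHUNK = 8 * 1024 * 1024  # 8 MB per task, an arbitrary choice

def compress_chunk(block: bytes) -> bytes:
    # each chunk becomes a complete, standalone bzip2 stream
    return bz2.compress(block, 9)

def parallel_bzip2(src: str, dst: str, workers: int = 4) -> None:
    def chunks():
        with open(src, "rb") as f:
            while block := f.read(CHUNK):
                yield block
    with Pool(workers) as pool, open(dst, "wb") as out:
        # imap preserves chunk order while workers run in parallel
        for stream in pool.imap(compress_chunk, chunks()):
            out.write(stream)

if __name__ == "__main__":
    parallel_bzip2("big.dump", "big.dump.bz2")  # hypothetical file

The multi-stream output decompresses normally with bunzip2 or Python's bz2.open(); the tools linked below implement the same idea properly, buffering and splitting included.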

This is all micro level. Things really change when you have terabytes to compress and HUGE files: bzip2 is painfully slow on gigabyte-sized files, and 7-zip totally beats it there.

Vincent


On Oct 3, 2008, at 11:27 AM, Carsten Aulbert wrote:

Hi all

Bill Broadley wrote:

Another example:
http://bbs.archlinux.org/viewtopic.php?t=11670

7zip compress: 19:41
Bzip2 compress:  8:56
Gzip compress:  3:00

Again 7zip is a factor of 6 and change slower than gzip.

Have you looked into threaded/parallel bzip2?

Freshmeat has a few of those, e.g.

http://freshmeat.net/projects/bzip2smp/
http://freshmeat.net/projects/lbzip2/

(with the usual disclaimer that I haven't tested them myself).

HTH

carsten
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
