Hi Carsten,
Ah, you googled for 2 seconds and found some old homepage.
Try this one instead: www.maximumcompression.com
The testing over there is far better. Note, though, that it is the same
test set that gets compressed over and over there.
In real life, database-type data has all kinds of patterns, which
PPM-type compressors find.
My experience is that at terabyte level the top-ranked compressors at
maximumcompression.com (PAQ, for example) are a bit too slow, and in
practice not as good a choice as simple things like 7-zip.
Look especially at compressed sizes and decompression times.
The only thing you want to limit on your network is the amount of
bandwidth used.
Really good compression is very helpful then. How long compression
takes is nearly irrelevant, as long as it doesn't take an infinite
amount of time (I remember a New Zealand compressor which took 24
hours to compress 100MB of data). Note that we are already at the
stage where compression time hardly matters: you can buy a GPU to
offload that work from your servers.
Query time (so decompression time) is important, though.
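A minimal sketch of how one could check sizes and decompression times
on one's own data, assuming Python and only its standard library. Here
zlib stands in for gzip/zip and lzma for the LZMA family that 7-zip
uses by default; PPMd and PAQ are not in the standard library, so they
are left out. The sample rows and the name compare_codecs are made up
for illustration.

```python
import bz2
import lzma
import time
import zlib

# Stdlib codecs only. zlib is the DEFLATE family (gzip/zip), lzma is the
# LZMA family. PPMd and PAQ are not available here and are not measured.
CODECS = {
    "zlib -9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compare_codecs(data):
    """Return {name: (compressed_size, decompress_seconds)} per codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        # Only decompression is timed, since that is the query-side cost.
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        if restored != data:
            raise ValueError(f"{name}: round trip failed")
        results[name] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    # Hypothetical sample: repetitive, database-like rows.
    sample = b"".join(
        b"id=%06d;status=OK;region=EU\n" % i for i in range(50000)
    )
    for name, (size, secs) in compare_codecs(sample).items():
        print(f"{name:8s} {size:9d} bytes  decompress {secs * 1000:7.2f} ms")
```

A few megabytes of synthetic rows says little about terabyte-scale
behaviour, so this is only useful when fed a realistic slice of the
real data.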
If we look at the graphics test there:
Rank  Program      Switches           Size (bytes)  Saved %  Bits/byte
026   7-Zip 4.60b  -m0=ppmd:o=4             764420    81.58     1.4738
..
094   BZIP2 1.0.5  -9                       890163    78.55     1.7162
..
158   PKZIP 2.50   -exx                    1250536    69.86     2.4110
159   HIT 2.10     -x                      1250601    69.86     2.4111
160   GZIP 1.3.5   -9                      1254351    69.77     2.4184
161   ZIP 2.2      -9                      1254444    69.77     2.4185
162   WINZIP 8.0   (Max Compression)       1254444    69.77     2.4185
Note that a real supercompressor gets it even smaller:
003   WinRK 3.0.3  PWCM 912MB               568919    86.29     1.0969
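For reference, the last two columns of these rows are tied together by
simple arithmetic, sketched below in Python. The original file size is
not stated in this mail, so the sketch estimates it from the first row;
reading the columns as "percent saved" and "bits per byte" is an
inference from the numbers, and the helper names are hypothetical.

```python
def saved_percent(original, compressed):
    """Percentage of the original size that compression removed."""
    return 100.0 * (1.0 - compressed / original)


def bits_per_byte(original, compressed):
    """Output bits spent per input byte (8.0 means no compression)."""
    return 8.0 * compressed / original


def estimate_original(compressed, saved_pct):
    """Back out the input size from one row (an estimate, rounded data)."""
    return compressed / (1.0 - saved_pct / 100.0)


# (program, compressed bytes, percent saved) copied from the rows above.
ROWS = [
    ("7-Zip ppmd", 764420, 81.58),
    ("BZIP2 -9", 890163, 78.55),
    ("GZIP -9", 1254351, 69.77),
    ("WinRK PWCM", 568919, 86.29),
]

if __name__ == "__main__":
    original = estimate_original(ROWS[0][1], ROWS[0][2])
    gzip_size = ROWS[2][1]
    for name, size, _ in ROWS:
        print(
            f"{name:11s} {bits_per_byte(original, size):.4f} bits/byte, "
            f"{100.0 * size / gzip_size:5.1f}% of the gzip traffic"
        )
```

The second printed figure is the one that matters for the network: it
is the share of bytes on the wire relative to gzip for the same input.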
Again, all these tests are at the micro level: just a few megabytes of
data get compressed.
You don't build a big infrastructure just for a few megabytes, so that
is not very relevant.
The traffic over your network dominates there. There are plenty of
idle server cores; in fact, many companies now buy dual cores because
they do not know how to keep the cores in a quad core busy.
This is all micro level. Things really change when you have terabytes
to compress and HUGE files.
Bzip2 is painfully slow for gigabyte-sized files; 7-zip totally beats
it there.
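For files that size, the compressor has to run as a stream so memory
use stays flat. A minimal sketch, assuming Python's standard lzma
module as a stand-in for 7-zip's LZMA; the 1 MiB chunk size and the
name compress_stream are arbitrary choices for illustration, not
anything 7-zip itself does.

```python
import io
import lzma

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def compress_stream(src, dst, chunk_size=CHUNK):
    """Compress file-like src into file-like dst in fixed-size chunks.

    Returns (bytes_read, bytes_written). Only one chunk of input is held
    in memory at a time, regardless of the total file size.
    """
    compressor = lzma.LZMACompressor()
    total_in = total_out = 0
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        total_in += len(block)
        out = compressor.compress(block)
        dst.write(out)
        total_out += len(out)
    # flush() emits whatever the compressor still buffers internally.
    tail = compressor.flush()
    dst.write(tail)
    total_out += len(tail)
    return total_in, total_out


if __name__ == "__main__":
    # Hypothetical in-memory input; real use would pass open binary files.
    data = b"row,value\n" * 300000
    src, dst = io.BytesIO(data), io.BytesIO()
    n_in, n_out = compress_stream(src, dst)
    print(f"{n_in} bytes in, {n_out} bytes out")
```

With real files, src and dst would be opened with open(path, "rb") and
open(path, "wb"); timing this against bz2.BZ2Compressor on the same
input would be one way to test the bzip2 claim on your own hardware.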
Vincent
On Oct 3, 2008, at 11:27 AM, Carsten Aulbert wrote:
Hi all
Bill Broadley wrote:
Another example:
http://bbs.archlinux.org/viewtopic.php?t=11670
7zip compress: 19:41
Bzip2 compress: 8:56
Gzip compress: 3:00
Again 7zip is a factor of 6 and change slower than gzip.
Have you looked into threaded/parallel bzip2?
freshmeat has a few of those, e.g.
http://freshmeat.net/projects/bzip2smp/
http://freshmeat.net/projects/lbzip2/
(with the usual disclaimer that I haven't tested them myself).
HTH
carsten
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf