On Tuesday 20 November 2007, Geoffrey Zhu wrote:
> Hi Everyone,
>
> This is off topic for this mailing list but I don't know where else
> to ask.
>
> I have N tabulated data points { (x_i, y_i, z_i) } that describes a
> 3D surface. The surface is pretty "smooth." However, the number of
> data points is too large to be stored and manipulated efficiently. To
> make it easier to deal with, I am looking for an easy method to
> compress and approximate the data. Maybe the approximation can be
> described by far fewer number of coefficients.
>
> If you can give me some hints about possible numpy or non-numpy
> solutions or let me know where is better to ask this kind of
> question, I would really appreciate it.

A good and easy first attempt would be to use PyTables. It supports
on-the-fly compression, that is, it lets you access slices of a
compressed dataset without decompressing the complete dataset. Combined
with the handy 'shuffle' filter (also included), this allows for pretty
good compression ratios on numerical data. See [1] and [2] for a
discussion of how to use these features and what you can expect from a
compressor/shuffle process in PyTables.

Also, if you can afford lossy compression, you may want to try
truncation (quantization) before compressing, as it improves the
compression ratio quite a lot. Feel free to experiment with the
following function (Jeffrey Whittaker was the original author):

    import math
    import numpy

    def _quantize(data, least_significant_digit):
        """quantize data to improve compression.

        data is quantized using around(scale*data)/scale, where scale
        is 2**bits, and bits is determined from the
        least_significant_digit. For example, if
        least_significant_digit=1, bits will be 4."""
        # Decimal precision to preserve, e.g. 0.1 for one digit.
        precision = 10.**-least_significant_digit
        exp = math.log(precision, 10)
        # Round the exponent away from zero to a whole number.
        if exp < 0:
            exp = int(math.floor(exp))
        else:
            exp = int(math.ceil(exp))
        # Smallest power of two that resolves the requested precision.
        bits = math.ceil(math.log(10.**-exp, 2))
        scale = 2.**bits
        return numpy.around(scale*data)/scale

[1] http://www.pytables.org/docs/manual/ch05.html#compressionIssues
[2] http://www.pytables.org/docs/manual/ch05.html#ShufflingOptim

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
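
To get a rough feel for what quantization and shuffling buy you without
installing anything beyond numpy, here is a minimal sketch. It is only an
illustration under several assumptions: the sin/cos surface is made-up
test data, zlib from the standard library stands in for the compressor
PyTables would apply, byte_shuffle is a hypothetical helper that only
imitates the idea behind the shuffle filter (it is not PyTables' own
implementation), and quantize is a condensed variant of the function in
the message, not a drop-in replacement for it.

```python
import math
import zlib

import numpy as np


def quantize(data, least_significant_digit):
    """Round data onto a power-of-two grid that keeps the requested decimal digit."""
    # Condensed form of the bits computation used by _quantize.
    bits = math.ceil(math.log2(10.0 ** least_significant_digit))
    scale = 2.0 ** bits
    return np.around(scale * data) / scale


def byte_shuffle(arr):
    """Hypothetical helper: group the k-th byte of every element together."""
    raw = np.ascontiguousarray(arr).view(np.uint8).reshape(-1, arr.itemsize)
    # Transposing puts all first bytes first, then all second bytes, and so on.
    return raw.T.tobytes()


# Made-up smooth surface sampled on a 200 x 200 grid (float64).
x, y = np.meshgrid(np.linspace(0.0, 1.0, 200), np.linspace(0.0, 1.0, 200))
z = np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y)

zq = quantize(z, 3)  # keep roughly three decimal digits

sizes = {
    "raw": len(zlib.compress(z.tobytes())),
    "shuffled": len(zlib.compress(byte_shuffle(z))),
    "quantized": len(zlib.compress(zq.tobytes())),
    "quantized + shuffled": len(zlib.compress(byte_shuffle(zq))),
}
for name, size in sizes.items():
    print(f"{name:>22}: {size} bytes ({z.nbytes / size:.1f}x)")

max_error = float(np.abs(z - zq).max())
print(f"max abs error after quantization: {max_error:.2e}")
```

The exact numbers depend on the data and the zlib version, so none are
quoted here. The point is only that the quantized array, whose low-order
mantissa bits are all zero, should compress far better than the raw one,
while the error stays below half a quantization step (0.5/1024 for three
digits).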