Re: [Numpy-discussion] Reading a big netcdf file

Jeff Whitaker Thu, 04 Aug 2011 08:53:11 -0700

On 8/4/11 4:46 AM, Kiko wrote:

Hi, all.


Thank you very much for your replies.

I am running into some issues when I use the netcdf4-python or scipy.io.netcdf libraries:


In [4]: import netCDF4 as n4
In [5]: from scipy.io import netcdf as nS
In [6]: import numpy as np
In [7]: gebco4 = n4.Dataset('GridOne.grd', 'r')
In [8]: gebcoS = nS.netcdf_file('GridOne.grd', 'r')

Now, if I do:

In [9]: z4 = gebco4.variables['z']

I got no problems and I have:

In [14]: type(z4); z4.shape; z4.size
Out[14]: <type 'netCDF4.Variable'>
Out[14]: (233312401,)
Out[14]: 233312401

But if I do:

In [15]: z4 = gebco4.variables['z'][:]
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>

  File "netCDF4.pyx", line 2466, in netCDF4.Variable.__getitem__ (netCDF4.c:22943)
  File "C:\Python26\lib\site-packages\netCDF4_utils.py", line 278, in _StartCountStride
    n = len(range(beg,end,inc))
MemoryError

I got a memory error.

Kiko: I think the difference may be that when you read the data with netcdf4-python, it tries to unpack the short integers to a float32 array, thereby using much more memory (more than you have available). scipy.io.netcdf is just returning you a numpy array of short integers. I bet if you do


gebco4.variables['z'].set_auto_maskandscale(False)

before reading the data from the gebco4 variable, it will work, since this turns off the auto conversion to float32.

You'll have to do the conversion manually then, at which point you may run out of memory anyway.
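
As a rough check of the memory argument, the sizes involved can be worked out from the xysize = 233312401 dimension in the ncdump -h output. This is only back-of-the-envelope arithmetic, assuming 2 bytes per short and 4 or 8 bytes per float; the helper name array_mib is made up for illustration, and any temporary copies made while unpacking are not counted.

```python
# Rough memory footprint of the 'z' variable at different dtypes.
N_VALUES = 233312401  # xysize from the ncdump -h output

# Assumed item sizes in bytes for each candidate dtype.
BYTES_PER_ITEM = {"int16 (short)": 2, "float32": 4, "float64": 8}


def array_mib(n_values, itemsize):
    """Return the size in MiB of n_values items of itemsize bytes each."""
    return n_values * itemsize / 2.0 ** 20


for name, itemsize in BYTES_PER_ITEM.items():
    print("%-14s %8.1f MiB" % (name, array_mib(N_VALUES, itemsize)))
```

That works out to roughly 445 MiB for the raw short integers and about twice that for float32, before counting any intermediate arrays.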

But if I select a smaller array, I get:

In [16]: z4 = gebco4.variables['z'][:10000000]
In [17]: type(z4); z4.shape; z4.size
Out[17]: <type 'numpy.ndarray'>
Out[17]: (10000000,)
Out[17]: 10000000

What's the difference between z4 as a netCDF4.Variable and as a numpy.ndarray?

The netcdf variable object just refers to the data in the file. Only when you slice the object is the data read in and converted to a numpy array.
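
Because slicing is what triggers the read, one way to stay within memory is to slice the variable in pieces and reduce each piece before reading the next. A minimal sketch of that idea follows. The function chunked_min_max and its chunk size are hypothetical, and a small NumPy array stands in for gebco4.variables['z'] so the snippet runs without the file. It assumes only that the variable has a shape and returns an array when sliced, as in the session output in this thread.

```python
import numpy as np


def chunked_min_max(var, chunk=10000000):
    """Scan a 1-D sliceable variable piece by piece; return (min, max).

    Only one chunk is held in memory at a time. `var` can be anything
    that has a `shape` and returns array data when sliced.
    """
    n = var.shape[0]
    lo, hi = None, None
    for start in range(0, n, chunk):
        # The actual read from the file would happen on this slice.
        block = np.asarray(var[start:start + chunk])
        bmin, bmax = block.min(), block.max()
        lo = bmin if lo is None else min(lo, bmin)
        hi = bmax if hi is None else max(hi, bmax)
    return lo, hi


# Stand-in for gebco4.variables['z']: short integers, as in the file.
fake_z = np.arange(-500, 500, dtype=np.int16)
print(chunked_min_max(fake_z, chunk=128))
```

With the real file the call would presumably be chunked_min_max(gebco4.variables['z']); that has not been tested here.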


-Jeff

Now, if I use scipy.io.netcdf:

In [18]: zS = gebcoS.variables['z']
In [20]: type(zS); zS.shape
Out[20]: <class 'scipy.io.netcdf.netcdf_variable'>
Out[20]: (233312401,)

In [21]: zS = gebcoS.variables['z'][:]
In [22]: type(zS); zS.shape
Out[22]: <type 'numpy.ndarray'>
Out[22]: (233312401,)

What's the difference between zS as a scipy.io.netcdf.netcdf_variable and as a numpy.ndarray?

Why do I not get a MemoryError with scipy.io.netcdf?

Finally, if I do the following (maybe it's a silly thing to do), using Eric's suggestions to clear the cache:

In [32]: zS = gebcoS.variables['z']
In [38]: timeit -n1 -r1 zSS = np.array(zS[:100000000]) # 100,000,000 out of 233,312,401 because I got a MemoryError
1 loops, best of 1: 73.1 s per loop

(If I use a copy, timeit -n1 -r1 zSS = np.array(zS[:100000000], copy=True), I get a MemoryError and I have to reduce the size to 50,000,000, but it's quite fast.)

Thank you very much for your replies, and excuse me if some questions are very basic.

Best regards.

***********************************************************************
The results of ncdump -h
netcdf GridOne {
dimensions:
        side = 2 ;
        xysize = 233312401 ;
variables:
        double x_range(side) ;
                x_range:units = "user_x_unit" ;
        double y_range(side) ;
                y_range:units = "user_y_unit" ;
        short z_range(side) ;
                z_range:units = "user_z_unit" ;
        double spacing(side) ;
        short dimension(side) ;
        short z(xysize) ;
                z:scale_factor = 1. ;
                z:add_offset = 0. ;
                z:node_offset = 0 ;

// global attributes:
                :title = "GEBCO One Minute Grid" ;
                :source = "1.02" ;
}
The file is publicly available from: http://www.gebco.net/data_and_products/gridded_bathymetry_data/
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
