On 8/4/11 4:46 AM, Kiko wrote:
Hi, all.
Thank you very much for your replies.
I am running into some issues when I use the netcdf4-python or scipy.io.netcdf
libraries:
In [4]: import netCDF4 as n4
In [5]: from scipy.io import netcdf as nS
In [6]: import numpy as np
In [7]: gebco4 = n4.Dataset('GridOne.grd', 'r')
In [8]: gebcoS = nS.netcdf_file('GridOne.grd', 'r')
Now, if I do:
In [9]: z4 = gebco4.variables['z']
I have no problems, and I get:
In [14]: type(z4); z4.shape; z4.size
Out[14]: <type 'netCDF4.Variable'>
Out[14]: (233312401,)
Out[14]: 233312401
But if I do:
In [15]: z4 = gebco4.variables['z'][:]
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "netCDF4.pyx", line 2466, in netCDF4.Variable.__getitem__ (netCDF4.c:22943)
  File "C:\Python26\lib\site-packages\netCDF4_utils.py", line 278, in _StartCountStride
    n = len(range(beg,end,inc))
MemoryError
I got a memory error.
Kiko: I think the difference may be that when you read the data with
netcdf4-python, it tries to unpack the short integers to a float32
array, thereby using much more memory (more than you have available).
scipy.io.netcdf is just returning you a numpy array of short integers.
I bet if you do
gebco4.variables['z'].set_auto_maskandscale(False)
before reading the data from the gebco4 variable, it will work, since
this turns off the auto conversion to float32.
You'll have to do the conversion manually then, at which point you
may run out of memory anyway.
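A minimal sketch of the manual route, assuming the packed short integers have already been read with auto-scaling turned off: unpack them chunk by chunk so that only one chunk of temporary floats exists at a time, instead of a float copy of the whole array. The helper name `unpack_in_chunks` and the chunk size are hypothetical, the int16 array is synthetic, and the formula unpacked = packed * scale_factor + add_offset uses the `scale_factor` and `add_offset` attributes from the ncdump header of this file. Note that the float32 result for all 233,312,401 values still needs roughly 890 MiB on its own, so this may not be enough.

```python
import numpy as np


def unpack_in_chunks(packed, scale_factor=1.0, add_offset=0.0, chunk=1_000_000):
    """Hypothetical helper: unpack int16 data to float32 one chunk at a time.

    Besides the float32 output, only one chunk-sized temporary is held in
    memory, rather than a float copy of the entire array.
    """
    out = np.empty(packed.shape, dtype=np.float32)
    for start in range(0, packed.size, chunk):
        stop = min(start + chunk, packed.size)
        # unpacked = packed * scale_factor + add_offset, cast into float32
        out[start:stop] = packed[start:stop] * scale_factor + add_offset
    return out


# Synthetic stand-in for the packed GEBCO 'z' variable (short integers).
# With a real file this would be e.g. a slice of a netCDF variable read
# after disabling auto mask-and-scale (an assumption, not shown here).
z_packed = np.arange(-5000, 5000, dtype=np.int16)
z = unpack_in_chunks(z_packed, scale_factor=1.0, add_offset=0.0, chunk=3000)
print(z.dtype, z.shape)  # float32 (10000,)
```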
But if I select a smaller slice, I get:
In [16]: z4 = gebco4.variables['z'][:10000000]
In [17]: type(z4); z4.shape; z4.size
Out[17]: <type 'numpy.ndarray'>
Out[17]: (10000000,)
Out[17]: 10000000
What's the difference between z4 as a netCDF4.Variable and as a
numpy.ndarray?
The netCDF variable object just refers to the data in the file; only
when you slice the object is the data read in and converted to a numpy
array.
-Jeff
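To illustrate the distinction Jeff draws between a file-backed variable and the array you get by slicing it, here is a toy sketch. `LazyVariable` and `fake_reader` are hypothetical names, not netCDF4's implementation: the object keeps only metadata, and data are fetched (here generated) only when it is sliced.

```python
import numpy as np


class LazyVariable:
    """Hypothetical stand-in for a file-backed variable such as netCDF4.Variable.

    Only metadata (shape, dtype) is stored; data are produced by `reader`
    when the object is sliced. Supports 1-D slices with positive steps.
    """

    def __init__(self, reader, shape, dtype):
        self._reader = reader        # callable(start, stop) -> ndarray
        self.shape = shape
        self.dtype = np.dtype(dtype)
        self.reads = 0               # how many times data were actually fetched

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        start, stop, step = key.indices(self.shape[0])
        if step < 0:
            raise ValueError("this sketch only supports positive steps")
        self.reads += 1
        return self._reader(start, stop)[::step]


def fake_reader(start, stop):
    # Stand-in for reading elements [start, stop) from disk.
    return np.arange(start, stop, dtype=np.int16)


var = LazyVariable(fake_reader, shape=(30000,), dtype=np.int16)
print(var.reads)       # 0: creating the object read nothing
chunk = var[:10000]    # slicing triggers the read and returns a plain ndarray
print(type(chunk).__name__, chunk.shape, var.reads)  # ndarray (10000,) 1
```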
Now, if I use scipy.io.netcdf:
In [18]: zS = gebcoS.variables['z']
In [20]: type(zS); zS.shape
Out[20]: <class 'scipy.io.netcdf.netcdf_variable'>
Out[20]: (233312401,)
In [21]: zS = gebcoS.variables['z'][:]
In [22]: type(zS); zS.shape
Out[22]: <type 'numpy.ndarray'>
Out[22]: (233312401,)
What's the difference between zS as a scipy.io.netcdf.netcdf_variable
and as a numpy.ndarray?
Why don't I get a MemoryError with scipy.io.netcdf?
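Jeff's answer is that scipy.io.netcdf returns the short integers as they are, with no float conversion. A sketch of what such an array can look like, using `np.memmap` on a small synthetic file: slicing with `[:]` yields an int16 view onto the mapped file rather than a new float array. This is not scipy's actual code path, and whether `scipy.io.netcdf_file` (which has an `mmap` argument) was memory-mapping GridOne.grd here is an assumption.

```python
import os
import tempfile

import numpy as np

# Write a small int16 file to stand in for the packed 'z' data.
path = os.path.join(tempfile.mkdtemp(), "z_int16.bin")
np.arange(1000, dtype=np.int16).tofile(path)

# Map the file instead of reading it all into RAM.
mm = np.memmap(path, dtype=np.int16, mode="r", shape=(1000,))

view = mm[:]                       # another view on the same mapping, still int16
print(view.dtype)                  # int16
print(np.shares_memory(view, mm))  # True: no copy was made
```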
Finally, if I do the following (maybe it's a silly thing to do),
using Eric's suggestion to clear the cache:
In [32]: zS = gebcoS.variables['z']
In [38]: timeit -n1 -r1 zSS = np.array(zS[:100000000]) # only 100,000,000 of the 233,312,401 values, because the full array gives a MemoryError
1 loops, best of 1: 73.1 s per loop
(If I make an explicit copy, timeit -n1 -r1 zSS = np.array(zS[:100000000],
copy=True), I get a MemoryError and have to reduce the size to
50,000,000, but then it is quite fast.)
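For reference, `np.array` already copies its input by default (`copy=True`), so `np.array(zS[:N])` and `np.array(zS[:N], copy=True)` both ask for a new buffer, while `np.asarray` returns an existing ndarray unchanged. A small sketch on a synthetic int16 array (the array and sizes are illustrative), followed by a chunked reduction, one way to work over a large array without materializing a full copy:

```python
import numpy as np

z = np.zeros(1_000_000, dtype=np.int16)  # synthetic stand-in for zS

a = np.array(z[:500_000])              # default copy=True: new buffer
b = np.array(z[:500_000], copy=True)   # explicit, same behaviour
c = np.asarray(z[:500_000])            # no copy: still a view onto z

print(np.shares_memory(a, z), np.shares_memory(b, z), np.shares_memory(c, z))
# False False True

# Processing fixed-size chunks keeps peak memory bounded, e.g. a maximum:
chunk = 100_000
running_max = max(int(z[i:i + chunk].max()) for i in range(0, z.size, chunk))
print(running_max)  # 0
```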
Thank you very much for your replies, and excuse me if some of these
questions are very basic.
Best regards.
***********************************************************************
The results of ncdump -h
netcdf GridOne {
dimensions:
	side = 2 ;
	xysize = 233312401 ;
variables:
	double x_range(side) ;
		x_range:units = "user_x_unit" ;
	double y_range(side) ;
		y_range:units = "user_y_unit" ;
	short z_range(side) ;
		z_range:units = "user_z_unit" ;
	double spacing(side) ;
	short dimension(side) ;
	short z(xysize) ;
		z:scale_factor = 1. ;
		z:add_offset = 0. ;
		z:node_offset = 0 ;

// global attributes:
		:title = "GEBCO One Minute Grid" ;
		:source = "1.02" ;
}
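From this header, `z` holds xysize = 233312401 short (2-byte) values. A back-of-the-envelope sketch of what one full in-memory copy needs at each dtype (ignoring any temporaries) shows why unpacking the whole variable to floats is much heavier than keeping it as shorts. Whether that exceeds the limit on the machine in the traceback (for example, a 32-bit Python) is an assumption the thread does not state.

```python
import numpy as np

XYSIZE = 233312401  # from the ncdump header above

# Bytes needed for one full copy of z at each dtype.
sizes = {np.dtype(t).name: XYSIZE * np.dtype(t).itemsize
         for t in (np.int16, np.float32, np.float64)}

for name, nbytes in sizes.items():
    print(f"{name:>8}: {nbytes:,} bytes (~{nbytes / 2**20:,.0f} MiB)")
# int16:     466,624,802 bytes (~445 MiB)
# float32:   933,249,604 bytes (~890 MiB)
# float64: 1,866,499,208 bytes (~1,780 MiB)
```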
The file is publicly available from:
http://www.gebco.net/data_and_products/gridded_bathymetry_data/
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion