Hi Chao,

by mistake I did not reply to the list last time...
On 27.06.2011, at 10:30PM, Chao YUE wrote:

> Hi Derek!
>
> I tried with the latest version of the python(x,y) package, with numpy
> version 1.6.0. I gave you the data with reduced columns (10 columns) and rows.
>
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,usecols=tuple(range(10)),dtype=['S10'] + [ float for n in range(9)])
> works.
> If you change usecols=tuple(range(10)) to usecols=range(10), it still works.
>
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,dtype=None)
> works.
>
> but
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,dtype=['S10'] + [ float for n in range(9)])
> didn't work.
>
> I use Python(x,y)-2.6.6.1 with numpy version 1.6.0, on a 32-bit Windows system.
>
> Please don't spend too much time on this if it's not a potential problem.

OK, dtype=None works on 1.6.0, that's the important bit.

From your example file it seems the dtype list does not work without
specifying usecols, because your header contains an excess semicolon in the
field "Air temperature (High; HMP45C)", thus genfromtxt expects more data
columns than actually exist. If you replace the semicolon you should be set
(or, if I may suggest, write another header line with catchier field names so
you don't have to work with array fields like
"b['Water vapor density by LiCor 7500']" ;-).
Otherwise both options work for me with python2.6+numpy-1.5.1 as well as
1.6.0/1.6.1rc1.

I am curious though why your python interpreter gave this error message:

> ValueError                                Traceback (most recent call last)
>
> D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in <module>()
>
> C:\Python26\lib\site-packages\numpy\lib\npyio.pyc in genfromtxt(fname, dtype,
> comments, delimiter, skiprows, skip_header, skip_footer, converters, missing,
> missing_values, filling_values, usecols, names, excludelist, deletechars,
> replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose,
> invalid_raise)
>    1449                     # Raise an exception ?
>    1450                     if invalid_raise:
> -> 1451                         raise ValueError(errmsg)
>    1452                     # Issue a warning ?
>    1453                     else:
>
> ValueError

since ipython2.6 on my Mac reported this:

...
   1450                     if invalid_raise:
-> 1451                         raise ValueError(errmsg)
   1452                     # Issue a warning ?
   1453                     else:

ValueError: Some errors were detected !
    Line #3 (got 10 columns instead of 11)
    Line #4 (got 10 columns instead of 11)
etc....

which of course provided the right lead to the problem - was the actual errmsg
really missing, or did you cut the message too soon?

> The final thing is, when I try to do this (I want to try the missing_values
> in numpy 1.6.0), it gives an error:
>
> In [33]: import StringIO as StringIO
>
> In [34]: data = "1, 2, 3\n4, 5, 6"
>
> In [35]: np.genfromtxt(StringIO(data),
> delimiter=",",dtype="int,int,int",missing_values=2)
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
>
> D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in <module>()
>
> TypeError: 'module' object is not callable

You want to use "from StringIO import StringIO" (or write
"StringIO.StringIO(data)"). But again, this will not work the way you expect
it to with int/float numbers set as missing_values when reading into regular
arrays. I've tested this on 1.6.1 and the current development branch as well,
and the missing_values are only considered for masked arrays.
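To spell this out with your own toy example - just a minimal sketch, assuming
Python 2 (hence the StringIO module) and numpy 1.6.x, so the exact output
formatting may differ a bit between versions:

    import numpy as np
    from StringIO import StringIO   # import the class, not just the module

    data = "1, 2, 3\n4, 5, 6"

    # Regular ndarray output: missing_values is effectively ignored here,
    # the 2 should be read back as an ordinary integer.
    a = np.genfromtxt(StringIO(data), delimiter=",",
                      dtype="int,int,int", missing_values=2)

    # Masked-array output: the same missing_values setting now takes effect,
    # so the entry equal to 2 should come back masked.
    # (np.mafromtxt is the shorthand for genfromtxt(..., usemask=True).)
    m = np.genfromtxt(StringIO(data), delimiter=",",
                      dtype="int,int,int", missing_values=2, usemask=True)

    print(a)   # plain structured array, the 2 survives unchanged
    print(m)   # masked array, the 2 shows up as a masked entry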
This is not likely to change soon, and may actually be intentional, so to
process those numbers on read-in, your best option would be to define a custom
set of converters, "converters=conv", as shown in my last mail (see also the
short sketch at the very end of this mail).

Cheers,
        Derek

> 2011/6/27 Derek Homeier <de...@astro.physik.uni-goettingen.de>
> Hi Chao,
>
> this seems to have become quite a number of different issues!
> But let's make sure I understand what's going on...
>
> > Thanks very much for your quick reply. I'll make a short summary of what
> > I've tried. Actually the ['S10'] + [ float for n in range(48) ] only works
> > when you explicitly specify the columns to be read; genfromtxt cannot
> > automatically determine the type if you don't specify the dtype....
> >
> > In [164]:
> > b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=tuple(range(49)),dtype=['S10'] + [ float for n in range(48)])
> ...
> > But if I use the following, it gives an error:
> >
> > In [171]:
> > b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,dtype=['S10'] + [ float for n in range(48)])
> > ---------------------------------------------------------------------------
> > ValueError                                Traceback (most recent call last)
>
> And the above (without the usecols) did work if you explicitly typed
> dtype=('S10', float, float....)? That by itself would be quite weird, because
> the two should be completely equivalent.
> What happens if you cast the generated list to a tuple -
> dtype=tuple(['S10'] + [ float for n in range(48)])?
> If you are using a recent numpy version (1.6.0 or 1.6.1rc1), could you please
> file a bug report with complete machine info etc.? But I suspect this might
> be an older version; you should also be able to simply use
> 'usecols=range(49)' (without the tuple()). Either way, I cannot reproduce
> this behaviour with the current numpy version.
>
> > If I don't specify the dtype, it will not recognize the type of the first
> > column (it displays as nan):
> >
> > In [172]:
> > b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2))
> >
> > In [173]: b
> > Out[173]:
> > array([(nan, -999.0, -1.028), (nan, -999.0, -0.40899999999999997),
> >        (nan, -999.0, 0.16700000000000001), ..., (nan, -999.0, -999.0),
> >        (nan, -999.0, -999.0), (nan, -999.0, -999.0)],
> >       dtype=[('TIMESTAMP', '<f8'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])
>
> You _do_ have to specify 'dtype=None', since the default is 'dtype=float', as
> I have remarked in my previous mail. If this does not work, it could be a
> matter of the numpy version again - there were a number of type conversion
> issues fixed between 1.5.1 and 1.6.0.
>
> > Then the final question is: actually the '-999.0' in the data is a missing
> > value, but I cannot display it as 'nan' by specifying the missing_values as
> > '-999.0'; whether I set the missing_values to -999.0 or use a dictionary,
> > neither works...
> ...
> >
> > Even this doesn't work (suppose 2 is our missing_value):
> > In [184]: data = "1, 2, 3\n4, 5, 6"
> >
> > In [185]: np.genfromtxt(StringIO(data),
> > delimiter=",",dtype="int,int,int",missing_values=2)
> > Out[185]:
> > array([(1, 2, 3), (4, 5, 6)],
> >       dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
>
> OK, same behaviour here - I found the only tests involving 'valid numbers' as
> missing_values use masked arrays; for regular ndarrays they seem to be
> ignored. I don't know if this is by design - the question is, what do you
> need to do with the data if you know ' -999' always means a missing value?
> You could certainly manipulate them after reading in...
> If you have to convert them already on reading in, and using np.mafromtxt is
> not an option, your best bet may be to define a custom converter like (note
> you have to include any blanks, if present)
>
> conv = dict(((n, lambda s: s==' -999' and np.nan or float(s)) for n in range(1,49)))
>
> Cheers,
>         Derek
>
>
> --
> ***********************************************************************************
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 77 30; Fax: 01.69.08.77.16
> ************************************************************************************
>
> <99Burn2003all_new.csv>
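P.S.: To make that converter suggestion a little more concrete, here is a
minimal, self-contained sketch. The three-column miniature file and its column
names are made up purely for illustration (your real file has 49 columns, so
you would use range(1, 49)), and it again assumes Python 2 with the StringIO
module - only the converter idea itself is the point:

    import numpy as np
    from StringIO import StringIO

    # Hypothetical miniature of the CSV: a timestamp string followed by two
    # float columns, with ' -999' marking missing data.
    data = ("TIMESTAMP;CO2_flux;Net_radiation\n"
            "2003-01-01; -999;-1.028\n"
            "2003-01-02;0.5; -999")

    # One converter per numeric column: map '-999' (with any blanks stripped)
    # to NaN, otherwise parse the field as a regular float.
    conv = dict((n, lambda s: np.nan if s.strip() == '-999' else float(s))
                for n in range(1, 3))

    b = np.genfromtxt(StringIO(data), delimiter=';', names=True,
                      dtype=['S10', float, float], converters=conv)

    # b['CO2_flux'] and b['Net_radiation'] should now contain nan wherever
    # the file had -999, while b['TIMESTAMP'] keeps the date strings.
    print(b)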