Hi Chao,

by mistake I did not reply to the list last time...
On 27.06.2011, at 10:30PM, Chao YUE wrote:

> Hi Derek!
>
> I tried with the latest version of the python(x,y) package, with numpy
> version 1.6.0. I gave you the data with reduced columns (10 columns) and rows.
>
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,usecols=tuple(range(10)),dtype=['S10'] + [ float for n in range(9)])
> works.
> If you change usecols=tuple(range(10)) to usecols=range(10), it still works.
>
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,dtype=None)
> works.
>
> but
> b=np.genfromtxt('99Burn2003all_new.csv',delimiter=';',names=True,dtype=['S10'] + [ float for n in range(9)])
> didn't work.
>
> I use Python(x,y)-2.6.6.1 with numpy version 1.6.0, on a 32-bit Windows system.
>
> Please don't spend too much time on this if it's not a potential problem.

OK, dtype=None works on 1.6.0, that's the important bit.

From your example file it seems the dtype list does not work without
specifying usecols, because your header contains an excess semicolon in the
field "Air temperature (High; HMP45C)", thus genfromtxt expects more data
columns than actually exist. If you replace the semicolon you should be set
(or, if I may suggest, write another header line with catchier field names so
you don't have to work with array fields like
"b['Water vapor density by LiCor 7500']" ;-).
Otherwise both options work for me with python2.6+numpy-1.5.1 as well as
1.6.0/1.6.1rc1.

I am curious though why your python interpreter gave this error message:

> ValueError                                Traceback (most recent call last)
>
> D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in <module>()
>
> C:\Python26\lib\site-packages\numpy\lib\npyio.pyc in genfromtxt(fname, dtype,
> comments, delimiter, skiprows, skip_header, skip_footer, converters, missing,
> missing_values, filling_values, usecols, names, excludelist, deletechars,
> replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose,
> invalid_raise)
>    1449                     # Raise an exception ?
>    1450                     if invalid_raise:
> -> 1451                         raise ValueError(errmsg)
>    1452                     # Issue a warning ?
>    1453                     else:
>
> ValueError

since ipython2.6 on my Mac reported this:

...
   1450                     if invalid_raise:
-> 1451                         raise ValueError(errmsg)
   1452                     # Issue a warning ?
   1453                     else:

ValueError: Some errors were detected !
    Line #3 (got 10 columns instead of 11)
    Line #4 (got 10 columns instead of 11)
etc....

which of course provided the right lead to the problem - was the actual errmsg
really missing, or did you cut the message too soon?

> The final thing is, when I try to do this (I want to try the missing_values
> in numpy 1.6.0), it gives an error:
>
> In [33]: import StringIO as StringIO
>
> In [34]: data = "1, 2, 3\n4, 5, 6"
>
> In [35]: np.genfromtxt(StringIO(data),
> delimiter=",",dtype="int,int,int",missing_values=2)
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
>
> D:\data\LaThuile_ancillary\Jim_Randerson_data\<ipython console> in <module>()
>
> TypeError: 'module' object is not callable

You want to use "from StringIO import StringIO" (or write
"StringIO.StringIO(data)"). But again, this will not work the way you expect
it to with int/float numbers set as missing_values when reading into regular
arrays. I've tested this on 1.6.1 and the current development branch as well,
and the missing_values are only considered for masked arrays.
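To spell this out with your own toy example - just a minimal sketch, assuming
Python 2 (hence the StringIO module) and numpy 1.6.x, so the exact output
formatting may differ a bit between versions:

    import numpy as np
    from StringIO import StringIO   # import the class, not just the module

    data = "1, 2, 3\n4, 5, 6"

    # Regular ndarray output: missing_values is effectively ignored here,
    # the 2 should be read back as an ordinary integer.
    a = np.genfromtxt(StringIO(data), delimiter=",",
                      dtype="int,int,int", missing_values=2)

    # Masked-array output: the same missing_values setting now takes effect,
    # so the entry equal to 2 should come back masked.
    # (np.mafromtxt is the shorthand for genfromtxt(..., usemask=True).)
    m = np.genfromtxt(StringIO(data), delimiter=",",
                      dtype="int,int,int", missing_values=2, usemask=True)

    print(a)   # plain structured array, the 2 survives unchanged
    print(m)   # masked array, the 2 shows up as a masked entry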
This is not likely to change soon, and may actually be intentional, so to
process those numbers on read-in, your best option would be to define a custom
set of converters, "converters=conv", as shown in my last mail (see also the
short sketch at the very end of this mail).

Cheers,
        Derek

> 2011/6/27 Derek Homeier <de...@astro.physik.uni-goettingen.de>
> Hi Chao,
>
> this seems to have become quite a number of different issues!
> But let's make sure I understand what's going on...
>
> > Thanks very much for your quick reply. I'll make a short summary of what
> > I've tried. Actually the ['S10'] + [ float for n in range(48) ] only works
> > when you explicitly specify the columns to be read; genfromtxt cannot
> > automatically determine the type if you don't specify the dtype....
> >
> > In [164]:
> > b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=tuple(range(49)),dtype=['S10'] + [ float for n in range(48)])
> ...
> > But if I use the following, it gives an error:
> >
> > In [171]:
> > b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,dtype=['S10'] + [ float for n in range(48)])
> > ---------------------------------------------------------------------------
> > ValueError                                Traceback (most recent call last)
>
> And the above (without the usecols) did work if you explicitly typed
> dtype=('S10', float, float....)? That by itself would be quite weird, because
> the two should be completely equivalent.
> What happens if you cast the generated list to a tuple -
> dtype=tuple(['S10'] + [ float for n in range(48)])?
> If you are using a recent numpy version (1.6.0 or 1.6.1rc1), could you please
> file a bug report with complete machine info etc.? But I suspect this might
> be an older version; you should also be able to simply use
> 'usecols=range(49)' (without the tuple()). Either way, I cannot reproduce
> this behaviour with the current numpy version.
>
> > If I don't specify the dtype, it will not recognize the type of the first
> > column (it displays as nan):
> >
> > In [172]:
> > b=np.genfromtxt('99Burn2003all.csv',delimiter=';',names=True,usecols=(0,1,2))
> >
> > In [173]: b
> > Out[173]:
> > array([(nan, -999.0, -1.028), (nan, -999.0, -0.40899999999999997),
> >        (nan, -999.0, 0.16700000000000001), ..., (nan, -999.0, -999.0),
> >        (nan, -999.0, -999.0), (nan, -999.0, -999.0)],
> >       dtype=[('TIMESTAMP', '<f8'), ('CO2_flux', '<f8'), ('Net_radiation', '<f8')])
>
> You _do_ have to specify 'dtype=None', since the default is 'dtype=float', as
> I have remarked in my previous mail. If this does not work, it could be a
> matter of the numpy version again - there were a number of type conversion
> issues fixed between 1.5.1 and 1.6.0.
>
> > Then the final question is: actually the '-999.0' in the data is a missing
> > value, but I cannot display it as 'nan' by specifying the missing_values as
> > '-999.0'; whether I set the missing_values to -999.0 or use a dictionary,
> > neither works...
> ...
> >
> > Even this doesn't work (suppose 2 is our missing_value):
> > In [184]: data = "1, 2, 3\n4, 5, 6"
> >
> > In [185]: np.genfromtxt(StringIO(data),
> > delimiter=",",dtype="int,int,int",missing_values=2)
> > Out[185]:
> > array([(1, 2, 3), (4, 5, 6)],
> >       dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
>
> OK, same behaviour here - I found the only tests involving 'valid numbers' as
> missing_values use masked arrays; for regular ndarrays they seem to be
> ignored. I don't know if this is by design - the question is, what do you
> need to do with the data if you know ' -999' always means a missing value?
> You could certainly manipulate them after reading in...
> If you have to convert them already on reading in, and using np.mafromtxt is
> not an option, your best bet may be to define a custom converter like (note
> you have to include any blanks, if present)
>
> conv = dict(((n, lambda s: s==' -999' and np.nan or float(s)) for n in range(1,49)))
>
> Cheers,
>         Derek
>
>
> --
> ***********************************************************************************
> Chao YUE
> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
> UMR 1572 CEA-CNRS-UVSQ
> Batiment 712 - Pe 119
> 91191 GIF Sur YVETTE Cedex
> Tel: (33) 01 69 08 77 30; Fax: 01.69.08.77.16
> ************************************************************************************
>
> <99Burn2003all_new.csv>
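P.S.: To make that converter suggestion a little more concrete, here is a
minimal, self-contained sketch. The three-column miniature file and its column
names are made up purely for illustration (your real file has 49 columns, so
you would use range(1, 49)), and it again assumes Python 2 with the StringIO
module - only the converter idea itself is the point:

    import numpy as np
    from StringIO import StringIO

    # Hypothetical miniature of the CSV: a timestamp string followed by two
    # float columns, with ' -999' marking missing data.
    data = ("TIMESTAMP;CO2_flux;Net_radiation\n"
            "2003-01-01; -999;-1.028\n"
            "2003-01-02;0.5; -999")

    # One converter per numeric column: map '-999' (with any blanks stripped)
    # to NaN, otherwise parse the field as a regular float.
    conv = dict((n, lambda s: np.nan if s.strip() == '-999' else float(s))
                for n in range(1, 3))

    b = np.genfromtxt(StringIO(data), delimiter=';', names=True,
                      dtype=['S10', float, float], converters=conv)

    # b['CO2_flux'] and b['Net_radiation'] should now contain nan wherever
    # the file had -999, while b['TIMESTAMP'] keeps the date strings.
    print(b)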