On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen <[email protected]> wrote:
> On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes <[email protected]> wrote:
>> Hey everyone,
>>
>> I have timeseries data in which the column label is simply a filename from
>> which the original data was taken. Here's some sample data:
>>
>> name1.txt name2.txt name3.txt
>> 32 34 953
>> 32 03 402
>>
>> I've noticed that the standard genfromtxt() method works great; however, the
>> names aren't written correctly. That is, if I use the command:
>>
>> print data['name1.txt']
>>
>> Nothing happens.
>>
>> However, when I remove the file extension, Eg:
>>
>> name1 name2 name3
>> 32 34 953
>> 32 03 402
>>
>> Then print data['name1'] returns (32, 32) as expected. It seems that the
>> period in the name isn't compatible with the genfromtxt() names argument.
>> Is there a workaround, or do I need to restructure my program to get the
>> extension removed? I'd rather not do this if possible for reasons that
>> aren't important for the discussion at hand.
>
> It looks like the period is just getting stripped out of the names:
>
> In [1]: import numpy as N
>
> In [2]: N.genfromtxt('sample.txt', names=True)
> Out[2]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
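For anyone wanting to reproduce this without a file on disk, here is a minimal sketch for Python 3 and a current NumPy. It assumes an in-memory `StringIO` standing in for sample.txt, filled with the header and rows from the original post:

```python
from io import StringIO

import numpy as np

# In-memory stand-in for sample.txt: one header line plus two data rows.
sample = StringIO("name1.txt name2.txt name3.txt\n32 34 953\n32 03 402\n")

# names=True takes the field names from the first line of the input.
data = np.genfromtxt(sample, names=True)
print(data.dtype.names)  # ('name1txt', 'name2txt', 'name3txt')
```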
>
> Interestingly, this still happens if you supply the names manually:
>
> In [17]: def reader(filename):
>    ....:     infile = open(filename, 'r')
>    ....:     names = infile.readline().split()
>    ....:     data = N.genfromtxt(infile, names=names)
>    ....:     infile.close()
>    ....:     return data
>    ....:
>
> In [20]: data = reader('sample.txt')
>
> In [21]: data
> Out[21]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
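The same check with an explicit list of names, again only a sketch. It assumes an in-memory input that holds just the data rows, with the header supplied by hand as in the reader above:

```python
from io import StringIO

import numpy as np

# Only the data rows; the names are passed in separately.
rows = StringIO("32 34 953\n32 03 402\n")
names = ["name1.txt", "name2.txt", "name3.txt"]

data = np.genfromtxt(rows, names=names)
# Explicit names go through the same sanitizing step as names=True.
print(data.dtype.names)  # ('name1txt', 'name2txt', 'name3txt')
```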
>
> What you can do is reset the names after genfromtxt is through with it,
> though:
>
> In [34]: def reader(filename):
>    ....:     infile = open(filename, 'r')
>    ....:     names = infile.readline().split()
>    ....:     infile.close()
>    ....:     data = N.genfromtxt(filename, names=True)
>    ....:     data.dtype.names = names
>    ....:     return data
>    ....:
>
> In [35]: data = reader('sample.txt')
>
> In [36]: data
> Out[36]:
> array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
> dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])
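A possible Python 3 variation on that reader, sketched under the assumption that the header is the first line of the stream. It reads the header only once from the same stream instead of opening the file twice. `read_with_raw_names` is a hypothetical helper name, and the trick of assigning to `dtype.names` afterwards is the same one used above:

```python
from io import StringIO

import numpy as np


def read_with_raw_names(stream):
    """Hypothetical helper: load a whitespace-delimited table, keeping header names as written."""
    header = stream.readline().split()  # raw names, periods intact
    # genfromtxt sanitizes the names it is given ('name1.txt' -> 'name1txt') ...
    data = np.genfromtxt(stream, names=header)
    # ... so put the raw names back; NumPy requires a sequence of the same length.
    data.dtype.names = header
    return data


data = read_with_raw_names(StringIO("name1.txt name2.txt name3.txt\n32 34 953\n32 03 402\n"))
print(data["name1.txt"])
```

With a real file this would be called as `with open('sample.txt') as f: data = read_with_raw_names(f)`.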
>
> Be warned, I don't know why the period is getting stripped; there may
> be a good reason, and adding it in might cause problems.

I think the period is stripped because recarrays also offer attribute
access to field names, so you wouldn't be able to do

    your_array.sample.txt

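To illustrate the attribute-access point, here is a small sketch using a hand-built record array with made-up data. It assumes that `np.rec.fromarrays` keeps the periods in the names it is given, so the dotted field exists but is only reachable by item access:

```python
import numpy as np

# Made-up two-column record array whose field names keep their periods.
rec = np.rec.fromarrays([[32, 32], [34, 3]], names="name1.txt,name2.txt")

# Item access takes any string, so the dotted name works here.
print(rec["name1.txt"])

# Attribute access does not: Python parses rec.name1.txt as (rec.name1).txt,
# and there is no field called "name1".
try:
    rec.name1.txt
except AttributeError as err:
    print("AttributeError:", err)
```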
All the names get passed through a name validator. IIRC it's something like

    from numpy.lib import _iotools   # private module, may change between releases
    validator = _iotools.NameValidator()
    validator.validate('sample1.txt')          # ('sample1txt',)
    validator.validate('a name with spaces')   # ('a_name_with_spaces',)

NameValidator has a good docstring and the gist of this should be in
the genfromtxt docs, if it's not already.
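genfromtxt also accepts a `deletechars` argument, which it passes on to this validator. One possible workaround, sketched here for a reasonably recent NumPy with an in-memory stand-in for the file, is to pass an empty string so that no characters are stripped from the header names:

```python
from io import StringIO

import numpy as np

sample = StringIO("name1.txt name2.txt name3.txt\n32 34 953\n32 03 402\n")

# An empty deletechars tells the name validator not to drop any characters,
# so the periods survive in the field names.
data = np.genfromtxt(sample, names=True, deletechars="")
print(data.dtype.names)  # ('name1.txt', 'name2.txt', 'name3.txt')
print(data["name1.txt"])
```

Spaces are still turned into underscores by the separate `replace_space` argument. The trade-off is the one described above: such names work with `data['name1.txt']` but not as recarray attributes.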
Skipper
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion