Re: Python read text file columnwise

2019-01-12 Thread Peter Otten
[email protected] wrote:

> Hello
>> 
>> I'm very new in python. I have a file in the format:
>> 
>> 2018-05-31   16:00:00   28.90   81.77   4.3
>> 2018-05-31   20:32:00   28.17   84.89   4.1
>> 2018-06-20   04:09:00   27.36   88.01   4.8
>> 2018-06-20   04:15:00   27.31   87.09   4.7
>> 2018-06-28   04.07:00   27.87   84.91   5.0
>> 2018-06-29   00.42:00   32.20   104.61  4.8
> 
> I would like to read this file in python column-wise.
> 
> I tried this way but not working 
>   event_list = open('seismicity_R023E.txt',"r")
> info_event = read(event_list,'%s %s %f %f %f %f\n');

There is actually a library that implements a C-like scanf. You can install 
it with

$ pip install scanf

After that:

$ cat read_table.py
from scanf import scanf

with open("seismicity_R023E.txt") as f:
    for line in f:
        print(
            scanf("%s %s %f %f %f\n", line)
        )
$ cat seismicity_R023E.txt 
2018-05-31  16:00:00  28.90   81.77   4.3
2018-05-31  20:32:00  28.17   84.89   4.1
2018-06-20  04:09:00  27.36   88.01   4.8
2018-06-20  04:15:00  27.31   87.09   4.7
2018-06-28  04.07:00  27.87   84.91   5.0
2018-06-29  00.42:00  32.20   104.61  4.8
$ python read_table.py 
('2018-05-31', '16:00:00', 28.9, 81.77, 4.3)
('2018-05-31', '20:32:00', 28.17, 84.89, 4.1)
('2018-06-20', '04:09:00', 27.36, 88.01, 4.8)
('2018-06-20', '04:15:00', 27.31, 87.09, 4.7)
('2018-06-28', '04.07:00', 27.87, 84.91, 5.0)
('2018-06-29', '00.42:00', 32.2, 104.61, 4.8)
$
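
If installing a third-party package is not an option, the same tuples can be 
built with the standard library alone. The following is only a minimal 
sketch: it assumes every line has exactly two text fields followed by numeric 
fields, all separated by whitespace, and it reads from an in-memory copy of 
the sample rows (SAMPLE and parse_line are names made up for this 
illustration) rather than from the real file.

```python
import io

# Sample rows copied from the thread; in practice you would iterate over
# open("seismicity_R023E.txt") instead of io.StringIO(SAMPLE).
SAMPLE = """\
2018-05-31  16:00:00  28.90   81.77   4.3
2018-05-31  20:32:00  28.17   84.89   4.1
2018-06-20  04:09:00  27.36   88.01   4.8
2018-06-20  04:15:00  27.31   87.09   4.7
2018-06-28  04.07:00  27.87   84.91   5.0
2018-06-29  00.42:00  32.20   104.61  4.8
"""


def parse_line(line):
    """Split one whitespace-delimited line into (date, time, float, ...)."""
    date, time, *numbers = line.split()
    return (date, time, *map(float, numbers))


# Skip blank lines so a trailing newline does not break the unpacking.
rows = [parse_line(line) for line in io.StringIO(SAMPLE) if line.strip()]

for row in rows:
    print(row)
```

Like the scanf version, this keeps the date and time as plain strings, so the 
malformed times in the last two rows pass through unchanged.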

However, in the long term you may be better off with a tool like pandas:

>>> import pandas
>>> pandas.read_table(
...     "seismicity_R023E.txt", sep=r"\s+",
...     names=["date", "time", "foo", "bar", "baz"],
...     parse_dates=[["date", "time"]]
... )
            date_time    foo     bar  baz
0 2018-05-31 16:00:00  28.90   81.77  4.3
1 2018-05-31 20:32:00  28.17   84.89  4.1
2 2018-06-20 04:09:00  27.36   88.01  4.8
3 2018-06-20 04:15:00  27.31   87.09  4.7
4 2018-06-28 04:00:00  27.87   84.91  5.0
5 2018-06-29 00:00:00  32.20  104.61  4.8

[6 rows x 4 columns]
>>>

It will be harder in the beginning, but if you work with tabular data 
regularly it will pay off.

-- 
https://mail.python.org/mailman/listinfo/python-list


Silent data corruption in pandas, was Re: Python read text file columnwise

2019-01-12 Thread Peter Otten
Peter Otten wrote:

> [email protected] wrote:
> 
>> Hello
>>> 
>>> I'm very new in python. I have a file in the format:
>>> 
>>> 2018-05-31  16:00:00  28.90   81.77   4.3
>>> 2018-05-31  20:32:00  28.17   84.89   4.1
>>> 2018-06-20  04:09:00  27.36   88.01   4.8
>>> 2018-06-20  04:15:00  27.31   87.09   4.7
>>> 2018-06-28  04.07:00  27.87   84.91   5.0
>>> 2018-06-29  00.42:00  32.20   104.61  4.8
>> 
>> I would like to read this file in python column-wise.

> However, in the long term you may be better off with a tool like pandas:
> 
> >>> import pandas
> >>> pandas.read_table(
> ...     "seismicity_R023E.txt", sep=r"\s+",
> ...     names=["date", "time", "foo", "bar", "baz"],
> ...     parse_dates=[["date", "time"]]
> ... )
>             date_time    foo     bar  baz
> 0 2018-05-31 16:00:00  28.90   81.77  4.3
> 1 2018-05-31 20:32:00  28.17   84.89  4.1
> 2 2018-06-20 04:09:00  27.36   88.01  4.8
> 3 2018-06-20 04:15:00  27.31   87.09  4.7
> 4 2018-06-28 04:00:00  27.87   84.91  5.0
> 5 2018-06-29 00:00:00  32.20  104.61  4.8
> 
> [6 rows x 4 columns]

> 
> It will be harder in the beginning, but if you work with tabular data
> regularly it will pay off.

After posting the above I noticed that the malformed times in the last two 
rows were silently botched: 04.07:00 came out as 04:00:00 and 00.42:00 as 
00:00:00. So I just spent an insane amount of time trying to fix this from 
within pandas:

import datetime

import numpy
import pandas


def parse_datetime(dt):
    return datetime.datetime.strptime(
        dt.replace(".", ":"), "%Y-%m-%d %H:%M:%S"
    )


def date_parser(dates, times):
    return numpy.array([
        parse_datetime(date + " " + time)
        for date, time in zip(dates, times)
    ])


df = pandas.read_table(
    "seismicity_R023E.txt", sep=r"\s+",
    names=["date", "time", "foo", "bar", "baz"],
    parse_dates=[["date", "time"]], date_parser=date_parser
)


print(df)

There's probably a better way as I am only a determined amateur...
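
Independent of pandas, one way to avoid the silent part of the problem is to 
parse strictly first and only then fall back to a repair, keeping a note of 
which rows needed it. The sketch below uses only the standard library; the 
helper names (parse_strict, parse_lenient, parse_rows) are invented for this 
example, and it assumes the only defect is a "." where a ":" belongs.

```python
import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_strict(date, time):
    """Parse a well-formed date/time pair; raise ValueError otherwise."""
    return datetime.datetime.strptime(date + " " + time, FORMAT)


def parse_lenient(date, time):
    """Repair the one known defect ('.' used instead of ':') and parse."""
    return parse_strict(date, time.replace(".", ":"))


def parse_rows(pairs):
    """Return the parsed timestamps plus the 1-based numbers of repaired rows."""
    parsed, repaired = [], []
    for lineno, (date, time) in enumerate(pairs, start=1):
        try:
            stamp = parse_strict(date, time)
        except ValueError:
            stamp = parse_lenient(date, time)
            repaired.append(lineno)
        parsed.append(stamp)
    return parsed, repaired


pairs = [
    ("2018-06-20", "04:15:00"),
    ("2018-06-28", "04.07:00"),
    ("2018-06-29", "00.42:00"),
]
stamps, repaired = parse_rows(pairs)
print(stamps)
print("rows repaired:", repaired)
```

A row that is broken in some other way still raises ValueError from 
parse_lenient, so nothing is guessed quietly.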

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Silent data corruption in pandas

2019-01-12 Thread Peter Otten
Peter Otten wrote:

[Practising the bad habit of public soliloquy]

> def parse_datetime(dt):
>     return datetime.datetime.strptime(
>         dt.replace(".", ":"), "%Y-%m-%d %H:%M:%S"
>     )
> 
> 
> def date_parser(dates, times):
>     return numpy.array([
>         parse_datetime(date + " " + time)
>         for date, time in zip(dates, times)
>     ])

This can be rewritten:

@numpy.vectorize
def date_parser(date, time):
    return datetime.datetime.strptime(
        date + " " + time.replace(".", ":"),
        "%Y-%m-%d %H:%M:%S"
    )


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python read text file columnwise

2019-01-12 Thread DL Neil

On 12/01/19 1:03 PM, Piet van Oostrum wrote:
> [email protected] writes:
> 
>> Hello
>>>
>>> I'm very new in python. I have a file in the format:
>>>
>>> 2018-05-31  16:00:00  28.90   81.77   4.3
>>> 2018-05-31  20:32:00  28.17   84.89   4.1
>>> 2018-06-20  04:09:00  27.36   88.01   4.8
>>> 2018-06-20  04:15:00  27.31   87.09   4.7
>>> 2018-06-28  04.07:00  27.87   84.91   5.0
>>> 2018-06-29  00.42:00  32.20   104.61  4.8
>>
>> I would like to read this file in python column-wise.
>>
>> I tried this way but not working 
>>    event_list = open('seismicity_R023E.txt',"r")
>>  info_event = read(event_list,'%s %s %f %f %f %f\n');



To the OP:

Python's standard I/O is based around data "streams". Whilst there is a 
concept of "lines" and thus an end-of-line character, there is no notion 
of a record made up of fixed-length fields, in which data items are 
defined and distinguished by their position.


Accordingly, whilst the formatting specification of strings and floats 
might work for output, there is no equivalent for accepting input data. 
Please re-read refs on file, read, readline, etc.




> Why would you think that this would work?


To the PO:

Because in languages/libraries built around fixed-length files this is 
how one specifies the composition of fields making up a record - a data 
structure which dates back to FORTRAN and Assembler on mainframes and 
other magtape-era machines.


Whilst fixed-length records/files are, by definition, less flexible than 
the more free-form data input Python accepts, they are more efficient 
and faster in situations where the data (format) is entirely consistent 
- such as the OP is describing!
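
For comparison, a fixed-length record can be taken apart in Python by slicing 
at known character offsets. This is only a sketch under a loud assumption: 
the column positions in FIELDS are a hypothetical layout invented for this 
illustration, not something the OP's file is known to follow, and 
parse_record is likewise a made-up helper name.

```python
# Hypothetical fixed-width layout: (field name, start offset, end offset).
# These offsets are an assumption for illustration, not the OP's real format.
FIELDS = [
    ("date", 0, 10),
    ("time", 12, 20),
    ("foo", 22, 27),
    ("bar", 29, 35),
    ("baz", 37, 40),
]


def parse_record(line):
    """Cut one fixed-width record into named, whitespace-stripped fields."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}


# Two records padded to the assumed layout (numbers are right-aligned).
records = [
    "2018-05-31  16:00:00  28.90   81.77  4.3",
    "2018-06-29  00:42:00  32.20  104.61  4.8",
]

for record in records:
    fields = parse_record(record)
    print(fields["date"], fields["time"], float(fields["bar"]))
```

Slicing never looks at the separators, so it only works while every line 
really does keep each field at the same offsets.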



--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list


RE: Python read text file columnwise

2019-01-12 Thread Avi Gross



-----Original Message-----
From: Avi Gross  
Sent: Saturday, January 12, 2019 8:26 PM
To: 'DL Neil' 
Subject: RE: Python read text file columnwise

I am not sure what the big deal is here. If the data is consistently
formatted you can read in a string per line and use offsets as in line[0:8]
and so on, then call the right transformations to convert them to dates and
the like. If it is delimited by something consistent like spaces or tabs or
commas, we have all kinds of solutions ranging from splitting the line on
the delimiter to using the kind of functionality that reads such files
into a pandas DataFrame.

In the latter case, you get the columns already. In the former, there are
well known ways to extract the info such as:

[row[0] for row in listofrows]

And repeat for additional items.
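
A compact way to get every column at once, sketched here on the assumption
that all rows have the same number of items, is to transpose the list with
zip. The rows below are a few of the sample values from this thread and the
column names foo, bar and baz are placeholders.

```python
# A few already-parsed rows (date, time, and three numeric columns).
listofrows = [
    ("2018-05-31", "16:00:00", 28.90, 81.77, 4.3),
    ("2018-05-31", "20:32:00", 28.17, 84.89, 4.1),
    ("2018-06-20", "04:09:00", 27.36, 88.01, 4.8),
]

# One column at a time, as in the list comprehension above.
first_column = [row[0] for row in listofrows]

# All columns at once: zip(*rows) turns a list of rows into tuples of columns.
dates, times, foo, bar, baz = zip(*listofrows)

print(first_column)
print(bar)
```

Note that zip stops at the shortest row, so a short line would silently
truncate every column rather than raise an error.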

Or am I missing something and there is no end of line and you need to read
in the entire file and split it into size N chunks first? Still fairly
straightforward.

-- 
https://mail.python.org/mailman/listinfo/python-list