New submission from Skip Montanaro:
I have a CSV file. Here are a few rows:
"2013-10-30 14:26:46.000528","1.36097023829"
"2013-10-30 14:26:46.999755","1.36097023829"
"2013-10-30 14:26:47.999308","1.36097023829"
"2013-10-30 14:26:49.002472","1.36097023829"
"2013-10-30 14:26:50","1.36097023829"
"2013-10-30 14:26:51.000549","1.36097023829"
"2013-10-30 14:26:51.999315","1.36097023829"
"2013-10-30 14:26:52.999703","1.36097023829"
"2013-10-30 14:26:53.999640","1.36097023829"
"2013-10-30 14:26:54.999139","1.36097023829"
I want to parse the strings in the first column as timestamps. I can, and often
do, use dateutil.parser.parse(), but in situations like this where all the
timestamps are of the same format, it can be incredibly slow. OTOH, there is no
single format I can pass to datetime.datetime.strptime() that will parse all
the above timestamps. Using "%Y-%m-%d %H:%M:%S" I get errors about the leftover
microseconds. Using "%Y-%m-%d %H:%M:%S.%f" I get errors when I try to parse a
timestamp which doesn't have microseconds.
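One possible workaround, given here only as a minimal sketch, is to try the format with microseconds first and fall back to the one without. The helper name parse_timestamp and the FORMATS tuple are hypothetical names chosen for illustration. Only datetime.datetime.strptime() comes from the standard library.

```python
from datetime import datetime

# Candidate formats, most specific first (hypothetical constant for this sketch).
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def parse_timestamp(text):
    """Parse a str(datetime)-style timestamp with or without microseconds."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # This format did not match; try the next candidate.
            continue
    raise ValueError("unrecognized timestamp: %r" % (text,))


if __name__ == "__main__":
    for sample in ("2013-10-30 14:26:50", "2013-10-30 14:26:51.000549"):
        print(repr(parse_timestamp(sample)))
```

This costs a second strptime() call for every row that lacks microseconds, which is likely to matter little when such rows are rare, as they are in the sample above.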
Alas, it is datetime itself which is to blame for this problem. The above
timestamps were all printed from an earlier Python program which just dumps the
str() of a datetime object to its output CSV file. Consider:
>>> dt = dateutil.parser.parse("2013-10-30 14:26:50")
>>> print dt
2013-10-30 14:26:50
>>> dt2 = dateutil.parser.parse("2013-10-30 14:26:51.000549")
>>> print dt2
2013-10-30 14:26:51.000549
The same holds for isoformat():
>>> print dt.isoformat()
2013-10-30T14:26:50
>>> print dt2.isoformat()
2013-10-30T14:26:51.000549
Whatever happened to "be strict in what you send, but generous in what you
receive"? If strptime() is going to complain the way it does, then str() should
always generate a full timestamp, including microseconds. The above is from a
Python 2.7 session, but I also confirmed that Python 3.3 behaves the same.
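On the writing side, a program that controls its own output can sidestep the problem by formatting explicitly instead of relying on str(). The sketch below assumes that is possible. FULL_FORMAT and format_full are hypothetical names, and strftime() with %f is the only library behaviour relied on.

```python
from datetime import datetime

# Format that always carries a six-digit microsecond field (hypothetical constant).
FULL_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def format_full(dt):
    """Render dt with microseconds even when they are zero."""
    return dt.strftime(FULL_FORMAT)


if __name__ == "__main__":
    whole_second = datetime(2013, 10, 30, 14, 26, 50)
    print(str(whole_second))          # microseconds omitted by str()
    print(format_full(whole_second))  # microseconds always present
    # The explicit form round-trips through strptime() with a single format.
    assert datetime.strptime(format_full(whole_second), FULL_FORMAT) == whole_second
```

This does not help with files that already exist, such as the CSV above, but it keeps new output parseable with one strptime() format.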
I've checked 2.7 and 3.3 in the Versions list, but I don't think it can be
fixed there. Can the __str__ and isoformat methods of datetime (and time)
objects be modified for 3.4 to always include the microseconds? Alternatively,
can the %S format character be modified to consume optional decimal point and
microseconds? I rate this as "easy" considering the easiest fix is to modify
__str__ and isoformat, which seems unchallenging.
----------
components: Extension Modules
keywords: easy
messages: 201917
nosy: skip.montanaro
priority: normal
severity: normal
status: open
title: Inconsistency between datetime's str()/isoformat() and its strptime() method
type: behavior
versions: Python 2.7, Python 3.3, Python 3.4
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue19475>
_______________________________________