[Python-Dev] datetime nanosecond support

2012-07-24 Thread Vincenzo Ampolo
Hi all,

This is the first time I write to this list so thank you for considering
this message (if you will) :)

I know that this has been debated many times but until now there was no
a real use case. If you look on google about "python datetime
nanosecond" you can find more than 141k answer about that. They all say
that "you can't due to hardware imprecisions" or "you don't need it"
even if there is a good amount of people looking for this feature.

But let me explain my use case:

most OSes let users capture network packets (using tools like tcpdump or
wireshark) and store them using file formats like pcap or pcap-ng. These
formats include a timestamp for each of the captured packets, and this
timestamp usually has nanosecond precision. The reason is that on
gigabit and 10 gigabit networks the frame rate is so high that
microsecond precision is not enough to tell two frames apart.
pcap (and now pcap-ng) are extremely popular file formats, with millions
of files stored around the world. Support for nanoseconds in datetime
would make it possible to properly parse these files inside python to
compute precise statistics, for example network delays or round trip times.

More about this issue at http://bugs.python.org/issue15443

I completely agree with the YAGNI principle that seems to have driven
decisions in this area until now but It is the case to reconsider it
since this real use case has shown up?

Thank you for your attention

Best Regards,
-- 
Vincenzo Ampolo
http://vincenzo-ampolo.net
http://goshawknest.wordpress.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] datetime nanosecond support

2012-07-24 Thread Vincenzo Ampolo
On 07/24/2012 06:46 PM, Guido van Rossum wrote:
> 
> You're welcome.

Hi Guido,

I'm glad you spent your time reading my mail. I would have never
imagined that my mail could come to your attention.

> Have you read PEP 410 and my rejection of it
> (http://mail.python.org/pipermail/python-dev/2012-February/116837.html)?
> Even though that's about using Decimal for timestamps, it could still
> be considered related.

I've read it and point 5 is very like in this issue. You said:

"[...]
I see only one real use case for nanosecond precision: faithful
copying of the mtime/atime recorded by filesystems, in cases where the
filesystem (like e.g. ext4) records these times with nanosecond
precision. Even if such timestamps can't be trusted to be accurate,
converting them to floats and back loses precision, and verification
using tools not written in Python will flag the difference. But for
this specific use case a much simpler set of API changes will suffice;
only os.stat() and os.utime() need to change slightly (and variants of
os.stat() like os.fstat()).
[...]"

I think that's based on a wrong hypothesis: just one case -> let's
handle in a different way (modifying os.stat() and os.utime()).
I would say: It's not just one case, there are at lest other two
scenarios. One is like mine, parsing network packets, the other one is
in parsing stock trading data.
But in this case there is no os.stat() or os.utime() that can be
modified. I've to write my own class to deal with time and loose all the
power and flexibility that the datetime module adds to the python language.

> Not every use case deserves an API change. :-)
> 
> First you will have to show how you'd have to code this *without*
> nanosecond precision in datetime and how tedious that is. (I expect
> that representing the timestamp as a long integer expressing a posix
> timestamp times a billion would be very reasonable.)

Yeah that's exactly how we built our Time class to handle this, and we
wrote also a Duration class to represent timedelta.
The code we developed is 383 python lines long but is not comparable
with all the functionalities that the datetime module offers and it's
also really slow compared to native datetime module which is written in C.

As you may think using that approach in a web application is very
limiting since there is no strftime() in this custom class.

I cannot share the code right now since It's copyrighted by the company
I work for but I've asked permission to do so. I just need to wait
tomorrow morning (PDT time) so they approve my request. Looking at the
code you can see how tedious is to try to remake all the conversions
that are already implemented on the datetime module.
Just let me know if you actually want to have a look at the code.

> 
> I didn't read the entire bug, but it mentioned something about storing
> datetimes in databases. Do databases support nanosecond precision?
> 

Yeah. According to
http://wiki.ispirer.com/sqlways/postgresql/data-types/timestamp at least
Oracle support timestamps with nanoseconds accuracy, SQL server supports
100 nanosecond accuracy.
Since I use Postgresql personally the best way to accomplish it (also
suggested by the #postgresql on freenode) is to store the timestamp with
nanosecond (like 1343158283.880338907242) as bigint and let the ORM (so
a python ORM) do all the conversion job.
An yet again, having nanosecond support in datetime would make things
much more easy.

While writing this mail Chris Lambacher answered with more data about
nanosecond support on databases

Best Regards,
-- 
Vincenzo Ampolo
http://vincenzo-ampolo.net
http://goshawknest.wordpress.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] datetime nanosecond support

2012-07-25 Thread Vincenzo Ampolo
On 07/24/2012 08:47 PM, Guido van Rossum wrote:
>
> So what functionality specifically do you require? You speak in
> generalities but I need specifics.

The ability of thinking datetime.datetime as a flexible class that can
give you the representation you need when you need. To be more specific
think about this case:

User selects year, month, day, hour, minute, millisecond, nanosecond of
a network event from a browser the javascript code does a ajax call with
time of this format (variant of iso8601):
-MM-DDTHH:MM:SS.mmnnn (where nnn is the nanosecond representation).
The python server takes that string, converts to a datetime, does all
the math with its data and gives the output back using labeling data
with int(nano_datetime.strftime('MMSSmmnnn')) so I've a sequence
number that javascript can sort and handle easily.
It's this flexibility of conversion I'm talking about.

>
>> As you may think using that approach in a web application is very
>> limiting since there is no strftime() in this custom class.
>
> Apparently you didn't need it? :-) Web frameworks usually have their
> own date/time formatting anyway.

Which is usually derived from python's datetime, like in web2py (
http://web2py.com/books/default/chapter/29/6#Record-representation ) in
which timestamps are real python datetime objects and It's ORM
responsability to find the right representation of that data at database
level.
This lead, as you know, to one of the main advantages of any ORM:
abstract from the database layer and the SQL syntax.

The same applies for another well known framework, Django ( your
personal favorite :) ) in which DateTimeField (
https://docs.djangoproject.com/en/dev/ref/models/fields/#django.db.models.DateTimeField
) is a date and time
represented in Python by a datetime.datetime instance.

We didn't need to build a webapp yet. I've been hired for it :) So I'll
do very soon. Unluckly if datetime does not support nanoseconds, I
cannot blame any ORM for not supporting it natively.

>
>>> I didn't read the entire bug, but it mentioned something about storing
>>> datetimes in databases. Do databases support nanosecond precision?
>>>
>>
>> Yeah. According to
>> http://wiki.ispirer.com/sqlways/postgresql/data-types/timestamp at least
>> Oracle support timestamps with nanoseconds accuracy, SQL server supports
>> 100 nanosecond accuracy.
>> Since I use Postgresql personally the best way to accomplish it (also
>> suggested by the #postgresql on freenode) is to store the timestamp with
>> nanosecond (like 1343158283.880338907242) as bigint and let the ORM (so
>> a python ORM) do all the conversion job.
>> An yet again, having nanosecond support in datetime would make things
>> much more easy.
>
> How so, given that the database you use doesn't support it?

Wasn't the job of an ORM to abstract from actual database (either
relational or not relational) such that people who use the ORM do not
care about how data is represented behind it?

If yes It's job of the ORM to figure out what's the best representation
of a data on the given relational or non relational database.

>
> TBH, I think that adding nanosecond precision to the datetime type is
> not unthinkable. You'll have to come up with some clever backward
> compatibility in the API though, and that will probably be a bit ugly
> (you'd have a microsecond parameter with a range of 0-100 and a
> nanosecond parameter with a range of 0-1000). Also the space it takes
> in memory would probably increase (there's no room for an extra 10
> bits in the carefully arranged 8-byte internal representation).

Sure, that are all open issues but as soon as you are in favour of
adding nanosecond support we can start addressing them. I'm sure there
would be other people here that would like to participate to those
issues too.

>
> But let me be clear -- are you willing to help implement any of this?
> You can't just order a feature, you know...
>

Of course, as I wrote in my second message in the issue (
http://bugs.python.org/issue15443#msg166333 ) I'm ready and excited to
contribute to the python core if I can.

Best Regards,
-- 
Vincenzo Ampolo
http://vincenzo-ampolo.net
http://goshawknest.wordpress.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] datetime nanosecond support. list of proposals

2012-07-28 Thread Vincenzo Ampolo
Hi all again,

I've been quite busy these days and I collected all the suggestions
about the proposal. Here is a small summary:

Christian Heimes:
two numbers:
Julian Day Number (Rata Die) 32 bit signed integer
nanoseconds in a day 64 bit signed or unsigned integer
pro:
fix 2038 bug
cons:
hard conversion to Gregorian calendar

Charles Cazabon:
use tai64/tai64n/tai64na
pro:
well defined
libraries available
cons:
?

As ways to implement the idea there are these advices:

Nick Coghlan:
define common API based on datetime
maybe use TAI
fork the pure Python version of datetime, then fork the C
implementation to make PyPI version faster, then make a PEP

Guido van Rossum:
must do:
clever backward compatibility
use fewer bits as possible
stdlib is not the right place for first implementation

Since I'm not a big expert of calendars and date representation I'm
going to study the Julian Calendar and the TAI representation.
As a first read from Wikipedia the TAI solution looks very promising.

For the ways to implement the idea I also believe that It's better to
have a pure python implementation (so It can be used on python 2.x and
distributed on PyPI) and then a Python 3.x C implementation and a PEP
submission.

I'm open to any other idea/advice. If there are other people that would
like to implement this with me, just write me a mail.

Thank you.

Best Regards,
-- 
Vincenzo Ampolo
http://vincenzo-ampolo.net
http://goshawknest.wordpress.com
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com